What is this about?
The page in the frame below links to PubChem assay tables that have been preprocessed for use with two popular data analysis systems, KNIME and the CACTVS Toolkit. The files contain both the complete assay results and the associated structures.
How are these files generated?
They are produced by a fairly simple sample application of the CACTVS Toolkit. If you have the toolkit installed, you do not really need this page. You can directly script a single-line command such as
table write [table create $aid] aid$aid.table knime {structures 1}
(with the Tcl variable aid set to the assay ID) to fetch any PubChem assay, automatically augment it with structures, and write it out as a native KNIME table.
The CACTVS Toolkit has extensive PubChem and Entrez interface capabilities via the NCBI PUG and Eutils gateways. Assay retrieval is just one small example of what you can do. You can even open the full PubChem compound database as a virtual SD-style multi-record file and run optimized structure queries and downloads on it with simple script commands.
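If you need several assays at once, the single-line command above can be wrapped in a small Tcl procedure. This is a minimal sketch, not part of the original page: the procedure name export_assays_to_knime is hypothetical, and it assumes it runs inside a CACTVS Tcl shell where table create and table write behave exactly as in the one-liner. Assays that fail to download are reported on stderr and skipped rather than aborting the whole batch.

```tcl
# Hypothetical helper: batch-convert several PubChem assays to KNIME tables.
# Assumes the CACTVS Tcl shell, where [table create $aid] fetches assay $aid
# from PubChem and [table write ...] saves it, as in the one-liner above.
proc export_assays_to_knime {aids} {
    set written {}
    foreach aid $aids {
        # Same file naming scheme as the one-liner: aid<ID>.table
        set fname aid$aid.table
        # Fetch the assay, bundle its structures, write a native KNIME table.
        if {[catch {table write [table create $aid] $fname knime {structures 1}} err]} {
            puts stderr "AID $aid failed: $err"
            continue
        }
        lappend written $fname
    }
    # Return the list of files that were written successfully.
    return $written
}
```

For example, export_assays_to_knime {1000 1001 1002} would return the names of the table files it managed to write.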
Why should I use these table files, and not directly download the CSV data from PubChem?
Two reasons: you get the structures bundled, and the columns, with their proper data types, are neatly encapsulated in a native binary file. For example, in KNIME you can use its native table reader node to import the file without worrying about a complex I/O setup.
The CACTVS files are notably larger. Why?
They contain significant additional information that cannot be represented in KNIME tables. The two most important extra components are the complete assay description (as the multi-field compound table property T_NCBI_ASSAY_DESCRIPTION) and the normalized PubChem compounds in addition to the deposited substances (as the structure object property E_PUBCHEM_COMPOUND of the structures associated with the table).
How do you write binary KNIME tables?
The CACTVS Toolkit supports both input and output of many native table file formats, including those of many well-known statistical packages. You can also read native KNIME output tables for processing in CACTVS. Some of these formats are reasonably well documented; others (like KNIME) were more or less reverse-engineered. The CACTVS KNIME table I/O module does not use any original KNIME source code.
I need an assay not listed below!
Enter the desired assay ID below, and we will queue it. If you provide an email address, the software will notify you when the job is done. We currently do not guarantee response times; if it is really urgent, download the toolkit and do it yourself. A notification email address you supply is not stored permanently or used for any other purpose. Come back later, reload the frame listing below, and start your download.
AID:
Your email (optional):
Click to display 20K AID sets starting with AID: 1, 10001, 20001, 30001, and so on in steps of 10000 up to 850001.