PubChem Assays preprocessed for KNIME and CACTVS

Complete assay data with structures

  • What is this about?

    The page in the frame below links to PubChem assay tables which have been preprocessed for use with two popular chemical information processing systems. The data files contain both the complete assay results and the associated structures.

  • How are these files generated?

    This is a pretty simple sample application of the CACTVS Toolkit. If you have the toolkit installed, you do not really need this page. You can directly script a single-line command like
    table write [table create $aid] aid$aid.table knime {structures 1}
    to fetch any PubChem assay, automatically augment it with structures, and write it out as a native KNIME table. The CACTVS Toolkit has extensive PubChem and Entrez interface capabilities via the NCBI PUG and Eutils gateways. Assay retrieval is just a small example of the things you can do. You can even open the full PubChem compound database as a virtual SD-style multi-record file and run optimized structure queries and downloads on it with simple script commands.
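
    If you need more than one assay, the same command can be wrapped in a plain Tcl loop. The following is only a sketch: it assumes it is run inside a CACTVS Toolkit script interpreter with network access to PubChem, the AID list is a placeholder, and the error handling uses nothing but the standard Tcl catch command.

    ```tcl
    # Placeholder list of assay IDs (an assumption); replace with the AIDs you need.
    set aids {1 3 5}

    foreach aid $aids {
        # Same command as the one-liner shown above.
        # catch keeps the loop running if a single assay cannot be retrieved or written.
        if {[catch {
            table write [table create $aid] aid$aid.table knime {structures 1}
        } msg]} {
            puts stderr "AID $aid failed: $msg"
        }
    }
    ```

    Each successful iteration writes one file named aid<AID>.table into the current directory. The sketch does not release the table objects it creates, which may matter for long AID lists.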

  • Why should I use these table files, and not directly download the CSV data from PubChem?

    Two reasons: you get the structures bundled with the assay results, and the columns, with their proper data types, are encapsulated in a native binary file. For example, in KNIME you can use the native table reader node to import the file without worrying about complex I/O set-up.

  • The CACTVS files are notably larger. Why?

    They contain significant additional information which cannot be represented in KNIME tables. The two most important extra components are the complete assay description (as multi-field compound table property T_NCBI_ASSAY_DESCRIPTION), and the normalized PubChem compounds in addition to the deposited substances (as structure object property E_PUBCHEM_COMPOUND of the structures associated with the table).

  • How do you write binary KNIME tables?

    The CACTVS Toolkit supports I/O of many native table file formats, including those of many well-known statistical packages, for both input and output. You can read native KNIME output tables for processing in CACTVS, too. Some of these formats are reasonably well documented; others (like KNIME) were more or less reverse-engineered. The CACTVS KNIME table I/O module does not use any original KNIME source code.

Click to show 10K AID sets starting with AID: 1 10001 20001 30001 40001 50001 60001 70001 80001 90001 100001 110001 120001 130001 140001 150001 160001 170001 180001 190001 200001 210001 220001 230001 240001 250001 260001 270001 280001 290001 300001 310001 320001 330001 340001 350001 360001 370001 380001 390001 400001 410001 420001 430001 440001 450001 460001 470001 480001 490001 500001 510001 520001 530001 540001 550001 560001 570001 580001 590001 600001 610001 620001 630001 640001 650001 660001 670001 680001 690001 700001 710001 720001 730001 740001 750001 760001 770001 780001 790001 800001 810001 820001 830001 840001 850001 860001 870001 880001 900001 910001 920001 930001 940001 950001 960001 970001 980001 990001 1000001 1010001 1020001 1030001 1040001 1050001 1060001 1070001 1080001 1090001 1100001 1110001 1120001 1130001 1140001 1150001 1160001 1170001 1180001 1190001 1200001 1210001 1220001 1230001 1240001 1250001 1260001