There are a number of scenarios where it's useful to be able to classify protein targets, such as high-level summaries, enrichment calculations and so on. There are a variety of protein classification schemes out there, such as PANTHER, SCOP and InterPro. These schemes are based on domains and other structural features. ChEMBL provides its own hierarchical classification. Since I use this from time to time, it's useful to pull all the classifications for a given species in one go via the SQL below (tested with v17):
    SELECT td.pref_name,
           description,
           accession,
           pfc.*
    FROM target_dictionary td,
         target_components tc,
         component_sequences cs,
         component_class cc,
         protein_family_classification pfc
    WHERE td.tax_id = 9606
      AND td.tid = tc.tid
      AND tc.component_id = cs.component_id
      AND cc.component_id = cs.component_id
      AND pfc.protein_class_id = cc.protein_class_id;
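The query above hard-codes the species (tax_id 9606, i.e. human). If you want to reuse it for other species, or roll the results up into a high-level summary, one option is to wrap it in a small Python function. What follows is only a sketch under a few assumptions: the function names (`fetch_classifications`, `build_toy_db`) are made up for illustration, the `l1` and `l2` column names are my assumption about how the classification levels are stored, and the in-memory SQLite database with a handful of invented rows merely stands in for a real ChEMBL installation. Against a real database you would pass in your own DB-API connection, and the parameter placeholder (`?` here) may differ depending on the driver.

```python
import sqlite3
from collections import Counter

# Same joins as the query above, with the species turned into a parameter.
QUERY = """
SELECT td.pref_name, cs.description, cs.accession, pfc.*
FROM target_dictionary td,
     target_components tc,
     component_sequences cs,
     component_class cc,
     protein_family_classification pfc
WHERE td.tax_id = ?
  AND td.tid = tc.tid
  AND tc.component_id = cs.component_id
  AND cc.component_id = cs.component_id
  AND pfc.protein_class_id = cc.protein_class_id
"""


def fetch_classifications(conn, tax_id):
    """Return one dict per (target, classification) row for the given species."""
    cur = conn.execute(QUERY, (tax_id,))
    columns = [d[0] for d in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


def build_toy_db():
    """Hypothetical stand-in for ChEMBL: minimal tables and invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE target_dictionary (tid INTEGER, pref_name TEXT, tax_id INTEGER);
        CREATE TABLE target_components (tid INTEGER, component_id INTEGER);
        CREATE TABLE component_sequences (component_id INTEGER, description TEXT, accession TEXT);
        CREATE TABLE component_class (component_id INTEGER, protein_class_id INTEGER);
        CREATE TABLE protein_family_classification (protein_class_id INTEGER, l1 TEXT, l2 TEXT);

        -- Invented rows, not real ChEMBL data
        INSERT INTO target_dictionary VALUES (1, 'toy target A', 9606);
        INSERT INTO target_dictionary VALUES (2, 'toy target B', 9606);
        INSERT INTO target_dictionary VALUES (3, 'toy target C', 10090);
        INSERT INTO target_components VALUES (1, 100), (2, 200), (3, 300);
        INSERT INTO component_sequences VALUES (100, 'toy protein A', 'ACC1');
        INSERT INTO component_sequences VALUES (200, 'toy protein B', 'ACC2');
        INSERT INTO component_sequences VALUES (300, 'toy protein C', 'ACC3');
        INSERT INTO component_class VALUES (100, 10), (200, 20), (300, 10);
        INSERT INTO protein_family_classification VALUES (10, 'Enzyme', 'Kinase');
        INSERT INTO protein_family_classification VALUES (20, 'Membrane receptor', '7TM1');
    """)
    return conn


if __name__ == "__main__":
    rows = fetch_classifications(build_toy_db(), 9606)
    # High-level summary: number of rows per top-level class
    for level1, count in Counter(r["l1"] for r in rows).items():
        print(level1, count)
```

Keeping the species as a bound parameter rather than pasting it into the SQL string means the same function works for any tax_id without editing the query.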