Explanation of query methodology

The KB-Rank tool identifies protein structural chains and annotation categories of interest via text query. The retrieved chains and annotation categories are listed according to their estimated relevance to the queried text, so that the user sees the most germane structures and annotation categories for a queried topic first. Informational searches help the user learn more about a particular function or disease. Navigational searches identify specific protein structural chains that can be used to address a research question, such as which structures to use in a structure-based drug design project.

A query is performed in two stages. In the first stage, an initial pool of structures and annotations is retrieved by a text match to annotations associated with the structures. The text fields available for the match include those associated with the primary citations of the structures as provided in PubMed, the Gene Ontology term assignments associated with the structures, the names of constituent InterPro domains, and the generic names of drugs associated with the structures as provided in DrugBank. In the second stage, the structures are ranked according to how many of the prevalent annotations of the initial pool they carry. Both text and nontext annotations that are found to be prevalent across the structures of the initial pool are used for the ranking.
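The two-stage idea can be illustrated with a minimal Python sketch. It is not the KB-Rank implementation: the toy records, the function names (retrieve_pool, rank_pool, query_chains), the substring text match, and the scoring rule (summing each annotation's frequency in the pool) are all assumptions made for illustration, since the exact matching and scoring are not described here.

```python
from collections import Counter

# Hypothetical toy records: chain ID -> searchable text and annotation labels.
CHAINS = {
    "1ABC_A": {
        "text": "crystal structure of a kinase bound to imatinib",
        "annotations": {
            "GO:protein kinase activity",
            "InterPro:Protein kinase domain",
            "DrugBank:imatinib",
        },
    },
    "2DEF_B": {
        "text": "kinase domain in complex with an inhibitor",
        "annotations": {
            "GO:protein kinase activity",
            "InterPro:Protein kinase domain",
        },
    },
    "3GHI_A": {
        "text": "structure of a kinase regulatory subunit",
        "annotations": {"GO:protein binding"},
    },
    "4JKL_A": {
        "text": "lysozyme at high resolution",
        "annotations": {"GO:lysozyme activity"},
    },
}


def retrieve_pool(query, chains):
    """Stage 1: keep chains whose text contains every query term."""
    terms = query.lower().split()
    return [
        cid
        for cid, rec in chains.items()
        if all(term in rec["text"].lower() for term in terms)
    ]


def rank_pool(pool, chains):
    """Stage 2: score chains by the pool-wide prevalence of their annotations."""
    counts = Counter(a for cid in pool for a in chains[cid]["annotations"])
    n = len(pool)
    # Assumed scoring rule: sum of the pool frequency of each annotation.
    scores = {
        cid: sum(counts[a] / n for a in chains[cid]["annotations"])
        for cid in pool
    }
    ranked_chains = sorted(pool, key=lambda cid: (-scores[cid], cid))
    ranked_annotations = [
        a for a, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    ]
    return ranked_chains, ranked_annotations


def query_chains(query, chains=CHAINS):
    """Run both stages; return (ranked chain IDs, ranked annotation labels)."""
    pool = retrieve_pool(query, chains)
    if not pool:
        return [], []
    return rank_pool(pool, chains)


if __name__ == "__main__":
    ranked_chains, ranked_annotations = query_chains("kinase")
    print(ranked_chains)
    print(ranked_annotations)
```

In this toy setup, a chain that matched the query text but carries few of the annotations common to the pool falls toward the bottom of the list, which is the intended effect of the second stage.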

For example queries and ways to proceed when using the tool, please refer to the example queries page.


If you utilize KB-Rank for your work, please cite the following reference.
Julfayev ES, McLaughlin RJ, Tao YP, McLaughlin WA. KB-Rank: efficient protein structure and functional annotation identification via text query. J Struct Funct Genomics. 2012. DOI: 10.1007/s10969-012-9125-7.


The following gives a partial list of the data sources on which the tool relies.

PubMed : Text is extracted from the primary citations of the structures. The text fields include the title, author list, abstract, Medical Subject Headings (MeSH terms), and substance list.

BioCyc : genome and metabolic pathway database

ChEBI : Chemical Entities of Biological Interest, a database of small molecular entities

ChEMBL : database of bioactive compounds

DrugBank : detailed drug and target information database

EC2PDB : enzymes with known structures in the Protein Data Bank

GO : Gene Ontology project

Reactome : A Curated Pathway Database

SCOP : Structural Classification of Proteins

SMPDB : Small Molecule Pathway Database

MGI : phenotype descriptions from the Mouse Genome Informatics site

RGD : phenotypes, diseases, and pathways from the Rat Genome Database

SIFTS : mapping resource between protein sequences and structures

InterPro : sequence domains and motifs and associated GO terms