LitInspector Background
Frisch M, Klocke B, Haltmeier M, Frech K (2009)
LitInspector: literature and signal transduction pathway mining in PubMed abstracts
Nucleic Acids Res.
[PUBMED: 19417065]
[http://nar.oxfordjournals.org/cgi/content/full/gkp303]
Gene recognition in LitInspector is based on the comprehensive gene synonym lists provided by NCBI's Entrez Gene. These lists are complemented by Genomatix' own synonym databases, assembled over several years, which contain additional synonyms as well as deprecated synonyms that were found to result in predominantly wrong taggings.
- Example: the synonym "CO2" for "complement component 2" (gene ID: 717) was deprecated because it is mainly used with the meaning "carbon dioxide" in the literature.
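The effect of a deprecation list can be illustrated with a minimal sketch. It is not the LitInspector implementation: the data layout, the function names and every synonym other than "CO2" for gene ID 717 are assumptions made for illustration.

```python
import re

# Illustrative data only. The pairing of gene ID 717 with the deprecated
# synonym "CO2" comes from the text above; the other synonyms and the data
# layout are assumptions made for this sketch.
SYNONYMS = {
    717: {"C2", "CO2", "complement component 2"},
}
DEPRECATED = {(717, "CO2")}


def active_synonyms(gene_id):
    """Return the synonyms of a gene that are not on the deprecation list."""
    return {
        synonym
        for synonym in SYNONYMS.get(gene_id, set())
        if (gene_id, synonym) not in DEPRECATED
    }


def tag_genes(text):
    """Return the IDs of all genes with an active synonym in the text."""
    hits = set()
    for gene_id in SYNONYMS:
        for synonym in active_synonyms(gene_id):
            # Whole-word, case-sensitive match so that "C2" does not hit "C2H2".
            if re.search(r"\b" + re.escape(synonym) + r"\b", text):
                hits.add(gene_id)
    return hits
```

With this list in place, a sentence about carbon dioxide ("Elevated CO2 levels ...") no longer produces a tag for gene 717, while the full gene name still does.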
Many gene synonyms are ambiguous, i.e. one synonym is used for multiple genes or even in a completely different, "non-gene" context. For instance, the synonym "MBP" is mentioned in about 6800 PubMed abstracts. MBP is a homonym that is used for three different genes:
- myelin basic protein (gene ID: 4155) in about 3000 papers,
- major basic protein (gene ID: 5553) in about 300 papers,
- mannose binding protein (gene ID: 4153) in about 100 papers.
In addition, "MBP" is used in the scientific literature in a "non-gene" context as an abbreviation for:
- mean blood pressure (in about 500 papers),
- monobutyl phthalate (about 40 papers),
- megabase pairs (about 10 papers).
Even human experts may have difficulties in resolving some homonyms and ambiguities.
Therefore, a main challenge of automatic gene data mining is disambiguation, especially of short gene synonyms. LitInspector uses a combination of automatic disambiguation modules, context databases manually curated by Genomatix, and semi-automatically generated and manually edited filtering lists. Disambiguation of gene homonyms makes use of the occurrence of further gene synonyms in the same abstract as well as automatically and manually generated context lists.
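The idea of context-based disambiguation can be sketched for the "MBP" example as follows. This is a simplified illustration and not the LitInspector algorithm: the gene IDs and the meanings of "MBP" are taken from the text above, whereas the context terms, the scoring rule and all names in the code are assumptions.

```python
NON_GENE = "non-gene"

# Hypothetical context lists for the homonym "MBP". Further synonyms of a
# gene (e.g. its full name) are treated as context terms as well.
CONTEXT_TERMS = {
    4155: {"myelin", "oligodendrocyte", "multiple sclerosis"},
    5553: {"eosinophil", "major basic protein"},
    4153: {"mannose", "lectin"},
    NON_GENE: {"blood pressure", "phthalate", "megabase"},
}


def disambiguate_mbp(abstract):
    """Return the gene ID that "MBP" most likely refers to, or None.

    None is returned if there is no supporting context, if two meanings
    are equally well supported, or if a non-gene meaning wins.
    """
    text = abstract.lower()
    # Score each meaning by the number of its context terms in the abstract
    # (simple substring matching, which is cruder than real tokenization).
    scores = {
        meaning: sum(term in text for term in terms)
        for meaning, terms in CONTEXT_TERMS.items()
    }
    best_score = max(scores.values())
    winners = [m for m, score in scores.items() if score == best_score]
    if best_score == 0 or len(winners) != 1 or winners[0] == NON_GENE:
        return None
    return winners[0]
```

Returning None in the unclear cases is a deliberate choice in this sketch: leaving a short synonym untagged is preferred over a wrong tagging.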
Although every effort is made in LitInspector to resolve ambiguities, it is unavoidable that automatic data mining programs show a certain error rate. The advantage of LitInspector over purely graphical or tabular representations is that the scientist retains full control over the software-processed data, because the sentences containing the identified and highlighted synonyms are directly verifiable. In many cases a human expert will recognize wrongly assigned synonyms simply by scanning the sentence or abstract context. If you discover erroneously annotated synonyms, we would appreciate your feedback at litinspector@genomatix.de, especially if the synonym causes errors in a larger number of abstracts, as in the "CO2" example above. This would help us to improve the next LitInspector release.
LitInspector makes use of the organism information annotated by the MeSH consortium and provided within the MeSH terms. However, for the most recent abstracts the MeSH annotation is not yet complete, and for other publications organism information is not available at all. For some publications it is hard to identify the organism even if the complete paper is scanned. To make sure that no publications are skipped because no organism is annotated, LitInspector uses only soft criteria for the organism assignment. For mammalian gene tagging, LitInspector uses all abstracts and excludes only those for which a "non-mammalian" organism (e.g. Caenorhabditis, Xenopus, or plants) is annotated in the MeSH terms.
Example: for a recent paper the MeSH organism information is not yet annotated. If LitInspector identifies a synonym in the abstract, e.g. "WT1", this synonym will be annotated for all mammalian organisms for which a gene synonym "WT1" is known: Homo sapiens (gene ID 7490), Mus musculus (22431) and Rattus norvegicus (24883). Consistent with that, even if a mammalian organism like Homo sapiens is annotated in the MeSH terms, the abstract is also tagged for all other mammalian organisms like Mus musculus and Rattus norvegicus. Only in papers with a "non-mammalian" MeSH organism annotation like "Xenopus" is WT1 not annotated for the mammalian species. In the case of Caenorhabditis, only certain journals that contain a Caenorhabditis annotation in the MeSH terms are assigned to this organism.
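The soft organism criterion for mammalian gene tagging can be summarized in a short sketch. The gene IDs for "WT1" are those given above; the exclusion list, the function name and the data layout are hypothetical and only serve to illustrate the rule.

```python
# Gene IDs for the synonym "WT1" as listed in the text above.
MAMMALIAN_GENE_IDS = {
    "WT1": {
        "Homo sapiens": 7490,
        "Mus musculus": 22431,
        "Rattus norvegicus": 24883,
    },
}

# Hypothetical exclusion list of "non-mammalian" MeSH organism terms.
NON_MAMMALIAN_MESH = {"Caenorhabditis", "Xenopus", "Plants"}


def mammalian_tags(synonym, mesh_organisms):
    """Return {organism: gene ID} for all mammals the synonym is tagged for.

    Soft criterion: an abstract is excluded only if a non-mammalian
    organism is annotated. A missing annotation or a single mammalian
    annotation both lead to tagging for all known mammalian genes.
    """
    if NON_MAMMALIAN_MESH & set(mesh_organisms):
        return {}
    return dict(MAMMALIAN_GENE_IDS.get(synonym, {}))
```

Under these assumptions, `mammalian_tags("WT1", [])` and `mammalian_tags("WT1", ["Humans"])` both yield all three mammalian gene IDs, whereas `mammalian_tags("WT1", ["Xenopus"])` yields an empty result.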
The tissue and disease tagging is based on Genomatix' proprietary synonym catalogs. The tissue catalog contains over 3,000 entries, the disease catalog over 11,000 entries.
The LitInspector signal transduction pathway mining is based on Genomatix' proprietary and manually curated database of pathway components (Example: WNT) and keywords (Examples: "signal transduction" or "signaling cascade"). Currently, the database comprises nearly 400 signaling pathways and 75 pathway keywords. Canonical pathways from BioCarta, STKE, or KEGG are assigned and hyperlinked to most of the signaling pathways. Please note that these graphics provide an overview and do not necessarily contain the query genes. For pathway mining, the PubMed database is scanned for co-occurrence of the user input gene and the Genomatix pathway components and keywords at sentence level.
The output table is sorted by the number of references found, since a higher number of references is assumed to provide stronger evidence. In addition, the user has full access to the references and can verify the software-predicted data by clicking the link to NCBI's PubMed.
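Sentence-level co-occurrence mining and the ranking by reference count can be sketched as follows. This is a simplified illustration under several assumptions: a sentence counts as a hit only if it contains the query gene, a pathway component and a pathway keyword; sentences are split with a naive regular expression; and matching is done by case-insensitive substring search. All function and parameter names are hypothetical.

```python
import re
from collections import defaultdict


def split_sentences(abstract):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", abstract) if s]


def mine_pathways(abstracts, gene_synonyms, components, keywords):
    """Rank pathways by the number of abstracts that support them.

    abstracts:     dict mapping a reference ID to the abstract text
    gene_synonyms: synonyms of the query gene
    components:    dict mapping a pathway name to its component terms
    keywords:      pathway keywords such as "signaling"
    Returns a list of (pathway, sorted reference IDs), most references first.
    """
    references = defaultdict(set)
    for ref_id, text in abstracts.items():
        for sentence in split_sentences(text):
            lowered = sentence.lower()
            has_gene = any(g.lower() in lowered for g in gene_synonyms)
            has_keyword = any(k.lower() in lowered for k in keywords)
            if not (has_gene and has_keyword):
                continue
            for pathway, terms in components.items():
                if any(t.lower() in lowered for t in terms):
                    references[pathway].add(ref_id)
    # Sort by descending reference count, then by pathway name.
    ranked = sorted(references.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(pathway, sorted(ids)) for pathway, ids in ranked]
```

Because the match is evaluated per sentence, an abstract that mentions the gene and the pathway in different sentences does not contribute a reference in this sketch.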
An identified association of the query gene to a pathway can have several possible meanings:
- the query gene may be part of the signaling pathway,
- it may regulate the pathway,
- it may be regulated by the pathway,
- it may regulate a different pathway which in turn cross talks with the mentioned pathway,
- it is also possible that the query gene was experimentally found to be NOT associated with that pathway.
The advantage of automatic pathway mining compared to manually curated databases and static pathway associations is that the results are always up to date. This advantage comes at the cost of a certain error rate, which is inherent to all automatic text mining systems. Moreover, the pathway mining does not indicate the direction of the gene-pathway associations. The LitInspector pathway mining provides a current overview of possible pathway associations and potential interactions of the query gene. It also provides the literature references, which allow direct verification by the scientist.
Example: Signal Transduction Pathway associations and potential interactions of WT1 (Wilms tumor 1).
The result table is sorted by the number of references for a pathway. In the case of WT1, the most references were found for WNT (Wingless type) signaling (7 references).
...
For verification, the scientist has access to all references by clicking the hyperlinked numbers.
In the case of the WT1 example:
7 references were found for WNT signaling.
- Example: PubMed ID 15540161: "It has been shown that the Wnt signaling pathway interacts with Wilms' tumor gene 1 (WT1) in normal kidney development and plays a role in the genesis of some Wilms' tumors."
6 references for Beta catenin signaling.
- Example: PubMed ID 14666652: "The results suggest that WT1 inhibits the transformed phenotype of breast cancer cells and down-regulates the beta-catenin/TCF signaling pathway through destabilization of beta-catenin."
3 references for ABL signaling.
- Example: PubMed ID 17728783: "Our data demonstrate that WT1 expression is induced by oncogenic signalling from BCR/ABL1 and that WT1 contributes to resistance against apoptosis induced by imatinib."
3 references for PKA signaling.
- Example: PubMed ID 17869219: "WT1 functions as a transcriptional regulator and its activity is controlled through phosphorylation by protein kinase A (PKA)."
3 references for TP53 signaling.
- Example: PubMed ID 16403772: "Ras pathway related genes, p53, WT1 and PCNA, were preferentially modulated in Ha-ras-mutated tumors." Please note: this is an example of an erroneously assigned relationship between WT1 and p53!
2 references for BCL2 signaling.
- Example: PubMed ID 14994125: "Bcl-2 was expressed in rhabdomyoblasts, but not in blastemal cells undergoing apoptosis, suggesting that WT1 regulates Bcl-2 positively in the epithelial pathway, but negatively in the myogenic pathway."
© 1998-2011 Genomatix Software GmbH - All rights reserved