LitInspector Quality




Examples of the accuracy of LitInspector compared to other automatic text mining software


Resolving homonyms

Many gene synonyms are ambiguous, i.e. one synonym is used for different genes. A typical example is MIZ-1, which is used for two different genes:

MIZ-1
Myc-interacting zinc finger-1
(GeneID: 7709)
MIZ-1
Msx-interacting-zinc finger-1
(GeneID: 9063)


text mining software  number matches  right  wrong  number matches  right  wrong  unresolved homonyms  correctly resolved homonyms (percent)
LitInspector          53              44     2      36              27     2      7                    87%
IHOP                  36              14     0      31              12     6      35                   39%
EBIMed                11              10     1      17              9      8      0                    67%
PubGene               45              (9)    (1)    34              (2)    (8)    (0)                  (55% **)

(**) PubGene provides only 10 example papers for each search, so the complete results could not be evaluated. This evaluation was performed using the 20 (2 * 10) abstracts available.
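
The page does not state how the final percentage column is derived. The following is a minimal sketch, assuming it is the share of correctly assigned abstracts across both genes, with unresolved homonyms counted once in the denominator. The function name `resolved_percent` is hypothetical. Under this assumption the counts give 86.6% for LitInspector, 38.8% for IHOP, 67.9% for EBIMed, and 55.0% for PubGene, which matches the table after rounding except for EBIMed, listed as 67%.

```python
def resolved_percent(right_a, wrong_a, right_b, wrong_b, unresolved):
    """Share of abstracts assigned to the correct gene, in percent.

    Assumption (not stated in the source): unresolved homonyms are counted
    once in the denominator, alongside right and wrong assignments.
    """
    right = right_a + right_b
    total = right + wrong_a + wrong_b + unresolved
    return 100.0 * right / total


# Counts copied from the table: (right, wrong) per gene, then unresolved.
counts = {
    "LitInspector": (44, 2, 27, 2, 7),
    "IHOP": (14, 0, 12, 6, 35),
    "EBIMed": (10, 1, 9, 8, 0),
    "PubGene": (9, 1, 2, 8, 0),
}

for tool, row in counts.items():
    print(f"{tool}: {resolved_percent(*row):.1f}%")
```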




Disambiguation of gene synonyms that are simultaneously used in a "non-gene" context

The symbol CPAP is a gene synonym for "centrosomal P4.1-associated protein". However, CPAP is also used as an abbreviation for "continuous positive airway pressure" in medical papers that focus on the therapy of patients with sleep apnea. LitInspector filters out the matches from the wrong context.
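
The page does not describe how LitInspector separates the two meanings. As an illustration only, a naive keyword-based context filter could look like the sketch below. The cue-word lists, the function `looks_like_gene_context`, and the sample sentences are all hypothetical and are not LitInspector's actual method.

```python
import re

# Hypothetical cue words; a real system would need curated vocabularies.
GENE_CUES = {"centrosome", "centrosomal", "centriole", "protein", "gene", "cenpj"}
CLINICAL_CUES = {"airway", "apnea", "patients", "therapy", "pressure", "sleep"}


def looks_like_gene_context(abstract: str) -> bool:
    """Return True if gene cue words outnumber clinical cue words."""
    words = set(re.findall(r"[a-z0-9]+", abstract.lower()))
    return len(words & GENE_CUES) > len(words & CLINICAL_CUES)


# Made-up sample sentences, one per meaning of "CPAP".
abstracts = [
    "CPAP is a centrosomal protein required for centriole duplication.",
    "Nasal CPAP therapy improved sleep quality in patients with apnea.",
]

# Keep only abstracts in which CPAP appears in a gene context.
gene_hits = [a for a in abstracts if looks_like_gene_context(a)]
print(gene_hits)
```

Counting cue words is deliberately crude; it only shows why surrounding vocabulary is enough to separate the two senses in clear-cut cases.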


CPAP
centrosomal P4.1-associated protein
CENPJ, centromere protein J (GeneID: 55835)
text mining software  number matches  right  wrong  correctly annotated (percent)
LitInspector          19              19     0      100%
IHOP                  80              14     66     17.5%
EBIMed                1880            ~10    ~1870  ~0.5%
PubGene               71              (3)    (7)    (30% **)

(**) PubGene provides only 10 example papers for each search, so the complete results could not be evaluated. This evaluation was performed using the 10 abstracts available.
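
The percentage column is consistent with the number of right matches divided by the number of evaluated matches. Below is a short sketch of that arithmetic, using the counts from the table. The function name `percent_correct` is hypothetical, and for PubGene the denominator is the 10 abstracts that could be evaluated rather than all 71 matches.

```python
def percent_correct(right: int, wrong: int) -> float:
    """Percentage of evaluated matches that refer to the intended gene."""
    evaluated = right + wrong
    return 100.0 * right / evaluated


# (right, wrong) counts from the CPAP table; EBIMed values are approximate.
cpap_counts = {
    "LitInspector": (19, 0),
    "IHOP": (14, 66),
    "EBIMed": (10, 1870),
    "PubGene": (3, 7),
}

for tool, (right, wrong) in cpap_counts.items():
    print(f"{tool}: {percent_correct(right, wrong):.1f}%")
```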



"CPAP" example output from LitInspector:

LitInspector CPAP example



"CPAP" example output from IHOP:

IHOP CPAP example



"CPAP" example output from EBIMed:

EBIMed CPAP example



"CPAP" example output from PubGene:
LIP1 is another synonym for human CPAP. However, in this PubGene example an Arabidopsis gene (light insensitive period 1) is wrongly tagged as the human CPAP gene:

PubGene CPAP example




An improper synonym can cause a high rate of erroneous annotations for a single gene

The symbol "ACR" is an alias for the human gene acrosin (GeneID: 49). However, ACR is used in the literature only exceptionally as a synonym for acrosin, but very frequently as an abbreviation for "American College of Rheumatology" in medical papers, which caused most of the wrong taggings in this example. To solve this problem, the improper assignment for ACR will be manually corrected in the LitInspector synonym database for the next update.


acrosin
(GeneID: 49)
Gene synonym that caused the wrong annotations: "ACR"
text mining software  number matches  right  wrong  correctly annotated (percent)
LitInspector          1094            555    539    51%
IHOP                  330             96     234    29%
EBIMed                16              16     0      100%
PubGene               729             (8)    (2)    (80% **)

(**) PubGene provides only 10 example papers for each search, so the complete results could not be evaluated. This evaluation was performed using the 10 abstracts available.
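
The manual correction of the ACR assignment can be pictured as removing or blocking one alias in a synonym table. The sketch below is an assumption about what such a correction might look like; the dictionary layout, the `blocked_aliases` set, and the function `candidate_genes` are hypothetical and do not reflect the actual LitInspector synonym database.

```python
# Hypothetical synonym table: alias -> set of GeneIDs it may refer to.
synonyms = {
    "ACR": {49},
    "acrosin": {49},
    "CPAP": {55835},
    "CENPJ": {55835},
}

# Aliases judged too ambiguous to use for tagging (curated by hand).
blocked_aliases = {"ACR"}


def candidate_genes(term: str) -> set[int]:
    """Look up GeneIDs for a term, skipping manually blocked aliases."""
    if term in blocked_aliases:
        return set()
    return synonyms.get(term, set())


print(candidate_genes("ACR"))      # blocked alias, no gene returned
print(candidate_genes("acrosin"))  # still resolves to GeneID 49
```

Blocking the alias instead of deleting the entry keeps the curation decision visible and easy to reverse.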