
GeneRanker: Characterization of Gene Sets


[Introduction] [Parameters] [Output]

Introduction

GeneRanker is a program for characterizing large sets of genes using annotation data from various sources, such as Gene Ontology or Genomatix proprietary annotation. The overrepresentation of different biological terms within the input is calculated and listed in the output together with the respective p-value.

The algorithm behind GeneRanker is based on the paper

Gabriel F. Berriz et al. (2003)
Characterizing gene sets with FuncAssociate
Bioinformatics 19, 2502-2504 (PubMed: 14668247).

Parameters

Upload gene set

The gene upload option accepts keywords from various namespaces. The following are supported:

  • Entrez Gene IDs (e.g. 30818) and/or Ensembl Gene IDs (e.g. ENSG00000115041)
  • Gene Symbols/Names (e.g. KCNIP3). microRNA identifiers like hsa-mir-181a will also be recognised.
  • Transcript Accession Numbers (e.g. NM_001034914, ENST00000360990 or AK315437)
  • Affymetrix Probe Set IDs (e.g. 231774_at)
Using the file upload field, you can provide expression values for the input genes. They will be used in the pathway view that is reached by following the links of the "Signal transduction pathways (canonical)" annotation type in the analysis result.

Expected format for input in the text area:
The keywords must be separated by commas or whitespace. Keywords containing commas or whitespace must be put in double quotes.
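
The text-area format described above can be illustrated with a small Python sketch. This is only an illustration of the stated rules, not the parser GeneRanker itself uses; the function name split_keywords is hypothetical.

```python
import re

# Matches either a double-quoted keyword or a run of characters that
# contains neither commas nor whitespace.
_TOKEN = re.compile(r'"([^"]*)"|([^\s,]+)')


def split_keywords(text):
    """Split text-area input into a list of keywords (hypothetical helper)."""
    keywords = []
    for quoted, bare in _TOKEN.findall(text):
        keyword = quoted if quoted else bare
        if keyword:  # skip empty quoted strings such as ""
            keywords.append(keyword)
    return keywords


if __name__ == "__main__":
    print(split_keywords('30818, KCNIP3 ENSG00000115041\n"a keyword, with comma"'))
```

Quoted keywords keep their embedded commas and spaces, while everything else is split on commas and whitespace.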

Expected format of the uploaded file:
The file has to be in text format. Excel files are not supported.
The first column must contain the keywords. The optional subsequent columns (tab-delimited) are used for the expression values. These are expected in standard decimal format (e.g. 1.0). You can provide headings for the columns by using the first line as a header line and marking it with "//" at the beginning.

Example file:
//label1  label2        label3          label4
90634   -0.13666667     -0.25666666     -0.280000001
5371    1.04384613      1.229230762     0.777692258
23657   0.059999999     0.039999999     0.159999996
.
.
.
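
To check a file before uploading it, the format above can be read with a few lines of Python. This is a minimal sketch under stated assumptions: the function name parse_expression_file is hypothetical, columns are assumed to be separated by tab characters, and any further lines starting with "//" after the header are simply skipped (the help text does not say how such lines are treated).

```python
def parse_expression_file(lines):
    """Parse tab-delimited lines: keyword first, then optional expression values.

    Returns (headers, rows), where headers is a list of column labels or None
    and rows maps each keyword to its list of float expression values.
    """
    headers = None
    rows = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # ignore empty lines
        if line.startswith("//"):
            if headers is None:
                # First "//" line is taken as the header line.
                headers = [h.strip() for h in line[2:].split("\t") if h.strip()]
            continue  # assumption: later "//" lines are ignored
        fields = line.split("\t")
        keyword = fields[0].strip()
        # float() raises ValueError for values not in decimal format.
        rows[keyword] = [float(v) for v in fields[1:] if v.strip()]
    return headers, rows


if __name__ == "__main__":
    sample = [
        "//label1\tlabel2\tlabel3\tlabel4\n",
        "90634\t-0.13666667\t-0.25666666\t-0.280000001\n",
        "5371\t1.04384613\t1.229230762\t0.777692258\n",
    ]
    print(parse_expression_file(sample))
```

A ValueError from float() points to an expression value that is not in standard decimal format.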

Use example gene set

"Inflammation in H.sapiens"

The example data set is from a microarray analysis of systemic inflammation in humans (Calvano et al. (2005) Nature 437, 1032-7; PMID: 16136080).

Gene expression changes relative to t=0 are displayed at 5 time points (2, 4, 6, 9 and 24 hours) after inoculation with bacterial endotoxin.

Organism

Please select the organism the input genes come from. Only organisms whose genes have annotations from at least one of the available annotation types are listed here. The default organism is Homo sapiens.

Orthologous Mapping

If the input genes originate from a vertebrate organism other than Homo sapiens, you can use this option to try to map them via orthology to their corresponding genes in Homo sapiens. The ranking result will then be based on the Homo sapiens genes. For a detailed description of the mapping see here.

Annotation types

Here you can select which annotation data sets shall be used for the analysis. The following annotation types are available:

  • Pathway Based Networks (Public Sources):
    Gene associations with over 750 canonical pathways from public sources, retrieved via Pathway Commons.
    All pathway based networks are derived from Homo sapiens. Therefore "Pathway Based Networks (Public Sources)" can only be selected if "Homo sapiens" has been chosen as organism or if the mapping of the input genes to their orthologous human genes has been activated.
  • Signal Transduction Networks (Genomatix Literature Mining):
    Signal Transduction Network Associations are obtained by Genomatix with a proprietary literature data mining algorithm based on all available PubMed abstracts. Individual gene to network associations found on sentence level in the scientific literature were filtered for significance to avoid random matches. The significant associations were used for network annotations within large gene sets. For more background on our literature data mining see LitInspector.
  • Molecular Functions (GO):
    The ontology 'molecular function' from the Gene Ontology Consortium
  • Cellular Components (GO):
    The ontology 'cellular component' from the Gene Ontology Consortium
  • Biological Processes (GO):
    The ontology 'biological process' from the Gene Ontology Consortium
  • Diseases (Genomatix Literature Mining):
    Genomatix has assigned genes to diseases with the help of a proprietary literature data mining algorithm based on all available PubMed abstracts. Individual gene to disease associations found on sentence level in the scientific literature were filtered for significance to avoid random matches. The significant associations were used for disease annotations within large gene sets. For more background on our literature data mining see LitInspector. Disease names and synonyms are based on UMLS (Unified Medical Language System).
  • Diseases (MeSH):
    Genomatix has assigned genes to diseases with the help of a proprietary literature data mining algorithm based on all available PubMed abstracts and their corresponding MeSH (Medical Subject Headings). For more background on our literature data mining see LitInspector.
  • Tissues (Genomatix Literature Mining):
    Genomatix has assigned genes to tissues with the help of a proprietary literature data mining algorithm based on all available PubMed abstracts. Individual gene to tissue associations found on sentence level in the scientific literature were filtered for significance to avoid random matches. The significant associations were used for tissue annotations within large gene sets. For more background on our literature data mining see LitInspector. Tissue names and synonyms are based on UMLS (Unified Medical Language System).
  • Tissues (UniGene):
    Genomatix has assigned UniGene tissue names to a hierarchical tissue ontology. Thus the GeneRanker concept can be applied to UniGene expression data, and groups of genes with significant coexpression profiles can be identified.
  • Co-cited genes (Genomatix Literature Mining):
    Genomatix identified gene to gene associations with the help of a proprietary literature data mining algorithm based on all available PubMed abstracts. Individual gene to gene associations found on sentence level in the scientific literature were filtered for significance to avoid random matches. The significant associations were used for the identification of possible key genes within large gene sets. New genes which were not contained within the input list of genes are marked with an asterisk "*". For more background on our literature data mining see LitInspector.
  • Co-cited Transcription Factors (TFs) (Genomatix Literature Mining):
    Genomatix identified gene to transcription factor associations with the help of a proprietary literature data mining algorithm based on all available PubMed abstracts. Individual gene to TF associations found on sentence level in the scientific literature were filtered for significance to avoid random matches. The significant associations were used for the identification of possible key TFs within large gene sets. New transcription factor genes which were not contained within the input list of genes are marked with an asterisk "*". For more background on our literature data mining see LitInspector.
  • Pharmacological Substances (Genomatix Literature Mining):
    Gene associations with pharmacological substances based on the Genomatix literature data mining algorithm. Gene to pharmacological substance associations found on sentence level in the scientific literature (i.e. PubMed abstracts) were filtered for significance to avoid random matches. The significant associations were used for pharmacological substance annotations within large gene sets. For more background on Genomatix literature data mining see LitInspector. Pharmacological substance names and synonyms are based on UMLS (Unified Medical Language System).
  • Clinical Diseases (ClinVar):
    Gene associations with clinical diseases obtained from ClinVar
    Landrum MJ, Lee JM, Riley GR, Jang W, Rubinstein WS, Church DM, Maglott DR. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 2014 Jan 1;42(1):D980-5. doi: 10.1093/nar/gkt1113. PubMed PMID: 24234437.
p-value

From this drop down box you can select a threshold for the p-value.

Here is a short description of the p-value concept: let q be the number of genes in the input set, and let m be the number of genes from the input set that have annotation A assigned. The p-value is then the probability (using Fisher's Exact Test) of finding at least m genes with annotation A in an input list of length q, under the assumption that belonging to the input list is independent of having this annotation.
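
The p-value described above corresponds to the upper tail of a hypergeometric distribution (a one-sided Fisher's Exact Test). The following Python sketch illustrates the calculation; it is not GeneRanker's actual code. The symbols n_universe (number of genes in the gene universe) and n_annotated (number of universe genes having annotation A) are assumptions introduced here, since the description above only names q and m.

```python
from math import comb


def enrichment_p_value(m, q, n_annotated, n_universe):
    """Probability of finding at least m annotated genes among q drawn genes.

    m           -- input genes having annotation A
    q           -- number of genes in the input set
    n_annotated -- universe genes having annotation A (assumed symbol)
    n_universe  -- total genes in the gene universe (assumed symbol)
    """
    total = comb(n_universe, q)
    upper = min(q, n_annotated)
    # Sum the hypergeometric probabilities for k = m, m + 1, ..., upper.
    tail = sum(
        comb(n_annotated, k) * comb(n_universe - n_annotated, q - k)
        for k in range(m, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers: 12 of 100 input genes annotated, 300 of 20000 in the universe.
    print(enrichment_p_value(12, 100, 300, 20000))
```

The smaller the value, the less likely it is that the observed number of annotated genes arose by chance.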

These parameters are hidden by default. Clicking on the corresponding icon will reveal them.
Adjusted p-value

From this drop down box you can select the threshold for the adjusted p-value.

GeneRanker estimates an adjusted p-value from the results of 1,000 simulated null hypothesis queries. From these simulations we directly estimate the probability of obtaining at least one false positive for any desired threshold in the hypothesis-wise p-value. That is, the adjusted p-value is the fraction (as a percentage) of the 1,000 null hypothesis simulations that contain at least one annotation with a p-value equal to or smaller than the calculated one.
However, the computation of the adjusted p-value may take some time, depending on how large your input gene list is and how many annotation terms the selected annotation type contains. Therefore the computation of the adjusted p-value is deactivated by default. If you need an adjusted p-value for your analysis, just tick the check box on the left side of this parameter.

For a detailed description of the adjusted p-value please refer to the paper mentioned in the introduction.
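
The simulation idea can be sketched in Python as follows. This is an illustrative reading of the description above, not the GeneRanker implementation: it assumes that each null hypothesis query is a random gene set of size q drawn from the gene universe, and that annotations is a mapping from each term to the set of genes carrying it. All function and parameter names are hypothetical.

```python
import random
from math import comb


def tail_p_value(m, q, n_annotated, n_universe):
    """One-sided hypergeometric tail: P(at least m annotated genes among q)."""
    upper = min(q, n_annotated)
    tail = sum(
        comb(n_annotated, k) * comb(n_universe - n_annotated, q - k)
        for k in range(m, upper + 1)
    )
    return tail / comb(n_universe, q)


def adjusted_p_value(observed_p, q, universe, annotations, n_sim=1000, seed=0):
    """Fraction of random queries whose best p-value is <= observed_p."""
    rng = random.Random(seed)
    genes = sorted(universe)
    gene_set = set(genes)
    n_universe = len(genes)
    # Restrict every annotation term to genes that are part of the universe.
    terms = [set(members) & gene_set for members in annotations.values()]
    hits = 0
    for _ in range(n_sim):
        sample = set(rng.sample(genes, q))  # one simulated null query
        best = min(
            tail_p_value(len(sample & term), q, len(term), n_universe)
            for term in terms
        )
        if best <= observed_p:
            hits += 1
    return hits / n_sim


if __name__ == "__main__":
    universe = {f"g{i}" for i in range(200)}
    annotations = {
        "term A": {f"g{i}" for i in range(0, 20)},
        "term B": {f"g{i}" for i in range(10, 60)},
    }
    print(adjusted_p_value(0.01, 15, universe, annotations, n_sim=200))
```

The function returns a fraction between 0 and 1; multiply by 100 to express it as a percentage. The cost grows with the number of simulations, the size of the input list and the number of annotation terms, which is consistent with the runtime note above.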

These parameters are hidden by default. Clicking on the corresponding icon will reveal them.
Upload user-defined gene universe

Here you can provide your own gene universe which, in some cases, might be more appropriate than the default gene universe (all genes from the organism of interest having annotation), e.g. when analysing a gene list that originates from a DNA microarray experiment.
You may upload gene keywords from various namespaces. The following are supported:

  • Entrez Gene IDs (e.g. 30818) and/or Ensembl Gene IDs (e.g. ENSG00000115041)
  • gene symbols/names (e.g. KCNIP3) (microRNA identifiers like hsa-mir-181a will also be recognised)
  • transcript accession numbers (e.g. NM_001034914, ENST00000360990 or AK315437)
  • Affymetrix probe set IDs (e.g. 231774_at)

Expected format of the uploaded file:
The keywords must be separated by commas or whitespace. Keywords containing commas or whitespace must be put in double quotes.

These parameters are hidden by default. Clicking on the corresponding icon will reveal them.
Output
Result name (optional)

You can enter a name for your result.

Your email address

Here you can choose between two methods for receiving the results:
  • Show result directly in browser window
    With this option the URL of the result is shown directly in your browser window.

    Warning: Please use this option only for analyses which can be performed in a short time.
    If the analysis takes longer than the timeout of the webserver, the connection will be terminated and you will receive an error message (e.g. "The document contained no data."). In this case the results will not be available. Please restart the analysis using the email option described below.

  • Send the URL of the result via email
    With this option an email with the URL of the results will be sent to the email address you provide once the analysis is finished.

The results will be available for a limited time on our server. For details of how long your results will be kept please see the result-email. After that period they will be deleted unless protected in the project management!

Output

At the top of the result page, a summary of the parameters used is shown.

Below the list of parameters there is a tab for each chosen annotation type. Each tab contains a table with the ranked annotation. Note that annotation terms with only one gene (total) assigned will not be shown. The table contains the following information for each annotation:

By clicking on the column headers of the table you can change the sort order and sort by different columns.

You can also filter the table rows. Just click on the magnifier icon at the lower left-hand corner of the table and a filter tool will pop up. A filter is defined by adding a filter condition and filling in its three input fields: the column to be searched, the comparison operator to apply, and the value the rows are compared to. The available operators depend on the type of data in the column: textual columns offer operators like contains, numeric columns offer operators like greater or less. Multiple conditions can be combined as an intersection (all conditions, AND logic) or as a union (any condition, OR logic). To return to the initial view in which all result rows are shown, the filter can be reset at any time from the filter tool.

Next to the magnifier icon there are two buttons for downloading the complete table content in Excel™ or TSV format. If the table contains many rows, only the first 10 are shown. To see the remaining rows, use the paging feature below the table.
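
The same filter logic can be applied offline to a downloaded TSV table. The Python sketch below mirrors the description above (three-part conditions combined with AND or OR); it is not the code behind the filter tool. The column names "term" and "p_value" and the function name filter_rows are hypothetical, and the text comparison is assumed to be case-sensitive.

```python
# Operator names follow the examples given in the text above.
OPERATORS = {
    "contains": lambda cell, value: str(value) in str(cell),
    "greater": lambda cell, value: float(cell) > float(value),
    "less": lambda cell, value: float(cell) < float(value),
}


def filter_rows(rows, conditions, match_all=True):
    """Keep rows matching all conditions (AND) or any condition (OR).

    rows       -- list of dicts, one per table row
    conditions -- list of (column, operator, value) tuples
    """
    combine = all if match_all else any
    return [
        row
        for row in rows
        if combine(OPERATORS[op](row[col], val) for col, op, val in conditions)
    ]


if __name__ == "__main__":
    table = [
        {"term": "immune response", "p_value": 0.001},
        {"term": "cell cycle", "p_value": 0.2},
        {"term": "immune system process", "p_value": 0.04},
    ]
    conditions = [("term", "contains", "immune"), ("p_value", "less", 0.01)]
    print(filter_rows(table, conditions, match_all=True))
    print(filter_rows(table, conditions, match_all=False))
```

With match_all=True only rows satisfying every condition remain (intersection); with match_all=False a row is kept as soon as one condition holds (union).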
