
Genomatix Pathway System (GePS)


Please note our Genomatix Pathway System Tutorial (PDF). There are also some introductory video tutorials and application examples using GePS in the tutorial section.


Introduction

The Genomatix Pathway System (GePS) uses information extracted from public and proprietary databases to create and extend networks based on literature data and genes from canonical pathways.

Genes from more than 500 human pathways can be displayed as networks. Genes were taken from the NCI-Nature Pathway Interaction Database, Biocarta and various other sources (please see Data Sources below for details) which are supplemented with proprietary database content from NetPro and Genomatix in-house curated annotation like:

GePS also allows networks to be created from an arbitrary input gene list, where connections are based on literature, i.e. co-citations. This gene list can be filtered by GeneRanker results, literature mining results and expression data. The resulting gene sets can be combined into new gene sets and serve as filters.

Furthermore, networks can be created from scratch without an input gene list. Genes, complexes and interactions can be created simply by clicking and dragging with the mouse.

GePS also allows:

Genomatix Pathway System requires Adobe Flash Player 10.1 or higher to be installed on your computer.


GePS access

GePS can be accessed via one of six entry points.

Characterization of gene sets: Input a gene list, optionally with expression values

This choice will bring up a GeneRanker interface, where a list of genes can be entered via an input box or a file upload option. In the latter case expression values can be included. GeneRanker is a program allowing characterization of large sets of genes by making use of annotation data from various sources, like Gene Ontology or Genomatix proprietary annotation. GePS will be started with the GeneRanker result.

The algorithm behind GeneRanker is based on the paper

Gabriel F. Berriz et al. (2003)
Characterizing gene sets with FuncAssociate
Bioinformatics 19, 2502-2504 (PubMed: 14668247).
Parameters
Upload gene set

The gene upload option accepts keywords from various namespaces. The following are supported:

  • Entrez Gene IDs (e.g. 30818) and/or Ensembl Gene IDs (e.g. ENSG00000115041)
  • Gene symbols/names (e.g. KCNIP3) (microRNA identifiers like hsa-mir-181a will also be recognised)
  • Transcript accession numbers (e.g. NM_001034914, ENST00000360990 or AK315437)
  • Affymetrix probe set IDs (e.g. 231774_at)
Using the file upload field, you can provide expression values for the input genes. They are used in the pathway view, which is reached by following the links of the "Signal transduction pathways (canonical)" annotation type in the analysis result.

Expected format for input in the text area:
The keywords must be separated by commas or whitespace. Keywords containing commas or whitespace must be put in double quotes.
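
The rule above can be illustrated with a short Python sketch. It is not the parser used by GeneRanker; the function name `parse_keywords` is hypothetical, and the handling of empty quoted strings (they are skipped) is an assumption.

```python
import re

# A token is either a double-quoted string or a run of characters
# that contains no comma, whitespace or double quote.
_TOKEN = re.compile(r'"([^"]*)"|([^\s,"]+)')


def parse_keywords(text):
    """Split text-area input into keywords (hypothetical helper)."""
    keywords = []
    for quoted, bare in _TOKEN.findall(text):
        keyword = quoted if quoted else bare
        if keyword:  # assumption: empty quoted strings are ignored
            keywords.append(keyword)
    return keywords


# Commas and whitespace both act as separators; quotes protect them.
example = '30818, KCNIP3 ENSG00000115041\n"heat shock protein, 70kDa"'
print(parse_keywords(example))
```

The quoted keyword keeps its embedded comma and spaces, while all other keywords are split at commas, spaces and line breaks.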

Expected format of the uploaded file:
The file has to be in text format; Excel files are not supported.
The first column must contain the keywords. The optional subsequent columns (tab-delimited) are used for the expression values. These are expected in standard decimal format (e.g. 1.0). You can provide column headings in the first line by marking it with "//" at the beginning.

Example file:
//label1  label2        label3          label4
90634   -0.13666667     -0.25666666     -0.280000001
5371    1.04384613      1.229230762     0.777692258
23657   0.059999999     0.039999999     0.159999996
.
.
.
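
If you prepare such files programmatically, it can help to check them before uploading. The following Python sketch reads the format described above. The function name `parse_expression_file` is hypothetical, and treating "NA" or empty cells as missing values is an assumption based on the NA handling mentioned for the network colors.

```python
import io


def parse_expression_file(handle):
    """Read keywords and optional expression values (hypothetical helper).

    Returns (labels, rows): labels come from the optional "//" heading
    line, rows maps each keyword to its list of expression values.
    """
    labels, rows = [], {}
    for line in handle:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue
        if line.startswith("//"):
            # Heading line: the column labels follow the "//" marker.
            labels = [cell.strip() for cell in line[2:].split("\t")]
            continue
        keyword, *cells = line.split("\t")
        # Assumption: "NA" or an empty cell marks a missing value.
        rows[keyword.strip()] = [
            None if cell.strip() in ("", "NA") else float(cell)
            for cell in cells
        ]
    return labels, rows


sample = io.StringIO(
    "//label1\tlabel2\tlabel3\tlabel4\n"
    "90634\t-0.13666667\t-0.25666666\t-0.280000001\n"
    "5371\t1.04384613\t1.229230762\t0.777692258\n"
    "23657\t0.059999999\tNA\t0.159999996\n"
)
labels, rows = parse_expression_file(sample)
print(labels)
print(rows["5371"])
```

A `ValueError` from `float` indicates a cell that is not in standard decimal format.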

Use example gene set

"Inflammation in H.sapiens"

The example data set is from a microarray analysis of Systemic Inflammation in Humans (Calvano et al (2005) Nature 437,1032-7; PMID: 16136080).

Gene expression changes relative to t=0 are displayed at 5 time points (2, 4, 6, 9 and 24 hours) after inoculation with bacterial endotoxin.

Organism

Please select the organism from which the input genes originate. Only organisms whose genes have annotations from at least one of the available annotation types are listed here. The default organism is Homo sapiens.

Orthologous Mapping

If the input genes originate from a vertebrate organism other than Homo sapiens, you can try to map them via orthology to their corresponding genes in Homo sapiens using this option. The ranking result will then be based on the Homo sapiens genes. For a detailed description of the mapping see here.

Annotation types

Here you can select which annotation data sets shall be used for the analysis. The following annotation types are available:

  • Pathway Based Networks (Public Sources):
    Gene associations with over 750 canonical pathways from the following sources (retrieved via Pathway Commons):
    All pathway based networks are derived from Homo sapiens. Therefore "Pathway Based Networks (Public Sources)" can only be selected if "Homo sapiens" has been chosen as organism or the mapping of the input genes to the orthologous human genes has been activated.
  • Signal Transduction Networks (Genomatix Literature Mining):
    Signal Transduction Network Associations are obtained by Genomatix with a proprietary literature data mining algorithm based on all available PubMed abstracts. Individual gene to network associations found on sentence level in the scientific literature were filtered for significance to avoid random matches. The significant associations were used for network annotations within large gene sets. For more background on our literature data mining see LitInspector.
  • Molecular Functions (GO):
    The ontology 'molecular function' from the Gene Ontology Consortium
  • Cellular Components (GO):
    The ontology 'cellular component' from the Gene Ontology Consortium
  • Biological Processes (GO):
    The ontology 'biological process' from the Gene Ontology Consortium
  • Diseases (Genomatix Literature Mining):
    Genomatix has assigned genes to diseases with the help of a proprietary literature data mining algorithm based on all available PubMed abstracts. Individual gene to disease associations found on sentence level in the scientific literature were filtered for significance to avoid random matches. The significant associations were used for disease annotations within large gene sets. For more background on our literature data mining see LitInspector. Disease names and synonyms are based on MeSH terms and the NCI thesaurus.
  • Diseases (MeSH):
    Genomatix has assigned genes to diseases with the help of a proprietary literature data mining algorithm based on all available PubMed abstracts and their corresponding MeSH (Medical Subject Headings). For more background on our literature data mining see LitInspector.
  • Tissues (Genomatix Literature Mining):
    Genomatix has assigned genes to tissues with the help of a proprietary literature data mining algorithm based on all available PubMed abstracts. Individual gene to tissue associations found on sentence level in the scientific literature were filtered for significance to avoid random matches. The significant associations were used for tissue annotations within large gene sets. For more background on our literature data mining see LitInspector. Tissue names and synonyms are based on MeSH terms and the NCI thesaurus.
  • Tissues (UniGene):
    Genomatix has assigned UniGene tissue names to a hierarchical tissue ontology. Thus the GeneRanker concept can be applied to UniGene expression data, and groups of genes with significant coexpression profiles can be identified.
  • Co-cited genes (Genomatix Literature Mining):
    Genomatix identified gene to gene associations with the help of a proprietary literature data mining algorithm based on all available PubMed abstracts. Individual gene to gene associations found on sentence level in the scientific literature were filtered for significance to avoid random matches. The significant associations were used for the identification of possible key genes within large gene sets. New genes which were not contained within the input list of genes are marked with an asterisk "*". For more background on our literature data mining see LitInspector.
  • Co-cited Transcription Factors (TFs) (Genomatix Literature Mining):
    Genomatix identified gene to transcription factor associations with the help of a proprietary literature data mining algorithm based on all available PubMed abstracts. Individual gene to TF associations found on sentence level in the scientific literature were filtered for significance to avoid random matches. The significant associations were used for the identification of possible key TFs within large gene sets. New transcription factor genes which were not contained within the input list of genes are marked with an asterisk "*". For more background on our literature data mining see LitInspector.
  • Pharmacological Substances (Genomatix Literature Mining):
    Gene associations with pharmacological substances based on Genomatix literature data mining algorithm. Gene to pharmacological substance associations found on sentence level in the scientific literature (i.e. PubMed abstracts) were filtered for significance to avoid random matches. The significant associations were used for pharmacological substance annotations within large gene sets. For more background on Genomatix literature data mining see LitInspector. Pharmacological substance names and synonyms are based on UMLS (Unified Medical Language System).
  • Clinical Diseases (ClinVar):
    Gene associations with clinical diseases obtained from ClinVar
    Landrum MJ, Lee JM, Riley GR, Jang W, Rubinstein WS, Church DM, Maglott DR. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 2014 Jan 1;42(1):D980-5. doi: 10.1093/nar/gkt1113. PubMed PMID: 24234437.
p-value

From this drop down box you can select a threshold for the p-value.

Here is a short description of the p-value concept: Let q be the number of genes in the input set, and let m be the number of genes from the input set that have annotation A assigned. The p-value is then the probability (using Fisher's Exact Test) of finding at least m genes with annotation A in an input list of length q, under the assumption that belonging to the input list is independent of having this annotation.
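
The probability described above can be written as the upper tail of a hypergeometric distribution. The following Python sketch shows this calculation; it is not the GeneRanker implementation. Besides q and m it needs two quantities that are assumptions here: `universe`, the number of genes in the gene universe, and `annotated`, the number of genes in that universe having annotation A.

```python
from math import comb


def enrichment_p_value(m, q, annotated, universe):
    """One-sided Fisher's exact test: P(at least m annotated genes).

    m         -- input genes having annotation A
    q         -- number of genes in the input set
    annotated -- genes in the universe having annotation A (assumed name)
    universe  -- total number of genes in the universe (assumed name)
    """
    total = comb(universe, q)
    # Sum the probabilities of drawing k = m, m+1, ... annotated genes.
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, q - k)
        for k in range(m, min(q, annotated) + 1)
    )
    return tail / total


# Small example: 2 of 4 input genes are annotated, 3 of 10 genes
# in the universe carry the annotation.
print(enrichment_p_value(m=2, q=4, annotated=3, universe=10))  # 70/210
```

The smaller the value, the less likely it is that the overlap between the input list and the annotation arose by chance.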

These parameters are hidden by default. Clicking on the corresponding icon will reveal them.
Adjusted p-value

From this drop down box you can select the threshold for the adjusted p-value.

GeneRanker estimates an adjusted p-value from the results of 1,000 simulated null hypothesis queries. From these simulations we directly estimate the probability of obtaining at least one false positive for any desired threshold in the hypothesis-wise p-value. However, the computation of the adjusted p-value may take some time, depending on how large your input gene list is and how many annotation terms the selected annotation type contains. Therefore the computation of the adjusted p-value is deactivated by default. If you need an adjusted p-value for your analysis, just tick the check box on the left side of this parameter.

For a detailed description of the adjusted p-value please refer to the paper mentioned in the introduction.
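
The simulation idea can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the GeneRanker code: random gene sets of the same size as the input list are drawn from the universe, the smallest p-value over all annotation terms is recorded for each draw, and the adjusted p-value for a threshold is the fraction of draws with at least one p-value at or below it. The names `adjusted_p_value`, `universe` and `annotations` are hypothetical.

```python
import random
from math import comb


def tail_p(m, q, annotated, universe_size):
    """One-sided Fisher's exact test (upper hypergeometric tail)."""
    tail = sum(
        comb(annotated, k) * comb(universe_size - annotated, q - k)
        for k in range(m, min(q, annotated) + 1)
    )
    return tail / comb(universe_size, q)


def adjusted_p_value(threshold, q, universe, annotations, n_sim=1000, seed=0):
    """Estimate P(at least one term reaches p <= threshold by chance).

    universe    -- list of all gene identifiers (assumed input)
    annotations -- dict mapping a term to its set of genes (assumed input)
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        # Null hypothesis query: a random gene set of the same size.
        sample = set(rng.sample(universe, q))
        best = min(
            tail_p(len(sample & genes), q, len(genes), len(universe))
            for genes in annotations.values()
        )
        if best <= threshold:
            hits += 1
    return hits / n_sim


universe = list(range(50))
annotations = {"term A": set(range(10)), "term B": set(range(5, 20))}
print(adjusted_p_value(0.05, q=8, universe=universe, annotations=annotations))
```

The running time grows with the number of simulations, the size of the input list and the number of annotation terms, which is why this computation is optional.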

These parameters are hidden by default. Clicking on the corresponding icon will reveal them.
Upload user-defined gene universe

Here you can provide your own gene universe which, in some cases, might be more appropriate than the default gene universe (all genes from the organism of interest having annotation), e.g. when analysing a gene list that originates from a DNA microarray experiment.
You may upload gene keywords from various namespaces. The following are supported:

  • Entrez Gene IDs (e.g. 30818) and/or Ensembl Gene IDs (e.g. ENSG00000115041)
  • gene symbols/names (e.g. KCNIP3) (microRNA identifiers like hsa-mir-181a will also be recognised)
  • transcript accession numbers (e.g. NM_001034914, ENST00000360990 or AK315437)
  • Affymetrix probe set IDs (e.g. 231774_at)

Expected format of the uploaded file:
The keywords must be separated by commas or whitespace. Keywords containing commas or whitespace must be put in double quotes.

These parameters are hidden by default. Clicking on the corresponding icon will reveal them.
Output
Result name (optional)

You can enter a name for your result.

Your email address

Here you can choose between two methods for receiving the results:
  • Show result directly in browser window
    With this option, the URL of the result is shown directly in your browser window.

    Warning: Please use this option only for analyses which can be performed in a short time.
    If the analysis takes longer than the timeout of the webserver, the connection will be terminated and you will receive an error message (e.g. "The document contained no data."). In this case the results will not be available; please restart the analysis using the option "Send the URL of the result to" below.

  • Send the URL of the result via email
    With this option, an email with the URL of the results is sent to the email address you provided once the analysis is finished.

The results will be available for a limited time on our server. For details of how long your results will be kept please see the result-email. After that period they will be deleted unless protected in the project management!

If the user has uploaded expression values, the corresponding genes in the network are colored. By default, positive values (e.g. "over-expression" or "up-regulated") are displayed in red, negative values ("under-expression", "down-regulated") are blue, and zero is orange. Genes with missing expression values (annotated as NA) are colored yellow. If no expression values were uploaded, the genes from the input list are also colored yellow by default. All four colors representing expression values or input genes can be changed under color settings.

Display co-cited genes for one input gene

Here, the user can enter a gene ID or symbol/name, assisted by a dynamic drop-down list. After clicking on “Submit Query”, GePS is opened displaying the gene of interest and the 25 most frequently co-cited genes on sentence level. The evidence level as well as the number of added genes can be defined in GePS.

Display co-cited genes for one input term

Here, the user can enter a term, e.g. a small molecule or disease name, assisted by a dynamic drop-down list. After clicking on “Submit Query”, GePS is opened displaying the term of interest and the 25 most frequently co-cited genes on sentence level. The evidence level as well as the number of added genes can be defined in GePS.

Generate pathway based networks for one input gene

Here, the user can enter a gene ID or symbol/name, assisted by a dynamic drop-down list. If the entered gene is from an organism other than Homo sapiens, it is mapped via orthology to its corresponding gene in Homo sapiens, if possible, using the Comparative Genomics data from ElDorado.

If the gene is part of more than one pathway, all pathways are shown in a second list. After selecting the pathway of interest and clicking on “Submit Query”, GePS is opened generating the network for the selected pathway. The gene of interest is colored yellow by default.

Browse human pathway based networks

GePS is opened displaying a list of all pathway based networks ordered by name and source. A search field assists in finding the pathway based network of interest. After selecting, the corresponding network will be generated.

Build networks from scratch

Here, the user only needs to select an organism. GePS is then opened displaying a blank canvas, in which the user can create his or her own networks from scratch by adding genes and interactions manually or by extending the created networks with the most frequently co-cited genes.


Display components

Overview

The display can be divided into the following four components:
  1. GePS sidebar

    The GePS sidebar can contain gene set enrichment results, further subsets from your input genes (depending on how GePS has been started and the selected organism) and imported pathways/networks. From this sidebar you can load networks or create networks from your input genes. Furthermore you can filter the currently displayed network with subsets of genes from your input gene list.

    See below for a more detailed description of the individual functions of the GePS sidebar.

  2. Graphic display

    In this area the networks are displayed. The view of the graphic display can be adjusted with the GePS navigation bar (top). The current pathway or network can be modified directly on the display or via the network navigation bar (bottom). A new network can be loaded or created from the sidebar.

  3. GePS navigation bar (top)

    This navigation bar contains general functions like zooming or undoing and redoing the last actions on the graphic display.

  4. Network navigation bar (bottom)

    In this navigation bar you can modify the currently displayed network and choose different settings for modifying your network. There are different functions like re-generating the currently displayed network, extending it by co-cited genes or choosing a new network layout.


GePS sidebar

The GePS sidebar contains imported pathways and gene lists, which can be used to load or create networks.

Gene lists

If you uploaded a gene list, then the "Filters" section is displayed. It includes your gene set enrichment result from GeneRanker and your input gene list under the last subsection “More gene lists”. The input gene list is selected by default when GePS is started. If expression data is provided, the subsection “More gene lists” contains gene lists with genes which are over- or underexpressed in the respective data points. Furthermore, you can create gene lists from your input genes based on literature mining. Each gene list can be used as a filter or to create networks. A new network is created by clicking on the respective gene list.

Gene lists can be combined with the boolean operators “and” or “or”. This way you can create new gene lists by intersection and union of selected gene lists. All selected gene lists are combined with the chosen boolean operator. The resulting gene list as well as the selected gene lists can be found under the corresponding sidebar icon. Clicking on the “Generate network” button generates a new network from the resulting gene list. The resulting gene list also serves as a filter, so that only genes which are in the gene list are highlighted in the graphic display.
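
In terms of set operations, “and” corresponds to the intersection and “or” to the union of the selected gene lists. The Python sketch below illustrates this logic outside of GePS; the function name `combine_gene_lists` and the example gene lists are purely illustrative.

```python
from functools import reduce


def combine_gene_lists(gene_lists, operator):
    """Combine gene lists with "and" (intersection) or "or" (union)."""
    sets = [set(genes) for genes in gene_lists]
    if not sets:
        return set()
    if operator == "and":
        return reduce(set.intersection, sets)
    if operator == "or":
        return reduce(set.union, sets)
    raise ValueError('operator must be "and" or "or"')


# Illustrative gene lists, e.g. two expression-based filters.
overexpressed_2h = ["TNF", "IL6", "IL1B"]
overexpressed_4h = ["IL6", "IL1B", "SOCS3"]

both = combine_gene_lists([overexpressed_2h, overexpressed_4h], "and")
either = combine_gene_lists([overexpressed_2h, overexpressed_4h], "or")

# The resulting list acts as a filter on the displayed network.
network_genes = ["TNF", "IL6", "STAT3"]
highlighted = [gene for gene in network_genes if gene in both]
print(sorted(both), sorted(either), highlighted)
```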

The input gene list is selected by default after GePS has started. So all input genes can pass the filter and are in the resulting gene list. Clicking on the “Generate network” button after start-up, GePS generates a network out of the genes from the whole input list. If the number of input genes exceeds the “genes in generated network” parameter, then an algorithm selects genes automatically based on citations and connectivity.

GeneRanker result (gene set enrichment result)

The GeneRanker results are listed separately for each selected annotation type. For each annotation, the p-value, the number of observed genes and the total number of genes of the annotation are listed. Clicking on the information button gives further information. Selecting a result will display a network containing all genes in the gene list.

Clicking on the hierarchy button (available for GO terms, diseases (MeSH) and tissues (UniGene)) shows the term in a hierarchy view:

The tree will initially expand down to the term the hierarchy was loaded for, which is indicated by a bold font. The intensity of the green background corresponds to the significance of the term:
  • p-value = 0 -> full intensity
  • p-value = threshold (0.01) -> no intensity
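
The two end points above define the color scale. How the intensity is interpolated between them is not documented here; the following Python sketch assumes a linear scale purely for illustration, and the function name `green_intensity` is hypothetical.

```python
def green_intensity(p_value, threshold=0.01):
    """Map a p-value to an intensity between 0.0 (none) and 1.0 (full).

    Assumption: the intensity decreases linearly from p = 0 to the
    threshold; only the two end points are stated in the documentation.
    """
    if p_value >= threshold:
        return 0.0
    return 1.0 - p_value / threshold


for p in (0.0, 0.002, 0.01):
    print(p, green_intensity(p))
```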

Networks can be generated by double clicking on the desired term.

Expression gene lists

If expression data is provided, over- and underexpressed genes for each data point are listed under “More gene lists”. A threshold can be set for each data point in the expression filter settings panel. The threshold classifies the genes as over- or underexpressed depending on their expression value.

The threshold for over- and underexpression can be set for the average filter and for each data point filter. It can also be set for all expression filters (average and single data points) at once by enabling the “Apply to all filters” checkbox in the expression filter settings panel.

In this example a gene is classified as overexpressed (for single datapoint and average filter), if it has an expression value higher than 1. A gene is classified as underexpressed (for single datapoint and average filter), if it has an expression value lower than -1.
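
The classification can be expressed in a few lines of Python. This is a sketch of the rule described above, not GePS code; the function names are hypothetical, and computing the average filter as the arithmetic mean of the available values is an assumption.

```python
from statistics import mean


def classify(value, over=1.0, under=-1.0):
    """Classify one expression value against the two thresholds."""
    if value is None:
        return "NA"
    if value > over:
        return "over"
    if value < under:
        return "under"
    return "neither"


def classify_series(values, over=1.0, under=-1.0):
    """Classify each data point and the average of a gene's values."""
    present = [v for v in values if v is not None]
    # Assumption: the average filter uses the arithmetic mean.
    average = mean(present) if present else None
    return {
        "datapoints": [classify(v, over, under) for v in values],
        "average": classify(average, over, under),
    }


print(classify_series([1.5, 0.2, -2.0]))
```

A value exactly equal to a threshold is not classified as over- or underexpressed, since the rule requires a value higher than 1 or lower than -1.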

Individual gene lists

Here you can add your own gene lists. Clicking on the button and selecting “Add an individual gene list/filter” opens the gene list panel. You can define the name of the gene list and select the genes it contains from your input gene list. These individual gene lists will be listed under “More gene lists”.

Free text gene lists (based on literature-mining)

You can add gene lists based on literature-mining results. Clicking on the button and selecting “Add a free text gene list/filter” opens the free text filter panel. You can enter a term, e.g. “apoptosis”. Each publication listed in PubMed is scanned for the full term and for the occurrence of your input genes. If the term occurs together with an input gene, the input gene is added to the new filter. You can also set a limit for how often a gene has to occur in different publications before it is added to the gene list.

You can also enter more complex queries by combining your search terms with “and” or “or” (e.g. “apoptosis and inflammation”), and you can use the wildcards “?” and “*”: “?” matches a single character and “*” matches multiple characters. “*” can only be used in the middle or at the end of a term. For example, “immun*” matches “immune” and “immunology”, but “immun?” only matches “immune”.

Note that the time for scanning the PubMed results strongly depends on your query text and the number of input genes. The search can take several minutes for general queries like “cancer”.
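
The wildcard behaviour can be reproduced with a small Python sketch that translates a search term into a regular expression. It only illustrates the matching rules and is not the GePS search engine; the function names are hypothetical, and treating “*” as "zero or more word characters" as well as ignoring upper and lower case are assumptions.

```python
import re


def wildcard_to_regex(term):
    """Translate a term with "?" and "*" into a compiled regex."""
    if term.startswith("*"):
        # "*" is only allowed in the middle or at the end of a term.
        raise ValueError('"*" must not be the first character')
    parts = []
    for char in term:
        if char == "?":
            parts.append(r"\w")   # exactly one character
        elif char == "*":
            parts.append(r"\w*")  # assumption: zero or more characters
        else:
            parts.append(re.escape(char))
    return re.compile("".join(parts), re.IGNORECASE)


def matches(term, word):
    """Return True if the whole word matches the wildcard term."""
    return wildcard_to_regex(term).fullmatch(word) is not None


for term in ("immun*", "immun?"):
    print(term, [w for w in ("immune", "immunology") if matches(term, w)])
```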

More functions

You can sort the display of gene lists and networks by name and by p-value (if the gene list is from a GeneRanker result). You can also limit the displayed pathways and gene lists with a search term in the input field at the bottom of the sidebar. The toggle button switches between searching for an annotation name and searching for a gene symbol. Furthermore, all selected gene lists can be deselected with a single click on the corresponding button. Clicking on the checkboxes behind the annotation types or “More gene lists” deselects all gene lists in the respective category.

Graphic display

Imported pathways and literature-based networks are displayed as networks of nodes and edges.

Node information

The nodes of a network can have different colors and shapes as well as small extensions next to them. These are listed below.

Gene: By default, a gene product is drawn as a rounded rectangle. It is filled grey if it has an assigned NCBI Entrez Gene ID. Protein functions (if known) are indicated by different shapes:

  • kinase
  • phosphatase
  • receptor
  • transporter
  • co-factor
  • epigenetic factor

An RNA function (if known) is indicated by a different shape:

  • RNA

If the gene is present in the selected gene list, the body of the box is colored by default:

  • Yellow for present or NA as expression value

If expression data is provided, the body of the box is colored by default:

  • Red for up-regulated
  • Blue for down-regulated
  • Orange for non-regulated

The color transition mirrors the value, e.g. the more red a gene is colored, the more up-regulated it is. The four colors for input genes can be changed under color settings.

 
A gene product may have numerous small molecules and drugs associated with it; the number is shown in the white star. Please note that small molecule and drug data are received from different sources, so the same association may be counted several times.

A gene may have numerous interactions within a network; the number is shown in the lower right corner (including connections currently not depicted).

Nodes representing a so-called gene family are labeled with "GF" in the upper right corner. Gene families are generic terms for multiple genes, e.g. "metallothionein" for genes like MT1A, MT1B, MT1E, etc. This concept was introduced by Genomatix in order to include co-citations with imprecise notations in the literature/abstracts.

If a gene product has a known DNA-binding specificity, the graphical representation of one known matrix is displayed. In many cases multiple matrices are available; to see all matrices, click on 'Transcription factor binding site descriptions' in the gene information box tab 'Transcription Factor Facts'.

If a gene/protein/RNA in a network cannot be associated with a GeneID, its shape is filled with white. Other network participants are:

  • A Biological Process, Disease or a Pathway can be affected by the current pathway.
  • DNA or RNA
  • A small molecule can be a component of a pathway.
  • A complex of two or more genes or small molecules.

Edge Information

Two genes are associated by co-citation.
Two genes are associated by experimental validation.
Two genes are associated by expert curation.
Gene A activates Gene B.
Gene A inhibits Gene B.
Gene A regulates Gene B.
Gene A is a transcription factor and Gene B has a corresponding predicted or experimentally validated binding site.
Gene A alters the state of Gene B.
If gene A is a transcription factor and gene B has a corresponding experimentally validated binding site in one of its promoters, the arrow is filled black.
If gene A has a known TF binding site matrix and gene B has a corresponding predicted binding site in one of its promoters, the arrow is filled grey. For interactions that involve a complex, this arrow type is never used. To look for promoter bindings in this case, double-click on the edge and select the interaction of interest.
There is no promoter binding noted.
Interaction added by the user.

Gene information

Double-clicking on a network node opens up a box with information about the gene, identified by gene symbol and Entrez GeneID.

The information in the different categories is collected from various Genomatix and third-party databases and includes links to these:

Term information

Double-clicking on a network term (e.g. small molecule or disease) opens up a box with a description, synonyms and an external link to more information.

Interaction information

Double-clicking on an interaction line opens up a box with information about the interaction between the two connected nodes. This includes

Adding elements and interactions

Elements can be added to the current network by double-clicking an empty area in the graphic display. A panel is opened in which the symbol or name of a gene, small molecule or disease can be entered. While you type a name in the field, a drop-down list appears. You can then select your element of interest from that list by left-clicking the item, and the item will be added to the upper list. Clicking on the right button gives you the choice to select an element type and add it directly to the upper list. You can also specify the location of the elements in the list. Unless otherwise stated, the location of the elements is automatically determined based on the subcellular locations annotated in the DBSubLoc or UniProt database.

Genes can also be added via a list of genes in plain text. The genes have to be separated by spaces, commas or tabs.

It is recommended to select elements from the drop-down list instead of adding them directly by selecting the element type or entering plain text. If the name cannot be assigned to a gene or term ID, the gene or term is added as a node with a white background and without an associated gene or term ID. No evidence-supported interactions can be generated between elements without an associated ID.

Elements like genes or small molecules can be dragged into a complex while holding the shift key.

An interaction can be added by clicking on the source node and then dragging to the target node ('esc' quits this process). A box opens in which the edge layout can be determined. Unless otherwise stated the edge layout is automatically determined by expert curated interactions.

Measurement slider

If multiple data points were uploaded together with the list of genes, you can use this slider to progress through them. While the slider is dragged between the data points, the column header is shown as a tooltip.

Network overview

This panel gives an overview of the currently loaded network.

GePS navigation bar (top)

This navigation bar contains general functions:

Toggle sidebar: Shows or hides the sidebar with networks and gene lists.
Toggle overview: Shows or hides the network overview.
Toggle 'progress through measurement data': Shows or hides the measurement slider.
Save, load or delete a GePS session: Opens a panel to save the current state of your GePS session for a GeneRanker result or load a saved session. All information on the loaded network, the settings and gene lists/filters will be saved. Please note that locally imported data (e.g. metadata) will not be saved.
Color settings: Opens a panel to define the colors for the different elements and interaction types.

If you uploaded a gene list with expression values, you can determine the color for over-, under- and non-expressed genes. If no expression data was provided, only the color for input genes can be determined.

A color can also be chosen for each of the three interaction types (canonical, generated and user) as well as for the filled arrow head.

Additionally, the button "Save as user default" allows saving a user-defined combination of colors across GePS sessions.

Trash: Removes the selected elements and interactions.
Undo/Redo: The last step of changing the network is undone or redone. Your settings and selected filters stay the same.
Fit network to window: Fits the network to the available screen size.
Zoom: Zooming is possible via this slider or, alternatively, via the mouse wheel. If Ctrl is pressed while the mouse wheel is moved, the display will zoom into the selected area.
Find element in network: In this combo box all genes of the current network are listed. Selecting a gene will locate it in the graphic display.
Export/Import: The currently displayed network and the genes in the current filter can be exported in a number of formats. Additionally, a network in GePS format can be re-imported. For further information see Export/Import.

Export/Import

Metadata import

This import option allows you to import metadata for genes from a tab-separated file. For each gene you can provide multiple entries/rows, each consisting of an identifier, a tooltip text and, optionally, a data series. GePS displays up to six rows at the side of the genes as small circles, which are colored according to their associated value. Additional entries are displayed in the tooltip of the last circle. A slider can be used to progress through the data series. Please note that the total number of metadata entries is limited to 200,000. Further entries will be skipped during the import process.

Metadata of the first datapoint for the gene A1BG as provided in the lower example file.
Metadata of the first datapoint for the gene A1BG as provided in the lower example file.
Metadata of the second datapoint for the gene A1BG as provided in the lower example file. The tooltip shows the provided text and the data series.
Metadata of the second datapoint for the gene A1BG as provided in the lower example file. The tooltip shows the provided text and the data series.
Metadata of the first datapoint for the gene A1BG as provided in the lower example file. The tooltip shows the provided text and the data series of the additional rows.

Expected format of the uploaded file:
The file has to be in text format with tab-separated columns. Excel files are not supported.
The first column contains the gene ID. The second column contains an ID for the data. The third column contains the text of the tooltip, which is displayed when hovering the mouse over the circles. The optional subsequent columns are used for the data values.

Example file:

1	Gx1	Tooltip for Genomatix ID 1	1.0	-1.0
1	Gx2	Tooltip for Genomatix ID 2	2.0	-2.0
1	Gx3	Tooltip for Genomatix ID 3	3.0	-3.0
1	Gx4	Tooltip for Genomatix ID 4	4.0	-4.0
1	Gx5	Tooltip for Genomatix ID 5	5.0	-5.0
1	Gx6	Tooltip for Genomatix ID 6	6.0	-6.0
1	Gx7	Tooltip for Genomatix ID 7	7.0	-7.0
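
Before uploading, a file in this format can be checked with a short script. The following is a minimal sketch in Python, not part of GePS: the function name parse_metadata and the returned structure are illustrative assumptions, and only the column layout and the 200,000-entry limit are taken from the description above.

```python
from collections import defaultdict

MAX_ENTRIES = 200_000  # import limit stated in the metadata import description


def parse_metadata(text, max_entries=MAX_ENTRIES):
    """Parse tab-separated rows into {gene_id: [(data_id, tooltip, values), ...]}.

    Hypothetical helper for checking a file before upload; GePS itself may
    validate the file differently.
    """
    entries = defaultdict(list)
    count = 0
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        if count >= max_entries:
            break  # further entries are skipped, mirroring the stated limit
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(
                f"line {line_no}: expected at least 3 tab-separated columns"
            )
        gene_id, data_id, tooltip = fields[0], fields[1], fields[2]
        values = [float(v) for v in fields[3:]]  # optional data series
        entries[gene_id].append((data_id, tooltip, values))
        count += 1
    return dict(entries)


example = (
    "1\tGx1\tTooltip for Genomatix ID 1\t1.0\t-1.0\n"
    "1\tGx2\tTooltip for Genomatix ID 2\t2.0\t-2.0\n"
)
parsed = parse_metadata(example)
```

Here parsed maps the gene ID "1" to two entries, each carrying its data ID, tooltip text and a two-value data series. A row with fewer than three columns raises an error instead of being imported silently.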

KEGG pathway import

Pathways and networks can only be imported in GePS format, which is based on the graphML format. However, a KEGG pathway can be converted into the graphML format with the KEGGtranslator from the Cognitive Systems chair of the University of Tübingen. After the KEGG pathway has been downloaded and converted with the KEGGtranslator, the file extension has to be changed manually from graphML to GePS. Then the pathway can be imported into GePS.

Please note that GePS preserves the positions of the imported pathway elements and so the elements might be overlapping. Therefore a new layout might be necessary.


Network navigation bar (bottom)

This navigation bar mainly contains tools to modify the currently loaded network with different settings. Additionally, the navigation bar contains options to open external tools with the genes from the network.

Settings

The settings box contains options for evidence filtering, network generation and extension as well as for the removal of filtered genes.

The network and extension algorithms choose the interactions based on the number of evidences in all evidence levels at or above the selected minimum evidence level. The default minimum evidence level is the sentence level. If you change it, e.g., to the validated regulatory level (and uncheck the 'exclude validated regulatory evidences' checkbox), then the next time the interactions are chosen based only on the validated regulatory level and the expert level.

You can choose between five different evidence levels as your minimum evidence level.

The sentence and function word level count a sentence as evidence. However, the abstract and expert level count an abstract as evidence. The validated regulatory level counts an interaction in a cell type as evidence.

The evidence filter specifies the minimum number of evidences for generated interactions. If you set the evidence filter to 10 and the evidence level to sentence level, then the network generation algorithm only adds interactions to the network that have at least 10 evidences at the sentence level or better (function word, validated regulatory and expert level).

The exclude validated regulatory evidences checkbox allows you to decide whether or not the validated regulatory evidences are used when the evidence filter is applied. By default the validated regulatory evidences are not used.
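
How the minimum evidence level, the evidence filter and the checkbox interact can be illustrated with a small sketch. It assumes that the five levels are ordered abstract, sentence, function word, validated regulatory, expert (weakest to strongest), which the text implies but does not state outright, and that evidences from all qualifying levels are simply summed. The function names are hypothetical.

```python
# Assumed ordering from weakest to strongest evidence level.
LEVELS = ["abstract", "sentence", "function word", "validated regulatory", "expert"]


def evidence_count(counts, min_level="sentence", exclude_validated_regulatory=True):
    """Sum the evidences of one interaction at min_level or better.

    counts maps a level name to the number of evidences at that level.
    """
    start = LEVELS.index(min_level)
    total = 0
    for level in LEVELS[start:]:
        if exclude_validated_regulatory and level == "validated regulatory":
            continue  # default: validated regulatory evidences are not used
        total += counts.get(level, 0)
    return total


def passes_filter(counts, min_evidences, **settings):
    """True if the interaction has at least min_evidences qualifying evidences."""
    return evidence_count(counts, **settings) >= min_evidences


interaction = {
    "abstract": 20,
    "sentence": 6,
    "function word": 3,
    "validated regulatory": 2,
    "expert": 1,
}
kept_by_default = passes_filter(interaction, 10)
```

With the default settings the example interaction has 6 + 3 + 1 = 10 qualifying evidences and therefore passes an evidence filter of 10. The 20 abstract-level evidences are ignored because they lie below the minimum level.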

The literature date filter specifies the date range of the published literature which is used as interaction evidence. Please be aware that network generation and extension will take significantly longer with this filter.

There are two types of network generation algorithms which can be chosen:

A more detailed explanation of the algorithms can be found in the section Network generation algorithms.

The additional interactions per gene parameter defines the maximum number of interactions that are drawn for an added gene. The interaction with the most co-citations will be visible, while all other interactions will be hidden. A single click on a gene displays all added interactions that were selected for the network generation or extension. A single click on such an interaction fixes the interaction in the network so that it stays visible even if the focus is changed. Note that the network generation algorithm can draw more visible interactions if this is necessary to fulfill its goals, such as connectivity or shortest paths.

The genes in generated network parameter defines the maximal number of genes which are included in a generated network. The genes are chosen based on the number of citations and on whether they can be connected to the other network genes. A more detailed description can be found under Restriction of network genes.

The hide extension settings option determines if the additional extension settings box opens after pressing the extension button.

The layout after removal option in the “Removal of filtered genes” section defines whether the layout algorithm is applied automatically after the removal of the filtered genes. The generate network after removal option defines whether the network is re-generated.

Network generation

This function re-generates a new network from the genes of the currently displayed network (independent of the selected filters). The parameters of the network generation algorithm (network type, evidence level, number of co-citations and number of interactions) are taken from "Settings". More information about network generation can be found under Network generation and extension.

Network extension

A network can be extended with either genes, transcription factors, transcriptional targets, microRNAs or terms like small molecules and diseases. More information about the extension algorithms can be found under Network extension algorithms.

Three options can be modified for each extension algorithm (unless ‘Hide extension settings’ is selected in ‘Settings’): the maximal number of genes added to the network, Use all genes from the input list, and the connectivity for the added genes. The option Use all genes from the input list adds, for a selected gene, all co-cited input genes to the network. The connectivity for added genes requires an added gene to be connected to that many genes in the current network. A fourth option, available for the extension by genes and transcription factors, lets you choose between extending from genes or from terms. Extending from genes and terms at the same time is not possible.

You can also extend only a subset of genes or terms by selecting them before the extension step. If no gene is selected, all genes or terms in the network are used for extending. The number of distinct selected genes or terms is displayed.

Connecting genes and terms with shortest paths

This option connects the selected genes and terms via shortest paths of co-cited genes. The algorithm considers all genes, as opposed to the simple network generation, which only considers network genes. More details can be found under Connecting selected genes and terms with shortest paths. Please note that at most ten genes and terms can be connected with shortest paths.

Network processing

Removing filtered genes: Pressing this button removes all filtered genes (genes painted grey) from the network. Depending on your settings, the network is re-generated and/or laid out again.

Removing elements without interactions: This option removes all elements without an interaction.

Adding interactions between selected genes: This option adds all interactions between the selected genes to the network.

Layout

There are three types of layout to choose from:

Hierarchical layout: The hierarchical layout highlights the main direction or information flow of the network.
Centric layout: The centric layout emphasizes highly connected proteins.
Cell layout: The cell layout structures the network according to the cellular locations of the proteins.

You can also change the layout of the network manually by moving nodes around. Select any graph element by clicking on it with the left mouse button. Once an element is selected you can drag it around. You can select multiple items by holding down the Ctrl key (Cmd key on Mac OS X), or by dragging over an area while pressing the left mouse button. Holding the left mouse button pressed over an empty area in the graphic display while moving the mouse shifts the network.

The last button allows the user to toggle between the input species and the target species if the option “Use orthologous genes in human for the analysis instead of the input genes.” was activated in the initial GeneRanker interface. By default, the gene symbols of the target species are shown. Toggling shows all orthologous gene symbols if available. Additionally, the gene information box shows information on the orthologous gene. Note that all interactions remain the same.

External programs / Genomatix tasks

This option opens an external program from Genomatix. A program can be opened with one of two gene sets: the genes in the currently displayed network or the genes that pass the filter.

The Gene-TF Analysis uses the parameters ‘evidence level’ and ‘minimum number of evidences’ configured in GePS. More information on the Gene-TF Analysis can be found here.

The option GePS can be used to start GePS with a set of genes. GePS is started with the same parameters as the current GePS session. If you provided expression values for your input genes, all non-input genes in the new set are assigned NA as expression value. Please note that if you provided transcript accessions or Affymetrix Probe Set IDs, GePS will be started with the corresponding genes and average expression values.

If a gene list has been uploaded and thus a GeneRanker result is available, you can examine the GeneRanker result in detail by clicking on the button in the right corner.


Network generation and extension

Networks can be generated in three ways. A network can be generated directly from a gene list by clicking on a filter/gene list in the sidebar. You can also select several gene lists; clicking on the ‘Generate network’ button then generates a network of the combined gene lists (‘and’/‘or’). Furthermore, the interactions of a displayed network can be re-generated, e.g. with a different evidence level or filter, by clicking on the button in the network navigation bar. A displayed network can also be extended by genes, transcription factors, transcriptional targets, microRNAs or terms like small molecules and diseases.

Gene lists can contain a very large number of genes and many of these genes can be co-cited. To maintain reasonable network views, there is an upper limit to the number of genes and interactions displayed.

Restriction of network genes

Displaying all genes from a large gene list can result in an unreadable network. Therefore the maximal number of genes for a generated network is defined under Settings. If you generate a network from a gene list which has more genes, the restriction algorithm chooses the most cited genes, builds the network and replaces all network genes that could not be connected with the next most cited genes. This procedure is repeated until no more network genes are unconnected.

If you want to ensure that certain genes are contained in the generated network, you can define mandatory genes in a dialog box. The dialog box opens if you generate a network from a gene list which exceeds the maximal number of genes.

Network generation algorithms

Literature-based networks in GePS contain very large numbers of interactions between genes. Displaying all these interactions in the network view would render it unreadable. Therefore a strategy is needed to reduce the number of displayed interactions without losing relevant information. To achieve this, GePS uses the simple network or the shortest path algorithm to calculate the optimal set of interactions for a network.

The simple network algorithm initially creates a network by starting with a plain list of genes. Then it iterates three times over all interactions that pass the co-citation filter, in descending order by their number of co-citations. In the first iteration the algorithm adds an interaction for two genes if neither of them has any interaction yet. In the second iteration the algorithm adds an interaction for two genes if they are not connected by a path. This step avoids unconnected subnetworks. In the last iteration the algorithm adds invisible interactions if the parameter additional interactions per gene is not yet fulfilled.
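
The three iterations can be sketched with a union-find structure that tracks which genes are already connected by a path. This is a simplified illustration, not the GePS code. In particular, the third pass assumes that a hidden interaction is added whenever at least one of its two genes has fewer interactions than the per-gene parameter, which is only one possible reading of the description. All names are hypothetical.

```python
def simple_network(genes, cocitations, min_cocitations=1, per_gene=2):
    """Return (visible, hidden) interaction lists.

    cocitations maps (gene_a, gene_b) tuples to the number of co-citations;
    both genes of every pair are expected to be in genes.
    """
    parent = {g: g for g in genes}  # union-find forest over the genes

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    # Interactions passing the co-citation filter, most co-cited first.
    edges = sorted(
        (e for e, n in cocitations.items() if n >= min_cocitations),
        key=lambda e: (-cocitations[e], e),
    )
    degree = {g: 0 for g in genes}
    visible, hidden = [], []

    def add(edge, target):
        a, b = edge
        target.append(edge)
        degree[a] += 1
        degree[b] += 1
        parent[find(a)] = find(b)  # merge the two components

    # Pass 1: link two genes if neither has an interaction yet.
    for a, b in edges:
        if degree[a] == 0 and degree[b] == 0:
            add((a, b), visible)
    # Pass 2: link two genes if no path connects them yet.
    for a, b in edges:
        if find(a) != find(b):
            add((a, b), visible)
    # Pass 3 (assumed rule): add hidden interactions until per_gene is reached.
    used = set(visible)
    for a, b in edges:
        if (a, b) not in used and (degree[a] < per_gene or degree[b] < per_gene):
            add((a, b), hidden)
    return visible, hidden


genes = ["A", "B", "C", "D"]
cocitations = {
    ("A", "B"): 10, ("C", "D"): 8, ("B", "C"): 5, ("A", "C"): 4, ("A", "D"): 1,
}
visible, hidden = simple_network(genes, cocitations)
```

In the example the first pass adds A-B and C-D, the second pass joins the two subnetworks via B-C, and the third pass adds A-C and A-D as hidden interactions.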

The shortest path algorithm initially creates a network with all genes and all interactions that pass the co-citation filter. Then the algorithm searches for shortest paths from the selected genes to all other genes and removes all interactions that are not in those paths.
If no genes have been selected by the user, the algorithm selects from each connected component (a subnetwork in which any two genes are connected to each other by paths) the gene with the highest number of interactions.

In GePS the weight of an interaction between two genes is determined by the number of co-citations supporting the connection: the more evidence, the shorter the connection. However, as opposed to the road map example, it makes a difference in biological networks whether a relation is direct or indirect. As the number of ‘hops’ between two genes is not taken into account by the algorithm, we needed to find a way to make use of this information so that direct relations between two genes are always preferred over indirect connections.
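
The pruning step of the shortest path algorithm can be illustrated with Dijkstra's algorithm. The sketch below is not the GePS implementation. It assumes an edge length of 1 / (number of co-citations), which is just one way to make well-supported interactions "shorter", and it does not model the additional preference for direct relations described above. The function name is hypothetical.

```python
import heapq


def shortest_path_network(cocitations, selected, min_cocitations=1):
    """Keep only the interactions lying on shortest paths from the selected genes.

    cocitations maps (gene_a, gene_b) tuples to the number of co-citations.
    Returns a set of frozenset({gene_a, gene_b}) interactions.
    """
    graph = {}
    for (a, b), n in cocitations.items():
        if n >= min_cocitations:
            weight = 1.0 / n  # assumption: more co-citations -> shorter edge
            graph.setdefault(a, []).append((b, weight))
            graph.setdefault(b, []).append((a, weight))

    kept = set()
    for source in selected:
        if source not in graph:
            continue  # gene without any interaction passing the filter
        dist = {source: 0.0}
        pred = {}  # predecessor of each gene on its shortest path
        queue = [(0.0, source)]
        while queue:
            d, gene = heapq.heappop(queue)
            if d > dist[gene]:
                continue  # stale queue entry
            for neighbour, weight in graph[gene]:
                candidate = d + weight
                if candidate < dist.get(neighbour, float("inf")):
                    dist[neighbour] = candidate
                    pred[neighbour] = gene
                    heapq.heappush(queue, (candidate, neighbour))
        # Every (gene, predecessor) pair is an interaction on a shortest path.
        kept.update(frozenset((gene, p)) for gene, p in pred.items())
    return kept


cocitations = {("A", "B"): 10, ("B", "C"): 10, ("A", "C"): 2, ("C", "D"): 5}
kept = shortest_path_network(cocitations, selected=["A"])
```

With these example values the weakly supported direct interaction A-C is removed, because the path via B is shorter under the assumed weights. This is exactly the situation in which a preference for direct relations, as described above, would change the result.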

Network extension algorithms

A network can be extended with co-cited genes by the following algorithms:

Connecting selected genes and terms with shortest paths

The algorithm computes all shortest paths between the selected genes and terms. Then it iterates over the shortest paths in descending order by their weight. If the shortest path connects two unconnected selected genes or terms, then the shortest path will be added to the network.

The weight of an interaction is defined as described for the shortest path algorithm. Shortest paths are preferentially computed with input genes or more preferentially with genes passing the current filter.


Data sources

Pathway data were collected from Pathway Commons. Here is a list of projects that we incorporated:

The basal pathway and network data are supplemented by information collected from

The current versions of the used data sources can be found on the GePS start page.

The visual notation in the Genomatix Pathway System adheres, where possible, to the standardized graphical notation put forward by the Systems Biology Graphical Notation project.


Data content

The current version of GePS contains the following number of entries:

Category	# of entries
Pathways	752
    Pathway Interaction Database	220
    BioCarta	249
    Reactome	122
    NetPath	27
    PANTHER	134
LitInspector (all interactions on abstract level)	11,492,323
Genomatix (expert curated interactions)	185,887
NetPro™ protein-protein interactions	67,115
Protein-small molecules interactions	7,835,688
Gene IDs with subcellular location	129,903