
Genomatix Pathway System (GePS)




Introduction

The Genomatix Pathway System (GePS) uses information extracted from public and proprietary databases to display canonical pathways or to create and extend networks based on literature data.

More than 400 human pathways can be displayed based on data from the NCI-Nature Pathway Interaction Database, BioCarta and various other sources (please see Data Sources below for details). These are supplemented with proprietary database content from NetPro and with Genomatix in-house curated annotation.

GePS also allows you to create networks from an arbitrary input gene list, where connections are based on literature, i.e. co-citations. This gene list can be filtered by GeneRanker results, literature mining results and expression data. The resulting gene sets can be combined into new gene sets and serve as filters.

Furthermore, networks can be created from scratch without an input gene list. Genes, complexes and interactions can be created simply by clicking and dragging with the mouse.

Please note that the term canonical pathways is often shortened to pathways. In contrast, the term network is a more general description that covers both canonical pathways and literature-based networks.

GePS also allows:

Genomatix Pathway System requires Adobe Flash Player 10.1 or higher to be installed on your computer.


GePS access

GePS can be accessed via one of five entry points:

  1. Characterization of gene sets: Input a gene list, optionally with expression values

    This choice will bring up a GeneRanker interface, where a list of genes can be entered via an input box or a file upload option. In the latter case expression values can be included. For additional information on the data upload and formats, please refer to the GeneRanker help.

    In GePS all pathways are derived from Homo sapiens. Therefore “Signal Transduction Pathway (canonical)” can only be selected if “Homo sapiens” has been chosen as organism or if the mapping of the input genes to their orthologous human genes has been activated.

    If the user has uploaded expression values, the corresponding genes in the network are colored. By default, positive values (e.g. "over-expression" or "up-regulated") are displayed in red, negative values ("under-expression", "down-regulated") in blue, and zero in orange. Genes with missing expression values (annotated as NA) are colored yellow. If no expression values were uploaded, the genes from the input list are also colored yellow by default. All four colors representing expression values or input genes can be changed under color settings.

  2. Display co-cited genes for one input gene

    Here, the user can enter a gene ID or symbol/name, assisted by a dynamic drop-down list. After clicking on “Submit Query”, GePS opens and displays the gene of interest together with the 25 genes most frequently co-cited with it at sentence level. The co-citation level as well as the number of added genes can be defined in GePS.

  3. Display canonical pathways for one input gene

    Here, the user can enter a gene ID or symbol/name, assisted by a dynamic drop-down list. If the entered gene is from an organism other than Homo sapiens, it is mapped via orthology to its corresponding Homo sapiens gene, if possible, using the Comparative Genomics data from ElDorado.

    If the gene is part of more than one pathway, all pathways are shown in a second list. After selecting the pathway of interest and clicking on “Submit Query”, GePS is opened displaying the selected pathway. The gene of interest is colored yellow by default.

  4. Browse human pathways

    GePS is opened displaying a list of all pathways ordered by pathway name and source. A search field assists in finding the pathway of interest. After selecting the pathway of interest, it is displayed.

  5. Build networks from scratch

    Here, the user only needs to select an organism. Then GePS is opened displaying a blank canvas, in which the user can create his or her own networks from scratch by adding genes and interactions manually or extending the created networks by most frequently co-cited genes.


Display components

Overview

The display can be divided into the following four components:
  1. GePS sidebar

    The GePS sidebar can contain canonical pathways, gene set enrichment results and further subsets from your input genes (depending on how GePS has been started and the selected organism). From this sidebar you can load pathways or create networks from your input genes. Furthermore you can filter the currently displayed network with subsets of genes from your input gene list.

    See below for a more detailed description of the individual functions of the GePS sidebar.

  2. Graphic display

    In this area the pathways and networks are displayed. The view of the graphic display can be adjusted with the GePS navigation bar (top). The current pathway or network can be modified directly on the display or via the network navigation bar (bottom). A new network can be loaded or created from the sidebar.

  3. GePS navigation bar (top)

    This navigation bar contains general functions like “zooming” or “undo” and “redo” the last actions on the graphic display.

  4. Network navigation bar (bottom)

    In this navigation bar you can modify the currently displayed network and choose different settings for modifying your network. Its functions include re-generating the currently displayed network, extending it by co-cited genes and choosing a new network layout.


GePS sidebar

The GePS sidebar contains pathways and gene lists, which can be used to load or create networks.

Pathways and gene lists

If you choose “Homo sapiens” as organism or activate the mapping of the input genes to their orthologous human genes, the GePS sidebar contains Signal Transduction Pathways (canonical), which can be loaded in the graphic display. There are two ways of loading a pathway:

If you uploaded a gene list, the "Filters" section is displayed additionally. It includes your gene set enrichment result from GeneRanker and, under the last subsection “More gene lists”, your input gene list. The input gene list is selected by default when GePS is started. If expression data is provided, the subsection “More gene lists” also contains gene lists with genes which are over- or underexpressed in the respective data points. Furthermore, you can create gene lists from your input genes based on literature mining. Each gene list can be used as a filter or to create networks. A new network is created by clicking on the respective gene list.

Gene lists can be combined with the Boolean operators “and” or “or”. This way you can create new gene lists by intersection or union of the selected gene lists. All selected gene lists are combined with the chosen Boolean operator. The resulting gene list as well as the selected gene lists can be found under . Clicking on the “Generate network” button generates a new network from the resulting gene list. The resulting gene list also serves as a filter, so that only genes contained in it are highlighted in the graphic display.

The input gene list is selected by default after GePS has started, so all input genes pass the filter and are in the resulting gene list. If you click on the “Generate network” button directly after start-up, GePS generates a network from the genes of the whole input list. If the number of input genes exceeds the “genes in generated network” parameter, an algorithm automatically selects genes based on citations and connectivity.
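The combination of gene lists corresponds to plain set operations. The following minimal Python sketch illustrates this; the function name `combine_gene_lists` and the example gene lists are hypothetical and not part of GePS.

```python
from functools import reduce


def combine_gene_lists(gene_lists, operator):
    """Combine gene lists with "and" (intersection) or "or" (union)."""
    sets = [set(genes) for genes in gene_lists]
    if not sets:
        return set()
    if operator == "and":
        return reduce(set.intersection, sets)
    if operator == "or":
        return reduce(set.union, sets)
    raise ValueError(f"unknown operator: {operator!r}")


# Hypothetical example lists, e.g. an enrichment result and an expression list.
enriched = ["RELA", "NFKBIA", "CREBBP"]
overexpressed = ["RELA", "CREBBP", "ARHGEF7"]
both = combine_gene_lists([enriched, overexpressed], "and")
either = combine_gene_lists([enriched, overexpressed], "or")
```

Here `both` contains only the genes present in every selected list, while `either` contains the genes present in at least one of them.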

GeneRanker result (gene set enrichment result)

The GeneRanker results are listed separately for each selected annotation type. For each annotation, the p-value, the number of observed genes and the total number of genes of the annotation are given. Clicking on the information button gives further information. Selecting a result will display a network containing all genes in the gene list.

Expression gene lists

If expression data is provided, over- and underexpressed genes for each data point are listed under “More gene lists”. A threshold can be set under for each data point. The threshold classifies the genes as over- or underexpressed depending on their expression value.

The threshold for over- and underexpression can be set for the average filter and for each data point filter. It can also be set for all expression filters (average and single data points) at once by enabling the “Apply to all filters” checkbox in the expression filter settings panel.

In this example a gene is classified as overexpressed (for single data point and average filters) if it has an expression value higher than 1. A gene is classified as underexpressed (for single data point and average filters) if it has an expression value lower than -1.
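The threshold rule from this example can be written as a short Python sketch. The function name `classify_expression` and the labels "unchanged" and "NA" for the remaining cases are assumptions made for illustration; the thresholds 1 and -1 are those of the example.

```python
def classify_expression(value, over=1.0, under=-1.0):
    """Classify one expression value against the over/under thresholds."""
    if value is None:          # missing value (NA)
        return "NA"
    if value > over:           # strictly higher than the upper threshold
        return "overexpressed"
    if value < under:          # strictly lower than the lower threshold
        return "underexpressed"
    return "unchanged"         # label chosen for this sketch only
```

A value of exactly 1 or -1 is not classified as over- or underexpressed, because the example uses "higher than" and "lower than".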

Literature-mining gene lists (determined by free text searches)

You can add gene lists based on literature-mining results. Clicking on the button opens the free text filter box. You can enter a text, e.g. “apoptosis”. Each publication listed in PubMed is scanned for this text and for the occurrence of your input genes. If the text occurs together with an input gene, the input gene is added to the new filter. You can also set a minimum number of different publications in which a gene has to occur before it is added to the gene list. Furthermore, you can enter more complex queries by combining your search terms with “and” or “or” (e.g. “apoptosis and inflammation”). Note that the time for scanning the PubMed results strongly depends on your query text and the number of input genes. The search can take several minutes for general terms like “cancer”.

More functions

You can sort the display of gene lists and pathways by name and by p-value (if the gene list is from a GeneRanker result). You can also limit the displayed pathways and gene lists with a search term in the input field at the bottom of the sidebar. Furthermore, all selected gene lists can be unselected with a single click on . Clicking on the checkboxes behind the annotation types or “More gene lists” deselects all gene lists in the respective category.


Graphic display

Canonical pathways and literature-based networks are displayed as networks of nodes and edges.

Node information

The nodes of a network can have different colors and shapes as well as small extensions next to them. These are listed below.

Gene: By default, a gene product is drawn as a rounded rectangle and is filled gray if it has an assigned NCBI Entrez Gene ID. A gray border indicates that it belongs to the currently loaded canonical pathway. Protein functions (if known) are indicated by different shapes: kinase, phosphatase, receptor, transporter and co-factor. An RNA function (if known) is also indicated by a different shape.

If the gene is present in the input list, the body of the box is colored by default:

  Yellow: present in the input list, or NA as expression value

If expression data is provided, the body of the box is colored by default:

  Red: up-regulated
  Blue: down-regulated
  Orange: non-regulated

The color gradient mirrors the value, e.g. the more intensely red a gene is colored, the more up-regulated it is. The four colors for input genes can be changed under color settings.
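The default color assignment can be summarized in a small Python sketch. The function name `default_node_color` is hypothetical, missing values (NA) are represented as `None`, and the color gradient is not modelled; only the base color is returned.

```python
from typing import Optional


def default_node_color(expression: Optional[float]) -> str:
    """Return the default base color of an input gene (illustrative only)."""
    if expression is None:   # NA, or no expression values uploaded
        return "yellow"
    if expression > 0:       # up-regulated
        return "red"
    if expression < 0:       # down-regulated
        return "blue"
    return "orange"          # non-regulated (value is zero)
```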

 
A gene product may have a number of chemicals associated with it. The number of sources for a chemical association is shown in the white star. (Please note that this reflects the number of sources, not the number of distinct chemicals.)

A gene may have numerous interactions within a network. Their number is shown in the lower right corner (including connections currently not depicted).

If a gene product has a known DNA-binding specificity, the graphical representation of one known matrix is displayed. In many cases multiple matrices are available; to see all matrices, click on 'Transcription factor binding site descriptions' in the gene information box tab 'Transcription Factor Facts'.

If a gene/protein/RNA in a network cannot be associated with a GeneID, its shape is filled white. Other network participants are:

A Biological Process or a Pathway, which can be affected by the current pathway.
DNA or RNA.
A small molecule, which can be a component of a pathway.
A complex of two or more genes or small molecules.

Edge Information

Two genes are associated by co-citation.
Two genes are associated by expert curation.
Gene A activates Gene B.
Gene A inhibits Gene B.
Gene A modulates Gene B.
Gene A alters the state of Gene B.
If gene A has a known TF binding site matrix and gene B has a corresponding binding site in one of its promoters, the arrow head is filled black. This arrow type is never used for interactions that involve a complex. To look for promoter bindings in this case, double-click on the edge and select the interaction of interest.
No promoter binding is noted.
Interaction added by the user.

Gene information

Double-clicking on a network node opens up a box with information about the gene, identified by gene symbol and Entrez GeneID.

The information in the different categories is collected from various Genomatix and third-party databases and includes links to these:

Interaction information

Double-clicking on an interaction line opens up a box with information about the interaction between the two connected nodes. This includes

Adding elements and interactions

A single element can be added to the current network by double-clicking an empty area in the graphic display. A box opens in which the type of the node, such as gene or complex, can be set. You can also specify the location of the node. Unless you specify otherwise, the location of the node is determined automatically based on the subcellular locations annotated in the UniProt database.

Elements like genes or small molecules can be dragged into a complex while holding the shift key.

An interaction can be added by clicking on the source node and then dragging to the target node ('esc' quits this process). A box opens in which the edge layout can be set. Unless you specify otherwise, the edge layout is determined automatically from expert-curated interactions.

Measurement slider

If multiple data points were uploaded together with the list of genes, you can use this slider to progress through them.

Network overview

This panel gives an overview of the currently loaded network.


GePS navigation bar (top)

This navigation bar contains general functions:

Shows or hides the sidebar.
Shows or hides the overview.
Shows or hides the measurement slider.
Opens a box to define the colors for the different elements and interaction types. If you uploaded a gene list with expression values, you can set the colors for over-, under- and non-expressed genes. If no expression data was provided, only the color for input genes can be set.

Additionally, a color can be chosen for each of the three interaction types (canonical, generated and user) as well as for the filled arrow head.

Removes the selected elements and interactions.
Undoes or redoes the last change to the network. Your settings and selected filters stay the same.
Fits the network to the available screen size.
Zooms via this slider or, alternatively, via the mouse wheel. If Ctrl is pressed while the mouse wheel is moved, the display zooms into the selected area.
Lists all genes of the current network in a combo box. Selecting a gene locates it in the graphic display.
Exports the currently displayed network and the genes in the current filter in a number of formats. Additionally, a network in GePS format can be re-imported. For further information see Export/Import.

Export/Import

Metadata import

This import option allows you to import metadata for genes from a tab-separated file. For each gene you can provide multiple entries/rows, each consisting of an identifier, a tooltip text and optionally a data series. GePS displays up to six rows next to the genes as small circles, which are colored according to their associated value. Additional entries are displayed in the tooltip of the last circle. A slider can be used to progress through the data series. Please note that the total number of metadata entries is limited to 200,000. Further entries will be skipped during the import process.

Metadata of the first datapoint for the gene A1BG as provided in the example file below.
Metadata of the second datapoint for the gene A1BG as provided in the example file below. The tooltip shows the provided text and the data series.
Metadata of the first datapoint for the gene A1BG as provided in the example file below. The tooltip shows the provided text and the data series of the additional rows.

Expected format of the uploaded file:
The file has to be in text format with tab-separated columns. Excel files are not supported.
The first column contains the gene ID. The second column contains an ID for the data. The third column contains the text of the tooltip, which is displayed when hovering the mouse over the circles. The optional subsequent columns are used for the data values.

Example file:

1	Gx1	Tooltip for Genomatix ID 1	1.0	-1.0
1	Gx2	Tooltip for Genomatix ID 2	2.0	-2.0
1	Gx3	Tooltip for Genomatix ID 3	3.0	-3.0
1	Gx4	Tooltip for Genomatix ID 4	4.0	-4.0
1	Gx5	Tooltip for Genomatix ID 5	5.0	-5.0
1	Gx6	Tooltip for Genomatix ID 6	6.0	-6.0
1	Gx7	Tooltip for Genomatix ID 7	7.0	-7.0
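To check a metadata file before uploading it, the format described above can be read with a few lines of Python. This is a minimal sketch, not the GePS import routine: the function names `parse_metadata` and `split_for_display` are hypothetical, and the split into six circles plus tooltip entries only mirrors the display rule described above.

```python
import csv
import io
from collections import defaultdict


def parse_metadata(text):
    """Group rows by gene ID: {gene_id: [(data_id, tooltip, [values]), ...]}."""
    by_gene = defaultdict(list)
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or incomplete rows
        gene_id, data_id, tooltip, *values = row
        by_gene[gene_id].append((data_id, tooltip, [float(v) for v in values]))
    return dict(by_gene)


def split_for_display(entries, max_circles=6):
    """Rows shown as circles, and rows that only appear in the last tooltip."""
    return entries[:max_circles], entries[max_circles:]


# Rebuild the seven-row example file shown above.
example = "".join(
    f"1\tGx{i}\tTooltip for Genomatix ID {i}\t{i}.0\t-{i}.0\n" for i in range(1, 8)
)
entries = parse_metadata(example)["1"]
circles, extra = split_for_display(entries)
```

For the example file, six entries are shown as circles and the seventh (Gx7) ends up in the tooltip of the last circle.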

KEGG pathway import

Pathways and networks can only be imported in GePS format, which is based on the graphML format. However, a KEGG pathway can be converted into the graphML format with the KEGGtranslator from the Cognitive Systems chair of the University of Tübingen. After the KEGG pathway has been downloaded and converted, the file extension has to be changed manually from graphML to GePS. Then the pathway can be imported into GePS.

Please note that GePS preserves the positions of the imported pathway elements, so elements might overlap. Therefore a new layout might be necessary.
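If many converted files have to be renamed, the manual step of changing the file extension can be scripted. The following Python sketch only computes and applies the new file name; the function name `geps_path`, the file name `hsa04010.graphML` and the exact spelling of the suffix (`.GePS`) are assumptions for illustration.

```python
from pathlib import Path


def geps_path(graphml_file, new_suffix=".GePS"):
    """Return the file path with its extension replaced by the GePS suffix."""
    return Path(graphml_file).with_suffix(new_suffix)


def rename_for_geps(graphml_file, new_suffix=".GePS"):
    """Rename an existing converted file on disk and return its new path."""
    source = Path(graphml_file)
    return source.rename(geps_path(source, new_suffix))


# Hypothetical file name of a converted KEGG pathway.
target = geps_path("hsa04010.graphML")
```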


Network navigation bar (bottom)

This navigation bar mainly contains tools to modify the currently loaded network with different settings. Additionally, the navigation bar contains options to open external tools with the genes from the network.

Settings

The settings box contains options for network generation and extension as well as for the removal of filtered genes.

There are two types of network generation algorithms which can be chosen:

A more detailed explanation of the algorithms can be found in the section Network generation algorithms.

GePS includes four different co-citation levels.

The sentence and function levels count the number of sentences, whereas the abstract and expert levels count the number of different abstracts.

The co-citation filter specifies the minimum number of co-citations for gene-gene interactions. If you set the co-citation filter to 10 and the co-citation level to sentence level, the network generation algorithm only adds gene-gene interactions with at least 10 co-citations at sentence level to the network.

The interactions per gene parameter defines the maximum number of interactions that are drawn for an added gene. The interaction with the most co-citations is visible, while all other interactions are hidden. A single click on a gene displays all added interactions that were selected during network generation or extension. A single click on such an interaction fixes it in the network so that it stays visible even if the focus changes. Note that the network generation algorithm can draw more visible interactions if this is necessary to fulfill its goals, such as connectivity or shortest paths.

The genes in generated network parameter defines the maximum number of genes included in a generated network. The genes are chosen based on their number of citations and on whether they can be connected to the other network genes. A more detailed description can be found under Restriction of network genes.
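The effect of the co-citation filter can be sketched in Python as follows. The function name `filter_interactions`, the representation of interactions as `frozenset` pairs and the co-citation counts are invented for this illustration.

```python
def filter_interactions(cocitations, min_cocitations):
    """Keep gene-gene interactions with at least `min_cocitations` co-citations."""
    return {pair: n for pair, n in cocitations.items() if n >= min_cocitations}


# Invented sentence-level co-citation counts.
sentence_level = {
    frozenset(("RELA", "NFKBIA")): 42,
    frozenset(("RELA", "CREBBP")): 10,
    frozenset(("ARHGEF7", "CREBBP")): 3,
}
kept = filter_interactions(sentence_level, 10)
```

With a filter value of 10, the interaction with exactly 10 co-citations is kept and the one with 3 co-citations is dropped.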

The hide extension settings option determines if the additional extension settings box opens after pressing the extension button.

The layout after removal option in the “Removal of filtered genes” section defines whether the layout algorithm is applied automatically after the removal of the filtered genes. The generate network after removal option defines whether the network is re-generated.

Network generation

This function re-generates the network from the genes of the currently displayed network (independent of the selected filters). The parameters of the network generation algorithm (network type, co-citation level, number of co-citations and number of interactions) are taken from "Settings". More information about network generation can be found under Network generation and extension.

Network extension

A network can be extended with genes, transcription factors or transcriptional targets. More information about the extension algorithms can be found under Network extension algorithms.

Three options can be modified for each extension algorithm (unless ‘Hide extension settings’ is selected in ‘Settings’): the maximal number of genes added to the network, Use all genes from the input list, and the connectivity for the added genes. The option Use all genes from the input list adds all input genes co-cited with one selected gene to the network. The connectivity for added genes specifies the number of genes in the current network to which an added gene must be connected.

You can also extend only a subset of genes by selecting them before the extension step. If no gene is selected, all genes in the network are used for extending. The number of distinct selected genes is displayed.

Network processing

Removing filtered genes: Pressing this button removes all filtered genes (genes painted grey) from the network. Depending on your settings, the network is then re-generated and/or laid out again.

Removing elements without interactions: This option removes all elements without an interaction.

Adding interactions between selected genes: This option adds all interactions between the selected genes to the network.

Layout

There are three types of layout to choose from:

Hierarchical layout: The hierarchical layout highlights the main direction or information flow of the network.
Centric layout: The centric layout emphasizes highly connected proteins.
Cell layout: The cell layout structures the network according to the cellular locations of the proteins.

You can also change the layout of the network manually by moving nodes around. Select any graph element by clicking on it with the left mouse button. Once an element is selected you can drag it around. You can select multiple items by holding down the ctrl key (cmd key on MacOS X), or by dragging over an area while pressing the left mouse button. Holding the shift key while moving the mouse shifts the network.

The last button allows the user to toggle between the input species and the target species if the option “Use orthologous genes in human for the analysis instead of the input genes.” was activated in the initial GeneRanker interface. By default, the gene symbols of the target species are shown. Toggling shows all orthologous gene symbols, if available. Additionally, the gene information box shows information on the orthologous gene. Note that all interactions remain the same.

External programs / Genomatix tasks

This option opens an external program from Genomatix. A program can be opened with one of two gene sets: the genes in the currently displayed network or the genes that pass the filter.

The Gene-TF Analysis determines for each gene and each transcription factor (TF) from the passed gene set whether the TF has a binding site on the gene's promoter (+) or not (-). In addition, it determines for each gene-TF pair whether the two are sufficiently co-cited and highlights such interactions with different colors. For this, the parameters ‘co-citation level’ and ‘minimum number of co-citations’ from GePS are used. The result is presented as a table in which each column is labeled with a TF and each row with a gene. Each cell shows whether the TF has a binding site on the gene's promoter (+) or not (-), and its background shows whether the TF and the gene are co-cited according to the co-citation filter and level. If a TF and a gene are co-cited and the TF has a binding site on the gene's promoter, the cell's background is green. If they are co-cited but no binding site is present, the background is yellow. If they are not co-cited, the background is grey.

The Gene-TF Analysis allows you to export the table as an Excel file or as a TSV file. The content of the Excel file is identical to the HTML output except for the order of the genes. A TSV file contains only tab-separated values and no background colors. Therefore a cell in a TSV file only has an entry if the TF and the gene are co-cited. Cells for TFs and genes that are not co-cited are empty, regardless of whether a binding site exists.

The Gene-TF Analysis has been started with the genes ARHGEF7, CREBBP, NFKBIA and RELA. Each column is labeled with a TF (CREBBP and NFKBIA) and each row is labeled with one of the four genes.
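The content of a single table cell follows directly from two facts: whether a binding site exists and whether the pair passes the co-citation filter. The following Python sketch illustrates these rules; the function names `gene_tf_cell` and `tsv_cell` are hypothetical and the code is not the Gene-TF Analysis implementation.

```python
def gene_tf_cell(has_binding_site, cocitations, min_cocitations):
    """Return (symbol, background color) for one gene/TF cell."""
    symbol = "+" if has_binding_site else "-"
    if cocitations < min_cocitations:
        return symbol, "grey"      # not co-cited
    if has_binding_site:
        return symbol, "green"     # co-cited and binding site present
    return symbol, "yellow"        # co-cited only


def tsv_cell(has_binding_site, cocitations, min_cocitations):
    """TSV export: cells of pairs that are not co-cited stay empty."""
    symbol, color = gene_tf_cell(has_binding_site, cocitations, min_cocitations)
    return "" if color == "grey" else symbol
```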

The option GePS can be used to start GePS with a set of genes. GePS is started with the same parameters as the current GePS session. If you provided expression values for your input genes, all non-input genes in the new set are assigned NA as expression value.

If a gene list has been uploaded and thus a GeneRanker result is available, you can examine the GeneRanker result in detail by clicking on the button in the right corner.


Network generation and extension

Networks can be generated in three ways. A network can be generated directly from a gene list by clicking on a filter/gene list in the sidebar. You can also select several gene lists; clicking on the ‘Generate network’ button then generates a network from the combined gene lists (‘and’/‘or’). Furthermore, the interactions of a displayed network can be re-generated, e.g. with a different co-citation level or filter, by clicking on the button in the network navigation bar. A displayed network can also be extended by genes, transcription factors or transcriptional targets.

Gene lists can contain a very large number of genes and many of these genes can be co-cited. To maintain reasonable network views, there is an upper limit to the number of genes and interactions displayed.

Restriction of network genes

Displaying all genes from a large gene list can result in an unreadable network. Therefore the maximal number of genes for a generated network is defined under settings. If you generate a network from a gene list that contains more genes, the restriction algorithm chooses the most cited genes, builds the network and replaces all network genes that could not be connected with the next most cited genes. This procedure is repeated until no network genes remain unconnected.

If you want to ensure that certain genes are contained in the generated network, you can define mandatory genes in a dialog box. The dialog box opens if you generate a network from a gene list that exceeds the maximal number of genes.
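The restriction procedure described in this section can be sketched in Python under some simplifying assumptions: a gene counts as "connected" if it has at least one interaction with another selected gene, mandatory genes are not modelled, and the loop also stops when no replacement candidates are left. The function name `restrict_genes` and all example data are hypothetical.

```python
def restrict_genes(citations, interactions, max_genes):
    """Select up to `max_genes` genes, replacing unconnected ones iteratively.

    citations: {gene: citation count}; interactions: set of frozenset gene pairs.
    """
    ranked = sorted(citations, key=lambda g: (-citations[g], g))
    selected = ranked[:max_genes]        # start with the most cited genes
    candidates = ranked[max_genes:]      # next most cited genes, in order
    while True:
        chosen = set(selected)
        unconnected = [
            g for g in selected
            if not any(frozenset((g, h)) in interactions for h in chosen if h != g)
        ]
        if not unconnected or not candidates:
            return selected
        for gene in unconnected:         # swap in the next most cited genes
            if not candidates:
                break
            selected[selected.index(gene)] = candidates.pop(0)


# Invented citation counts and interactions.
citations = {"A": 10, "B": 9, "C": 8, "D": 7, "E": 6}
interactions = {frozenset(("A", "B")), frozenset(("A", "D"))}
network_genes = restrict_genes(citations, interactions, 3)
```

In this example gene C is among the three most cited genes but cannot be connected, so it is replaced by D, the next most cited gene.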

Network generation algorithms

Literature-based networks in GePS contain very large numbers of interactions between genes. Displaying all these interactions in the pathway view would render it unreadable. Therefore a strategy is needed to reduce the number of displayed interactions without losing relevant information. To achieve this, GePS uses the simple network or the shortest path algorithm to calculate the optimal set of interactions for a network.

The simple network algorithm starts with a plain list of genes. It then iterates three times over all interactions that pass the co-citation filter, in descending order of their number of co-citations. In the first iteration the algorithm adds an interaction between two genes if neither has any interaction yet. In the second iteration the algorithm adds an interaction between two genes if they are not yet connected by a path. This step avoids unconnected subnetworks. In the last iteration the algorithm adds invisible interactions if the parameter interactions per gene is not yet fulfilled.
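The three iterations can be illustrated with the following Python sketch. It is an interpretation of the description, not the GePS implementation: the function name `simple_network` is hypothetical, connectivity is tracked with a union-find structure, and the third pass assumes that an invisible interaction is only added while both genes are still below the interactions per gene limit.

```python
def simple_network(genes, cocitations, min_cocitations, interactions_per_gene):
    """Return (visible, invisible) interactions; cocitations maps frozenset pairs to counts."""
    ranked = sorted(
        (pair for pair, n in cocitations.items() if n >= min_cocitations),
        key=lambda pair: (-cocitations[pair], sorted(pair)),
    )
    parent = {g: g for g in genes}       # union-find forest for path checks
    degree = {g: 0 for g in genes}
    visible, invisible = [], []

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    def add(a, b, target):
        target.append((a, b))
        degree[a] += 1
        degree[b] += 1
        parent[find(a)] = find(b)

    for pair in ranked:                  # pass 1: neither gene has an interaction
        a, b = sorted(pair)
        if degree[a] == 0 and degree[b] == 0:
            add(a, b, visible)
    for pair in ranked:                  # pass 2: genes not yet connected by a path
        a, b = sorted(pair)
        if find(a) != find(b):
            add(a, b, visible)
    placed = set(visible)
    for pair in ranked:                  # pass 3: fill up with invisible interactions
        a, b = sorted(pair)
        if (a, b) not in placed and max(degree[a], degree[b]) < interactions_per_gene:
            add(a, b, invisible)
    return visible, invisible


# Invented co-citation counts.
counts = {
    frozenset(("A", "B")): 20, frozenset(("C", "D")): 15,
    frozenset(("B", "C")): 12, frozenset(("A", "C")): 11,
    frozenset(("A", "D")): 3,
}
visible, invisible = simple_network(["A", "B", "C", "D"], counts, 10, 3)
```

In this example A-B and C-D are added in the first pass, B-C joins the two subnetworks in the second pass, and A-C is added as an invisible interaction in the third pass. A-D does not pass the co-citation filter.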

The shortest path algorithm initially creates a network with all genes and all interactions that could pass the co-citation filter. Then the algorithm searches for shortest paths from the selected genes to all other genes and removes all interactions that are not in those paths.
If no genes have been selected by the user, the algorithm selects from each connected component (a subnetwork in which any two genes are connected to each other by paths) the gene with the highest number of interactions.

In GePS the weight of an interaction between two genes is determined by the number of co-citations supporting the connection: the more evidence, the shorter the connection. However, as opposed to the road map example, in biological networks it makes a difference whether a relation is direct or indirect. As the number of ‘hops’ between two genes is not taken into account by the algorithm, we needed to find a way to make use of this information to ensure that direct relations between two genes are always preferred over indirect connections.
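The idea can be illustrated with a standard Dijkstra search in Python. The weighting used here is an assumption made for this sketch and not necessarily the one used by GePS: each interaction gets the length 1 + 1/n for n co-citations, so more evidence gives a shorter connection, while any direct interaction (length at most 2) is shorter than any path with two or more hops (length above 2). The function name `shortest_path_edges` and the counts are hypothetical.

```python
import heapq


def shortest_path_edges(cocitations, source):
    """Return the interactions on shortest paths from `source` to all reachable genes."""
    graph = {}
    for pair, n in cocitations.items():
        a, b = sorted(pair)
        length = 1.0 + 1.0 / n           # assumed weighting, see text above
        graph.setdefault(a, []).append((b, length))
        graph.setdefault(b, []).append((a, length))
    dist = {source: 0.0}
    previous = {}                        # gene -> predecessor on its shortest path
    queue = [(0.0, source)]
    while queue:
        d, gene = heapq.heappop(queue)
        if d > dist.get(gene, float("inf")):
            continue                     # outdated queue entry
        for neighbor, length in graph.get(gene, []):
            candidate = d + length
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                previous[neighbor] = gene
                heapq.heappush(queue, (candidate, neighbor))
    return {tuple(sorted((gene, pred))) for gene, pred in previous.items()}


# Invented counts for interactions that already passed the co-citation filter.
counts = {
    frozenset(("A", "B")): 100,
    frozenset(("B", "C")): 100,
    frozenset(("A", "C")): 10,
}
kept = shortest_path_edges(counts, "A")
```

Although the indirect route from A to C via B is supported by far more co-citations, the direct interaction A-C is kept and B-C is removed.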

Network extension algorithms

A network can be extended with co-cited genes by three different algorithms:

The algorithm Extend by genes extends the network by examining all genes that are not yet included in the current network. The algorithm starts with genes passing the current filter, then genes from the input list and then all other annotated genes. If the option ‘Use all genes from the input list’ has been selected (only available for one selected gene), the algorithm uses only the input genes. In each step the genes are ranked primarily according to the number of network genes with which they are co-cited and secondarily according to the sum of all co-citations. The best genes are added to the network, and initially only the interaction with the highest number of co-citations is shown for each added gene.

Extend by transcription factors extends the network by adding transcription factors. In principle, it follows the same logic as ‘Extend by genes’, but considers only transcription factors.

The algorithm Extend by transcriptional targets extends the network by adding transcriptional downstream targets. In principle, it follows the same logic as ‘Extend by genes’, but has additional restrictions. First, from the network, only transcription factors that have a known TF binding site matrix are considered. Second, an interaction between a gene and a network transcription factor is only considered if both are co-cited and the transcription factor has a binding site in a promoter of the gene. Thus, for networks that do not contain a transcription factor with a known matrix this option cannot be applied.
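The ranking used by ‘Extend by genes’ can be sketched in Python as follows. The sketch covers only a single ranking step; the order of candidate groups (filter genes, input genes, all other genes) and the connectivity option are not modelled. The function name `rank_extension_candidates` and the co-citation counts are invented for illustration.

```python
def rank_extension_candidates(network_genes, cocitations, min_cocitations, max_added):
    """Rank genes outside the network by co-cited network genes, then by co-citation sum."""
    network = set(network_genes)
    partners = {}   # candidate -> number of network genes it is co-cited with
    totals = {}     # candidate -> sum of its co-citations with network genes
    for pair, n in cocitations.items():
        if n < min_cocitations or len(pair) != 2:
            continue
        if len(pair & network) != 1:
            continue                     # need exactly one gene inside the network
        (candidate,) = pair - network
        partners[candidate] = partners.get(candidate, 0) + 1
        totals[candidate] = totals.get(candidate, 0) + n
    ranked = sorted(partners, key=lambda g: (-partners[g], -totals[g], g))
    return ranked[:max_added]


# Invented co-citation counts.
counts = {
    frozenset(("RELA", "TNF")): 40,
    frozenset(("NFKBIA", "TNF")): 25,
    frozenset(("RELA", "CREBBP")): 90,
    frozenset(("RELA", "NFKBIA")): 200,
    frozenset(("TNF", "CREBBP")): 5,
}
added = rank_extension_candidates(["RELA", "NFKBIA"], counts, 10, 2)
```

TNF is ranked before CREBBP because it is co-cited with two network genes, even though CREBBP has the higher co-citation sum.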


Data sources

Pathway data were collected from the Pathway Interaction Database. This includes a number of pathways imported from BioCarta.
These data and their generation are described in detail in

Cancer Cell Map is a collection of selected human-focused cellular pathways implicated in cancer, created by the Memorial Sloan-Kettering Cancer Center.

INOH (Integrating Network Objects with Hierarchies) is a database of higher order functional knowledge such as relationships among multiple bio-molecules that constitute signal transduction pathways or biological events in general.

The basal pathway data are supplemented by information collected from

The visual notation in the Genomatix Pathway System adheres, where possible, to the standardized graphical notation put forward by the Systems Biology Graphical Notation project.