
Comparative Genomics



Introduction

The Comparative Genomics section in ElDorado allows analysis of the transcripts known for a group of orthologous genes (vertebrates or plants). Incomplete or misleading annotation in one genome is identified by comparison with the information available from the other genomes. The program evaluates which of the available promoter regions correspond to each other and extends the annotation with additional promoter regions for which no transcripts have been annotated so far (CompGen promoters). The user can select any of the promoter regions for further analysis.

For the requested input gene, all Locus IDs for a group of known orthologous genes and their alternative transcripts are retrieved from the database. The transcripts are sorted by organism and aligned by the most 5' conserved exon. Corresponding promoter regions from different organisms are grouped into promoter sets. The analysis is based on the comparison of the exon/intron structure of two transcripts and on the sequence similarity of the corresponding sequence regions.

For the assignment of orthologous genes, all transcripts annotated in ElDorado are aligned against each other exhaustively. The pairwise sequence similarity is used to build homology groups for vertebrate and plant genomes.

The output will show a table with all transcripts, a task menu for further analyses of the promoters and a graphical overview of the aligned transcripts.


CompGen promoters and promoter sets

We developed an algorithm to identify corresponding regulatory regions in the genomic sequence of different organisms which are responsible for the transcriptional control of orthologous genes.

Our approach correlates the transcripts known for a gene in two different organisms by mapping the first exon of a transcript in one organism to the genomic sequence of a second organism. If the mapped exon overlaps with the first exon of a transcript originally annotated in the second organism, the two transcripts are defined as corresponding.
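The overlap criterion described above can be illustrated with a minimal sketch. It is not the ElDorado implementation: the `Exon` class, the function names and the use of closed, inclusive coordinates are assumptions made for this illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    # Hypothetical representation: closed interval on the second organism's genome.
    start: int  # first base, inclusive
    end: int    # last base, inclusive


def overlaps(a: Exon, b: Exon) -> bool:
    """True if the two closed intervals share at least one base."""
    return a.start <= b.end and b.start <= a.end


def corresponding_transcripts(mapped_exon: Exon, first_exons: dict[str, Exon]) -> list[str]:
    """Return IDs of transcripts whose annotated first exon overlaps the mapped exon."""
    return [tid for tid, exon in first_exons.items() if overlaps(mapped_exon, exon)]


# First exon of an organism-1 transcript after mapping to organism 2 (made-up numbers).
mapped = Exon(1000, 1200)
# First exons of transcripts originally annotated in organism 2 (made-up numbers).
annotated = {"tx_a": Exon(1150, 1400), "tx_b": Exon(5000, 5200)}

print(corresponding_transcripts(mapped, annotated))
```

In this toy input only `tx_a` shares bases with the mapped exon, so only that transcript would be reported as corresponding.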

Background: Since first exons are usually non-coding (5' UTR), they are not highly conserved between species. We therefore defined potential target regions of several thousand base pairs for the mapping in the second organism. The transcripts in both organisms were analyzed for conserved exons by comparing the nucleotide sequence and length of the exons. Based on these conserved 'anchors', the potential target region is determined (approx. 20,000 bp), and the first exons of the first organism are mapped to this target region.
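As a rough sketch of the two steps in this background paragraph, the following code tests whether two exons qualify as a conserved anchor and derives a target window from an anchor position. Several details are assumptions, not taken from the text: the similarity measure (Python's `difflib` ratio as a stand-in for a real sequence alignment), the thresholds `min_similarity` and `max_length_diff`, and the placement of the window directly upstream of the anchor on the plus strand. Only the window size of about 20,000 bp comes from the description above.

```python
from difflib import SequenceMatcher


def is_conserved(exon_a: str, exon_b: str,
                 min_similarity: float = 0.8,
                 max_length_diff: float = 0.1) -> bool:
    """Compare two exon sequences by length and by sequence similarity.

    Both thresholds are hypothetical values chosen for illustration.
    """
    longer = max(len(exon_a), len(exon_b))
    if longer == 0:
        return False
    # Reject exon pairs whose lengths differ too much.
    if abs(len(exon_a) - len(exon_b)) / longer > max_length_diff:
        return False
    # autojunk=False keeps difflib from discarding frequent characters,
    # which would distort the result for a four-letter alphabet.
    similarity = SequenceMatcher(None, exon_a, exon_b, autojunk=False).ratio()
    return similarity >= min_similarity


def target_region(anchor_start: int, size: int = 20_000) -> tuple[int, int]:
    """Window of up to `size` bp upstream of the anchor (0-based, end exclusive).

    Placing the window immediately upstream of the anchor is an assumption.
    """
    return max(0, anchor_start - size), anchor_start


print(is_conserved("ACGTACGTAC", "ACGTACGTAG"))
print(target_region(50_000))
```

The first exons of the first organism would then be searched only within the returned window rather than in the whole genomic sequence.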

This approach was applied exhaustively to a large group of organisms for which orthologous genes had been identified beforehand. The outcome of this genome-wide analysis is a collection of promoter sets. Each promoter set represents a group of regulatory sequences that are responsible for the transcription of corresponding transcripts from different organisms. Promoters in a promoter set that have no annotated transcript of their own, but "only" a transcript mapped from a different organism, are called CompGen promoters.
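One way to picture how pairwise correspondences turn into promoter sets is to treat them as links and collect the connected groups. The sketch below uses a small union-find structure for this. The promoter names, the input layout and the choice of union-find are assumptions for illustration; the text does not say how ElDorado builds the sets internally, nor whether a promoter without any correspondence forms a set of its own.

```python
def build_promoter_sets(promoters: dict[str, bool],
                        links: list[tuple[str, str]]) -> list[list[str]]:
    """Group promoters connected by correspondence links.

    `promoters` maps a promoter name to True if it has an annotated
    transcript of its own; `links` lists pairs of corresponding promoters.
    """
    parent = {name: name for name in promoters}

    def find(name: str) -> str:
        # Follow parent pointers to the group representative (path halving).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in links:
        parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for name in promoters:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


def compgen_promoters(promoters: dict[str, bool], promoter_set: list[str]) -> list[str]:
    """Members of a set that have no annotated transcript of their own."""
    return [name for name in promoter_set if not promoters[name]]


# Hypothetical input: rat_P1 exists only because a transcript was mapped to it.
promoters = {"human_P1": True, "mouse_P1": True, "rat_P1": False, "human_P2": True}
links = [("human_P1", "mouse_P1"), ("mouse_P1", "rat_P1")]

sets = build_promoter_sets(promoters, links)
print(sets)
print(compgen_promoters(promoters, sets[0]))
```

Here the three linked promoters end up in one set, and `rat_P1` is the only member that would count as a CompGen promoter.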

Output

The result page contains several sections:

Graphical overview of the aligned transcripts

The overview is a graphical representation of the transcripts listed in the table above and their genomic context. To allow an alignment of transcripts from different loci, they are all displayed on the plus strand. Alternative transcripts from a single locus are simply arranged by their genomic location. Transcripts from orthologous loci are aligned against each other by their most 5' conserved exon. If no corresponding exons are found, the transcripts are aligned by their assumed TSS.

In this example (ACTN4 gene), two transcripts are depicted for the human genome. They are transcribed from two independent promoters.
Two transcripts from the mouse genome are also shown. The second corresponds to the first transcript of the human genome (promoter set 1 = first orange box).
For the rat genome, the first three depicted transcripts also correspond to the longer transcript known from the human genome (promoter set 1). For the fourth rat sequence, the second promoter set (orange box) of the first mouse sequence was used to predict a new promoter (second orange box with yellow promoter region).

[Figure: graphical overview of the aligned ACTN4 transcripts]
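The alignment rule for the overview (most 5' conserved exon, with the assumed TSS as fallback) can be sketched as a computation of horizontal display offsets. The data layout and the function name are hypothetical. The sketch also assumes that the fallback applies to the whole group: as soon as one transcript has no conserved exon, all transcripts are aligned by their TSS. Positions are assumed to be plus-strand coordinates relative to the start of each displayed locus.

```python
from typing import Optional


def display_offsets(transcripts: dict[str, tuple[int, Optional[int]]]) -> dict[str, int]:
    """Return how far each transcript is shifted right so reference points line up.

    `transcripts` maps a name to (tss, anchor_start), where anchor_start is the
    start of the most 5' conserved exon, or None if no such exon was found.
    The dictionary must not be empty.
    """
    # Assumption: align on the conserved exon only if every transcript has one.
    use_anchor = all(anchor is not None for _, anchor in transcripts.values())
    reference = {
        name: (anchor if use_anchor else tss)
        for name, (tss, anchor) in transcripts.items()
    }
    # Shift everything so the reference points share the right-most column.
    column = max(reference.values())
    return {name: column - pos for name, pos in reference.items()}


# Made-up coordinates: both transcripts have a conserved exon.
print(display_offsets({"human_1": (100, 500), "mouse_1": (80, 300)}))
# Made-up coordinates: one transcript lacks a conserved exon, so the TSS is used.
print(display_offsets({"human_1": (100, 500), "rat_1": (50, None)}))
```

Shifting to the right-most reference point keeps all offsets non-negative, so no transcript has to be drawn left of the panel origin.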

The graphic consists of four parts:

The main sequence panel

The main sequence panel contains all transcripts as listed in the table view of the output. For each transcript, a separate sequence is depicted as a line with all annotated elements. Thus, the transcripts appear below each other with differently annotated exons, introns or UTRs.

The colored field to the right of the transcript labels indicates the quality of the transcript (gold, silver, bronze).

The navigation panel

The view of the sequence can be changed by using the zoom and scroll element in the lower right part of the graphic. The navigation panel contains a scaled-down version of the sequence and a red box that marks the part of the sequence currently visible in the main sequence panel above. By default, the whole sequence is displayed.

To zoom in or out, adjust the red box's size in the navigation panel by dragging its vertical edges (indicated by a different mouse pointer). The sequence in the main panel will adjust to the selected window.

If you want to scroll along the sequence, move the red box within the navigation panel by dragging it with the mouse to the desired position.


The selection panel

The selection panel contains a tree with all elements that can be toggled on or off for display. Tree hierarchies can be shown or hidden by clicking on the label or the ^ symbol. Checkboxes toggle the visibility of the corresponding elements.

Individual transcripts can be removed from the main sequence panel by clicking on the checkboxes.

The toolbar

The toolbar exports the graphic to an image format (JPG, PNG or TIFF), based on the current zoom and element selection settings.