
Comparative Genomics



Introduction

The Comparative Genomics section in ElDorado allows the analysis of the transcripts known for a group of orthologous genes from up to 13 vertebrates (human, mouse, rat, chicken, dog, chimpanzee, rhesus macaque, cow, horse, pig, opossum, zebrafish and platypus). Incomplete or misleading annotation in one genome is identified by comparing it with the information available for the other genomes. The program determines which of the available promoter regions correspond to each other and extends the annotation with additional promoter regions for which no transcripts have been annotated so far (CompGen promoters). All of the promoter regions can be selected by the user for further analysis.

For the requested input gene, the Locus IDs of all known orthologous genes and their alternative transcripts are retrieved from the database. The transcripts are sorted by organism and aligned at their most 5' conserved exon. Corresponding promoter regions from different organisms are grouped into promoter sets. The analysis is based on comparing the exon/intron structure of pairs of transcripts and the sequence similarity of the corresponding sequence regions.
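
To illustrate the display step, here is a minimal sketch of sorting transcripts by organism and aligning them at their most 5' conserved exon. It is not ElDorado's implementation: the `Exon`/`Transcript` records, the `conserved` flag and the function names are hypothetical, and exon coordinates are assumed to be given in 5'->3' transcript orientation, so strand handling is ignored.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Exon:
    start: int        # position in 5'->3' transcript orientation (assumed)
    end: int
    conserved: bool   # hypothetical flag: exon judged conserved across orthologs


@dataclass
class Transcript:
    transcript_id: str
    organism: str
    exons: list[Exon]  # ordered 5' -> 3'


def anchor_position(t: Transcript) -> int | None:
    """Start of the most 5' conserved exon, or None if there is none."""
    for exon in t.exons:
        if exon.conserved:
            return exon.start
    return None


def align_by_anchor(transcripts: list[Transcript]) -> list[tuple[str, str, list[tuple[int, int]]]]:
    """Sort transcripts by organism and shift each so its anchor exon starts at 0."""
    aligned = []
    for t in sorted(transcripts, key=lambda t: (t.organism, t.transcript_id)):
        anchor = anchor_position(t)
        if anchor is None:
            continue  # transcripts without a conserved exon cannot be placed
        shifted = [(e.start - anchor, e.end - anchor) for e in t.exons]
        aligned.append((t.organism, t.transcript_id, shifted))
    return aligned
```

After shifting, exons upstream of the anchor get negative offsets, so first exons from different organisms line up relative to a shared reference point.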

To assign orthologous genes, all transcripts annotated in ElDorado are aligned exhaustively against each other. The pairwise sequence similarity is then used to build homology groups covering up to 13 vertebrate genomes.
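
One simple way to turn pairwise similarities into groups is single-linkage clustering: any pair scoring above a threshold joins the same group. The sketch below does this with a union-find structure. The clustering rule, the `min_similarity` threshold of 0.7 and the score scale (0 to 1) are all assumptions for illustration; the actual ElDorado grouping criteria are not described here.

```python
def homology_groups(transcript_ids, pair_scores, min_similarity=0.7):
    """Single-linkage grouping of transcripts from pairwise similarity scores.

    pair_scores maps (id_a, id_b) -> similarity in [0, 1] (assumed scale).
    min_similarity is a hypothetical cut-off, not an ElDorado parameter.
    """
    parent = {t: t for t in transcript_ids}

    def find(x):
        # Follow parent links to the group root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in pair_scores.items():
        if score >= min_similarity:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra  # merge the two groups

    groups = {}
    for t in transcript_ids:
        groups.setdefault(find(t), []).append(t)
    return sorted(sorted(g) for g in groups.values())
```

Single linkage is transitive: if human~mouse and mouse~rat pass the threshold, all three end up together even if human~rat does not.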

The output shows a table with all transcripts, a task menu for further analyses of the promoters, and a graphical overview of the aligned transcripts.


CompGen promoters and promoter sets

We developed an algorithm to identify corresponding regulatory regions in the genomic sequences of different organisms, i.e. the regions responsible for the transcriptional control of orthologous genes.

Our approach correlates the transcripts known for a gene in two different organisms by mapping the first exon of a transcript from one organism onto the genomic sequence of the second organism. If the mapped exon overlaps the first exon of a transcript originally annotated in the second organism, the two transcripts are defined as corresponding.
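
The overlap rule can be sketched as follows, assuming the first exon has already been mapped to the second genome (the mapping itself is not shown). The `Interval` record, the half-open 0-based coordinates and the requirement that both exons lie on the same strand are assumptions made for this illustration.

```python
from typing import Dict, List, NamedTuple, Optional


class Interval(NamedTuple):
    chrom: str
    strand: str  # "+" or "-"
    start: int   # 0-based, inclusive
    end: int     # exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base on the same strand."""
    return (a.chrom == b.chrom and a.strand == b.strand
            and a.start < b.end and b.start < a.end)


def corresponding_transcripts(mapped_first_exon: Optional[Interval],
                              first_exons_org2: Dict[str, Interval]) -> List[str]:
    """IDs of organism-2 transcripts whose first exon overlaps the mapped exon."""
    if mapped_first_exon is None:  # the exon could not be mapped at all
        return []
    return sorted(tid for tid, exon in first_exons_org2.items()
                  if overlaps(mapped_first_exon, exon))
```

With half-open coordinates, intervals that merely touch (one ends where the other starts) do not count as overlapping.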

Background: Since first exons are usually non-coding (5' UTR), they are not highly conserved between species and cannot be located reliably by sequence similarity alone. We therefore defined potential target regions of roughly 20,000 bp in the second organism within which the mapping is performed. To find them, the transcripts in both organisms were analyzed for conserved exons by comparing the nucleotide sequences and lengths of the exons. Based on these conserved 'anchors', the potential target region is determined, and the first exons from the first organism are mapped to this target region.
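
A rough sketch of the anchor idea: call an exon pair conserved if the lengths differ by at most 10% and the sequences are at least 80% similar, then take a window of about 20,000 bp upstream of the most 5' anchor in the second organism, extended through the anchor exon. Both thresholds, the window placement and the plus-strand coordinates are hypothetical choices. `difflib.SequenceMatcher` is only a crude stand-in for a proper nucleotide alignment. `autojunk=False` is needed because its junk heuristic would otherwise discard every base of a DNA sequence of 200 bp or more.

```python
from difflib import SequenceMatcher

TARGET_REGION_BP = 20_000  # approximate target size quoted in the text


def is_conserved(seq1, seq2, max_len_diff=0.1, min_identity=0.8):
    """Hypothetical conservation test combining exon length and sequence similarity."""
    len1, len2 = len(seq1), len(seq2)
    if not len1 or not len2:
        return False
    if abs(len1 - len2) / max(len1, len2) > max_len_diff:
        return False  # lengths too different
    matcher = SequenceMatcher(None, seq1.upper(), seq2.upper(), autojunk=False)
    return matcher.ratio() >= min_identity


def target_region(exons1, exons2, size=TARGET_REGION_BP):
    """Return (start, end) of a target window in organism 2, or None.

    exons1/exons2: lists of (genomic_start, sequence), ordered 5'->3',
    plus-strand coordinates assumed. The window covers `size` bp upstream
    of the most 5' anchor exon in organism 2 plus the anchor itself.
    """
    for _, seq1 in exons1:
        for start2, seq2 in exons2:
            if is_conserved(seq1, seq2):
                return (max(0, start2 - size), start2 + len(seq2))
    return None
```

The first exons of organism 1 would then be searched for only inside the returned window rather than in the whole genome.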

This approach was applied exhaustively to a large group of organisms for which orthologous genes had previously been identified. The outcome of this genome-wide analysis is a collection of promoter sets. Each promoter set represents a group of regulatory sequences responsible for the transcription of corresponding transcripts in different organisms. Promoters in a promoter set that are supported "only" by a transcript mapped from a different organism, with no transcript annotated in their own genome, are called CompGen promoters.
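
Grouping pairwise correspondences into promoter sets can be sketched as a connected-components problem. The version below uses union-find and flags a promoter as CompGen when it has no transcript annotated in its own genome. The promoter naming scheme ("organism:id"), the input shapes and the function name are hypothetical and do not reflect ElDorado's internal representation.

```python
from collections import defaultdict


def promoter_sets(correspondences, annotated):
    """Group corresponding promoters and flag CompGen promoters.

    correspondences: iterable of (promoter_a, promoter_b) pairs found by mapping.
    annotated: set of promoters with a transcript annotated in their own genome.
    Returns sorted promoter sets, each a sorted list of (promoter, is_compgen).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in correspondences:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two promoter sets

    for p in annotated:
        find(p)  # promoters without any correspondence form singleton sets

    groups = defaultdict(list)
    for p in list(parent):
        groups[find(p)].append(p)

    # A promoter seen only through a mapped transcript is a CompGen promoter.
    return sorted(
        sorted((p, p not in annotated) for p in members)
        for members in groups.values()
    )
```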

Relevant transcripts

For comparative promoter analysis (e.g. FrameWorker analysis), it is important to select the correct promoters from among a number of alternative transcripts as input. Genomatix tags transcripts that are probably not relevant for producing the main product of a gene as "less relevant", allowing subsequent analyses to focus on the main promoters of a gene. The "less relevant" tag is based on an evaluation of the exon/intron structure of all alternative transcripts within a locus.

Output

The result page contains three sections: