
Source of elements annotated in ElDorado

Elements Source
Primary transcripts
Exons
Introns
Transcripts and their exon/intron structure are determined by the mapping of cDNA sequences from different sources (e.g. RefSeq, GenBank, Ensembl). Transcripts annotated in ElDorado are divided into 3 quality levels:
  1. The lowest quality, bronze, is assigned to transcripts derived from the mapping of cDNAs for which no experimental evidence about the 5' completeness is available.
  2. The quality silver is assigned to a transcript if its promoter region overlaps with a PromoterInspector prediction.
  3. The quality gold is assigned to transcripts derived from the mapping of cDNAs for which experimental evidence about the 5' completeness is available (e.g. by oligo-capping). Transcripts initially assigned bronze or silver are upgraded to gold if at least 3 CAGE tags lie within 3 bp upstream or downstream of the transcript start.
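The three quality levels can be read as a small decision rule. The following is a minimal sketch of that reading, not ElDorado's implementation: the `TranscriptEvidence` fields and the `quality_level` function are hypothetical names, and the order of the checks (gold first, then silver, then bronze) is an assumption drawn from the description above.

```python
from dataclasses import dataclass


@dataclass
class TranscriptEvidence:
    # Hypothetical fields that mirror the criteria named in the text.
    five_prime_verified: bool         # experimental evidence for 5' completeness
    overlaps_promoterinspector: bool  # promoter overlaps a PromoterInspector prediction
    cage_tags_near_start: int         # CAGE tags within 3 bp of the transcript start


def quality_level(ev: TranscriptEvidence) -> str:
    """Return 'gold', 'silver' or 'bronze' for one transcript."""
    # Gold: verified 5' end, or an upgrade through at least 3 nearby CAGE tags.
    if ev.five_prime_verified or ev.cage_tags_near_start >= 3:
        return "gold"
    # Silver: supported by a PromoterInspector prediction only.
    if ev.overlaps_promoterinspector:
        return "silver"
    # Bronze: no evidence about 5' completeness.
    return "bronze"
```

For example, a transcript without 5' evidence whose promoter overlaps a PromoterInspector prediction would be rated silver, and the same transcript with three CAGE tags at its start would be rated gold.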
UTRs calculated by determining the longest open reading frame (ORF) for the transcript
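As an illustration of deriving UTRs from the longest ORF, the sketch below scans the three forward reading frames of a transcript sequence and treats everything before the ORF as 5' UTR and everything after it as 3' UTR. It assumes an ORF runs from an ATG to the first in-frame stop codon; the exact ORF definition used by ElDorado is not stated here, and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end) of the longest ATG..stop ORF (end exclusive), or None."""
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        # Walk the frame codon by codon.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None  # look for the next ORF in this frame
    return best


def utrs(seq):
    """Return (five_prime_utr, three_prime_utr), or None if no ORF is found."""
    orf = longest_orf(seq)
    if orf is None:
        return None
    start, end = orf
    return seq[:start], seq[end:]
```

Calling `utrs("GGATGAAATAGCC")` finds the ORF `ATGAAATAG` and returns `("GG", "CC")`.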
PromoterInspector-Predictions calculated by PromoterInspector
Promoter regions Promoters available in ElDorado are determined in a three-step process:
  1. For each transcript (independent of quality), the promoter is set to the region from 500 bp upstream to 100 bp downstream of the transcription start site (TSS).
  2. The promoters of two or more transcripts are merged into larger promoter regions if they satisfy all of the following conditions:
    • belong to the same locus
    • the promoters of the two transcripts overlap, and their first exons overlap as well
  3. The annotation available from orthologous loci is evaluated. Promoter regions are extended if the first exons of corresponding transcripts differ in length. Based on the comparison of the exon/intron structure of two transcripts and on the sequence similarity of the corresponding sequence regions additional promoter regions are annotated (CompGen promoters). The genome annotation so far contains no transcripts for these promoter regions.
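The first two steps above can be sketched as interval arithmetic. This is a simplified illustration under stated assumptions, not the ElDorado pipeline: coordinates are treated as closed integer intervals, the `Transcript` fields and all function names are hypothetical, and step 3 (the comparison with orthologous loci) is not modeled.

```python
from dataclasses import dataclass
from typing import Tuple

Interval = Tuple[int, int]  # (start, end) with start <= end, both inclusive


@dataclass
class Transcript:
    # Hypothetical record; only the fields needed for steps 1 and 2.
    locus: str
    strand: str           # "+" or "-"
    tss: int              # transcription start site
    first_exon: Interval


def promoter(t: Transcript, up: int = 500, down: int = 100) -> Interval:
    """Step 1: 500 bp upstream to 100 bp downstream of the TSS, strand-aware."""
    if t.strand == "+":
        return (t.tss - up, t.tss + down)
    return (t.tss - down, t.tss + up)


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def can_merge(a: Transcript, b: Transcript) -> bool:
    """Step 2: same locus, overlapping promoters, overlapping first exons."""
    return (a.locus == b.locus
            and overlaps(promoter(a), promoter(b))
            and overlaps(a.first_exon, b.first_exon))


def merged_region(a: Transcript, b: Transcript) -> Interval:
    """Smallest interval covering both promoters."""
    pa, pb = promoter(a), promoter(b)
    return (min(pa[0], pb[0]), max(pa[1], pb[1]))
```

Two plus-strand transcripts of one locus with TSSs at positions 1000 and 1050 and overlapping first exons would, under these assumptions, be merged into a single promoter region spanning 500 to 1150.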
TSR (transcriptional start regions)
TSRs are defined as regions of genomic sequence for which experimental evidence for transcription initiation is available. Information about transcription initiation is derived from individual full-length cDNAs and from CAGE tags. Both data sources make use of the oligo-capping method. The 5' ends of full-length transcripts and CAGE tags are taken as experimentally verified transcription start sites (TSSs). TSSs separated by less than 40 bp are grouped into a TSR.
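The grouping rule can be illustrated with a short sketch. It assumes that all positions come from the same chromosome and strand and that grouping is chained, i.e. a TSS joins a region when it lies less than 40 bp from the nearest TSS already in that region; the text does not spell this out, and the function name `group_tss` is hypothetical.

```python
def group_tss(positions, max_gap=40):
    """Group TSS positions into regions; neighbours closer than max_gap share a region."""
    regions = []
    for pos in sorted(set(positions)):
        if regions and pos - regions[-1][1] < max_gap:
            # Closer than max_gap to the last TSS of the current region: extend it.
            regions[-1][1] = pos
        else:
            # Otherwise open a new region that starts and ends at this TSS.
            regions.append([pos, pos])
    return [tuple(r) for r in regions]
```

With the input `[100, 120, 159, 199, 300]`, the first three positions form one region from 100 to 159, while 199 (exactly 40 bp from 159) and 300 each start a region of their own.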
Conserved Regions Conserved regions are calculated by a proprietary algorithm for genome-wide sequence alignments. Conserved regions have a minimum length of 50 base pairs and a minimum similarity of 80%. The algorithm accounts for point mutations only, not for insertions or deletions. The elements annotated in ElDorado are based on comparisons of genomes from distant species only, i.e. closely related species like rat/mouse are not considered.

Note: Conserved regions are available up to ElDorado 12-2008.
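The alignment algorithm itself is proprietary and is not reproduced here. The sketch below only shows how the two quoted thresholds could be checked for a pair of already aligned, equal-length segments. It assumes that "similarity" means percent identity over an ungapped comparison, which matches the statement that only point mutations are considered; the function names are hypothetical.

```python
def percent_identity(a, b):
    """Percent of identical positions in two ungapped, equal-length segments."""
    if len(a) != len(b):
        raise ValueError("ungapped comparison needs equal-length segments")
    if not a:
        return 0.0
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def meets_thresholds(a, b, min_len=50, min_identity=80.0):
    """True if the segment pair is at least 50 bp long and at least 80% identical."""
    return len(a) >= min_len and percent_identity(a, b) >= min_identity
```

A 60 bp segment pair with 12 mismatches (80% identity) would pass this check, while the same pair with 13 mismatches, or a 40 bp pair with no mismatches, would not.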

MicroRNAs MicroRNAs are based on the sequences available in miRBase at the Sanger Institute.
Probes Each single probe from gene expression arrays from Affymetrix and Illumina is mapped against the corresponding genome. All perfect matches are annotated.
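To make "all perfect matches" concrete, the following sketch reports every exact occurrence of a probe in a sequence. It is a naive illustration, not the mapping procedure used for ElDorado: searching the reverse complement as well as the forward strand is an assumption, positions are 0-based, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_all(text, pattern):
    """Return the start index of every (possibly overlapping) exact occurrence."""
    hits = []
    i = text.find(pattern)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits


def perfect_matches(genome, probe):
    """Return sorted (position, strand) pairs for all exact matches of the probe."""
    genome, probe = genome.upper(), probe.upper()
    hits = [(i, "+") for i in find_all(genome, probe)]
    rc = reverse_complement(probe)
    if rc != probe:  # avoid reporting palindromic probes twice
        hits += [(i, "-") for i in find_all(genome, rc)]
    return sorted(hits)
```

For the toy sequence `AAACGTTTGGACGA` and the probe `ACG`, this yields two forward-strand matches at positions 2 and 10 and one reverse-strand match at position 3. A real genome would call for an indexed search rather than repeated string scans.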
SNPs derived from dbSNP (NCBI)
SMARs calculated by SMARTest
Repeats The following genomic repeats are calculated by ModelInspector: ALUs, L1 elements, THEs, and B1 elements.
Modules calculated for promoter regions by ModelInspector (Promoter Module Library)

Descriptive information about genetic loci was derived from NCBI's Entrez Gene.
Bibliosphere data is based on the analysis of abstracts from NCBI's PubMed.