Gene2Promoter provides access to promoter sequences of all genes annotated in the available genomes. Promoter regions are thoroughly annotated and validated according to the highest scientific standards, including Genomatix proprietary technology (e.g. PromoterInspector, oligo-capping, comparative genomics).
Generally, there are two ways of accessing Genomatix promoters via Gene2Promoter:
Gene2Promoter large scale output includes a file with the promoter sequences in FASTA or GenBank format, an Excel-readable file containing gene <-> promoter correlation and optionally Excel-readable files containing all transcription factor binding sites and promoter modules located in the promoters. After computation, the result files can be downloaded from our server.
"Promoter" sequences available from other sources are usually based on the 5' upstream regions of annotated genes. This is often misleading, since eukaryotic genes usually have 5' untranslated regions (5' UTRs). Since the 5' UTR may also be split over several exons, the real regulatory region for a gene is frequently far away from the coding sequence (up to several kb). Gene2Promoter contains the precise annotation of the promoter sequences.
More than 50% of all genes have alternative transcripts. In addition to alternative splicing, these genes are frequently regulated by different promoters. Only Gene2Promoter includes such alternative promoters.
In the input form you can choose from a list of organisms. You can retrieve all promoters or orthologous promoters for complete organisms or for a customized list of Gene Ids, Locus Ids or Promoter Ids. Additionally, you can apply a TF binding site filter, i.e. you can select up to four TF binding site matrix families all of which must have at least one match in the promoter sequences.
| Parameters | |
|---|---|
| Organism selection | You may choose an organism from the list. The list of available organisms depends on the ElDorado database version which you selected on your Personal Page. If you select an organism, all promoter regions for this organism will be extracted (optionally filtered by occurrences of matrix family matches, as described below). |
| List or file upload | You can use the file upload field and/or the text field to upload a customized list containing Gene Ids, Locus Ids, Promoter Ids or cDNA Accession numbers from GenBank, RefSeq or Ensembl.
The input file/list may contain ids of different types. To distinguish between these, the following rules are applied:
If you are interested in promoters of a single organism only, you can use the popup menu to specify an organism for filtering your list of input ids. These parameters are hidden by default. You can use the
Of course, the search for orthologous promoters and the TF binding site filter are also applicable to input Ids.
General note: Applying different filters on input lists can lead to an empty result. Consider uploading a list of vertebrate and plant genes and filtering
|
| Search for orthologous promoters | This option is activated as soon as at
least one vertebrate organism from the list is selected, i.e. at least one of the checkboxes
is checked. There are two ways to treat orthologous promoters:
Add orthologous promoters from each of the selected organisms: With this option, the promoters resulting from the input organism or the uploaded list/file are examined. If there are orthologously related promoters in any of the selected organisms, these are added to the result. If the TF binding site filter is activated, it is also applied to the orthologous promoters.
Restrict output to promoters which are orthologously conserved between ALL selected organisms: In this mode, the orthologous promoters are used as a filter. First, any orthologous promoters from the selected organisms are added to the promoter list, but then any homology group which does not contain at least one promoter from each of the selected organisms is deleted. This means that any promoter in the result
In the result promoter file, the affiliation of a promoter to a homology group is denoted only if the search for orthologous promoters is activated. So if you are interested in a single organism, but want to have the homology groups/promoter sets to which the promoters belong, you must select the same organism from the checkboxes for the "search for orthologous" option (see also the examples section). Note: The list of organisms to search for orthologs is automatically adjusted whenever the input organism changes. |
| TF binding site filter | You can use the drop-down menus to specify up to four matrix families from the Genomatix matrix library. These matrix families are used for filtering the promoter sequences. All promoters are scanned for matches of these families. If a sequence does not contain at least one match for ALL of the filter matrix families, it is rejected. In other words, ALL resulting promoter regions contain at least one match for EACH of the filter matrix families. The output file containing gene information will list the number of filter matrix family matches for each promoter. Keep in mind that there are different matrix families for the organism groups. When you select a matrix family to filter the promoters, the matrix library the family belongs to MUST match the group to which the selected organism(s) belong, e.g. if you selected an insect organism, all filter matrix families must be from the insect matrix library. An exception to this rule is C. elegans. For this organism you may select matrix families from either the vertebrate or the nematode matrix library. The library can be recognized by the name of the matrix family:
Note: The list of matrix families is automatically adjusted whenever the input organism changes. Important notice: Applying a matrix family filter will considerably slow down the computation of the statistical data. |
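The filter options in the table above can be illustrated with a small sketch. It is not the Genomatix implementation: the promoter records, the `homgroup` numbers, the family names `FAM_A`/`FAM_B` and all function names are hypothetical placeholders, and the order in which the ortholog restriction and the TF binding site filter are combined is an assumption. The sketch also assumes that promoters without a homology group are dropped in the "restrict" mode.

```python
from collections import Counter, defaultdict

def add_orthologs(promoters, catalog, organisms):
    """Mode 1: add promoters from the selected organisms that share a homology group."""
    groups = {p["homgroup"] for p in promoters if p["homgroup"] is not None}
    seen = {p["id"] for p in promoters}
    extra = [p for p in catalog
             if p["id"] not in seen
             and p["homgroup"] in groups
             and p["organism"] in organisms]
    return list(promoters) + extra

def restrict_to_conserved(promoters, organisms):
    """Mode 2: keep only homology groups that cover EVERY selected organism."""
    by_group = defaultdict(list)
    for p in promoters:
        if p["homgroup"] is not None:  # assumption: ungrouped promoters are dropped
            by_group[p["homgroup"]].append(p)
    required = set(organisms)
    return [p for members in by_group.values()
            if required <= {m["organism"] for m in members}
            for p in members]

def tf_filter(promoters, families):
    """Keep promoters with at least one match for EACH filter family (at most four)."""
    if len(families) > 4:
        raise ValueError("at most four matrix families can be selected")
    kept = []
    for p in promoters:
        counts = Counter(p["matches"])
        if all(counts[f] >= 1 for f in families):
            # report the number of matches per filter family, as in the gene file
            kept.append((p["id"], {f: counts[f] for f in families}))
    return kept

# Hypothetical catalog: ids, organisms, homology groups and matched families
catalog = [
    {"id": "GXP_1", "organism": "human", "homgroup": 10, "matches": ["FAM_A", "FAM_B", "FAM_A"]},
    {"id": "GXP_2", "organism": "mouse", "homgroup": 10, "matches": ["FAM_A"]},
    {"id": "GXP_3", "organism": "rat", "homgroup": 10, "matches": ["FAM_B", "FAM_A"]},
    {"id": "GXP_4", "organism": "human", "homgroup": 11, "matches": ["FAM_A", "FAM_B"]},
    {"id": "GXP_5", "organism": "mouse", "homgroup": 11, "matches": []},
    {"id": "GXP_6", "organism": "human", "homgroup": None, "matches": ["FAM_B"]},
]
selected = ["human", "mouse", "rat"]
human = [p for p in catalog if p["organism"] == "human"]

added = add_orthologs(human, catalog, selected)
conserved = restrict_to_conserved(added, selected)
filtered = tf_filter(conserved, ["FAM_A", "FAM_B"])
print([p["id"] for p in added])
print([p["id"] for p in conserved])
print(filtered)
```

In this toy data, homology group 11 has no rat promoter, so the "restrict" mode removes it entirely, while the "add" mode keeps every input promoter and only appends orthologs.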
After selecting organism(s) and/or uploading a list of Ids and specifying TF matrix families on the input form, a statistical overview for your result is computed and displayed. Also, you can specify further options for your result files.
First on this page is a listing of the organism(s) and the TF binding site filter families which you selected in the input form.
If you used the upload option, the parameter listing will include a distribution of your input Ids over the organisms. If any Ids could not be assigned to an organism (this can depend on the ElDorado database version in use), you will be notified about this.
The table shows the number of Genomatix loci and promoters which satisfy your search conditions.
With the upload option, the table shows in which organisms your input Ids were found. If you selected organisms for filtering, all selected organisms will appear in the table, including those for which no promoters were found.
Before you start the extraction of the promoter regions, you should check if you want to apply some of the download options.
| Download options | |
|---|---|
| Sequence format | By clicking the radio button you can choose the sequence format for the extracted promoter regions. Available formats are |
| Additional output | Optionally you can include an analysis of the resulting promoter sequences. Selecting the "Transcription factor binding sites" checkbox will create an additional output file containing Genomatix MatInspector matrix matches. The matrices used for the search are selected automatically on the basis of the organism(s) you selected or, if you used the list upload feature, the organism(s) to which your input Ids belong. In particular, vertebrate TF site matrices are used for human, chimp, rhesus macaque, mouse, opossum, rat, dog, horse, cow, platypus, chicken and zebrafish promoters; plant TF site matrices for Arabidopsis and rice promoters; insect TF site matrices for Drosophila, Anopheles and honeybee promoters; and vertebrate and nematode TF site matrices for C. elegans promoters. In an analogous manner, if you select "Promoter modules", the promoter regions will be analyzed with Genomatix ModelInspector. The promoter modules involved are also chosen depending on the organism. Important note: Promoter modules are not available for all organisms. Currently, there is no promoter module library for insects, i.e. for:
For C. elegans, the modules from the Vertebrate Promoter Module library are used. |
| Email address | Extracting the promoter regions is a long-running job. You will be notified via email when your result files are available (this should take at most one day). You must therefore provide an email address to which the notification will be sent. The email will contain a link to the HTML result page. |
Gene2Promoter large scale creates several output files (clicking an item below will show a sample output):
Result HTML page containing links to download the output files.
After computation/extraction of the promoter data you will receive an email.
This mail contains a link to the HTML result page. On this page, you will
find
ASCII file containing the promoter sequences in the chosen format.
Annotation of the sequences is represented using Genomatix syntax. Orthologously related promoter regions can be identified by the common "homgroup" value in the annotation.
Excel-readable file with promoter/gene information
This file contains the following information (left to right):
| Column | Description |
|---|---|
| Gene Symbol | Gene symbol. If the gene symbol is not available, the string "n/a" is shown. |
| Gene Id | (NCBI) Gene Id. If the gene Id is not available, the string "n/a" is shown. |
| Locus Id | Genomatix Locus Id, a string of the form "GXL_" followed by a number. |
| Accession number | A comma-separated list of cDNA accession numbers associated with the promoter. |
| Organism | The organism the gene belongs to. |
| Promoter Id | Genomatix Promoter Id, a string of the form "GXP_" followed by a number. |
| TF filter | If you selected the TF binding site filter option, there will be a column for each filter matrix family, containing the number of matches of this family within the promoter. |
A Genomatix promoter might correspond to several gene symbols (and vice versa). This means that the same Promoter Id can occur several times in the "Promoter Id" column.
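Because one Promoter Id can appear in several rows, it is often convenient to collapse the file by promoter. The following is a minimal sketch, assuming the file is tab-separated text with a header row that uses the column names listed above; the actual delimiter and header are not stated here, and the sample rows, accession strings and the function name `genes_by_promoter` are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the column order described above (tab-separated by assumption)
sample = (
    "Gene Symbol\tGene Id\tLocus Id\tAccession number\tOrganism\tPromoter Id\n"
    "GENE_A\t1001\tGXL_1\tACC_1\tHomo sapiens\tGXP_1\n"
    "GENE_B\tn/a\tGXL_1\tACC_2,ACC_3\tHomo sapiens\tGXP_1\n"
    "n/a\tn/a\tGXL_2\tACC_4\tHomo sapiens\tGXP_2\n"
)

def genes_by_promoter(handle):
    """Map each Promoter Id to the sorted list of gene symbols it occurs with."""
    symbols = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        entry = symbols[row["Promoter Id"]]  # creates the key even without a symbol
        if row["Gene Symbol"] != "n/a":      # "n/a" marks a missing gene symbol
            entry.add(row["Gene Symbol"])
    return {promoter: sorted(names) for promoter, names in symbols.items()}

result = genes_by_promoter(io.StringIO(sample))
print(result)
```

For a real download, `io.StringIO(sample)` would be replaced by an open file handle, after checking the delimiter and header of the file you actually received.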
Excel-readable file with transcription factor binding sites and their position (with TF binding site option)
This file contains the following information (left to right):
| Column | Description |
|---|---|
| Promoter Id | Genomatix Promoter Id, a string of the form "GXP_" followed by a number |
| Family | Name of the Matrix Family |
| Matrix | Name of the Matrix |
| Start | Start position of the matrix match (relative to promoter start position) |
| End | End position of the matrix match (relative to promoter start position) |
| Strand | Strand of the matrix match (relative to the promoter strand) |
| Core sim. | Core similarity of the match |
| Matrix sim. | Matrix similarity of the match |
| Sequence | Matching sequence within the promoter (this is a substring of the promoter sequence in the promoter output file) |
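As a rough illustration of how this table might be processed, the sketch below keeps only matches above a matrix similarity threshold and lists them per promoter in positional order. It assumes tab-separated text with a header row using the column names above and similarity values written as decimal numbers; neither is confirmed here. The sample rows, the names `FAM_A`, `MAT_A1` and so on, and the function `strong_matches` are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows in the column order described above
sample = (
    "Promoter Id\tFamily\tMatrix\tStart\tEnd\tStrand\tCore sim.\tMatrix sim.\tSequence\n"
    "GXP_1\tFAM_A\tMAT_A1\t120\t130\t+\t1.0\t0.95\tacgtacgtacg\n"
    "GXP_1\tFAM_B\tMAT_B1\t40\t52\t-\t0.8\t0.82\tacgtacgtacgta\n"
    "GXP_2\tFAM_A\tMAT_A2\t10\t20\t+\t1.0\t0.91\tacgtacgtacg\n"
)

def strong_matches(handle, min_matrix_sim):
    """Group matches by promoter, keeping those with Matrix sim. >= min_matrix_sim."""
    by_promoter = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        if float(row["Matrix sim."]) >= min_matrix_sim:
            site = (int(row["Start"]), int(row["End"]), row["Family"], row["Strand"])
            by_promoter[row["Promoter Id"]].append(site)
    # sort each promoter's sites by start position (relative to promoter start)
    return {promoter: sorted(sites) for promoter, sites in by_promoter.items()}

matches = strong_matches(io.StringIO(sample), 0.9)
print(matches)
```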
Excel-readable file with promoter modules and their position (with promoter module option)
This file contains the following information (left to right):
| Column | Description |
|---|---|
| Promoter Id | Genomatix Promoter Id, a string of the form "GXP_" followed by a number |
| Model | Name of the promoter module |
| Start | Start position of the model match (relative to promoter start position) |
| End | End position of the model match (relative to promoter start position) |
| Strand | Strand of the model match (relative to the promoter strand). The value '+/-' indicates that the model was found on both strands simultaneously. |
The following examples demonstrate the effect of the different input/filter options of Gene2Promoter large scale.
© 1998-2010 Genomatix Software GmbH - All rights reserved