ModelInspector uses a library of predefined models or models defined with FastM or FrameWorker to scan DNA sequences for matches to these models. A model consists of various individual elements (like transcription factor binding sites, repeats, hairpins), their strand orientation, their sequential order, and their distance ranges.
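As an illustration only, the ingredients of a model listed above (elements, strand orientation, sequential order, distance ranges) could be represented as follows. This is a minimal sketch and not the Genomatix file format; the class names, the field names, and the distance range of 0 to 10 bp are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ModelElement:
    name: str    # e.g. a matrix family such as "V$YY1F"
    strand: str  # "+" or "-"
    # Allowed distance in bp to the next element as (min, max);
    # None for the last element of the model.
    distance_to_next: Optional[Tuple[int, int]] = None


@dataclass
class Model:
    name: str
    elements: List[ModelElement]  # list order = sequential order in the model

    def distances_ok(self, observed: List[int]) -> bool:
        """Check observed distances between consecutive elements against the ranges."""
        ranges = [e.distance_to_next for e in self.elements[:-1]]
        if len(observed) != len(ranges):
            return False
        return all(lo <= d <= hi for d, (lo, hi) in zip(observed, ranges))


# Hypothetical two-element model; the distance range is invented for illustration.
model = Model(
    name="YY1F_SRFF_example",
    elements=[
        ModelElement("V$YY1F", "+", (0, 10)),
        ModelElement("V$SRFF", "+"),
    ],
)

print(model.distances_ok([1]))   # True
print(model.distances_ok([25]))  # False
```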
ModelInspector uses a proprietary scoring algorithm to allow inclusion of very different element types into the composite scoring of matches. Thus, IUPAC sequence elements can be successfully combined with different types of weight matrices and structural elements (e.g. hairpins) in the assessment of match quality.
The ModelInspector and FastM algorithms are described in Frech et al., 1997 (JMB), and Klingenhoff et al., 1999 (Bioinformatics).
The following predefined libraries are available:
All models of the Genomic Repeat and Long Terminal Repeat Library show a very high specificity.
| Sequence Input | |
|---|---|
| Choose from your previously uploaded sequences | Select a sequence file from the list of your personal sequence files. |
| or enter the formatted DNA sequence(s) | Enter your correctly formatted sequence(s) directly into the form, e.g. by copy and paste. The following formats are accepted: The sequence should contain only IUPAC characters; any other characters will be skipped! |
| or upload a file containing sequence(s) (max. 100 MB) | If your browser supports this option, a sequence file can be uploaded. If you use this option, the file should contain the sequence(s) in one of the following formats: Please note that the size of uploaded files is limited to 100 MB. If you want to analyze larger sequences, please contact support@genomatix.de. For whole chromosomes you can use the accession number option below (e.g. 'NC_000001' for human chromosome 1). |
| or enter accession number(s) | If you are interested in one or several specific sequences from a database section, you can supply a list of valid accession numbers in the form. If you want to select more than one accession number, please separate the accession numbers by commas or spaces. On the Genomatix server accession numbers from the following databases can be entered: |
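The two input rules described in the table above (non-IUPAC characters are skipped, and accession numbers are separated by commas or spaces) can be sketched in a few lines. This is not the server's own code; the function names are hypothetical, and both the exact character set (the DNA IUPAC codes without `U`) and the conversion to upper case are assumptions.

```python
import re
from typing import List

# IUPAC nucleotide codes for DNA: the four bases plus the ambiguity codes.
IUPAC_DNA = set("ACGTRYSWKMBDHVN")


def clean_sequence(raw: str) -> str:
    """Keep only IUPAC DNA characters; everything else is skipped."""
    return "".join(ch for ch in raw.upper() if ch in IUPAC_DNA)


def split_accessions(text: str) -> List[str]:
    """Split a user-supplied list on commas and/or whitespace."""
    return [tok for tok in re.split(r"[,\s]+", text.strip()) if tok]


print(clean_sequence("acgt 123\nNNRY-x"))              # ACGTNNRY
print(split_accessions("NC_000001, M19283 L21996"))    # ['NC_000001', 'M19283', 'L21996']
```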
| Database input | |
|---|---|
| Select one of these database-sections | On the Genomatix server the following databases are available: If you have selected a section from the GenBank database, you may also restrict the analysis to sequences containing user-defined keywords in their annotation. You can enter keywords which will be searched in The keyword searches can be combined with "AND" or "OR". Please note that the keywords cannot contain blanks (all blanks will be skipped). These parameters are hidden by default. |
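The AND/OR combination of keywords can be pictured with the following sketch. It only illustrates the logic; how the server actually matches keywords against the annotation is not documented here, so the case-insensitive substring match and the function name `matches_keywords` are assumptions.

```python
from typing import List


def matches_keywords(annotation: str, keywords: List[str], mode: str = "AND") -> bool:
    """Return True if the annotation satisfies the keyword combination."""
    # Blanks inside keywords are dropped, mirroring the note above.
    cleaned = [kw.replace(" ", "") for kw in keywords if kw.strip()]
    if not cleaned:
        return True  # no keywords given: nothing to restrict
    text = annotation.lower()
    hits = [kw.lower() in text for kw in cleaned]
    return all(hits) if mode.upper() == "AND" else any(hits)


annotation = "Human cytoskeletal gamma-actin gene, promoter region"
print(matches_keywords(annotation, ["actin", "promoter"], "AND"))  # True
print(matches_keywords(annotation, ["actin", "kinase"], "AND"))    # False
print(matches_keywords(annotation, ["actin", "kinase"], "OR"))     # True
```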
| Model Library Selection | |
|---|---|
| Library version | Here you can select a previous version of the promoter module library. This can be helpful for reproducing old results. By default, the latest promoter module library is selected (please see the Module Library Release Notes). Each version of the module library corresponds to a specific version of the matrix library (the matrix library version with which the module definitions have been generated): The module library version selected also affects searches for user-defined models (i.e. the matrix library version corresponding to the module library version selected is used for the model searches). It is necessary to change the module library version (and thus the matrix library version) when user-defined models contain matrix families that have been removed or renamed in newer library versions. |
| Model groups | Please choose one or several of the available Genomatix model libraries. If you have created your own models with FastM or FrameWorker, they can be found in the "User-defined models" library. You can decide if you want to In the third case, there will be a separate page with a list of all models in the chosen libraries and you can select your model subset by clicking the checkboxes for each model. If you started ModelInspector directly from a previous FastM session, the list will only contain the model you just created with FastM. |
| Search Parameters | |
| Max. number of matches | Enter the maximum number of matches in the output file. If the output is filtered for matches occurring in selected annotated sequence regions, only the filtered matches count toward the maximum number of matches. Hint for user-defined models: |
| Threshold | Enter a threshold for the output of model matches. This value gives the minimum score that a match has to reach to appear in the output file. The value is given in percent of the number of individual elements of the model. Default is 100 % (i.e. all elements of the model have to be present). These parameters are hidden by default. |
| Strand | If this option is checked, only the top strand of the input or database sequences is scanned for model matches. By default, both strands are searched. These parameters are hidden by default. |
| Annotation filter | Generally, all matches are listed in the output. Alternatively, the output can be filtered for matches located in These parameters are hidden by default. |
| Ranking | If one of the Genomatix promoter databases is scanned with a model, the search results are evaluated by calculation of p-values for Gene Ontology groups. This evaluation shows whether genes identified by the model are functionally related. As the Gene Ontology ranking takes some time, it can be switched off. |
| Output Parameters | |
| Offset for match positions | Enter an offset (in base pairs; can also be negative) that will be added to each position in the output file. This feature can be used, e.g., in cases where the transcription start site (TSS) is known and positions should be given relative to the TSS. For example, if the TSS is at position 500 in a sequence, the offset should be "-500" for relative positions. These parameters are hidden by default. |
| Statistics | If the statistics option is checked, only the statistics output is created. In this case the number of matches is unlimited. This option is useful if you are interested in the total number of matches in a large data set (e.g. the Human Genome), as the number of matches shown in the match overview is limited. These parameters are hidden by default. |
| Alternative matches | If the alternative matches option is checked, alternative model matches are additionally displayed in the detailed output. For overlapping model matches (i.e. model matches where the start and end positions are identical or differ by less than 5 base pairs), only the model match with the highest score of individual elements is shown. The alternative model matches can optionally be displayed in the detailed output in order to check the positions and scores of the individual elements. These parameters are hidden by default. |
| Output sorted by | The output can be sorted These parameters are hidden by default. |
| Output filtered for | The output can be filtered for sequences in which at least the specified number of different model matches occur. By default, all sequences with at least one model match are shown in the output. These parameters are hidden by default. |
| Email address | Here you can choose between two methods for receiving the results: The results will be available for a limited time on our server. For details of how long your results will be kept, please see the result email. After that period they will be deleted unless protected in the project management! |
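Two of the parameters above lend themselves to a small worked example: the threshold (a percentage of the number of model elements) and the position offset. The sketch below is a simplification under stated assumptions. The real scoring algorithm is proprietary, so the score is approximated here as the share of model elements that were found, and the function names are hypothetical.

```python
from typing import Tuple


def passes_threshold(elements_found: int, elements_in_model: int,
                     threshold_percent: float = 100.0) -> bool:
    """Simplified check: share of model elements found vs. the threshold in percent."""
    score_percent = 100.0 * elements_found / elements_in_model
    return score_percent >= threshold_percent


def apply_offset(start: int, end: int, offset: int) -> Tuple[int, int]:
    """Add the offset to both match positions, as done for the output file."""
    return start + offset, end + offset


# Default threshold of 100 %: all elements must be present.
print(passes_threshold(3, 4))        # False
print(passes_threshold(3, 4, 75))    # True

# TSS at position 500, so an offset of -500 gives positions relative to the TSS.
print(apply_offset(378, 397, -500))  # (-122, -103)
```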
ModelInspector generates three output files, the match overview, the detailed output, and the statistics file.
The first output file of ModelInspector contains:
| Extraction Options | |
|---|---|
| Sequence Extraction | You can extract the |
| GeneID Extraction | The button "Extract GeneIDs" extracts the GeneIDs of the matching sequences for each model separately. The extracted GeneIDs can be used e.g. as input for GePS. The button "Extract GeneIDs by Chromosome" extracts the GeneIDs of the matching sequences for each chromosome separately. |
| Match Extraction | The button "Export Matches in EXCEL format" exports all information available in the match overview (such as model name, sequence name, and position of the model match) to a tab-delimited file. This file is saved to your local disk and can be opened directly with Microsoft Excel. The button "Export Matches in BED file format" exports genomic model matches to a BED file (e.g. for upload into RegionMiner). This button is only available when a genome search has been performed or when the input sequences include chromosomal positions in their annotation. |
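For readers unfamiliar with BED, the following sketch shows how a genomic match could be turned into a six-column BED line. It illustrates the general BED convention (0-based start, end not included) and is not a description of the file ModelInspector writes. The function name, the chromosome `chr1`, the coordinates, and the placeholder score of 0 are assumptions, as is the premise that match positions are 1-based and inclusive.

```python
def match_to_bed_line(chrom: str, start: int, end: int, name: str, strand: str) -> str:
    """Format one match as a tab-delimited BED6 line."""
    # Matches on the (-) strand may be listed with start > end, so normalize first.
    lo, hi = min(start, end), max(start, end)
    # BED uses a 0-based start and an exclusive end; the score column is a placeholder.
    fields = [chrom, str(lo - 1), str(hi), name, "0", strand]
    return "\t".join(fields)


# Hypothetical genomic coordinates for illustration.
print(match_to_bed_line("chr1", 396, 378, "YY1F_SRFF_02", "-"))
```

The same list of fields, joined with tabs and without the coordinate shift, would give a row of a tab-delimited export that a spreadsheet program can open.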
| Compare results | |
|---|---|
| Enter GeneIDs | Enter a list of GeneIDs separated by spaces, returns, or commas. |
When you press the "Compare" button, you will see which GeneIDs are common to both lists and which are specific to either your input list or the ModelInspector result.
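The comparison amounts to simple set operations, which can be reproduced offline as sketched below. The function names and the GeneIDs used in the example are hypothetical.

```python
import re
from typing import Dict, List, Set


def parse_gene_ids(text: str) -> Set[str]:
    """Split a list of GeneIDs on spaces, returns, or commas."""
    return {tok for tok in re.split(r"[,\s]+", text) if tok}


def compare_gene_ids(user_ids: Set[str], result_ids: Set[str]) -> Dict[str, List[str]]:
    """Report common GeneIDs and those specific to either list."""
    return {
        "common": sorted(user_ids & result_ids),
        "only_input": sorted(user_ids - result_ids),
        "only_result": sorted(result_ids - user_ids),
    }


user = parse_gene_ids("595, 1017\n7157")
result = {"1017", "898"}
print(compare_gene_ids(user, result))
```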
| Further Evaluation of Matches | |
|---|---|
| Search PubMed for | PubMed can be searched for entries containing the gene name, which is automatically extracted from the description line of the matching sequences, together with further keywords that can be entered by the user. The default for these keywords is ("promoter" OR "transcription factor"). |
When you press the "Extract gene names" button you will receive a list with the extracted gene names and links to the corresponding PubMed queries.
Note: The gene name extraction works only for genomic eukaryotic DNA sequences.
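A query of this kind can also be assembled by hand. The sketch below builds a search term from a gene name and the default keywords and turns it into a PubMed search link. That the gene name and the keywords are joined with AND is an assumption, as are the function names; the links generated by ModelInspector itself may look different.

```python
from urllib.parse import quote_plus

DEFAULT_KEYWORDS = '("promoter" OR "transcription factor")'


def pubmed_term(gene_name: str, keywords: str = DEFAULT_KEYWORDS) -> str:
    """Combine gene name and keywords (assumed to be joined with AND)."""
    return f'"{gene_name}" AND {keywords}'


def pubmed_url(gene_name: str, keywords: str = DEFAULT_KEYWORDS) -> str:
    """URL-encode the search term for the public PubMed search page."""
    return "https://pubmed.ncbi.nlm.nih.gov/?term=" + quote_plus(pubmed_term(gene_name, keywords))


print(pubmed_term("SRF"))
print(pubmed_url("SRF"))
```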
| Sequence | Model Name | Position | Strand | Select Match |
|---|---|---|---|---|
| ep029015 [E11078] (1 - 600) | YY1F_SRFF_02 | 257 - 275 | (+) | |
| humactga [M19283] (1 - 575) | YY1F_SRFF_01 | 378 - 397 | (+) | |
| | YY1F_SRFF_02 | 396 - 378 | (-) | |
| musga [L21996] (1 - 601) | YY1F_SRFF_01 | 403 - 422 | (+) | |
| | YY1F_SRFF_02 | 421 - 403 | (-) | |
For this example, the experimentally verified module "CDEF_CHRF_01" from the Genomatix Promoter Module Library which is involved in cell cycle regulation was searched in all human promoters.

The second output file of ModelInspector contains detailed information for each individual element of the model:
Inspecting sequence humactga [M19283] (1 - 575):
Model: YY1F_SRFF_01 (378 - 397 (+))
| Matrix element / Model element | Position | Str | Sequence | Core sim. | Mat. sim. / Model sim. | Distance to next element |
|---|---|---|---|---|---|---|
| V$YY1F/YY1.01 | 378 - 396 | (+) | GATCGCCATATATGGACAT | 1.000 | 0.757 | 1 bp |
| V$SRFF/SRF.03 | 379 - 397 | (+) | ATCGCCATATATGGACATG | 1.000 | 0.996 | --- |
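The detailed output normally shows only the best of several overlapping matches; the rest are the "alternative matches" that can be displayed optionally. The selection rule (positions identical or differing by less than 5 bp, highest score wins) can be sketched as below. This is an illustration, not the original implementation: the greedy strategy, the restriction to the same model and strand, and all names and scores are assumptions.

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    model: str
    start: int
    end: int
    strand: str
    score: float


def is_alternative(a: Match, b: Match, tolerance: int = 5) -> bool:
    """True if start and end positions each differ by less than `tolerance` bp."""
    return (a.model == b.model and a.strand == b.strand
            and abs(a.start - b.start) < tolerance
            and abs(a.end - b.end) < tolerance)


def collapse(matches: List[Match]) -> List[Match]:
    """Keep the best-scoring match of each overlapping group."""
    kept: List[Match] = []
    for m in sorted(matches, key=lambda m: -m.score):
        if not any(is_alternative(m, k) for k in kept):
            kept.append(m)
    return sorted(kept, key=lambda m: m.start)


# Hypothetical matches with invented scores.
matches = [
    Match("YY1F_SRFF_01", 378, 397, "+", 0.9),
    Match("YY1F_SRFF_01", 380, 399, "+", 0.8),  # alternative to the first match
    Match("YY1F_SRFF_01", 500, 520, "+", 0.7),
]
for m in collapse(matches):
    print(m.start, m.end, m.score)
```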
The third output file of ModelInspector contains statistics on the model matches and detailed information for your own models:
| Model Name | # matches (total) | # matches (+) str. | # matches (-) str. | in # seq. |
|---|---|---|---|---|
| YY1F_SRFF_01 | 6 | 2 | 4 | 4 |
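The columns of the statistics file can be recomputed from any list of matches, which is a convenient check when working with exported results. The sketch below counts total matches, matches per strand, and the number of distinct sequences per model. The function name is hypothetical, and the input is the five-row match overview excerpt shown earlier on this page, so the counts are not expected to reproduce the statistics example above, which comes from a different search.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def summarize(matches: Iterable[Tuple[str, str, str]]) -> Dict[str, Tuple[int, int, int, int]]:
    """Map model name -> (total, (+) strand, (-) strand, number of sequences)."""
    stats = defaultdict(lambda: {"total": 0, "+": 0, "-": 0, "seqs": set()})
    for sequence, model, strand in matches:
        entry = stats[model]
        entry["total"] += 1
        entry[strand] += 1
        entry["seqs"].add(sequence)
    return {
        model: (s["total"], s["+"], s["-"], len(s["seqs"]))
        for model, s in stats.items()
    }


# (sequence, model, strand) taken from the match overview excerpt.
overview = [
    ("ep029015", "YY1F_SRFF_02", "+"),
    ("humactga", "YY1F_SRFF_01", "+"),
    ("humactga", "YY1F_SRFF_02", "-"),
    ("musga", "YY1F_SRFF_01", "+"),
    ("musga", "YY1F_SRFF_02", "-"),
]
for model, counts in sorted(summarize(overview).items()):
    print(model, counts)
```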
If you are interested in more details, ModelInspector and FastM are described in
| © 1998-2011 Genomatix Software GmbH - All rights reserved |