
FastM: Definition of Sequence Models



Introduction

FastM is a method for developing user-defined models of transcriptional regulatory DNA units (e.g. promoters). It can thus be used to model the modular organization of functional sequence regions.

Models generated by FastM can then be used with ModelInspector to scan any DNA sequences or sequence databases for matches to the model.

FastM will build a so-called model from the information of

The principles behind FastM and ModelInspector are published in Klingenhoff et al., 1999 (Bioinformatics).


Model Definition

Definition of a Model
Define a new model
Enter a name for your model, which will be used for further reference to this model, and select the number of sequence elements for this model.
After you press the "Continue" button, a new form will ask for the type of each element. If you selected the wrong number of elements, you can use your browser's back button and correct your model.

Element types
Matrix description
Select a matrix or a matrix family from the respective scrollable lists given in the form.
The matrices are taken from the MatInspector matrix library. They come from all sections of the library; for clarity, each matrix name starts with one of the following prefixes:
  • F$: Fungi
  • I$: Insects
  • V$: Vertebrates
  • P$: Plants
  • B$: Bacteria
  • N$: Nematodes
  • W$: Viruses
  • O$: Other functional elements
  • U$: User-defined
Further parameters for the matrix are:
  • the strand orientation
    (refers to the matrix matches as they will be found by MatInspector on sense strand or antisense strand of the selected sequence)
  • the core similarity
  • and the matrix similarity
IUPAC string
There are two ways to define an IUPAC element:
  • User-defined IUPAC string:
    • Only the IUPAC symbols ABCDGHKMNRSTUVWY can be used, with e.g. S representing C or G. Other letters in your string will be ignored.
  • individual IUPAC strings from the MatInspector library
Further parameters are:
  • the maximum number of mismatches that are allowed
    (absolute value for individual IUPACs, relative value (in % of length) for IUPAC families).
    The mismatches may occur at any position of the string.
  • the strand orientation
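The IUPAC matching described above can be illustrated with a minimal Python sketch. It is not the FastM or MatInspector implementation: the function names `clean_pattern` and `find_iupac` are hypothetical, only the absolute mismatch count for individual IUPAC strings is shown, and only the sense strand is scanned.

```python
# Standard IUPAC nucleotide codes mapped to the bases they represent (U is treated as T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def clean_pattern(raw):
    """Drop every letter that is not an IUPAC symbol (hypothetical helper)."""
    return "".join(ch for ch in raw.upper() if ch in IUPAC)

def find_iupac(pattern, sequence, max_mismatches=0):
    """Return 0-based start positions where the pattern matches the sequence
    with at most max_mismatches mismatches at any position of the string."""
    pattern = clean_pattern(pattern)
    sequence = sequence.upper().replace("U", "T")
    hits = []
    if not pattern:
        return hits
    for start in range(len(sequence) - len(pattern) + 1):
        window = sequence[start:start + len(pattern)]
        # A position is a mismatch if the base is not covered by the IUPAC symbol.
        mismatches = sum(base not in IUPAC[sym] for sym, base in zip(pattern, window))
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits
```

For example, `find_iupac("TATAWA", "GGTATAAAGG")` reports a single hit at position 2, because W stands for A or T.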
Transcription start site
The transcription start site (TSS) of genes can be included as an element in promoter models. Including the TSS makes sense when transcription factor binding sites are found at a conserved distance upstream or downstream of the TSS (like the TATA box). Transcription factor binding site models generated by FrameWorker can be extended by including the TSS as an additional element.

Important note:
Transcription start sites can only be identified in Genomatix promoter sequences, i.e. promoter sequences that have been extracted by Gene2Promoter or taken from the Genomatix promoter databases that are available in GEMS Launcher. For all Genomatix promoter sequences (except the promoters derived from Comparative Genomics) the transcription start site(s) are annotated in the sequence. Searching other sequences with a model that includes the TSS as an element will fail because the position of the TSS cannot be found.

User-defined Model
A model that was previously built (by FastM or FrameWorker) can be incorporated into a new model, so that hierarchical models can be built.

Models can be selected from

  • the Genomatix model library (e.g. a vertebrate promoter module)
  • your user-defined models, e.g. if a FastM model was saved before, it can be used as an element for a new model

Parameters:

  • the strand orientation
    (refers to the model matches as they will be found by ModelInspector on sense strand or antisense strand of the selected sequence)
  • the model threshold (i.e. % of number of elements of the model)
    Default is 100 % (i.e. all individual elements of the model have to be present).
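As a small worked example of the model threshold, the following Python sketch converts a percentage into the number of elements that have to be present. The function name `required_elements` is hypothetical, and rounding up to the next whole element is an assumption; the page does not state how fractional values are handled.

```python
import math

def required_elements(n_elements, threshold_percent=100.0):
    """Number of model elements that must be found for a match.

    Rounding up is an assumption, not documented FastM behaviour.
    """
    if not 0 < threshold_percent <= 100:
        raise ValueError("threshold_percent must be in the range (0, 100]")
    return math.ceil(n_elements * threshold_percent / 100)
```

With the default of 100 %, a model of five elements requires all five; at 75 %, a model of four elements requires three.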
Direct repeat
A direct repeat is defined as a stretch of basepairs that is repeated with a high degree of similarity in the same sequence.

Parameters:

  • the minimum length of the stretch of basepairs
  • the minimum percent of matches that the repeat has to show
  • the maximum distance between the two repeats in basepairs
    (has to be smaller than or equal to the length of the sequence that is checked; if the two repeats can occur at any position within the sequence, enter -1)
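To show how the three parameters interact, here is a minimal Python sketch of a direct repeat search. It is a brute-force illustration, not the FastM algorithm: the function name `find_direct_repeats` is hypothetical, only windows of exactly the minimum length are compared, and the distance is assumed to be the gap between the end of the first copy and the start of the second.

```python
def find_direct_repeats(sequence, min_length, min_percent, max_distance=-1):
    """Return (first_start, second_start, percent_identity) for every pair of
    non-overlapping windows of size min_length that are similar enough.

    max_distance=-1 means the two copies may lie anywhere in the sequence.
    """
    if min_length <= 0:
        raise ValueError("min_length must be positive")
    seq = sequence.upper()
    hits = []
    last_start = len(seq) - min_length
    for i in range(last_start + 1):
        for j in range(i + min_length, last_start + 1):
            # Assumed definition: distance = bases between the two copies.
            gap = j - (i + min_length)
            if max_distance != -1 and gap > max_distance:
                break
            matches = sum(a == b for a, b in zip(seq[i:i + min_length],
                                                 seq[j:j + min_length]))
            percent = 100.0 * matches / min_length
            if percent >= min_percent:
                hits.append((i, j, percent))
    return hits
```

In the sequence `ACGTTTTACGT`, the stretch `ACGT` occurs at positions 0 and 7 with three bases in between, so it is reported with a maximum distance of 3 or -1 but not with a maximum distance of 2.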
Short multiple repeat
A short multiple repeat is defined as a stretch of basepairs that is repeated several times with a high similarity within the sequence.

Parameters:

  • the repeat string that is repeated several times
    (the shorter the string, the more often it should be repeated; this way e.g. Poly-A-stretches can be defined)
  • the minimum number of repetitions of the given string
  • the percent of matches that the repeats have to show compared to the defined repeat string
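The following Python sketch illustrates these parameters under simplifying assumptions: the repeat string is concatenated the minimum number of times and compared position by position with each window of the sequence. The function name `find_multiple_repeats` is hypothetical and the approach is only an illustration, not the FastM implementation.

```python
def find_multiple_repeats(sequence, unit, min_repetitions, min_percent):
    """Return 0-based start positions where `unit`, repeated min_repetitions
    times, matches the sequence with at least min_percent identical positions."""
    seq = sequence.upper()
    target = unit.upper() * min_repetitions
    hits = []
    if not target:
        return hits
    for start in range(len(seq) - len(target) + 1):
        window = seq[start:start + len(target)]
        matches = sum(a == b for a, b in zip(target, window))
        if 100.0 * matches / len(target) >= min_percent:
            hits.append(start)
    return hits
```

A poly-A stretch of at least six bases can be searched with `find_multiple_repeats(seq, "A", 6, 100)`; lowering the percentage to 80 also accepts windows with one deviating base.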
Hairpin (inverted repeat)
A hairpin is defined as an inverted repeat that forms the stem of the hairpin and a stretch of unpaired basepairs that forms the loop.

Parameters:

  • the stem size, i.e. the length of the repeat string
  • the loop size
  • the minimum free energy to be reached with the minimum stem length (in kcal/mol)
  • the minimum free energy for saving the inverted repeat (in kcal/mol)
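The stem and loop parameters can be illustrated with a minimal Python sketch that looks for perfect inverted repeats. It deliberately ignores the two free energy parameters, since the energy model is not described here, and it uses a fixed stem and loop size. The function names are hypothetical and this is not the FastM algorithm.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA string containing only A, C, G and T."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def find_hairpins(sequence, stem_size, loop_size):
    """Return (left_arm_start, right_arm_start) for every position where a
    stem of stem_size bases is followed, after loop_size unpaired bases,
    by its exact reverse complement."""
    if stem_size <= 0 or loop_size < 0:
        raise ValueError("stem_size must be positive and loop_size non-negative")
    seq = sequence.upper()
    hits = []
    total = 2 * stem_size + loop_size
    for start in range(len(seq) - total + 1):
        left = seq[start:start + stem_size]
        right_start = start + stem_size + loop_size
        right = seq[right_start:right_start + stem_size]
        if right == reverse_complement(left):
            hits.append((start, right_start))
    return hits
```

For instance, in `GATCCTTTTGGATC` the stem `GATCC` pairs with `GGATC` across the loop `TTTT`.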
Terminal repeat (only for first element)
A terminal repeat (direct or inverted) is defined as a stretch of basepairs that is repeated at the beginning and at the end of the defined model.

Parameters:

  • whether the string is repeated on the same strand or on the antisense strand
  • the beginning of the repeat string (if it is known)
  • the minimum length of terminal repeat string
  • the minimum percent of matches for the repeat string at the end of the model
  • the distance between the last element of the model and the terminal (inverted) repeat at the end of the sequence
Distance range definition
Select a distance range between two consecutive elements
Please enter the minimum and maximum distance between two elements in nucleotides.

Distances are measured between the middle positions (so-called anchor positions) of the two consecutive elements.
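The distance check can be sketched in a few lines of Python. The function names are hypothetical, elements are assumed to be given as (start, length) pairs, and for elements of even length the position left of the centre is used as the anchor; the actual convention is not stated here.

```python
def anchor_position(start, length):
    """Middle position of an element; for even lengths the position just
    left of the centre is chosen (an assumption)."""
    return start + (length - 1) // 2

def distance_ok(first, second, min_distance, max_distance):
    """Check whether the anchor-to-anchor distance of two consecutive
    elements, each given as (start, length), lies in the allowed range."""
    distance = anchor_position(*second) - anchor_position(*first)
    return min_distance <= distance <= max_distance
```

An element of length 7 starting at position 10 has its anchor at 13, and an element of length 5 starting at 30 has its anchor at 32, so their distance is 19 nucleotides and a range of 10 to 20 is satisfied.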


Model Overview


References

If you are interested in more details, FastM and ModelInspector are described in