
FrameWorker: Definition of common framework



Introduction

FrameWorker is a complex software tool that allows users to extract a common framework of elements from a set of DNA sequences. These elements are usually transcription factor binding sites since this tool is designed for the comparative analysis of promoter sequences (e.g. inter-species analysis).

Figure: From single elements to common framework

FrameWorker returns the most complex models that are common to the input sequences and satisfy the user parameters. A model (framework) is defined as a set of elements (TF sites) that occur in the same order and within a certain distance range in all (or a subset of) the input sequences. The resulting models can be saved in the user directory and can subsequently be used to scan any set of sequences.

Note that FrameWorker uses strand-specific TF sites for building models: i.e. if the number of sequences with a TF site on one strand (sense OR antisense) is above quorum, then the TF is considered a "common element" and can be used to build potential frameworks.
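
The strand-specific quorum test described above can be sketched as follows. This is a minimal illustration, not FrameWorker's implementation: the function name, the tuple layout of the input and the reading of "above quorum" as "at least quorum" are all assumptions.

```python
from collections import defaultdict

def common_elements(sites, quorum):
    """Return (family, strand) pairs found in at least `quorum` sequences.

    `sites` is an iterable of (sequence_id, family, strand) tuples; this
    layout is hypothetical and not a FrameWorker file format.
    """
    seqs_per_element = defaultdict(set)
    for seq_id, family, strand in sites:
        # Each strand is counted separately; a sequence counts once per element.
        seqs_per_element[(family, strand)].add(seq_id)
    return sorted(
        element
        for element, seq_ids in seqs_per_element.items()
        if len(seq_ids) >= quorum  # assumption: "above quorum" means >= quorum
    )

sites = [
    ("human", "V$AP1F", "+"),
    ("mouse", "V$AP1F", "+"),
    ("rat", "V$AP1F", "-"),
    ("human", "V$CAAT", "-"),
    ("mouse", "V$CAAT", "-"),
    ("rat", "V$CAAT", "-"),
]
print(common_elements(sites, quorum=3))
```

In this toy input, V$AP1F occurs in three sequences but never on the same strand in all three, so only V$CAAT on the antisense strand reaches a quorum of 3.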

If no frameworks are found in a set of sequences, the quorum constraint should be lowered or the distance constraints relaxed (see parameter section below).
If too many frameworks are found, the quorum constraint can be raised, the distance constraints tightened, or mandatory elements selected (e.g. if known from biological data).


New in FrameWorker 5.4 (May 2007):

Please see below for a history of changes.


Warning:
If the majority of the input sequences are highly homologous, the resulting models will not be significant! E.g. two promoter sequences from mouse and human that are 95% homologous will probably yield frameworks that do not necessarily contain the relevant (i.e. functional) sites. In this case, FrameWorker will display a warning in the output.

Please note that the terms framework and model are used synonymously here.


FrameWorker Input Data

Sequence Selection
Sequence data
The program expects a set of DNA sequences, which are the basis for the extraction of common elements. These sequences can be supplied in one of the following formats: You can enter your correctly formatted sequence(s) directly into the form, e.g. with copy and paste, or upload a file containing the sequences.
Library selection
Matrix group
The selection here corresponds to the MatInspector settings.
The sequences are scanned for matches to the selected MatInspector matrices. The matches found are used as basic elements for the extraction of a common framework.

Note: The lower the core and matrix similarities, the more basic elements can be used for the extraction of a framework. The trade-off is that the resulting models may be very unspecific.

Matrix filters
Matrices used for the analysis can be filtered by tissues as described here
FrameWorker Parameters
Quorum constraint
for framework
The minimum number of sequences within the input set that must contain the common framework. The default is the absolute number of sequences that corresponds to at least 85% of the input sequences.
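
As a worked illustration of the default, the sketch below converts the 85% rule into an absolute number of sequences. Rounding up is an inference from the phrase "at least 85%"; the function name and the integer arithmetic are assumptions, not documented FrameWorker behaviour.

```python
def default_quorum(n_sequences, percent=85):
    """Smallest whole number of sequences covering at least `percent` of the input."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return (n_sequences * percent + 99) // 100

for n in (6, 11, 20):
    print(n, "input sequences -> quorum", default_quorum(n))
```

Under this assumption, a set of 11 sequences gets a default quorum of 10, and a set of 6 sequences requires all 6.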
Distance constraints
for framework
Minimum and maximum distance between two elements:

These values denote the minimum and maximum distances between the anchors of two elements of a resulting model. Only basic elements that occur in the given distance range are considered for inclusion in a framework.

New in FrameWorker 5.0 (Nov. 2006):
Maximum distance variance between two elements:

This parameter sets the maximum allowed variance of the distances between the anchors of the model elements. A framework satisfies the parameter if the distances between the instances of the model in the input sequences do not differ by more than the distance variance parameter. For promoters, a distance variance of 20-30 base pairs is usually sufficient.

Example: If the minimum distance is set to 5, maximum distance is set to 200, and the distance variance is set to 20, the following models might be possible:
  • El.1 ← 5-25 → El.2
  • El.1 ← 120-140 → El.2
  • El.1 ← 180-200 → El.2
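
The interplay of the three distance parameters can be sketched as a simple check on the anchor-to-anchor distances observed for one element pair across the input sequences. This is only an illustration: the function name is hypothetical, and treating the variance as "largest distance minus smallest distance" is an assumption about how FrameWorker measures it.

```python
def satisfies_distance_constraints(distances, min_dist, max_dist, max_variance):
    """Check the distances of one element pair, one value per sequence."""
    if not distances:
        return False
    # Every observed distance must lie inside the allowed range.
    in_range = all(min_dist <= d <= max_dist for d in distances)
    # Assumption: the variance is the spread between the extreme distances.
    spread = max(distances) - min(distances)
    return in_range and spread <= max_variance

# Parameters from the example: minimum 5, maximum 200, variance 20.
print(satisfies_distance_constraints([122, 130, 139], 5, 200, 20))  # spread 17
print(satisfies_distance_constraints([10, 120], 5, 200, 20))        # spread 110
print(satisfies_distance_constraints([3, 10], 5, 200, 20))          # 3 is below the minimum
```

The first call corresponds to a model like El.1 ← 120-140 → El.2; the other two fail on the variance and on the minimum distance, respectively.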

New in FrameWorker 5.4 (May 2007):

Restrictive model search: By default, all models satisfying the given distance variance are listed (even if there are more specific models satisfying the remaining parameters). If the restrictive search option is selected, FrameWorker outputs only those models that have a minimum variation in the distance between elements.
Depending on the input sequences, this can result in fewer models, especially if a large distance variance is allowed.

Example: If all 6 input sequences contain one match to the model
El.1 ← 30-35 → El.2 (distance variance: 6 bps, common to 6 sequences; 6 matches, 6 non-overlapping),
the regular search option might instead find the following model, which is less specific (more matches in the sequences):
El.1 ← 26-35 → El.2 (distance variance: 10 bps, common to 6 sequences; 14 matches, 6 non-overlapping)
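
One way to picture the restrictive option is as a selection step that keeps, for each element combination and sequence coverage, only the candidate with the narrowest distance range. The sketch below is an assumption made for illustration; the class, its fields and the selection rule are hypothetical and do not describe FrameWorker's internal algorithm.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CandidateModel:
    elements: tuple      # ordered element names, e.g. ("El.1", "El.2")
    min_dist: int        # smallest distance covered by the model
    max_dist: int        # largest distance covered by the model
    n_sequences: int     # number of input sequences containing the model

    @property
    def width(self):
        # Inclusive width, so that 30-35 counts as 6 bps as in the example.
        return self.max_dist - self.min_dist + 1

def restrictive_selection(candidates):
    """Keep the narrowest candidate per (elements, n_sequences) group."""
    best = {}
    for model in candidates:
        key = (model.elements, model.n_sequences)
        if key not in best or model.width < best[key].width:
            best[key] = model
    return list(best.values())

candidates = [
    CandidateModel(("El.1", "El.2"), 26, 35, 6),
    CandidateModel(("El.1", "El.2"), 30, 35, 6),
]
for model in restrictive_selection(candidates):
    print(model.elements, f"{model.min_dist}-{model.max_dist}", model.width, "bps")
```

With the two candidates from the example, only the 30-35 model survives the selection.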
These parameters are hidden by default. You can use the reveal box next to the section header to reveal them!
Element constraints
Minimum and maximum number of elements in models:

FrameWorker lists only models with at least the minimum number of elements and stops after models with the maximum number of elements have been determined. The default is a minimum of 2 elements and a maximum of 4 elements per model.

Show intermediate models:

Usually FrameWorker shows only the most complex models that are common to the input sequences. If this option is checked, all intermediate (i.e. shorter) models are also listed in the output.

Example: If the most complex model consists of 5 elements, all models with 2, 3 and 4 elements are also listed.

Mandatory Elements:

FrameWorker will only show models that contain at least the element(s) selected here. The selected elements can be combined with "and" (ALL elements must be present in models) or "or" (any of the selected elements must be present).

Note: Internally, FrameWorker searches ALL possible element combinations and filters for the selected mandatory elements in a second step.

Example: If the transcription factor families "V$AP1F" and "V$CAAT" are selected here, the result will contain only common frameworks that contain an AP1 site as well as a CCAAT binding factor site (if there are any at all).
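
The "and"/"or" combination of mandatory elements amounts to a filter applied to the finished list of models. The following sketch assumes that a model can be represented as a tuple of matrix family names; the function name and this representation are hypothetical.

```python
def filter_by_mandatory(models, mandatory, mode="and"):
    """Keep models containing all ("and") or any ("or") mandatory elements."""
    required = set(mandatory)
    if not required:
        return list(models)  # nothing selected, nothing to filter
    if mode == "and":
        return [m for m in models if required <= set(m)]
    if mode == "or":
        return [m for m in models if required & set(m)]
    raise ValueError("mode must be 'and' or 'or'")

models = [
    ("V$AP1F", "V$CAAT", "V$SP1F"),
    ("V$AP1F", "V$SP1F"),
    ("V$ETSF", "V$SP1F"),
]
print(filter_by_mandatory(models, ["V$AP1F", "V$CAAT"], mode="and"))
print(filter_by_mandatory(models, ["V$AP1F", "V$CAAT"], mode="or"))
```

With "and", only the first model is kept because it contains both families; with "or", the second model is kept as well.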

Options
Maximum number of models:

The number of different models per model length which will be listed in the output can be set by the user (default: 10, maximum: 100).

These parameters are hidden by default. You can use the reveal box next to the section header to reveal them!
Show detailed model matches:

Usually only the model parameters of the frameworks/models found in the sequences are displayed in the FrameWorker results. If you wish to see all matches to the models (i.e. positions of the elements) in your input sequences check this option. The number of matches to each of the FrameWorker derived models in each input sequence is limited (default: 10).

You can also display the sequences of the binding sites included in the models. The sequences are always displayed in 5'->3' direction. If the binding sites are located on the antisense strand the displayed reverse complement sequence begins at the end position of the binding site.
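
The display rule for antisense sites can be illustrated with a small reverse-complement helper. The coordinate convention (1-based, inclusive, given on the sense strand) and the function name are assumptions made for this sketch.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def displayed_site(sequence, start, end, strand):
    """Return a binding site in 5'->3' direction of its own strand.

    `start` and `end` are assumed to be 1-based, inclusive positions
    on the sense strand.
    """
    site = sequence[start - 1:end]
    if strand == "-":
        # Reverse complement: the output starts at the site's end position.
        return site.translate(_COMPLEMENT)[::-1]
    return site

sequence = "CCGATTAGG"
print(displayed_site(sequence, 3, 7, "+"))  # GATTA
print(displayed_site(sequence, 3, 7, "-"))  # TAATC
```

For the antisense case, the first displayed base (T) is the complement of the base at position 7, i.e. the end position of the site on the sense strand.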

These parameters are hidden by default. You can use the reveal box next to the section header to reveal them!
Determine the p-value of models:

If this option is set, a background set of 5000 human promoter sequences is scanned with the models generated by FrameWorker. The results of this search are used to check whether the models can also be found in a set of randomly selected promoters. The p-value is the probability of obtaining an equal or greater number of sequences with a model match in a randomly drawn sample of the same size as the input sequence set. The lower this probability, the higher the specificity of the model.
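
The exact statistic used by FrameWorker is not specified here. One natural reading of "a randomly drawn sample of the same size" is a draw without replacement from the background set, which leads to a hypergeometric tail probability. The sketch below follows that assumption; the function and parameter names are hypothetical.

```python
from math import comb

def model_p_value(background_size, background_hits, sample_size, observed_hits):
    """P(X >= observed_hits) for a sample drawn without replacement.

    background_size: number of background promoters (e.g. 5000)
    background_hits: background promoters with at least one model match
    sample_size:     number of input sequences
    observed_hits:   input sequences with at least one model match
    """
    total = comb(background_size, sample_size)
    upper = min(background_hits, sample_size)
    # Sum the hypergeometric probabilities for k = observed_hits .. upper.
    tail = sum(
        comb(background_hits, k)
        * comb(background_size - background_hits, sample_size - k)
        for k in range(observed_hits, upper + 1)
    )
    return tail / total

# Illustrative numbers only: 50 of 5000 background promoters match the model,
# and 10 of 11 input sequences match it.
print(model_p_value(5000, 50, 11, 10))
```

A model that matches many background promoters yields a larger p-value for the same input result, which reflects its lower specificity.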

Note: Determination of the p-value increases the computing time. If many models are found, the result may not be finished before the server timeout is reached and the email option has to be used.

Email address
Here you can choose between two methods for receiving the results:
  • Show result directly in browser window
    With this option, the URL of the result is shown directly in your browser window.

    Warning: Please use this option only for analyses that can be performed in a short time.
    If the analysis takes longer than the timeout of the webserver, the connection will be terminated and you will receive an error message (e.g. "The document contained no data."). In this case, the results will not be available; please restart the analysis using the option below, "Send the URL of the result to".

  • Send the URL of the result via email
    With this option, an email with the URL of the results will be sent to the user-provided email address when the analysis is finished.

The results will be available for a limited time on our server. For details of how long your results will be kept please see the result-email. After that period they will be deleted unless protected in the project management!


FrameWorker Output

As an example, we use a set of 11 promoters for orthologous rhodopsin genes from 10 species. There is evidence from the literature that two transcription factors (NRLF/V$NRLF and CRX/V$HOXF) act synergistically to regulate rhodopsin transcription (MEDLINE: 8939891, MEDLINE: 10887186, MEDLINE: 10984472).

The 11 input sequences from 10 species were selected from Gene2Promoter, where "rhodopsin, rod pigment" was entered as search term and all orthologous genes were selected. The mouse rhodopsin gene has two alternative promoters annotated in ElDorado.

FrameWorker first automatically detects that there are two alternative promoters from mouse and asks the user whether to continue with the regular FrameWorker or the exhaustive analysis. If there are promoter sets (only when the sequences are from Gene2Promoter or Comparative Genomics in ElDorado), an automatic analysis of all promoter sets is also possible.

Figure: first selection

When the exhaustive FrameWorker is chosen (by checking option 2) and "70% of loci" is selected as quorum constraint, the following overview is returned:

Figure: combinations 1

Doing the same analysis with the additional parameter of "V$NRLF" and "V$HOXF" as mandatory elements, the output is reduced to

Please note two main facts:

  1. Only one of the two alternative promoters in mouse seems to fit the other orthologous promoters (Combination 2), as no models are found in Combination 1.
  2. The two mandatory elements reduce the number of models found from more than 250 models with 6 elements (FrameWorker terminates here) to 49 models with 6 elements that definitely contain both NRLF and HOXF (although they are quite similar).
Clicking on either the combination links or the model names will give you a detailed FrameWorker output.

Here is the example FrameWorker output with a few explanations:


History of changes

New in FrameWorker 5.1 (Dec. 2006):
Introduced a change to avoid double or overlapping models in the output. This may reduce the total number of models compared to FrameWorker 5.0, but the result is essentially the same.

New in FrameWorker 5.0 (Nov. 2006):
FrameWorker now features an additional distance parameter, the maximum distance variation between two elements within a model. This allows setting a higher maximum distance between two elements, but keeping the variation of distances between the sequences within a certain "distance band". This way, generated frameworks are more likely to be biologically relevant because the variation in the distance between the elements is set to an upper limit.

New in FrameWorker 4.6 (Apr. 2006):
FrameWorker allows mandatory elements, i.e. the user can specify one or several elements (i.e. transcription factor binding sites) that must be part of all models in the FrameWorker output.
This requirement is very helpful if it is known from experiments that one or more transcription factors play a role in the regulation of the input sequences, since it filters the output of FrameWorker.

New in FrameWorker 4.5 (Jan. 2006):
When Genomatix-annotated promoters from Gene2Promoter are submitted to FrameWorker,