
FrameWorker: Definition of common framework


[Introduction] [Input and Parameters] [Output] [Example]

Introduction

FrameWorker is a complex software tool that allows users to extract a common framework of elements from a set of DNA sequences. These elements are usually transcription factor binding sites since this tool is designed for the comparative analysis of promoter sequences (e.g. inter-species analysis).

[Figure: From single elements to common framework]

FrameWorker returns the most complex models that are common to the input sequences (satisfying the user parameters). Models/frameworks are defined as all elements (TF sites) that occur in the same order and in a certain distance range in all (or a subset of) the input sequences. The resulting models can be saved in the user-directory and subsequently can be used to scan any set of sequences.

Note that FrameWorker uses strand-specific TF sites for building models: i.e. if the number of sequences with a TF site on one strand (sense OR antisense) is above quorum, then the TF is considered a "common element" and can be used to build potential frameworks.
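
The strand-specific quorum rule can be illustrated with a minimal Python sketch. It is not FrameWorker's implementation: the function name, the data layout, and the reading of "above quorum" as "at least the quorum" are assumptions made for illustration.

```python
from collections import defaultdict

def common_elements(sites_per_sequence, quorum):
    """Return the (family, strand) pairs that occur in at least `quorum` sequences.

    sites_per_sequence maps a sequence id to a list of (family, strand) hits.
    A family is counted separately for the sense (+) and antisense (-) strand.
    """
    support = defaultdict(set)
    for seq_id, sites in sites_per_sequence.items():
        for family, strand in sites:
            support[(family, strand)].add(seq_id)
    return sorted(key for key, seqs in support.items() if len(seqs) >= quorum)

# Hypothetical hits in three sequences
sites = {
    "seq1": [("V$AP1F", "+"), ("V$CAAT", "-")],
    "seq2": [("V$AP1F", "+"), ("V$CAAT", "+")],
    "seq3": [("V$AP1F", "-"), ("V$AP1F", "+")],
}
print(common_elements(sites, quorum=3))  # [('V$AP1F', '+')]
```

In this example V$CAAT occurs in two sequences, but on different strands, so neither strand reaches the quorum and it is not treated as a common element.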

If no frameworks are found in a set of sequences, the quorum constraint should be lowered or the distance constraints relaxed (see the parameter section below).
If too many frameworks are found, the quorum constraint could be raised, the distance constraints tightened, or mandatory elements selected (e.g. if known from biological data).


!! Warning:
If the majority of the input sequences are highly homologous, the resulting models will not be significant! E.g. two promoter sequences from mouse and human that are 95% homologous will probably yield frameworks that do not necessarily contain the relevant (i.e. functional) sites. In this case, FrameWorker will display a warning in the output.

Please note that the terms framework and model are used synonymously here.


FrameWorker Input Data

Sequence Selection
Sequence data: The program expects a set of DNA sequences, which are the basis for the extraction of common elements. The sequences must be supplied in one of the supported formats. You can enter your correctly formatted sequence(s) directly into the form, e.g. with copy and paste, or upload a file containing the sequences.
Library selection
Matrix group: The selection here corresponds to the MatInspector settings.
The sequences are scanned for matches to the selected MatInspector matrices. The matches found are used as basic elements for the extraction of a common framework.

Note: The lower the core and matrix similarities, the more basic elements can be used for the extraction of a framework. The trade-off is that the resulting models may be very unspecific.

Matrix filters: Matrices used for the analysis can be filtered by tissue.
FrameWorker Parameters
Quorum constraint for framework: The minimum number of sequences in the input set that have to contain the common framework. The default is the absolute number of sequences that corresponds to at least 85% of the input sequences.
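
As a small worked example, the default quorum can be computed as the smallest whole number of sequences that covers at least 85% of the input set. The function name is hypothetical, and rounding up is an assumption derived from the phrase "at least 85%".

```python
def default_quorum(n_sequences, percent=85):
    """Smallest number of sequences that is at least `percent` % of the input set."""
    # Ceiling division in integer arithmetic avoids floating-point rounding issues.
    return -(-n_sequences * percent // 100)

for n in (2, 14, 16, 20):
    print(n, "sequences -> quorum", default_quorum(n))
```

For 14 input sequences this gives a quorum of 12, for 16 sequences a quorum of 14.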
Sequence constraints: If mandatory sequences are selected here, FrameWorker will only show models that are found in all selected sequences (and that fulfill all other criteria). Up to ten mandatory sequences can be selected.
Distance constraints for framework:
Minimum and maximum distance between two elements:

These values denote the minimum and maximum distances between the anchors of two elements of a resulting model. Only basic elements that occur in the given distance range are considered for inclusion in a framework.

Maximum distance variation between two elements:

This parameter sets the maximum allowed variation of the distances between the anchors of the model elements. A framework satisfies the parameter if the distances between two elements, measured across the instances of the model in the input sequences, differ by no more than the distance variation parameter. For promoters, a distance variation of 20-30 base pairs is usually sufficient.

Example: If the minimum distance is set to 5, maximum distance is set to 200, and the distance variation is set to 20, the following models might be possible:
  • El.1 ← 5-24 → El.2
  • El.1 ← 120-139 → El.2
  • El.1 ← 181-200 → El.2
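
The interplay of the three distance parameters can be checked with a short sketch. It assumes that the distance variation counts the number of distinct distances in a range (upper bound minus lower bound plus one), which is how the example ranges above and the "distance variation" values elsewhere on this page appear to be computed. The function name is hypothetical.

```python
def is_valid_range(lo, hi, min_dist=5, max_dist=200, max_variation=20):
    """Check a distance range lo..hi against the distance constraints."""
    variation = hi - lo + 1  # assumption: 5-24 counts as a variation of 20
    return min_dist <= lo <= hi <= max_dist and variation <= max_variation

for lo, hi in [(5, 24), (120, 139), (181, 200), (5, 30), (190, 205)]:
    print(f"{lo}-{hi}:", is_valid_range(lo, hi))
```

The first three ranges are the ones from the example and pass the check. The range 5-30 is too wide, and 190-205 exceeds the maximum distance.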

Restrictive model search: By default, all models satisfying the given distance variation are listed (even if there are more specific models satisfying the remaining parameters). If the restrictive search option is selected, FrameWorker outputs only those models with the minimum variation in the distance between elements.
Depending on the input sequences, this can result in fewer models, especially if a large distance variation is allowed.

Example: If all 6 input sequences contain one match to the model
El.1 ← 30-35 → El.2 (distance variation: 6 bps, common to 6 sequences (6 matches, 6 non-overlapping))
the regular search option might instead find the following model, which is less specific (more matches in the sequences):
El.1 ← 26-35 → El.2 (distance variation: 10 bps, common to 6 sequences (14 matches, 6 non-overlapping))
These parameters are hidden by default. Clicking on the corresponding icon will reveal them.
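
One way to picture the restrictive search is as a search for the narrowest distance range that still contains at least one El.1/El.2 distance from every sequence. The sketch below is only an illustration of that idea under simplifying assumptions (two elements, all sequences required). It is not FrameWorker's algorithm, and the distances are invented so that the totals match the example above.

```python
def narrowest_window(distances_per_sequence):
    """Smallest (lo, hi) range holding at least one distance from every sequence."""
    candidates = sorted({d for ds in distances_per_sequence for d in ds})
    best = None
    for lo in candidates:
        # For each sequence, take the closest observed distance that is >= lo.
        nearest = [min((d for d in ds if d >= lo), default=None)
                   for ds in distances_per_sequence]
        if None in nearest:
            continue  # some sequence has no distance at or above lo
        hi = max(nearest)
        if best is None or hi - lo < best[1] - best[0]:
            best = (lo, hi)
    return best

def count_matches(distances_per_sequence, lo, hi):
    """Total number of observed distances that fall into lo..hi."""
    return sum(lo <= d <= hi for ds in distances_per_sequence for d in ds)

# Hypothetical El.1/El.2 distances observed in 6 sequences
distances = [[26, 30], [27, 28, 31], [26, 29, 32], [27, 28, 33], [26, 34], [35]]
print(narrowest_window(distances))        # (30, 35)
print(count_matches(distances, 30, 35))   # 6
print(count_matches(distances, 26, 35))   # 14
```

With these invented distances, the narrow range 30-35 yields one match per sequence, while the wider range 26-35 picks up 14 matches.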
Element constraints: Minimum and maximum number of elements in models:

FrameWorker lists only models with at least the minimum number of elements and stops after the models with the maximum number of elements have been determined. If no model with the given minimum number of elements is found, the largest models found are listed in the output (e.g. if min=6 and max=6, but no 6-element models are found, the program lists 4-element models if those are the most complex models).
The default is a minimum of 2 elements and a maximum of 4 elements per model.

Show intermediate models:

Usually FrameWorker shows only the most complex models that are common to the input sequences. If this option is checked, all intermediate (i.e. shorter) models are also listed in the output.

Example: If the most complex model consists of 5 elements, all models with 2, 3 and 4 elements are also listed.
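
The selection rules for model size can be summarized in a short sketch. The function and its handling of edge cases are assumptions: in particular, it assumes that intermediate models are listed down to the minimum number of elements, and that the fallback to the largest models found applies whenever the minimum cannot be reached.

```python
def models_to_report(models, min_elements=2, max_elements=4, show_intermediate=False):
    """Pick the models to list; each model is a tuple of element names."""
    candidates = [m for m in models if len(m) <= max_elements]
    if not candidates:
        return []
    largest = max(len(m) for m in candidates)
    if show_intermediate:
        # Also list shorter models, but fall back if the minimum is unreachable.
        lower = min(min_elements, largest)
    else:
        # Only the most complex models (also the fallback if min is not reached).
        lower = largest
    return [m for m in candidates if len(m) >= lower]

# Hypothetical models found in the input sequences
found = [("A", "B"), ("B", "C"), ("A", "B", "C"), ("A", "B", "C", "D")]
print(models_to_report(found, min_elements=6, max_elements=6))
print(models_to_report(found, show_intermediate=True))
```

The first call mirrors the min=6/max=6 example: no 6-element model exists, so the single 4-element model is listed. The second call lists all four models.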

Mandatory Elements:

FrameWorker will only show models that contain at least the element(s) selected here. The selected elements can be combined with "and" (ALL elements must be present in models) or "or" (any of the selected elements must be present).

Note: Internally, FrameWorker searches ALL possible element combinations and, in a second step, filters for the selected mandatory elements.

Example: If the transcription factor families "V$AP1F" and "V$CAAT" are selected here, the result will contain only common frameworks that contain an AP1 site as well as a CCAAT binding factor site (if there are any at all).
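
The second filtering step can be sketched as follows. The function name and the representation of a model as a tuple of matrix family names are assumptions for illustration, and the example models are invented.

```python
def has_mandatory(model, mandatory, mode="and"):
    """Check whether a model contains the mandatory families.

    mode="and": ALL mandatory families must be present.
    mode="or":  at least one mandatory family must be present.
    """
    present = [family in model for family in mandatory]
    return all(present) if mode == "and" else any(present)

# Hypothetical models, each given as a tuple of matrix families
models = [
    ("V$AP1F", "V$CAAT", "V$SP1F"),
    ("V$AP1F", "V$SP1F"),
    ("V$ETSF", "V$SP1F"),
]
mandatory = ["V$AP1F", "V$CAAT"]

print([m for m in models if has_mandatory(m, mandatory, "and")])
print([m for m in models if has_mandatory(m, mandatory, "or")])
```

With "and", only the first model is kept. With "or", the first two models are kept.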

Options: Maximum number of models:

The number of different models per model length which will be listed in the output can be set by the user (default: 10, maximum: 100).

These parameters are hidden by default. Clicking on the corresponding icon will reveal them.
Show detailed model matches:

Usually only the model parameters of the frameworks/models found in the sequences are displayed in the FrameWorker results. If you wish to see all matches to the models (i.e. positions of the elements) in your input sequences check this option. The number of matches to each of the FrameWorker derived models in each input sequence is limited (default: 10).

You can also display the sequences of the binding sites included in the models. The sequences are always displayed in 5'->3' direction. If the binding sites are located on the antisense strand the displayed reverse complement sequence begins at the end position of the binding site.
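
How a binding site sequence would be displayed for either strand can be sketched in a few lines. The function name and the use of 1-based, inclusive start and end positions are assumptions, and the sequence is made up.

```python
# Translation table for complementing DNA bases (upper and lower case)
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def site_sequence(sequence, start, end, strand):
    """Return the site in 5'->3' direction; start/end are 1-based and inclusive."""
    site = sequence[start - 1:end]
    if strand == "-":
        # Antisense strand: the reverse complement starts at the end position.
        return site.translate(_COMPLEMENT)[::-1]
    return site

seq = "GGATTAGC"
print(site_sequence(seq, 2, 6, "+"))  # GATTA
print(site_sequence(seq, 2, 6, "-"))  # TAATC
```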

These parameters are hidden by default. Clicking on the corresponding icon will reveal them.
Determine the p-value of models:

If this option is set, a background set of 5000 human promoter sequences is scanned with the models generated by FrameWorker. The results of this search are used to check whether the models can also be found in a set of randomly selected promoters. The p-value is the probability of obtaining an equal or greater number of sequences with a model match in a randomly drawn sample of the same size as the input sequence set. The lower this probability, the higher the specificity of the model.

Note: Determination of the p-value increases the computing time. If many models are found, the result may not be finished before the server timeout is reached and the email option has to be used.
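
The help text does not state which statistical model FrameWorker uses for the p-value. One plausible reading of "a randomly drawn sample of the same size" is drawing promoters without replacement from the background set, which leads to a hypergeometric tail probability. The following sketch works under that assumption only. The function and parameter names are hypothetical.

```python
from math import comb

def model_p_value(hits_in_input, n_input, hits_in_background, n_background=5000):
    """P(X >= hits_in_input) when n_input promoters are drawn without replacement
    from a background of n_background promoters, hits_in_background of which
    contain a match to the model (hypergeometric tail, an assumption)."""
    upper = min(n_input, hits_in_background)
    total = comb(n_background, n_input)
    tail = sum(
        comb(hits_in_background, k)
        * comb(n_background - hits_in_background, n_input - k)
        for k in range(hits_in_input, upper + 1)
    )
    return tail / total

# Hypothetical numbers: model found in 10 of 14 input sequences
# and in 50 of the 5000 background promoters
print(model_p_value(hits_in_input=10, n_input=14, hits_in_background=50))
```

A model that matches many input sequences but only few background promoters receives a very small p-value under this reading.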

Email address: Here you can choose between two methods for receiving the results:
  • Show result directly in browser window
    With this option, the URL of the result is shown directly in your browser window.

    Warning: Please use this option only for analyses which can be performed in a short time.
    If the analysis takes longer than the timeout of the web server, the connection will be terminated and you will receive an error message (e.g. "The document contained no data."). In this case, the results will not be available. Please restart the analysis using the option "Send the URL of the result via email" below.

  • Send the URL of the result via email
    With this option, an email with the URL of the results will be sent to the email address provided by the user when the analysis is finished.

The results will be available for a limited time on our server. For details of how long your results will be kept please see the result-email. After that period they will be deleted unless protected in the project management!


FrameWorker Output

Generally, the output of a regular FrameWorker analysis is divided into the following sections:

Note: For the difference between the regular FrameWorker and the exhaustive analysis, see the example below.

Model Overview

This table contains an overview of how many single elements and models were found to be common to the input sequences (depending on the quorum). If mandatory elements were selected, the number of checked models is also given.
Here, a total of 54 frameworks with four elements were found, 12 of them containing both V$AP1R and V$BCDF (which were set as mandatory elements, see the example below). A total of 8 five-element models were found, but none of those contained both mandatory elements.

[Figure: model overview table]

Graphical view of sites in all found models

Here, the common elements as well as the found models are displayed graphically:

[Figure: graphical overview of the common elements and the models found]

Model details

Here, all complex models found are listed. Only the models with the most elements are given here (e.g. if the most complex model consists of 6 elements, the models with 2, 3, 4, or 5 elements are not listed, unless the parameter "Show intermediate models" was selected).
For each complex model, its elements, their order and their relative distances are displayed. The similarity and distance values show the range occurring in the input sequences. If the option "Determine the p-value of models" is set, the p-value of the model is also shown and the models are sorted by this value.

A click on the model name will take you directly to the graphics with the view of the corresponding model matches.

In the example, the most complex models consist of four elements. The first of the 12 models is shown here:

[Figure: details of the first four-element model]

Each model can be saved by clicking the checkbox and supplying a name for the model. It can then later be used with other Genomatix programs (e.g. to search complete databases for matches to this user-defined model).



If the option "Show detailed model matches" was selected, an extra table with the model matches per sequence is displayed. A model can match more than once in a sequence.
Each element (El.1, El.2, ...) is characterized by its matrix name, its position on the sequence, its matrix score and its strand orientation. The distance between two elements is also given. An entry in the overlap column indicates that this match overlaps in at least one element with another match from the same sequence.

In this case, nine of 14 input sequences contain exactly one match to the framework/model consisting of 4 elements. Four of the 14 sequences contain no match to the model.

Matches:

Sequence | El. 1 | Dist. | El. 2 | Dist. | El. 3 | Dist. | El. 4 | Overlap
GXP_3141524(RHO/rabbit) Oryctolagus cuniculus (945 bp) | V$DLX5.01 370-388 (379/0.93/+) | 5 | V$DLX3.01 375-393 (384/0.94/-) | 24 | V$CRX.01 400-416 (408/0.99/-) | 38 | V$MARE.01 436-456 (446/0.99/+) | -
GXP_2320175(RHO/zebra_finch) Taeniopygia guttata (996 bp) | ---
GXP_617604(opn1mw4/zebrafish) Danio rerio (610 bp) | ---
GXP_3382808(rho/western_clawed_frog) Xenopus tropicalis (1011 bp) | V$DLX5.01 419-437 (428/0.94/+) | 5 | V$DLX3.01 424-442 (433/0.94/-) | 23 | V$CRX.01 448-464 (456/0.95/-) | 37 | V$MARE.01 483-503 (493/1.00/+) | -
GXP_3382808(rho/western_clawed_frog) Xenopus tropicalis (1011 bp) | V$DLX5.01 419-437 (428/0.94/+) | 5 | V$DLX3.01 424-442 (433/0.94/-) | 23 | V$CRX.01 448-464 (456/0.95/-) | 41 | V$VMAF.01 487-507 (497/0.86/+) | yes
GXP_3870201(RHO/cow) Bos taurus (944 bp) | V$DLX5.01 370-388 (379/0.93/+) | 5 | V$DLX3.01 375-393 (384/0.94/-) | 24 | V$CRX.01 400-416 (408/0.98/-) | 38 | V$MAFB.01 436-456 (446/0.85/+) | -
GXP_438462(Rho/mouse) Mus musculus (601 bp) | ---
GXP_1998132(RHO/pig) Sus scrofa (955 bp) | V$DLX5.01 380-398 (389/0.93/+) | 5 | V$DLX3.01 385-403 (394/0.94/-) | 23 | V$OTX1.01 409-425 (417/0.99/-) | 38 | V$MAFB.01 445-465 (455/0.85/+) | -
GXP_231880(RHO/dog) Canis lupus familiaris (986 bp) | V$DLX5.01 411-429 (420/0.93/+) | 5 | V$DLX3.01 416-434 (425/0.94/-) | 24 | V$CRX.03 441-457 (449/0.96/-) | 38 | V$MAFB.01 477-497 (487/0.85/+) | -
GXP_1215006(RHO/chimp) Pan troglodytes (946 bp) | V$DLX5.01 374-392 (383/0.93/+) | 5 | V$DLX3.01 379-397 (388/0.94/-) | 24 | V$CRX.01 404-420 (412/0.99/-) | 38 | V$MAFB.01 440-460 (450/0.84/+) | -
GXP_4214965(RHO/chicken) Gallus gallus (1105 bp) | ---
GXP_1043413(RHO/rhesus_monkey) Macaca mulatta (983 bp) | V$DLX5.01 412-430 (421/0.93/+) | 5 | V$DLX3.01 417-435 (426/0.94/-) | 23 | V$OTX2.01 441-457 (449/0.99/-) | 38 | V$MAFB.01 477-497 (487/0.84/+) | -
GXP_19602(Rho/rat) Rattus norvegicus (946 bp) | V$DLX1.02 382-400 (391/0.97/+) | 5 | V$DLX1.02 387-405 (396/0.97/-) | 22 | V$CRX.01 410-426 (418/0.99/-) | 34 | V$MAFA.01 442-462 (452/0.99/+) | -
GXP_950626(LOC100015632/opossum) Monodelphis domestica (1042 bp) | V$DLX5.01 452-470 (461/0.93/+) | 5 | V$DLX3.01 457-475 (466/0.93/-) | 22 | V$CRX.03 480-496 (488/0.96/-) | 38 | V$MARE.01 516-536 (526/0.99/+) | -
GXP_1345759(LOC100056641/horse) Equus caballus (945 bp) | V$DLX5.01 373-391 (382/0.93/+) | 5 | V$DLX3.01 378-396 (387/0.94/-) | 23 | V$CRX.01 402-418 (410/0.98/-) | 38 | V$MAFB.01 438-458 (448/0.85/+) | -
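
The entries of the match table can be parsed with a short script. The sketch assumes that each element is written as "matrix start-end (anchor/score/strand)" and that the Dist. columns are the differences between consecutive anchor positions. This reading is inferred from the numbers in the table, not from a documented format, and the function names are hypothetical.

```python
import re

# matrix name, start-end, then (anchor/score/strand)
MATCH = re.compile(r"(V\$\S+)\s+(\d+)-(\d+)\s+\((\d+)/([\d.]+)/([+-])\)")

def parse_elements(row):
    """Extract all elements of one model match from a text row."""
    return [
        {"matrix": m[1], "start": int(m[2]), "end": int(m[3]),
         "anchor": int(m[4]), "score": float(m[5]), "strand": m[6]}
        for m in MATCH.finditer(row)
    ]

def anchor_distances(elements):
    """Distances between the anchors of consecutive elements."""
    anchors = [e["anchor"] for e in elements]
    return [b - a for a, b in zip(anchors, anchors[1:])]

# Elements of the rabbit match from the table above
rabbit = ("V$DLX5.01 370-388 (379/0.93/+) V$DLX3.01 375-393 (384/0.94/-) "
          "V$CRX.01 400-416 (408/0.99/-) V$MARE.01 436-456 (446/0.99/+)")
print(anchor_distances(parse_elements(rabbit)))  # [5, 24, 38]
```

The computed distances 5, 24 and 38 agree with the Dist. columns of the rabbit row.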


Common elements

This table lists all single elements (i.e. transcription factor binding sites) that were found to be common to the input sequences (depending on the quorum). These elements form the basis from which the more complex models can be built.
[Figure: table of common elements]

FrameWorker Example

As an example, we use a set of 16 promoters for orthologous rhodopsin genes from 15 species. There is evidence from the literature that two transcription factors (NRL/V$AP1R and CRX/V$BCDF) act synergistically to regulate rhodopsin transcription (MEDLINE: 8939891, MEDLINE: 10887186, MEDLINE: 10984472).

The 16 input sequences from 15 species were selected from Gene2Promoter, where "rhodopsin" was entered as search term and all orthologous genes were selected. The mouse rhodopsin gene has two alternative promoters annotated in ElDorado.

FrameWorker first automatically detects that there are two alternative promoters from mouse and asks the user whether to continue with the regular FrameWorker or the exhaustive analysis. If there are promoter sets (only when the sequences are from Gene2Promoter or Comparative Genomics in ElDorado), an automatic analysis of all promoter sets is also possible.

[Figure: selection between the regular and the exhaustive analysis]

When the exhaustive FrameWorker is chosen (option 2) with "70% of loci" as the quorum constraint, the following overview is returned:

[Figure: overview of promoter combinations]

Doing the same analysis with "V$AP1R" and "V$BCDF" as additional mandatory elements reduces the output.

Please note: The two mandatory elements reduce the models found from more than 43 models with six elements to 4 models with six elements that definitely contain both BCDF and AP1R (although they are quite similar).

Clicking on either the combination links or the model names will give you a detailed FrameWorker output as explained above.