
Search for common TF sites in multiple sequences



Introduction

Displays transcription factor (TF) binding sites common to a set of sequences. MatInspector scans the sequences for matches to TF binding sites, and the common sites are displayed graphically and in table form. The combined length of all input sequences must not exceed 1 million base pairs.
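The length limit can be checked before submitting a job. The following is a minimal sketch, assuming the sequences are already available as plain strings; the function names are hypothetical and not part of MatInspector.

```python
MAX_TOTAL_BP = 1_000_000  # combined limit stated in the help text


def total_length(sequences):
    """Return the combined length of all sequences, ignoring whitespace."""
    return sum(len("".join(seq.split())) for seq in sequences)


def within_limit(sequences, limit=MAX_TOTAL_BP):
    """Return True if the combined length does not exceed the limit."""
    return total_length(sequences) <= limit
```

For example, `within_limit(["ACGT", "GGCC"])` checks a combined length of 8 base pairs against the limit.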


Input

For the search for common TF sites, the same sequence input options as in MatInspector are available, except for the database search option.


Parameters

Library selection
Libraries: Available libraries are the MatInspector weight matrix library of transcription factor binding sites and a plant IUPAC library based on PLACE.
Matrix Search Parameters
Matrix / IUPAC parameters

Depending on the type of the selected library, further search parameters such as matrix group or matrix filters can be entered.

The parameters correspond to the MatInspector matrix parameters or IUPAC parameters.

Matrix filters: Matrices used for the analysis can be filtered by the tissues they are associated with. Select one or more tissues (e.g. blood cells or liver) from the list. The tissue associations can be viewed via the link "Show all tissue associations".

List of available tissues:

Adipose Tissue; Adrenal Glands; Antibody-Producing Cells; Antigen-Presenting Cells;
Bladder; Blastomeres; Blood Cells; Blood Platelets;
Bone Marrow Cells; Bone and Bones; Brain; Breast;
Cardiovascular System; Cartilage; Central Nervous System; Connective Tissue;
Digestive System; Ear; Embryonic Structures; Endocrine System;
Erythrocytes; Eye; Germ Cells; Granulocytes;
Heart; Hematopoietic System; Hemocytes; Immune System;
Integumentary System; Islets of Langerhans; Kidney; Leukocytes;
Leydig Cells; Liver; Lung; Luteal Cells;
Lymphocytes; Monocytes; Muscle, Skeletal; Muscle, Smooth;
Muscles; Myeloid Cells; Myocardium; Nervous System;
Neuroglia; Neurons; Nose; Ovary;
Pancreas; Parathyroid Glands; Phagocytes; Pineal Gland;
Pituitary Gland; Prostate; Respiratory System; Skeleton;
Spinal Cord; Testis; Thymus Gland; Thyroid Gland;
Ubiquitous; Urogenital System

Tissues are assigned to matrix families, not individual matrices. The tissue associations of matrix families are determined by automatic evaluation of all PubMed abstracts (co-citations of transcription factors and tissues) and subsequent manual curation.

Note: Currently, tissue filtering is available only for vertebrate matrices.

TF sites common to: The minimum number of sequences within the input set that must contain a common TF site. The default is the absolute number of sequences that corresponds to at least 85% of the input sequences.
Output format: By default, the output consists of a graphical and a tabular display of the common TF sites. With this parameter you can suppress the graphics, which can be useful e.g. if your computer has limited working memory. If the input contains more than 50 sequences, only the table is displayed, because the graphics would take too long to load and would use too many resources on your computer. You can NOT circumvent this behavior by setting this parameter.
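The two numeric rules in the parameters above can be sketched as follows. This is an illustration only: the help text does not state how the 85% default is rounded, so rounding up to the next whole sequence is an assumption, and the function names are hypothetical.

```python
GRAPHICS_MAX_SEQUENCES = 50  # above this, only the table is shown (from the help text)


def default_min_sequences(n_sequences, percent=85):
    """Smallest whole number of sequences that is at least `percent` of the input.

    Rounding up is an assumption; integer arithmetic avoids float rounding.
    """
    return max(1, (n_sequences * percent + 99) // 100)


def graphics_shown(n_sequences, table_only=False):
    """Graphics appear only if not suppressed and the input has at most 50 sequences."""
    return not table_only and n_sequences <= GRAPHICS_MAX_SEQUENCES
```

Under these assumptions, an input of 20 sequences gives a default of 17, and an input of 51 sequences is shown as a table only.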

Common TF binding sites

Transcription factor binding sites common to the set of input sequences are displayed as an interactive graphic that includes a match summary table.

NOTE: If the input consists of more than 50 sequences, only the table is displayed. You can also omit the graphical output by choosing the "table only" option of the output format parameter.

This interactive graphics page is implemented as a Java applet that runs within your web browser and requires a Java plug-in (see also our technical FAQs).

Java graphics:

[Figure: common transcription factors]

The Java graphic consists of four parts:

The main sequence panel

The black and gray patterned line represents the sequence. Each scale line corresponds to 50 base pairs.

Each sequence is preceded by a colored box containing information on the sequence, in particular (top to bottom):

The color of the box also represents the organism. Note that not all of this information may be available for every input sequence. If, for example, no organism information is available, the box is shown in a default color.
The sequences are vertically aligned along their start positions.

Additionally, the line is flanked by red numbers indicating the currently visible portion of the sequence, i.e. the index of the first and last base pair respectively. Note that indices are absolute sequence positions, i.e. the positions range from '1' to the length of the sequence.

Matrix matches: Each matrix match is represented by a half-round symbol. Matches found on the positive strand are drawn above the sequence line, matches found on the negative strand below it. There is one color for each matrix family, i.e. matches caused by matrices of the same family are painted in the same color.

The different colors of the symbols indicate the different matrix (or IUPAC) families. Note that matches caused by user-defined matrices are represented by a different symbol (a square instead of the half-round shape).

Match annotation: Left-clicking on a matrix match symbol opens a small display window containing information on that match: the name of the matrix, the matrix family to which it belongs, the position of the matrix match (relative to the sequence), the matrix threshold, and the nucleotide sequence which was matched. The annotation window is also a hyperlink. Following the link opens a new browser window showing further information on the matrix family.
Moving the mouse pointer out of (or clicking on) a highlighted matrix match symbol closes the information display. Additionally, the symbol of the matrix match is moved to the background. This is helpful because matrix matches often overlap each other and can even be completely covered.

The arrow symbol on the sequence stands for a transcription start site (TSS) or putative TSS. Please note that a sequence can have several TSSs or none.

The navigation panel

The features of this panel are described here.

The toolbar including filter options

Next to the buttons for the basic features of the Java graphics toolbar, you can find the button Show Match Summary Table for displaying the match summary table.

The remaining buttons are for filtering the matrix/IUPAC matches: you can filter the matrix matches by threshold, occurrence and family.

Each matrix is provided with a search threshold, which is one of the adjustable Matrix Parameters. When a matrix matches a sequence, the level of similarity is expressed as a number called the "matrix similarity" or "matrix threshold". This number can be larger than or equal to, but never smaller than, the search threshold for the matrix.
Threshold: You can filter the matches by their matrix similarity relative to the search threshold by changing the value of the text field. Simply left-click the up or down arrow on the right-hand side of the text field to increase or decrease the value by 0.01. The value "+/- 0" (which is the initial value) stands for the search threshold. The displayed value is a cutoff threshold, which means that all matches with at least this threshold are displayed. The possible values for this text field range from "+/- 0" to "+0.05".
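The threshold filter can be read as a per-matrix cutoff: the search threshold plus the offset chosen in the text field. The sketch below follows that reading, which is an interpretation of the help text rather than the applet's actual code; the names and the small floating-point tolerance are assumptions.

```python
STEP = 0.01      # one click on the up or down arrow
MAX_STEPS = 5    # "+0.05" is the largest offset named in the text
EPSILON = 1e-9   # tolerance for floating-point comparison (assumption)


def cutoff(search_threshold, steps):
    """Cutoff for one matrix: its search threshold plus `steps` clicks of 0.01."""
    if not 0 <= steps <= MAX_STEPS:
        raise ValueError("steps must be between 0 and 5")
    return search_threshold + steps * STEP


def is_displayed(similarity, search_threshold, steps=0):
    """Return True if the match similarity reaches the cutoff."""
    return similarity + EPSILON >= cutoff(search_threshold, steps)
```

With a search threshold of 0.80 and an offset of "+0.02" (two clicks), a match with similarity 0.82 stays visible, while a match with similarity 0.81 is hidden.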

Whenever there is more than one sequence, you can also filter by the occurrence of families. That is, you can specify that only those matches are displayed whose matrices belong to a family that is found on at least the displayed number of sequences. You can change the number by left-clicking on the arrows. The single arrows increase or decrease the number by 1, the double arrows by 5. The largest possible value is the total number of sequences. The smallest possible value is determined by the pre-filtering described above (see the "TF sites common to" parameter) and lies between 1 and the total number of sequences (in the latter case, no filtering by occurrence is possible). The initial value is the smallest possible number.

The "Select all" and "Deselect all" buttons can be used to check or uncheck all checkboxes for the matrix/IUPAC families at once.

The checkbox list

Using the checkboxes, it is possible to filter by family. Only those matches are visible whose matrix belongs to a selected family. Each checkbox has a border of the same color as the corresponding matrix/IUPAC match symbols.
If a checkbox and the associated family name appear transparent, this is an indication that the corresponding matches are currently not visible because they do not satisfy the current conditions regarding occurrence and/or threshold.
To facilitate the selection, one can use the "Select all" and "Deselect all" buttons from the toolbar.

Note that the visibility of a matrix match is controlled by the conjunctive combination of the filters. Thus, for a matrix match symbol to be displayed, the corresponding matrix family's checkbox must be "checked", there must be family matches on at least the specified number of sequences and the match itself must have a similarity of at least the chosen threshold.
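The conjunctive combination of the three filters can be sketched as a single predicate. This is an illustration, not the applet's implementation: the `Match` record, the family names, and the choice to count family occurrences over all matches (before threshold filtering) are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass

EPSILON = 1e-9  # tolerance for floating-point comparison (assumption)


@dataclass(frozen=True)
class Match:
    family: str            # matrix family name (hypothetical values below)
    sequence_id: str       # sequence on which the match was found
    similarity: float      # matrix similarity of this match
    search_threshold: float


def sequences_per_family(matches):
    """Count on how many distinct sequences each family has at least one match."""
    seen = defaultdict(set)
    for m in matches:
        seen[m.family].add(m.sequence_id)
    return {family: len(ids) for family, ids in seen.items()}


def visible_matches(matches, checked_families, min_sequences, offset=0.0):
    """Keep a match only if all three filter conditions hold at once."""
    counts = sequences_per_family(matches)
    return [
        m for m in matches
        if m.family in checked_families                            # family checkbox
        and counts[m.family] >= min_sequences                      # occurrence
        and m.similarity + EPSILON >= m.search_threshold + offset  # threshold
    ]
```

A match that fails any single condition is hidden, even if it satisfies the other two.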


Match summary table

You can display the match summary table by clicking the Show Match Summary Table button in the toolbar. The table lists the common TF sites, the number of sequences in which they occur, and how often they match in each input sequence. Additionally, a significance value (p-value) is given for each common TF site.

Example:

[Figure: match summary table]

Clicking a column header will re-sort the table by the values of this column, i.e. by


Export to Excel format

The common transcription factor binding sites identified by MatInspector can be exported to a tab-delimited file by using one of the "Export" buttons below the match summary table. The files are saved to your local disk and can be opened directly with Microsoft Excel.

Note: With both export options the common TF sites identified originally will be exported. Filtering in the graphics has no influence on the matches that are exported to Excel.
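An exported tab-delimited file can also be read programmatically. The sketch below uses Python's standard `csv` module and assumes the first row of the file holds the column headers; the column names and values in the usage example are hypothetical and do not reflect the actual export layout.

```python
import csv
import io


def read_tab_delimited(handle):
    """Parse tab-delimited text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Usage with an in-memory sample; for a saved file, pass
# open(path, newline="") instead of the StringIO object.
sample = "Family\tSequences\nFAM_A\t3\n"  # hypothetical columns and values
rows = read_tab_delimited(io.StringIO(sample))
```

All values are returned as strings, so numeric columns need to be converted explicitly before sorting or filtering.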