The matrix overview appears when you select the 'single matrices' category for browsing. It lists all matrices included in MatBase in alphabetical order. The screenshot below shows the first few matrices of the list.

| Matrix name: | Lists the names of the matrices in each family. Clicking on the name will take you to a matrix result page. |
|---|---|
| Matrix information: | Similar to the family information, 'Matrix information' is a short description of the specific transcription factor binding site that is found by the matrix. |
| RE: | The re-value (random expectation) of the matrix, a statistical value explained in detail below. |
| opt: | Lists the optimized threshold of the matrix used by MatInspector. Again, please see the explanation below. |

The screenshot below shows the result for a single matrix. Depending on the amount of information available for a matrix, the page may look slightly different (e.g. if a matrix has been constructed directly from a weight matrix description instead of an alignment of sites).

| Matrix Name | The MatInspector matrices have an identifier that indicates one of the following seven groups |
|---|---|
| Description | Further information for a matrix or matrix family. |
| Family | The matrix family this matrix belongs to. Clicking on the family name will take you to the 'family result'. |
| References | References for the original source of sequences/oligonucleotides or weight matrices used for the construction of the matrix with author, title and citation. Clicking on a reference id will take you to the 'reference result'. |
| Random expectation (re-value) | The re-value for each individual matrix gives an expectation value for the number of matches per 1,000 base pairs of random DNA sequence (that is, it indicates how well a matrix is defined). Since there are binding sites that are biologically quite "loosely" defined, a high re-value is not necessarily a sign of a "bad" matrix description. A very low re-value might even be a sign of a description that is too strict. |
| Promoter matches | The value given is the percentage of promoters in which a match to the matrix is found with optimized matrix similarity. In order to determine the promoter matches, promoter sequences are extracted from ElDorado. The following promoter sequences are scanned for the different matrix groups: |
| Matrix matches | This table contains the absolute number of matches and the number of matches per 1,000 base pairs of the matrix in the genome and promoter sequences for each species listed. Please note that for some species only the numbers for promoters are given, as there is no completely assembled genome yet. |
| Optimized matrix threshold | This matrix similarity is the optimized value, defined such that a minimum number of matches is found in non-regulatory test sequences (i.e. with this matrix similarity the number of false positive matches is minimized). This matrix similarity is used when the user checks "Optimized" as the matrix similarity threshold for MatInspector. |
| Length | Length of the matrix in base pairs. All matrices in a family are of the same (odd) length. |
| Nucleotide Distribution Matrix | The nucleotide distribution matrix shows the nucleotide frequencies observed in aligned binding sites of the corresponding transcription factor. |
| Profile | The profile of a matrix is a graphical representation of the Ci-vector, i.e. the degree of conservation at each position of the matrix. The IUPAC string consensus is a representation of the matrix based on the following rules (adapted from Cavener, Nucleic Acids Res. 15, 1353-1361, 1987): |
| Core | The core sequence of a matrix is defined as the (usually 4) most highly conserved consecutive positions of the matrix. |
| Sequence logo | A graphical representation of the matrix consensus generated using the algorithm described in |
| Statistical basis | The number of sites the matrix is based on. |
| Sites used to build the matrix | This shows the alignment of the sites that have been used to construct the matrix. It shows the names of the sites, the alignment, the matrix similarity score for each site and the reference(s) for the site. The matrix is built from the middle part of the alignment; any leading or trailing nucleotides are discarded in the process. Clicking on a site name will take you to the 'site result'. Clicking on a reference id will take you to the 'reference result'. |
| Sites rejected during matrix definition | These are sites that have been published as binding sites for a transcription factor but didn't fit in the overall alignment in the matrix construction process. To get a specific weight matrix description, these sites are left out. However, they are listed here for completeness. Clicking on a site name will take you to the 'site result'. Clicking on a reference id will take you to the 'reference result'. |
| Identical sites (not used for matrix generation) | These are sites with a sequence identical to one that has already been used in the alignment for the matrix. These sites wouldn't add any information to the matrix and are therefore only listed for completeness. Clicking on a site name will take you to the 'site result'. Clicking on a reference id will take you to the 'reference result'. |
| Ci-vector | The Ci-vector (consensus index vector) for the matrix represents the degree of conservation of each position within the matrix. The maximum Ci-value of 100 is reached by a position with total conservation of one nucleotide, whereas the minimum value of 0 only occurs at a position with equal distribution of all four nucleotides and gaps. |
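
To make the Ci-vector and core descriptions more concrete, here is a minimal Python sketch. It assumes an entropy-based Ci formula over five symbols (A, C, G, T and gap), chosen because it reproduces the two endpoints described above (100 for total conservation, 0 for an equal distribution); the exact formula MatBase uses is not stated on this page. The function names `ci_vector` and `core_start`, the input layout (one dict of counts per position), and the rule of picking the window with the highest summed Ci as the core are illustrative assumptions, not part of MatBase.

```python
import math

# Assumption: five symbols per position, the four nucleotides plus a gap.
SYMBOLS = ("A", "C", "G", "T", "-")


def ci_vector(columns):
    """Return one Ci-value (0..100) per matrix position.

    `columns` is a list of dicts mapping a symbol to its observed count
    at that position (hypothetical input layout).
    """
    ln5 = math.log(len(SYMBOLS))
    result = []
    for counts in columns:
        total = sum(counts.get(s, 0) for s in SYMBOLS)
        if total == 0:
            raise ValueError("column without any observations")
        weighted_log = 0.0
        for s in SYMBOLS:
            p = counts.get(s, 0) / total
            if p > 0:  # treat 0 * ln(0) as 0
                weighted_log += p * math.log(p)
        # Assumed formula: Ci = 100 / ln 5 * (sum_b p_b * ln p_b + ln 5)
        result.append(100.0 / ln5 * (weighted_log + ln5))
    return result


def core_start(ci, core_length=4):
    """Return the 0-based start of the consecutive window of
    `core_length` positions with the highest summed Ci (assumed rule)."""
    if len(ci) < core_length:
        raise ValueError("matrix shorter than the requested core")
    windows = range(len(ci) - core_length + 1)
    return max(windows, key=lambda i: sum(ci[i:i + core_length]))


# Toy nucleotide distribution matrix with six positions (made-up counts).
toy_columns = [
    {"A": 5, "C": 5, "G": 5, "T": 5},
    {"A": 20},
    {"C": 20},
    {"G": 20},
    {"T": 18, "A": 2},
    {"A": 5, "C": 5, "G": 5, "T": 5},
]
toy_ci = ci_vector(toy_columns)
toy_core = core_start(toy_ci)
```

In this toy matrix the three fully conserved positions each get a Ci-value of 100 (up to floating-point rounding), and the selected core window begins at the first of them. A position with equal counts for the four nucleotides but no gaps scores slightly above 0, because the minimum is only reached when gaps are equally frequent as well.
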

© 1998-2011 Genomatix Software GmbH - All rights reserved