Available Matrix Information


Matrix Name: Each MatInspector matrix name starts with an identifier that indicates one of the following seven groups:
  • vertebrates (V$)
  • insects (I$)
  • plants (P$)
  • fungi (F$)
  • nematodes (N$)
  • bacteria (B$)
  • general core promoter elements (O$)
followed by an acronym for the factor the matrix refers to, and a consecutive number that distinguishes different matrices for the same factor. Thus, V$OCT1.02 indicates the second matrix for the vertebrate Oct-1 factor.
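The naming scheme above can be split into its three parts programmatically. The following is a minimal sketch, not part of MatInspector; the function name `parse_matrix_name` and the exact character set allowed in the factor acronym are assumptions made for illustration.

```python
import re
from typing import NamedTuple

# Group prefixes as listed above.
GROUPS = {
    "V": "vertebrates",
    "I": "insects",
    "P": "plants",
    "F": "fungi",
    "N": "nematodes",
    "B": "bacteria",
    "O": "general core promoter elements",
}


class MatrixName(NamedTuple):
    group: str    # group the matrix belongs to
    factor: str   # acronym of the factor
    number: int   # consecutive number of the matrix for this factor


# Assumed pattern: group letter, "$", factor acronym, ".", consecutive number.
_NAME_RE = re.compile(r"^([VIPFNBO])\$([A-Za-z0-9_]+)\.(\d+)$")


def parse_matrix_name(name: str) -> MatrixName:
    """Split a matrix name such as 'V$OCT1.02' into group, factor and number."""
    m = _NAME_RE.match(name)
    if m is None:
        raise ValueError(f"not a recognised matrix name: {name!r}")
    letter, factor, number = m.groups()
    return MatrixName(GROUPS[letter], factor, int(number))
```

With this sketch, `parse_matrix_name("V$OCT1.02")` yields the group "vertebrates", the factor "OCT1" and the number 2.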
Description: Further information for a matrix or matrix family.
Family: Each matrix belongs to a matrix family, in which functionally similar matrices are grouped together. If the family option is selected, MatInspector uses this grouping to eliminate redundant matches.

For example, the matrix family V$NFKB includes 5 similar matrices for NFkappaB (V$NFKAPPAB.01, V$NFKAPPAB.02, V$NFKAPPAB.03, V$NFKAPPAB50.01, V$NFKAPPAB65.01) as well as 1 matrix for the NFkappaB-related factor c-Rel (V$CREL.01).
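To illustrate how grouping by family can remove redundant matches, here is a minimal sketch. It assumes that two matches are redundant when they belong to the same family and share the same anchor position and strand, and that the match with the highest matrix similarity is kept. The criterion MatInspector actually applies is not stated on this page; the `Match` record, its fields and the similarity values below are hypothetical.

```python
from typing import NamedTuple


class Match(NamedTuple):
    matrix: str        # e.g. "V$NFKAPPAB.01"
    family: str        # e.g. "V$NFKB"
    anchor: int        # anchor position of the match in the sequence
    strand: str        # "+" or "-"
    similarity: float  # matrix similarity of the match


def best_per_family(matches):
    """Keep only the best-scoring match per (family, anchor, strand)."""
    best = {}
    for m in matches:
        key = (m.family, m.anchor, m.strand)
        if key not in best or m.similarity > best[key].similarity:
            best[key] = m
    # Return the surviving matches in sequence order.
    return sorted(best.values(), key=lambda m: (m.anchor, m.family, m.strand))


# Hypothetical matches: two NFkappaB matrices hit the same site.
hits = [
    Match("V$NFKAPPAB.01", "V$NFKB", 120, "+", 0.91),
    Match("V$CREL.01", "V$NFKB", 120, "+", 0.95),
    Match("V$OCT1.02", "V$OCT1", 200, "-", 0.88),
]
filtered = best_per_family(hits)
```

Here `filtered` contains one V$NFKB match (V$CREL.01, the higher-scoring one) and the V$OCT1 match.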

Transcription factors: List of transcription factors that are represented by this matrix family.

The following information is given for each transcription factor:

  • organism (e.g. Homo sapiens, Mus musculus)
  • official gene symbol
  • GeneID (linked to a site where more information on the transcription factor is provided, e.g. Entrez Gene or TAIR)
Tissues: Tissues that are associated with the transcription factors represented by the matrix family. The tissue association has been determined by evaluation of all PubMed abstracts (co-citations of transcription factors and tissues).

The transcription factors are grouped into three tissue classes:

  • ubiquitous: transcription factors that are expressed in all tissues.
  • non-exclusively associated: transcription factors that are associated with the tissues listed, but not exclusively; they may also be expressed in other tissues.
  • preferentially associated: transcription factors that are expressed preferentially in the tissues listed.

List of available tissues:

Adipose Tissue; Adrenal Glands; Antibody-Producing Cells; Antigen-Presenting Cells
Bladder; Blastomeres; Blood Cells; Blood Platelets
Bone Marrow Cells; Bone and Bones; Brain; Breast
Cardiovascular System; Cartilage; Central Nervous System; Connective Tissue
Digestive System; Ear; Embryonic Structures; Endocrine System
Erythrocytes; Eye; Germ Cells; Granulocytes
Heart; Hematopoietic System; Hemocytes; Immune System
Integumentary System; Islets of Langerhans; Kidney; Leukocytes
Leydig Cells; Liver; Lung; Luteal Cells
Lymphocytes; Monocytes; Muscle, Skeletal; Muscle, Smooth
Muscles; Myeloid Cells; Myocardium; Nervous System
Neuroglia; Neurons; Nose; Ovary
Pancreas; Parathyroid Glands; Phagocytes; Pineal Gland
Pituitary Gland; Prostate; Respiratory System; Skeleton
Spinal Cord; Testis; Thymus Gland; Thyroid Gland
Ubiquitous; Urogenital System

Tissues are assigned to matrix families, not individual matrices. The tissue associations of matrix families are determined by automatic evaluation of all PubMed abstracts (co-citations of transcription factors and tissues) and subsequent manual curation.

Note: Up to now, tissue association is only available for vertebrate matrices.

Modules: Experimentally verified promoter modules that include this matrix family as one element.

Promoter modules are functional elements consisting of at least two transcription factor binding sites that have been shown to act synergistically or antagonistically.

References: References for the original source of sequences/oligonucleotides for the matrix with author, title and citation.
Statistical basis: Shows more information about how many sequences/oligonucleotides were used to create the matrix description.
Random expectation (re-value): The re-value for each individual matrix gives an expectation value for the number of matches per 1000 base pairs of random DNA sequence (that is, it indicates how well a matrix is defined).

Since there are binding sites that are biologically quite "loosely" defined, a high re-value is not necessarily a sign of a "bad" matrix description. A very low re-value might even be a sign of a description that is too strict.
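Because the re-value is expressed per 1000 base pairs, it can be scaled to a sequence of any length. The following is a minimal sketch of that arithmetic; the function name and the example numbers are hypothetical, and the sketch assumes the expectation scales linearly with sequence length.

```python
def expected_random_matches(re_value: float, length_bp: int) -> float:
    """Expected number of chance matches in a random sequence of length_bp.

    re_value is the expected number of matches per 1000 bp of random DNA.
    """
    if re_value < 0 or length_bp < 0:
        raise ValueError("re_value and length_bp must be non-negative")
    return re_value * length_bp / 1000.0


# Hypothetical matrix with an re-value of 0.5, scanned against 2000 bp.
chance_matches = expected_random_matches(0.5, 2000)
```

In this example, about one match would be expected by chance alone, so a single match in a 2000 bp sequence carries little weight on its own.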

Promoter matches: The value given is the percentage of promoters in which a match to the matrix / matrix family is found with optimized matrix similarity. In order to determine the promoter matches, promoter sequences are extracted from ElDorado. The following promoter sequences are scanned for the different matrix groups:
  • Vertebrates: 366,000 human, mouse, and rat promoter sequences with an average length of 661 bp
  • General Core Promoter Elements: 366,000 human, mouse, and rat promoter sequences with an average length of 661 bp
  • Plants: 70,000 promoter sequences of Arabidopsis thaliana and rice with an average length of 625 bp
  • Insects: 21,000 promoter sequences of Drosophila melanogaster with an average length of 617 bp
  • Fungi: 11,800 yeast promoter sequences with an average length of 603 bp

Optimized matrix threshold: The optimized matrix similarity threshold is chosen so that a minimum number of matches is found in non-regulatory test sequences (i.e. with this matrix similarity the number of false positive matches is minimized).

This matrix similarity is used when the user checks "Optimized" as the matrix similarity threshold for MatInspector.

Length: Length of the matrix / matrix family in base pairs. All matrices in a family have the same, odd length.
Anchor: All matrices in a family have the same, odd length and an assigned anchor position, which is the center position of the matrix. This ensures that the matrices of a family match at exactly the same position.
Nucleotide Distribution Matrix: The nucleotide distribution matrix shows the nucleotide frequencies observed in aligned binding sites of the corresponding transcription factor.
Profile: The profile of a matrix is a graphical representation of the Ci-vector, i.e. the degree of conservation at each position of the matrix.

The IUPAC string consensus is a representation of the matrix based on the following rules (adapted from Cavener, Nucleic Acids Res. 15, 1353-1361, 1987):

  • A single nucleotide (A,C,G,T) is shown if its frequency is greater than 50% and at least twice as high as the second most frequent nucleotide.
  • A double-degenerate code (R,Y,K,M,S,W) indicates that the corresponding two nucleotides occur in more than 75% of the underlying sequences but each of them is present in less than 50%.
  • A triple-degenerate code (B,D,H,V) is shown if only one of the nucleotides does not appear at all.
  • All other frequency distributions are represented by the letter "N".
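The four rules above can be applied position by position to a nucleotide distribution matrix. The following is a minimal sketch that follows the rules as worded here, tested in the order listed; the function names and the input layout (one dictionary of nucleotide counts per position) are assumptions for illustration.

```python
# Two-nucleotide and three-nucleotide IUPAC codes.
DOUBLE = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("GT"): "K",
    frozenset("AC"): "M", frozenset("CG"): "S", frozenset("AT"): "W",
}
# Triple code keyed by the one nucleotide that is absent.
TRIPLE_WITHOUT = {"A": "B", "C": "D", "G": "H", "T": "V"}


def iupac_symbol(counts):
    """Return the IUPAC symbol for one position given counts of A, C, G, T."""
    c = {n: counts.get(n, 0) for n in "ACGT"}
    total = sum(c.values())
    if total == 0:
        return "N"
    ranked = sorted(c, key=c.get, reverse=True)
    first, second = ranked[0], ranked[1]
    # Rule 1: frequency > 50% and at least twice the second most frequent.
    if 2 * c[first] > total and c[first] >= 2 * c[second]:
        return first
    # Rule 2: two nucleotides together > 75%, each of them below 50%.
    if 4 * (c[first] + c[second]) > 3 * total and 2 * c[first] < total:
        return DOUBLE[frozenset((first, second))]
    # Rule 3: exactly one nucleotide does not appear at all.
    absent = [n for n in "ACGT" if c[n] == 0]
    if len(absent) == 1:
        return TRIPLE_WITHOUT[absent[0]]
    # Rule 4: everything else.
    return "N"


def iupac_consensus(positions):
    """Build the consensus string for a list of per-position count dicts."""
    return "".join(iupac_symbol(p) for p in positions)
```

Comparisons are done on counts rather than on floating-point frequencies so that borderline cases such as exactly 50% are handled exactly.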
Core: The core sequence of a matrix is defined as the (usually 4) most highly conserved consecutive positions of the matrix.
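One way to locate such a core is a sliding window over the per-position conservation values (Ci-values). The following is a minimal sketch under two assumptions that this page does not confirm: "most highly conserved" is taken to mean the window with the largest sum of Ci-values, and ties go to the leftmost window. The function name and the example values are hypothetical.

```python
def core_start(ci_vector, core_length=4):
    """Return the 0-based start of the most conserved window of core_length."""
    if core_length <= 0 or len(ci_vector) < core_length:
        raise ValueError("matrix is shorter than the requested core length")
    # Sum of the first window, then slide one position at a time.
    window = sum(ci_vector[:core_length])
    best_sum, best_start = window, 0
    for start in range(1, len(ci_vector) - core_length + 1):
        window += ci_vector[start + core_length - 1] - ci_vector[start - 1]
        if window > best_sum:
            best_sum, best_start = window, start
    return best_start


# Hypothetical Ci-values for a 9-position matrix.
ci = [20, 35, 90, 100, 100, 95, 40, 10, 30]
start = core_start(ci)
core_positions = list(range(start, start + 4))
```

For these values the core covers positions 2 to 5 (0-based), where the Ci-values are 90, 100, 100 and 95.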
Ci-vector: The Ci-vector (consensus index vector) for the matrix represents the degree of conservation of each position within the matrix. The maximum Ci-value of 100 is reached by a position with total conservation of one nucleotide, whereas the minimum value of 0 only occurs at a position with equal distribution of all four nucleotides and gaps.
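The exact formula for the Ci-value is not given on this page. The sketch below assumes an entropy-based form over the five symbols A, C, G, T and gap, normalised by ln 5, which reproduces both boundary cases described above (100 for total conservation, 0 for an equal distribution of nucleotides and gaps). Treat the formula as an assumption and check it against the MatInspector publications; the function name and input layout are hypothetical.

```python
import math

SYMBOLS = ("A", "C", "G", "T", "-")  # four nucleotides plus gap


def ci_value(counts):
    """Conservation of one position, from 0 (uniform) to 100 (conserved).

    Assumed form: Ci = 100 / ln 5 * (sum_b p_b * ln p_b + ln 5),
    where p_b is the relative frequency of symbol b at this position.
    """
    total = sum(counts.get(s, 0) for s in SYMBOLS)
    if total == 0:
        raise ValueError("position has no observations")
    entropy_term = 0.0
    for s in SYMBOLS:
        p = counts.get(s, 0) / total
        if p > 0:  # p * ln p tends to 0 as p tends to 0
            entropy_term += p * math.log(p)
    return 100.0 / math.log(5) * (entropy_term + math.log(5))


def ci_vector(positions):
    """Ci-value for each position of a nucleotide distribution matrix."""
    return [ci_value(p) for p in positions]
```

A position observed only as "A" gives a value of 100, while a position with equal counts for all five symbols gives 0, up to floating-point rounding.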
Sequence logo: A graphical representation of the matrix consensus, generated using the algorithm described in:
  • Crooks GE, Hon G, Chandonia JM, Brenner SE (2004).
    WebLogo: A sequence logo generator
    Genome Research 14, 1188-90
  • Schneider TD, Stephens RM (1990).
    Sequence Logos: A New Way to Display Consensus Sequences
    Nucleic Acids Res. 18, 6097-100
The WebLogo source code is available at http://weblogo.threeplusone.com/

Further information:

For further reading please refer to the MatInspector publications.