
Matrix family


Matrix families are groups of weight matrices for the same or functionally similar transcription factors. Why keep several weight matrices for the same factor at all? Each matrix description is based on different training data originating from independent publications, so it is reasonable to keep them separate.

If you use MatInspector to search for potential transcription factor binding sites, similar matrices for one factor can produce multiple matches at the same position, or matches shifted by only a few base pairs if the corresponding matrix descriptions overlap only partially or differ in length. After identifying all individual matches, MatInspector therefore applies a further step: it compares the matches of matrices that belong to the same family, which substantially reduces the number of reported matches without losing sensitivity. For more information on the matrix family concept, please see our publication: Cartharius et al. (2005), "MatInspector and beyond: promoter analysis based on transcription factor binding sites", Bioinformatics 21, pp. 2933-2942.

In MatBase, matrix families also serve as the connecting element between transcription factors and all other data; the family pages list additional information such as domains, functional modules, and tissues.
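
The family-level filtering step can be pictured with a small sketch. The exact procedure MatInspector uses is not described on this page, so everything below is an assumption made for illustration: the `Match` record, the overlap rule (same strand, intersecting intervals), the "keep the highest-scoring match" policy, and the example matrix names are hypothetical and do not represent the actual MatInspector algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    family: str   # matrix family, e.g. "V$BRNF"
    matrix: str   # matrix name (illustrative names only)
    start: int    # 0-based start position, inclusive
    end: int      # end position, exclusive
    strand: str   # "+" or "-"
    score: float  # matrix similarity of the match


def overlaps(a: Match, b: Match) -> bool:
    """Assumed overlap rule: same strand and intersecting intervals."""
    return a.strand == b.strand and a.start < b.end and b.start < a.end


def filter_by_family(matches):
    """Keep only the best-scoring match among overlapping matches of one family."""
    kept = []
    # Visit matches from best to worst score, so a weaker match is dropped
    # whenever a stronger match of the same family already covers the site.
    for m in sorted(matches, key=lambda m: -m.score):
        if not any(k.family == m.family and overlaps(k, m) for k in kept):
            kept.append(m)
    return sorted(kept, key=lambda m: (m.start, m.family))


# Made-up example data: two matrices of one family hit nearly the same site.
raw = [
    Match("V$BRNF", "V$BRN2.01", 100, 119, "+", 0.92),
    Match("V$BRNF", "V$BRN3.01", 102, 121, "+", 0.88),
    Match("V$OCT1", "V$OCT1.01", 101, 116, "+", 0.90),
    Match("V$BRNF", "V$BRN2.01", 300, 319, "+", 0.85),
]
filtered = filter_by_family(raw)
for m in filtered:
    print(m.family, m.matrix, m.start, m.score)
```

In this toy input the two shifted V$BRNF matches collapse into one, while the match of the other family at almost the same position is kept, because the comparison is only made within a family.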

Matrix family overview

The matrix family overview is what you get when you select the matrix family category or any of its subcategories (like 'fungi', 'vertebrates', etc.) for browsing. It lists all available families and the matrices contained in them. The screenshot below shows the first five vertebrate families and their weight matrices.

[Screenshot: family overview]

So, what is listed in the overview?

Family: Shows the family names. Clicking one of them takes you to a family result page, as explained below. The color in front of the name is the one used to color the family's matches in the graphical representation of MatInspector output.
Family Information: Gives a short description of the transcription factors the family is based on.
Matrix name: Lists the names of the matrices in each family. Clicking on the name will take you to a matrix result page.
Matrix information: Similar to the family information, this is a short description of the specific transcription factor binding site found by the matrix.
RE: The Re-value (random expectation value) of the matrix, a statistical value explained in detail in the 'Matrix' section.
opt: Lists the optimized threshold of the matrix used by MatInspector. Again, please see the 'Matrix' section for a detailed explanation.

Matrix family result

The Matrix family result page shows the information for one specific family. Below is a screenshot for the 'V$BRNF' family. Depending on the available data some parts of the page might be missing or look different for other families (e.g. if a family is not part of a promoter module, 'Modules' will be absent).

[Screenshot: family result]

Information on the result page:

Family Name: The family name is a unique identifier for the family. The first letter denotes the section the family belongs to. Currently these sections are available in MatBase: fungi (F$), insects (I$), plants (P$), vertebrates (V$), other (O$), and miscellaneous, which is made up of bacterial (B$), nematode (N$), and viral (W$) matrices. While the matrices in the 'miscellaneous' section are based on transcription factors or similar proteins, the matrices in the 'other' section can be used to find other regulatory DNA patterns, such as poly-A or initiator signals. The separating '$' is followed by a four-letter acronym describing the family.
Description: A more human-readable description of the family, usually a quick overview of the transcription factors the matrix family is based on.
Transcription factors: Lists the genes for transcription factors represented by the family (i.e. factors binding to DNA patterns described by one or more of the matrices of the family). They are grouped by organism and identified by the official symbol from NCBI's EntrezGene database. Clicking on a symbol will take you to the 'transcription factor result' for the gene. For additional information on the gene, you can go directly to the 'More gene info' page of ElDorado or to the EntrezGene page for the gene. For some plant genes there are also links to the Mendel Genome Database.
Binding domains: Lists the binding domain(s) common to the transcription factors represented by the family. Clicking on the name of a domain will take you to the 'binding domain result'.
GO term(s): Lists all Gene Ontology term(s) associated with at least one transcription factor that is represented by the family. Clicking on any of the terms will take you to the 'GO term result'. General GO terms like "transcription" or "transcription regulation" have been removed from this list, as they add no information here.
Tissues: For most of the vertebrate families there is also information on the tissues associated with the transcription factors represented by the matrix family. The tissue association has been determined by evaluating all PubMed abstracts (co-citations of transcription factors and tissues).

The transcription factors are grouped into three tissue classes:
  • ubiquitous: transcription factors that are expressed in all tissues.
  • non exclusively associated: transcription factors that are associated with the tissues listed, but not exclusively; they may also be expressed in other tissues.
  • preferentially associated: transcription factors that are expressed preferentially in the tissues listed.

Clicking on a tissue name will take you to the 'tissue result'.
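
The naming scheme described under 'Family Name' (a section letter, the separator '$', then a four-letter acronym) is simple to take apart mechanically. The following is a minimal sketch, not part of MatBase: the function name `parse_family_name` is hypothetical, and the pattern assumes the acronym consists of uppercase letters or digits, which this page does not state explicitly.

```python
import re

# Section prefixes as listed under 'Family Name'.
SECTIONS = {
    "F": "fungi",
    "I": "insects",
    "P": "plants",
    "V": "vertebrates",
    "O": "other",
    "B": "miscellaneous (bacterial)",
    "N": "miscellaneous (nematode)",
    "W": "miscellaneous (viral)",
}

# Assumption: the acronym is exactly four uppercase letters or digits.
FAMILY_RE = re.compile(r"^([A-Z])\$([A-Z0-9]{4})$")


def parse_family_name(name: str):
    """Split a family name into (prefix, section, acronym)."""
    m = FAMILY_RE.match(name)
    if m is None or m.group(1) not in SECTIONS:
        raise ValueError(f"not a recognized family name: {name!r}")
    prefix, acronym = m.groups()
    return prefix, SECTIONS[prefix], acronym


print(parse_family_name("V$BRNF"))
```

For 'V$BRNF', the family used in the screenshot on this page, the sketch reports the vertebrate section and the acronym 'BRNF'.
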
Promoter matches: The value given is the percentage of promoters in which a match to the matrix family is found with optimized matrix similarity. To determine the promoter matches, promoter sequences are extracted from ElDorado. The following promoter sequences are scanned for the different matrix groups:

Starting with MatBase 10.0

  • Vertebrates: 375,000 human, mouse, and rat promoter sequences with an average length of 1184 bp
  • General Core Promoter Elements: 375,000 human, mouse, and rat promoter sequences with an average length of 1184 bp
  • Plants: 82,000 promoter sequences of Arabidopsis thaliana and rice with an average length of 1159 bp
  • Insects: 21,000 promoter sequences of Drosophila melanogaster with an average length of 1120 bp
  • Fungi: 11,800 yeast promoter sequences with an average length of 1105 bp

Up to MatBase 9.4

  • Vertebrates: 366,000 human, mouse, and rat promoter sequences with an average length of 661 bp
  • General Core Promoter Elements: 366,000 human, mouse, and rat promoter sequences with an average length of 661 bp
  • Plants: 70,000 promoter sequences of Arabidopsis thaliana and rice with an average length of 625 bp
  • Insects: 21,000 promoter sequences of Drosophila melanogaster with an average length of 617 bp
  • Fungi: 11,800 yeast promoter sequences with an average length of 603 bp
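
As a worked illustration of the 'Promoter matches' value, the percentage can be computed from per-promoter match counts. This is a sketch under assumptions: the function name, the input format (a mapping from promoter identifier to the number of family matches found at the optimized threshold), and the numbers are invented for the example and are not MatBase data.

```python
def promoter_match_percentage(match_counts):
    """Percentage of promoters with at least one match to the family.

    match_counts maps a promoter identifier to the number of matches
    found in that promoter (hypothetical input format).
    """
    if not match_counts:
        raise ValueError("no promoters given")
    with_match = sum(1 for n in match_counts.values() if n > 0)
    return 100.0 * with_match / len(match_counts)


# Made-up example: 2 of 4 promoters contain at least one match.
counts = {"promoter_1": 2, "promoter_2": 0, "promoter_3": 1, "promoter_4": 0}
print(promoter_match_percentage(counts))  # 50.0
```

Note that a promoter with several matches still counts only once, since the value describes the share of promoters, not the number of matches.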

Match numbers: This table contains the absolute number of matches and the number of matches per 1,000 base pairs of the matrix in the genome and promoter sequences for each species listed. Please note that for some species only the numbers for promoters are given, as no completely assembled genome is available yet.
Length: Length of the matrix family in base pairs. All matrices in a family have the same (odd) length.
Matrices: Shows all the matrices belonging to the family, including the Re-value, the optimized threshold, and the statistical basis of the matrix. Clicking on the matrix name will take you to the 'matrix result'.
Re-value: The random expectation value of the matrix, a statistical value explained in detail in the 'Matrix' section.
Opt: Lists the optimized threshold of the matrix used by MatInspector. Please see the 'Matrix' section for a detailed explanation.
(aligned) Matrix iupac(s): Shows a multiple alignment of the consensus sequences of all matrices belonging to the family. The anchor position used for marking the position in MatInspector is printed in boldface.
Modules: Experimentally verified promoter modules that include this matrix family as one element.

Promoter modules are functional elements consisting of at least two transcription factor binding sites that have been shown to act synergistically or antagonistically.

Clicking on the module name will take you to the 'module result'.

Function: Explains the specific function of the listed module.
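
The "matches per 1,000 base pairs" figure in the 'Match numbers' table is a simple normalization of the absolute match count by the amount of sequence scanned. A minimal sketch of that arithmetic follows; the function name and the numbers are invented for illustration and are not taken from MatBase.

```python
def matches_per_kb(num_matches, total_bp):
    """Number of matches per 1,000 base pairs of scanned sequence."""
    if total_bp <= 0:
        raise ValueError("total_bp must be positive")
    return 1000.0 * num_matches / total_bp


# Made-up example: 1,500 matches in 3,000,000 bp of scanned sequence.
density = matches_per_kb(1500, 3_000_000)
print(density)  # 0.5
```

Normalizing this way makes the genome and promoter columns comparable even though the two sequence sets differ greatly in total length.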
