
Background on the MatInspector Algorithm

MatInd: Creation of a Matrix

The program MatInd constructs a description of a consensus (e.g. of a transcription factor binding site) which consists of a nucleotide distribution matrix and a Ci-vector.

MatInd employs an alignment algorithm based on the method described by Frech et al. and creates the nucleotide distribution matrix by counting the bases at each position of the alignment.
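The counting step can be sketched in Python as follows. This is a minimal illustration, not MatInd's code: it assumes the sequences are already aligned (the Frech et al. alignment itself is not reproduced), that gaps are written as "-", and that any other letter such as N is simply ignored. The function name `count_matrix` is hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Assumed alphabet: the four nucleotides plus "-" for an alignment gap.
ALPHABET = ("A", "C", "G", "T", "-")


def count_matrix(alignment: List[str]) -> List[Dict[str, int]]:
    """Count each letter at every column of a pre-aligned set of sequences."""
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    matrix = []
    for i in range(length):
        column = Counter(seq[i].upper() for seq in alignment)
        # Letters outside ALPHABET (e.g. N) are dropped here.
        matrix.append({b: column[b] for b in ALPHABET})
    return matrix
```

Each entry of the returned list is one matrix position, holding the number of times each letter occurs in that alignment column.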

The Ci-vector is constructed by calculating the Ci-value for each position i of the matrix:

[1] $$\mathrm{Ci}(i) = \frac{100}{\ln 5}\left(\sum_{b \in \{A,C,G,T,\mathrm{gap}\}} P(i,b)\,\ln P(i,b) + \ln 5\right)$$

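where P(i,b) is the relative frequency of letter b at position i of the matrix, b runs over the four nucleotides A, C, G, T and the gap (hence ln 5), and 0 · ln 0 is taken as 0.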

The Ci-vector expresses the conservation of each position of the matrix as a numerical value and is used by MatInspector. Ci-values range from 0 to 100:

Ci = 100: a position with total conservation of one nucleotide
Ci = 0: a position with an equal distribution of all four nucleotides and gaps
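Formula [1] can be checked numerically with a short Python sketch. The helper names `ci_value` and `ci_vector` are hypothetical, each matrix position is assumed to be a dict of letter counts (as produced by counting the aligned bases), and 0 · ln 0 is treated as 0.

```python
import math
from typing import Dict, List

ALPHABET = ("A", "C", "G", "T", "-")  # four nucleotides plus gap, hence ln 5


def ci_value(counts: Dict[str, int]) -> float:
    """Ci-value of one matrix position, following formula [1]."""
    total = sum(counts.get(b, 0) for b in ALPHABET)
    if total == 0:
        raise ValueError("position has no counted letters")
    sum_p_ln_p = 0.0
    for b in ALPHABET:
        p = counts.get(b, 0) / total  # P(i,b)
        if p > 0:                     # 0 * ln 0 is taken as 0
            sum_p_ln_p += p * math.log(p)
    return 100.0 / math.log(5) * (sum_p_ln_p + math.log(5))


def ci_vector(matrix: List[Dict[str, int]]) -> List[float]:
    """Ci-value for every position i of the matrix."""
    return [ci_value(column) for column in matrix]
```

Because the gap counts as a fifth letter, a position with A, C, G and T at 25% each and no gaps still gets a Ci-value of about 13.9; only an even spread over all five letters reaches 0.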

MatInd also defines a core region within the matrix: the four consecutive positions with the highest sum of Ci-values. MatInspector uses this core region to preselect potential matches.
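Selecting the core region amounts to sliding a window of four positions along the Ci-vector and keeping the window with the largest sum. The sketch below uses 0-based positions and assumes ties are resolved in favour of the leftmost window, which the text does not specify; `core_region` is a hypothetical name.

```python
from typing import List


def core_region(ci: List[float], width: int = 4) -> int:
    """Start index of the `width` consecutive positions with the highest Ci-sum."""
    if len(ci) < width:
        raise ValueError("matrix is shorter than the core width")
    starts = range(len(ci) - width + 1)
    # max() keeps the first maximum, so ties go to the leftmost window.
    return max(starts, key=lambda s: sum(ci[s:s + width]))
```

For example, `core_region([10, 20, 90, 95, 100, 80, 5])` returns 2, i.e. positions 2 to 5 with a Ci-sum of 365.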


MatInspector Library

MatInspector's large library of more than 600 transcription factor binding site matrices was created with MatInd. It was compiled from published matrices, with emphasis on sequences with experimentally verified binding capacity.

The MatInspector library also includes information on


MatInspector: Search for Matrix Matches

MatInspector uses the nucleotide distribution matrix, the Ci-vector and the core region defined by MatInd to scan sequences of unlimited length for matches to the consensus matrix description.

  1. The search starts with an optional preselection in which only matches to the core region are considered. This reduces the number of candidate matches and speeds up the program.

    The core similarity is calculated for each position of the sequence:

    [2] $$\mathrm{core\_sim} = \frac{\sum_{j \in \mathrm{core}} \mathrm{score}(b,j)}{\sum_{j \in \mathrm{core}} \mathrm{max\_score}(j)}$$

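    where score(b,j) is the score that the sequence nucleotide b receives at core position j of the nucleotide distribution matrix, max_score(j) is the highest score at that position, and both sums run over the positions of the core region.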

  2. The matrix similarity is calculated only if the core similarity reaches the user-defined core similarity threshold:

    [3] $$\mathrm{mat\_sim} = \frac{\sum_{j} \mathrm{Ci}(j)\,\mathrm{score}(b,j)}{\sum_{j} \mathrm{Ci}(j)\,\mathrm{max\_score}(j)}$$

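    where Ci(j) is the Ci-value of matrix position j, score(b,j) is the score that the sequence nucleotide b receives at position j of the nucleotide distribution matrix, max_score(j) is the highest score at position j, and both sums run over all positions of the matrix.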

    The matrix similarity equals 1 only if the candidate sequence carries the most conserved nucleotide at each position of the matrix.

    Weighting each score by its Ci-value reflects the fact that mismatches at less conserved positions are tolerated more easily than mismatches at highly conserved positions.

    The output of MatInspector consists of those matches that reach the user-defined minimum core and matrix similarity. Optionally, the optimized matrix threshold of each matrix can be used as the cut-off criterion.

  3. In a further step, MatInspector compares the matches of matrices that belong to the same family. Of several overlapping matches from one family, only the best is listed in the output.
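The three search steps can be put together in a compact Python sketch. It illustrates formulas [2] and [3] under stated assumptions and is not MatInspector's implementation: score(b,j) is taken to be the count of nucleotide b in column j of the nucleotide distribution matrix, only the forward strand is scanned, the thresholds are supplied by the caller, and the family filter in step 3 uses a simple greedy rule (keep the match with the highest matrix similarity, drop any overlapping match of the same family). All function and field names are hypothetical.

```python
from typing import Dict, List, NamedTuple

Column = Dict[str, int]  # hypothetical layout: letter -> count at one matrix position


class Match(NamedTuple):
    matrix_name: str
    family: str
    start: int       # 0-based start in the sequence
    end: int         # exclusive end
    core_sim: float
    mat_sim: float


def core_similarity(matrix: List[Column], core_start: int, window: str,
                    core_width: int = 4) -> float:
    """Formula [2]: unweighted score over the core positions only."""
    core = range(core_start, core_start + core_width)
    score = sum(matrix[j].get(window[j], 0) for j in core)
    best = sum(max(matrix[j].values()) for j in core)
    return score / best


def matrix_similarity(matrix: List[Column], ci: List[float], window: str) -> float:
    """Formula [3]: every position weighted by its Ci-value."""
    score = sum(c * col.get(b, 0) for c, col, b in zip(ci, matrix, window))
    best = sum(c * max(col.values()) for c, col in zip(ci, matrix))
    return score / best


def scan(seq: str, name: str, family: str, matrix: List[Column], ci: List[float],
         core_start: int, core_threshold: float, mat_threshold: float) -> List[Match]:
    """Steps 1 and 2 on the forward strand of `seq`."""
    seq = seq.upper()
    width = len(matrix)
    hits = []
    for pos in range(len(seq) - width + 1):
        window = seq[pos:pos + width]
        cs = core_similarity(matrix, core_start, window)
        if cs < core_threshold:  # preselection; a threshold of 0.0 disables it
            continue
        ms = matrix_similarity(matrix, ci, window)
        if ms >= mat_threshold:
            hits.append(Match(name, family, pos, pos + width, cs, ms))
    return hits


def best_per_family(matches: List[Match]) -> List[Match]:
    """Step 3: of overlapping matches from one family, keep only the best."""
    kept: List[Match] = []
    for m in sorted(matches, key=lambda m: m.mat_sim, reverse=True):
        overlaps = any(k.family == m.family and k.start < m.end and m.start < k.end
                       for k in kept)
        if not overlaps:
            kept.append(m)
    return sorted(kept, key=lambda m: m.start)
```

The core similarity is evaluated first, and the Ci-weighted matrix similarity is computed only for windows that pass the core threshold, which is where the preselection saves time.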

Further information:

For further reading, please refer to the MatInspector publications.