
"Weight Matrix" or "Nucleotide Distribution Matrix"



A weight matrix is a selective description of a DNA pattern, based on the nucleotide distribution observed in a set of aligned DNA sites. MatDefine generates weight matrices fully automatically, and the Genomatix program MatInspector scans genomic sequences for matches to them. Both tools are integrated into GEMS Launcher.

Aligned sequences of a transcription factor binding site (SRF)
Sequence Name  Position  Str.  Alignment               Matrix Similarity
MMTFEZIF2      4 - 23    (+)   CG CCAT ATAAGGAGCAGGAA  0.855
MMTFEZIF1      4 - 23    (+)   CG CCTT ATATGGAGTGGCCC  0.901
HSACTCA2       4 - 23    (+)   GA CCAA ATAAGGCAAGGTGG  0.865
XLACTCAG3      4 - 23    (+)   TA CCAA ATAAGGGCAGGCTG  0.872
EBV            4 - 23    (+)   AG CCAT ATGTGGACAGATGG  0.941
GGACAREG1      4 - 23    (+)   CG CCTT CTTTGGGCAGCGCG  0.857
GGACAREG2      3 - 22    (+)   AC CCAA ATATGGCGACGGCC  0.904
HSACTBPR       3 - 22    (+)   GT CCTT ATATGGACTCATCT  0.914
HSVLC1         4 - 23    (+)   AT CCTT TTATGGCCCTGTCC  0.866
MMCYR61G       4 - 23    (+)   AC CCAA ATATGGAAATATTG  0.897
XLACTIN8A      4 - 23    (+)   GC CCAT ATTTGGCGATCTTC  0.955
XLACTIN5A      4 - 23    (+)   GC CCAT ATTTGGCGATCTTC  0.955
XLACTCAG1      3 - 22    (+)   AT CCCT ATTTGGCCATCCCT  0.926
HSACTCA3       3 - 22    (+)   CT CCCT ATTTGGCCATCCCC  0.940
HSACTCA4       3 - 22    (+)   TT CCTT ACATGGTCTGGGGG  0.843
XLACTCAG2      3 - 22    (+)   TT CCAT ACATGGGCTAAGGG  0.854
MMCFOS         3 - 22    (+)   GT CCAT ATTAGGACATCTGC  0.945
HSFOS          3 - 22    (+)   GT CCAT ATTAGGACATCTGC  0.945
MMKROX1        3 - 22    (+)   GT CCAT ATATGGGCAGCGAC  0.978
MMTFEZIF       3 - 22    (+)   TC CCAT ATATGGCCATGTAC  0.988
Profile of the nucleotide distribution matrix
[Figure: conservation profile per matrix position; vertical axis ticks at 25.0, 50.0, 75.0 and 100.0]

IUPAC:
n n C C A T a t w t g g n c a k s k n s n
  • Base pairs marked in red have a high information content, i.e. the matrix is highly conserved (Ci value > 60) at this position.
  • Base pairs in capital letters denote the core sequence used by MatInspector.
Weight matrix or nucleotide distribution matrix
Pos.      1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19    20
A         5     2     0     0    13     4    18     0    12     5     0     0     7     2    14     2     4     0     3     1
C         4     5    20    20     2     0     1     2     0     0     0     0     8    13     2     2     8     4     7    10
G         7     4     0     0     0     0     0     0     1     0    20    20     4     5     0     7     8     6     6     7
T         4     9     0     0     5    16     1    18     7    15     0     0     1     0     4     9     0    10     4     2
IUPAC     N     N     C     C     A     T     A     T     W     T     G     G     N     C     A     K     S     K     N     S
Ci     15.6  21.8 100.0 100.0  46.8  68.9  75.5  79.8  48.8  65.1 100.0 100.0  25.1  46.8  50.2  26.2  34.5  36.0  17.0  32.0
Sequence logo of the nucleotide distribution matrix
[Figure: sequence logo for the matrix]


The profile plot of a weight matrix visualizes the nucleotide conservation at each position. Typically, the weight matrix of a transcription factor binding site consists of a more highly conserved core (red) and additional, less conserved positions.
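
The conservation values in the Ci row can be recomputed from the nucleotide counts. The sketch below assumes the entropy-based definition Ci = (100 / ln 5) * (sum over bases of p * ln p + ln 5), where p is the relative frequency of a base at that position. This page does not state the formula, so treat it as an assumption; for the matrix shown here it gives values that agree with the Ci row after rounding to one decimal. The names COUNTS and ci_vector are illustrative only.

```python
import math

# Nucleotide counts per matrix position, copied from the weight matrix table.
COUNTS = {
    "A": [5, 2, 0, 0, 13, 4, 18, 0, 12, 5, 0, 0, 7, 2, 14, 2, 4, 0, 3, 1],
    "C": [4, 5, 20, 20, 2, 0, 1, 2, 0, 0, 0, 0, 8, 13, 2, 2, 8, 4, 7, 10],
    "G": [7, 4, 0, 0, 0, 0, 0, 0, 1, 0, 20, 20, 4, 5, 0, 7, 8, 6, 6, 7],
    "T": [4, 9, 0, 0, 5, 16, 1, 18, 7, 15, 0, 0, 1, 0, 4, 9, 0, 10, 4, 2],
}


def ci_vector(counts):
    """Return one conservation value (0 to 100) per matrix position.

    Assumed formula: Ci = 100 / ln(5) * (sum_b p_b * ln(p_b) + ln(5)).
    """
    values = []
    for i in range(len(counts["A"])):
        column = [counts[base][i] for base in "ACGT"]
        total = sum(column)
        # Bases that never occur contribute nothing (0 * ln 0 is taken as 0).
        plogp = sum((n / total) * math.log(n / total) for n in column if n > 0)
        values.append(100.0 / math.log(5) * (plogp + math.log(5)))
    return values


ci = ci_vector(COUNTS)
# Positions (1-based) above the "high conservation" threshold of 60.
conserved = [pos for pos, value in enumerate(ci, start=1) if value > 60]

print([round(value, 1) for value in ci])
print(conserved)
```

A fully conserved position (one base in all 20 sites) yields 100.0, and the value drops as the counts spread across the four bases.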

The weight matrix table shows how often each nucleotide (A, C, G, T) occurs at each position (Pos.) of the aligned sequences. The IUPAC string shown under the profile and in the table is only a very rough description of the DNA pattern. For instance, compare the height of the profile at the three positions that yield a "T" in the IUPAC string: positions 6, 8 and 10 have Ci values of 68.9, 79.8 and 65.1.
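
As a minimal sketch of how the table relates to the alignment, the code below counts the nucleotides column by column and then collapses each column into an IUPAC symbol. The consensus rule is an assumption, not something this page documents: a single base is used if it occurs in more than 50% of the sites and more than twice as often as the runner-up; a two-base code is used if the two most frequent bases together exceed 75%; otherwise N. For this alignment the rule reproduces the IUPAC row of the table. The function names are illustrative.

```python
# Alignment column of the table above, one SRF site per line.
ALIGNMENT = """
CG CCAT ATAAGGAGCAGGAA
CG CCTT ATATGGAGTGGCCC
GA CCAA ATAAGGCAAGGTGG
TA CCAA ATAAGGGCAGGCTG
AG CCAT ATGTGGACAGATGG
CG CCTT CTTTGGGCAGCGCG
AC CCAA ATATGGCGACGGCC
GT CCTT ATATGGACTCATCT
AT CCTT TTATGGCCCTGTCC
AC CCAA ATATGGAAATATTG
GC CCAT ATTTGGCGATCTTC
GC CCAT ATTTGGCGATCTTC
AT CCCT ATTTGGCCATCCCT
CT CCCT ATTTGGCCATCCCC
TT CCTT ACATGGTCTGGGGG
TT CCAT ACATGGGCTAAGGG
GT CCAT ATTAGGACATCTGC
GT CCAT ATTAGGACATCTGC
GT CCAT ATATGGGCAGCGAC
TC CCAT ATATGGCCATGTAC
"""
# Drop the visual spacing so every site is one 20-letter string.
SITES = [line.replace(" ", "") for line in ALIGNMENT.split("\n") if line.strip()]

# IUPAC two-base ambiguity codes, keyed by the alphabetically sorted pair.
TWO_BASE_CODES = {"AC": "M", "AG": "R", "AT": "W", "CG": "S", "CT": "Y", "GT": "K"}


def count_matrix(sites):
    """Count A, C, G and T in every alignment column."""
    counts = {base: [0] * len(sites[0]) for base in "ACGT"}
    for site in sites:
        for i, base in enumerate(site):
            counts[base][i] += 1
    return counts


def iupac_consensus(counts, n_sites):
    """Collapse each column into one IUPAC symbol (assumed consensus rule)."""
    symbols = []
    for i in range(len(counts["A"])):
        ranked = sorted("ACGT", key=lambda base: counts[base][i], reverse=True)
        first, second = counts[ranked[0]][i], counts[ranked[1]][i]
        if first > 0.5 * n_sites and first > 2 * second:
            symbols.append(ranked[0])
        elif first + second > 0.75 * n_sites:
            symbols.append(TWO_BASE_CODES["".join(sorted(ranked[:2]))])
        else:
            symbols.append("N")
    return "".join(symbols)


counts = count_matrix(SITES)
consensus = iupac_consensus(counts, len(SITES))
print(counts["T"])
print(consensus)
```

Note how much information the consensus discards: columns with 15, 16 and 18 T's out of 20 all become the same "T".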

Usually, how strongly a position within a protein binding site is conserved reflects its role in the function of the site. More highly conserved positions in the DNA site have a greater impact on binding strength and on specificity for the corresponding protein (a transcription factor, for instance). A weight matrix captures this profile most accurately by weighting each position according to its observed biological conservation. Therefore, a weight matrix is a very accurate description of a binding site.
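
To illustrate what weighting each position by its conservation can look like, the sketch below scores a candidate site against the matrix. It assumes that the matrix similarity is the Ci-weighted sum of the counts for the bases in the site, divided by the highest sum any site could reach. This page does not give the formula, so this is an assumption; for the MMTFEZIF site it yields about 0.988, in line with the alignment table. COUNTS and CI are copied from the weight matrix table; matrix_similarity is a hypothetical helper name.

```python
# Counts and Ci values copied from the weight matrix table.
COUNTS = {
    "A": [5, 2, 0, 0, 13, 4, 18, 0, 12, 5, 0, 0, 7, 2, 14, 2, 4, 0, 3, 1],
    "C": [4, 5, 20, 20, 2, 0, 1, 2, 0, 0, 0, 0, 8, 13, 2, 2, 8, 4, 7, 10],
    "G": [7, 4, 0, 0, 0, 0, 0, 0, 1, 0, 20, 20, 4, 5, 0, 7, 8, 6, 6, 7],
    "T": [4, 9, 0, 0, 5, 16, 1, 18, 7, 15, 0, 0, 1, 0, 4, 9, 0, 10, 4, 2],
}
CI = [15.6, 21.8, 100.0, 100.0, 46.8, 68.9, 75.5, 79.8, 48.8, 65.1,
      100.0, 100.0, 25.1, 46.8, 50.2, 26.2, 34.5, 36.0, 17.0, 32.0]


def matrix_similarity(site, counts, ci):
    """Score a site between 0 and 1 (assumed Ci-weighted definition)."""
    # Weighted score actually reached by the bases in this site.
    achieved = sum(ci[i] * counts[base][i] for i, base in enumerate(site))
    # Weighted score of a site that has the most frequent base everywhere.
    best = sum(
        ci[i] * max(counts[base][i] for base in "ACGT") for i in range(len(ci))
    )
    return achieved / best


# The MMTFEZIF site from the alignment table, spaces removed.
score = matrix_similarity("TCCCATATATGGCCATGTAC", COUNTS, CI)
print(round(score, 3))
```

Under this scheme a mismatch at a highly conserved position (Ci near 100) costs far more than one at a weakly conserved flank, which is the behaviour the paragraph above describes.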