
"Weight Matrix"
or "Nucleotide Distribution Matrix"



Weight matrices are selective descriptions of DNA patterns, based on the nucleotide distribution observed in aligned DNA sites. MatDefine generates weight matrices fully automatically, and the Genomatix program MatInspector scans genomic sequences for matches to them. Both tools are integrated into GEMS Launcher.

Aligned sequences of a transcription factor binding site (yeast ABF)
Name Alignment
SCPLASM TATCTTTGTTAACGA
SCCOXCH2 GATCATTCCCAACGA
SCS33AA_INV GGTCACTCTAGACGG
M28606_INV TATCATTGCAAACGT
SCPHO5_INV CATCGTTAATGACGT
SCRGL2 TATCACGTCACACGA
SCPK01 CATCTCTCGCAACGG
SCUBCOX8_INV AGTCACGTGGAACGG
SCBAF1 CATCCCCATTAACGA
SCANB1RE_1 AATCATATTCGACGA
SCHIS3G_DED2 TGTCATTCTGAACGA
SCTMC1A AATCGTTTTGTACGT
SCHIS3G_DED1 CATCATTCTATACGT
SCRPO31 CATCACTATATACGT
SCANB1RE_2 TGTCGTCTCACACGG
SCMAT3_INV TATCGCCATATACGA
SCRPC40_INV AGTCACTATAAACGG
SCBTUB_2 GGTCACGATATACGT
SCBTUB_1_INV GGTCACTGTACACGT
SCMAT4 CATCATAAAATACGA
CHRIII_2 AATCACGAGCGACGG
SCENOC TGTCACTAACGACGT
Profile of the nucleotide distribution matrix
[Profile plot: conservation per position, vertical axis from 25.0 to 100.0. IUPAC string shown under the plot: n n r t c a y t n t n n A C G N n n n]
Weight matrix or nucleotide distribution matrix
Pos.      1     2     3     4     5     6     7     8     9    10    11    12    13    14    15
A         5    14     0     0    15     0     2     9     3    11     8    22     0     0     8
C         6     0     0    22     1    12     3     5     4     5     3     0    22     0     0
G         4     8     0     0     4     0     4     3     3     3     5     0     0    22     6
T         7     0    22     0     2    10    13     5    12     3     6     0     0     0     8
IUPAC     N     R     T     C     A     Y     T     N     T     N     N     A     C     G     N
Ci     15.2  59.3 100.0 100.0  42.2  57.2  31.0  18.6  26.4  23.8  17.3 100.0 100.0 100.0  32.3

The profile plot of a weight matrix visualizes how strongly the nucleotide conservation differs from position to position. The more asterisks (*) at a position, the higher its conservation. Typically, the weight matrix of a transcription factor binding site consists of a highly conserved core (red) and additional, less conserved positions.
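The conservation values in the Ci row of the table can be recomputed from the nucleotide counts. The sketch below assumes that Ci is an entropy-based index of the form Ci = 100 / ln 5 * (sum over b of P(b) * ln P(b) + ln 5), where P(b) is the relative frequency of nucleotide b at the position. This page does not state the formula; it is an assumption that reproduces the tabulated Ci values within rounding. The names COLUMNS and conservation_index are hypothetical.

```python
import math

# Count matrix copied from the weight matrix table:
# one (A, C, G, T) tuple per position, 15 positions, 22 sequences each.
COLUMNS = [
    (5, 6, 4, 7), (14, 0, 8, 0), (0, 0, 0, 22), (0, 22, 0, 0), (15, 1, 4, 2),
    (0, 12, 0, 10), (2, 3, 4, 13), (9, 5, 3, 5), (3, 4, 3, 12), (11, 5, 3, 3),
    (8, 3, 5, 6), (22, 0, 0, 0), (0, 22, 0, 0), (0, 0, 22, 0), (8, 0, 6, 8),
]


def conservation_index(column):
    """Assumed Ci formula: 100 for a fully conserved position, lower otherwise."""
    total = sum(column)
    # Zero counts contribute nothing (p * ln p tends to 0 as p tends to 0).
    p_log_p = sum((c / total) * math.log(c / total) for c in column if c > 0)
    return 100.0 / math.log(5) * (p_log_p + math.log(5))


CI = [conservation_index(column) for column in COLUMNS]
print(" ".join(f"{value:.1f}" for value in CI))
```

Under this assumption, positions 3, 4, 12, 13, and 14 (a single nucleotide in all 22 sequences) reach the maximum of 100, while a position with nearly even counts such as position 1 stays close to the bottom of the scale.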

The weight matrix table shows the frequencies of the nucleotides (A, C, G, T) at each position (Pos.) of the aligned sequences. The IUPAC string shown under both the weight matrix profile and the weight matrix table is only a very rough description of the DNA pattern. For instance, compare the height of the profile at the three positions that yield a "T" in the IUPAC string (positions 3, 7, and 9): their Ci values are 100.0, 31.0, and 26.4.
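The counts in the table can be derived directly from the aligned ABF sites. The sketch below counts nucleotides column by column and then derives an IUPAC string. The IUPAC rule is an assumption, since this page does not state one: a single nucleotide is used if it occurs in more than 50% of the sequences and more than twice as often as the second most frequent nucleotide; otherwise a two-nucleotide code is used if the two most frequent nucleotides together cover more than 75%; otherwise N. This rule reproduces the IUPAC row of the table. The function names are hypothetical.

```python
from collections import Counter

# The 22 aligned yeast ABF sites from the alignment table above.
ALIGNMENT = [
    "TATCTTTGTTAACGA", "GATCATTCCCAACGA", "GGTCACTCTAGACGG", "TATCATTGCAAACGT",
    "CATCGTTAATGACGT", "TATCACGTCACACGA", "CATCTCTCGCAACGG", "AGTCACGTGGAACGG",
    "CATCCCCATTAACGA", "AATCATATTCGACGA", "TGTCATTCTGAACGA", "AATCGTTTTGTACGT",
    "CATCATTCTATACGT", "CATCACTATATACGT", "TGTCGTCTCACACGG", "TATCGCCATATACGA",
    "AGTCACTATAAACGG", "GGTCACGATATACGT", "GGTCACTGTACACGT", "CATCATAAAATACGA",
    "AATCACGAGCGACGG", "TGTCACTAACGACGT",
]

# IUPAC codes for two-nucleotide ambiguity (keys in alphabetical order).
PAIR_CODES = {"AG": "R", "CT": "Y", "AC": "M", "GT": "K", "AT": "W", "CG": "S"}


def count_matrix(sequences):
    """Return one Counter of nucleotide counts per alignment column."""
    return [Counter(column) for column in zip(*sequences)]


def iupac_symbol(counts):
    """Assumed consensus rule (not taken from this page); see the text above."""
    total = sum(counts.values())
    ranked = sorted("ACGT", key=lambda base: counts[base], reverse=True)
    first, second = ranked[0], ranked[1]
    # Integer arithmetic avoids floating-point issues at the 50% and 75% limits.
    if 2 * counts[first] > total and counts[first] > 2 * counts[second]:
        return first
    if 4 * (counts[first] + counts[second]) > 3 * total:
        return PAIR_CODES["".join(sorted(first + second))]
    return "N"


matrix = count_matrix(ALIGNMENT)
for base in "ACGT":
    print(base, " ".join(f"{column[base]:2d}" for column in matrix))
print("IUPAC", " ".join(iupac_symbol(column) for column in matrix))
```

Position 10 shows how coarse the IUPAC string is: A occurs in 11 of 22 sequences, which is exactly 50% and therefore not more than 50%, so the position collapses to N even though A is clearly preferred.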

Usually, how strongly a position within a protein binding site is conserved reflects its role in the function of the site. More highly conserved positions have a greater impact on binding strength and on specificity for the corresponding protein (a transcription factor, for instance). A weight matrix captures this profile most accurately by weighting each position according to its observed biological conservation. Therefore, a weight matrix is a very accurate description of a binding site.
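To illustrate what weighting each position by its conservation means in practice, the sketch below scores a candidate 15-nucleotide site against the matrix. It is an illustrative assumption, not a description of how MatInspector scores matches, which this page does not specify. Each position contributes the count of the observed nucleotide multiplied by an assumed entropy-based conservation index, Ci = 100 / ln 5 * (sum over b of P(b) * ln P(b) + ln 5), and the sum is normalized by the best achievable score. All names are hypothetical.

```python
import math

BASES = "ACGT"

# Count matrix copied from the weight matrix table: (A, C, G, T) per position.
COLUMNS = [
    (5, 6, 4, 7), (14, 0, 8, 0), (0, 0, 0, 22), (0, 22, 0, 0), (15, 1, 4, 2),
    (0, 12, 0, 10), (2, 3, 4, 13), (9, 5, 3, 5), (3, 4, 3, 12), (11, 5, 3, 3),
    (8, 3, 5, 6), (22, 0, 0, 0), (0, 22, 0, 0), (0, 0, 22, 0), (8, 0, 6, 8),
]


def conservation_index(column):
    """Assumed conservation weight: 100 for a fully conserved position."""
    total = sum(column)
    p_log_p = sum((c / total) * math.log(c / total) for c in column if c > 0)
    return 100.0 / math.log(5) * (p_log_p + math.log(5))


def weighted_similarity(site):
    """Conservation-weighted score in [0, 1]; 1.0 means the best possible site."""
    if len(site) != len(COLUMNS):
        raise ValueError("site must have one nucleotide per matrix position")
    achieved = best = 0.0
    for base, column in zip(site, COLUMNS):
        weight = conservation_index(column)
        achieved += weight * column[BASES.index(base)]
        best += weight * max(column)
    return achieved / best


best_site = "TATCACTATAAACGA"       # most frequent nucleotide at every position
flank_mismatch = "AATCACTATAAACGA"  # position 1 changed (weakly conserved)
core_mismatch = "TATCACTATAAAGGA"   # position 13 changed (fully conserved)

for site in (best_site, flank_mismatch, core_mismatch):
    print(site, round(weighted_similarity(site), 3))
```

With this weighting, a mismatch in the conserved core at position 13 lowers the score far more than a mismatch at the weakly conserved position 1, which is the behavior the paragraph above describes.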