MatDefine is a tool for fully automatic definition and evaluation of weight matrices from a set of short DNA sequences. The resulting weight matrix can be used by MatInspector to scan nucleic acid sequences for matches to the described binding site.
The quality of a matrix is estimated by its random expectation value (RE-value), defined as the number of matches with a high matrix similarity (>= 0.85) expected in a random sequence of 1000 bp. An RE-value is assigned to each matrix.
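
To make the RE-value definition concrete, the Python sketch below estimates it by scanning random sequences and counting windows whose similarity reaches 0.85. It rests on several assumptions that are not taken from MatDefine: a simplified MatInspector-style similarity (per-position conservation weights on a 0-100 scale, with the window score normalized between the worst and best possible score), uniform base composition, and scanning of both strands. The function names `ci_values`, `similarity`, `count_matches` and `estimate_re_value` are hypothetical.

```python
import math
import random

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def ci_values(freqs):
    """Per-position conservation weight on a 0-100 scale (assumed formula).

    freqs is a list of dicts mapping each of 'A', 'C', 'G', 'T' to its
    relative frequency. A fully conserved position scores 100, a uniform
    position scores 0.
    """
    weights = []
    for col in freqs:
        plogp = sum(p * math.log(p) for p in col.values() if p > 0)
        weights.append(100.0 / math.log(4) * (plogp + math.log(4)))
    return weights


def similarity(freqs, ci, window):
    """Window score normalized between the worst and best possible score."""
    current = sum(c * col[b] for c, col, b in zip(ci, freqs, window))
    best = sum(c * max(col.values()) for c, col in zip(ci, freqs))
    worst = sum(c * min(col.values()) for c, col in zip(ci, freqs))
    if best == worst:  # uninformative matrix, no meaningful score
        return 0.0
    return (current - worst) / (best - worst)


def count_matches(freqs, seq, threshold=0.85):
    """Count windows on both strands whose similarity reaches the threshold."""
    ci = ci_values(freqs)
    width = len(freqs)
    hits = 0
    for strand in (seq, seq.translate(COMPLEMENT)[::-1]):
        for start in range(len(strand) - width + 1):
            if similarity(freqs, ci, strand[start:start + width]) >= threshold:
                hits += 1
    return hits


def estimate_re_value(freqs, threshold=0.85, length=1000, trials=20, seed=0):
    """Average number of matches per `length` bp of uniform random DNA."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        seq = "".join(rng.choice(BASES) for _ in range(length))
        total += count_matches(freqs, seq, threshold)
    return total / trials
```

A long, well-conserved matrix yields an estimate far below 1 match per 1000 bp, while a very short or poorly conserved one matches almost everywhere; a lower RE-value therefore indicates a more specific matrix.
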
By default, the weight matrix is generated without any user interaction, and a protocol describing the matrix definition process is delivered. The following steps are performed:
All default parameters can be changed. The following additional options are available:
If the input sequences are unsuitable, no matrix is generated.

| Sequence Selection | |
|---|---|
| Input data | There are several ways to supply the input data for MatDefine: |
| Parameters | |
| Note: The following parameters are only relevant for sequence input files. If your input is a nucleotide distribution matrix, these parameters are ignored. | |
| Strand optimization | The strand optimization option is useful if the orientation of the binding site is unknown. In this case, both strands of each input sequence are checked, and either the "+" or the "-" strand is used for matrix definition. If this option is selected, core-anchored alignment is used automatically. |
| Alignment and Tuple Search | Core-anchored alignment: the best-conserved core tuple is selected, and the alignment is anchored at the first position of the core tuple in each sequence. The tuple selection algorithm is described in the CoreSearch paper. The following parameters can be modified (they are hidden by default): |
| | Unanchored alignment: the search for a common core sequence is omitted. Use this option if your sequences do not contain a highly conserved core sequence. These parameters are hidden by default. |
| Matrix Creation | Cut off matrix ends: MatDefine automatically determines the appropriate matrix length by cutting off poorly conserved positions at both ends of the matrix. Reducing the matrix length is necessary, for example, if the input sequences differ considerably in length or include sequence flanking the binding site. These parameters are hidden by default. |
| | Remove identical sequences: identical sequences (i.e., a sequence that equals another sequence or is part of another sequence) can be removed to avoid a biased nucleotide distribution matrix. If the sequences differ in length, the shorter one is removed. Regardless of this option, MatDefine always reports identical sequences in the output file. These parameters are hidden by default. |
| | Calculate optimized threshold: the optimized threshold of a weight matrix is the matrix similarity threshold that minimizes false-positive matches when the matrix is used to scan sequences with MatInspector. It is defined such that at most 3 matches are found in 10,000 bp of non-regulatory test sequences (i.e., with the optimized threshold, no more than 3 false positives are found per 10,000 bp). Because calculating the optimized threshold takes some computing time, it can be omitted for test runs. These parameters are hidden by default. |
| Consistency Check | Minimum number of sequences: the minimum number of sequences required to define a matrix. If the input file contains fewer sequences, or if fewer sequences remain after the rejection process, no matrix is created. These parameters are hidden by default. |
| | Minimum matrix similarity: MatDefine generates a weight matrix that is consistent with its training set, i.e., all training sequences must be identified by the resulting matrix. Therefore, only sequences that reach the minimum matrix similarity are included in the matrix; all other sequences are rejected. Decreasing the minimum matrix similarity may lead to the inclusion of more sequences but can also affect the quality of the matrix. Increasing it may lead to the rejection of more sequences. If too few sequences are retained, no matrix is created (see Minimum number of sequences). These parameters are hidden by default. |
| Library Comparison | Selection of matrix groups: select the matrix groups from the current MatInspector library with which the newly generated matrix should be compared. By default, all matrix groups, including your user-defined matrices (if available), are selected. Note that the library comparison cannot be disabled completely: if you do not select at least one matrix group, the new matrix is compared with all available matrix groups. These parameters are hidden by default. |
| | Check all sequences: if this option is set, all input sequences are checked against the matrix groups selected above (using the optimized matrix similarity threshold). These parameters are hidden by default. |
| Your email address | Here you can choose between two methods for receiving the results: The results will be available on our server for a limited time; for details of how long your results will be kept, please see the result email. After that period they will be deleted unless they are protected in the project management! |

MatDefine creates a protocol detailing each step of the matrix generation, as well as the weight matrix that is used by MatInspector. The resulting matrix can be saved to your personal matrix library (user-defined library).
The protocol file contains, for example, the following information:

| Sequence | Identical to |
|---|---|
| HSFOS | MMCFOS |
| XLACTIN5A | XLACTIN8A |

| Property | Value |
|---|---|
| Core sequence | CCAT |
| Number of aligned sequences | 20 |
| Number of rejected sequences | 0 |

| Sequence Name | Position | Str. | Alignment | Matrix Similarity |
|---|---|---|---|---|
| MMTFEZIF2 | 4 - 23 | (+) | CG CCAT ATAAGGAGCAGGAA | 0.855 |
| MMTFEZIF1 | 4 - 23 | (+) | CG CCTT ATATGGAGTGGCCC | 0.901 |
| HSACTCA2 | 4 - 23 | (+) | GA CCAA ATAAGGCAAGGTGG | 0.865 |
| XLACTCAG3 | 4 - 23 | (+) | TA CCAA ATAAGGGCAGGCTG | 0.872 |
| EBV | 4 - 23 | (+) | AG CCAT ATGTGGACAGATGG | 0.941 |
| GGACAREG1 | 4 - 23 | (+) | CG CCTT CTTTGGGCAGCGCG | 0.857 |
| GGACAREG2 | 3 - 22 | (+) | AC CCAA ATATGGCGACGGCC | 0.904 |
| HSACTBPR | 3 - 22 | (+) | GT CCTT ATATGGACTCATCT | 0.914 |
| HSVLC1 | 4 - 23 | (+) | AT CCTT TTATGGCCCTGTCC | 0.866 |
| MMCYR61G | 4 - 23 | (+) | AC CCAA ATATGGAAATATTG | 0.897 |
| XLACTIN8A | 4 - 23 | (+) | GC CCAT ATTTGGCGATCTTC | 0.955 |
| XLACTIN5A | 4 - 23 | (+) | GC CCAT ATTTGGCGATCTTC | 0.955 |
| XLACTCAG1 | 3 - 22 | (+) | AT CCCT ATTTGGCCATCCCT | 0.926 |
| HSACTCA3 | 3 - 22 | (+) | CT CCCT ATTTGGCCATCCCC | 0.940 |
| HSACTCA4 | 3 - 22 | (+) | TT CCTT ACATGGTCTGGGGG | 0.843 |
| XLACTCAG2 | 3 - 22 | (+) | TT CCAT ACATGGGCTAAGGG | 0.854 |
| MMCFOS | 3 - 22 | (+) | GT CCAT ATTAGGACATCTGC | 0.945 |
| HSFOS | 3 - 22 | (+) | GT CCAT ATTAGGACATCTGC | 0.945 |
| MMKROX1 | 3 - 22 | (+) | GT CCAT ATATGGGCAGCGAC | 0.978 |
| MMTFEZIF | 3 - 22 | (+) | TC CCAT ATATGGCCATGTAC | 0.988 |

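
The aligned sequences in the example protocol can be summarized as a nucleotide distribution (count) matrix, the input from which a weight matrix is derived. The Python sketch below counts the bases in each column of the 20 alignments shown above (spaces removed). It only illustrates the counting step; it does not reproduce MatDefine's weighting or its matrix file format, and `count_matrix` is a hypothetical name.

```python
ALIGNMENT = [
    "CG CCAT ATAAGGAGCAGGAA",  # MMTFEZIF2
    "CG CCTT ATATGGAGTGGCCC",  # MMTFEZIF1
    "GA CCAA ATAAGGCAAGGTGG",  # HSACTCA2
    "TA CCAA ATAAGGGCAGGCTG",  # XLACTCAG3
    "AG CCAT ATGTGGACAGATGG",  # EBV
    "CG CCTT CTTTGGGCAGCGCG",  # GGACAREG1
    "AC CCAA ATATGGCGACGGCC",  # GGACAREG2
    "GT CCTT ATATGGACTCATCT",  # HSACTBPR
    "AT CCTT TTATGGCCCTGTCC",  # HSVLC1
    "AC CCAA ATATGGAAATATTG",  # MMCYR61G
    "GC CCAT ATTTGGCGATCTTC",  # XLACTIN8A
    "GC CCAT ATTTGGCGATCTTC",  # XLACTIN5A
    "AT CCCT ATTTGGCCATCCCT",  # XLACTCAG1
    "CT CCCT ATTTGGCCATCCCC",  # HSACTCA3
    "TT CCTT ACATGGTCTGGGGG",  # HSACTCA4
    "TT CCAT ACATGGGCTAAGGG",  # XLACTCAG2
    "GT CCAT ATTAGGACATCTGC",  # MMCFOS
    "GT CCAT ATTAGGACATCTGC",  # HSFOS
    "GT CCAT ATATGGGCAGCGAC",  # MMKROX1
    "TC CCAT ATATGGCCATGTAC",  # MMTFEZIF
]


def count_matrix(aligned):
    """Per-column counts of A, C, G and T in equal-length aligned sequences."""
    seqs = [a.replace(" ", "") for a in aligned]
    width = len(seqs[0])
    if any(len(s) != width for s in seqs):
        raise ValueError("aligned sequences must have equal length")
    return [{b: sum(s[i] == b for s in seqs) for b in "ACGT"}
            for i in range(width)]


if __name__ == "__main__":
    print("pos   A   C   G   T")
    for pos, col in enumerate(count_matrix(ALIGNMENT), start=1):
        print(f"{pos:3d} {col['A']:3d} {col['C']:3d} {col['G']:3d} {col['T']:3d}")
```

In this example, alignment positions 3-4 are C and positions 11-12 are G in all 20 sequences, so those columns dominate the conservation of the resulting matrix.
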

If you want to save the resulting matrix to your personal library, you need to enter some additional information:

| Matrix Identification | |
|---|---|
| Matrix Identification | The matrix identification consists of |
| Family Information | Each matrix belongs to a so-called matrix family, in which functionally similar matrices are grouped together to eliminate redundant matches by MatInspector. You can |
| Extra Information | Here you can enter further information which will be stored in the References field of the matrix. |

MatDefine is described in:

© 1998-2011 Genomatix Software GmbH - All rights reserved