MatInspector is a software tool that utilizes a large library of matrix descriptions for transcription factor binding sites to locate matches in DNA sequences. MatInspector is almost as fast as a search for IUPAC strings but has been shown to produce superior results. It assigns a quality rating to matches and thus allows quality-based filtering and selection of matches.
The first version of MatInspector is described in Quandt et al., 1995 (NAR). A paper describing the new features of the current version of MatInspector was published in 2005 (Cartharius et al., 2005, Bioinformatics).
Generally, MatInspector can
| Gene Name Input | |
|---|---|
| Search promoters by gene | Use the combo box to select a gene. See below for details. |
| Sequence Input | |
|---|---|
| Choose from your previously uploaded sequences | Select a sequence file from the list of your personal sequence files. |
| or enter the formatted DNA sequence(s) | Enter your correctly formatted sequence(s) directly into the form, e.g. with copy and paste. The following formats are accepted: The sequence should contain only IUPAC characters; any other characters will be skipped. |
| or upload a file containing sequence(s) (max. 100 MB) | If your browser supports this option, a sequence file can be uploaded. If you use this option, the file should contain the sequence(s) in one of the following formats: Please note that the size of uploaded files is limited to 100 MB. If you want to analyze larger sequences, please contact support@genomatix.de. For whole chromosomes you can use the accession number option below (e.g. 'NC_000001' for human chromosome 1). |
| or enter accession number(s) | If you are interested in one or several specific sequences from a database section, you can supply a list of correct accession numbers in the form. If you want to select more than one accession number, please separate the accession numbers by commas or spaces. On the Genomatix server accession numbers from the following databases can be entered: |
| Search corresponding promoters for your sequence(s) | If you activate this checkbox, your input sequence(s) are mapped against the genome of the organism which you choose from the drop-down list. See below for details. |
|---|
| Database input | |
|---|---|
| Select one of these database-sections | On the Genomatix server the following databases are available: In case you have selected a section from the GenBank database you may also restrict the analysis to sequences containing user-defined keywords in their annotation. You can enter keywords which will be searched in The keyword searches can be combined with "AND" or "OR". Please note that the keywords cannot contain blanks (all blanks will be skipped). These parameters are hidden by default. |
| HINT: If you want to check a VERY SHORT oligo, please enter the sequence padded with a few Ns at the beginning and end!!! |
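
The padding described in the hint can be prepared before pasting the sequence into the form. The following is a minimal sketch; the function name `pad_oligo` and the flank length of 5 are assumptions chosen for illustration, since the hint only says "a few Ns".

```python
def pad_oligo(sequence: str, flank: int = 5) -> str:
    """Return the oligo with `flank` N characters added to both ends.

    The flank length is a hypothetical choice; the help text only
    recommends "a few Ns" at the beginning and end.
    """
    # Remove whitespace and line breaks, then normalise to upper case.
    cleaned = "".join(sequence.split()).upper()
    return "N" * flank + cleaned + "N" * flank


print(pad_oligo("tgacgtca"))  # NNNNNTGACGTCANNNNN
```
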
MatInspector 8.0 introduced the possibility to submit promoter sequences
from Genomatix' ElDorado database directly to MatInspector.
There are two ways to achieve this:
| Gene Name Input | |
|---|---|
| Search promoters by gene | The text field for entering a gene name is a combo box: while you type a gene name in the field, a drop-down list appears. You can then select your gene of interest from that list by left-clicking the item. Please note that it may take a few moments before the drop-down list appears, because the list items have to be computed and sent to your browser while you are typing. The delay may depend on your client machine and internet connection. Should you experience any rendering problems with the drop-down list, please have a look at our Technical FAQs. |
| Sequence Input | |
|---|---|
| ... | |
| Search corresponding promoters for your sequence(s) | To activate the mapping, simply check the box in this field. You must also use one of the sequence input options. |
If you submitted a gene, all promoters of this gene are extracted directly from the ElDorado genome database.
If you submitted sequences, they are mapped against the selected
genome in the ElDorado database. The exon/intron structure of the mapped sequences
is compared to all transcripts annotated for the corresponding genomic region.
The promoters of all transcripts with at least one exon identical to one of the mapped exons match
your query.
On the search result page, you may select the promoters for analysis with MatInspector.
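
The selection rule for mapped sequences (a transcript qualifies if at least one of its exons is identical to a mapped exon) can be illustrated with a short sketch. This is not the ElDorado implementation: the `Transcript` type, the representation of an exon as a `(start, end)` coordinate pair, and all example data are assumptions made for illustration.

```python
from typing import FrozenSet, Iterable, List, NamedTuple, Tuple

Exon = Tuple[int, int]  # assumed representation: (start, end) on the genome


class Transcript(NamedTuple):
    name: str
    exons: FrozenSet[Exon]


def transcripts_matching(mapped_exons: Iterable[Exon],
                         transcripts: Iterable[Transcript]) -> List[str]:
    """Return names of transcripts sharing at least one identical exon."""
    mapped = set(mapped_exons)
    # A non-empty intersection means at least one exon is identical.
    return [t.name for t in transcripts if mapped & t.exons]


# Hypothetical data: tx_a shares the exon (100, 200); tx_b only overlaps it.
annotated = [
    Transcript("tx_a", frozenset({(100, 200), (500, 600)})),
    Transcript("tx_b", frozenset({(120, 200), (700, 800)})),
]
print(transcripts_matching([(100, 200), (300, 400)], annotated))  # ['tx_a']
```

The promoters of the returned transcripts would then be the ones offered for selection on the search result page.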

Some notes on promoter finding:
Depending on the selected MatInspector library, a form with more parameters to fill in will appear:
| Matrix Search Parameters | |
|---|---|
| Library version | Here you can select a previous version of the matrix library. This can be helpful for reproducing old results. By default, the latest matrix library is selected (please see the Library Statistics and the Library Release Notes). Note: Certain parameter settings require that the matrix library is automatically reset, disregarding your selection. The assignment of genes to transcription factors (in MatBase) depends on the ElDorado database version, i.e. each matrix library version corresponds to exactly one ElDorado version. The ElDorado version, however, is taken into account for the literature-based lines of evidence. Literature analysis is available for the following organisms: all vertebrates, both yeasts, fruit fly and thale cress. This means that, if the literature-based lines of evidence are available, the matrix library which corresponds to the current ElDorado database must be selected. In particular, this is required if These parameters are hidden by default. |
| Matrix group | The MatInspector matrix library consists of carefully selected descriptions for transcription factor binding sites. The matrix library is divided into the subsections/groups When selecting a subset of matrices, the core and matrix thresholds can also be set individually for each selected matrix. A selected subset of matrices can also be saved in a personal directory and can be retrieved via the "use previously defined matrix subsets"-option. Note that the list of previously defined subsets depends on the "MATRIX family"-selection! (There is a difference between matrix family subsets and individual matrix subsets.) Using the link "Check transcription factor <-> matrix family assignment" in the left column, you can look up which transcription factor binding sites are represented by which matrix families. You can enter either the name of a transcription factor or the name of a matrix / matrix family from the current MatInspector library. |
| Matrix families | Each matrix belongs to a so-called matrix family, where functionally similar matrices are grouped together, eliminating redundant matches by MatInspector professional. All matrices in a family are of the same (odd) length and have an anchor position assigned, which is the center position of the matrix. This ensures that matrices of a family match at exactly the same position. If matrix families are selected, MatInspector will only list the best match from a family for each site. Otherwise (individual matrices selected) different but closely related matrices might match at the same position on the sequence (example). These parameters are hidden by default. |
| Core similarity | The "core sequence" of a matrix is defined as the (usually 4) most highly conserved positions of the matrix. The maximum core similarity of 1.0 is only reached when the most highly conserved bases of a matrix match exactly in the sequence. Increasing the core similarity will miss matches that have one or more mismatches in the core region but have a high similarity to the rest of the matrix. (This should only be done to enhance the performance of MatInspector.) Decreasing the core similarity (while retaining the same matrix similarity) might give a few more matches in the output that have more mismatches in the core region of the matrix. These parameters are hidden by default. |
| Matrix similarity | The matrix similarity is calculated as described in the MatInspector papers. A perfect match to the matrix gets a score of 1.00 (each sequence position corresponds to the most highly conserved nucleotide at that position in the matrix); a "good" match to the matrix usually has a similarity of > 0.80. Mismatches in highly conserved positions of the matrix decrease the matrix similarity more than mismatches in less conserved regions. Increasing the matrix similarity will find fewer matches in your sequence and might miss matches that do have a "mismatch" compared to the matrix. Decreasing the matrix similarity will find more matches in your sequence. The matrix similarity is correlated with the re-value of a matrix: A matrix with a high re-value will find more matches even with a high matrix similarity than a well-defined matrix (low re-value). Since there are binding sites that are biologically quite "loosely" defined, a high re-value is not necessarily a sign of a "bad" matrix description. A very low re-value might even be a sign of a description that is too strict. Optimized matrix similarity: These parameters are hidden by default. |
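
How the two thresholds and the family option interact can be sketched on an already computed match list. This is an illustration only, not MatInspector's code: the `Match` record, the function names, and all similarity values below are hypothetical, and the similarity scores themselves are assumed to have been calculated elsewhere.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Match:
    family: str        # matrix family, e.g. "V$HNF1"
    matrix: str        # individual matrix within the family
    anchor: int        # anchor (center) position of the match
    strand: str        # "+" or "-"
    core_sim: float
    matrix_sim: float


def filter_matches(matches: Iterable[Match], core_min: float,
                   matrix_min: float) -> List[Match]:
    """Keep matches that reach both the core and the matrix threshold."""
    return [m for m in matches
            if m.core_sim >= core_min and m.matrix_sim >= matrix_min]


def best_per_family(matches: Iterable[Match]) -> List[Match]:
    """Keep only the best-scoring match of each family at each site."""
    best: Dict[Tuple[str, int, str], Match] = {}
    for m in matches:
        key = (m.family, m.anchor, m.strand)
        if key not in best or m.matrix_sim > best[key].matrix_sim:
            best[key] = m
    return sorted(best.values(), key=lambda m: (m.anchor, m.family))


# Hypothetical matches; two HNF1 family members hit the same anchor.
found = [
    Match("V$HNF1", "HNF1.03", 28, "+", 1.00, 0.806),
    Match("V$HNF1", "HNF1.01", 28, "+", 0.80, 0.790),
    Match("V$FKHD", "HNF3B.02", 22, "-", 1.00, 0.912),
    Match("V$LHXF", "ISL2.01", 65, "+", 0.70, 0.885),
]
kept = filter_matches(found, core_min=0.75, matrix_min=0.75)
print([m.matrix for m in best_per_family(kept)])  # ['HNF3B.02', 'HNF1.03']
```

In this sketch the ISL2.01 match is dropped by the core threshold despite its high matrix similarity, and HNF1.01 is dropped because a related matrix from the same family scores higher at the same site.
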
MatInspector can also perform searches for user-defined IUPAC strings or strings from predefined IUPAC-libraries instead of matrices:
| IUPAC String Parameters | |
|---|---|
| User-defined IUPAC string: | MatInspector will locate matches to this user-defined IUPAC string. Only the IUPAC symbols ABCDGHKMNRSTUVWY can be used (e.g. R is A or G); all other letters are ignored. Please specify the maximum number of mismatches that are allowed in matches to the string (these can occur at any position of the string). The number of mismatches should not exceed 50% of the string length. |
| Predefined IUPAC library: | If IUPAC families are selected, MatInspector will only list the best match from each IUPAC family for each site. Otherwise (individual IUPACs selected) a single site might match different but closely related IUPAC strings. The IUPAC libraries provided are A selected subset of IUPAC strings can also be saved in a personal directory and can be retrieved via the "use previously defined IUPAC subsets"-option. Note that the list of previously defined subsets depends on the "IUPAC family"-selection! (There is a difference between IUPAC family subsets and individual string subsets.) |
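
A search for a user-defined IUPAC string with a mismatch limit can be sketched as follows. This is a simplified illustration, not MatInspector's implementation: the function name `find_iupac` is hypothetical, only the given strand is scanned, sequence characters are compared literally (an N in the sequence counts as a mismatch), and rejecting a mismatch limit above 50% with an error is a design choice of this sketch.

```python
from typing import List, Tuple

# Standard IUPAC nucleotide codes (U is treated like T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_iupac(pattern: str, sequence: str,
               max_mismatches: int = 0) -> List[Tuple[int, int]]:
    """Return (1-based start, mismatch count) for every accepted window."""
    # Letters that are not IUPAC symbols are ignored, as the form describes.
    symbols = [c for c in pattern.upper() if c in IUPAC]
    if max_mismatches > len(symbols) / 2:
        raise ValueError("mismatches should not exceed 50% of the string length")
    seq = sequence.upper().replace("U", "T")
    hits = []
    for start in range(len(seq) - len(symbols) + 1):
        mismatches = sum(seq[start + i] not in IUPAC[s]
                         for i, s in enumerate(symbols))
        if mismatches <= max_mismatches:
            hits.append((start + 1, mismatches))
    return hits


print(find_iupac("TGASTCA", "CCTGACTCAGG"))     # [(3, 0)]
print(find_iupac("TGASTCA", "CCTGATTCAGG", 1))  # [(3, 1)]
```
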
| Output Parameters | |
|---|---|
| Lines of evidence | You may set some options for lines of evidence: There is a limit on the computation of the lines of evidence. For database searches, or if the combined length of all input sequences exceeds 1 million basepairs, the lines of evidence are not available. |
| Extra output | The following extra output options are available: These parameters are hidden by default. |
| Statistics | Depending on this option, statistics on the number of matches in the input sequences can be displayed below the result list. For database searches it can be useful to view only the statistics and not the result list, as the number of matches is limited to 5000. These parameters are hidden by default. |
| Offset for match positions | You can supply MatInspector with a number of basepairs that will be added to each position in the output (the number can also be negative). For example: These parameters are hidden by default. |
| Email address | Here you can choose between two methods for receiving the results: The results will be available for a limited time on our server. For details of how long your results will be kept, please see the result email. After that period they will be deleted unless protected in the project management! These parameters are hidden by default. |
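
The effect of the offset parameter can be illustrated with a small sketch. The function name `apply_offset` and the example values are hypothetical; the only behaviour taken from the description above is that the offset is added to each reported position and may be negative.

```python
from typing import Iterable, List


def apply_offset(positions: Iterable[int], offset: int) -> List[int]:
    """Add the offset (which may be negative) to every match position."""
    return [position + offset for position in positions]


# Hypothetical use: report positions relative to a site at position 663.
print(apply_offset([14, 20, 53], -663))      # [-649, -643, -610]
# Hypothetical use: shift positions of an excerpt that starts 1000 bp
# into a longer sequence.
print(apply_offset([14, 20, 53], 1000))      # [1014, 1020, 1053]
```
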
MatInspector creates an output file that contains (depending on your parameter settings)
For details on the algorithm or how core and matrix similarity are calculated, please see the algorithm details.
The analysis is terminated once 5000 matches are found, because a larger number of matches would produce a huge output file that may crash your browser when it is displayed.
Here is an example output for a matrix search using the vertebrate group of matrices:
The features of the interactive Java graphics, e.g. filtering for certain matches, are described here.

A green background in the matrix similarity column marks a similarity above optimized, a red background marks a similarity below optimized (e.g. if a search was started using "optimized - 0.02").
For displaying the sequences which match a matrix in the MatInspector library the following code is used:
Note: Interactive handling of the result tables is deactivated if there are more than 50 sequences or more than 2500 matches in one sequence or more than 4000 matches in all sequences in the output.
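
The three limits in the note above can be expressed as a single check. This is a sketch for illustration; the function name and the input format (one match count per sequence) are assumptions.

```python
from typing import Sequence


def interactive_tables_enabled(matches_per_sequence: Sequence[int]) -> bool:
    """Apply the documented limits for interactive result tables."""
    too_many_sequences = len(matches_per_sequence) > 50
    too_many_in_one = any(count > 2500 for count in matches_per_sequence)
    too_many_overall = sum(matches_per_sequence) > 4000
    return not (too_many_sequences or too_many_in_one or too_many_overall)


print(interactive_tables_enabled([2500, 1500]))     # True
print(interactive_tables_enabled([2000, 2000, 1]))  # False
```
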
It is possible to hide complete columns of the table.
The drop-down list above the table contains the columns which can be hidden. Currently hidden columns
appear with a red background in this list, visible columns with a green background.
Selecting a column and then clicking the "Show/Hide column" button left of the drop-down list
will change the visibility of the column. Use the "Show all columns" button to
display all columns of the table.
Note that there are some columns hidden by default, i.e. when the result page is created.
Where the sort symbol is shown, the table can be sorted by simply clicking this symbol. Clicking again will reverse the sort order.
"|" is the "or" operator: "A | B" will match any cell containing at least one of the terms.
"&" is the "and" operator: "A & B" will match any cell containing both terms.
"!" is the "not" operator: "!A" will match any cell not containing "A".
Note: It is not possible to combine the "|"- and "&"-operators in one expression.
Columns with numeric content allow the use of comparison operators.
Simply type the operator before a number.
The difference between filtering by e.g. "=10" and simply "10" is that the latter will
also match "100","210" etc.
Anchors can be used to denote the start or end of a string. { matches the start position, } matches the end position of the content of a column.
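
The filter rules described above can be summarised in a short sketch. This is not the code used by the result page: the function names are hypothetical, matching is assumed to be case-sensitive, and the set of comparison operators (`=`, `<`, `>`, `<=`, `>=`) is an assumption, since only `=` appears in the text.

```python
import re

_COMPARISON = re.compile(r"(<=|>=|=|<|>)\s*(-?\d+(?:\.\d+)?)")


def _match_term(term: str, cell: str) -> bool:
    """Match a single term (without "|" or "&") against one cell."""
    term = term.strip()
    if term.startswith("!"):            # "!A": cell must not contain A
        return not _match_term(term[1:], cell)
    comparison = _COMPARISON.fullmatch(term)
    if comparison:                      # e.g. "=10" matches 10 but not 100
        try:
            value = float(cell)
        except ValueError:
            return False
        op, number = comparison.group(1), float(comparison.group(2))
        return {"=": value == number, "<": value < number,
                ">": value > number, "<=": value <= number,
                ">=": value >= number}[op]
    start_anchor = term.startswith("{")  # "{" anchors the start of the cell
    end_anchor = term.endswith("}")      # "}" anchors the end of the cell
    text = term[1:] if start_anchor else term
    text = text[:-1] if end_anchor else text
    if start_anchor and end_anchor:
        return cell == text
    if start_anchor:
        return cell.startswith(text)
    if end_anchor:
        return cell.endswith(text)
    return text in cell                  # plain terms match as substrings


def cell_matches(expression: str, cell: str) -> bool:
    """Evaluate a filter expression against the content of one cell."""
    if "|" in expression and "&" in expression:
        raise ValueError('"|" and "&" cannot be combined in one expression')
    if "|" in expression:
        return any(_match_term(t, cell) for t in expression.split("|"))
    return all(_match_term(t, cell) for t in expression.split("&"))


print(cell_matches("10", "100"))                # True
print(cell_matches("=10", "100"))               # False
print(cell_matches("AP2 | EGR1", "V$EGR1.01"))  # True
print(cell_matches("{V$AP", "V$AP2.01"))        # True
```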
The map output is displayed if you selected the "Show matches aligned with sequence" option from the output parameters.
( 4) +MKCCCSCNGGCGn(V$AP2.01(0.932))
( 30) +WTGCGTGGGCGKnnn(V$EGR1.01(0.810))
( 43) +nnnnNNTGACGTGnnnnnnnn(V$ATF6.02(0.886))
( 47) +GNTGACGTGKNNNWT(V$XBP1.01(0.908))
( 45) +ARTNMCYNCNGYSTCAGCWGNTn(V$BEL1.01(0.815))
( 45) +nnnnnnnnRTGASTCAGCAnnnnnn(V$NFE2.01(0.884))
1 CTGCGCCCTCCGGCCGCCGGTGGCCCTCTGTGCGGTGGGGGAAGGGGTCGACGTGGCTCA
( 34) -nNNGGGGGNGGNNnn(V$ZBP89.01(0.931))
( 59) -nRNCGYRRTGCATKNTGGGWAAN(V$STAF.01(0.772))
( 84) +nNNGTGGGAAANNnn(V$RBPJK.02(0.949))
( 90) +nG.AAAGYGAAASYnnnnn(V$IRF2.01(0.805))
( 100) +nnnNNAGKKCCAGGNNMGn(V$PAX6.02(0.955))
61 GCTTTTTGGATTCAGGGAGCTCGGGGGTGGGAAGAGAGAAATGGAGTTCCAGGGGCGTAA
( 80) -nNNGGGGGNGGNNnn(V$ZBP89.01(0.966))
( 96) -nNNWTATTGAYTTNN(V$HNF6.01(0.846))
For each matrix match the IUPAC consensus sequence of the matrix is displayed aligned with the sequence. Matches on the (+) strand are shown above the sequence, (-) strand matches are shown below the sequence.
If a sequence has a length of more than 5000bp, the map output is omitted.
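
If the map output needs to be processed further, the match annotations can be pulled out of the text with a regular expression. The sketch below assumes the annotation layout seen in the example, i.e. `( position) strand consensus(matrix(similarity))`; the function name and the field names of the result are hypothetical.

```python
import re
from typing import Dict, List

# ( pos) +/-CONSENSUS(MATRIX(similarity)), as in the example map output.
_ANNOTATION = re.compile(
    r"\(\s*(\d+)\)\s*([+-])([^()\s]+)\(([^()\s]+)\(([0-9.]+)\)\)"
)


def parse_map_matches(text: str) -> List[Dict[str, object]]:
    """Extract position, strand, consensus, matrix and similarity."""
    return [
        {"position": int(pos), "strand": strand, "consensus": consensus,
         "matrix": matrix, "matrix_sim": float(sim)}
        for pos, strand, consensus, matrix, sim in _ANNOTATION.findall(text)
    ]


sample = ("( 4) +MKCCCSCNGGCGn(V$AP2.01(0.932)) "
          "( 34) -nNNGGGGGNGGNNnn(V$ZBP89.01(0.931))")
for match in parse_map_matches(sample):
    print(match["position"], match["strand"], match["matrix"], match["matrix_sim"])
# 4 + V$AP2.01 0.932
# 34 - V$ZBP89.01 0.931
```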
If you set the statistics parameter,
a table with the match numbers and number of sequences with a match is printed.
The layout of the table depends on the matrix search parameter:
Parameter set to "matches to matrix families":

Parameter set to "matches to individual matrices":

The transcription factor binding site matches identified by MatInspector can be exported to GenBank sequence files. MatInspector matches are annotated in the feature table with the feature key "misc_signal".
LOCUS GXP_287091 766 bp DNA
DEFINITION loc=GXL_241328|sym=GCG|geneid=2641|acc=GXP_287091|
taxid=9606|spec=Homo sapiens|chr=2|ctg=NC_000002|str=(-)|
start=162716900|end=162717665|len=766|tss=663|
descr=glucagon|
comm=GXT_2817146/NM_002054/663/bronze;
GXT_22755335/ENST00000375497/663/bronze
ACCESSION GXP_287091
COMMENT Matrix matches determined by MatInspector (Genomatix)
Matrix Family Library Version 8.0 (November 2008)
FEATURES Location/Qualifiers
misc_signal complement(14..30)
/note="V$FKHD/HNF3B.02, mat_sim: 0.912"
misc_signal 20..36
/note="V$HNF1/HNF1.03, mat_sim: 0.806"
misc_signal complement(53..73)
/note="V$CART/RHOX6.01, mat_sim: 0.878"
misc_signal 54..76
/note="V$LHXF/ISL2.01, mat_sim: 0.885"
BASE COUNT 268 a 135 c 149 g 214 t
ORIGIN
1 AGCATCAGCT ATCTTGGATG TTTAATCTTC ATTTTGCTCC ATCCTTTCTG CCTGAATTCC
61 ATTTATTAAA ACAGAACACA TAGGGGTTTA ATCAATATCC TTAAATTTTC CACAAACATA
121 ACATAAATAA ACTCCACGTT GTGAGGAAGA GAGGATTTTT AATACATATG TGTTGAATGA
181 ATGATCATTA TTTAGATAAA TGAATGACTG AAGTGATTGT TATATTCAGG TAAATTCATC
241 ATGGCTAGGT AGCAAACCAA AGACTTGTAA GAACCTCAAA TGAGGACATG CACAAAACAG
301 GGATGGCCAT GGGCTACGTA ATTTCAAGGT CTTTTGTCTT CAACGTCAAA ATTCACTTTA
361 GAGAACTTAA GTGATTTTCA TGCGTGATTG AAAGTAGAAG GTGGATTTCC AAGCTGCTCT
421 CTCCATTCCC AACCAAAAAA AAAAAAAAAA GATACAAGAG TGCATAAAAA GTTTCCAGGT
481 CTCTAAGGTC TCTCACCCAA TATAAGCATA GAATGCAGAT GAGCAAAGTG AGTGGGAGAG
541 GGAAGTCATT TGTAACAAAA ACTCATTATT TACAGATGAG AAATTTATAT TGTCAGCGTA
601 ATATCTGTGA GGCTAAACAG AGCTGGAGAG TATATAAAAG CAGTGCGCCT TGGTGCAGAA
661 GTACAGAGCT TAGGACACAG AGCACATCAA AAGTTCCCAA AGAGGGCTTG CTCTCTCTTC
721 ACCTGCTCTG TTCTACAGCA CACTACCAGA AGGTAAGATG ATTATA
//
Note: Only the currently visible matches are exported. I.e. you can customize the list of matches for export by using the filter feature of the result table(s).
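
The exported "misc_signal" features can be read back with a few lines of code. The sketch below parses only the two line types shown in the example above (the feature location and its `/note` qualifier) and makes no attempt to be a general GenBank parser; the function name and the result fields are hypothetical.

```python
import re
from typing import Dict, List

_FEATURE = re.compile(r"misc_signal\s+(complement\()?(\d+)\.\.(\d+)\)?")
_NOTE = re.compile(r'/note="([^,"]+), mat_sim: ([0-9.]+)"')


def parse_misc_signals(text: str) -> List[Dict[str, object]]:
    """Collect strand, location, matrix name and matrix similarity."""
    features = []
    pending = None
    for line in text.splitlines():
        feature = _FEATURE.search(line)
        if feature:
            # complement(...) marks a match on the (-) strand.
            pending = {"strand": "-" if feature.group(1) else "+",
                       "start": int(feature.group(2)),
                       "end": int(feature.group(3))}
            continue
        note = _NOTE.search(line)
        if note and pending is not None:
            pending["matrix"] = note.group(1)
            pending["mat_sim"] = float(note.group(2))
            features.append(pending)
            pending = None
    return features


sample = '''misc_signal complement(14..30)
/note="V$FKHD/HNF3B.02, mat_sim: 0.912"
misc_signal 20..36
/note="V$HNF1/HNF1.03, mat_sim: 0.806"'''
for feature in parse_misc_signals(sample):
    print(feature["strand"], feature["start"], feature["end"], feature["matrix"])
# - 14 30 V$FKHD/HNF3B.02
# + 20 36 V$HNF1/HNF1.03
```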
All information available in the result table (like matrix family name, position, tissue association and lines of evidence) can be exported to a file in Microsoft Excel™ format. You can save this file to your local disk and/or open it directly with Microsoft Excel™ (or other software tools supporting this format, like OpenOffice.org Calc).
Note: Only the currently visible matches are exported. I.e. you can customize the list of matches for export by using the filter feature of the result table(s).
With MatInspector release 8.0 came a new main feature: the support of matrix matches by lines of evidence. There are three different lines of evidence available.
| Evidence | Description | Required settings |
|---|---|---|
| Known interaction | The Gene<->TF interactions are Genomatix proprietary, expert-curated information based on literature analysis. | |
| Known cocitation | The co-citation information is derived by automatic literature analysis (LitInspector). | |
| Promoter module | | |
MatInspector is described in the following publications:
Reference for PLACE (included as IUPAC library into MatInspector):
| © 1998-2013 Genomatix Software GmbH - All rights reserved |