MatInspector is a software tool that utilizes a large
library of matrix descriptions for transcription factor binding sites to
locate matches in DNA sequences.
MatInspector is almost as fast as a search
for IUPAC strings but has been shown to produce superior results. It assigns
a quality rating to matches and thus allows quality-based filtering and selection
of matches.
The first version of MatInspector is described in Quandt et al., 1995 (NAR). A paper describing various new features of MatInspector was published in 2005 (Cartharius et al., 2005, Bioinformatics).
Generally, MatInspector can
Gene Name Input | |
---|---|
Search promoters by gene | Use the combo box to select a gene. See below for details. |
General: Sequence Formats | |||||
---|---|---|---|---|---|
Accepted DNA sequence formats | The following formats for DNA sequences are accepted: The sequence should contain only IUPAC characters; any other characters will be skipped! | ||||
Sequence Input | |||||
Choose from your previously uploaded sequences | Select a sequence file from the list of your personal sequence
files which were saved in the result management in prior analyses (via "add sequences", see below). |
Quick Upload |
Paste your sequence(s) in the form field in one of the accepted formats (see above). Note that sequences pasted in the "quick upload" field are not saved for future use. | ||
Add sequences |
Sequences or sequence files uploaded here are automatically saved in the result management for later use:
|
||||
Accession number(s) |
If you are interested in one or more specific
sequences from a database section, you can supply a list of accession numbers.
To enter more than one accession number,
separate the accession numbers by commas or spaces.
On the Genomatix server accession numbers from the following databases can be entered:
|
Search corresponding promoters for your sequence(s) |
If you activate this checkbox, your input sequence(s) is/are mapped against the organism which you choose from the drop-down list. See below for details. |
---|
Database input | |
---|---|
Select one of these database-sections | On the Genomatix server the following databases are available:
In case you have selected a section from the GenBank database you may also restrict the analysis to sequences containing user-defined keywords in their annotation. You can enter keywords which will be searched in
The keyword searches can be combined with "AND" or "OR". Please note that the keywords cannot contain blanks (all blanks will be skipped). These parameters are hidden by default. You can use the ![]() |
MatInspector 8.0 introduced the possibility to submit promoter sequences
from Genomatix' ElDorado database directly to MatInspector.
There are two ways to achieve this:
Gene Name Input | |
---|---|
Search promoters by gene |
The text field for entering a gene name is a combo box. This means that, while you type a gene name in the field, a drop-down list appears. You can then select your gene of interest from that list by left-clicking the item. Please note that it may take a few moments before the drop-down list appears, because the list items have to be computed and sent to your browser while you are typing. The delay depends on your client machine and internet connection. Should you experience any rendering problems with the drop-down list, please have a look at our Technical FAQs. |
Sequence Input | |
---|---|
... | |
Search corresponding promoters for your sequence(s) |
To activate the mapping, simply check the box in this field. Of course, you must
use one of the sequence input options as well. |
If you submitted a gene, all promoters of this gene are extracted directly from the ElDorado genome database.
If you submitted sequences, they are mapped against the selected
genome in the ElDorado database. The exon/intron structure of the mapped sequences
is compared to all transcripts annotated for the corresponding genomic region.
The promoters of all transcripts with at least one exon identical to one of the mapped exons match
your query.
On the search result page, you may select the promoters for analysis with MatInspector.
Some notes on promoter finding:
Each promoter can have additional annotation:
Depending on the selected MatInspector library a form with more parameters to fill in will appear:
Matrix Search Parameters | |
---|---|
Library version | Here you can select a previous version of
the matrix library. This can be helpful for reproducing old results. By default, the latest matrix library is selected (please see the Library Statistics and the Library Release Notes). Note:
Certain parameter settings require that the matrix library is automatically reset, disregarding
your selection. The assignment of genes to transcription factors (in
MatBase) depends on the ElDorado
database version, i.e. each matrix library version corresponds to exactly one ElDorado version.
The ElDorado version, however, is taken into account for the
literature-based lines of evidence. Literature analysis is available for
the following organisms: all vertebrates, both yeasts, fruit fly and thale cress.
This means that, if the literature-based lines of evidence are available, the matrix library which
corresponds to the current ElDorado database must be selected. In particular, this is required if
|
Matrix group | The MatInspector matrix library consists
of carefully selected descriptions for transcription factor binding sites.
The matrix library is divided into the subsections/groups
When selecting a subset of matrices, the core and matrix thresholds can also be set individually for each selected matrix. A selected subset of matrices can also be saved in a personal directory and can be retrieved via the "use previously defined matrix subsets" option. Note that the list of previously defined subsets depends on the "MATRIX family" selection! (There is a difference between matrix family subsets and individual matrix subsets.) Using the link "Check transcription factor <-> matrix family assignment" in the left column, you can look up which transcription factor binding sites are represented by which matrix families. You can either enter the name of a transcription factor or the name of a matrix / matrix family from the current MatInspector library.
|
Matrix families | Each matrix belongs to a so-called matrix family, where
similar matrices are grouped together, eliminating redundant matches by
MatInspector professional.
All matrices in a family are of the same (odd) length and have an anchor position assigned, which is the center position of the matrix. This ensures that matrices of a family match at exactly the same position. If matrix families are selected, MatInspector will only list
the best match from a family for each site. Otherwise (individual
matrices selected) different but closely related matrices might match
at the same position on the sequence (example). These parameters are hidden by default. Clicking on
![]() |
Core similarity | The "core sequence" of
a matrix is defined as the (usually 4) most highly conserved positions of the
matrix.
The maximum core similarity of 1.0 is only reached when the most highly
conserved bases of a matrix match exactly in the sequence. Increasing the core similarity will miss matches that have one or more mismatches in the core region but have a high similarity to the rest of the matrix. (This should only be done to enhance the performance of MatInspector.) Decreasing the core similarity (while retaining the same matrix similarity) might give a few more matches in the output that have more mismatches in the core region of the matrix. |
Matrix similarity | The matrix similarity is calculated
as described in the MatInspector papers.
A perfect match to the matrix gets a score of 1.00 (each sequence position corresponds to the most highly conserved nucleotide at that position in the matrix); a "good" match to the matrix usually has a similarity of > 0.80. Mismatches in highly conserved positions of the matrix decrease the matrix similarity more than mismatches in less conserved regions. Increasing the matrix similarity will find fewer matches in your sequence and might miss matches that do have a "mismatch" compared to the matrix. Decreasing the matrix similarity will find more matches in your sequence. The matrix similarity is correlated to the re-value of a matrix: A matrix with a high re-value will find more matches even with a high matrix similarity than a well-defined matrix (low re-value). Since there are binding sites that are biologically quite "loosely" defined, a high re-value is not necessarily a sign of a "bad" matrix description. A very low re-value might even be a sign of a description that is too strict. Optimized matrix similarity: |
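The qualitative behaviour described for the matrix similarity (a perfect match scores 1.00, and mismatches at highly conserved positions cost more than mismatches at weakly conserved ones) can be illustrated with a small Python sketch. This is not the MatInspector formula, which is given in the MatInspector papers. The entropy-based weight, the function names `conservation`, `matrix_similarity` and `core_similarity`, and the toy matrix are all assumptions made for illustration.

```python
import math

BASES = "ACGT"

def conservation(column):
    """Illustrative weight in [0, 1]: 1 for a fully conserved column, 0 for a uniform one.

    Assumption: an entropy-based weight, not necessarily the one used by MatInspector.
    """
    total = sum(column[b] for b in BASES)
    entropy = 0.0
    for b in BASES:
        p = column[b] / total
        if p > 0:
            entropy -= p * math.log(p)
    return 1.0 - entropy / math.log(len(BASES))

def matrix_similarity(matrix, site):
    """Conservation-weighted score of `site`, normalized by the best possible score."""
    if len(site) != len(matrix):
        raise ValueError("site and matrix must have the same length")
    achieved = best = 0.0
    for column, base in zip(matrix, site.upper()):
        weight = conservation(column)
        achieved += weight * column.get(base, 0)
        best += weight * max(column.values())
    return achieved / best

def core_similarity(matrix, site, core_start, core_length=4):
    """Same score restricted to the core positions (assumed to be consecutive)."""
    core_end = core_start + core_length
    return matrix_similarity(matrix[core_start:core_end], site[core_start:core_end])

# Toy count matrix (hypothetical): positions 1, 2 and 4 are fully conserved.
toy_matrix = [
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
    {"A": 4, "C": 2, "G": 2, "T": 2},
    {"A": 0, "C": 0, "G": 10, "T": 0},
]

perfect = matrix_similarity(toy_matrix, "ACAG")
weak_mismatch = matrix_similarity(toy_matrix, "ACCG")    # mismatch at a weakly conserved position
strong_mismatch = matrix_similarity(toy_matrix, "TCAG")  # mismatch at a fully conserved position
```

In this toy example the mismatch at the weakly conserved third position barely lowers the score, while the mismatch at the fully conserved first position lowers it substantially.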
MatInspector can also perform searches for user-defined IUPAC strings or strings from predefined IUPAC-libraries instead of matrices:
IUPAC String Parameters | |
---|---|
User-defined IUPAC string: |
MatInspector will locate matches to this user-defined IUPAC string.
Only the IUPAC symbols ABCDGHKMNRSTUVWY
can be used (e.g. R is A or G); all other letters are ignored. Please specify the maximum number of mismatches that are allowed in matches to the string (these can occur at any position of the string). The number of mismatches should not exceed 50% of the string length. |
Predefined IUPAC library: |
If IUPAC families are selected, MatInspector will only list the best
match from a family for each site. Otherwise (individual IUPACs
selected) a single site might match different but closely related IUPAC
strings.
The IUPAC libraries provided are
A selected subset of IUPAC strings can also be saved in a personal directory and can be retrieved via the "use previously defined IUPAC subsets" option. Note that the list of previously defined subsets depends on the "IUPAC family" selection! (There is a difference between IUPAC family subsets and individual string subsets.) |
Output Parameters | |
---|---|
Lines of evidence | The following options are available as lines of evidence:
There is a limit on the computation of the lines of evidence. For database searches, or if the combined length of all input sequences exceeds 1 million base pairs, the lines of evidence are not available. |
Statistics | Depending on this option the output can be reduced to contain only the match summary table (no graphics, no match details). For database searches it can be interesting to view the statistics only but not the result list as the number of matches to be listed in detail is limited to 5000. |
Extra output | The following extra output options are available:
These parameters are hidden by default. Clicking on
![]() |
Offset for match positions | You can supply MatInspector with a
number of basepairs that will be added to each position in the output (the
number can also be negative).
For example: These parameters are hidden by default. Clicking on
![]() |
Email address | Here you can choose between two methods for receiving
the results:
The results will be available for a limited time on our server. For details of how long your results will be kept please see the result-email. After that period they will be deleted unless protected in the project management! |
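The effect of the "Offset for match positions" parameter can be shown with a short Python sketch. The function name `apply_offset` and the choice of -663 as offset are assumptions for illustration; the match positions are taken from the GenBank export example further down this page.

```python
def apply_offset(matches, offset):
    """Add `offset` (which may be negative) to the start and end of every match."""
    return [(start + offset, end + offset, strand) for start, end, strand in matches]

# Two matches given as (start, end, strand) relative to the input sequence.
matches = [(14, 30, "-"), (20, 36, "+")]

# Illustrative offset: shift the positions by -663 base pairs.
shifted = apply_offset(matches, -663)
```

Only the reported coordinates change; the matches themselves and their order stay the same.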
MatInspector creates an output file that contains (depending on your parameter settings)
For details on the algorithm or how core and matrix similarity are calculated, please see the algorithm details.
The analysis is terminated once 5000 matches are found, because a larger number of matches would produce a huge output file that may crash your browser when displayed.
Here is an example output for a matrix search using the vertebrate group of matrices:
The features of the interactive graphics, e.g. filtering for certain matches, are described here.
Matrix information | The matrix family name, the detailed family description, the matrix name, and the matrix description for a match are listed. This information is taken from MatBase. |
---|---|
Position and Anchor | The position of the TF site within the input sequence (start - end) and the strand are given.
All matrices in a family are of the same (odd) length and have an anchor position assigned, which is the center position of the matrix. This ensures that matrices of a family match at exactly the same position. If Genomatix promoter sequences are analyzed, genomic positions of the matches are given as a tooltip. |
Core similarity | The "core sequence" of a matrix
is defined as the (usually 4) consecutive most highly conserved positions
of the matrix. The core similarity is calculated as described here and in the MatInspector paper. The maximum core similarity of 1.0 is only reached when the most highly conserved bases of a matrix match exactly in the sequence. More important than the core similarity is the matrix similarity, which takes into account all bases over the whole matrix length! |
Matrix similarity | The matrix similarity is calculated
as described here and
in the MatInspector paper. A perfect match to the matrix gets a score of 1.00 (each sequence position corresponds to the most highly conserved nucleotide at that position in the matrix); a "good" match to the matrix usually has a similarity of >0.80. Mismatches in highly conserved positions of the matrix decrease the matrix similarity more than mismatches in less conserved regions. In MatInspector, a green background in the matrix similarity column marks a similarity above optimized (i.e. a "good" match), a red background marks a similarity below optimized (e.g. if a search was started using "optimized - 0.02"). |
Additional lines of evidence | Clicking a link in this column provides more details on the line of evidence, e.g. the literature co-citations of the analyzed gene and the identified transcription factor.
Click the # symbol to sort the matches by the number of available lines of evidence. See a detailed description of additional lines of evidence
for a transcription factor binding site.
|
Sequence |
For displaying the sequences which match a matrix in the MatInspector library the following code is used:
|
Note: Interactive handling of the result tables is deactivated if there are more than 50 sequences or more than 2500 matches in one sequence or more than 4000 matches in all sequences in the output.
It is possible to hide complete columns of the table.
The dialog that appears when the column chooser icon in the upper left corner is clicked
contains the columns which can be hidden/shown.
Checking/unchecking the desired column(s) and then clicking the "confirm" button
will change the visibility of the column(s).
Note that some columns are hidden by default, i.e. when the result page is created.
Click on the operator icon to the left of the search form to change the search operator.
Columns with numeric content allow the use of comparison operators.
Simply change the operator via the dropdown menu.
The "advanced search" dialog allows you to build more advanced search queries.
Click on the advanced search icon in the upper left corner to open the advanced search dialog.
The first dropdown ("all"/"any") specifies whether all search criteria or only one of the criteria have to be met.
Add new search criteria by clicking the "+" icon or remove one by clicking the "-" icon.
The map output is displayed if you selected the "Show matches aligned with sequence" option from the output parameters.
    (   4) +MKCCCSCNGGCGn(V$AP2.01(0.932))
    (  30) +WTGCGTGGGCGKnnn(V$EGR1.01(0.810))
    (  43) +nnnnNNTGACGTGnnnnnnnn(V$ATF6.02(0.886))
    (  47) +GNTGACGTGKNNNWT(V$XBP1.01(0.908))
    (  45) +ARTNMCYNCNGYSTCAGCWGNTn(V$BEL1.01(0.815))
    (  45) +nnnnnnnnRTGASTCAGCAnnnnnn(V$NFE2.01(0.884))
         1 CTGCGCCCTCCGGCCGCCGGTGGCCCTCTGTGCGGTGGGGGAAGGGGTCGACGTGGCTCA
    (  34) -nNNGGGGGNGGNNnn(V$ZBP89.01(0.931))
    (  59) -nRNCGYRRTGCATKNTGGGWAAN(V$STAF.01(0.772))

    (  84) +nNNGTGGGAAANNnn(V$RBPJK.02(0.949))
    (  90) +nG.AAAGYGAAASYnnnnn(V$IRF2.01(0.805))
    ( 100) +nnnNNAGKKCCAGGNNMGn(V$PAX6.02(0.955))
        61 GCTTTTTGGATTCAGGGAGCTCGGGGGTGGGAAGAGAGAAATGGAGTTCCAGGGGCGTAA
    (  80) -nNNGGGGGNGGNNnn(V$ZBP89.01(0.966))
    (  96) -nNNWTATTGAYTTNN(V$HNF6.01(0.846))
For each matrix match the IUPAC consensus sequence of the matrix is displayed aligned with the sequence. Matches on the (+) strand are shown above the sequence, (-) strand matches are shown below the sequence.
If a sequence has a length of more than 5000 bp, the map output is omitted.
The transcription factor binding site matches identified by MatInspector can be exported to GenBank sequence files. MatInspector matches are annotated in the feature table with the feature key "misc_signal".
    LOCUS       GXP_287091     766 bp    DNA
    DEFINITION  loc=GXL_241328|sym=GCG|geneid=2641|acc=GXP_287091|
                taxid=9606|spec=Homo sapiens|chr=2|ctg=NC_000002|str=(-)|
                start=162716900|end=162717665|len=766|tss=663|
                descr=glucagon|
                comm=GXT_2817146/NM_002054/663/bronze;
                GXT_22755335/ENST00000375497/663/bronze
    ACCESSION   GXP_287091
    COMMENT     Matrix matches determined by MatInspector (Genomatix)
                Matrix Family Library Version 8.0 (November 2008)
    FEATURES             Location/Qualifiers
         misc_signal     complement(14..30)
                         /note="V$FKHD/HNF3B.02, mat_sim: 0.912"
         misc_signal     20..36
                         /note="V$HNF1/HNF1.03, mat_sim: 0.806"
         misc_signal     complement(53..73)
                         /note="V$CART/RHOX6.01, mat_sim: 0.878"
         misc_signal     54..76
                         /note="V$LHXF/ISL2.01, mat_sim: 0.885"
    BASE COUNT      268 a    135 c    149 g    214 t
    ORIGIN
            1 AGCATCAGCT ATCTTGGATG TTTAATCTTC ATTTTGCTCC ATCCTTTCTG CCTGAATTCC
           61 ATTTATTAAA ACAGAACACA TAGGGGTTTA ATCAATATCC TTAAATTTTC CACAAACATA
          121 ACATAAATAA ACTCCACGTT GTGAGGAAGA GAGGATTTTT AATACATATG TGTTGAATGA
          181 ATGATCATTA TTTAGATAAA TGAATGACTG AAGTGATTGT TATATTCAGG TAAATTCATC
          241 ATGGCTAGGT AGCAAACCAA AGACTTGTAA GAACCTCAAA TGAGGACATG CACAAAACAG
          301 GGATGGCCAT GGGCTACGTA ATTTCAAGGT CTTTTGTCTT CAACGTCAAA ATTCACTTTA
          361 GAGAACTTAA GTGATTTTCA TGCGTGATTG AAAGTAGAAG GTGGATTTCC AAGCTGCTCT
          421 CTCCATTCCC AACCAAAAAA AAAAAAAAAA GATACAAGAG TGCATAAAAA GTTTCCAGGT
          481 CTCTAAGGTC TCTCACCCAA TATAAGCATA GAATGCAGAT GAGCAAAGTG AGTGGGAGAG
          541 GGAAGTCATT TGTAACAAAA ACTCATTATT TACAGATGAG AAATTTATAT TGTCAGCGTA
          601 ATATCTGTGA GGCTAAACAG AGCTGGAGAG TATATAAAAG CAGTGCGCCT TGGTGCAGAA
          661 GTACAGAGCT TAGGACACAG AGCACATCAA AAGTTCCCAA AGAGGGCTTG CTCTCTCTTC
          721 ACCTGCTCTG TTCTACAGCA CACTACCAGA AGGTAAGATG ATTATA
    //
Note: If the result includes 50 or fewer sequences, you can customize the list of matches for export by using the filter feature of the result table(s); otherwise all matches will be exported.
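An exported GenBank file can be post-processed with a few lines of Python. The sketch below pulls the `misc_signal` features out of the feature table, assuming the layout of the example record above (location followed by a `/note` qualifier of the form `"<matrix>, mat_sim: <value>"`). The function name `parse_misc_signals` and the regular expression are assumptions, not part of MatInspector.

```python
import re

# Matches e.g.:  misc_signal complement(14..30) /note="V$FKHD/HNF3B.02, mat_sim: 0.912"
FEATURE_RE = re.compile(
    r'misc_signal\s+(complement\()?(\d+)\.\.(\d+)\)?\s+'
    r'/note="([^,"]+),\s*mat_sim:\s*([0-9.]+)"'
)

def parse_misc_signals(genbank_text):
    """Return one dict per misc_signal feature found in the text."""
    features = []
    for found in FEATURE_RE.finditer(genbank_text):
        complement, start, end, matrix, similarity = found.groups()
        features.append({
            "matrix": matrix,
            "start": int(start),
            "end": int(end),
            "strand": "-" if complement else "+",
            "mat_sim": float(similarity),
        })
    return features

# Excerpt of the feature table from the example record.
example = '''
     misc_signal     complement(14..30)
                     /note="V$FKHD/HNF3B.02, mat_sim: 0.912"
     misc_signal     20..36
                     /note="V$HNF1/HNF1.03, mat_sim: 0.806"
'''
parsed = parse_misc_signals(example)
```

For anything beyond a quick check, a dedicated GenBank parser is the more robust choice, since this pattern only covers the simple locations shown in the example.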
All information available in the result table (like matrix family name, position, tissue association and lines of evidence) can be exported to a file in Microsoft Excel™ format. If Genomatix promoter sequences are analyzed, the Excel file also includes the genomic location of the matches (chromosome, start and end position on the chromosome). You can save this file to your local disk and/or open it directly with Microsoft Excel™ (or other software tools supporting this format, like OpenOffice.org Calc).
Note: If the result includes 50 or fewer sequences, you can customize the list of matches for export by using the filter feature of the result table(s); otherwise all matches will be exported.
All information available in the result table (like matrix family name, position, tissue association and lines of evidence) can be exported to a file in TSV format. If Genomatix promoter sequences are analyzed, the TSV file also includes the genomic location of the matches (chromosome, start and end position on the chromosome).
Note: If the result includes 50 or fewer sequences, you can customize the list of matches for export by using the filter feature of the result table(s); otherwise all matches will be exported.
If the promoter input option has been used, the genomic positions of the matches (chromosome, start position, end position, strand) can be exported to BED format. The name of the matrix/matrix family and the matrix similarity are included in columns 4 and 5 of the BED file.
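A BED export can be filtered by matrix similarity with a short Python sketch. It assumes tab-separated lines with the matrix name in column 4 and the matrix similarity in column 5, as stated above, and additionally assumes that the strand is in column 6 (the usual BED position). The function name `filter_bed_by_similarity` and the example coordinates are made up for illustration.

```python
def filter_bed_by_similarity(lines, min_similarity):
    """Keep BED records whose column 5 (matrix similarity) is at least `min_similarity`."""
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip empty lines and BED header/comment lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split("\t")
        similarity = float(fields[4])
        if similarity >= min_similarity:
            kept.append({
                "chrom": fields[0],
                "start": int(fields[1]),
                "end": int(fields[2]),
                "name": fields[3],
                "mat_sim": similarity,
                "strand": fields[5] if len(fields) > 5 else ".",
            })
    return kept

# Hypothetical BED lines; the coordinates are invented for this example.
bed_lines = [
    "track name=MatInspector\n",
    "chr2\t1000\t1017\tV$FKHD/HNF3B.02\t0.912\t-\n",
    "chr2\t1006\t1023\tV$HNF1/HNF1.03\t0.806\t+\n",
]
strong_matches = filter_bed_by_similarity(bed_lines, 0.85)
```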
For each matrix (or matrix family) the Match Summary Table lists
how many matches were found in total (Match Total), in how many sequences (Common to seq.),
and how often it matches in each input sequence.
Additionally, a significance value (see p-value)
is given for each common TF site.
Clicking on one of the column headers will sort the table by this column (ascending or descending as depicted by the little yellow arrows).
The complete table can also be saved in Microsoft Excel™ format (as at most 30 sequences or columns can be displayed in the interactive table).
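The counts in the Match Summary Table can be reproduced from a plain list of matches, as in the following Python sketch. The function name `match_summary`, the input layout (pairs of sequence name and matrix name) and the example data are assumptions; the p-value column is not covered here.

```python
from collections import Counter, defaultdict

def match_summary(matches):
    """Summarize (sequence_id, matrix_name) pairs per matrix.

    Returns, for each matrix, the total number of matches ("Match Total"),
    the number of sequences with at least one match ("Common to seq.")
    and the number of matches in each sequence.
    """
    per_matrix = defaultdict(Counter)
    for sequence_id, matrix_name in matches:
        per_matrix[matrix_name][sequence_id] += 1
    return {
        matrix_name: {
            "match_total": sum(counts.values()),
            "common_to_seq": len(counts),
            "per_sequence": dict(counts),
        }
        for matrix_name, counts in per_matrix.items()
    }

# Hypothetical matches in two input sequences.
example_matches = [
    ("seq1", "V$ZBP89.01"),
    ("seq1", "V$ZBP89.01"),
    ("seq2", "V$ZBP89.01"),
    ("seq1", "V$PAX6.02"),
]
summary = match_summary(example_matches)
```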
With MatInspector release 8.0 came a new main feature: the support of matrix matches by lines of evidence. The following lines of evidence are available:
Category | Evidence | Description | Required settings |
---|---|---|---|
Experimental Evidence | Constrained elements |
Example: overlaps with constrained element This evidence is listed if a match of a transcription factor overlaps with a region of the human genome that is annotated as an evolutionarily constrained element (PI-elements, according to Kerstin Lindblad-Toh et al., Nature 478: 476-482, 2011). The original constrained element data was pre-processed by Genomatix to overlap with promoters but not with exons. |
|
ChIP-Seq data |
Example: correlates with ChIPSeq peak of V$STAT (TF: STAT3) Each matrix family is associated with at least one transcription factor. If a match of a matrix family is found to be overlapping with a ChIP-Seq region for the corresponding TF, this is listed as evidence. For details on ChIP-Seq data from ENCODE, please see this page. |
|
|
Literature Evidence | Known interaction |
Example: GCG is regulated by V$GATA (TF: GATA4) Each matrix family is associated with at least one transcription factor. If an input gene
and one of these transcription factors are known to interact, the matrix family is marked
as having this line of evidence.
Starting with MatInspector Release 8.3, only evidence for "gene of interest is regulated by TF" is shown (not vice versa). The Gene <-> TF interactions are Genomatix proprietary, expert-curated information based
on literature analysis. |
|
Known cocitation |
Example: GCG is cocited with V$PAX (TF: PAX4, PAX6) Each matrix family is associated with at least one transcription factor. If an input gene
and one of these transcription factors are co-cited in PubMed, the matrix family
is marked as having this line of evidence. The co-citation has to occur
within the same sentence of a PubMed abstract.
Starting with MatInspector Release 8.3, a co-citation additionally requires a function word (like activates, affects, regulates) between the two genes in the sentence. The co-citation information is derived by automated literature analysis
(LitInspector). |
||
Promoter Modules Evidence | Promoter module |
Example: part of model match of NFAT_SORY_01 Depending on the matrices/matrix group selected for analysis with MatInspector, the input sequences are examined for occurrences of promoter modules of the Genomatix module library or of your own user-defined modules (if you selected the "Check for user-defined models" option). Each matrix match which is also part of a model match is flagged as supporting this line of evidence. |
|
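The overlap-based lines of evidence (constrained elements, ChIP-Seq data) come down to an interval overlap test between a match and an annotated region. The Python sketch below illustrates this for the ChIP-Seq case. It is not the Genomatix implementation: the function names, the use of closed intervals in a shared coordinate system, and the example coordinates are assumptions.

```python
def intervals_overlap(start_a, end_a, start_b, end_b):
    """True if the closed intervals [start_a, end_a] and [start_b, end_b] share a position."""
    return start_a <= end_b and start_b <= end_a

def chipseq_evidence(matches, peaks, family_to_tfs):
    """List matches that overlap a ChIP-Seq region of a TF associated with their matrix family.

    matches:       dicts with "family", "start" and "end"
    peaks:         (tf_name, start, end) tuples
    family_to_tfs: matrix family name -> set of associated transcription factors
    """
    evidence = []
    for match in matches:
        associated_tfs = family_to_tfs.get(match["family"], set())
        for tf_name, peak_start, peak_end in peaks:
            if tf_name in associated_tfs and intervals_overlap(
                match["start"], match["end"], peak_start, peak_end
            ):
                evidence.append((match["family"], tf_name, match["start"], match["end"]))
    return evidence

# Hypothetical data; the family/TF pairs follow the examples in the table above.
example_matches = [
    {"family": "V$STAT", "start": 100, "end": 118},
    {"family": "V$GATA", "start": 300, "end": 312},
]
example_peaks = [("STAT3", 90, 150), ("GATA4", 400, 450)]
example_families = {"V$STAT": {"STAT3"}, "V$GATA": {"GATA4"}}

supported = chipseq_evidence(example_matches, example_peaks, example_families)
```

Here only the V$STAT match is reported, because the GATA4 region does not overlap the V$GATA match.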
MatInspector large scale is a version of MatInspector that allows you to retrieve analysis results with up to 10,000,000 matches (whereas the normal version allows only 10,000). In order to keep the results manageable, only the analysis parameters and match summary are shown on the result page. There is also the option to download the matches in TSV format. The additional lines of evidence for the matches are not available.
MatInspector is described in the following publications:
Reference for PLACE (included as IUPAC library into MatInspector):
© 2022 Precigen Bioinformatics Germany GmbH - All rights reserved |