
SMARTest: Search for S/MAR candidates


[Introduction] [Input] [Parameters] [Output] [References]

Introduction

SMARTest is a software tool that utilizes a proprietary library of currently 97 S/MAR-associated weight matrices to test genomic DNA sequences for the occurrence of potential regions of S/MARs (Scaffold/Matrix Attachment Regions).

Training sequences for generating the S/MAR matrix library of SMARTest were selected from the EMBL database, from the literature, and from the S/MAR database S/MARt DB.

SMARTest scans DNA sequences for matches to the S/MAR matrix library using a sliding window 300 bp in length (which corresponds to the minimum length of a functional MAR). If the number of base pairs covered by S/MAR matrices in a window exceeds a defined threshold, this region is reported as an S/MAR candidate. The threshold has been derived from the analysis of six genomic sequences with experimentally defined S/MARs and non-S/MARs.
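The window scan described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the SMARTest implementation: the matrix matches are taken as given (start, end) intervals, the function name `find_smar_candidates` is hypothetical, and the threshold of 200 covered bases is a placeholder because the actual SMARTest threshold is not stated here. Merging overlapping windows into one reported region is likewise an assumption.

```python
WINDOW = 300  # window length in bp, as stated in the text


def find_smar_candidates(seq_len, matches, threshold=200, window=WINDOW):
    """Return merged (start, end) regions, 1-based and inclusive, built from
    windows in which more than `threshold` bases are covered by matches.

    matches:   iterable of (start, end) matrix match positions (1-based, inclusive)
    threshold: placeholder value, NOT the real SMARTest threshold
    """
    # Mark every base covered by at least one matrix match.
    covered = [0] * (seq_len + 1)
    for start, end in matches:
        for pos in range(max(1, start), min(seq_len, end) + 1):
            covered[pos] = 1

    # Prefix sums: prefix[i] = number of covered bases in positions 1..i.
    prefix = [0] * (seq_len + 1)
    for i in range(1, seq_len + 1):
        prefix[i] = prefix[i - 1] + covered[i]

    regions = []
    # Slide the window one base at a time; sequences shorter than the
    # window yield no windows and therefore no candidates.
    for w_start in range(1, seq_len - window + 2):
        w_end = w_start + window - 1
        if prefix[w_end] - prefix[w_start - 1] > threshold:
            if regions and w_start <= regions[-1][1] + 1:
                # Overlaps or touches the previous region: extend it.
                regions[-1] = (regions[-1][0], w_end)
            else:
                regions.append((w_start, w_end))
    return regions
```

With a single hypothetical match covering positions 100 to 350 of a 1,000 bp sequence, `find_smar_candidates(1000, [(100, 350)])` returns one merged region. A sequence shorter than 300 bp returns an empty list, which is why a minimum input length of 300 bp is recommended.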

Please note that SMARTest could only be evaluated on the very small set of available S/MAR and non-S/MAR sequences that are experimentally defined. Therefore, a number of SMARTest matches are assumed to be false positives. Furthermore, the SMARTest library contains only weight matrices that are associated with the AT-rich class of S/MARs. Therefore, the current version of SMARTest can only predict this class of S/MARs.


Input

General: Sequence Formats
Accepted DNA sequence formats: The following formats for DNA sequences are accepted:
The sequence should contain only IUPAC characters; any other characters will be skipped!
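The effect of skipping non-IUPAC characters can be illustrated with a short Python sketch. The function name `keep_iupac` is hypothetical, and treating lower-case letters as valid is an assumption; how the Genomatix server itself handles case or gap symbols is not stated here. The sketch applies to the sequence characters only, not to header or annotation lines of a given format.

```python
# IUPAC nucleotide codes (upper case); U is included for RNA-style input.
IUPAC_CODES = set("ACGTURYSWKMBDHVN")


def keep_iupac(raw):
    """Drop every character that is not an IUPAC nucleotide code.

    Case-insensitive matching is an assumption made for this sketch.
    """
    return "".join(ch for ch in raw if ch.upper() in IUPAC_CODES)
```

For example, digits, blanks, line breaks and hyphens in a pasted sequence would simply be dropped, while ambiguity codes such as R, Y or N are kept.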
Sequence Input
Choose from your previously uploaded sequences: Select a sequence file from the list of your personal sequence files which were saved in the result management in prior analyses (via "add sequences", see below).
Quick Upload new: Paste your sequence(s) in the form field in one of the accepted formats (see above). Note that sequences pasted in the "quick upload" field are not saved for future use.
Add sequences

Sequences or sequence files uploaded here are automatically saved in the result management for later use:

Enter the formatted DNA sequence(s): Enter your correctly formatted sequence(s) directly into the form, e.g. with copy and paste (see above for accepted formats).
or upload a file containing sequence(s) (max. 100 MB): If your browser supports this option, a sequence file can be uploaded.
If you use this option, the file should contain the sequence(s) in one of the formats listed above.
Please note that the size of uploaded files is limited to 100 MB. If you want to analyze larger sequences, please contact support@genomatix.de. For whole chromosomes you can use the accession number option below (e.g. 'NC_000001' for human chromosome 1).
Accession number(s): If you are interested in one or several specific sequences from a database section, you can supply a list of accession numbers. To enter more than one accession number, separate the accession numbers by commas or spaces.

On the Genomatix server accession numbers from the following databases can be entered:

  • GenBank (sections Bacteria, Invertebrates, Other Mammalian, Other Vertebrates, Plants, Primates, Rodents, Viruses, ESTs) (e.g. 'M65229')
  • Eukaryotic Promoter Database (EPD) (e.g. 'EP30014')
  • NCBI Reference Sequences (mRNA sequences) (e.g. 'NM_000402')
  • Genomatix Promoter Database (e.g. 'GXP_107276')
  • dbSNP (e.g. 'rs1234')
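As an illustration of the comma- or space-separated accession list, the following Python sketch splits such an input into individual accession numbers. The helper name `split_accessions` is hypothetical; the sketch only shows the separator rule and does not validate the accession numbers against any database.

```python
import re


def split_accessions(text):
    """Split a user-supplied list on commas and/or whitespace, dropping empties."""
    return [tok for tok in re.split(r"[,\s]+", text.strip()) if tok]
```

For example, `split_accessions("M65229, EP30014 NM_000402,GXP_107276")` yields the four accession numbers as separate list entries.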
Database input
Select one of these database sections: On the Genomatix server the following databases are available:
  • Genomatix Promoter Database: Promoters of annotated genes
    Subset of all human, mouse, and rat promoters. Promoters of
    hypothetical proteins (e.g. Loc127262) or genes that are annotated as
    "similar to ..." (e.g. Loc419384) are omitted.
  • Genomatix Promoter Database: Promoters of all genes
    All promoter sequences extracted from ElDorado genomes with "Genomatix optimized length" (1,000 bp upstream of the first TSS and 100 bp downstream of the last TSS).
  • Genomatix ElDorado Genomes
    All genomes available in ElDorado (human, mouse, rat, chimpanzee, rhesus monkey, dog, opossum, platypus, cow, horse, chicken, zebrafish, fruitfly, Anopheles, honeybee, C. elegans, Arabidopsis and rice)
  • Other databases
    • Philipp Bucher's Eukaryotic Promoter Database (EPD)
    • NCBI Reference Sequences (mRNA sequences)
  • GenBank sections
    The sections Bacteria, Invertebrates, Other Mammalian, Other Vertebrates, Plants, Primates, Rodents, and Viral are available.

In case you have selected a section from the GenBank database you may also restrict the analysis to sequences containing user-defined keywords in their annotation. You can enter keywords which will be searched in

  • the keyword line of the annotation
  • the description line of the annotation
  • the complete annotation

The keyword searches can be combined with "AND" or "OR". Please note that the keywords cannot contain blanks (all blanks will be skipped).
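The combination of keywords with "AND" or "OR" can be sketched as follows. This is a minimal Python illustration, not the server's actual search: the function name `matches_keywords` is hypothetical, and case-insensitive substring matching is an assumption. Only the removal of blanks from keywords follows directly from the note above.

```python
def matches_keywords(annotation_text, keywords, mode="AND"):
    """Return True if the text contains all (AND) or any (OR) of the keywords."""
    text = annotation_text.lower()
    # Blanks inside keywords are skipped, as stated in the text.
    cleaned = [kw.replace(" ", "").lower() for kw in keywords if kw.strip()]
    if not cleaned:
        return True  # no keywords given: no restriction (assumption)
    hits = [kw in text for kw in cleaned]
    return all(hits) if mode.upper() == "AND" else any(hits)
```

Because blanks are removed, a two-word keyword such as "homo sapiens" is searched as "homosapiens" in this sketch and would not match an annotation that contains the two words separately. Entering the words as separate keywords combined with "AND" avoids this.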

These parameters are hidden by default. You can use the reveal box next to the section header to reveal them!

Please note: The length of the DNA sequence should be at least 300 bp since SMARTest uses a sliding window of this length to predict S/MAR candidates.



SMARTest Parameters
Max. number of matches: Enter the maximum number of matches in the output file. By default, at most 100 matches are shown in the output.
Email address: Here you can choose between two methods for receiving the results:
  • Show result directly in browser window
    With this option, the URL of the result is shown directly in your browser window.

    Warning: Please use this option only for analyses which can be performed in a short time.
    If the analysis takes longer than the timeout of the webserver, the connection will be terminated and you will receive an error message (e.g. "The document contained no data."). In this case, the results will not be available. Please restart the analysis using the option below, "Send the URL of the result to".

  • Send the URL of the result via email
    With this option, an email with the URL of the results will be sent to the email address you provide when the analysis is finished.

The results will be available on our server for a limited time. For details on how long your results will be kept, please see the result email. After that period they will be deleted unless they are protected in the project management!


Program Output

SMARTest creates an output file that contains:

Example

The following example shows the SMARTest output for human alpha-1-antitrypsin and corticosteroid binding globulin intergenic region (accession number: AF156545), where two experimentally verified matrix attachment regions are described at positions:

(Rollini P., Namciu S. J., Marsden M. D., and Fournier R. E. (1999). Identification and characterization of nuclear matrix-attachment regions in the human serpin gene cluster at 14q32.1. Nucleic Acids Res. 27, 3779-3791).

Regions of potential S/MARs:

Inspecting sequence AF156545 [AF156545] (1 - 30461):

Homo sapiens alpha-1-antitrypsin and corticosteroid binding globulin intergenic region sequence.
Start    End      Length in bp
5461     5995     535
25736    26205    470
27461    27790    330

Total length of S/MAR regions in sequence: 1335 bp (4.4 %)


A total of 1 sequences were inspected by SMARTest (30461 bp)


Number of sequences containing S/MARs: 1
Number of sequences containing no S/MARs: 0
Overall content of S/MAR regions: 4.4%
Total number of predicted S/MAR regions: 3
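
The figures in the example output can be checked by hand. The short Python sketch below recomputes the region lengths, the total S/MAR length and the percentage from the start and end positions listed above. It assumes that the positions are inclusive, which is consistent with the reported lengths.

```python
# Start/end positions and sequence length taken from the example output.
regions = [(5461, 5995), (25736, 26205), (27461, 27790)]
seq_len = 30461

# Inclusive coordinates: length = end - start + 1.
lengths = [end - start + 1 for start, end in regions]
total = sum(lengths)
percent = round(100 * total / seq_len, 1)

print(lengths, total, percent)  # [535, 470, 330] 1335 4.4
```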

References

If you are interested in more details, the SMARTest method and S/MARt DB are described in

SMARTest has recently been compared to two other S/MAR-finding programs: