
Genomatix Sequence Tools


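Genomatix Sequence Tools are small web-based helpers for everyday tasks with DNA sequences. The following tasks are available:

  • Reverse-complement sequence(s)
  • Extract sequence(s) by genomic position or accession number
  • Extract parts of sequence(s)
  • Reformat sequence(s)
  • Create sequence statistics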

The output sequence is displayed on the screen and can then be saved to the local disk.


Input

General: Sequence Formats
Accepted DNA sequence formats
The following formats for DNA sequences are accepted:
The sequence should contain only IUPAC characters; any other characters will be skipped!
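The skipping of non-IUPAC characters can be pictured with a short Python sketch. The function name keep_iupac and the exact character set are assumptions made for illustration; in particular, whether the tools also accept U or gap symbols is not stated here, so both are left out.

```python
# IUPAC nucleotide codes for DNA: the four bases plus the ambiguity symbols.
# Assumption: U and gap characters are not part of the accepted set.
IUPAC_DNA = set("ACGTRYSWKMBDHVN")


def keep_iupac(raw: str) -> str:
    """Return raw with every character that is not an IUPAC DNA code dropped."""
    return "".join(ch for ch in raw if ch.upper() in IUPAC_DNA)


cleaned = keep_iupac("acgt 123\nNNRY-xz")
print(cleaned)  # acgtNNRY
```

Digits, whitespace, and line breaks from a pasted sequence disappear this way, while upper and lower case letters are both kept.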
Sequence Input
Choose from your previously uploaded sequences
Select a sequence file from the list of your personal sequence files which were saved in the result management in prior analyses (via "add sequences", see below).
Quick Upload new
Paste your sequence(s) in the form field in one of the accepted formats (see above). Note that sequences pasted in the "quick upload" field are not saved for future use.
Add sequences

Sequences or sequence files uploaded here are automatically saved in the result management for later use:

Enter the formatted DNA sequence(s)
Enter your correctly formatted sequence(s) directly into the form, e.g. with copy and paste (see above for accepted formats).
or upload a file containing sequence(s) (max. 100 MB)
If your browser supports this option, a sequence file can be uploaded. The file should contain the sequence(s) in one of the formats listed above.
Please note that the size of uploaded files is limited to 100 MB. If you want to analyze larger sequences, please contact support@genomatix.de. For whole chromosomes you can use the accession number option below (e.g. 'NC_000001' for human chromosome 1).
Accession number(s)
If you are interested in one or several specific sequences from a database section, you can supply a list of accession numbers. To enter more than one accession number, separate them with commas or spaces.

On the Genomatix server accession numbers from the following databases can be entered:

  • GenBank (sections Bacteria, Invertebrates, Other Mammalian, Other Vertebrates, Plants, Primates, Rodents, Viruses, ESTs) (e.g. 'M65229')
  • Eukaryotic Promoter Database (EPD) (e.g. 'EP30014')
  • NCBI Reference Sequences (mRNA sequences) (e.g. 'NM_000402')
  • Genomatix Promoter Database (e.g. 'GXP_107276')
  • dbSNP (e.g. 'rs1234')
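
A list of accession numbers separated by commas or spaces, as described above, could be split as in the following Python sketch. The helper name split_accessions is hypothetical, and the sketch does not check whether an accession actually exists in one of the databases.

```python
import re


def split_accessions(text: str) -> list[str]:
    """Split a user-entered list on commas and/or whitespace, dropping empty entries."""
    return [tok for tok in re.split(r"[,\s]+", text.strip()) if tok]


print(split_accessions("M65229, NM_000402 GXP_107276,rs1234"))
# ['M65229', 'NM_000402', 'GXP_107276', 'rs1234']
```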

Tasks

Reverse-complement sequence(s)

When this task is chosen, the complete input sequence is reverse-complemented, i.e. the output is the antisense strand written in 5'-3' direction.
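
As an illustration of what reverse-complementing means, here is a minimal Python sketch. It is not the implementation used by the tool; the function name and the handling of IUPAC ambiguity codes (each code is mapped to the code for the complementary base set) are assumptions.

```python
# Complement table for IUPAC DNA codes, e.g. R (A/G) pairs with Y (C/T).
_SRC = "ACGTRYSWKMBDHVN"
_DST = "TGCAYRSWMKVHDBN"
COMPLEMENT = str.maketrans(_SRC + _SRC.lower(), _DST + _DST.lower())


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse so the result reads 5'-3'."""
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGCCGN"))  # NCGGCAT
```

Applying the function twice returns the original sequence, which is a quick sanity check for any complement table.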

Parameters
Output format
Please select the format of the output sequences.

Possible output formats are FASTA, IG, EMBL, and GenBank.


Extract sequence(s) by genomic position or accession number

The user-selected sequences (identified by accession number or by genomic position) are extracted from the available databases/genomes and returned as the result.

Sequence Selection
Accession number(s)
Enter the accession number(s) of the sequences you want to extract. To extract more than one sequence, separate the accession numbers with commas or spaces.
Region
You can also enter a genomic region with its contig or chromosome ID (e.g. NC_000001 or chr1) together with either
  • a genomic start and end position (e.g. 9500000 - 10000000)
  • or a band id (e.g. p35.3 or q41)
Note that the species / genome can be selected via the "Current Genome" option (top right of page).
Parameters
Output format
Please select the format of the output sequences.

Possible output formats are FASTA, IG, EMBL, and GenBank.

If you extract the sequences in EMBL or GenBank format, all annotations of the original sequence are retained. If you extract them in IG or FASTA format, only the sequence itself is extracted.

Extract parts of sequence(s)

A user-defined piece is cut out of the input DNA sequence and returned as the result.

Parameters
Extracting
There are two ways to define the positions of the sequence to extract:
  • either by start position and end position
  • or by start position and length of the piece

Example:

To extract the positions 500 to 600 from a 1000 bp sequence, enter

  • either "500" for start position and "600" for end position
  • or "500" for start position and "100" for length
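
The two ways of defining a piece can be sketched in Python as follows. The function extract and its parameter names are hypothetical. The sketch assumes 1-based positions with an inclusive end, so a piece from start to end contains end - start + 1 bases; how the tool itself counts the end position is not stated here and may differ.

```python
def extract(seq, start, end=None, length=None):
    """Cut a piece out of seq using 1-based, inclusive positions (an assumption).

    Exactly one of end or length must be given.
    """
    if (end is None) == (length is None):
        raise ValueError("give exactly one of end or length")
    if end is None:
        end = start + length - 1  # inclusive end derived from the length
    if not 1 <= start <= end <= len(seq):
        raise ValueError("positions lie outside the sequence")
    return seq[start - 1:end]


seq = "ACGTACGTAC"
print(extract(seq, 3, end=6))     # GTAC
print(extract(seq, 3, length=4))  # GTAC
```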
Output format
Please select the format of the output sequences.

Possible output formats are FASTA, IG, EMBL, and GenBank.


Reformat sequence(s)

This task changes the format of a given sequence to FASTA, IG, EMBL, or GenBank. This can be helpful for creating sequence sets from different sources: simply reformat all sequences to one common format and copy them into one file, so the set can be used for further analysis.
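
To make the idea of one common format concrete, the following Python sketch writes sequences as FASTA records and joins them into a single text that could be saved as one file. The function to_fasta, the sequence names, and the line width of 60 are illustrative assumptions, not the tool's own settings.

```python
def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record with wrapped sequence lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# Hypothetical sequences that came from different sources.
records = [("seq1", "ACGT" * 20), ("seq2", "GGCC")]
combined = "".join(to_fasta(name, seq) for name, seq in records)
print(combined)
```

Only FASTA is shown because it carries just a name line and the sequence; EMBL and GenBank records also hold annotations and need a fuller writer.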

Parameters
Output format
Please select the format of the output sequences.

Possible output formats are FASTA, IG, EMBL, and GenBank.


Create sequence statistics

This task will create tables with statistics regarding the input sequence.

Parameters
Statistics for
Depending on the user's selection from the checkbox group, the following statistics are listed in the output:
  • AT/GC-content
  • mono-nucleotide numbers and frequencies
  • di-nucleotide numbers
  • tri-nucleotide numbers
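
The statistics listed above can be reproduced for a short sequence with the Python sketch below. It is only an approximation of what the task reports: counting di- and tri-nucleotides in overlapping windows and ignoring ambiguity codes when computing GC-content are assumptions, and the function names are made up for this example.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count all overlapping k-mers (k=1 mono-, k=2 di-, k=3 tri-nucleotides)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases A, C, G, T."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if gc + at else 0.0


seq = "ATGCGC"
mono = kmer_counts(seq, 1)
mono_freq = {base: n / len(seq) for base, n in mono.items()}
di = kmer_counts(seq, 2)
tri = kmer_counts(seq, 3)

print(round(gc_content(seq), 3))  # 0.667
print(mono["G"], di["GC"], tri["ATG"])  # 2 2 1
```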
Here is an example output for the accession number "U03518".

If there is more than one sequence in the input (e.g. when uploading a sequence file), the user can choose between statistics for

  1. all sequences
    • a summary of the nucleotide frequencies for the total of all base pairs found in the input is displayed
  2. each single input sequence
    • for each sequence, the three tables with the nucleotide frequencies are displayed
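
The difference between the two choices can be sketched in Python: per-sequence counts are kept separately, while the summary pools the counts of all input sequences before frequencies are computed. The sequence names and the helper base_counts are hypothetical.

```python
from collections import Counter


def base_counts(seq: str) -> Counter:
    """Count each nucleotide symbol in one sequence."""
    return Counter(seq.upper())


# Hypothetical input with two sequences.
sequences = {"seq1": "AACG", "seq2": "GGT"}

# Choice 2: one table per input sequence.
per_sequence = {name: base_counts(s) for name, s in sequences.items()}

# Choice 1: one summary over the total of all bases in the input.
pooled = sum(per_sequence.values(), Counter())
total = sum(pooled.values())
pooled_freq = {base: n / total for base, n in pooled.items()}

print(per_sequence["seq1"]["A"], pooled["G"], total)  # 2 3 7
```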