
Genomatix Tools




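Genomatix Tools are small helper programs with a web interface for everyday tasks. The following tasks are available:

  • Reverse-complement sequence(s)
  • Extract parts of sequence(s)
  • Extract sequence(s) from database
  • Reformat sequence(s)
  • Create sequence statistics
  • Compare (merge/intersect) two lists

The output sequence is displayed on the screen and can then be saved to the local disk.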


Input

Sequence Input
Choose from your previously uploaded sequences: Select a sequence file from the list of your personal sequence files.
or enter the formatted DNA sequence(s): Enter your correctly formatted sequence(s) directly into the form, e.g. by copy and paste.
The following formats are accepted:
The sequence should contain only IUPAC characters; any other characters will be skipped!
or upload a file containing sequence(s) (max. 100 MB): If your browser supports this option, a sequence file can be uploaded.
If you use this option, the file should contain the sequence(s) in one of the following formats:
Please note that the size of uploaded files is limited to 100 MB. If you want to analyze larger sequences, please contact support@genomatix.de. For whole chromosomes you can use the accession number option below (e.g. 'NC_000001' for human chromosome 1).
or enter accession number(s): If you are interested in one or several specific sequences from a database section, you can supply a list of valid accession numbers in the form. If you want to select more than one accession number, please separate the accession numbers by commas or spaces.

On the Genomatix server, accession numbers from the following databases can be entered:

  • GenBank (sections Bacteria, Invertebrates, Other Mammalian, Other Vertebrates, Plants, Primates, Rodents, Viruses, ESTs) (e.g. 'M65229')
  • Eukaryotic Promoter Database (EPD) (e.g. 'EP30014')
  • NCBI Reference Sequences (mRNA sequences) (e.g. 'NM_000402')
  • Genomatix Promoter Database (e.g. 'GXP_107276')
  • dbSNP (e.g. 'rs1234')
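The two input rules described above (characters that are not IUPAC codes are skipped, and several accession numbers are separated by commas or spaces) can be sketched in Python as follows. This is only an illustration, not the code used on the Genomatix server; the function names and the exact set of accepted characters are assumptions.

```python
import re

# Assumption: the IUPAC nucleotide codes, including the ambiguity codes.
# Whether the server also accepts "U" or gap characters is not stated here.
IUPAC_CODES = set("ACGTRYSWKMBDHVN")


def clean_sequence(raw):
    """Keep IUPAC nucleotide characters and skip everything else."""
    return "".join(ch for ch in raw if ch.upper() in IUPAC_CODES)


def split_accessions(text):
    """Split a list of accession numbers on commas and/or whitespace."""
    return [acc for acc in re.split(r"[,\s]+", text.strip()) if acc]


print(clean_sequence("ACGT 123\nNNRY-x"))
print(split_accessions("M65229, NM_000402 GXP_107276,rs1234"))
```

In this sketch, digits, blanks and line breaks are dropped from the sequence, so a sequence pasted with position numbers would still be read correctly.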

Tasks

Reverse-complement sequence(s)

When this task is chosen, the complete input sequence is reverse-complemented, i.e. the antisense strand is given in 5'-3' direction in the output.
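As an illustration of what reverse-complementing means, here is a minimal Python sketch. It assumes that the IUPAC ambiguity codes are complemented as well (e.g. R becomes Y); how the Genomatix tool treats ambiguity codes is not described on this page, and the function name is hypothetical.

```python
# Complement table for the IUPAC nucleotide codes (upper and lower case).
# Assumption: ambiguity codes are mapped to their complementary code.
_COMPLEMENT = str.maketrans(
    "ACGTRYSWKMBDHVNacgtryswkmbdhvn",
    "TGCAYRSWMKVHDBNtgcayrswmkvhdbn",
)


def reverse_complement(seq):
    """Complement every base, then reverse the order of the bases."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))    # GCAT
print(reverse_complement("AAGCTT"))  # AAGCTT (palindromic site)
```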

Parameters
Output format: Please select the format of the output sequences.

Possible output formats are


Extract parts of sequence(s)

The user-defined part of the DNA sequence is cut out of the input sequence and returned as the result.

Parameters
Extracting: There are two ways to define the positions of the sequence to extract:
  • either by start position and end position
  • or by start position and length of the piece

Example:

To extract the positions 500 to 600 from a 1000 bp sequence, enter

  • either "500" for start position and "600" for end position
  • or "500" for start position and "100" for length
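The two ways of defining the region can be sketched in Python as shown below. The sketch assumes 1-based positions and an inclusive end position; the help text does not spell out how the tool counts the boundary bases, so treat this convention and the function names as assumptions.

```python
def extract_by_end(seq, start, end):
    """Return positions start..end (assumed 1-based, end inclusive)."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("positions must satisfy 1 <= start <= end <= length")
    return seq[start - 1:end]


def extract_by_length(seq, start, length):
    """Return `length` bases beginning at position start (assumed 1-based)."""
    if start < 1 or length < 1 or start - 1 + length > len(seq):
        raise ValueError("region lies outside of the sequence")
    return seq[start - 1:start - 1 + length]


sequence = "ACGT" * 250  # a 1000 bp dummy sequence
print(len(extract_by_end(sequence, 500, 600)))
print(len(extract_by_length(sequence, 500, 100)))
```

With an inclusive end, positions 500 to 600 span 101 bases, while a length of 100 starting at position 500 ends at position 599. Whether the tool counts the end position this way is not stated on this page.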
Output format: Please select the format of the output sequences.

Possible output formats are


Extract sequence(s) from database

The user-selected sequences (identified by their accession numbers) are extracted from the available databases and returned as the result.

Sequence Selection
Accession number(s): Enter the accession number(s) of the sequences you want to extract. If you want to extract more than one sequence, please separate the accession numbers by commas or spaces.
Parameters
Output format: Please select the format of the output sequences.

Possible output formats are

If you extract the sequences in EMBL or GenBank format, all annotations of the original sequence will be retained. If you extract the sequences in IG or FASTA format, only the sequence itself will be extracted.


Reformat sequence(s)

This task changes the format of a given sequence to FASTA, IG, EMBL, or GenBank. This can be helpful for creating sequence sets from different sources: simply reformat all sequences to one common format and copy them into one file, so the set can be used for further analysis.

Parameters
Output format: Please select the format of the output sequences.

Possible output formats are FASTA, IG, EMBL, and GenBank.


Create sequence statistics

This task will create tables with statistics regarding the input sequence.

Parameters
Statistics for: Depending on the user's selection from the checkbox group, the following are listed in the output:
  • AT/GC-content
  • mono-nucleotide numbers and frequencies
  • di-nucleotide numbers
  • tri-nucleotide numbers
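The kind of statistics listed above can be sketched in Python as follows. The sketch assumes that di- and tri-nucleotides are counted in overlapping windows and that letter case is ignored; the page does not say how the tool counts them, and all names in the code are hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count all overlapping k-mers (assumption: overlapping windows)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def sequence_statistics(seq):
    """Collect AT/GC content and mono-, di- and tri-nucleotide counts."""
    mono = kmer_counts(seq, 1)
    total = sum(mono.values())
    return {
        "gc_content": (mono["G"] + mono["C"]) / total if total else 0.0,
        "at_content": (mono["A"] + mono["T"]) / total if total else 0.0,
        "mono": mono,
        "mono_freq": {base: n / total for base, n in mono.items()},
        "di": kmer_counts(seq, 2),
        "tri": kmer_counts(seq, 3),
    }


stats = sequence_statistics("ATGCGC")
print(stats["gc_content"], dict(stats["di"]))
```

For several input sequences, a summary over all sequences could be obtained by adding up the per-sequence counters, since Counter objects support addition.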
Here is an example output for the accession number "U03518".

If there is more than one sequence in the input (e.g. when uploading a sequence file), the user can choose between statistics for

  1. all sequences
    • a summary of the nucleotide frequencies for the total of all base pairs found in the input is displayed
  2. each single input sequence
    • for each sequence, the three tables with the nucleotide frequencies are displayed


Compare (merge/intersect) two lists

This task compares two lists of elements (e.g. sequence names, gene names or accession numbers). The result shows the union and the intersection of the two lists, the elements that occur in only one of the two lists, and the elements that occur more than once within each list. The results can be exported to Excel.
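The comparison described above corresponds to ordinary set operations. The following Python sketch illustrates the idea; it is not the tool's implementation, and the function name, the result keys and the lower-casing used for the case-insensitive mode are assumptions.

```python
from collections import Counter


def compare_lists(list1, list2, case_sensitive=True):
    """Compare two lists of names using set operations."""
    if not case_sensitive:
        # Assumption: case-insensitive comparison is done on lower-cased names.
        list1 = [name.lower() for name in list1]
        list2 = [name.lower() for name in list2]
    set1, set2 = set(list1), set(list2)
    return {
        "union": sorted(set1 | set2),
        "intersection": sorted(set1 & set2),
        "only_in_list1": sorted(set1 - set2),
        "only_in_list2": sorted(set2 - set1),
        # Elements that were entered more than once in a list.
        "multiple_in_list1": sorted(n for n, c in Counter(list1).items() if c > 1),
        "multiple_in_list2": sorted(n for n, c in Counter(list2).items() if c > 1),
    }


result = compare_lists(
    ["Abcg8", "Abhd2", "Ace2", "Actl7a"],
    ["Ace2", "Actn3", "Adam15", "Adam1a", "Actl7a"],
)
print(result["intersection"])  # ['Ace2', 'Actl7a']
```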

Parameters
List 1: Enter the first list of elements, separated by blanks, newlines or commas (e.g. "Abcg8,Abhd2,Ace2,Actl7a").
List 2: Enter the second list of elements, separated by blanks, newlines or commas (e.g. "Ace2,Actn3,Adam15,Adam1a,Actl7a").
Associated values: For both lists, additional lists of associated values can be entered. To be assigned correctly, the associated values must match the list elements in number and order.
The associated values are optional. This option is helpful when comparing output lists from e.g. GePS, where expression values are assigned to gene names. The output will display the associated values together with the gene names. For genes that occur in both lists, both associated values are given (see example below).
Case Sensitivity: Check this option if uppercase and lowercase letters should be distinguished in the comparison (default is case-sensitive).
Compute Probability: Usually your input lists are subsets of a large list of entities (the "population", in statistical terms), e.g. genes or promoters. Based on the lengths m and n of the input lists (counting only unique elements!), the cardinality i of the intersection of the two lists and the cardinality N of the population (a positive integer, which you must enter into the text field), two probability values are computed:
  • the probability that two lists of m and n elements, respectively, picked randomly from the population have exactly i elements in common, and
  • the probability that two lists of m and n elements, respectively, picked randomly from the population have at least i elements in common
In the output, the probabilities are printed both as percent values and in scientific notation.
If you do not enter a value into the text field, the probabilities are not computed.
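The page does not state which formula is used for these probabilities. A natural model is the hypergeometric distribution, and the Python sketch below (function names are hypothetical) reproduces the values of the example output further down for N = 20, m = 5, n = 4 and i = 2.

```python
from math import comb


def prob_exact(N, m, n, i):
    """Probability that two random subsets of sizes m and n, drawn from a
    population of size N, share exactly i elements (hypergeometric model)."""
    if not (0 <= m <= N and 0 <= n <= N):
        raise ValueError("list lengths must lie between 0 and N")
    if i < 0 or i > min(m, n):
        return 0.0
    return comb(m, i) * comb(N - m, n - i) / comb(N, n)


def prob_at_least(N, m, n, i):
    """Probability that the two subsets share at least i elements."""
    return sum(prob_exact(N, m, n, k) for k in range(max(i, 0), min(m, n) + 1))


print(f"{prob_exact(20, 5, 4, 2):.4%}")     # 21.6718%
print(f"{prob_at_least(20, 5, 4, 2):.4%}")  # 24.8710%
```

The "at least" value is simply the sum of the "exactly" values from i up to the size of the smaller list.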

Example Input:

List1     Associated values for List1    List2     Associated values for List2
Abcg8     0,1234                         Ace2      0,4
Abhd2     -0,3                           Actn3     -0,4
Ace2      -0,14                          Adam15    0,34
Actl7a    0,34                           Adam1a    0,14
                                         Actl7a    0,35

Example Output:

Case-sensitive Comparison of Lists

Input List1                   4 elements, 4 of them unique    Abcg8, Abhd2, Ace2, Actl7a
Multiple elements in List1    0 elements
Input List2                   5 elements, 5 of them unique    Ace2, Actl7a, Actn3, Adam15, Adam1a
Multiple elements in List2    0 elements
Union                         7 elements                      Abcg8, Abhd2, Ace2, Actl7a, Actn3, Adam15, Adam1a
Intersection                  2 elements                      Ace2, Actl7a
Only in List1                 2 elements                      Abcg8, Abhd2
Only in List2                 3 elements                      Actn3, Adam15, Adam1a
Probability values            probability that 2 random subsets having 5 resp. 4 elements, picked from a set of 20 elements, have an intersection
                              of exactly 2 elements is 21.6718% ( 0.2167182663e0 )
                              of at least 2 elements is 24.8710% ( 0.2487100103e0 )

When exporting the intersection of both lists to Excel the output looks like this:

List element    associated value from List1    associated value from List2
Ace2            -0,14                          0,4
Actl7a          0,34                           0,35
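How the associated values end up next to the shared elements can be sketched in Python as follows. The values are kept as text exactly as entered (with decimal commas); the function name and the handling of repeated names are assumptions, not a description of the tool's export code.

```python
def intersection_with_values(list1, values1, list2, values2):
    """Return (element, value from List1, value from List2) for all
    elements that occur in both lists, in the order of List1."""
    if len(list1) != len(values1) or len(list2) != len(values2):
        raise ValueError("associated values must match the list in number and order")
    # Assumption: if a name is repeated, its last associated value is kept.
    assoc1 = dict(zip(list1, values1))
    assoc2 = dict(zip(list2, values2))
    return [(name, assoc1[name], assoc2[name]) for name in assoc1 if name in assoc2]


rows = intersection_with_values(
    ["Abcg8", "Abhd2", "Ace2", "Actl7a"],
    ["0,1234", "-0,3", "-0,14", "0,34"],
    ["Ace2", "Actn3", "Adam15", "Adam1a", "Actl7a"],
    ["0,4", "-0,4", "0,34", "0,14", "0,35"],
)
for row in rows:
    print("\t".join(row))
```

Applied to the example input, this yields the two rows for Ace2 and Actl7a shown in the table above.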