DiAlign is an alignment program for DNA or protein sequences that compares whole segments of sequences rather than single nucleotides or amino acids.
DiAlign constructs alignments from gap-free pairs of similar segments of the sequences. Such segment pairs are referred to as diagonals.
Every possible diagonal is assigned a weight that reflects the degree of similarity between the two segments involved. The score of an alignment is defined as the sum of the weights of the diagonals it consists of, and the program looks for an alignment with maximum score. In other words, the program tries to find a consistent collection of diagonals with a maximum sum of weights.
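For two sequences, the idea of a maximum-weight consistent collection of diagonals can be illustrated with a small chaining sketch. It assumes the diagonal weights are already known (the `weight` values below are made up) and covers only the pairwise case; DiAlign's actual weight function and its procedure for combining diagonals from more than two sequences are not reproduced here. The names `Diagonal`, `precedes` and `best_chain` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Diagonal:
    """Hypothetical gap-free segment pair: seq1[i:i+length] aligned to seq2[j:j+length]."""
    i: int          # 0-based start in sequence 1
    j: int          # 0-based start in sequence 2
    length: int     # segment length (>= 1)
    weight: float   # similarity weight, assumed to be given


def precedes(a: Diagonal, b: Diagonal) -> bool:
    """True if `a` ends before `b` starts in both sequences (no overlap, no crossing)."""
    return a.i + a.length <= b.i and a.j + a.length <= b.j


def best_chain(diagonals: List[Diagonal]) -> Tuple[float, List[Diagonal]]:
    """Return the maximum total weight of a consistent chain and the chain itself."""
    diags = sorted(diagonals, key=lambda d: (d.i, d.j))
    if not diags:
        return 0.0, []
    best = [d.weight for d in diags]  # best[k]: best score of a chain ending in diags[k]
    prev = [-1] * len(diags)          # back-pointers for recovering the chain
    for k, dk in enumerate(diags):
        for m in range(k):
            if precedes(diags[m], dk) and best[m] + dk.weight > best[k]:
                best[k] = best[m] + dk.weight
                prev[k] = m
    k = max(range(len(diags)), key=best.__getitem__)
    total, chain = best[k], []
    while k != -1:
        chain.append(diags[k])
        k = prev[k]
    return total, chain[::-1]


if __name__ == "__main__":
    d1 = Diagonal(0, 0, 5, 3.0)
    d2 = Diagonal(6, 10, 4, 2.5)   # follows d1 in both sequences
    d3 = Diagonal(2, 12, 6, 4.0)   # conflicts with both d1 and d2
    print(best_chain([d1, d2, d3]))  # total 5.5, chain [d1, d2]
```

The gap between `d1` and `d2` (one position in sequence 1, five in sequence 2) does not lower the score: without a gap penalty, only the diagonal weights count.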
DiAlign does not use a gap penalty, thus avoiding this critical parameter. Consequently, the program is especially well suited to detecting local similarities in otherwise completely unrelated sequences.
In the example below, the sequence segments corresponding to diagonals are underlined in each sequence, and their color indicates the segment of the other sequence involved in the same diagonal. Lower-case letters indicate amino acids that are not part of any diagonal and therefore remain unaligned. The first diagonal shown in the alignment consists of the TPLPSH segment of HTLV_II (HTLV2) and the APLPIH segment of HBV (HEPB). The rows of '*' signs below the alignment indicate the degree of overlap of diagonals at each position.
Mathematical details of the algorithm are described in Morgenstern et al., 1996 (Proc. Natl. Acad. Sci. USA) and a more general description including application examples is given in Morgenstern et al., 1998 (Bioinformatics).
| Sequence Input | |
|---|---|
| Choose from your previously uploaded sequences | Select a sequence file from the list of your personal sequence files. |
| or enter the formatted DNA sequence(s) | Enter your correctly formatted sequence(s) directly into the form, e.g. by copy and paste. The following formats are accepted: Only IUPAC characters should be used in the sequence; any other characters will be skipped. |
| or upload a file containing sequence(s) (max. 100 MB) | If your browser supports this option, a sequence file can be uploaded. If you use this option, the file should contain the sequence(s) in one of the following formats: Please note that uploaded files are limited to 100 MB. If you want to analyze larger sequences, please contact support@genomatix.de. For whole chromosomes, you can use the accession number option below (e.g. 'NC_000001' for human chromosome 1). |
| or enter accession number(s) | If you are interested in one or several specific sequences from a database section, you can enter a list of valid accession numbers in the form. To select more than one accession number, separate them by commas or spaces. On the Genomatix server, accession numbers from the following databases can be entered: |
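The input rules above (only IUPAC characters are kept, and several accession numbers are separated by commas or spaces) can be mirrored on the client side before submitting. This is a sketch under assumptions: the exact character set the server accepts and whether it converts case are not stated here, and `clean_dna` and `split_accessions` are hypothetical helpers.

```python
import re
from typing import List

# Assumed character set: IUPAC nucleotide codes, including ambiguity codes and N.
IUPAC_DNA = set("ACGTURYSWKMBDHVN")


def clean_dna(raw: str) -> str:
    """Keep IUPAC nucleotide characters (case-insensitively) and skip everything else."""
    return "".join(ch for ch in raw.upper() if ch in IUPAC_DNA)


def split_accessions(text: str) -> List[str]:
    """Split a list of accession numbers separated by commas and/or whitespace."""
    return [tok for tok in re.split(r"[,\s]+", text.strip()) if tok]


if __name__ == "__main__":
    print(clean_dna("acg t-1 ngx"))                             # ACGTNG
    print(split_accessions("NC_000001, NC_000002  NC_000003"))  # ['NC_000001', 'NC_000002', 'NC_000003']
```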
| Sequence Types | |
|---|---|
| | Please also check if your sequence is supposed to be read as |
| Alignment Parameters | |
|---|---|
| Type of sequence | DiAlign uses information about the loaded sequences for the alignment. For protein sequences there is no further choice, but for DNA the sequences can be |
| Threshold T | As described above, DiAlign uses diagonals to construct an alignment. The threshold T influences the set of diagonals used: with T > 0, a diagonal is considered for the alignment only if its weight exceeds this threshold. Regions of lower similarity are therefore not aligned. DiAlign usually produces reasonable alignments without a threshold, i.e. with T = 0. These parameters are hidden by default. |
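The threshold rule can be written as a filter over candidate diagonals. This is a sketch with a hypothetical `Diagonal` record and made-up weights; it only shows which diagonals stay eligible, not how DiAlign computes the weights or assembles the alignment afterwards.

```python
from typing import List, NamedTuple


class Diagonal(NamedTuple):
    """Hypothetical diagonal record: start positions, length and weight."""
    i: int
    j: int
    length: int
    weight: float


def apply_threshold(diagonals: List[Diagonal], t: float = 0.0) -> List[Diagonal]:
    """Keep the diagonals eligible for alignment under threshold T.

    With T = 0 (no threshold) nothing is filtered out; with T > 0 a diagonal
    is kept only if its weight exceeds T.
    """
    if t <= 0:
        return list(diagonals)
    return [d for d in diagonals if d.weight > t]


if __name__ == "__main__":
    diags = [Diagonal(0, 0, 4, 0.5), Diagonal(10, 12, 8, 2.0), Diagonal(30, 31, 12, 3.5)]
    print(len(apply_threshold(diags)))        # 3
    print(len(apply_threshold(diags, 2.0)))   # 1: weight 2.0 does not exceed T = 2.0
```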
| Output Parameters | |
|---|---|
| Display of alignment | '*' signs below alignment: '*' characters are used in the DiAlign output to create a pseudo-graphical representation indicating In the first two cases, the user can specify the maximum number of '*' characters per column in the program output, thus changing the resolution of the graphics. In the other two cases, one '*' sign denotes an identical or a variable position, respectively. The latter two options are especially suited for very similar sequences where one is interested only in the mismatches within an alignment. These parameters are hidden by default. |
| | Color coding within alignment: By default, the nucleotides/amino acids in the DiAlign output that were actually aligned (i.e. that belong to diagonals) are color-coded. The color coding can be switched off to obtain a black-and-white result. These parameters are hidden by default. |
| | Do not show non-aligned blocks: This option is set by default. Non-aligned blocks are removed from the DiAlign alignment, and each place where one or more non-aligned blocks were omitted is marked by three dots. This option is especially useful for reducing the size of the alignment when a long sequence is aligned with a very short one (e.g. a genomic sequence with the corresponding mRNA). Switch off this option if you want to see the complete alignment. |
| | Number of nucleotides/amino acids per line: The default number of nucleotides/amino acids per line in the alignment output is 50. It can be set to 0 (= unlimited) so that the complete alignment is shown on one line. These parameters are hidden by default. |
| Additional output | Additional output of pairwise sequence similarities: With this option, the similarity (relative to the maximum similarity) and the number of aligned nucleotides/amino acids are shown for each pairwise alignment. This option is useful for identifying pairs of sequences that are very similar. These parameters are hidden by default. |
| | Additional output of alignment in FASTA format: With this option, the alignment is additionally displayed in FASTA format (e.g. if the alignment is to be used as input for other programs). By default, the program displays the output only in DiAlign format, which is easier to interpret. These parameters are hidden by default. |
| | Additional output of sequence tree: With this option, a sequence tree in PHYLIP format is displayed in the output. This tree is constructed by applying the UPGMA clustering method to the DiAlign similarity scores and roughly reflects the different degrees of similarity among the sequences. For detailed phylogenetic analysis, we recommend the usual methods of phylogenetic reconstruction. These parameters are hidden by default. |
| Email address | Here you can choose between two methods for receiving the results: The results will be available on our server for a limited time; for details on how long your results will be kept, please see the result email. After that period they will be deleted unless protected in the project management! |
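Three of the display options above can be imitated on gapped alignment rows in which upper-case letters mark aligned residues, as in DiAlign output. This is a rough sketch: the function names are hypothetical, and how DiAlign itself defines an "identical position" or a "non-aligned block" is assumed here (a column whose residues are all upper case and equal, and a run of columns without any upper-case residue, respectively).

```python
from typing import Dict, List


def identity_marks(rows: List[str]) -> str:
    """Put one '*' under each column in which every sequence has the same aligned residue.

    Assumption: only upper-case (aligned) residues count; gaps and lower-case
    (unaligned) residues never produce a mark.
    """
    marks = []
    for column in zip(*rows):
        aligned = all(c.isupper() for c in column)
        marks.append("*" if aligned and len(set(column)) == 1 else " ")
    return "".join(marks)


def collapse_unaligned(rows: List[str]) -> List[str]:
    """Replace every run of columns without any aligned residue by '...'."""
    keep = [any(c.isupper() for c in column) for column in zip(*rows)]
    out = [""] * len(rows)
    k = 0
    while k < len(keep):
        if keep[k]:
            out = [o + row[k] for o, row in zip(out, rows)]
            k += 1
        else:
            while k < len(keep) and not keep[k]:
                k += 1
            out = [o + "..." for o in out]
    return out


def wrap(rows: Dict[str, str], width: int = 50) -> List[str]:
    """Split named rows into blocks of `width` columns; width 0 means unlimited."""
    length = max((len(r) for r in rows.values()), default=0)
    if length == 0:
        return []
    step = length if width == 0 else width
    name_w = max(len(n) for n in rows)
    lines = []
    for start in range(0, length, step):
        for name, row in rows.items():
            lines.append(f"{name:<{name_w}} {row[start:start + step]}")
        lines.append("")  # blank line between blocks
    return lines


if __name__ == "__main__":
    rows = ["abCDE-fg", "xyCDA-gh"]
    print(repr(identity_marks(rows)))   # '  **    '
    print(collapse_unaligned(rows))     # ['...CDE...', '...CDA...']
    print("\n".join(wrap({"s1": "ABCDEFG", "s2": "ABCDXFG"}, width=3)))
```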
    HTLV2   1  ldtapcLFSD GS------PQ KAAYVLWDQT IL---QQDIT PLPSHethSA
    MMLV    1  pdadhtwYTD GSSLLQEGQR KAGAAVTTET eviwaKALDA G---T---SA
    HEPB    1  rpglcQVFAD AT------PT GWGLVMGHQR MR---GTFSA PLPIHt----
    ECOL    1  mlkqvEIFTD GSCLGNPGPG GYGAILRYRG RE---KTFSA GytrT---TN
               ***** ********** ********** ** ***** ***** ** **** ** ** ********** ** ***** ***** ** *** ** ** ********** ** ***** ** ******

    HTLV2  42  QKGELLALIC GLRAAKPWPS LNIFLDSKYL IKYLHslaig aflgtsah--
    MMLV   45  QRAELIALTQ ALKMAEgkk- LNVYTDSRYA FATAHIHGEI YRRRGLLTSE
    HEPB   38  --AELLAACF Arsrsgan-- -IIGTDN--- ---------- ----------
    ECOL   45  NRMELMAAIV ALEALKEHCE VILSTDSQYV RQGITQWIHN WKKRGWKTAD
               ********** ********** ********** ********** ********** ********** ********** ********** ********** ********** ******* ****** ********** ***** ******* ****** ********** ***** ********
    ...
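The position number at the start of each block can be checked by hand: a row's next start position is its current start plus the number of residues in the row, not counting the spacing blanks or '-' gaps. The sketch below does this for the first block of the example and also counts the lower-case (unaligned) residues per row; the helper names are hypothetical.

```python
from typing import Dict, Tuple

# First block of the example alignment: start position and row text.
BLOCK_1: Dict[str, Tuple[int, str]] = {
    "HTLV2": (1, "ldtapcLFSD GS------PQ KAAYVLWDQT IL---QQDIT PLPSHethSA"),
    "MMLV": (1, "pdadhtwYTD GSSLLQEGQR KAGAAVTTET eviwaKALDA G---T---SA"),
    "HEPB": (1, "rpglcQVFAD AT------PT GWGLVMGHQR MR---GTFSA PLPIHt----"),
    "ECOL": (1, "mlkqvEIFTD GSCLGNPGPG GYGAILRYRG RE---KTFSA GytrT---TN"),
}


def residue_count(row: str) -> int:
    """Residues in a row: everything except the spacing blanks and '-' gaps."""
    return sum(1 for c in row if c not in " -")


def unaligned_count(row: str) -> int:
    """Lower-case letters, i.e. residues that are not part of any diagonal."""
    return sum(1 for c in row if c.islower())


def next_start(start: int, row: str) -> int:
    """Position of the first residue in the following block."""
    return start + residue_count(row)


if __name__ == "__main__":
    for name, (start, row) in BLOCK_1.items():
        print(name, next_start(start, row), unaligned_count(row))
    # Start positions of the second block: HTLV2 42, MMLV 45, HEPB 38, ECOL 45
```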
For each pairwise alignment, the similarity (relative to the maximum similarity) and the number of aligned amino acids (in % of the shorter sequence) are given. Maximum values are underlined.
| | MMLV (157 aa) | HEPB (141 aa) | ECOL (155 aa) |
|---|---|---|---|
| HTLV2 (135 aa) | 0.846 / 65 % | 0.383 / 25 % | 0.518 / 35 % |
| MMLV (157 aa) | | 0.002 / 10 % | 1.000 / 63 % |
| HEPB (141 aa) | | | 0.460 / 55 % |
Please note that the similarity value 1.000 only marks the two most similar sequences; it does not necessarily mean that these sequences are identical.
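A small numeric example shows how a relative similarity of 1.000 can arise without identical sequences. This sketch assumes that "relative to the maximum similarity" means dividing each pair's raw score by the largest raw score, and that the percentage is aligned residues divided by the length of the shorter sequence; the raw scores and residue counts are made up, and the function names are hypothetical.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]


def relative_similarities(raw: Dict[Pair, float]) -> Dict[Pair, float]:
    """Scale raw pairwise scores so that the best-scoring pair gets exactly 1.0."""
    top = max(raw.values())
    return {pair: score / top for pair, score in raw.items()}


def percent_of_shorter(n_aligned: int, len_a: int, len_b: int) -> float:
    """Aligned residues as a percentage of the shorter sequence's length."""
    return 100.0 * n_aligned / min(len_a, len_b)


if __name__ == "__main__":
    raw = {("A", "B"): 40.0, ("A", "C"): 10.0, ("B", "C"): 20.0}  # made-up scores
    print(relative_similarities(raw))        # A-B: 1.0, A-C: 0.25, B-C: 0.5
    print(percent_of_shorter(27, 135, 157))  # 20.0
```

Here the pair A-B gets 1.0 only because no other pair scores higher; the value says nothing about whether A and B are identical.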
    >HTL2
    ldtapcLFSDGS------PQKAAYVLWDQTIL---QQDITPLPSHethSA
    QKGELLALICGLRAAKPWPSLNIFLDSKYLIKYLHslaigaflgtsah--
    -------QT---LQAALPPLLQGKTIYLHHVRSHT------NLPDPISTF
    NEYTDSLILApl--------------------------------------
    ----------
    >MMLV
    pdadhtwYTDGSSLLQEGQRKAGAAVTTETeviwaKALDAG---T---SA
    QRAELIALTQALKMAEgkk-LNVYTDSRYAFATAHIHGEIYRRRGLLTSE
    GKEIKNKDE---ILALLKALFLPKRLSIIHCPGHQ------KGHSAEARG
    NRMADQAARKAAITETPDTStll---------------------------
    ----------
    >HEPB
    rpglcQVFADAT------PTGWGLVMGHQRMR---GTFSAPLPIHt----
    --AELLAACFArsrsgan---IIGTDN-----------------------
    -------------SVVLSR--------------KYTSFPWLLGCAANWI-
    LRGTSFVYVPSALNPADDPSrgrlglsrpllrlpfrpttgrtslyadsps
    vpshlpdrvh
    >ECOL
    mlkqvEIFTDGSCLGNPGPGGYGAILRYRGRE---KTFSAGytrT---TN
    NRMELMAAIVALEALKEHCEVILSTDSQYVRQGITQWIHNWKKRGWKTAD
    KKPVKNVDlwqrLDAALGQ--------------HQIKWEWVKGHAGHPE-
    NERCDELARAAAMNPTledtgyqvev------------------------
    ----------
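The FASTA output can be read back to check the sequence lengths given in the similarity table: removing the '-' gap characters from each aligned record leaves the original sequence. A minimal parser, as a sketch (the function names are hypothetical):

```python
from typing import Dict

FASTA = """\
>HTL2
ldtapcLFSDGS------PQKAAYVLWDQTIL---QQDITPLPSHethSA
QKGELLALICGLRAAKPWPSLNIFLDSKYLIKYLHslaigaflgtsah--
-------QT---LQAALPPLLQGKTIYLHHVRSHT------NLPDPISTF
NEYTDSLILApl--------------------------------------
----------
>MMLV
pdadhtwYTDGSSLLQEGQRKAGAAVTTETeviwaKALDAG---T---SA
QRAELIALTQALKMAEgkk-LNVYTDSRYAFATAHIHGEIYRRRGLLTSE
GKEIKNKDE---ILALLKALFLPKRLSIIHCPGHQ------KGHSAEARG
NRMADQAARKAAITETPDTStll---------------------------
----------
>HEPB
rpglcQVFADAT------PTGWGLVMGHQRMR---GTFSAPLPIHt----
--AELLAACFArsrsgan---IIGTDN-----------------------
-------------SVVLSR--------------KYTSFPWLLGCAANWI-
LRGTSFVYVPSALNPADDPSrgrlglsrpllrlpfrpttgrtslyadsps
vpshlpdrvh
>ECOL
mlkqvEIFTDGSCLGNPGPGGYGAILRYRGRE---KTFSAGytrT---TN
NRMELMAAIVALEALKEHCEVILSTDSQYVRQGITQWIHNWKKRGWKTAD
KKPVKNVDlwqrLDAALGQ--------------HQIKWEWVKGHAGHPE-
NERCDELARAAAMNPTledtgyqvev------------------------
----------
"""


def parse_fasta(text: str) -> Dict[str, str]:
    """Map each '>' header (first word) to its concatenated sequence lines."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def ungapped_lengths(records: Dict[str, str]) -> Dict[str, int]:
    """Sequence lengths after removing '-' gap characters."""
    return {name: len(seq.replace("-", "")) for name, seq in records.items()}


if __name__ == "__main__":
    print(ungapped_lengths(parse_fasta(FASTA)))
    # {'HTL2': 135, 'MMLV': 157, 'HEPB': 141, 'ECOL': 155}
```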
Trees can be visualized, e.g., with the drawtree program from the PHYLIP software package.
((HTL2:0.111024, (MMLV:0.078471, ECOL:0.078471):0.032554):0.121218, HEPB:0.232242);
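The grouping in this tree can be reproduced from the pairwise similarity values in the table above with a plain UPGMA sketch: repeatedly merge the two most similar clusters and give the merged cluster the size-weighted average similarity to every other cluster. Averaging similarities this way is equivalent to UPGMA on distances of the form 1 - similarity. Only the topology is computed; how DiAlign converts similarities into the branch lengths shown above is not described here, so branch lengths are omitted. The function name is hypothetical, and the table's label HTLV2 is used where the tree says HTL2.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def upgma_topology(names: List[str], sim: Dict[Tuple[str, str], float]) -> str:
    """Average-linkage clustering on similarities; returns Newick without branch lengths."""
    size = {n: 1 for n in names}                        # cluster label -> number of leaves
    s = {frozenset(pair): v for pair, v in sim.items()}
    while len(size) > 1:
        # merge the pair of clusters with the highest average similarity
        a, b = max(combinations(size, 2), key=lambda p: s[frozenset(p)])
        merged = f"({a},{b})"
        ka, kb = size.pop(a), size.pop(b)
        for c in size:
            s[frozenset((merged, c))] = (ka * s[frozenset((a, c))] + kb * s[frozenset((b, c))]) / (ka + kb)
        size[merged] = ka + kb
    (root,) = size
    return root + ";"


if __name__ == "__main__":
    names = ["HTLV2", "MMLV", "HEPB", "ECOL"]
    sim = {
        ("HTLV2", "MMLV"): 0.846, ("HTLV2", "HEPB"): 0.383, ("HTLV2", "ECOL"): 0.518,
        ("MMLV", "HEPB"): 0.002, ("MMLV", "ECOL"): 1.000, ("HEPB", "ECOL"): 0.460,
    }
    print(upgma_topology(names, sim))  # (HEPB,(HTLV2,(MMLV,ECOL)));
```

The result has the same topology as the DiAlign tree: MMLV and ECOL are joined first, HTLV2 is added next, and HEPB joins last; only the order of the two subtrees at the root differs.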
If you are interested in more details, the method is described in
The main changes of DiAlign2 compared to the first version of the program are described in
| © 1998-2013 Genomatix Software GmbH - All rights reserved |