Genomatix: Search for orthologous regions in other species


[Introduction] [Parameters] [Output]

Introduction

This task identifies regions in the genomes of other species that are orthologous to the regions in the input file. Input is accepted in BED, bigBed, or BAM format; input sequences should be longer than 50 bp.

To identify orthologous regions in a target species, a proprietary algorithm is used.

In a first step, the ElDorado database is searched for homologous loci in the target organisms (see Comparative Genomics). If no such loci are found, the flanking genes (up to 20 loci in both directions) are considered in order to find a syntenic region in the target organism. For a region to count as syntenic, the two homologous genes in the target organism must lie on the same contig and must show the same relative strand orientation as the genes in the source organism.
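
The synteny condition described above can be illustrated with a short Python sketch. This is not the proprietary algorithm itself: the `Gene` class and the `is_syntenic` function are hypothetical names, and the reading of "same relative strand orientation" (the two genes are either on the same strand in both organisms or on opposite strands in both) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical minimal gene record used only for this illustration."""
    name: str
    contig: str
    strand: str  # "+" or "-"


def is_syntenic(src_left, src_right, tgt_left, tgt_right):
    """Check the two stated conditions for a syntenic region.

    Assumption: "same relative strand orientation" means the pair of
    genes is same-strand in both organisms or opposite-strand in both.
    """
    # Condition 1: both target homologs lie on the same contig.
    same_contig = tgt_left.contig == tgt_right.contig
    # Condition 2: relative orientation of the pair is preserved.
    src_same_strand = src_left.strand == src_right.strand
    tgt_same_strand = tgt_left.strand == tgt_right.strand
    return same_contig and (src_same_strand == tgt_same_strand)


if __name__ == "__main__":
    a = Gene("A", "chr1", "+")
    b = Gene("B", "chr1", "-")
    a_t = Gene("A_target", "chr5", "-")
    b_t = Gene("B_target", "chr5", "+")
    print(is_syntenic(a, b, a_t, b_t))
```

In this toy example the flanking genes are on opposite strands in both organisms and the target homologs share a contig, so the pair would qualify under the assumed reading.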

In a second step, the input sequence is aligned to the syntenic region using a Smith-Waterman alignment. If the alignment fulfills the following criteria, the target region is listed in the output:

Sets of orthologous sequences can be saved, as well as analyzed for common TFBS patterns with FrameWorker or DiAlignTF to identify phylogenetically conserved regulatory structures.
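
For readers unfamiliar with the Smith-Waterman alignment mentioned in the second step, the following Python sketch computes the best local alignment score between two sequences. The scoring values (`match`, `mismatch`, `gap`) and the linear gap penalty are placeholder assumptions; the scoring scheme and acceptance criteria actually used by the task are not given here.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of sequences a and b.

    Textbook Smith-Waterman with a linear gap penalty. The default
    scores are illustrative assumptions, not the tool's settings.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # A short query found inside a longer region.
    print(smith_waterman_score("TTACGTTT", "ACGT"))
```

Only two rows of the matrix are kept, which is enough to obtain the score; recovering the aligned coordinates would additionally require a traceback.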


Parameters

Input
Input

Input data are accepted in BED / bigBed file format or BAM file format containing the input regions. For some tasks BAM support might not be available.
The maximum number of input regions and their maximum length can differ between tasks. The limits are usually shown at the top of the input pages.

Within this section you can either
  • choose from previously uploaded BED/BAM files
  • or add a new BED or BAM file to the list (by clicking "Add BED/BAM file...")
For tasks that allow replicate data as input, you can use the Shift/Ctrl keys to select multiple files from the list. All selected files will then be treated as replicates.

When adding a new file, a new window will open, asking you to either

  • upload one or several BED/BAM files from your local computer
  • or import one or several BED/BAM files from the GMS (see more details)
  • or import one or several BED/BAM files from the GGA (see more details)
For the new BED/BAM files, you will have to select the correct organism, as the organism and the genome build are associated with the BED file for future use (the default is your latest choice in the current session).
Note that files critically depend on the underlying genome build, which can be changed by selecting a different ElDorado version on the top right of the page before uploading a file. You can see the list of genomes available in ElDorado.

Note that almost all browsers have a general upload limit of 2 GB, so larger files should be zipped before uploading from your local computer. This restriction does not apply when using the direct import from the GGA/GMS.

Optionally you can specify a name for saving uploaded files on the server, otherwise the name of the uploaded file will be used. If several files are uploaded, the string given here will be used as prefix for each file name.

If any of the regions in the input file cannot be completely assigned to the selected genome (e.g. wrong chromosome numbering or wrong positions within a chromosome), an error message will appear and the regions will be skipped. If no valid region is found in an uploaded file, the complete file will be skipped.
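
To avoid skipped regions, it can help to check a file locally before uploading. The Python sketch below separates regions that fit a genome from those that do not. The function name `split_valid_regions` and the `chrom_sizes` dictionary are hypothetical, the sizes shown are toy values rather than real chromosome lengths, and the server-side check may differ in detail.

```python
def split_valid_regions(regions, chrom_sizes):
    """Split (chrom, start, end) tuples into valid and skipped lists.

    Assumes BED-style coordinates (0-based start, end-exclusive).
    chrom_sizes maps chromosome/contig IDs to their lengths in bp.
    """
    valid, skipped = [], []
    for chrom, start, end in regions:
        size = chrom_sizes.get(chrom)
        # Unknown chromosome name or positions outside the chromosome.
        if size is not None and 0 <= start < end <= size:
            valid.append((chrom, start, end))
        else:
            skipped.append((chrom, start, end))
    return valid, skipped


if __name__ == "__main__":
    toy_sizes = {"chr1": 1000, "chr2": 500}  # toy values for illustration
    regions = [("chr1", 100, 200), ("1", 100, 200), ("chr2", 400, 600)]
    valid, skipped = split_valid_regions(regions, toy_sizes)
    print("valid:", valid)
    print("skipped:", skipped)
```

The second region illustrates wrong chromosome numbering ("1" instead of "chr1") and the third a position beyond the end of the chromosome.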

Once one or more BED/BAM files have been uploaded successfully and the popup window has been closed, the list of available BED/BAM files is updated automatically.

Uploaded BED or BAM files can be deleted from the project anytime via the project management.

or enter a single genomic region
Alternatively, you can enter the coordinates of a single stretch of genomic sequence here.
Please enter the chromosome or contig ID (examples: chr1 or NC_000001) and valid positions on this chromosome. These refer to the currently selected organism/genome version (top right).
Exclude short sequences
Exclude sequences
The longer the input sequences, the better the chances of finding relevant orthologous sequences. Short sequences can be skipped with this option. To use all input sequences, set this to 0.
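
The effect of this option can be sketched in Python as a length filter over BED lines. The function name `filter_bed_lines` is hypothetical, and whether the tool compares with "greater than" or "greater than or equal to" is not stated, so the `>=` comparison below is an assumption. Region length is computed as end minus start, following the BED convention of 0-based, end-exclusive coordinates.

```python
def filter_bed_lines(lines, min_length=50):
    """Keep BED regions whose length is at least min_length bp.

    Assumption: the threshold is inclusive. With min_length=0 every
    well-formed region is kept.
    """
    kept = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if end - start >= min_length:
            kept.append((chrom, start, end))
    return kept


if __name__ == "__main__":
    bed = [
        "track name=example",
        "chr1\t100\t130",   # 30 bp, dropped at the default threshold
        "chr1\t200\t300",   # 100 bp, kept
    ]
    print(filter_bed_lines(bed))
    print(filter_bed_lines(bed, min_length=0))
```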
Target
Target species
The program searches the genomes of the species you select here for sequences that are orthologous to the regions in the input file. Depending on the source organism, only a certain selection of target organisms is available: orthologs can only be searched for within vertebrates, within plants, or within insects.
Output
Result
Here you can edit the default name of the result file.
Email address
Here you can choose between two methods for receiving the results:
  • Show result directly in browser window
    With this option, the URL of the result is shown directly in your browser window.

    Warning: Please use this option only for analyses which can be performed in a short time.
    If the analysis takes longer than the timeout of the web server, the connection will be terminated and you will receive an error message (e.g. "The document contained no data."). In this case the results will not be available; please restart the analysis using the option "Send the URL of the result via email" below.

  • Send the URL of the result via email
    With this option, an email containing the URL of the results is sent to the email address you provide once the analysis is finished.

The results will be available on our server for a limited time. For details of how long your results will be kept, please see the result email. After that period they will be deleted unless they are protected in the project management.

We recommend using the email option for more than about 500 input regions.

Output

The output has three sections:

1. Analysis Parameters

2. Statistics of Regions and their Orthologous Sequences

3. Orthologous Regions

For each input region, the output lists the species in which orthologs were found, together with the chromosomal location and alignment score.
Clicking the species name opens a GenomeBrowser window showing the orthologous region that was found.

Ortholog sets can be analyzed for common TFBS patterns with FrameWorker or DiAlignTF to identify phylogenetically conserved regulatory structures.

By combining the region selection via the "select-regions" buttons with an organism selection, almost any combination of sequences can be extracted into one sequence file for further analysis. Input sequences and orthologous regions can be saved in the local file system or in your personal sequence directory.
