RegionMiner subtask: Search for orthologous regions in other species



Introduction

This task identifies regions in genomes of different species that are orthologous to the regions in the input file (input sequences should be > 50 bp).

To identify orthologous regions in a target species, a proprietary algorithm is used.

In a first step, homologous loci in the target organisms are looked up in the ElDorado database (see Comparative Genomics). If no such loci are found, the flanking genes (up to 20 loci in both directions) are considered in order to find a syntenic region in the target organism. A region qualifies as syntenic only if the two homologous genes in the target organism lie on the same contig and show the same relative strand orientation as the genes in the source organism.
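The actual algorithm is proprietary, so the following is only a minimal sketch of the synteny criterion described above, not the RegionMiner implementation. The names `Gene`, `is_syntenic` and `find_syntenic_pair` are hypothetical. Two assumptions are made: "relative strand orientation" is read as "the two flanking genes are on the same strand, or on opposite strands, in both organisms", and flanking genes are tried nearest first.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

MAX_FLANK = 20  # flanking loci considered in each direction (from the text)


@dataclass(frozen=True)
class Gene:
    name: str
    contig: str
    strand: str  # "+" or "-"


def is_syntenic(src_left: Gene, src_right: Gene,
                tgt_left: Gene, tgt_right: Gene) -> bool:
    """Check the two conditions: same contig, same relative orientation."""
    same_contig = tgt_left.contig == tgt_right.contig
    # Assumption: "relative orientation" means same-vs-opposite strand
    # of the two genes with respect to each other.
    src_same_strand = src_left.strand == src_right.strand
    tgt_same_strand = tgt_left.strand == tgt_right.strand
    return same_contig and src_same_strand == tgt_same_strand


def find_syntenic_pair(source_genes: List[Gene], index: int,
                       homologs: Dict[str, Gene],
                       max_flank: int = MAX_FLANK
                       ) -> Optional[Tuple[Gene, Gene]]:
    """Return the target homologs of the first flanking pair that is syntenic.

    source_genes: genes of the source organism in genomic order.
    index:        position of the input locus within source_genes.
    homologs:     source gene name -> homologous gene in the target organism.
    """
    # Assumption: candidates are tried nearest to the input locus first.
    left = source_genes[max(0, index - max_flank):index][::-1]
    right = source_genes[index + 1:index + 1 + max_flank]
    for l_gene in left:
        if l_gene.name not in homologs:
            continue
        for r_gene in right:
            if r_gene.name not in homologs:
                continue
            tgt_l, tgt_r = homologs[l_gene.name], homologs[r_gene.name]
            if is_syntenic(l_gene, r_gene, tgt_l, tgt_r):
                return tgt_l, tgt_r
    return None
```

A return value of `None` corresponds to the case where no syntenic region can be defined from the flanking genes.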

In a second step, the input sequence is aligned to the syntenic region using a Smith-Waterman alignment. If the alignment fulfills the following criteria, the target region is listed in the output:

Sets of orthologous sequences can be saved, as well as analyzed for common TFBS patterns with FrameWorker or DiAlignTF to identify phylogenetically conserved regulatory structures.
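For orientation, the Smith-Waterman local alignment mentioned in the second step can be sketched as follows. This is a textbook version with a linear gap penalty that returns only the best local score. The scoring values (`match`, `mismatch`, `gap`) are illustrative assumptions; the scoring scheme and acceptance criteria used by RegionMiner are not stated here.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2,
                         mismatch: int = -1,
                         gap: int = -2) -> int:
    """Return the best local alignment score of sequences a and b.

    Scoring values are illustrative assumptions, not RegionMiner's.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Because every cell is floored at zero, the score reflects the best-matching subsequences rather than the full length of both inputs, which is why a local alignment suits the search for a short input region inside a larger syntenic region.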


New in RegionMiner Release 3.2 (Dec. 2009):

Significantly improved algorithm to find orthologous regions in cases that were previously not found.


Parameters

Input

Input data are accepted as a tab delimited file in BED / bigBed file format containing the input regions specified at least by chromosome number, start position and end position (in this order).
The maximum number of regions and their maximum length can differ between tasks. The limits are usually shown at the top of the input pages.
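As an illustration of the minimal input format, the sketch below reads the first three tab-delimited columns of a BED file and keeps regions longer than 50 bp, the minimum mentioned in the Introduction. The names `Region`, `parse_bed` and `MIN_LENGTH` are hypothetical. Region length is computed as `end - start`, assuming the standard BED convention of a 0-based start and an exclusive end.

```python
from typing import Iterable, Iterator, NamedTuple

MIN_LENGTH = 50  # input sequences should be > 50 bp (from the Introduction)


class Region(NamedTuple):
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # Assumes standard BED coordinates: 0-based start, exclusive end.
        return self.end - self.start


def parse_bed(lines: Iterable[str]) -> Iterator[Region]:
    """Yield a Region for each data line; extra BED columns are ignored."""
    for line in lines:
        line = line.rstrip("\r\n")
        # Skip blank lines, comments and optional track/browser header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"expected at least 3 tab-delimited columns: {line!r}")
        yield Region(fields[0], int(fields[1]), int(fields[2]))


def long_enough(regions: Iterable[Region]) -> list:
    """Keep only regions longer than MIN_LENGTH."""
    return [r for r in regions if r.length > MIN_LENGTH]
```

For a file on disk, `long_enough(parse_bed(open("regions.bed")))` would give the usable regions (the file name is a placeholder).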

Within this section you can either
  • choose from previously uploaded BED files
  • or add a new BED file to the list (by clicking "Add Bed file...")

When adding a new file, a new window will open, asking you to either

  • upload one or several BED files from your local computer
  • or import a BED file from the GMS (see more details)
  • or import a BED file from the GGA (see more details)
For the new BED files, you will have to select the correct organism, as the organism and the genome build are associated with the BED file for future use (the default is your latest choice in the current session).
Note that BED files critically depend on the underlying genome build, which can be changed by selecting a different ElDorado version on the top right of the page before uploading a BED file. You can see the list of genomes available in ElDorado.

Note that almost all browsers have a general upload limit of 2 GB, so BED files larger than this should be zipped before uploading from your local computer. This restriction does not apply when using the direct import from the GGA/GMS.

Optionally, you can specify a name under which uploaded BED files are saved on the server; otherwise, the name of the uploaded file is used. If several files are uploaded, the string given here is used as a prefix for each BED file name.

If any of the regions in the input file cannot be completely assigned to the selected genome (e.g. wrong chromosome numbering or wrong positions within a chromosome), an error message will appear and the regions will be skipped. If no valid region is found in an uploaded file, the complete file will be skipped.
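The check described above can be pictured with the following sketch. It is an assumption about what "completely assigned to the selected genome" means (a known chromosome name and coordinates within its length), not the server's actual validation. The function names and the chromosome sizes in the usage note are hypothetical toy values.

```python
from typing import Dict, Iterable, List, Tuple

RegionTuple = Tuple[str, int, int]  # (chromosome, start, end)


def validate_regions(regions: Iterable[RegionTuple],
                     chrom_sizes: Dict[str, int]
                     ) -> Tuple[List[RegionTuple], List[RegionTuple]]:
    """Split regions into (valid, skipped) for a given genome build."""
    valid: List[RegionTuple] = []
    skipped: List[RegionTuple] = []
    for chrom, start, end in regions:
        size = chrom_sizes.get(chrom)  # None if the chromosome is unknown
        if size is not None and 0 <= start < end <= size:
            valid.append((chrom, start, end))
        else:
            skipped.append((chrom, start, end))
    return valid, skipped


def file_is_usable(regions: Iterable[RegionTuple],
                   chrom_sizes: Dict[str, int]) -> bool:
    """A file with no valid region at all is skipped completely."""
    valid, _ = validate_regions(regions, chrom_sizes)
    return len(valid) > 0
```

With toy sizes `{"chr1": 1000, "chr2": 500}`, the region `("chr2", 400, 600)` would be skipped because it extends beyond the end of the chromosome, and `("chrZ", 1, 10)` would be skipped because the chromosome name is unknown.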

After one or several BED files have been uploaded successfully and the popup window has been closed, the list of available BED files is updated automatically.

Uploaded BED files can be deleted from the project at any time via the project management.

Target
Target species: The program searches the genomes of the species you select here for sequences that are orthologous to the regions in the input file. Depending on the source organism, only a certain selection of target organisms is available (i.e. orthologs can be searched only within vertebrates, within plants, or within insects).
Output
Result: Here you can edit the default name of the result file.
Email address: Here you can choose between two methods for receiving the results:
  • Show result directly in browser window
    With this option, the URL of the result is shown directly in your browser window.

    Warning: Please use this option only for analyses which can be performed in a short time.
    If the analysis takes longer than the timeout of the webserver, the connection will be terminated and you will receive an error message (e.g. "The document contained no data."). In this case, the results will not be available. Please restart the analysis using the option "Send the URL of the result to" below.

  • Send the URL of the result via email
    With this option, an email containing the URL of the results is sent to the email address you provide once the analysis is finished.

The results will be available on our server for a limited time. For details on how long your results will be kept, please see the result email. After that period, they will be deleted unless protected in the project management!

We recommend using the email option for more than approximately 500 input regions.

Output

The output has three sections:

1. Analysis Parameters

2. Statistics of Regions and their Orthologous Sequences

Example screenshot

3. Orthologous Regions

For each input region, the output lists those species in which orthologs are found together with an alignment score.

Ortholog sets can be analyzed for common TFBS patterns with FrameWorker or DiAlignTF to identify phylogenetically conserved regulatory structures.

By combining the region selection via the "select-regions" buttons with an organism selection, almost any combination of sequences can be extracted into one sequence file for further analysis. Input sequences and orthologous regions can be saved in the local file system or in your personal sequence directory.

Example screenshot