
BED File Toolbox



Introduction

The BED file toolbox provides a number of tools that are often needed when handling BED files. The following actions can be performed:

General Parameters for the BED file toolbox

Input

Input data are accepted in BED / bigBed file format or BAM file format containing the input regions. For some tasks BAM support might not be available.
The maximum amount of input regions and their maximum length can differ for the various tasks. The limits are usually shown on top of the input pages.

Within this section you can either
  • choose from previously uploaded BED/BAM files
  • or add a new BED or BAM file to the list (by clicking "Add BED/BAM file...")
For tasks that accept replicate data as input, you can use the shift/ctrl keys to select multiple files from the list. All selected files will then be treated as replicates.

When adding a new file, a new window will open, asking you to either

  • upload one or several BED/BAM files from your local computer
  • or import one or several BED/BAM files from the GMS (see more details)
  • or import one or several BED/BAM files from the GGA (see more details)
For the new BED/BAM files, you will have to select the correct organism, as the organism and the genome build are associated with the BED file for future use (the default is your latest choice in the current session).
Note that files critically depend on the underlying genome build, which can be changed by selecting a different ElDorado version on the top right of the page before uploading a file. You can see the list of genomes available in ElDorado.

Note that almost all browsers have a general upload limit of 2 GB, so larger files should be zipped before uploading them from your local computer. This restriction does not apply when using the direct import from the GGA/GMS.

Optionally you can specify a name for saving uploaded files on the server, otherwise the name of the uploaded file will be used. If several files are uploaded, the string given here will be used as prefix for each file name.

If any of the regions in the input file cannot be completely assigned to the selected genome (e.g. wrong chromosome numbering or wrong positions within a chromosome), an error message will appear and the regions will be skipped. If no valid region is found in an uploaded file, the complete file will be skipped.

Once one or several BED/BAM files have been uploaded successfully and the popup window has been closed, the list of available BED/BAM files is updated automatically.

Uploaded BED or BAM files can be deleted from the project at any time via the project management.

Available Actions for BED files


The parameter selection for each of the available tasks is hidden by default. Click on the reveal box next to the section header of the desired action to see its parameters.
Note that the action that is actually performed is determined by the radio button in front of the action within the parameter selection. If nothing else is selected, "Upload to current project" is the default action.


Conversion from BED format to a different format

Conversion to sequence

Tick the radio button to convert a BED file into the corresponding DNA sequences from the selected organism. To additionally extract basepairs up- and/or downstream of the regions, enter the number of basepairs to be extracted in the form.
The resulting sequence file is in FASTA format, with additional annotation as described in the Genomatix annotation syntax section. The file can be downloaded to the local computer or saved in the Sequences section of the Genomatix project management and can be used for other tasks that require sequence input.
Note: The extracted sequences depend on both the selected organism and the selected ElDorado version (i.e. the genome build).


Conversion to UCSC Genome Browser custom track

The input BED file will be converted to a BED file with information for display in the UCSC Genome Browser (i.e. containing "browser" and "track" lines). The resulting file can be saved locally and can then be added to the UCSC Genome Browser as a custom track. It will be displayed as "User Track generated by Genomatix" in dark blue, with the display starting at about the first region found in the BED file.

Note that Genomatix uses the designation 'chrMT' for mitochondrial chromosomes, while UCSC uses 'chrM'.
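The conversion described above can be illustrated with a short Python sketch. It is not the Genomatix implementation: the function name, the exact RGB value used for "dark blue", and the automatic renaming of 'chrMT' to 'chrM' are assumptions made for this example. Only the general layout of "browser" and "track" lines follows the UCSC custom track format.

```python
def to_ucsc_custom_track(bed_lines, name="User Track generated by Genomatix",
                         color="0,0,139"):
    """Prepend "browser" and "track" lines to BED records (illustrative only)."""
    records = []
    for line in bed_lines:
        line = line.rstrip("\n")
        # Skip empty lines, comments, and any header lines already present.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        # Assumption: rename the Genomatix designation to the UCSC one.
        if fields[0] == "chrMT":
            fields[0] = "chrM"
        records.append(fields)
    if not records:
        return []
    chrom, start, end = records[0][0], int(records[0][1]), int(records[0][2])
    header = [
        # Browser positions are 1-based, BED start positions are 0-based.
        f"browser position {chrom}:{start + 1}-{end}",
        # The color value is an assumed RGB triple for "dark blue".
        f'track name="{name}" description="{name}" color={color}',
    ]
    return header + ["\t".join(fields) for fields in records]
```

The display starts at the first region of the file because the "browser position" line is built from the first record.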


Conversion from other formats to BED format

Conversion from GFF to BED format

To convert a file in GFF (General Feature Format) to BED format, a GFF file can be uploaded here. An additionally selected BED file in the Input section will be ignored.
The correct species for the input file must be supplied to allow checking correct chromosome numbering.

Conversion from wiggle to BED format

To convert a wiggle file (as described at UCSC) to BED format, a wiggle file can be uploaded here. An additionally selected BED file in the Input section will be ignored.
Note that only "variableStep" and "fixedStep" wiggle files can be converted. Also, the uploaded file must contain the keywords "type wiggle_0" in the track line; otherwise it is not recognized as a wiggle file.
The correct species for the input file must be supplied to allow checking correct chromosome numbering.
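As a rough illustration of what such a conversion involves, the following Python sketch turns "variableStep" and "fixedStep" wiggle data into BED-like tuples. The function name and the output layout (chromosome, start, end, value) are assumptions for this example; the format of the file produced by the toolbox may differ. The coordinate shift reflects the UCSC conventions that wiggle positions are 1-based while BED start positions are 0-based.

```python
def wiggle_to_bed(lines):
    """Convert variableStep/fixedStep wiggle lines to (chrom, start, end, value)."""
    regions = []
    mode = chrom = None
    span = step = pos = 1
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("#", "browser")):
            continue
        if line.startswith("track"):
            if "wiggle_0" not in line:
                raise ValueError("track line does not declare type wiggle_0")
            continue
        if line.startswith(("variableStep", "fixedStep")):
            parts = line.split()
            mode = parts[0]
            opts = dict(p.split("=", 1) for p in parts[1:])
            chrom = opts["chrom"]
            span = int(opts.get("span", 1))
            if mode == "fixedStep":
                pos = int(opts["start"])
                step = int(opts.get("step", 1))
            continue
        if mode == "variableStep":
            start_text, value = line.split()
            start = int(start_text)
        elif mode == "fixedStep":
            value = line
            start = pos
            pos += step
        else:
            raise ValueError("data line before a step declaration")
        # Shift from 1-based wiggle positions to 0-based, end-exclusive BED.
        regions.append((chrom, start - 1, start - 1 + span, float(value)))
    return regions
```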

Conversion from Illumina export.txt to BED format

To convert a file in Illumina export.txt format to BED format, an export.txt file can be uploaded with this option. Such export files are generated by the GERALD step of the Illumina pipeline and consist of 22 columns within a tab-separated file (for details of the exact format, see e.g. this Seqanswers thread). An additionally selected BED file in the Input section will be ignored.

Pileup or Duplicate Removal

Here, duplicates can be removed from a BED file. The maximum pileup parameter sets the number of reads from a pileup that are kept in the resulting BED file.
A maximum pileup parameter of 1 will remove all redundant reads. A pileup threshold n > 1 means that all reads that occur more than n times will be discarded.
Note that only chromosome and position are used for the comparison of reads, so two reads that differ in score or ID are still regarded as redundant if their position on the chromosome is identical.
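The effect of the maximum pileup parameter can be sketched in Python as follows. This is an illustration under stated assumptions, not the toolbox code: it assumes that "position" means the start and end coordinates, that at most n reads per identical position are kept, and that the reads kept are the first n in file order. The function name and the tuple layout are hypothetical.

```python
from collections import Counter


def limit_pileup(regions, max_pileup=1):
    """Keep at most max_pileup regions per identical (chrom, start, end)."""
    seen = Counter()
    kept = []
    for region in regions:
        # Score and ID (later fields) are deliberately ignored in the key.
        key = (region[0], region[1], region[2])
        seen[key] += 1
        if seen[key] <= max_pileup:
            kept.append(region)
    return kept
```

With `max_pileup=1`, every position is represented by a single read, which corresponds to removing all redundant reads.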

Subset of Regions

Select one of these actions to extract a subset of the regions within a BED file. The result will also be a BED file which contains only the regions from the input file that fulfill the set criteria.

Criteria can be

For the length and score criteria, the lower or upper bound can be left open by entering a "-" into the corresponding parameter field. This makes it possible, for example, to select all regions with a length >= 100 bp from the input BED file without knowing the maximum length of the regions.

The resulting BED file can be downloaded and e.g. used for other NGS Analysis tasks.
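The handling of open bounds for the length and score criteria can be sketched in Python as follows. The function names and the tuple layout (chromosome, start, end, name, score) are assumptions for this example, and both bounds are treated as inclusive, which is also an assumption.

```python
def parse_bound(text):
    """Return None for an open bound ("-"), otherwise the numeric value."""
    text = text.strip()
    return None if text == "-" else float(text)


def filter_regions(regions, min_len="-", max_len="-", min_score="-", max_score="-"):
    """Keep regions whose length and score lie within the given bounds."""
    lo_len, hi_len = parse_bound(min_len), parse_bound(max_len)
    lo_score, hi_score = parse_bound(min_score), parse_bound(max_score)

    def in_range(value, lo, hi):
        # An open bound (None) never excludes a region.
        return (lo is None or value >= lo) and (hi is None or value <= hi)

    return [
        r for r in regions
        if in_range(r[2] - r[1], lo_len, hi_len) and in_range(r[4], lo_score, hi_score)
    ]
```

For example, `filter_regions(regions, min_len="100")` selects all regions with a length >= 100 bp and leaves the maximum length open.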



Extension/Reduction/Trimming of regions

Extension/Reduction of regions

Select this action to extend all regions of a BED file by a certain number of basepairs/positions in one direction (up- or downstream) or in both directions. The resulting BED file contains all valid regions, extended by the given amount unless a chromosome end is reached (i.e. positions never become smaller than 0 or larger than the length of the corresponding chromosome). Regions can be shortened by entering negative extensions.
Example: if a region on chr1, positions 100 to 200 is extended by 50 basepairs in both directions, then the resulting region is chr1, positions 50 to 250.
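The extension with clamping at the chromosome ends can be written as a small Python sketch. The function name is hypothetical, the region is assumed to lie on the plus strand (so that "upstream" lowers the start position), and returning None for a region that has been shortened to nothing is an assumption about how invalid regions are handled.

```python
def extend_region(start, end, chrom_length, upstream=0, downstream=0):
    """Extend (or, with negative values, shorten) a plus-strand region."""
    new_start = max(0, start - upstream)           # never below position 0
    new_end = min(chrom_length, end + downstream)  # never beyond the chromosome
    if new_start >= new_end:
        return None  # assumed: regions without remaining length are dropped
    return new_start, new_end
```

Calling `extend_region(100, 200, 1_000_000, upstream=50, downstream=50)` reproduces the example above and yields positions 50 to 250.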

Trim regions to the same size

When the regions within a BED file are of varying length (e.g. IonTorrent reads), this action can be used to create a BED file in which all regions have the same length.
The target length of the resulting regions can be set by the user. Longer reads are shortened, shorter reads are extended to the target length.
Additionally, the starting point of the trimming can be set. There are three different scenarios:
  1. start the trimming from the 5' end (chr1 - 1000 - 1200)
  2. start the trimming from the 3' end (chr1 - 1800 - 2000)
  3. start the trimming symmetrically around the center of the regions (chr1 - 1400 - 1600)
The resulting region is given in parentheses for an example input region of "chr1 - 1000 - 2000" and a target length of 200 bp.
Note: The resulting regions might be shorter than the given target length, if the end of a chromosome is reached.
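The three trimming scenarios can be sketched in Python as follows. The function name and the mode labels are hypothetical, and the region is assumed to lie on the plus strand, so that the 5' end corresponds to the start position. Shorter regions are extended by the same rule that shortens longer ones, and the result is clamped at the chromosome ends.

```python
def trim_region(start, end, target, mode, chrom_length):
    """Bring a plus-strand region to the target length, anchored as requested."""
    if mode == "5prime":
        new_start, new_end = start, start + target
    elif mode == "3prime":
        new_start, new_end = end - target, end
    elif mode == "center":
        center = (start + end) // 2
        new_start = center - target // 2
        new_end = new_start + target
    else:
        raise ValueError(f"unknown mode: {mode}")
    # The result may be shorter than the target at a chromosome end.
    return max(0, new_start), min(chrom_length, new_end)
```

For the example region chr1 - 1000 - 2000 and a target length of 200, the three modes return (1000, 1200), (1800, 2000) and (1400, 1600).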

Sort regions

Here, the input BED file is sorted (chromosomes alpha-numerically, positions numerically). A sorted input BED file can be advantageous, e.g. for the NGSAnalyzer task, where performance improves especially for repeated runs with the same input.
The resulting BED file can be downloaded and e.g. used for other NGS Analysis tasks.
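A comparable sort order can be produced with a few lines of Python. This sketch assumes that "alpha-numerically" means a plain string comparison of the chromosome names (so that chr10 sorts before chr2); the order used by the toolbox may differ. The function name is hypothetical.

```python
def sort_regions(regions):
    """Sort by chromosome name (as a string), then by start and end position."""
    return sorted(regions, key=lambda r: (r[0], r[1], r[2]))
```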



Comparison of BED files

Here, two BED files can be compared to find unique or overlapping regions. Since the compared regions have an extent rather than being single positions, parameters are needed to define the "amount of overlap" that is used for the analysis.
There are two parameters:

If one of the criteria above is fulfilled, two regions are counted as overlapping.
Examples:

  1. region X from position 1000 to 2000, and region Y from 1900 to 3000 are overlapping by criterion 1, if the overlap variable is set to 50
  2. region X from position 1000 to 1055, and region Y from 1050 to 3000 are overlapping by criterion 2 if the distance variable is set to 100
  3. region X from position 1000 to 2000, and region Y from 2040 to 2090 are overlapping by criterion 2 if the distance variable is set to 100

The resulting table shows the number of

Each of the above subsets can be selected, and can be downloaded as a BED file or can directly be saved into the Genomatix project management for further analysis with other tasks.



Concatenation of BED files

Here, two or more BED files can be concatenated. Note that the first file must be selected from the list at the top of the page, and the file(s) to be appended to it must be selected here. The files are simply concatenated while removing certain comment lines; they are not sorted.


Mapping of sequence files to the genome to get BED files

If the mapping action is selected, a sequence file in FASTA or GenBank format must be supplied. The sequences are searched in the selected genome and, if found, the corresponding chromosomal positions are written into a BED file. The positions are determined with the same mapping/alignment algorithm that is used for sequence input in ElDorado.

The score of the resulting region in the BED file represents the percentage of the input region that was aligned to the genomic sequence. Additionally, the alignment quality of the mapping is given in column 7 of the BED file (which is not used by other Genomatix programs).

There are two parameter settings:

This task is limited to a certain number of input sequences.