Genomic regions: BED File Format - Description


Most NGS analysis tasks expect descriptions of genomic regions as input. These regions should be supplied in the BED file format, which is described below. Each region is specified by chromosome, start position and end position.
For another description of the BED file format you can also visit the UCSC page.

Genomatix programs also accept files in bigBED format, which stores the region data in compressed binary indexed files, allowing improved processing time and a smaller memory footprint. More details on the bigBED format can be found at UCSC.

Note that the positions in BED and bigBED files depend on the underlying genomic sequence, i.e. the genome build, which is defined in the Genomatix Suite through the selection of an ElDorado version. The ElDorado version can be changed in the "Projects & Account -> Account" section at the top of each page.

If you need to convert your BED file from a genome version that is currently not available in ElDorado, you can use the LiftOver service at UCSC.

Description of the BED file format accepted by Genomatix

BED files are tab-delimited text files with one line for each genomic region.
Each line has three required fields, optionally followed by columns containing a name for the region, a score and the strand orientation (+/-).
The original BED format allows up to 9 optional fields, but only the first three of those (ID, score and strand) are read and used by Genomatix programs.

The first three (required) BED fields are:

  1. chromosome
    The designation of the chromosome in one of the following notations:
    Note that Genomatix allows both 'M' and 'MT' as designations for the mitochondrial chromosome.
  2. start
    The starting position of the region in the chromosome. The first base in a chromosome is numbered 0.
  3. end
    The ending position of the region in the chromosome. The end base is not included in the display of the feature.
    For example, the first 100 bases of a chromosome are defined in a BED file as start=0, end=100.
    In the Genomatix ElDorado Database this corresponds to bases 1-100 on the chromosome.
NOTE: The BED file format is zero-based and half-open, whereas genomic positions in Genomatix programs are numbered starting at 1 and include the end position!
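
The relationship between the two coordinate systems can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above; the function names bed_to_one_based and one_based_to_bed are hypothetical and are not part of any Genomatix program.

```python
def bed_to_one_based(start, end):
    """Convert a zero-based, half-open BED interval to 1-based, inclusive positions."""
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    # Only the start shifts by one; the excluded BED end equals the included 1-based end.
    return start + 1, end


def one_based_to_bed(first, last):
    """Convert 1-based, inclusive positions back to a BED interval."""
    if first < 1 or last < first - 1:
        raise ValueError("expected first >= 1 and last >= first - 1")
    return first - 1, last


# The first 100 bases of a chromosome: BED start=0, end=100
print(bed_to_one_based(0, 100))  # (1, 100)
print(one_based_to_bed(1, 100))  # (0, 100)
```

In both systems the region length is the same: end - start in BED coordinates, last - first + 1 in 1-based coordinates.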

The three additional optional BED fields read by Genomatix programs are:

  1. name or ID
    Defines the name of the BED line, or - for Genomatix - an ID for the region.
  2. score
    A score assigned to the region (in the original BED format the score is between 0 and 1000; Genomatix also allows other values).
  3. strand
    Defines the strand: can be either '+' or '-'. Additionally, Genomatix allows a '0' (zero) for strand when no strand information is available.

Please note the following (from the BED file specification):
The number of fields per line must be consistent throughout any single set of data in an annotation track. The order of the optional fields is binding: lower-numbered fields must always be populated if higher-numbered fields are used.

Comments and all lines that do not match the format described above (starting with "chr" and containing at least two integers with genomic positions) are skipped. Depending on the task, the number of regions in the input file may be limited. This limit is displayed on the input page of each task.
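
The skipping rule above can be approximated with a small Python check, for example to preview which lines of a file would count as regions. This is a minimal sketch under stated assumptions, not the actual Genomatix parser: the function name is_region_line is hypothetical, and the sketch assumes that the two integers must appear in the second and third tab-separated fields.

```python
def is_region_line(line):
    """Return True if a line looks like a BED region line as described above."""
    fields = line.rstrip("\n").split("\t")
    # A region line needs at least chromosome, start and end.
    if len(fields) < 3 or not fields[0].startswith("chr"):
        return False
    try:
        # int() tolerates surrounding spaces, e.g. " 1235555".
        int(fields[1])
        int(fields[2])
    except ValueError:
        return False
    return True


lines = [
    "this is a comment line: experiment xy",
    "# this is also a comment",
    "chr1\t39723904\t39724119",
]
kept = [line for line in lines if is_region_line(line)]
print(len(kept))  # 1
```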

If you need to analyze larger sets of genomic regions, please contact support@genomatix.de.

Example BED file (tab separated values)

this is a comment line: experiment xy
# this is also a comment
chr1	26519270	26519623	read1	 70	+
chr1	39723904	39724119
chr2	10841542	10841853	read3	 80	-
chr2	88937859	88938309	read4
chrY	 1235555	 2335575		 90
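
As an illustration of how the required and optional fields fit together, the example file above could be read with the following Python sketch. The names Region and parse_bed are hypothetical, and the handling of empty or missing optional fields (stored as None) is an assumption made for this sketch rather than documented Genomatix behavior.

```python
from typing import NamedTuple, Optional


class Region(NamedTuple):
    chrom: str
    start: int                    # zero-based, as in the BED file
    end: int                      # half-open: the end base is not included
    name: Optional[str] = None
    score: Optional[float] = None
    strand: Optional[str] = None


def parse_bed(text):
    """Parse BED text, skipping comments and lines that are not region lines."""
    regions = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3 or not fields[0].startswith("chr"):
            continue
        try:
            start, end = int(fields[1]), int(fields[2])
            # Pad the optional fields (name, score, strand) to exactly three entries.
            opt = [f.strip() for f in fields[3:6]]
            opt += [""] * (3 - len(opt))
            score = float(opt[1]) if opt[1] else None
        except ValueError:
            continue  # assumption: lines with non-numeric values are skipped
        strand = opt[2] if opt[2] in ("+", "-", "0") else None
        regions.append(Region(fields[0], start, end, opt[0] or None, score, strand))
    return regions


EXAMPLE = (
    "this is a comment line: experiment xy\n"
    "# this is also a comment\n"
    "chr1\t26519270\t26519623\tread1\t 70\t+\n"
    "chr1\t39723904\t39724119\n"
    "chr2\t10841542\t10841853\tread3\t 80\t-\n"
    "chr2\t88937859\t88938309\tread4\n"
    "chrY\t 1235555\t 2335575\t\t 90\n"
)

for region in parse_bed(EXAMPLE):
    print(region)
```

The two comment lines are dropped, and the five region lines are returned with None in place of any optional field that is absent or empty.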