BED File Format - Description


Most RegionMiner tasks expect descriptions of genomic regions as input. These data should be supplied in the BED file format, which is described below. In general, each input region is specified by chromosome name, start position and end position.
For another description of the BED file format, see the UCSC page.

Note that BED files and the positions they contain depend on the underlying genomic sequence, i.e. the genome build, which is defined in the Genomatix Portal through the selection of an ElDorado version. The ElDorado version can be changed in the 'Personal' section at the top of each page.

If you need to convert your BED file from a genome version that is currently not available in ElDorado, you can use the LiftOver service at UCSC.

Description of the BED file format accepted by Genomatix

BED files are tab-delimited files with one line for each genomic region.
Each line that describes a genomic region has three required fields, optionally followed by further fields: a name for the region, a score and the strand orientation (+/-).
The original BED format allows up to 9 optional fields, but only the first three of those (ID, score and strand) are read and used by Genomatix programs.

The first three (required) BED fields are:

  1. chromosome
    The name of the chromosome (e.g. chr3, chrY, chr2L)
    Note that Genomatix uses the designation 'chrMT' for mitochondrial chromosomes, while UCSC uses 'chrM'.
  2. start
    The starting position of the region in the chromosome. The first base in a chromosome is numbered 0.
  3. end
    The ending position of the region in the chromosome. The end base is not included in the display of the feature.
    For example, the first 100 bases of a chromosome are defined in a BED file as start=0, end=100.
    In the Genomatix ElDorado Database this corresponds to bases 1-100 on the chromosome.
NOTE: The BED file format is zero-based and half-open, whereas the numbering of genomic positions in Genomatix programs such as Gene2Promoter, ElDorado or RegionMiner is one-based and includes the end position!
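
The two differences between the conventions described above (coordinate numbering and the name of the mitochondrial chromosome) can be illustrated with a short Python sketch. The function names are hypothetical and are not part of any Genomatix or UCSC tool; the sketch only restates the arithmetic given above and assumes non-empty regions (start < end).

```python
def bed_to_one_based(start, end):
    """Convert a zero-based, half-open BED interval to one-based, inclusive."""
    # Assumption for this sketch: regions are non-empty.
    if start < 0 or end <= start:
        raise ValueError(f"invalid BED interval: start={start}, end={end}")
    # Only the start shifts by one. The excluded BED end position has the
    # same number as the last included base in one-based numbering.
    return start + 1, end


def one_based_to_bed(first, last):
    """Convert a one-based, inclusive interval to zero-based, half-open BED."""
    if first < 1 or last < first:
        raise ValueError(f"invalid interval: first={first}, last={last}")
    return first - 1, last


def ucsc_to_genomatix_chrom(name):
    """Map the UCSC mitochondrial name 'chrM' to the Genomatix name 'chrMT'."""
    # All other chromosome names are passed through unchanged.
    return "chrMT" if name == "chrM" else name


# The example from the text: the first 100 bases of a chromosome.
first, last = bed_to_one_based(0, 100)
print(first, last)
```

In BED coordinates the length of a region is simply end - start, whereas in one-based, inclusive coordinates it is last - first + 1; both give 100 for the example above.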

The three additional optional BED fields read by Genomatix programs are:

  1. name or ID
    Defines the name of the BED line, or - for Genomatix - an ID for the region.
  2. score
    A score assigned to the region (in the original BED format the score is between 0 and 1000; Genomatix also allows other values).
  3. strand
    Defines the strand: can be either '+' or '-'. Additionally, Genomatix allows a '0' (zero) for strand when no strand information is available.
Comments and all lines that do not match the format described above (a line starting with "chr" and containing at least two integers as genomic positions) are skipped.

Depending on the RegionMiner task, the number of regions allowed in the input file may be limited. This limit is displayed on the input page of each task.
If you need to analyze larger sets of genomic regions, please contact support@genomatix.de.
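
The field layout and the skipping rule described above can be sketched as a small Python reader. This is an illustration only, not the parser used by Genomatix: the names Region, parse_bed_line and read_bed are hypothetical, and the handling of empty or unparseable optional fields (stored as None) is an assumption of this sketch.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Region(NamedTuple):
    chrom: str
    start: int                     # zero-based, as in the BED file
    end: int                       # half-open: this base is excluded
    name: Optional[str] = None
    score: Optional[float] = None
    strand: Optional[str] = None   # '+', '-' or '0'


def parse_bed_line(line: str) -> Optional[Region]:
    """Return a Region, or None for comments and non-matching lines."""
    # Fields are tab-delimited; surrounding spaces are stripped per field.
    fields = [f.strip() for f in line.rstrip("\r\n").split("\t")]
    if len(fields) < 3 or not fields[0].startswith("chr"):
        return None
    try:
        start, end = int(fields[1]), int(fields[2])
    except ValueError:
        return None  # positions must be integers

    # Optional fields; empty or invalid values become None (assumption).
    name = fields[3] if len(fields) > 3 and fields[3] else None
    score = None
    if len(fields) > 4 and fields[4]:
        try:
            score = float(fields[4])
        except ValueError:
            score = None
    strand = None
    if len(fields) > 5 and fields[5] in ("+", "-", "0"):
        strand = fields[5]
    return Region(fields[0], start, end, name, score, strand)


def read_bed(lines: Iterable[str]) -> Iterator[Region]:
    """Yield one Region per matching line and skip everything else."""
    for line in lines:
        region = parse_bed_line(line)
        if region is not None:
            yield region
```

A file can be read with, for example, `with open(path) as handle: regions = list(read_bed(handle))`, where `path` is the location of the BED file. Fields beyond the sixth column are ignored, matching the statement that only ID, score and strand are read.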

Example BED file (tab separated values)

this is a comment line: experiment xy
# this is also a comment
chr1	26519270	26519623	read1	 70	+
chr1	39723904	39724119
chr2	10841542	10841853	read3	 80	-
chr2	88937859	88938309	read4
chrY	 1235555	 2335575		 90