BED File Format - Description
Most RegionMiner tasks expect descriptions of genomic regions as input.
This data should be supplied in the BED file format, which is described below.
In general, each input region is
specified by chromosome name, start position and end position.
For another description of the BED file format you can also visit
the UCSC page.
Note that BED files and the positions they contain depend on the underlying genomic sequence, i.e. the genome build,
which is defined in the Genomatix Portal through the selection of an
ElDorado version. The ElDorado version can be changed in the
'Personal' section on top of each page.
If you need to convert
your BED file from a genome version that is currently not available in ElDorado, you can use the
LiftOver service at UCSC.
Description of the BED file format accepted by Genomatix
BED files are tab-delimited text files with one line for each genomic region.
Each line that describes a genomic region has three required fields (chromosome, start and end),
optionally followed by fields containing a name for the region, a score and the
strand orientation (+/-).
The original BED format allows up to 9 optional fields, but
only the first three of those (ID, score and strand) are read and used by Genomatix programs.
The first three (required) BED fields are:
- chromosome
The name of the chromosome (e.g. chr3, chrY, chr2L)
Note that Genomatix uses the designation 'chrMT' for the mitochondrial chromosome, while UCSC uses 'chrM'.
- start
The starting position of the region in the chromosome.
The first base in a chromosome is numbered 0.
- end
The ending position of the region in the chromosome.
The end base is not included in the display of the feature.
For example, the first 100 bases of a chromosome are defined in a BED file as start=0, end=100.
In the Genomatix ElDorado Database this corresponds to
bases 1-100 on the chromosome.
NOTE: The BED file format is zero-based and half-open, whereas the numbering
of genomic positions in Genomatix programs like Gene2Promoter, ElDorado or RegionMiner
is 1-based and includes the end position!
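The relationship between the two numbering schemes can be illustrated with a short Python sketch. The function names are hypothetical and not part of any Genomatix program, and rejecting empty regions (start equal to end) is an assumption of this sketch.

```python
def bed_to_one_based(start, end):
    """Convert a zero-based, half-open BED interval to 1-based, inclusive positions."""
    # Assumption of this sketch: empty or reversed regions are rejected.
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    # Only the start shifts by one; the excluded BED end equals the included 1-based end.
    return start + 1, end


def one_based_to_bed(first, last):
    """Convert 1-based, inclusive positions to a zero-based, half-open BED interval."""
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    return first - 1, last


def region_length(start, end):
    """Number of bases covered by a BED interval (the end base is excluded)."""
    return end - start
```

For the example above, bed_to_one_based(0, 100) gives (1, 100), and region_length(0, 100) gives 100.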
The three additional optional BED fields read by Genomatix programs are:
- name or ID
Defines the name of the BED line, or - for Genomatix - an ID for the region.
- score
A score assigned to the region (in the original BED format the score is between 0 and 1000;
Genomatix also allows other values).
- strand
Defines the strand: can be either '+' or '-'. Additionally, Genomatix allows a '0' (zero)
for strand when no strand information is available.
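As an illustration of the field layout described above, the following Python sketch splits one BED line into the three required and three optional fields. The class and function names are hypothetical, and the handling of empty optional fields and of non-integer scores are assumptions of this sketch, not documented behaviour of Genomatix programs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BedRegion:
    chrom: str
    start: int                     # zero-based start position
    end: int                       # end position, not included in the region
    name: Optional[str] = None     # name or ID of the region
    score: Optional[float] = None
    strand: Optional[str] = None   # '+', '-' or '0'


def parse_bed_line(line):
    """Parse one tab-delimited BED line; fields beyond the sixth are ignored."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 3:
        raise ValueError("a BED line needs chromosome, start and end")
    region = BedRegion(fields[0], int(fields[1]), int(fields[2]))
    # Assumption: an empty optional field is treated as "not given".
    if len(fields) > 3 and fields[3]:
        region.name = fields[3]
    if len(fields) > 4 and fields[4]:
        region.score = float(fields[4])
    if len(fields) > 5 and fields[5]:
        if fields[5] not in ("+", "-", "0"):
            raise ValueError("strand must be '+', '-' or '0'")
        region.strand = fields[5]
    return region
```

A line with only three fields yields a region whose name, score and strand are all None.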
Comments and all lines that do not match the format described above (starting with "chr" and containing
at least two integers with genomic positions) are skipped.
Depending on the RegionMiner task, the number of regions in the input file may be limited to a maximum. This limit
is displayed on the input page of each task.
If you need to analyze larger sets
of genomic regions, please contact support@genomatix.de.
Example BED file (tab separated values)
this is a comment line: experiment xy
# this is also a comment
chr1 26519270 26519623 read1 70 +
chr1 39723904 39724119
chr2 10841542 10841853 read3 80 -
chr2 88937859 88938309 read4
chrY 1235555 2335575 90
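The skipping rule described above (lines must start with "chr" and contain at least two integer positions) can be approximated with a regular expression. This is only a sketch: the pattern and the function name are assumptions, not the actual check used by Genomatix programs, and the example records are written with explicit tab characters.

```python
import re

# Assumed pattern: chromosome name starting with "chr", then two
# tab-separated integers, then either another tab or the end of the line.
REGION_LINE = re.compile(r"^chr\S*\t\d+\t\d+(?:\t|$)")


def region_lines(lines):
    """Yield only the lines that look like BED region records."""
    for line in lines:
        stripped = line.rstrip("\r\n")
        if REGION_LINE.match(stripped):
            yield stripped


example = [
    "this is a comment line: experiment xy",
    "# this is also a comment",
    "chr1\t26519270\t26519623\tread1\t70\t+",
    "chr1\t39723904\t39724119",
    "chr2\t10841542\t10841853\tread3\t80\t-",
]
kept = list(region_lines(example))
```

Here the two comment lines are dropped and the three region lines are kept in their original order.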