
Genomatix: Mapping NGS Reads to the Genome


[Introduction] [Parameters] [Output]

Introduction

This task maps sequence reads from all established NGS systems onto genomes of various species. The resulting BAM files with mapped reads can be used directly as input for further analysis on the GGA, such as structural variant calling, the ChIPSeq workflow, or RNA expression analysis.

Two different mapping algorithms are currently available: the Genomatix Mapper and Bowtie2.


Parameters

Input data
Available Files
Input data should contain the input reads / tags from the sequencing. The following formats are accepted:

Within this section you can either
  • choose from previously uploaded Read files
  • or add a new Read file to the list (by clicking "Add Read file...")
You can use shift/ctrl-keys to select multiple files from the list. All selected files will then be used as input (also see merge option below).

When adding a new file, a new window will open, asking you to either

  • upload one or several Read files from your local computer
  • or import one or several Read files from the GGA (see more details)
Note: Paired-end files must be uploaded together to be recognized as paired-end!

Note that almost all browsers have a general upload limit of 2 GB, i.e. files bigger than this size should be zipped before uploading from your local computer. This restriction does not apply when using the direct import from the GGA/GMS.

Optionally you can specify a name for saving uploaded files on the server, otherwise the name of the uploaded file will be used. If several files are uploaded, the string given here will be used as prefix for each file name.

Once one or several Read files have been uploaded successfully and the popup window has been closed, the list of available Read files is updated automatically.

Uploaded Read files can be deleted from the project anytime via the project management.

Merge results
If this checkbox is ticked, the mapping results from all input files will be merged into a single BAM output file. Otherwise, each input file will result in its own BAM file, which will be stored separately in the result management.
Genomatix Mapper Parameters
Sequencing
Antisense directed
Select this checkbox for an antisense sequencing experiment (this depends on the library type of the RNA-seq protocol).
The option specifies whether the strand generated during first strand synthesis (antisense strand) was sequenced. If this option is set, the reads will be reverse complemented before mapping.
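
The reverse complementing step described above can be sketched in a few lines of Python. This is only an illustration of the operation; the function name is hypothetical and is not taken from the Genomatix software.

```python
# Hypothetical helper illustrating reverse complementing of a read;
# not part of the Genomatix Mapper itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(read: str) -> str:
    """Complement each base (A<->T, C<->G, N stays N), then reverse the read."""
    return read.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGCCN"))  # NGGCAT
```

Applying the function twice returns the original read, which is a convenient sanity check.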
Seed Mapping
This parameter controls the seed search strategy:
Select fast for an exact search of seeds from the mapping library.
Select deep for a seed search with one error tolerance.

Note: The seed search mode should be selected according to the sequence length and sequencing error rate. The 'deep' mode should be used if the sequences are short (< 50 bp) or if the expected error rate is high (> 10%). Otherwise the 'fast' mode can be used without losing a significant number of good hits compared to the deep mode.

Please note that for sequences ≥ 75 bp this parameter will be automatically set to 'fast'.
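
The selection rule from the note above can be written down as a small decision function. This is a sketch of the documented recommendation only; the function name and the representation of the error rate as a fraction are assumptions.

```python
def suggest_seed_mode(read_length: int, expected_error_rate: float) -> str:
    """Suggest 'fast' or 'deep' seed search following the note above.

    read_length is given in bp, expected_error_rate as a fraction (0.10 = 10%).
    """
    # Reads of 75 bp or longer are always mapped with 'fast'.
    if read_length >= 75:
        return "fast"
    # Short reads or error-prone data benefit from the one-error seed search.
    if read_length < 50 or expected_error_rate > 0.10:
        return "deep"
    return "fast"


if __name__ == "__main__":
    for length, rate in [(36, 0.01), (60, 0.02), (60, 0.15), (100, 0.15)]:
        print(length, rate, suggest_seed_mode(length, rate))
```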
Alignment
Alignment type
  • ungapped alignment:
    The complete sequence read is aligned to a region in the target sequence that was identified via a shortest unique seed. Only point mutations are counted; insertions/deletions are not considered.
  • gapped alignment:
    Here, a Needleman-Wunsch alignment is performed, allowing point mutations, insertions, and deletions, e.g. if homopolymer resolution is an issue.
Please note that gapped alignment is required for subsequent Small Variant Detection.
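
For readers unfamiliar with Needleman-Wunsch, the following is a minimal textbook sketch of global alignment with a linear gap penalty. The scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for illustration; the scoring scheme actually used by the Genomatix Mapper is not documented here.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Globally align a and b; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # match or point mutation
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


if __name__ == "__main__":
    # A read with one extra G (e.g. a homopolymer error) against the reference.
    print(needleman_wunsch("ACGGT", "ACGT"))
```

In contrast to the ungapped mode, the extra base is absorbed by a single gap instead of causing a run of mismatches.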
Alignment quality can be selected in two different ways:
  1. By setting a Minimum alignment quality (0-100%):
    This parameter determines the minimum overall alignment quality which is reported in the output files. The alignment quality is calculated as the number of aligned nucleotides divided by the overall alignment length. Alternatively, the alignment quality can be specified via the maximum number of allowed point mutations / insertions / deletions (see the following two parameters); in that case, please do not select this option.

  2. By setting the Maximum number of allowed point mutations / insertions / deletions:
    • Maximum number of allowed point mutations >= 0:
      This parameter determines the maximum number of allowed point mutations in the overall alignment which is reported in the output files.
      Please note that this parameter is only active if the minimum alignment quality parameter is not selected.
    • Maximum number of allowed insertions/deletions >= 0 (only for gapped alignment):
      This parameter determines the maximum number of allowed insertions/deletions in the overall alignment which is reported in the output files.
      Please note that this parameter is only active if gapped alignment is selected above.
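
The two ways of filtering by alignment quality can be illustrated with a short sketch. It assumes that "aligned nucleotides" means matching positions and that every mismatch or inserted/deleted base counts as one alignment position; the function and parameter names are hypothetical.

```python
from typing import Optional


def alignment_quality(matches: int, mismatches: int, indels: int) -> float:
    """Alignment quality in percent: matching positions / alignment length."""
    length = matches + mismatches + indels  # assumed definition of alignment length
    if length <= 0:
        raise ValueError("alignment must contain at least one position")
    return 100.0 * matches / length


def is_reported(matches: int, mismatches: int, indels: int,
                min_quality: Optional[float] = None,
                max_point_mutations: int = 0,
                max_indels: int = 0) -> bool:
    """Apply either the minimum-quality filter or the maximum-count filter."""
    if min_quality is not None:
        # Way 1: overall alignment quality; the count limits are inactive.
        return alignment_quality(matches, mismatches, indels) >= min_quality
    # Way 2: explicit limits on point mutations and insertions/deletions.
    return mismatches <= max_point_mutations and indels <= max_indels


if __name__ == "__main__":
    # A 36 bp read with one point mutation has a quality of about 97.2%.
    print(round(alignment_quality(35, 1, 0), 1))
```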
Note: The choice of the alignment method depends on the sequencer which generated the results:
  • For Illumina (Solexa) sequencers the expected rate of insertions/deletions is very low, therefore the ungapped alignment is sufficient in most cases.
  • Reads from platforms like IonTorrent or 454 should be mapped with gapped alignment due to their inherent rate of insertions and deletions. However, IonTorrent and 454 generally produce sequences of considerable length, so these sequences can be mapped with the combination 'fast seed search / gapped alignment'.
  • Values of 92% for Illumina sequences and 85% for SOLiD reads have been found to be good guideline values for the alignment quality parameter.
Linker
If linker sequences must be removed from the sequence reads, a file containing the linkers can be uploaded here.
Linker sequences will be identified in the sequence reads and the reads will be trimmed accordingly.
Format: The linker file should be a plain text file with each line containing a single linker sequence. Linker sequences are given in plain format, i.e. without a header.

Example for a linker file:
	AGCGAGGCGAT
	ACGGGAGGCTTTATGA
	GTCGAGTATGGAT
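
As an illustration of the file format, the sketch below parses a linker file and trims reads. How the Genomatix Mapper actually locates linkers is not described here; this sketch assumes exact matching and cuts the read at the first linker occurrence, and both function names are hypothetical.

```python
from typing import List


def parse_linkers(text: str) -> List[str]:
    """Read one linker per line; surrounding whitespace and empty lines are ignored."""
    return [line.strip().upper() for line in text.splitlines() if line.strip()]


def trim_linker(read: str, linkers: List[str]) -> str:
    """Cut the read at the earliest exact linker match (assumed behaviour)."""
    cut = len(read)
    for linker in linkers:
        pos = read.find(linker)
        if pos != -1:
            cut = min(cut, pos)
    return read[:cut]


if __name__ == "__main__":
    linker_file = "\tAGCGAGGCGAT\n\tACGGGAGGCTTTATGA\n\tGTCGAGTATGGAT\n"
    linkers = parse_linkers(linker_file)
    print(trim_linker("TTTTGGGGAGCGAGGCGATCCCC", linkers))  # TTTTGGGG
```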
Masking
These parameters determine the number of base pairs at the ends of the sequence reads which should be excluded from the mapping.
  • Read1 trim 5'
    Number of base pairs at the 5' end of the sequence reads which should be trimmed. In case of paired end sequencing this parameter refers to the first read.
  • Read1 trim 3'
    Number of base pairs at the 3' end of the sequence reads which should be trimmed. In case of paired end sequencing this parameter refers to the first read.
  • Read2 trim 5'
    Number of base pairs at the 5' end of the second sequence read in case of paired end sequencing. Not available in case of single read sequencing.
  • Read2 trim 3'
    Number of base pairs at the 3' end of the second sequence read in case of paired end sequencing. Not available in case of single read sequencing.
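
The effect of the masking parameters can be pictured as simple string slicing. The following sketch uses hypothetical names and only shows which bases remain available for mapping after trimming.

```python
def trim_read(seq: str, trim5: int = 0, trim3: int = 0) -> str:
    """Remove trim5 bases from the 5' end and trim3 bases from the 3' end."""
    if trim5 < 0 or trim3 < 0:
        raise ValueError("trim values must be >= 0")
    end = len(seq) - trim3
    # If the trims overlap, nothing is left of the read.
    return seq[trim5:end] if end > trim5 else ""


if __name__ == "__main__":
    # Paired-end example: each mate has its own 5'/3' settings.
    read1 = trim_read("ACGTACGTAC", trim5=2, trim3=3)
    read2 = trim_read("TTGCATGCAA", trim5=0, trim3=1)
    print(read1, read2)
```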
Output Options
Reporting of matches:
  • report unique only
    Only the best unique hit is reported in the resulting BAM file.
  • report all incl. multiple hits
    The algorithm also reports reads which could be aligned at multiple positions of the target sequence with equal best alignment quality.
de novo splicing
If this parameter is set, de novo splice junctions are detected via spliced alignment of the sequence reads.
  • local
    a local spliced alignment is computed (based on the Genomatix ExonMapper algorithm, see details)
  • global
    global spliced alignment is performed, i.e. the program searches for splice events in a genome wide manner (details)
  • both
    both - local and global - spliced alignments are computed
Bowtie Parameters
For details on the algorithm please refer to the Bowtie2 Sourceforge page.
Trimming of reads
  • trim 5':
    number of bases to trim from 5' / left end of reads
  • trim 3':
    number of bases to trim from 3' / right end of reads
Alignment
  • end-to-end:
    the entire read must align; no clipping
  • local:
    local alignment; ends might be soft clipped
Sensitivity
The details of the sensitivity setting depend on the type of alignment selected:
  • for end-to-end the following internal Bowtie2 parameters are used:
    • very-fast : -D 5 -R 1 -N 0 -L 22 -i S,0,2.50
    • fast: -D 10 -R 2 -N 0 -L 22 -i S,0,2.50
    • sensitive: -D 15 -R 2 -N 0 -L 22 -i S,1,1.15
    • very-sensitive: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50
  • for local
    • very-fast-local: -D 5 -R 1 -N 0 -L 25 -i S,1,2.00
    • fast-local: -D 10 -R 2 -N 0 -L 22 -i S,1,1.75
    • sensitive-local: -D 15 -R 2 -N 0 -L 20 -i S,1,0.75
    • very-sensitive-local: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50
Multiple Alignments
  • report best hit: look for multiple alignments, report best, with MAPQ
  • report best 3 hits: report up to 3 alignments in BAM file
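
A sketch of how the Bowtie settings above might translate into a Bowtie2 command line. The preset values are the ones listed above; the function name, the file names, and the mapping of "report best 3 hits" to `-k 3` are assumptions. How the GGA actually invokes Bowtie2 is not documented here.

```python
from typing import List

# Internal Bowtie2 parameters per alignment type and sensitivity, as listed above.
PRESETS = {
    "end-to-end": {
        "very-fast": "-D 5 -R 1 -N 0 -L 22 -i S,0,2.50",
        "fast": "-D 10 -R 2 -N 0 -L 22 -i S,0,2.50",
        "sensitive": "-D 15 -R 2 -N 0 -L 22 -i S,1,1.15",
        "very-sensitive": "-D 20 -R 3 -N 0 -L 20 -i S,1,0.50",
    },
    "local": {
        "very-fast": "-D 5 -R 1 -N 0 -L 25 -i S,1,2.00",
        "fast": "-D 10 -R 2 -N 0 -L 22 -i S,1,1.75",
        "sensitive": "-D 15 -R 2 -N 0 -L 20 -i S,1,0.75",
        "very-sensitive": "-D 20 -R 3 -N 0 -L 20 -i S,1,0.50",
    },
}


def bowtie2_command(index: str, reads: str, output_sam: str,
                    mode: str = "end-to-end", sensitivity: str = "sensitive",
                    trim5: int = 0, trim3: int = 0, best_hits: int = 1) -> List[str]:
    """Build an argument list for a single-end Bowtie2 run (hypothetical helper)."""
    args = ["bowtie2", "--" + mode]
    args += PRESETS[mode][sensitivity].split()
    args += ["--trim5", str(trim5), "--trim3", str(trim3)]
    if best_hits > 1:
        # Assumption: "report best 3 hits" corresponds to Bowtie2's -k option.
        args += ["-k", str(best_hits)]
    args += ["-x", index, "-U", reads, "-S", output_sam]
    return args


if __name__ == "__main__":
    # File names are placeholders.
    print(" ".join(bowtie2_command("genome_index", "reads.fastq", "mapped.sam",
                                   mode="local", sensitivity="very-fast", best_hits=3)))
```

The returned list can be passed to `subprocess.run` if Bowtie2 is installed locally.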
Output
Result
Here you can edit the default name of the result file.
Email address
Here you can choose between two methods for receiving the results:
  • Show result directly in browser window
    With this option, the URL of the result is shown directly in your browser window.

    Warning: Please use this option only for analyses which can be performed in a short time.
    If the analysis takes longer than the timeout of the webserver, the connection will be terminated and you will receive an error message (e.g. "The document contained no data."). In this case, the results will not be available; please restart the analysis using the option "Send the URL of the result via email" below.

  • Send the URL of the result via email
    With this option, an email with the URL of the results will be sent to the user-provided email address when the analysis is finished.

The results will be available for a limited time on our server. For details of how long your results will be kept please see the result-email. After that period they will be deleted unless protected in the project management!

Output

The output has a number of sections, depending on the input and parameters:

  1. Analysis Parameters
  2. Overview
  3. Hit Distribution
  4. Alignment Quality
  5. Various statistics on BAM file
  6. Download of Results

The result sections are described in detail below.


1. Analysis Parameters


2. Overview

Mapping Overview

A table with the main results that were found during mapping. The categories (like "unique hits", "ignored hits") depend on the selected mapping algorithm and differ between Bowtie2 and the Genomatix Mapper. Here is a description of the categories for the Genomatix Mapper.

Example for a Mapping Overview table from the Genomatix Mapper:

Total Reads                  92579731   100.00 %
insufficient quality hits    12798123    13.82 %
multiple hits                14727988    15.91 %
ignored hits                       27     0.00 %
unique hits                  64246603    69.40 %
ambiguous hits                 806990     0.87 %

Example for a Mapping Overview table from Bowtie2:

Total Read Pairs                99999   100.00 %
discordant pairs, unique           88     0.09 %
concordant pairs, unique        87360    87.36 %
not mapped as pair                 65     0.07 %
concordant pairs, multiple      12486    12.49 %
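
The percentages in the overview tables are the category counts relative to the total number of reads (or read pairs). A short sketch that recomputes them from the Genomatix Mapper example counts; the function name is hypothetical.

```python
from typing import Dict


def overview_percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Express each category count as a percentage of the total, rounded to 2 digits."""
    total = sum(counts.values())
    return {name: round(100.0 * n / total, 2) for name, n in counts.items()}


if __name__ == "__main__":
    genomatix_example = {
        "insufficient quality hits": 12798123,
        "multiple hits": 14727988,
        "ignored hits": 27,
        "unique hits": 64246603,
        "ambiguous hits": 806990,
    }
    print(sum(genomatix_example.values()))  # 92579731, the "Total Reads" row
    for name, pct in overview_percentages(genomatix_example).items():
        print(f"{name:<28}{pct:6.2f} %")
```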

BAM file Overview

The table "BAM file Overview" lists the numbers of mapped, unmapped, and skipped reads along with further details.
It corresponds to the read statistics in the BAM file statistics, where an example screenshot is also shown.


3. Hit Distribution

In this chart the Mapping Overview is displayed graphically. Here's an example:

The data is displayed in a pie chart and can either be downloaded in various graphic formats (PNG, JPEG, PDF, SVG) or as tab-separated text file.


4. Alignment Quality

This column chart shows the alignment quality profile for the unique hits and the multiple hits (only if selected in the parameters) respectively.

Example:

The rightmost column shows the number and percentage of perfectly mapped reads (alignment quality = 100%). Also shown are the reads mapping with one (alignment quality 97%) or two mismatches. Moving the mouse pointer over one of the columns shows the numbers.
Note that you can zoom into the graphics by selecting an x-range with the mouse. To show the default range, click the "reset zoom" button.


5. Various statistics on BAM file

The following statistics for the BAM file will be displayed:

For examples and details please see the BAM statistics help page.
The data is displayed graphically and can either be downloaded in various graphic formats (PNG, JPEG, PDF, SVG) or as tab-separated text file.


6. Download of Results

All additional files that were created during the mapping process can be downloaded as an archive (tar-file).
For example, if the Genomatix Mapper was started with default parameters, three extra text files are available. For a detailed description please see the Genomatix Mapper Output section.