
SICER - Spatial clustering for Identification of ChIP-Enriched Regions
(only available on GGA)



Introduction

When a Genomatix NGS task performs peak finding (e.g. in the ChIPSeq workflow), the user can select the SICER algorithm, which analyzes tags from ChIP-Seq data sets and finds significant regions for further analysis with other tasks. Like NGSAnalyzer or MACS, it is thus a clustering algorithm for ChIP-Seq data.

SICER is particularly recommended when analysing histone modifications.

Details of the algorithm are described in

A clustering approach for identification of enriched domains from histone modification ChIP-Seq data
Chongzhi Zang, Dustin E. Schones, Chen Zeng, Kairong Cui, Keji Zhao, and Weiqun Peng
Bioinformatics 25, 1952 - 1958 (2009)

Please cite this paper if you are using this algorithm in a published work.

This task automatically sets a number of SICER parameters (e.g. the effective genome size), which simplifies its use.
The output comprises the original SICER output. The resulting clusters can be downloaded as BED files or saved directly to the project management for use with other tasks.


Parameters

SICER parameters
Redundancy threshold: The number of copies of identical reads allowed in a library.
Window size: The resolution of the SICER algorithm. For histone modifications, 200 bp can be used.
Note from the SICER manual: The choice of window size and gap size has a large effect on the outcome. In general, the broader the domain, the bigger the gap should be. For the histone modification H3K4me3, W=200 and gap = 1 window are suggested. For H3K27me3, W=200 and gap = 3 windows are suggested as a first try. If an even bigger gap size is found to work better, you might also want to try increasing the window size (e.g. window size = 1K and gap = 3 windows).
Fragment size: Determines the shift from the beginning of a read to the center of the DNA fragment represented by the read. FRAGMENT_SIZE=150 means the shift is 75.
Gap size: Must be a multiple of the window size. For example, if the window size is 200, the gap size should be 0, 200, 400, 600, ...
FDR: The FDR is calculated using p-values adjusted for multiple testing, following the approach developed by Benjamini and Hochberg.
E-value: The number of islands expected in a random background; used only if no control data is supplied.
Note: The E-value is not a p-value. Suggestion for a first try on histone modification data: E-value=100. If you find ~10000 islands using this E-value, an empirical estimate of the FDR is 1E-2.
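
The arithmetic behind these parameters can be illustrated with a short Python sketch. It is not part of SICER or of the Genomatix task, and all function names are hypothetical. It assumes that the read shift is half the fragment size (rounded down for odd sizes), that the empirical FDR is the E-value divided by the number of islands found, and that the p-value adjustment is the standard Benjamini-Hochberg step-up procedure; SICER's own implementation may differ in detail.

```python
def read_shift(fragment_size: int) -> int:
    """Shift from the read start to the fragment center (assumed: half the size)."""
    return fragment_size // 2


def is_valid_gap(gap_size: int, window_size: int) -> bool:
    """A gap size is acceptable if it is a non-negative multiple of the window size."""
    return gap_size >= 0 and gap_size % window_size == 0


def empirical_fdr(e_value: float, islands_found: int) -> float:
    """Rough FDR estimate: islands expected by chance / islands actually found."""
    if islands_found <= 0:
        raise ValueError("islands_found must be positive")
    return e_value / islands_found


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, `read_shift(150)` reproduces the shift of 75 mentioned above, `is_valid_gap(600, 200)` accepts a gap of three windows, and `empirical_fdr(100, 10000)` gives the estimate of 1E-2.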


Output

Analysis Parameters

Here, the analysis parameters like input files, result name, database version (i.e. underlying genome), the SICER version and the SICER parameters are shown.

Cluster detection

A summary with cluster statistics is displayed, including the number of clusters found, the number of reads within clusters and the average cluster length.
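
Two of these statistics can be recomputed from a downloaded BED file. The following Python sketch is only an illustration, not the code used by the task; the function name `cluster_stats` is hypothetical, and it assumes tab-separated, zero-based, half-open BED lines, so that a cluster's length is end minus start.

```python
def cluster_stats(bed_lines):
    """Return (number of clusters, average cluster length) for BED lines."""
    lengths = []
    for line in bed_lines:
        # Skip blank lines and the usual BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        # Half-open interval: length is simply end - start.
        lengths.append(int(end) - int(start))
    count = len(lengths)
    average = sum(lengths) / count if count else 0.0
    return count, average
```

Because the function accepts any iterable of lines, an open file handle can be passed to it directly.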

SICER Output

The original SICER output (except for the redundancy-removed input file) and all output files created by SICER can be found and downloaded on the details page, which opens after clicking on "complete clustering results".

Direct download of result files for further analysis

The resulting BED file containing the clusters/peaks can be downloaded or saved directly into the project management for further analysis with other tasks. Depending on user parameters and input, the BED file corresponds to one of the original SICER output files.
As the score for each region, the BED file includes either the read count value (when started with control) or the island score as calculated by SICER (when started without control).

A tab-separated file with details produced by SICER can also be downloaded. It contains detailed information on called peaks and corresponds to

You can open this file in Excel™ and sort/filter it using Excel functions. Depending on the parameters the file includes:
Note that the original BED file output from SICER is not canonical. It is corrected to the canonical format here to allow direct use in further analysis with Genomatix tools: canonical BED format is zero-based and half-open, whereas the original SICER output includes the end position in the region.
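
The coordinate correction described in this note can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used by the Genomatix task: the function name `to_half_open` is hypothetical, the input is assumed to be tab-separated with a zero-based start, and the only change needed is assumed to be moving the inclusive end position one base to the right.

```python
def to_half_open(line: str) -> str:
    """Convert one BED-like line with an inclusive end to half-open BED."""
    fields = line.rstrip("\n").split("\t")
    # Column 3 is the end position; an inclusive end becomes exclusive by adding 1.
    fields[2] = str(int(fields[2]) + 1)
    return "\t".join(fields)
```

With this convention, a region covering exactly one 200 bp window is written as start 0 and end 200, so its length is again end minus start.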

Copyright

The original SICER program is available on Weiqun Peng's homepage.