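NE = c * #reads_region / (#reads_mapped * length_region)
where NE is the normalized expression or enrichment value, #reads_region is the number of reads falling into the region, #reads_mapped is the total number of mapped reads, length_region is the length of the region, and c is a constant.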
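
The NE formula can be written out as a small function. This is only a sketch: the function name, the argument names and the example numbers are assumptions made for illustration, and the value of the constant c is not taken from this page, so it is passed in explicitly.

```python
def normalized_expression(reads_region, reads_mapped, length_region, c):
    """Return NE = c * #reads_region / (#reads_mapped * length_region).

    c is the constant from the formula; its value is NOT given here,
    so the caller has to supply it (assumption for illustration).
    """
    if reads_mapped <= 0 or length_region <= 0:
        raise ValueError("reads_mapped and length_region must be positive")
    return c * reads_region / (reads_mapped * length_region)


# Hypothetical example: 1124 reads on a 1894 bp transcript,
# 20 million mapped reads, and an arbitrarily chosen constant c.
ne = normalized_expression(1124, 20_000_000, 1894, c=1e7)
print(f"NE = {ne:.4f}")
```

Because the read count is divided by both the library size and the region length, NE values of long and short transcripts, and of deeply and shallowly sequenced samples, become comparable.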
If the expression values for two different conditions (here called "treatment" and "control" for simplicity) are to be compared, the following statistical testing methods for evaluating differential expression are available:
Audic S, Claverie JM (1997)
The significance of digital gene expression profiles
Genome Res. 1997;7(10):986-995
Anders S, Huber W (2010)
Differential expression analysis for sequence count data
Genome Biology 2010;11:R106
Robinson MD, Smyth GK (2007)
Moderated statistical tests for assessing differences in tag abundance
Bioinformatics 2007;23(21):2881-2887
Robinson MD, Smyth GK (2008)
Small-sample estimation of negative binomial dispersion, with applications to SAGE data
Biostatistics 2008;9(2):321-332
Robinson MD, Oshlack A (2010)
A scaling normalization method for differential expression analysis of RNA-seq data
Genome Biology 2010;11:R25
For defining up- and down-regulated transcripts between two conditions or samples, the following criteria are used (parameters set by the user):
Benjamini Y, Hochberg Y (1995)
Controlling the false discovery rate: a practical and powerful approach to multiple testing
J Roy Stat Soc B 1995;57:289-300
To calculate the list of up-regulated genes, all up-regulated alternative transcripts of a gene
are used to calculate a mean log2 fold change in expression level. The gene list, containing GeneId, Symbol and mean log2 fold change,
is then sorted by descending mean log2 fold change. The top 50 genes are displayed in
the output; the complete list can be downloaded and used as input data, e.g. for the Genomatix Pathway System.
The list of down-regulated genes is calculated correspondingly, using all down-regulated alternative transcripts of a gene.
The program also gives the list of genes that are both up- and down-regulated,
i.e. those genes where some alternative transcripts are up-regulated and others are down-regulated at the same time.
See more details in the program output section below.
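
The gene-level summary described above can be sketched as follows. The function name, the tuple layout of the input and the ascending sort order for down-regulated genes are assumptions made for illustration; only the averaging over regulated transcripts, the sorting by mean log2 fold change and the top-50 cut-off come from the description.

```python
from collections import defaultdict
from statistics import mean


def summarize_genes(transcripts, top_n=50):
    """Aggregate transcript-level results to gene level.

    transcripts: iterable of (gene_id, symbol, log2fc, regulation) tuples,
    where regulation is "up", "down" or "no" (hypothetical input layout).
    """
    up, down, symbols = defaultdict(list), defaultdict(list), {}
    for gene_id, symbol, log2fc, regulation in transcripts:
        symbols[gene_id] = symbol
        if regulation == "up":
            up[gene_id].append(log2fc)
        elif regulation == "down":
            down[gene_id].append(log2fc)

    # Mean log2 fold change over the up-regulated transcripts, highest first.
    up_genes = sorted(((g, symbols[g], mean(v)) for g, v in up.items()),
                      key=lambda row: row[2], reverse=True)
    # Assumption: down-regulated genes are listed most negative first.
    down_genes = sorted(((g, symbols[g], mean(v)) for g, v in down.items()),
                        key=lambda row: row[2])
    # Genes with at least one up- and one down-regulated transcript.
    both = sorted(set(up) & set(down))
    return up_genes[:top_n], down_genes[:top_n], both


example = [
    (1, "A", 2.0, "up"), (1, "A", 4.0, "up"),
    (2, "B", 5.0, "up"), (2, "B", -1.0, "down"),
    (3, "C", 0.1, "no"),
]
up_genes, down_genes, both = summarize_genes(example)
print(up_genes, down_genes, both)
```

Note that a gene can appear in the up-regulated list, the down-regulated list and the combined list at the same time, since each list only looks at the transcripts regulated in its own direction.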
Part of this RegionMiner task and functionality is described in:
Sultan M, et al (2008)
A Global View of Gene Activity and Alternative Splicing by Deep Sequencing of the Human Transcriptome
Science 321 (5891), 956-60
| Input | |
|---|---|
| Input file(s) with read positions from RNA-Seq |
Input data are accepted in
BED / bigBed file format or
BAM file format containing the input regions.
For some tasks BAM support might not be available.
For those tasks that accept replicate data as input, you can use the shift/ctrl keys to select multiple files
from the list. All selected files will then be treated as replicates.
When adding a new file, a new window will open, asking you to either
For the new BED/BAM files, you will have to select the correct organism, as the
organism and the genome build are associated with the BED file for future use
(the default is your latest choice in the current session).
Note that files critically depend on the underlying genome build, which can be changed by selecting a different ElDorado version on the top right of the page before uploading a file. You can see the list of genomes available in ElDorado.
Note that almost all browsers have a general upload limit of 2 GB, i.e. files bigger than this should be zipped before uploading from your local computer. This restriction does not apply when using the direct import from the GGA/GMS.
Optionally, you can specify a name for saving uploaded files on the server; otherwise the name of the uploaded file will be used. If several files are uploaded, the string given here will be used as a prefix for each file name.
If any of the regions in the input file cannot be completely assigned to the selected genome (e.g. wrong chromosome numbering or wrong positions within a chromosome), an error message will appear and the regions will be skipped. If no valid region is found in an uploaded file, the complete file will be skipped.
After one or more BED/BAM files have been uploaded successfully and the popup window has been closed,
the list of available BED/BAM files will be updated automatically.
Uploaded BED or BAM files can be deleted from the project anytime via the project management. |
| Optional control file(s) for differential analysis |
If additional input data is available
(e.g. data from a different condition or tissue, here called "control" data),
it can be selected or uploaded here. After the tickbox is checked, an additional selection will appear
(same options as for the "treatment" file(s), see above). |
| Differential Analysis Parameters |
The differential analysis parameter section will only appear,
if at least one control file was uploaded in the section above.
For a short introduction to the different methods, see above in the
Introduction to the Differential Analysis.
The thresholds that define a transcript as differentially expressed (or a region as enriched/depleted) can be set here. Two criteria are combined (both must be satisfied for differential expression/enrichment):
|
| Analysis Options | |
| Strand Specificity | Check this box if the sequencing experiment was strand specific (e.g. with Helicos data). These parameters are hidden by default. You can use the
|
| Read Classification | When checked, a read classification is done for each input file from the input data: The number of input reads overlapping genomic elements like exons, introns, promoters and intergenic regions will be given in the result. |
| Output | |
| Result | Here, you can edit the default name of the result file. |
| Email address | Here you can choose between two methods for receiving
the results:
The results will be available for a limited time on our server. For details of how long your results will be kept, please see the result email. After that period they will be deleted unless protected in the project management! We recommend using the email option for this task! |
The output has a number of sections, depending on the input (one or two data sets) and parameters:
Note that the log2 fold change values cannot be calculated under certain conditions (e.g. if no expression is detected for a transcript in the control set). Such cases are indicated by a "-Inf", "Inf" or "NA" value in the output.
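
How such special values arise can be illustrated with a short sketch. The function name is hypothetical, the fold change is taken as treatment over control, and the mapping of the "both values are zero" case to "NA" is an assumption; the program may use additional conditions.

```python
import math


def log2_fold_change(treatment, control):
    """log2(treatment / control) with explicit handling of zero expression."""
    if treatment == 0 and control == 0:
        return math.nan    # reported as "NA" (assumption: no expression in either set)
    if control == 0:
        return math.inf    # reported as "Inf": expressed in treatment only
    if treatment == 0:
        return -math.inf   # reported as "-Inf": expressed in control only
    return math.log2(treatment / control)


for treat, ctrl in [(4.0, 1.0), (0.5, 0.0), (0.0, 0.5), (0.0, 0.0)]:
    print(treat, ctrl, log2_fold_change(treat, ctrl))
```

When filtering or sorting result files downstream, these values need to be handled explicitly, since an infinite fold change says nothing about the magnitude of the expression difference.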

The download links below the numbers give access to the following files:
| Details on | Filename |
|---|---|
| all analyzed transcripts | 10.transcript_summary |
| differentially expressed transcripts | 11.diff_expressed_transcripts |
| both up- and down-regulated genes | 12.diff_expressed_genes_up_and_down |
| up-regulated genes | 13.diff_expressed_genes_up |
| down-regulated genes | 14.diff_expressed_genes_down |
| all analyzed genes | 15.genes_summary |





If the read statistics option has been checked, a table is given for each of the input data sets. It shows the number of reads overlapping genomic elements. Each input read is classified either as
A table with the distribution of the reads on the different chromosomes of the genome is available. The content of this table is hidden by default, but can be shown by clicking the "Show details" link in the header.
If a detailed read classification is desired, this can be done for the input files with the RegionMiner task Annotation & Statistics.

Details on NE value distribution
The histogram shows how many transcripts are expressed with a specific
intensity. The histogram displays 50 classes of NE values. Note that the last class sums up
all NE values larger than 1.
This section is hidden by default and can be shown by clicking on the >>>show details<<< link.
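
A histogram with a fixed number of classes and a final overflow class can be sketched as below. The exact class boundaries used by the program are not stated here, so the layout (49 equal-width classes on the range 0 to 1 plus one class for all NE values larger than 1) and the function name are assumptions made for illustration.

```python
def ne_histogram(values, n_classes=50, upper=1.0):
    """Count NE values per class.

    Assumed layout: n_classes - 1 equal-width classes covering [0, upper],
    plus a last class that sums up all values larger than `upper`.
    """
    counts = [0] * n_classes
    width = upper / (n_classes - 1)
    for value in values:
        if value > upper:
            counts[-1] += 1                       # overflow class
        else:
            index = min(int(value / width), n_classes - 2)
            counts[index] += 1
    return counts


counts = ne_histogram([0.0, 0.5, 1.0, 1.5, 3.0])
print(counts[0], counts[-1], sum(counts))
```

The overflow class keeps a few very highly expressed transcripts from stretching the axis, which would otherwise compress the bulk of the distribution into the first classes.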

Expression Profile for Transcripts
Expression Profile for Genes

| Details on | Filename |
|---|---|
| For each input data set: | |
| expression values for each transcript in this sample | sample-dir/07.expression_profile |
| Statistics on the distribution of NE values | sample-dir/08.expression_statistics |
| Main results: | |
| all analyzed transcripts | 10.transcript_summary |
| differentially expressed transcripts | 11.diff_expressed_transcripts |
| both up- and down-regulated genes | 12.diff_expressed_genes_up_and_down |
| up-regulated genes | 13.diff_expressed_genes_up |
| down-regulated genes | 14.diff_expressed_genes_down |
| all analyzed genes | 15.genes_summary |
| If differential analysis with the test methods 'edgeR' and 'DESeq' was selected: | |
| count data used for the test method | 91.input_replicate_analysis |
| library size, i.e. the total read numbers | 91.input_replicate_analysis.libsize |
| result from the test method | 92.output_replicate_analysis |
| Plots: | |
| Fold change scatter plot | 93.fold_change_plot.png |
| Fold change scatter plot (for edgeR) | 94.fold_change_all_plot.png |
| Volcano plot of (adj.) p-values | 95.volcano_plot.png |
| Volcano plot of (adj.) p-values (for edgeR) | 96.volcano_all_plot.png |
Data files for all analyzed transcripts and for differentially expressed transcripts
1: transcript ID (Eldorado)
2: accession number of the transcript (external e.g. RefSeq, Genbank, Ensembl)
3: locus ID (Eldorado)
4: symbol of the gene
5: gene ID (NCBI Entrez Gene, 0 if not available, -2 if ambiguous)
6: contig/chromosome accession number
7: chromosome
8: strand
9: start position of the transcript
10: end position of the transcript (start < end)
11: length of the transcript (sum of exons)
12: number of exons
13: p-value (depends on the selected method)
14: adjusted p-value (depends on the selected method)
15: log2(fold change), i.e. log2(expression value of treatment data set / expression value of control data set),
note that this value can be -Inf/+Inf if one of the conditions shows no expression
16: Regulation of treatment (set1) compared to control (set2), (values can be "up", "down", "no")
the following columns depend on the number of input files:
- number of reads for each replicate from the treatment sets and the control sets
- normalized expression value for each replicate from the treatment sets and the control sets
- the mean normalized expression value across the treatment replicates
- the standard deviation of the normalized expression values across the treatment replicates
- the mean normalized expression value across the control replicates
- the standard deviation of the normalized expression values across the control replicates
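
| TranscriptId | Accn | LocusId | Symbol | GeneId | ContigAccn | Chromosome | Strand | Start | End | ... |
|---|---|---|---|---|---|---|---|---|---|---|
| GXT_21962895 | NM_001034592 | GXL_353276 | DNAJB2 | 533668 | NC_007300 | chr2 | + | 111629112 | 111636393 | ... |
| GXT_21962908 | NM_174360 | GXL_353253 | CXCR2 | 281863 | NC_007300 | chr2 | - | 110616093 | 110617795 | ... |
| GXT_21962913 | XM_592166 | GXL_353073 | LCT | 514332 | NC_007300 | chr2 | + | 64499763 | 64548198 | ... |

| ... | Transcript length | #exons | p-value | adj. p-value | log2(fold change) | Regulation | ... |
|---|---|---|---|---|---|---|---|
| ... | 1894 | 10 | 5.86E-001 | 9.66E-001 | -0.18 | down | ... |
| ... | 1703 | 1 | 7.42E-002 | 5.73E-001 | -Inf | down | ... |
| ... | 5784 | 17 | 1.60E-001 | 7.93E-001 | 0.66 | up | ... |

| ... | #reads treat1 | #reads treat2 | #reads treat3 | #reads ctrl1 | #reads ctrl2 | #reads ctrl3 | ... |
|---|---|---|---|---|---|---|---|
| ... | 1124 | 2563 | 629 | 2619 | 1105 | 1100 | ... |
| ... | 0 | 0 | 0 | 2 | 2 | 0 | ... |
| ... | 0 | 11 | 2 | 0 | 4 | 0 | ... |

| ... | NE treat1 | NE treat2 | NE treat3 | NE ctrl1 | NE ctrl2 | NE ctrl3 | ... |
|---|---|---|---|---|---|---|---|
| ... | 0.29 | 0.25 | 0.25 | 0.28 | 0.34 | 0.3 | ... |
| ... | 0 | 0 | 0 | 0 | 0 | 0 | ... |
| ... | 0 | 0 | 0 | 0 | 0 | 0 | ... |

| ... | mean NE(treat) | stddev NE(treat) | mean NE(ctrl) | stddev NE(ctrl) |
|---|---|---|---|---|
| ... | 0.26 | 0.02 | 0.31 | 0.03 |
| ... | 0 | 0 | 0 | 0 |
| ... | 0 | 0 | 0 | 0 |
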
Data files for up-regulated genes and for down-regulated genes
1: gene ID (NCBI Entrez Gene)
2: symbol of the gene
3: number of alternative transcripts for this gene that are up-/down-regulated
4: total number of alternative transcripts available in the Genomatix annotation for this gene
5: mean log2 fold change of up-/down-regulated transcripts
6: min log2 fold change of up-/down-regulated transcripts
7: max log2 fold change of up-/down-regulated transcripts
8: standard deviation across the log2 fold change values of the regulated alternative transcripts
9: minimum p_value for the regulated alternative transcripts
10: mean NE(treat): mean normalized expression value for the regulated alternative transcripts in the treatment data
11: stddev NE(treat): standard deviation across the NE values for the regulated alternative transcripts in the treatment data
12: mean NE(ctrl): mean normalized expression value for the regulated alternative transcripts in the control data
13: stddev NE(ctrl): standard deviation across the NE values for the regulated alternative transcripts in the control data

| GeneId | Symbol | #transcripts regulated | total #transcripts for gene | ... |
|---|---|---|---|---|
| 505518 | PLET1 | 1 | 1 | ... |
| 616216 | OOSP1 | 2 | 2 | ... |
| 767916 | MGC127695 | 3 | 3 | ... |

| ... | mean log2(fold change) of reg. trans. | min fold change of reg. trans. | max fold change of reg. trans. | fc stddev | ... |
|---|---|---|---|---|---|
| ... | 5.270 | 5.270 | 5.270 | 0.000 | ... |
| ... | 4.956 | 4.915 | 4.997 | 0.041 | ... |
| ... | 4.189 | 4.103 | 4.232 | 0.061 | ... |

| ... | min p_value | mean NE(treat.reg.) | stddev NE(treat.reg.) | mean NE(ctrl.reg.) | stddev NE(ctrl.reg.) |
|---|---|---|---|---|---|
| ... | 1.15e-41 | 0.07340 | 0.075 | 0.00200 | 0.002 |
| ... | 2.92e-03 | 0.00816 | 0.009 | 0.00028 | 0.000 |
| ... | 5.03e-03 | 0.00920 | 0.009 | 0.00050 | 0.001 |

Data file for up- and down-regulated genes
1: gene Id (NCBI Entrez Gene)
2: symbol of the gene
3: total number of alternative transcripts for this gene
4: number of up-regulated transcripts for this gene
5: mean log2 fold change of up-regulated transcripts
6: number of down-regulated transcripts for this gene
7: mean log2 fold change of down-regulated transcripts

| GeneId | Symbol | total #transcripts for gene | ... |
|---|---|---|---|
| 407173 | JSP.1 | 10 | ... |

| ... | #up-regulated transcripts | mean log2(fold change up) | #down-regulated transcripts | mean log2(fold change down) |
|---|---|---|---|---|
| ... | 1 | 2.11 | 3 | -2.17 |

Data file for all genes
1: gene Id (NCBI Entrez Gene)
2: symbol of the gene
3: total number of alternative transcripts for this gene
4: mean NE of all transcripts in treatment file(s)
please note that here the mean NE is calculated across ALL transcripts of the gene across ALL replicates
5: standard deviation of the NE values of all transcripts in treatment (possibly replicates)
6: mean NE of all transcripts in control file(s)
7: standard deviation of the NE values of all transcripts in control (possibly replicates)

| GeneId | Symbol | total #transcripts for gene | mean NE(treat) | stddev NE(treat) | mean NE(ctrl) | stddev NE(ctrl) |
|---|---|---|---|---|---|---|
| 280675 | AR | 4 | 0.00507 | 0.003 | 0.00208 | 0.002 |
| 280677 | C3 | 5 | 0.17381 | 0.084 | 0.18711 | 0.047 |
| 280678 | C4A | 3 | 0.43977 | 0.087 | 0.14772 | 0.018 |

Data files if one of the test methods 'edgeR' and 'DESeq' was selected
1. id: Genomatix Transcript Id
2. p-value: p-value resulting from the hypothesis test of the selected test method for differential expression (DESeq or edgeR)
3. adj. p-value: Benjamini-Hochberg adjusted p-value (from column 2)
4. log2FoldChange: logarithmic (base 2) fold-change in read abundance/expression level in treatment over control (> 0 is enrichment in treatment, < 0 is decrease in treatment); for DESeq, the fold-change corresponds to the base mean, for edgeR to the concentration
For DESeq the remaining columns are:
5. baseMean: mean expression level across all replicates, treatment and control
6. baseMean control: mean expression level within control group
7. baseMean treatment: mean expression level within treatment group
For edgeR the remaining column is:
5. log counts-per-million expression level, logarithmic (base 2)
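
The Benjamini-Hochberg adjustment named for column 3 can be sketched in a few lines. This is a generic implementation of the standard procedure, not the code used by the program; the function name is chosen for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Each p-value is scaled by the number of tests divided by its rank, so the adjustment is strongest for the smallest p-values; the running minimum ensures that a smaller raw p-value never ends up with a larger adjusted value.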
| © 1998-2013 Genomatix Software GmbH - All rights reserved |