NOTE! This is a read-only copy of the ENCODE2 wiki.
Please go to the ENCODE3 wiki for current information.
Segmentation bake off
- 1 Segmentation Bake off
- 2 Segmentation datasets
- 3 Criteria A: Precision/Recall
- 4 Criteria B: Downstream Analysis
- 5 Criteria C: Expert Biologist Look and Feel
Segmentation Bake off
This page coordinates efforts on segmentation comparisons. The aim is to make a considered choice of one segmentation (or a combined segmentation) to present as the ENCODE segmentation and to use as the basis for summary analyses. Each segmentation will remain available independently.
Segmentation datasets
|ChromHMM 25 state|| Also see ChromHMM version 8 wiki page||Jason Ernst, single model applied to all Tier 1 and 2. Files named fourcol_ChromHMM_HEPG2_concatenate_25.bed.gz, etc.|
|Segway 25 state|| Also see Segmentation Progress wiki page||Michael Hoffman, individual model trained per cell line, and applied to the same line. State labels are not common between models. Files are named gm12878.coordinated.bed.gz, etc.|
|ChromHMM 10 state||||Jason Ernst, manual identification of states into groups of related colours. For the purposes of comparison I intend to treat the colour groupings as single states. The result of this grouping can be found at [add file locations]|
|Segway 11 state||||Steve Wilder, Ian Dunham. Generated from the 25 state segmentations by clustering segments by signal and identifying related groups of segments across cell line models (see ppt).|
|Segway + Multipass 11 state||||Bill Noble, Michael Hoffman, Avinash Sahu. Only available for K562 at present|
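For anyone wanting to load these files for the comparisons below, here is a minimal Python sketch of a reader. It assumes that each file is a gzip-compressed, tab-separated BED file whose first four columns are chromosome, start, end and state label; that layout is inferred from the "fourcol" file name and has not been checked against the actual files. The names `Segment`, `read_segments` and `read_segments_gz` are hypothetical.

```python
import gzip
import io
from collections import namedtuple

# Assumed layout: chrom, start, end, state label (first four BED columns).
Segment = namedtuple("Segment", ["chrom", "start", "end", "state"])

def read_segments(handle):
    """Yield Segment records from an iterable of tab-separated BED lines."""
    for line in handle:
        line = line.strip()
        # Skip blank lines and UCSC-style header lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, state = line.split("\t")[:4]
        yield Segment(chrom, int(start), int(end), state)

def read_segments_gz(path):
    """Read a gzip-compressed BED file into a list of segments."""
    with gzip.open(path, "rt") as handle:
        return list(read_segments(handle))

# Small in-memory example standing in for a real file.
example = io.StringIO("track name=demo\nchr1\t0\t200\t1\nchr1\t200\t1000\t14\n")
segments = list(read_segments(example))
```

For a real file, `read_segments_gz("fourcol_ChromHMM_HEPG2_concatenate_25.bed.gz")` would be the equivalent call, provided the column assumption holds.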
Criteria A: Precision/Recall
Generate precision/recall and ROC plots for each segmentation class against the following data:
- Gencode TSS
- Gencode TTS
- Ross' K562 enhancers for K562 segmentations
- HumanKnownCRMs_hg19.txt This is a combination of the CRMs in "ErythroidCRMsHumV3_hg18.txt" and "KnownCRMs93RP.txt" to give a set of 207 known CRMs active in a variety of tissues, but still a lot of them are erythroid. I also lifted them over to hg19 to facilitate analysis of the segmentations. Submitted by Ross Hardison, July 09, 2011.
- Gene bodies
- TFBS (spp) particularly CTCF, P300
- Conserved bases?
- Methylation data (RRBS)
- Elements group predicted HOT, Enhancer and promoter tracks?
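As an illustration of what one point on such a plot could mean, the sketch below computes base-pair-level precision and recall for a single segmentation class against a reference set. This is an assumption about the scoring, not a description of the analyses actually performed: precision is taken as the fraction of the class's bases that overlap the reference, and recall as the fraction of reference bases covered by the class. Coordinates are assumed to be half-open, and all function names are hypothetical.

```python
from collections import defaultdict

def merge(intervals):
    """Merge overlapping or abutting (chrom, start, end) intervals per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    merged = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        out = [list(spans[0])]
        for start, end in spans[1:]:
            if start <= out[-1][1]:
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = out
    return merged

def total_bp(merged):
    """Total number of bases covered by a merged interval set."""
    return sum(end - start for spans in merged.values() for start, end in spans)

def overlap_bp(a, b):
    """Bases shared by two merged interval sets (two-pointer sweep)."""
    shared = 0
    for chrom in a.keys() & b.keys():
        xs, ys = a[chrom], b[chrom]
        i = j = 0
        while i < len(xs) and j < len(ys):
            lo = max(xs[i][0], ys[j][0])
            hi = min(xs[i][1], ys[j][1])
            if lo < hi:
                shared += hi - lo
            # Advance whichever interval ends first.
            if xs[i][1] < ys[j][1]:
                i += 1
            else:
                j += 1
    return shared

def precision_recall(state_intervals, reference_intervals):
    """Base-pair precision and recall of one state against a reference set."""
    state = merge(state_intervals)
    ref = merge(reference_intervals)
    shared = overlap_bp(state, ref)
    state_bp, ref_bp = total_bp(state), total_bp(ref)
    precision = shared / state_bp if state_bp else 0.0
    recall = shared / ref_bp if ref_bp else 0.0
    return precision, recall

# Toy example: a promoter-like state against two made-up TSS windows.
tss_state = [("chr1", 100, 300), ("chr1", 1000, 1100)]
tss_windows = [("chr1", 200, 400), ("chr1", 5000, 5200)]
precision, recall = precision_recall(tss_state, tss_windows)
```

In the toy example the state covers 300 bases, the reference covers 400, and they share 100, giving a precision of 1/3 and a recall of 1/4. Element-level scoring (counting reference elements hit rather than bases) would be an equally reasonable alternative.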
Preliminary pilot analyses are uploaded here (note that these were performed on the 10 and 11 state segmentations, which does not reflect any preference):
- Confusion matrix
- Heatmap analysis of Overlaps of all TFs (SPP IDR) vs 10 and 11 state segmentations.
- Precision Recall and ROC
- Precision Recall ChromHMM 10 State vs TFs (Gm12878)
- Precision Recall Segway 11 state vs TFs (Gm12878)
- ROC plot ChromHMM 10 State vs TFs (Gm12878)
- ROC plot Segway 11 State vs TFs (Gm12878)
- AUC difference plot between ChromHMM 10 state vs Segway 11 state TFs (Gm12878)
- ROC plots for Segmentations vs Gencode TSS Clusters
- ROC plots for Segmentations vs expressed TSS (Gm12878 and K562)
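A confusion matrix between two segmentations, as listed above, can be sketched as a table of shared bases for every pair of state labels. The code below is a hypothetical illustration and not the script used for the uploaded analysis. It assumes each segmentation is a list of non-overlapping `(chrom, start, end, state)` tuples with half-open coordinates.

```python
from collections import Counter, defaultdict

def by_chrom(segments):
    """Group (chrom, start, end, state) tuples by chromosome, sorted by start."""
    grouped = defaultdict(list)
    for chrom, start, end, state in segments:
        grouped[chrom].append((start, end, state))
    for spans in grouped.values():
        spans.sort()
    return grouped

def confusion_matrix(seg_a, seg_b):
    """Count bases shared by each (state in A, state in B) pair."""
    a, b = by_chrom(seg_a), by_chrom(seg_b)
    counts = Counter()
    for chrom in a.keys() & b.keys():
        xs, ys = a[chrom], b[chrom]
        i = j = 0
        while i < len(xs) and j < len(ys):
            lo = max(xs[i][0], ys[j][0])
            hi = min(xs[i][1], ys[j][1])
            if lo < hi:
                counts[(xs[i][2], ys[j][2])] += hi - lo
            # Advance whichever segment ends first.
            if xs[i][1] < ys[j][1]:
                i += 1
            else:
                j += 1
    return counts

# Toy example using mnemonics from the tables on this page.
chromhmm = [("chr1", 0, 100, "AP"), ("chr1", 100, 500, "T")]
segway = [("chr1", 0, 150, "TSS1"), ("chr1", 150, 500, "G0")]
matrix = confusion_matrix(chromhmm, segway)
```

Dividing each row by its total would turn the counts into the fraction of each ChromHMM state assigned to each Segway state, which is often easier to read as a heatmap.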
Criteria B: Downstream Analysis
- Signal split of RNA, Methylation datasets
- Preliminary plots
- Methylation distribution within states
- Self-Organizing Map downstream analysis: SOMs of K562 ChromHMM 10-state and Segway multipass, and SOMs of Tier 1 & stacked segmentations.
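The methylation split listed above can be approximated by assigning each measured CpG site to the state that contains it and then summarising the values per state. The following is a rough sketch under assumed inputs: sites are given as `(chrom, position, fraction_methylated)` tuples, which is a simplification of real RRBS output, and segments are non-overlapping `(chrom, start, end, state)` tuples. All names are hypothetical.

```python
import bisect
from collections import defaultdict
from statistics import median

def index_segments(segments):
    """Build a per-chromosome list of (start, end, state), sorted by start."""
    index = defaultdict(list)
    for chrom, start, end, state in segments:
        index[chrom].append((start, end, state))
    for spans in index.values():
        spans.sort()
    return index

def state_at(index, chrom, pos):
    """Return the state covering a position, or None if no segment covers it."""
    spans = index.get(chrom, [])
    # Locate the last segment whose start is <= pos.
    k = bisect.bisect_right(spans, (pos, float("inf"), "")) - 1
    if k >= 0 and spans[k][0] <= pos < spans[k][1]:
        return spans[k][2]
    return None

def methylation_by_state(segments, sites):
    """Collect methylation fractions of all sites falling in each state."""
    index = index_segments(segments)
    values = defaultdict(list)
    for chrom, pos, fraction in sites:
        state = state_at(index, chrom, pos)
        if state is not None:
            values[state].append(fraction)
    return dict(values)

# Toy example with made-up values.
segments = [("chr1", 0, 100, "TSS"), ("chr1", 100, 500, "T")]
sites = [("chr1", 10, 0.05), ("chr1", 50, 0.15), ("chr1", 200, 0.9), ("chr2", 5, 0.5)]
distributions = methylation_by_state(segments, sites)
medians = {state: median(v) for state, v in distributions.items()}
```

The per-state lists in `distributions` can be passed directly to a box plot or histogram routine; the median is used here only as a compact summary.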
Criteria C: Expert Biologist Look and Feel
Please look at the following links to compare segmentations side by side (not available yet). Browse around a number of your favourite locations and switch on other tracks. Note: check that you are on hg19, and clear custom tracks after viewing one link before viewing the next. Please indicate your preferences in this confidential Doodle poll and follow up with an email to Ian Dunham describing the major factors in your choice.
25 State Segmentations
Track colours are based on Jason Ernst's scheme for ChromHMM. For Segway an equivalent colour scheme has been adopted based on the clustering described in Steve Wilder's presentation. Note that specific mnemonics have not yet been given to the 25 state Segway track, but they are indicated by the colour scheme.
Clustered (10/11/hierarchical) Segmentations
Results from comparisons by biologists
Ross did a comparison of the "Clustered (10/11/hierarchical) segmentations" for Segway and ChromHMM, mostly in K562 along with some GM12878, focused on known regulatory regions (dominated by erythroid ones). Results are in this presentation: Media:CompareSegwayChromHMM_rh.ppt
Proposed Mnemonics for Combined 7-state segmentation
These are the proposed "conservative labels" for the merged/combined segmentation.
|Combined 7 state|
|R||R||Predicted Repressed or Low Activity region.||Gray|
|CTCF||CTCF||CTCF enriched element.||Turquoise|
|PF||PF||Predicted promoter flanking region.||Light Red|
|TSS||TSS||Predicted promoter region including TSS.||Bright Red|
|T||T||Predicted transcribed region.||Dark Green|
|WE||WE||Predicted weak enhancer or open chromatin cis regulatory element.||Yellow|
Mnemonics for 10/11-state
Note: These mnemonics and descriptions are based on combined analysis across segmentations, and may not exactly correspond to the current or final mnemonics used for individual segmentations. They are used here as an aid to this comparison.
|Segway 11 state|
|1||R0||Polycomb Repressed||Dark Gray|
|3||D1||Dead intergenic||Light Gray|
|4||R1||Low signal intergenic?||Light Green|
|5||D2||Dead Input driven||Light Gray|
|7||TSS1||surrounds (usually 3') TSS||Light Red|
|8||G0||Transcribed gene||Dark Green|
|9||G1||Transcribed gene||Dark Green|
|11||E||Candidate Enhancer/open chromatin||Orange|
|ChromHMM clustered 10 state|
|States 1-2||AP||Active Promoter||Dark Red|
|State 3||PF||Promoter Flanking||Light Red|
|State 4||IP||Inactive promoter||Purple|
|States 5-6||CSE||Candidate Strong Enhancer||Orange|
|States 7-11||CWE||Candidate Weak Enhancer/Open Chromatin||Yellow|
|States 12-13||I||Distal CTCF/Candidate Insulator||Turquoise|
|States 14-19||T||Transcription associated||Dark Green|
|States 20-22||RP||Polycomb Repression||Dark Gray|
|State 23||D0||Low activity proximal to active states||Light Green|
|States 24-25||D1||Heterochromatin/Repetitive/CNV||Light Gray|
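The grouping in the table above can be applied mechanically to a 25 state ChromHMM segmentation. The sketch below relabels each segment with its mnemonic and merges abutting segments that end up with the same label, so that each colour grouping is treated as a single state. It assumes the state labels in the files are the integers 1 to 25 used in this table and that segments are sorted by position; the names `GROUPS`, `STATE_TO_MNEMONIC` and `collapse` are hypothetical.

```python
# State ranges and mnemonics copied from the ChromHMM clustered 10 state table.
GROUPS = [
    (range(1, 3), "AP"),
    (range(3, 4), "PF"),
    (range(4, 5), "IP"),
    (range(5, 7), "CSE"),
    (range(7, 12), "CWE"),
    (range(12, 14), "I"),
    (range(14, 20), "T"),
    (range(20, 23), "RP"),
    (range(23, 24), "D0"),
    (range(24, 26), "D1"),
]
STATE_TO_MNEMONIC = {state: name for states, name in GROUPS for state in states}

def collapse(segments):
    """Relabel sorted 25-state segments and merge abutting runs with one mnemonic."""
    out = []
    for chrom, start, end, state in segments:
        label = STATE_TO_MNEMONIC[int(state)]
        previous = out[-1] if out else None
        if previous and previous[0] == chrom and previous[2] == start and previous[3] == label:
            # Extend the previous segment instead of starting a new one.
            out[-1] = (chrom, previous[1], end, label)
        else:
            out.append((chrom, start, end, label))
    return out

# Toy example: states 1 and 2 merge into AP, states 14 and 19 merge into T.
fine = [
    ("chr1", 0, 200, 1),
    ("chr1", 200, 400, 2),
    ("chr1", 400, 1000, 14),
    ("chr1", 1000, 1200, 19),
]
coarse = collapse(fine)
```

Merging abutting segments is a design choice for this sketch: it keeps segment counts comparable between the 25 state and 10 state versions, but the grouped files referred to on this page may have been produced differently.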
|Tentative assignments Segway+Multipass 11 state|
|2||D1||Dead, low signal||Light Gray|
|3||D2||Dead, low signal||Light Gray|
|7||TSS||Transcription Start Site||Dark Red|
|9||D0||Dead/low signal||Light Gray|