NOTE! This is a read-only copy of the ENCODE2 wiki.
Please go to the ENCODE3 wiki for current information.

Segmentation bake off

This page coordinates efforts to compare segmentations. The aim is to make a considered choice of one segmentation (or a combined segmentation) to present as the ENCODE segmentation and to use for summary analyses. Each segmentation will still be available independently.

Segmentation datasets

Segmentation | Location | Notes
ChromHMM 25 state | [1] Also see ChromHMM version 8 wiki page | Jason Ernst, single model applied to all Tier 1 and 2. Files named fourcol_ChromHMM_HEPG2_concatenate_25.bed.gz, etc.
Segway 25 state | [2] Also see Segmentation Progress wiki page | Michael Hoffman, individual model trained per cell line, and applied to the same line. State labels not common between models. Files named gm12878.coordinated.bed.gz, etc.
ChromHMM 10 state | [3] | Jason Ernst, manual identification of states into groups of related colours. For the purposes of comparison I intend to treat the colour groupings as single states. The result of this grouping can be found at [add file locations]
Segway 11 state | [4] | Steve Wilder, Ian Dunham. Generated from the 25 state segmentations by clustering segments by signal and identifying related groups of segments across cell line models (see ppt).
Segway + Multipass 11 state | [5] | Bill Noble, Michael Hoffman, Avinash Sahu. Only available for K562 at present.

Criteria A: Precision/Recall

Generate precision/recall and ROC plots for each segmentation class vs the following data

Preliminary pilot analyses are uploaded here (note that these use the 10 and 11 state segmentations, which does not reflect any preference):
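As an illustration of the precision/recall comparison described in this section, the following is a minimal sketch of a base-pair-level calculation for one segmentation class against one reference annotation on a single chromosome. It is not the method used for the pilot analyses. The function names, the use of half-open BED-style coordinates, and the choice to score by overlapping base pairs (rather than by overlapping elements) are all assumptions made for this example.

```python
from typing import List, Tuple

# Assumption: intervals are half-open [start, end), as in BED files,
# and all belong to the same chromosome.
Interval = Tuple[int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or abut."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total_length(intervals: List[Interval]) -> int:
    """Number of base pairs covered by a merged interval list."""
    return sum(end - start for start, end in intervals)


def overlap_bp(a: List[Interval], b: List[Interval]) -> int:
    """Base pairs shared by two merged, sorted interval lists."""
    i = j = shared = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            shared += hi - lo
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared


def precision_recall(predicted: List[Interval],
                     reference: List[Interval]) -> Tuple[float, float]:
    """Hypothetical helper: bp-level precision and recall of one class."""
    pred, ref = merge(predicted), merge(reference)
    shared = overlap_bp(pred, ref)
    pred_bp, ref_bp = total_length(pred), total_length(ref)
    precision = shared / pred_bp if pred_bp > 0 else 0.0
    recall = shared / ref_bp if ref_bp > 0 else 0.0
    return precision, recall
```

Here `predicted` would hold the segments assigned to one class (for example all segments of one state in one cell line) and `reference` the comparison data set. Repeating the calculation per class and per reference gives one point per class on a precision/recall plot; a full curve or ROC plot would additionally need a score to threshold, which this sketch does not cover.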

Criteria B: Downstream Analysis

Criteria C: Expert Biologist Look and Feel

Please look at the following links to compare segmentations side by side (not available yet). Browse around a number of your favourite locations and switch on other tracks. Note: Check that you are on hg19, and clear custom tracks after viewing one link before viewing the next one. Please indicate your preferences in this confidential doodle poll and follow up with an email to Ian Dunham with the major factors in your choice.

Composite Segmentation

25 State Segmentations

Track colours are based on Jason Ernst's scheme for ChromHMM. For Segway an equivalent colour scheme has been adopted based on the clustering described in Steve Wilder's presentation. Note that specific mnemonics have not yet been given to the states in the 25 state Segway track, but the state types are indicated by the colour scheme.

Clustered (10/11/hierarchical) Segmentations

Results from comparisons by biologists

Ross compared the "Clustered (10/11/hierarchical) Segmentations" for Segway and ChromHMM, mostly in K562 along with some GM12878, focusing on known regulatory regions (dominated by erythroid ones). Results are in this presentation: Media:CompareSegwayChromHMM_rh.ppt

Proposed Mnemonics for Combined 7-state segmentation

These are the proposed "conservative labels" for the merged/combined segmentation.

State Name | Mnemonic | Description | Colour
Combined 7 state
R | R | Predicted Repressed or Low Activity region. | Gray
CTCF | CTCF | CTCF enriched element. | Turquoise
E | E | Predicted enhancer. | Orange
PF | PF | Predicted promoter flanking region. | Light Red
TSS | TSS | Predicted promoter region including TSS. | Bright Red
T | T | Predicted transcribed region. | Dark Green
WE | WE | Predicted weak enhancer or open chromatin cis regulatory element. | Yellow

Mnemonics for 10/11-state

Note: These Mnemonics and descriptions are based on combined analysis across segmentations, and may not exactly correspond to current or final mnemonics used for individual segmentations. They are used here as an aid to this comparison.

State | Mnemonic | Notes | Colour
Segway 11 state
1 | R0 | Polycomb Repressed | Dark Gray
2 | I | Distal CTCF | Turquoise
3 | D1 | Dead intergenic | Light Gray
4 | R1 | Low signal intergenic? | Light Green
5 | D2 | Dead Input driven | Light Gray
6 | TSS0 | TSS | Dark Red
7 | TSS1 | surrounds (usually 3') TSS | Light Red
8 | G0 | Transcribed gene | Dark Green
9 | G1 | Transcribed gene | Dark Green
10 | D0 | Dead | Light Gray
11 | E | Candidate Enhancer/open chromatin | Orange
ChromHMM clustered 10 state
States 1-2 | AP | Active Promoter | Dark Red
State 3 | PF | Promoter Flanking | Light Red
State 4 | IP | Inactive promoter | Purple
States 5-6 | CSE | Candidate Strong Enhancer | Orange
States 7-11 | CWE | Candidate Weak Enhancer/Open Chromatin | Yellow
States 12-13 | I | Distal CTCF/Candidate Insulator | Turquoise
States 14-19 | T | Transcription associated | Dark Green
States 20-22 | RP | Polycomb Repression | Dark Gray
State 23 | D0 | Low activity proximal to active states | Light Green
States 24-25 | D1 | Heterochromatin/Repetitive/CNV | Light Gray
Tentative assignments Segway+Multipass 11 state
0 | E0 | Enhancer | Orange
1 | R0 | Repressed | Dark Gray
2 | D1 | Dead, low signal | Light Gray
3 | D2 | Dead, low signal | Light Gray
4 | T0 | Transcribed | Dark Green
5 | T1 | Transcribed | Dark Green
6 | E1 | Enhancer | Orange
7 | TSS | Transcription Start Site | Dark Red
8 | T2 | Transcribed | Dark Green
9 | D0 | Dead/low signal | Light Gray
10 | I/E | Insulator/enhancer | Orange
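
The ChromHMM grouping above (25 states treated as 10 groups for the purposes of comparison) could be applied to a segmentation along the following lines. This is only a sketch: the state-to-mnemonic mapping is taken from the table, but the function name, the tuple layout, and the assumption that states are available as integers 1 to 25 are hypothetical, and the real BED files may label states differently.

```python
from typing import Iterable, List, Tuple

# State ranges and mnemonics as listed in the "ChromHMM clustered 10 state" table.
CHROMHMM_GROUPS = [
    (range(1, 3), "AP"),     # States 1-2: Active Promoter
    (range(3, 4), "PF"),     # State 3: Promoter Flanking
    (range(4, 5), "IP"),     # State 4: Inactive promoter
    (range(5, 7), "CSE"),    # States 5-6: Candidate Strong Enhancer
    (range(7, 12), "CWE"),   # States 7-11: Candidate Weak Enhancer/Open Chromatin
    (range(12, 14), "I"),    # States 12-13: Distal CTCF/Candidate Insulator
    (range(14, 20), "T"),    # States 14-19: Transcription associated
    (range(20, 23), "RP"),   # States 20-22: Polycomb Repression
    (range(23, 24), "D0"),   # State 23: Low activity proximal to active states
    (range(24, 26), "D1"),   # States 24-25: Heterochromatin/Repetitive/CNV
]
STATE_TO_GROUP = {s: name for states, name in CHROMHMM_GROUPS for s in states}

# Assumption: a segment is (chrom, start, end, state) with half-open coordinates.
Segment = Tuple[str, int, int, int]
GroupedSegment = Tuple[str, int, int, str]


def collapse_states(segments: Iterable[Segment]) -> List[GroupedSegment]:
    """Relabel 25-state segments by group and merge abutting segments
    that end up in the same group. Input must be sorted by chrom, start."""
    out: List[GroupedSegment] = []
    for chrom, start, end, state in segments:
        group = STATE_TO_GROUP[state]
        if out and out[-1][0] == chrom and out[-1][2] == start and out[-1][3] == group:
            # Extend the previous segment instead of starting a new one.
            out[-1] = (chrom, out[-1][1], end, group)
        else:
            out.append((chrom, start, end, group))
    return out
```

Merging abutting segments after relabelling means segment counts and lengths are computed on the grouped states rather than on the original 25, which keeps the comparison with the 11 state Segway segmentations on a similar footing.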