NOTE! This is a read-only copy of the ENCODE2 wiki.
Please go to the ENCODE3 wiki for current information.

Test summer 10

From Encode2 Wiki

Introduction

Replaced By

This page is superseded by: Integration_Vignette_E01

We've left the material below for posterity, but you should probably not be using this page anymore.

Old Analysis

In November 2010 we redid both the discriminative and generative approaches for testing; we should coordinate our discussion from here.


Brief Agenda:

  • Presentation from Barbara about testing
  • Presentation/Discussion from Michael on generative picks
  • Presentation/Discussion from Anshul on discriminative picks


Testing Outline

As promised at the meeting, here's a write-up of the experimental testing outline.

Link to the proposal put forward by the Myers/Wold labs

We have four (potentially five) assays:

  • Transient enhancer assay (K562) (HA)
  • Transient promoter assay (K562) (HA)
  • Transient silencer assay (K562) (NHGRI)
  • Medaka Fish assays (Uni Heidelberg)

and then, hopefully, a number of these will also go into Len Pennacchio's pipeline:

  • Mouse Embryo Enhancers (LBL)

We have four biological classes of elements, one of which is a set of "background controls"; some classes have multiple prediction methods:


  • Enhancers (180)
    • Ab initio segmentation, consensus from two segmentations (MIT/UW). Lead contact: Michael Hoffman. 60 picks, split into 2 classes
    • Discriminative trainers. Lead contact: Zhiping Weng (definitely with M. Gerstein and Anshul). 60 picks, from consensus between methods
    • Biologist-led picks. Lead contact: Ross Hardison. 60 picks
  • Insulators/silencers
    • Ab initio segmentation, consensus from two segmentations (MIT/UW). Lead contact: Michael Hoffman. 60 picks
  • 3' end state: a unique state emerging from the segmentation
    • Ab initio segmentation, consensus from two segmentations (MIT/UW). Lead contact: Michael Hoffman. 60 picks
  • Background set
    • Random picks from the "ENCODE-visible" genome. Lead contact: Michael Hoffman. 60 picks

Background set

After a surprisingly long discussion, we settled on choosing a random set of elements across the "signal-detectable" genome, meaning the part of the genome that ENCODE can see. We discussed other options extensively, such as excluding "well-known" annotations, but in the end came back to random picks from the "encodable" space of the genome.

Note: these are deliberately not negative controls for the assays; we think all the assays already have appropriate negative controls. What we want is a sense of what the genome background looks like in each assay.
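The background picks amount to sampling random elements uniformly from a defined region set. A minimal sketch of that idea, assuming the "encodable" space is available as a list of non-overlapping (chrom, start, end) regions in BED-style coordinates and that every pick has a fixed width; `sample_background`, the width, and the seed are hypothetical illustrations, not the procedure actually used for these picks.

```python
import bisect
import random
from typing import List, Tuple

# A region is (chrom, start, end) in 0-based, half-open (BED-style) coordinates.
Region = Tuple[str, int, int]


def sample_background(regions: List[Region], n: int, width: int,
                      seed: int = 0) -> List[Region]:
    """Draw n intervals of the given width uniformly from 'regions'.

    Every valid start position across all regions is equally likely, so long
    regions are not under-sampled. 'regions' stands in for the hypothetical
    "encodable" space and is assumed to be non-overlapping.
    """
    # Only regions long enough to hold a full interval can contribute.
    usable = [(c, s, e) for c, s, e in regions if e - s >= width]
    if not usable:
        raise ValueError("no region is long enough for the requested width")
    # Running total of valid start positions across the usable regions.
    cumulative = []
    total = 0
    for _, s, e in usable:
        total += e - s - width + 1
        cumulative.append(total)
    rng = random.Random(seed)
    picks = []
    for _ in range(n):
        # Choose one global start position, then map it back to its region.
        k = rng.randrange(total)
        i = bisect.bisect_right(cumulative, k)
        chrom, start, _ = usable[i]
        offset = k - (cumulative[i - 1] if i > 0 else 0)
        picks.append((chrom, start + offset, start + offset + width))
    return picks
```

This samples with replacement and does not stop two picks from overlapping each other; a real background set would likely want to reject such collisions.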


In each case, the picks should be drawn from a well-defined set, and that set should be frozen for future reference.

In further conversations, it was suggested that each set of 60 picks be expanded fivefold (to 300), drawn by the same procedure, so that if we have the appetite to go "deeper" into any set, the extra picks are already computed. This overlaps with keeping the frozen set for future reference, but ensures the picks are ready to go straight away.
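One way to make the fivefold expansion reproducible is to fix a single seeded ordering of the frozen set and read picks off its front. The sketch below assumes candidates are identified by unique strings; `expanded_picks`, the seed, and the prefix scheme are illustrative choices, not the procedure the groups actually used.

```python
import random
from typing import List, Sequence, Tuple

PICKS_PER_SET = 60  # primary picks per set, as in the outline above
EXPANSION = 5       # fivefold expansion suggested in later conversations


def expanded_picks(frozen_set: Sequence[str],
                   seed: int) -> Tuple[List[str], List[str]]:
    """Return (primary, reserve) picks drawn from a frozen candidate set.

    Everything comes from one seeded ordering of the set: the primary picks
    are its first PICKS_PER_SET entries and the reserve continues the same
    ordering, so going "deeper" never changes the picks already made.
    """
    wanted = PICKS_PER_SET * EXPANSION  # 300 picks in total
    if len(frozen_set) < wanted:
        raise ValueError(f"need at least {wanted} candidates, got {len(frozen_set)}")
    # Sort first so the result depends only on the set's contents and the
    # seed, not on the order in which the candidates were listed.
    ordered = sorted(frozen_set)
    random.Random(seed).shuffle(ordered)
    return ordered[:PICKS_PER_SET], ordered[PICKS_PER_SET:wanted]
```

Recording the seed alongside the frozen set is what lets anyone regenerate exactly the same 60 primary and 240 reserve picks later.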


Note that the segmentation accounts for a large part of the picks. Michael will generate his 240 picks (the four sets of 60 for which he is lead contact) by the middle of next week (Wednesday 28th July). The 60 discriminative enhancer picks will be ready by Thursday 19th August, and the biologist-picked enhancers by 16th September.


The proposal is to have a centralised cloning process, at least for the HA and NHGRI efforts, and we are now getting quotes to find the best way to do this.


Segmentation picks

Number of intersections:

intersection.ctcf-a.bed.gz 9969
intersection.ctcf-b.bed.gz 23174
intersection.enh-a.bed.gz 21288
intersection.enh-b.bed.gz 38740
intersection.ge.bed.gz 44166
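The segmentation picks are described as a consensus from two segmentations. Assuming each `intersection.*.bed.gz` file holds the regions that both segmentations assign to the same state (an assumption; this page does not say how the files were made), that consensus can be sketched as a linear sweep over two sorted interval lists. `intersect` is a hypothetical helper, not part of any pipeline named here.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the regions covered by both interval lists.

    Both inputs are assumed sorted by (chrom, start) using the same string
    ordering for chromosome names, and each is internally non-overlapping,
    as the segments for one state from a single segmentation would be.
    """
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        ca, sa, ea = a[i]
        cb, sb, eb = b[j]
        if ca != cb:
            # Advance whichever list is on the earlier chromosome.
            if ca < cb:
                i += 1
            else:
                j += 1
            continue
        start, end = max(sa, sb), min(ea, eb)
        if start < end:
            out.append((ca, start, end))
        # Drop the interval that ends first; the other may overlap more.
        if ea <= eb:
            i += 1
        else:
            j += 1
    return out
```

Under this assumption, the number of regions returned for one state is the kind of figure the counts above report.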

Number of predictions remaining after filtering out those that overlap RepeatMasker regions, promoters, or exons:

predictions.ctcf-a 1358
predictions.ctcf-b 3522
predictions.enh-a 2776
predictions.enh-b 5405
predictions.ge 4294
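The filtering step drops any prediction that overlaps a RepeatMasker region, a promoter, or an exon. A minimal sketch, assuming all three annotations are already loaded as (chrom, start, end) tuples in 0-based half-open coordinates; the function names are hypothetical, and where the annotations came from (and how promoters were defined) is not stated on this page.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open
Index = Dict[str, Tuple[List[int], List[int]]]


def build_index(regions: Iterable[Interval]) -> Index:
    """Merge exclusion regions per chromosome into sorted, disjoint intervals."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        merged = [list(ivs[0])]
        for s, e in ivs[1:]:
            if s <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], e)  # extend current block
            else:
                merged.append([s, e])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps(index: Index, chrom: str, start: int, end: int) -> bool:
    """True if [start, end) shares at least one base with an indexed region."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Merged regions are sorted and disjoint, so only the last one starting
    # before 'end' can reach back past 'start'.
    i = bisect.bisect_left(starts, end) - 1
    return i >= 0 and ends[i] > start


def filter_predictions(predictions: List[Interval],
                       *exclusion_sets: Iterable[Interval]) -> List[Interval]:
    """Keep only predictions that overlap none of the exclusion sets."""
    index = build_index(r for regions in exclusion_sets for r in regions)
    return [p for p in predictions if not overlaps(index, *p)]
```

Merging the exclusion regions first means each prediction needs only one binary search, which matters when the exclusion sets are as large as a genome-wide RepeatMasker track.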

Individual enhancers from Ross Hardison's list.


Links to the browser (WARNING: following them will erase all your existing tracks):

  • HS3 (chr11:5261457-5263745)
  • HS2_pos (chr11:5257370-5259665)
  • HS1 (chr11:5252469-5255093)
  • HBG1_3'enh (chr11:5223940-5226690)
  • HBB_3'enh (chr11:5201451-5203716)
  • HPFH1enh_HS (chr11:5128075-5131123)
  • HPFH6enh (chr11:5148328-5151029)
  • SCL+19enhancer (chr1:47449158-47451515)
  • Prh+1ErythroidEnhancer (chr10:94439502-94441735)
  • Fli-enhancer (chr11:128079688-128081935)
  • EKLF_Enh (chr19:12858390-12860848)
  • Gata4Enhancer (chr8:11554916-11557223)
  • Gata1Enh (chrX:48525230-48527500)
  • Gata1In1enh (chrX:48531686-48534426)
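The coordinates above look like genome browser positions, which are 1-based and include both endpoints, whereas BED files are 0-based and half-open. Assuming that reading, converting Ross Hardison's list into BED records only needs the start shifted by one. `position_to_bed` is a hypothetical helper; the genome assembly these coordinates refer to is not stated here, so they should not be mixed with coordinates from another build.

```python
import re
from typing import Tuple

# Matches positions such as "chr11:5261457-5263745"; thousands separators
# (e.g. "48,525,230") are tolerated.
POSITION = re.compile(r"^(chr\w+):([\d,]+)-([\d,]+)$")


def position_to_bed(position: str, name: str) -> Tuple[str, int, int, str]:
    """Convert a 1-based, fully closed browser position to a BED record.

    BED uses 0-based, half-open coordinates, so only the start shifts by one
    and the interval length (end - start) equals the number of bases covered.
    """
    m = POSITION.match(position.strip())
    if m is None:
        raise ValueError(f"unrecognised position: {position!r}")
    chrom = m.group(1)
    start = int(m.group(2).replace(",", ""))
    end = int(m.group(3).replace(",", ""))
    if start > end:
        raise ValueError(f"start after end in {position!r}")
    return chrom, start - 1, end, name
```

For example, `position_to_bed("chr11:5261457-5263745", "HS3")` gives `("chr11", 5261456, 5263745, "HS3")`.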

Discriminative learning picks