NOTE! This is a read-only copy of the ENCODE2 wiki.
Please go to the ENCODE3 wiki for current information.

Test summer 10

Replaced By

This page is superseded by: Integration_Vignette_E01

We've left the content below for posterity, but you probably should not be using this page anymore.

Old Analysis

In November 2010 we re-did both the discriminative and generative approaches for testing, and we should coordinate our discussion from here.

Brief Agenda:

  • Presentation from Barbara about testing
  • Presentation/Discussion from Michael on generative picks
  • Presentation/Discussion from Anshul on discriminative picks

Testing Outline

As promised at the meeting, here's a write-up of the experimental testing outline.

Link to the proposal put forth by the Myers/Wold Lab

We have 4 - potentially 5 - assays:

  • Transient enhancer assay (K562) (HA)
  • Transient promoter assay (K562) (HA)
  • Transient silencer assay (K562) (NHGRI)
  • Medaka Fish assays (Uni Heidelberg)

and then hopefully a number of these will also go into Len Pennacchio's pipeline:

  • Mouse Embryo Enhancers (LBL)

We have 4 biological classes of elements, one being "background controls", some with multiple prediction methods:

  • Enhancers (180)
    • Ab initio segmentation, consensus from two segmentations (MIT/UW). Lead contact: Michael Hoffman. 60 picks, split into 2 classes
    • Discriminative trainers. Lead contact: Zhiping Weng (with M. Gerstein and Anshul definitely). 60 picks, from consensus between methods
    • Biologist-led picks. Lead contact: Ross Hardison. 60 picks
  • Insulator/Silencers
    • Ab initio segmentation, consensus from two segmentations (MIT/UW). Lead contact: Michael Hoffman. 60 picks
  • 3' end state: a unique state coming out of the segmentation
    • Ab initio segmentation, consensus from two segmentations (MIT/UW). Lead contact: Michael Hoffman. 60 picks
  • Background set
    • Random picks from the "Encode visible" genome. Lead contact: Michael Hoffman. 60 picks

Background set.

After a surprisingly long discussion we settled on choosing a random set of elements across the "signal-detectable" genome, meaning the part of the genome which ENCODE can see. We discussed other options extensively, such as excluding "well known" annotation, but in the end came back to random picks from the "encodable" space of the genome.

Note: these are deliberately not negative controls for the assays. We think all the assays already have appropriate negative controls. What we want from this set is a sense of what genome background looks like in each assay.

In each case the picks should be drawn from a well-defined set, and that set should be frozen for future reference.

In further conversations, it was suggested that each set of 60 picks be expanded 5-fold, drawn by the same procedure, so that if we have the appetite/desire to go "deeper" into any set then we have it pre-computed. This is almost like having the set for future reference, but it makes sure the picks are ready to go straight away.
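The drawing procedure described above (random picks from a frozen set, expanded 5-fold so that extra picks are pre-computed) could be sketched roughly as follows. This is only an illustration and not the procedure actually used: the function name `draw_picks`, the fixed seed, and the reading of "5-fold" as 300 picks in total (the first 60 being the primary set) are all assumptions.

```python
import random


def draw_picks(frozen_elements, n_primary=60, fold=5, seed=0):
    """Draw n_primary * fold random picks from a frozen candidate set.

    Hypothetical helper: returns (primary, reserve), where `primary` holds
    the first n_primary picks and `reserve` holds the remaining
    pre-computed picks drawn in the same pass.
    """
    total = n_primary * fold
    if len(frozen_elements) < total:
        raise ValueError("candidate set is smaller than the requested number of picks")
    # Sort first so the draw depends only on the set contents and the seed,
    # not on the order in which the candidates were listed.
    candidates = sorted(frozen_elements)
    rng = random.Random(seed)  # fixed seed makes the draw reproducible
    picks = rng.sample(candidates, total)  # sampling without replacement
    return picks[:n_primary], picks[n_primary:]


# Toy candidate set of (chrom, start, end) tuples, purely illustrative.
candidates = [("chr1", i * 1000, i * 1000 + 500) for i in range(1000)]
primary, reserve = draw_picks(candidates)
```

Because the candidate set is frozen and the seed is fixed, re-running the draw yields the same primary and reserve picks, which is one way to keep the set available for future reference.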

Note that the segmentation forms a good part of the picks. Michael will generate the 240 picks by the middle of next week (Wednesday 28th July). The 60 discriminative enhancer picks will be ready by Thursday 19th August. The biologist-led enhancer picks will be ready by 16th September.

The proposal is to have a centralised cloning process, at least for the HA and NHGRI efforts, and we are now getting quotes for the best way to do this.

Segmentation picks

Number of intersections:

intersection.ctcf-a.bed.gz 9969
intersection.ctcf-b.bed.gz 23174
intersection.enh-a.bed.gz 21288
intersection.enh-b.bed.gz 38740 44166

Number of predictions after filtering for overlap with RepeatMasker regions, promoters, and exons:

predictions.ctcf-a 1358
predictions.ctcf-b 3522
predictions.enh-a 2776
predictions.enh-b 5405 4294
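The filtering step behind these counts can be illustrated with a minimal sketch. The page does not say which tool or overlap criterion was used, so this assumes that a prediction is dropped if it overlaps an excluded region (RepeatMasker region, promoter, or exon) by at least one base, with BED-style half-open coordinates. The names `overlaps` and `filter_predictions` and the toy intervals are hypothetical.

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def filter_predictions(predictions, excluded):
    """Keep predictions that overlap none of the excluded regions.

    Both arguments are iterables of (chrom, start, end) tuples.
    Assumption: any overlap, however small, removes the prediction.
    """
    # Group excluded regions by chromosome to avoid cross-chromosome comparisons.
    by_chrom = defaultdict(list)
    for chrom, start, end in excluded:
        by_chrom[chrom].append((start, end))

    kept = []
    for chrom, start, end in predictions:
        regions = by_chrom.get(chrom, ())
        if not any(overlaps(start, end, s, e) for s, e in regions):
            kept.append((chrom, start, end))
    return kept


# Toy data, purely illustrative.
predictions = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
excluded = [("chr1", 150, 250), ("chr2", 200, 300)]
kept = filter_predictions(predictions, excluded)
# kept == [("chr1", 500, 600), ("chr2", 100, 200)]
```

The linear scan per prediction is fine for a sketch; for genome-scale inputs a sorted sweep or an interval tree would be the more usual choice.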

Individual enhancers from Ross Hardison's list.

All screenshots

Links to browser: WARNING: will erase all your existing tracks.

Discriminative learning picks