NOTE! This is a read-only copy of the ENCODE2 wiki.
Please go to the ENCODE3 wiki for current information.

Experimental Validation Pipeline


Workflow

Figure: ExpValid figure1.png (source: GRCP046)

People

Tim Hubbard th@sanger.ac.uk Sanger
Jennifer Harrow jla1@sanger.ac.uk Sanger
Sarah Grubb sg10@sanger.ac.uk Sanger
Electra Tapanari et3@sanger.ac.uk Sanger
Jose M. Gonzales jmg@sanger.ac.uk Sanger
Alexandre Reymond alexander.reymond@unil.ch University of Lausanne
Cedric Howald cedric.howald@unil.ch University of Lausanne
Roderic Guigo roderic.guigo@crg.eu CRG
Andrea Tanzer andrea.tanzer@crg.eu CRG
Rachel Harte hartera@soe.ucsc.edu UCSC
Julien Lagarde julien.lagarde@crg.eu CRG
Adam Frankish af2@sanger.ac.uk Sanger


Data

  • gencode annotation
  • RGASP (human and C. elegans)
  • gene models
  • pseudogenes

Methods

  1. primer design pipeline
    • description of method (+diagram)
    • design modes:
      • mono
      • multi-exonic
      • multi-exonic spanning junction
      • pseudogenes
  2. RT-PCR
    • protocol for multi-exonic vs. mono-exonic targets (to avoid amplification of genomic DNA)
    • Libraries & sequencing
  3. Detection of amplicons: mapping the reads
  4. Cross-validation with PE datasets
  5. Detection of novel exons and isoforms
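
The design modes listed under step 1 distinguish mono-exonic from multi-exonic targets. The following is a minimal sketch, not the pipeline's actual code, of how transcripts could be sorted into these two modes from GTF exon records and how the introns that a junction-spanning primer pair would have to cross could be listed. It assumes standard nine-column GTF lines carrying a transcript_id attribute; the function names exons_by_transcript, design_mode and introns are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: each exon record carries a transcript_id "..." attribute.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def exons_by_transcript(gtf_lines):
    """Collect sorted (start, end) exon intervals per transcript."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon features are of interest here
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            exons[match.group(1)].append((int(fields[3]), int(fields[4])))
    return {tid: sorted(intervals) for tid, intervals in exons.items()}


def design_mode(exon_intervals):
    """Return 'mono' for single-exon models, 'multi-exonic' otherwise."""
    return "mono" if len(exon_intervals) == 1 else "multi-exonic"


def introns(exon_intervals):
    """Return intron intervals (1-based, inclusive) between sorted exons."""
    return [
        (left_end + 1, right_start - 1)
        for (_, left_end), (right_start, _) in zip(exon_intervals, exon_intervals[1:])
    ]
```

A mono-exonic model yields an empty intron list, so the amplicon from cDNA and from genomic DNA would be identical; this is why the RT-PCR step treats the two classes differently.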

Overview Datasets

Batch Datasets Primer Notes
III RGASP
human
C. elegans
link to primers notes
IV Homo sapiens
GRCh37/hg19
gencode v4
primer: batchIV set
released: Jul 22, 2010
V Homo sapiens
GRCh37/hg19
gencode v6 + David, John, Laura
primer: batchV.p3primer.uniq.NO.batchIV.ordered.gtf.gz, DgonzalesMono100803.tgz, JohnPrimerMono100803.tgz, Laura_order_110215.tgz
input: gen6_pre.known_junction.uniq.HAVNA.NOTknown.ex.gtf.gz
released: Dec 22, 2010 (gen6), Feb 15, 2011 (David, John, Laura)
VI Homo sapiens
GRCh37/hg19
HBM de novo models, gencode v6
link to primers
released: May 27, 2011
notes
VII Homo sapiens
GRCh37/hg19
gencode v8
input: gen8.exon.gtf.gz

primer: Gen8Set.tgz
released: Jul 7, 2011

selected models for validation: NOT transcript_status "KNOWN", NOT level 1, NOT transcript_type *pseudo*, ONLY source HAVANA, ONLY unique sj, NOT previously validated sj

primer design stats: gen8.p3primer.stats.txt

IX Homo sapiens
GRCh37/hg19
Cufflinks v3
input: cshl_caltech_Ap_incl_in_longv7_intergenic.exon.multiexonic.gtf.gz

primer: cuffv3_Ap_igas.multiexonic.p3primer.120117.fa.gz
cuffv3_Ap_igas.multiexonic.p3primer.120117.gtf.gz
released: Jan 17, 2012

Cufflinks v3 models, RNA fraction: longPolyAplus, Caltech+CSHL, 15 cell lines, intergenic/anti-sense relative to gen7

selected models for validation: IDR ≤ 0.1, intergenic relative to gen10, both unique and shared junctions
primer design mode: mus
primer design stats: cuffv3_Ap_igas.multiexonic.p3primer.stats.txt
presentation: 17th January 2012 Primer design for cufflinks models (AT)

X Homo sapiens
GRCh37/hg19
gencode11
input: gen11.exon.gtf.gz

primer: Gen11Set.tgz
released: March 30, 2012

selected models for validation: NOT transcript_status "KNOWN", NOT level 1, NOT transcript_type *pseudo*, ONLY source HAVANA, ONLY unique sj, NOT previously validated sj

primer design stats: gen11.p3primer.stats.txt

XI Homo sapiens
GRCh37/hg19
gencode13
input: gen13.exon.gtf.gz

primer: Gen13Set.tgz
released: August 31, 2012

selected models for validation: NOT transcript_status "KNOWN" or "NULL", NOT level 1, NOT transcript_type *pseudo*, ONLY source HAVANA, ONLY unique sj, NOT previously validated sj

primer design stats: gen13.p3primer.stats.txt

XI (2nd part) Homo sapiens
GRCh37/hg19
gencode14PRE
input: gen14.exon.gtf.gz

primer: Gen14Set.tgz
released: October 10, 2012

selected models for validation: NOT transcript_status "KNOWN", NOT level 1, NOT transcript_type *pseudo*, ONLY source HAVANA, ONLY unique sj, NOT previously validated sj

primer design stats: gen14.p3primer.stats.txt
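
The selection criteria repeated for the GENCODE batches above (NOT transcript_status "KNOWN", NOT level 1, NOT transcript_type *pseudo*, ONLY source HAVANA) can be illustrated as a simple line filter. This is a sketch under stated assumptions rather than the script that was actually used: it assumes the source is read from GTF column 2 and that transcript_status, level and transcript_type appear as attributes in column 9. The helper names parse_attributes and is_candidate are hypothetical. The two splice-junction criteria (unique sj, not previously validated sj) need additional inputs and are deliberately left out.

```python
def parse_attributes(attribute_field):
    """Parse a GTF attribute column into a dict (later duplicates overwrite)."""
    attributes = {}
    for part in attribute_field.strip().split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition(" ")
        attributes[key] = value.strip().strip('"')
    return attributes


def is_candidate(gtf_line):
    """Apply the attribute-based selection criteria to one GTF line.

    Assumption: junction uniqueness and previous validation are checked
    elsewhere; only the four attribute/source criteria are tested here.
    """
    fields = gtf_line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return False
    attributes = parse_attributes(fields[8])
    return (
        fields[1] == "HAVANA"                                   # ONLY source HAVANA
        and attributes.get("transcript_status") != "KNOWN"      # NOT status KNOWN
        and attributes.get("level") != "1"                      # NOT level 1
        and "pseudo" not in attributes.get("transcript_type", "")  # NOT *pseudo*
    )
```

For batch XI, which also excludes transcript_status "NULL", the status test would be extended to reject both values.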

Results and Data

Detection of novel exons

Read coverage of newExons withBatchVII canonical using ENCODE long contigs (2012-02-09): NovelexonReadcoverageENCODE_withBatchVII_canonical.tgz AT

Read coverage of newExons withBatchVII canonical using HBM PE50 (2012-02-22): NovelexonReadcoverageHBM_withBatchVII_canonical.tgz AT

Read coverage of newExons withBatchVII no canonical filter using ENCODE long contigs (2012-02-24): NovelexonReadcoverageENCODE_withBatchVII_NOcanonicalfilter.tgz AT

Read coverage of newExons withBatchVII no canonical filter using HBM PE50 (2012-02-24): NovelexonReadcoverageHBM_withBatchVII_NOcanonicalfilter.tgz AT

Subprojects

454-Race

Presentations

Experimental_Validation_Conference_Call_Presentations_2012

Experimental_Validation_Conference_Call_Presentations_2011

Experimental_Validation_Conference_Call_Presentations_2010

Publications

Encode Companions:

GRCP046 The combination of RT-PCR-seq and RNA-seq is essential to catalog all genic elements encoded in the human genome

GRCP030 Genome wide annotation of pseudogenes and analysis of their transcription, functional genomics and evolutionary constraints

GRCP001 GENCODE: The reference human genome annotation for the ENCODE project

Links

Gene_annotation_project_(Hubbard)

Gencode annotation website

Gencode UCSC Genome Browser