NOTE! This is a read-only copy of the ENCODE2 wiki.
Please go to the ENCODE3 wiki for current information.



CONTACT INFO: Piero Carninci, Timo Lassmann


All RIKEN datasets, with the exception of K562 Nucleoplasm, Nucleolus and Chromatin (?? double check with Piero).

PROTOCOL DETAILS: (expand) This protocol generates stranded (?) reads. It is predominantly performed on poly-A- RNA (it was also performed on K562 poly-A+ RNA).

LIBRARY CONSTRUCTION: To create the tag, a linker was attached to the 5' end of reverse-transcribed cDNA, derived from poly-A+ or poly-A- RNA, that had been selected by cap trapping (Carninci et al. 1996). The first 27 bp of the cDNA were cleaved off using class II restriction enzymes, and a second linker was then attached to the 3' end of the resulting 27 bp fragment. The resulting 96 bp construct was PCR-amplified and sequenced.

DAY 1:

1. First strand synthesis
2. proK treatment
3. GFX-CTAB purification
4. Biotinylation

DAY 2:

1. RNaseI treatment
2. Cap trapping by streptavidin-Sepharose beads
3. Release cDNA from beads
4. Single strand linker ligation

DAY 3:

1. GFX-CTAB purification
2. 2nd strand synthesis
3. proK treatment
4. GFX-CTAB purification
5. EcoP15I digestion
6. 3' Linker ligation

DAY 4:

1. Purification from the linker
2. PCR cycle check
3. PCR
4. Exonuclease I treatment
5. MinElute column


SEQUENCING: What modifications have you had to make to the referenced protocols to accommodate Illumina sequencing? How long are your reads?

The CAGE tags are sequenced from the 5' ends of cap trapped cDNAs produced using RIKEN CAGE technology (Kodzius et al. 2006; Valen et al. 2009).


All tags were mapped to the human genome (hg18) with the program nexalign (T. Lassmann, manuscript in preparation). Do you restrict the number of loci that a read can map to, and do you allow indels?


After producing 10 million or more CAGE tags that can be mapped (at least 50% of tags can be mapped, and usually >75%-80%), we check that less than 5% of the CAGE tags are derived from ribosomal RNA. Among tags expressed at 50 to 100 tags per million, more than 70% (and up to more than 85%) map to exons or proximal regions (core promoters) of annotated genes. Ultimate validation will be done by comparing transfrags, RACE validation and ChIP-derived promoter data (several of these parameters are not yet fixed). The 5-diols identified via CAGE can be validated in PET datasets when available.
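
The thresholds above can be expressed as a small check over per-library summary counts. The following is only a minimal sketch in Python, not part of the RIKEN pipeline: the class, field and function names are hypothetical, and it assumes the rRNA fraction is measured relative to mapped tags (the text does not say which denominator is used).

```python
from dataclasses import dataclass


@dataclass
class CageLibraryStats:
    """Summary counts for one CAGE library (all field names are hypothetical)."""
    total_tags: int          # tags sequenced
    mapped_tags: int         # tags that could be mapped to the genome
    rrna_tags: int           # mapped tags derived from ribosomal RNA
    mid_expr_tags: int       # tags expressed at 50 to 100 tags per million
    mid_expr_annotated: int  # of those, tags on exons or core promoters of annotated genes


def tags_per_million(count: int, mapped_tags: int) -> float:
    """Normalize a raw tag count to tags per million mapped tags."""
    return count * 1_000_000 / mapped_tags


def _fraction(part: int, whole: int) -> float:
    """Return part/whole, or 0.0 when the denominator is zero."""
    return part / whole if whole else 0.0


def qc_report(stats: CageLibraryStats) -> dict:
    """Apply the four checks described in the text; True means the check passes."""
    return {
        # 10 million or more mappable tags
        "depth": stats.mapped_tags >= 10_000_000,
        # at least 50% of tags can be mapped
        "mapping_rate": _fraction(stats.mapped_tags, stats.total_tags) >= 0.50,
        # less than 5% rRNA (assumption: relative to mapped tags)
        "rrna": _fraction(stats.rrna_tags, stats.mapped_tags) < 0.05,
        # more than 70% of mid-expressed tags on exons or core promoters
        "annotated": _fraction(stats.mid_expr_annotated, stats.mid_expr_tags) > 0.70,
    }


if __name__ == "__main__":
    example = CageLibraryStats(
        total_tags=15_000_000,
        mapped_tags=12_000_000,
        rrna_tags=300_000,
        mid_expr_tags=1_000_000,
        mid_expr_annotated=800_000,
    )
    for check, passed in qc_report(example).items():
        print(check, "PASS" if passed else "FAIL")
```

The counts in the example are made up for illustration. Keeping each criterion as a separate entry makes it easy to see which check a library fails, and to adjust thresholds once the parameters noted above as not yet fixed are settled.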


Kodzius R, Kojima M, Nishiyori H, Nakamura M, Fukuda S, Tagami M, Sasaki D, Imamura K, Kai C, Harbers M, et al. CAGE: cap analysis of gene expression. Nat Methods. 2006 March 1; 3(3):211-222.

Valen E, Pascarella G, Chalk A, Maeda N, Kojima M, Kawazu C, Murata M, Nishiyori H, Lazarevic D, Motti D, et al. Genome-wide detection and analysis of hippocampus core promoters using DeepCAGE. Genome Res. 2009 February; 19(2):255-265.

Carninci P, Kvam C, Kitamura A, Ohsumi T, Okazaki Y, Itoh M, Kamiya M, Shibata K, Sasaki N, Izawa M, et al. High-efficiency full-length cDNA cloning by biotinylated CAP trapper. Genomics. 1996 November 1; 37(3):327-336.