This page contains information about the cloning-based RNAPET (18/16) [diPET] that the Genome Institute of Singapore (GIS) produced for ENCODE.
11 long poly A+ RNAPET (18/16) [diPET] libraries were constructed in total:
1) Prostate Tissue
2) K562 Cytosol
3) K562 Nucleus
4) K562 Polysome
5) K562 Nucleolus
6) K562 Chromatin
7) K562 Nuclearplasma
8) GM12878 Nucleus
9) GM12878 Cytosol
10) HepG2 Cytosol
11) HepG2 Nucleus
Library construction and RNAPET (18/16) [diPET] production:
Approximately 10 micrograms of poly(A) mRNA was used to construct the cloning-based RNA-PET (previously referred to as “GIS-PET”), following the procedures described in Nature Methods 2005. After full-length cDNAs (flcDNAs) were obtained, two bacterial cloning steps were performed, and the type II restriction enzyme MmeI was used to extract 20-20 bp tags from the 5’ and 3’ ends of each flcDNA. For the ENCODE datasets submitted here, the 20-20 bp ditags (PETs) were modified and concatenated into a diPET structure (3’-16bp18bp-5’--Linker Sequence--5’-18bp16bp-3’) and paired-end (PE) sequenced by Solexa to generate 2 x 36 bp reads. Each 36 bp read is an independent PET containing both the 5’ and 3’ ends extracted from a given transcript. The diPET structure was designed to fit the 2 x 36 bp PE sequencing format. The complete diPET template structure is illustrated below:
[Solexa Adaptor--3’-16bp18bp-5’--Linker Sequence--5’-18bp16bp-3’--Solexa Adaptor].
RNAPET (18/16) [diPET] sequencing and mapping:
After Solexa paired-end sequencing, redundant reads are filtered out and only the unique reads are kept for further analysis. Within each 36 bp read, bases 1-19 are expected to come from the 5’ end of a given transcript, and bases 18-34 from the 3’ end of that transcript. The majority of PETs (~80-90%), defined as concordant PETs, map to the same chromosome, on the same strand, and in the same direction as known transcripts or splice variants. A small portion of PETs do not map in this way. These are referred to as discordant PETs and map in the wrong orientation (e.g., the 3’ end maps before the 5’ end), on different strands, or on different chromosomes (e.g., one end maps to chromosome 3 and the other to chromosome 8). Such PETs may indicate transcription variations caused by genome rearrangements such as deletion, inversion, tandem duplication, or translocation, or by trans-splicing. The mapping strategy is illustrated below.
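Separately from the diagram, the read filtering and the concordant/discordant distinction described above can be sketched in Python. This is only a minimal illustration, not the GIS pipeline: the TagHit record, the function names, and the use of a single start coordinate per tag are all assumptions made for the example. It checks only the criteria named in the paragraph above (chromosome, strand, orientation).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagHit:
    """One genomic mapping of a 5' or 3' tag (hypothetical record layout)."""
    chrom: str
    strand: str  # "+" or "-"
    pos: int     # coordinate of the tag start on the reference


def unique_reads(reads):
    """Drop redundant (identical) reads, keeping first-seen order."""
    return list(dict.fromkeys(reads))


def classify_pet(five, three):
    """Label a PET as concordant, or name the reason it is discordant."""
    if five.chrom != three.chrom:
        return "discordant: different chromosomes"
    if five.strand != three.strand:
        return "discordant: different strands"
    # On the + strand the 5' tag should have the smaller coordinate;
    # on the - strand it should have the larger one.
    if five.strand == "+":
        in_order = five.pos <= three.pos
    else:
        in_order = five.pos >= three.pos
    if not in_order:
        return "discordant: wrong orientation"
    return "concordant"


if __name__ == "__main__":
    print(unique_reads(["ACGT", "ACGT", "TTTT"]))
    print(classify_pet(TagHit("chr3", "+", 100), TagHit("chr8", "+", 5000)))
```

In this sketch, a PET with one end on chromosome 3 and the other on chromosome 8 is reported as discordant because the chromosome check comes first. A real pipeline would also need to handle tags that map to several locations.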
As the diagram above shows, although the exact separation point between the 5’ and 3’ tags is not known, each read should contain a 5’-end tag and a 3’-end tag within bases 1-34. The 2 bp in the middle of the read (positions 18-19) could belong to either the 5’ or the 3’ end. Once the 5’- and 3’-end tags are demarcated, all sub-sequences of at least 16 bp from the 5’ tag and at least 14 bp from the 3’ tag are enumerated and mapped against the reference genome. If a 5’ mapping and a 3’ mapping fall on the same chromosome, on the same strand, in the correct 5’ -> 3’ orientation, and less than 1 million bp apart, they form a pairing (PET), which is considered a possible transcript. Tags with no such pairing are defined as PET-0, and tags with a unique pairing are defined as PET-1. Similarly, PET-2 denotes tags with two such pairings, and so on for PET-3, PET-4, etc. The PETs in the PET-1 group are used for further PET clustering analysis. Raw data have been submitted to the DCC, together with concordant PETs for visualization.
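The tag enumeration and PET-n counting described above can be illustrated with a short Python sketch. It rests on several assumptions that the text does not confirm: that "sub-sequences" means contiguous substrings, that the 5’ region is bases 1-19 and the 3’ region is bases 18-34 (so the ambiguous bases 18-19 appear in both), and that the genomic hits passed to pet_class have already been produced by some aligner. The Hit record and all function names are hypothetical.

```python
from collections import namedtuple

# Hypothetical mapping record: chromosome, strand ("+"/"-"), start coordinate.
Hit = namedtuple("Hit", "chrom strand pos")

MAX_SPAN = 1_000_000  # pairing distance limit from the text (< 1 million bp)


def substrings(seq, min_len):
    """All contiguous substrings of seq that are at least min_len long."""
    return [seq[i:j]
            for i in range(len(seq))
            for j in range(i + min_len, len(seq) + 1)]


def candidate_tags(read, min5=16, min3=14):
    """Enumerate candidate 5' and 3' tags from one 36 bp read.

    Assumption: bases 1-19 form the 5' region and bases 18-34 the 3' region.
    """
    region5, region3 = read[:19], read[17:34]
    return substrings(region5, min5), substrings(region3, min3)


def is_pairing(h5, h3):
    """Same chromosome, same strand, 5' -> 3' order, less than 1 Mb apart."""
    if h5.chrom != h3.chrom or h5.strand != h3.strand:
        return False
    span = h3.pos - h5.pos if h5.strand == "+" else h5.pos - h3.pos
    return 0 <= span < MAX_SPAN


def pet_class(hits5, hits3):
    """Return 'PET-n', where n is the number of valid 5'/3' pairings."""
    n = sum(is_pairing(a, b) for a in hits5 for b in hits3)
    return f"PET-{n}"


if __name__ == "__main__":
    five, three = candidate_tags("ACGT" * 9)
    print(len(five), len(three))
    print(pet_class([Hit("chr1", "+", 1000)], [Hit("chr1", "+", 50000)]))
```

Under these assumptions, each read yields a small, fixed number of candidate tags per end, and only reads classified as PET-1 would be passed on to clustering.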