NOTE! This is a read-only copy of the ENCODE2 wiki.
Please go to the ENCODE3 wiki for current information.

Files for ENCODE Analysis

This page contains links to files for the analysis group. It is organized alphabetically by contributor. For annotations of "known things" useful for downstream analysis, consider linking them on the ENCODE Analysis Annotations page.

Short files (up to 33 MB) can be uploaded directly into the wiki via the "upload" link in the toolbox to the left. Larger files can be linked via an http: or ftp: URL. You can use HTTP or FTP servers on your own site, or use the encodeftp.cse.ucsc.edu server.

The encodeftp.cse.ucsc.edu server is anonymously readable. For write access, use the shared username and password for this wiki. (Contact kate@soe.ucsc.edu or kent@soe.ucsc.edu if you've forgotten this.) The first time you log on, please make a directory with the same name as your user name in the newest /freeze/<date> directory and in the /user directory. You can use the ftp mkdir command for this. The FTP root directory is mounted as /hive/groups/encode/dcc/analysis/ftp for people with UCSC Unix accounts.
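
The first-time setup can also be scripted. The following is a minimal sketch using Python's standard ftplib module; the host name and the /freeze/<date> and /user layout come from the paragraph above, while the function names, the freeze date string, and the credentials are placeholders you would need to supply yourself.

```python
from ftplib import FTP, error_perm

HOST = "encodeftp.cse.ucsc.edu"  # server named in the text above


def user_dirs(username, freeze_date):
    """Return the two per-user directories the text asks for.

    freeze_date is whatever the newest /freeze/<date> directory is called;
    this sketch does not try to discover it.
    """
    return [f"/freeze/{freeze_date}/{username}", f"/user/{username}"]


def make_user_dirs(ftp, username, freeze_date):
    """Create each directory and return the paths that were created.

    A permanent FTP error (for example, the directory already exists)
    is skipped rather than treated as fatal.
    """
    created = []
    for path in user_dirs(username, freeze_date):
        try:
            ftp.mkd(path)
            created.append(path)
        except error_perm:
            pass
    return created


def main(username, freeze_date, login, password):
    """Log in with the shared write credentials and create the directories."""
    with FTP(HOST) as ftp:
        ftp.login(login, password)
        return make_user_dirs(ftp, username, freeze_date)
```

Calling main("yourname", "<date>", login, password) with the shared wiki credentials would attempt both mkdir operations in one session.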

Please document the files you put in this area, at least if you want other people (including yourself a few months later) to be able to understand them. To start the documentation, log into the wiki if you haven't already and select edit from the top of this page. Then copy the "template" section, paste it where your name goes alphabetically, and change "template" to your name. After this, put the file names with hyperlinks on the left side of the table, and a short description (including version information if necessary) on the right side of the table. See the files under Jim Kent's sections for examples of how to make hyperlinks of various sorts on the wiki. (The autoSql.doc is an example of a link to a file that lives in the wiki itself, while barski.zip is an example of one on the encodeftp.cse.ucsc.edu server.)

Note that all files posted here should be considered "on the net." If you need to share data that you want to keep private from the world wide web as a whole, please contact the DCC.

Analysis Groups

RNA

File Description
http://woldlab.caltech.edu/~alim/ENCODERNA/ Wold Lab RNA-Seq in GM12878,K562 (README)
ftp://genome.crg.es/pub/Encode/data_analysis/TSS/Gencodev3c_TSS.gff TSSs from GENCODE annotation v3c with expression values from 11 CAGE and 7 DiTAG experiments (TSSs are made from transcripts of the following biotypes: protein_coding, ambiguous_orf, non_coding, antisense, processed_transcript, retained_intron) (a README file is available at ftp://genome.crg.es/pub/Encode/data_analysis/TSS/Gencodev3c_TSS.txt)
http://genome.crg.es/~sdjebali/research/data/gencode_exons_rpkm_4_exp.gff GENCODE exons (October 2008) with expression values from 4 RNAseq and 12 tiling array experiments (a README file is available at http://genome.crg.es/~sdjebali/research/data/gencode_exons_rpkm_4_exp.txt)
http://genome.crg.es/~sdjebali/research/data/gencode_exons_rpkm_4_exp_proctr_pcg_gene.gff Same for GENCODE exons of processed transcripts and protein coding genes

Elements

File Description


Angelika Merkel (RNA group, CRG)

File Description
ftp://genome.crg.es/pub/Encode/Gencode_partitions/ GENCODE partitions for hg18 (gencode.v3c.annotation.NCBI36) and hg19 (gencode.v3c.annotation.GRCh37) including tRNA
ftp://genome.crg.es/pub/Encode/data_analysis/Expression RPKM values per gene, transcript and exon calculated with the flux capacitor and RNAseq pipeline at CRG; mapping was done with GEM (reads trimmed to 50nt with two mismatches)


Ewan Birney

File Description
ftp://ftp.ebi.ac.uk/pub/software/ensembl/encode/conservation/ 29-way GERP score - per base conservation score
http://www.ebi.ac.uk/~swilder/.Encode/Cellinetotsignal.txt initial vectorisation of the element data
http://www.ebi.ac.uk/~swilder/.Encode/ensembl_tss.bed Ensembl protein-coding genes transcription start sites


Alex Dobin (RNA group, Gingeras lab)

File Description
ftp://encode:human@ftp2.cshl.edu/Sequencing/Hs_strandedPE_GMK562/RPKM_GENCODEv3c/K562_GM12878_TotalRNA.zip RPKM for GENCODE v3c exons, transcripts and genes of K562/GM12878 total RNA stranded sequencing from Gingeras lab - Fall 2009

Only stranded, uniquely mapped reads with 0-2 mismatches, trimmed to 50 bases, were used. Format for transcripts and genes: ID <tab> relative coverage <tab> RPKM. For each gene, the transcript with the highest coverage was chosen. Format for exons: chr <tab> start <tab> end <tab> strand <tab> relative coverage <tab> RPKM.

ftp://encode:human@ftp2.cshl.edu/Sequencing/Hs_strandedPE_GMK562/RPKM/ RPKM for GENCODE genes of K562/GM12878 total RNA stranded sequencing from Gingeras lab - Fall 2009

Format: GENCODE 3b Gene ID <tab> relative coverage <tab> RPKM. For each gene, the transcript with the highest coverage was chosen; only stranded, uniquely mapped reads with 0-2 mismatches, trimmed to 50 bases, were used.

ftp://encode:human@ftp2.cshl.edu/Sequencing/Hs_strandedPE_GMK562/tagAlign/ Alignments to hg18 (tagAlign) for K562/GM12878 total RNA stranded sequencing from Gingeras lab - Fall 2009
ftp://encode:human@ftp2.cshl.edu/Sequencing/Hs_strandedPE_GMK562/fastqRaw/ Raw fastq files for K562/GM12878 total RNA stranded sequencing from Gingeras lab - Fall 2009

These reads contain strand-marking tags, so they cannot be mapped directly to the genome.

ftp://encode:human@ftp2.cshl.edu/Sequencing/Hs_strandedPE_GMK562/fastqTrim/ Trimmed fastq files for K562/GM12878 total RNA stranded sequencing from Gingeras lab - Fall 2009

The strand-marking tags were trimmed off; the strand information is indicated in the read names as strand0 (strand unknown), strand1 (same strand as original RNA), or strand2 (opposite strand to original RNA).
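
The column layouts and the read-name convention described in this section can be sketched as a few small parsers. This is an illustration only, not the Gingeras lab's code: the function and field names are made up here, and the exact position of the strand token within a read name is an assumption (the sketch simply searches for the substring strand0, strand1, or strand2).

```python
import re
from typing import NamedTuple


class GeneRecord(NamedTuple):
    name: str
    relative_coverage: float
    rpkm: float


class ExonRecord(NamedTuple):
    chrom: str
    start: int
    end: int
    strand: str
    relative_coverage: float
    rpkm: float


def parse_gene_line(line):
    """Parse 'ID <tab> relative coverage <tab> RPKM' (transcripts and genes)."""
    name, cov, rpkm = line.rstrip("\n").split("\t")
    return GeneRecord(name, float(cov), float(rpkm))


def parse_exon_line(line):
    """Parse 'chr <tab> start <tab> end <tab> strand <tab> relative coverage <tab> RPKM'."""
    chrom, start, end, strand, cov, rpkm = line.rstrip("\n").split("\t")
    return ExonRecord(chrom, int(start), int(end), strand, float(cov), float(rpkm))


# Meaning of the strand codes, as listed in the text above.
STRAND_MEANING = {
    0: "strand unknown",
    1: "same strand as original RNA",
    2: "opposite strand to original RNA",
}

# Assumption: the token appears somewhere in the read name as 'strand<digit>'.
_STRAND_TOKEN = re.compile(r"strand([012])")


def read_strand(read_name):
    """Return 0, 1 or 2 for a trimmed read name, or None if no token is found."""
    match = _STRAND_TOKEN.search(read_name)
    return int(match.group(1)) if match else None
```

The strand column of the exon format is kept as a plain string because the text does not say how it is encoded.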

Felix Schlesinger (RNA group, Gingeras lab)

File Description
ftp://encode:human@ftp2.cshl.edu/rnaclusters/rnaclus.prelim.k562.bed.gz Clusters of RNAseq reads from Gingeras Lab Total RNA Sequencing of K562; only stranded uniquely mapped reads with 0-2MM trimmed to 50b were used - Preliminary - 11/09
ftp://encode:human@ftp2.cshl.edu/rnaclusters/rnaclus.prelim.gm.bed.gz Clusters of RNAseq reads from Gingeras Lab Total RNA Sequencing of GM; only stranded uniquely mapped reads with 0-2MM trimmed to 50b were used - Preliminary - 11/09

Ian Dunham

File Description

Anshul Kundaje

File Description
ftp://encode:human@encodeftp.cse.ucsc.edu/users/akundaje/rawdata/labs/Transcriptome/promoterCpG/promoterCpGRatio.gff.gz Normalized CpG Ratio for 3000 bp regions surrounding GENCODEv3c promoters. README and distribution plot of CpG content (DATE: 02/23/2010)
ftp://encode:human@encodeftp.cse.ucsc.edu/users/akundaje/SPP/consistency/results/consistentPointPeaks/ Point Peaks (fixed width narrowPeak) calls using the SPP peak caller on pooled replicates for various transcription factors, cell lines, labs. Replicates were pooled, peaks were called and consistency analysis was used to prune the peaks to a consistency FDR of 0.05. The method of 'passing structure' was used for consistency analysis. (DATE: 07/20/2009)
ftp://encode:human@encodeftp.cse.ucsc.edu/users/akundaje/SPP/consistency/results/consistentRegionPeaks/ Regional Peaks (Point peaks with regions of significant enrichment around them) calls using the SPP peak caller on pooled replicates for various transcription factors, cell lines, labs. Replicates were pooled, peaks were called and consistency analysis was used to prune the peaks to a consistency FDR of 0.05. The method of 'passing structure' was used for consistency analysis. (DATE: 07/17/2009)

Guoliang (ChIA-PET: Yijun Ruan's group)

File Description
http://chiapet.gis.a-star.edu.sg/chiapet/downloads/encode-datasets ChIA-PET (Chromatin Interaction Analysis with Paired-End-Tag sequencing) data from five cell lines: MCF-7, K562, HeLa, HCT-116 and NB4. (A method description is available at http://chiapet.gis.a-star.edu.sg/chiapet/protocols and a file description is available at http://chiapet.gis.a-star.edu.sg/chiapet/downloads/encode-datasets/Description%20of%20ChIA-PET%20shared%20libraries.docx)


Jason Ernst

File Description
http://www.mit.edu/~jernst/NUCPOS/K562_nucpos.bed K562 positioned modified nucleosomes based on the NPS software (http://liulab.dfci.harvard.edu/NPS/) applied to all Broad K562 histone modification data. File is bed format except with header lines for loading into the browser.
http://www.mit.edu/~jernst/NUCPOS/GM12878_nucpos.bed GM12878 positioned modified nucleosomes based on the NPS software (http://liulab.dfci.harvard.edu/NPS/) applied to all Broad GM12878 histone modification data. File is bed format except with header lines for loading into the browser.
http://compbio.csail.mit.edu/jernst/CONSORTIUM25STATE/maxposterior_25state.bed This file contains a UCSC custom track header and data in bed format of the maximum-a-posteriori assignments of 200bp intervals of the genome to states based on the preliminary 25-state HMM model integrating 33 ENCODE consortium datasets presented during the March 2009 meeting (slides 18-25 http://encodewiki.ucsc.edu/EncodeDCC/images/5/57/Ernst%2C_J_Integrative_Analysis_ENCODE.ppt). (Note the first and last blocks of length 0 and 1 respectively on each line are present to control the ordering of states in the browser but should be ignored otherwise.)
http://compbio.csail.mit.edu/jernst/CONSORTIUM25STATE/viterbi_25state.bed This file contains a UCSC custom track header and data in bed format of the viterbi (most likely) path of states along the genome based on the preliminary 25-state HMM model integrating 33 ENCODE consortium datasets presented during the March 2009 meeting. (See note above about first and last blocks.)
http://compbio.csail.mit.edu/jernst/CONSORTIUM25STATE/posterior0.wig.gz This file contains a UCSC custom track header and data in wig file format of the posterior probability that a location is in state 0 based on the preliminary 25-state HMM model integrating 33 ENCODE consortium datasets presented during the March 2009 meeting.
http://compbio.csail.mit.edu/jernst/CONSORTIUM25STATE/posteriorN.wig.gz Replace N with an integer between 0 and 24, and then this file contains the posterior probability that a location is in state N based on the preliminary 25-state HMM model integrating 33 ENCODE consortium datasets presented during the March 2009 meeting.
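
Two details of the 25-state files lend themselves to a short sketch: expanding the posteriorN.wig.gz pattern for N from 0 to 24, and dropping the first and last padding blocks from each line of the state assignment files. The helper names below are hypothetical, and the block handling assumes the files use the standard UCSC BED12 layout (blockSizes in column 11 and blockStarts in column 12, relative to chromStart), which the description implies but does not state outright.

```python
BASE = "http://compbio.csail.mit.edu/jernst/CONSORTIUM25STATE"
N_STATES = 25  # states are numbered 0 through 24


def posterior_urls():
    """Return one posterior wig URL per state, replacing N with 0..24."""
    return [f"{BASE}/posterior{n}.wig.gz" for n in range(N_STATES)]


def iter_data_lines(lines):
    """Skip custom track header lines, comments and blank lines."""
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        yield line


def real_blocks(bed12_line):
    """Return (start, end) genomic intervals for one BED12 line.

    The first and last blocks only control state ordering in the browser,
    so they are dropped.
    """
    fields = bed12_line.rstrip("\n").split("\t")
    chrom_start = int(fields[1])
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]
    blocks = [(chrom_start + s, chrom_start + s + size)
              for s, size in zip(starts, sizes)]
    return blocks[1:-1]
```

Applying real_blocks to each line yielded by iter_data_lines gives only the intervals actually assigned to a state.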

Ross Hardison

Curated Sets of Known Cis Regulatory Modules

File Description
CTCF_OS_validated_hg18.txt Validated DNA segments occupied by CTCF in hg18 coordinates, derived from work by Kim et al. (2007, Cell 128: 1231-1245) (PMID 17382889). They used two rounds of ChIP-chip to map CTCF-occupied DNA segments (CTCF OSs) in human IMR90 cells. As part of their quality assessment, they validated 80 (out of 84 tested) CTCF OSs by ChIP-qPCR (Supplementary Table 1). They also examined 60 CTCF OSs from the previous literature, and they confirmed binding at 36 of these (Supplementary Tables 1 and 2). We combined these validated CTCF OSs to obtain a set of 116, with coordinates in hg17. All but one (on chr3_random) lifted over to hg18. The 115 CTCF OSs in hg18 are in the file CTCF_OS_validated_hg18.txt. Submitted by Ross Hardison and Yong Cheng, Mar. 05, 2010.
GATA1osHumV3_hg18.txt Human DNA segments occupied in vivo by GATA1 in erythroid cells. This list is much shorter than the list of erythroid CRMs (next entry) because many of the latter do not bind GATA1. Version 3 (Posted by Ross Hardison; March 07, 2010) includes GATA1 OSs from Steiner et al. 2009 (PMID 19687298) and TRANSFAC; total of 58 GATA1 OSs. Coordinates are for hg18.
ErythroidCRMsHumV3_hg18.txt Erythroid cis regulatory modules (CRMs) from curation of the literature. This is close to complete for the HBB (PMID 16024817) and HBA (PMID 15998734) clusters, and several other CRMs are included. These DNA segments have been experimentally shown to affect regulation of gene expression, or are DNase hypersensitive sites, or are bound by transcription factors in vivo. The HBA annotations are from Jim Hughes and Doug Higgs. Version 3 (posted March 07, 2010) removed some from version 1 that were only characterized in mouse, and added several more from human. Coordinates are for hg18.
ErythCRMsMusLOHum.txt Erythroid CRMs determined by experimental validation of enhancer activity of predictions (regulatory potential + conserved consensus binding site for GATA1; PMID 17038566) or of ChIP-validated GATA1-occupied segments (PMID 18818370). These were determined in mouse and lifted over to human.
KnownCRMs93RP.txt Known cis regulatory modules (CRMs) in humans from curation (by Laura Elnitski), including some previously published sets; they include CRMs regulating in a wide variety of tissues, not just erythroid. They are a mix of promoters and enhancers. These are the 93 CRMs used as the positive training set for regulatory potential (PMID 12529307) (PMID 17053093).
HumanKnownCRMs_hg19.txt This is a combination of the CRMs in "ErythroidCRMsHumV3_hg18.txt" and "KnownCRMs93RP.txt" to give a set of 207 known CRMs active in a variety of tissues, although many of them are still erythroid. I also lifted them over to hg19 to facilitate analysis of the segmentations. Submitted by Ross Hardison, July 09, 2011.
allDevEnh.bed I've expanded the set to 475 developmental enhancers by doing some literature searches for elements that act positively in in vivo enhancer assays as well as extracting elements from other databases (e.g. CONDOR). Many of the additional sequences were originally tested using sequence from another species (mouse, Fugu, zebrafish, chicken, etc.) but are all deeply conserved and are highly likely to act as enhancers in human too. The coordinates are therefore the human sequence that shows conservation to this element. I also took the liberty of clipping the VISTA Enhancer elements down to the sections that are covered by phastCons elements, as these elements are often tested with a significant amount of flanking sequence which is unlikely to be contributing to the function. The references for the published source of each element (fourth column in file):

vista_element[x]: VISTA Enhancer Browser--a database of tissue-specific human enhancers. Visel A, Minovitsky S, Dubchak I, Pennacchio LA. Nucleic Acids Res. 2007 Jan;35(Database issue):D88-92. Epub 2006 Nov 27.

CRCNEAC[x]: CONDOR: a database resource of developmentally associated conserved non-coding elements. Woolfe A, Goode DK, Cooke J, Callaway H, Smith S, Snell P, McEwen GK, Elgar G. BMC Dev Biol. 2007 Aug 30;7:100

Suster_element[x]: A novel conserved evx1 enhancer links spinal interneuron morphology and cis-regulation from fish to mammals. Suster ML, Kania A, Liao M, Asakawa K, Charron F, Kawakami K, Drapeau P. Dev Biol. 2008 Oct 18

Ishihara_element[x]: Multiple evolutionarily conserved enhancers control expression of Eya1. Ishihara T, Sato S, Ikeda K, Yajima H, Kawakami K. Dev Dyn. 2008 Nov;237(11):3142-56.

Wassef_elementC: Rostral hindbrain patterning involves the direct activation of a Krox20 transcriptional enhancer by Hox/Pbx and Meis factors. Wassef MA, Chomette D, Pouilhe M, Stedman A, Havis E, Desmarquet-Trin Dinh C, Schneider-Maunoury S, Gilardi-Hebenstreit P, Charnay P, Ghislain J. Development. 2008 Oct;135(20):3369-78

Antonellis_MCS: Identification of neural crest and glial enhancers at the mouse Sox10 locus through transgenesis in zebrafish. Antonellis A, Huynh JL, Lee-Lin SQ, Vinton RM, Renaud G, Loftus SK, Elliot G, Wolfsberg TG, Green ED, McCallion AS, Pavan WJ. PLoS Genet. 2008 Sep 5;4(9):e1000174

HAR2: Human-specific gain of function in a developmental enhancer. Prabhakar S, Visel A, Akiyama JA, Shoukry M, Lewis KD, Holt A, Plajzer-Frick I, Morrison H, Fitzpatrick DR, Afzal V, Pennacchio LA, Rubin EM, Noonan JP. Science. 2008 Sep 5;321(5894):1346-50.

Calle_element[x]: A functional survey of the enhancer activity of conserved non-coding sequences from vertebrate Iroquois cluster gene deserts. de la Calle-Mustienes E, Feijóo CG, Manzanares M, Tena JJ, Rodríguez-Seguel E, Letizia A, Allende ML, Gómez-Skarmeta JL. Genome Res. 2005 Aug;15(8):1061-72

McEwen_element[x]: Ancient duplicated conserved noncoding elements in vertebrates: a genomic and functional analysis. McEwen GK, Woolfe A, Goode D, Vavouri T, Callaway H, Elgar G. Genome Res. 2006 Apr;16(4):451-65

Shin_element[x]: Human-zebrafish non-coding conserved elements act in vivo to regulate transcription. Shin JT, Priest JR, Ovcharenko I, Ronco A, Moore RK, Burns CG, MacRae CA. Nucleic Acids Res. 2005 Sep 22;33(17):5437-45

Menke_element[x]: Dual hindlimb control elements in the Tbx4 gene and region-specific control of bone size in vertebrate limbs. Menke DB, Guenther C, Kingsley DM. Development. 2008. 135:2543-53.

 - Adam Woolfe (19th December 2008).
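
Because the fourth column of allDevEnh.bed encodes the published source of each element, the list above can be turned into a small lookup. This is only a sketch: the function names are hypothetical, the short citations are abbreviated from the references above, and it assumes each name in the file begins with exactly one of the listed prefixes (with [x] standing for a variable suffix).

```python
from collections import Counter

# Prefix of the fourth-column name -> abbreviated source from the list above.
SOURCES = {
    "vista_element": "Visel et al., Nucleic Acids Res. 2007",
    "CRCNEAC": "Woolfe et al., BMC Dev Biol. 2007",
    "Suster_element": "Suster et al., Dev Biol. 2008",
    "Ishihara_element": "Ishihara et al., Dev Dyn. 2008",
    "Wassef_element": "Wassef et al., Development. 2008",
    "Antonellis_MCS": "Antonellis et al., PLoS Genet. 2008",
    "HAR2": "Prabhakar et al., Science. 2008",
    "Calle_element": "de la Calle-Mustienes et al., Genome Res. 2005",
    "McEwen_element": "McEwen et al., Genome Res. 2006",
    "Shin_element": "Shin et al., Nucleic Acids Res. 2005",
    "Menke_element": "Menke et al., Development. 2008",
}


def source_for(name):
    """Return the abbreviated source for an element name, or None if unknown."""
    for prefix, citation in SOURCES.items():
        if name.startswith(prefix):
            return citation
    return None


def count_by_source(bed_lines):
    """Count elements per source, reading the name from the fourth column."""
    counts = Counter()
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 4 or fields[0] in ("track", "browser"):
            continue  # skip header lines and anything too short to be BED4
        counts[source_for(fields[3]) or "unknown"] += 1
    return counts
```

Names that match none of the prefixes are counted under "unknown" rather than raising an error, so a changed naming scheme shows up in the tally.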

Human Variants in Disease-Associated Loci

File Description
http://genome-test.cse.ucsc.edu/ Locus variants track at UCSC genome-test; in the section Variation and Repeats. This has many variants, including large deletions, at disease-associated loci (from Locus Specific Databases) plus data from SwissProt. See link below for more information. PMID 17326095
http://phencode.bx.psu.edu/ PhenCode site with 28,000 variants in over 1,000 disease associated loci (thanks to the LSDBs). PMID 17326095

Manolis Kellis

Motif instances at varying conservation cutoffs are provided by Pouya Kheradpour in the Kellis Lab.

File Description
http://compbio2.csail.mit.edu/instances/vert/analysis/20090317-genome-instances/ Conserved motif instances for AP1, ATF2, CTCF, Fos, Myc, STAT1

Michael Hoffman

File Description
Segway.chromatin-map.parzen.20081121.bed.gz Initial Segway segmentation of Duke/UNC/UT open chromatin map (BED 3+1 format) on pilot regions only--for test purposes only, segmentation and number of segment labels (currently just 0, 1) will change. Some exploratory data analysis. Original presentation on the method. E-mail mmh1@washington.edu with questions.
Segway.chromatin.20081210.singletrack.tar.gz Initial Segway segmentations of all of the chromatin tracks (for test purposes only, segmentation and number of segment labels (currently just 0, 1) will change).
Media:Chromatin.20090216.norm.tar chromatin multi-track and single-track segmentations (tar file) (see Large-scale Behaviour)
Media:Histone.20090217.norm.tar histone single-track segmentations (tar file) (see Large-scale Behaviour)
Media:Tf.20090218.gamma.tar transcription factor single-track segmentations (tar file) (see Large-scale Behaviour)

Felix Kokocinski, PI:Tim Hubbard (GENCODE)

File Description
gencode.v3c.annotation.NCBI36.gtf.gz GTF dump of GENCODE annotation in NCBI36 (v3c, October 2009 freeze)
gencode.v3c.annotation.GRCh37.gtf.gz GTF dump of GENCODE annotation in GRCh37 (v3c, October 2009 freeze)
gencode.v3c.pc_transcripts.fa.gz Fasta file with GENCODE protein-coding sequences (v3c, October 2009 freeze)
gencode.v3c.pc_translations.fa.gz Fasta file with GENCODE protein-coding translations (v3c, October 2009 freeze)

Jim Kent

Enhancer Picks

File Description
enhPicks.zip Archive of H3K4Me1 + DNAse enhancer picks on 7 cell lines. 20 April 2011. See README inside for more info.

Experiment aligning human X vs. mouse

File Description
chrX.chain.gz output from axtToChain - chained blastz alignment
chrX.net.gz output from chainToNet - best alignments for any particular base
chrX.maf.gz maf files (including bases) for best alignments

Kate Rosenbloom

File Description
phyloP_placental.tar.gz phyloP scoring of placental mammal subset of human-reference 28-way multiz alignment. WARNING: chr7 has double-scoring (will be fixed)

Joel Rozowsky

File Description
ftp://encode@encodeftp.cse.ucsc.edu/users/rozowsky/consisntency/PeakSeq/ Peak calls using the PeakSeq peak caller on pooled replicates for various transcription factors, cell lines, labs. Replicates were pooled, peaks were called and consistency analysis was used to prune the peaks to a consistency FDR of 0.05. The method of 'passing structure' was used for consistency analysis. (DATE: 07/23/2009)
http://archive.gersteinlab.org/proj/PeakSeq/Sequence_Data/HeLa-S3/ Fastq files and Eland aligned reads for HeLa S3 Pol II and HeLa S3 Input DNA [Rozowsky et al. Nature Biotech (in press)]
ChIPSeqMini.txt ChIPSeqMini Output (default output)
ChIPSeqmini.bed ChIPSeqMini Output (bed format)
MACS_peaks.xls MACS Output (default output)
MACS_peaks.bed MACS Output (bed format)
PeakSeq.txt PeakSeq Output (default output)
PeakSeq.bed PeakSeq Output (bed format)
Quest_extended.txt Quest Output (default output)
Quest.bed Quest Output (bed format)
sissrs.txt SISSRS Output (default output)
sissrs.bed SISSRS Output (bed format)

Chao Cheng

File Description
http://homes.gersteinlab.org/people/cc59/outbox/ K562 & GM12878 BAR+ probability
DensityPlot_BAR+_region_of_all_features.pdf Distribution of signal of 32 chromatin features in BAR+ bins. Note that the DNaseI signals show a bimodal distribution, which may explain the high predictive power of DNaseI for TF binding sites when positive and negative training bins are from BAR+.
K562_corr_exp_chr_unified.jpg Correlation pattern between chromatin features and transcript expression levels using unified tracks provided by Michael Hoffman. Note there is little difference between this figure and the one I presented before.

Sarah Djebali (RNA group, CRG)

File Description
ftp://genome.crg.es/pub/Encode/data_analysis/TSS/Gencodev3c_TSS.gff TSSs from GENCODE annotation v3c with expression values from 11 CAGE and 7 DiTAG experiments (TSSs are made from transcripts of the following biotypes: protein_coding, ambiguous_orf, non_coding, antisense, processed_transcript, retained_intron) (a README file is available at ftp://genome.crg.es/pub/Encode/data_analysis/TSS/Gencodev3c_TSS.txt)
http://genome.crg.es/~sdjebali/research/data/RNAseq/Gingeras.zip Transcripts and exons from July 2009 GENCODE annotation associated to RPKM values as computed by the flux capacitor using stranded RNAseq single reads of 75 nt obtained in T. Gingeras' lab on PolyA+ RNA from the cytosol of K562, and mapped using GEM (up to 2 mismatches) (you will find a README file in the archive)
http://genome.crg.es/~sdjebali/research/data/RNAseq/Helicos.zip Transcripts and exons from July 2009 GENCODE annotation associated to RPKM values as computed by the flux capacitor using RNAseq single reads obtained and mapped by Helicos on PolyA+ RNA from the cytosol of K562 (you will find a README file in the archive)
http://genome.crg.es/~sdjebali/research/data/RNAseq/GIS.zip Transcripts and exons from July 2009 GENCODE annotation associated to RPKM values as computed by the flux capacitor using RNAseq single reads obtained and mapped by GIS (SOLiD) on PolyA+ RNA from the cytosol of K562 and of GM12878 (you will find a README file in the archive)
http://genome.crg.es/~sdjebali/research/data/RNAseq/Wold.zip Transcripts and exons from July 2009 GENCODE annotation associated to RPKM values as computed by the flux capacitor using RNAseq single and paired-end reads obtained in B. Wold's lab on polyA+ total RNA from K562 and GM12878 and mapped by GEM (you will find a README file in the archive)

Ensembl multiple alignments (Javier Herrero)

31-way alignments

File Scope Description
elements directory genome-wide GERP elements from the 31-way MSA (bed format, bzip'ed)
scores directory genome-wide GERP scores from the 31-way MSA (wig format, bzip'ed)

29-way alignments

File Scope Description
MD5SUM directory md5sum for the following files
dupl_in_pilot.bed 1% pilot Duplications in the 29-way MSA (bed format). Score is the number of human segments in the block
29gerp_elem_in_pilot.bed.bz2 1% pilot GERP elements from the 29-way MSA (bed format, bzip'ed). Score is the GERP score (expected-observed)
dupl.bed.bz2 genome-wide Duplications in the 29-way MSA (bed format). Score is the number of human segments in the block
29gerp_elem.bed.bz2 genome-wide GERP elements from the 29-way MSA (bed format, bzip'ed). Score is the GERP score (expected-observed)
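
Downloaded copies of the files above can be checked against the MD5SUM entry. The sketch below assumes the MD5SUM file uses the usual output format of the md5sum utility (a hex digest, whitespace, then the file name on each line); that format is an assumption, as are the function names.

```python
import hashlib


def parse_md5sum(text):
    """Map file name -> expected hex digest from md5sum-style text."""
    expected = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            digest, name = parts
            # md5sum marks binary-mode entries with a leading '*'.
            expected[name.strip().lstrip("*")] = digest.lower()
    return expected


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, name, expected):
    """Return True if the file at path matches the digest listed for name."""
    return expected.get(name) == md5_of(path)
```

For example, verify("dupl.bed.bz2", "dupl.bed.bz2", parse_md5sum(md5sum_text)) would report whether a local copy matches, given the text of the MD5SUM file.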

5C-ENm009 (Job Dekker)

File Description
5C_ENm009.zip 5C-ENm009 Data (GM06690 & K562)

Bob Altshuler

File Description
peak_EPO_GERP_het.tar.gz IDR peaks with EPO and GERP scores, and heterozygosity (narrowPeak+5 aka bed6+9 format)

Belinda Giardine

Variants

File Description
gwascatalog.june_16_2011.txt GWAS catalog SNPs from a download on June 16, 2011. Processed into a bed-like format retaining selected data from the original file (build hg19).
pgSnpsCombined24.hg19.noRand.noY.sortChr.bed.gz SNP positions from 24 high coverage personal genomes from various sources and platforms. (hg19, some direct and some lifted)
completeGenomics69.bed.gz SNP positions from the 69 genomes from Complete Genomics. (hg19)
low_coverage.2010_07.hg19.sorted.bed.gz SNP positions from 1000 genomes low coverage genomes July 2010 release. (lifted from hg18 to hg19)

Enrichment Graphs

File Description
segway_round8.tier1-2.coordinated.manyFeatures.pdf Enrichment graph showing various SNP sets and the segway round8 tier1-2 segments
segway_round8.k562.all.manyFeatures.pdf Enrichment graph showing various SNP sets and the segway round8 K562 segments
chromhmm_round8.k562.all.manyFeatures.pdf Enrichment graph showing various SNP sets and the chromhmm round8 K562 segments
segway_round8.tier1-2.coordinated.gwas_catalog.june_2011_and_samplesPlus.pdf Segway round8 tier1-2 segmentation - Horizontal enrichment graph showing mean of 1000 samples with same frequency as GWAS SNPs, and a sample offset by 3-5k bases. Also various other SNP sets. Error bars 1 stddev, red color is more than 2 stddev.
segway_round8.k562.all.gwas_catalog.june_2011_and_samplesPlus.pdf Segway round8 K562 segmentation - Horizontal enrichment graph showing mean of 1000 samples with same frequency as GWAS SNPs, and a sample offset by 3-5k bases. Also various other SNP sets. Error bars 1 stddev, red color is more than 2 stddev.
chromhmm_round8.k562.all.gwas_catalog.june_2011_and_samplesPlus.pdf chromhmm round8 K562 segmentation - Horizontal enrichment graph showing mean of 1000 samples with same frequency as GWAS SNPs, and a sample offset by 3-5k bases. Also various other SNP sets. Error bars 1 stddev, red color is more than 2 stddev.

DHS hotspots

File Description
dhsHotspots.merged.bed.gz "comprehensive set": everything that is implicated as DNase sensitive (UW and Duke ENCODE tracks)

Template

File Description