NOTE! This is a read-only copy of the ENCODE2 wiki.
Please go to the ENCODE3 wiki for current information.




Previous analysis page

Cat-herder - Elliott Margulies


  • Elliott Margulies
  • Steve Parker
  • Javier Herrero
  • Kathryn Beal
  • Bob Harris
  • GilTae Song
  • Chris Morrissey
  • Benedict Paten
  • Brian Raney

Conference Calls

AWG updates

2010 July Barcelona workshop

  • Day 1. Pre-workshop update: ppt


  • 15-04-2010: pdf

Data sets

Comparative data sets

33-way EPO multiple alignments on GRCh37/hg19

File | Description | Date | Format
epo_33_eutherian.tar | 33-way alignments, by chromosome on GRCh37/hg19 | 1 Feb 2010 | MAF files
GERP scores | GERP conservation scores on GRCh37/hg19 | 9 Feb 2010 | wigFix
GERP elements | GERP constrained elements on GRCh37/hg19 | 9 Feb 2010 | BED
epo_33_eutherian.all.maf.tar eutherian.all.maf.tar.md5 | 33-way alignments, by chromosome on GRCh37/hg19 | 25 Feb 2010 | MAF files with no segmental duplications
EPO blocks | Coverage of 33-way alignments on GRCh37/hg19 | 24 Sep 2010 | BED

33-way EPO multiple alignments back-lifted on NCBI36/hg18

File | Description | Date | Format
GERP elements | GERP constrained elements back-lifted on NCBI36/hg18 | 25 Feb 2010 | BED
Chai elements | Chai constrained elements back-lifted on NCBI36/hg18 | 8 March 2010 | BED

46-way Multiz multiple alignments on GRCh37/hg19

Directory | Description | Date | Format
Multiple alignments by chrom | 46-way alignments, by chromosome on GRCh37/hg19 | 8 Jan 2010 | MAF files
phastCons | phastCons conservation scores on GRCh37/hg19 | 1 Dec 2009 | wigFix
phyloP | phyloP conservation scores on GRCh37/hg19 | 10 Nov 2009 | wigFix

"Other" data sets


We will use the output of the analysis by Joel et al. Here is Joel's description of the process: Different labs score their own data separately. We took the aligned tag read files for all the submitted ChIP-Seq data. These are scored separately using both SPP and PeakSeq. Uniform thresholding is done by comparing the rank-ordered lists of binding sites between biological replicates in order to determine thresholds consistently.
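To make the idea of replicate-based thresholding concrete, here is a minimal sketch in Python. It is not the procedure used by Joel et al., and it does not call SPP or PeakSeq. It assumes each replicate is a list of hypothetical `(chrom, start, end, score)` tuples, and it uses a deliberately simple rule: keep the largest top-N cut at which at least `min_fraction` of the first replicate's top-N sites overlap a top-N site in the second replicate. The function names and the `min_fraction` parameter are illustrative assumptions.

```python
def overlaps(peak, others):
    """Return True if peak overlaps any peak in others (half-open coordinates)."""
    chrom, start, end = peak[:3]
    return any(c == chrom and s < end and start < e for c, s, e, *_ in others)


def consistent_top_n(rep1, rep2, min_fraction=0.8):
    """Largest N such that the top-N sites of the two replicates agree.

    rep1, rep2: lists of (chrom, start, end, score) tuples (assumed layout).
    Agreement at N is the fraction of rep1's top-N sites that overlap at
    least one of rep2's top-N sites. Returns 0 if no cut is consistent.
    """
    ranked1 = sorted(rep1, key=lambda p: p[3], reverse=True)
    ranked2 = sorted(rep2, key=lambda p: p[3], reverse=True)
    best = 0
    for n in range(1, min(len(ranked1), len(ranked2)) + 1):
        top1, top2 = ranked1[:n], ranked2[:n]
        shared = sum(overlaps(p, top2) for p in top1)
        if shared / n >= min_fraction:
            best = n
    return best


# Toy replicates: scores are on different scales, only the rank order matters.
rep1 = [("chr1", 100, 200, 50.0), ("chr1", 500, 600, 40.0),
        ("chr2", 10, 90, 30.0), ("chr3", 0, 100, 5.0)]
rep2 = [("chr1", 120, 180, 9.0), ("chr1", 510, 650, 8.0),
        ("chr2", 50, 120, 7.0), ("chr5", 0, 100, 1.0)]

n = consistent_top_n(rep1, rep2)
print(n)  # number of top-ranked sites to keep from each replicate
```

Because only the rank order is compared, the two replicates (or the two peak callers) do not need to share a score scale, which is the point of thresholding on ranks rather than on raw scores.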


Sarah Djebali (CRG) has started to compile all the available data sets.

Link: RNA_element_tracks

Large-scale segments

Michael Hoffman confirms that no comparative data has been used to generate the segments.

Link: Large-scale_Behaviour#Results_for_March_2010_meeting


Perl script to compare multiple BED files: (Javier)
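The Perl script itself is not reproduced on this page. As a rough illustration of what such a comparison can look like, the Python sketch below computes the covered base pairs of each BED file and the shared base pairs for every pair of files. It is not Javier's script; the function names, the choice of pairwise base-pair overlap as the comparison, and the toy input are all assumptions.

```python
from itertools import combinations


def read_bed(lines):
    """Parse BED lines into {chrom: [(start, end), ...]}, skipping header lines."""
    regions = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.setdefault(chrom, []).append((int(start), int(end)))
    return regions


def merge(intervals):
    """Merge overlapping or touching half-open intervals into a sorted list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_bp(regions):
    """Total number of bases covered, counting overlapping regions once."""
    return sum(e - s for ivs in regions.values() for s, e in merge(ivs))


def shared_bp(a, b):
    """Number of bases covered by both region sets."""
    total = 0
    for chrom in a.keys() & b.keys():
        xs, ys = merge(a[chrom]), merge(b[chrom])
        i = j = 0
        while i < len(xs) and j < len(ys):
            lo = max(xs[i][0], ys[j][0])
            hi = min(xs[i][1], ys[j][1])
            if lo < hi:
                total += hi - lo
            # Advance whichever interval ends first.
            if xs[i][1] < ys[j][1]:
                i += 1
            else:
                j += 1
    return total


def compare(beds):
    """beds: {name: regions}. Return {(name_a, name_b): shared base pairs}."""
    return {(na, nb): shared_bp(beds[na], beds[nb])
            for na, nb in combinations(sorted(beds), 2)}


# Toy input; in practice each list would come from open(path) on a BED file.
beds = {
    "gerp": read_bed(["chr1\t100\t200", "chr1\t150\t300", "chr2\t0\t50"]),
    "chai": read_bed(["track name=chai", "chr1\t250\t400", "chr2\t40\t60"]),
}
for name, regions in sorted(beds.items()):
    print(name, covered_bp(regions))
for pair, bp in compare(beds).items():
    print(pair, bp)
```

Merging each file's intervals first means overlapping elements within a single file are not double-counted, so the shared base pairs can be compared directly against each file's own coverage.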



OLD ENTRIES (Dec 2008)

What we have today

  • Pilot Project Regions (1%) 
    • Alignments and binCons/Chai conservation scores (Margulies/NISC) 
  • Genome Wide: 
    • 29 mammal alignment and GERP scores (Javier/ENSEMBL) 
    • 44 vertebrates alignment (Paten/Hiram/UCSC) 


  • Compare to 1% data (Bob Harris/PSU) 
  • Connections with 22‐mammals analysis group 
    • Side project: Regions of accelerated evolution 
  • Access to working data (James Taylor) 
    • Find key 5‐6 people to provide working data (next 2 hours) 
    • Provide intermediate datasets in a controlled format (30 days)
    • Be in a position 100 days from now to do "AWESOME ANALYSIS"