NOTE! This is a read-only copy of the ENCODE2 wiki.
Please go to the ENCODE3 wiki for current information.

2008-10-17 DCC Progress

DCC Progress Report, October 17, 2008

Data Submission

This month we received data from four ENCODE labs: HudsonAlpha, Riken, Sanger, and Yale. Together with data previously submitted by Duke and NHGRI, we now have 44 datasets covering 12 experiments, which are displayed as subtracks on our test server. Because most of these submissions were the respective labs' first to our ENCODE pipeline, there was a learning curve to negotiate. In particular, there were several cases where file formats had to be worked out or where our validator code was more restrictive than necessary. We have been working through these issues to make the process more seamless and to minimize difficulties for the labs. We appreciate the efforts these labs have made to learn the submission process, and we remain committed to making it as painless as possible. Please use the Submissions How To and contact your data wrangler with any questions or difficulties.

Submissions that have occurred so far can be viewed in the test browser at the listed URLs:

Lab                 Experiment Type  Date                URL
Duke:Crawford       DNase-seq        6/10 (Preliminary)  http://genome-test.cse.ucsc.edu/cgi-bin/hgTrackUi?db=hg18&g=wgEncodeDukeDnase
Hudson-Alpha:Myers  ChIP-seq         9/11, 9/23          http://genome-test.cse.ucsc.edu/cgi-bin/hgTrackUi?db=hg18&g=wgEncodeHudsonalphaChipSeq
NHGRI:Elnitski      NRE              9/18                http://genome-test.cse.ucsc.edu/cgi-bin/hgTrackUi?db=hg18&g=wgEncodeNhgriNre
NHGRI:Margulies     MSA              9/24                http://genome-test.cse.ucsc.edu/cgi-bin/hgTrackUi?db=hg18&g=encodeMsaTbaDec07
Riken:Gingeras      CAGE             9/22                http://genome-test.cse.ucsc.edu/cgi-bin/hgTrackUi?db=hg18&g=wgEncodeRikenCage
Sanger:Hubbard      Genes            10/7                http://genome-test.cse.ucsc.edu/cgi-bin/hgTrackUi?db=hg18&g=wgEncodeSangerGencode
Yale:Snyder         ChIP-seq         9/243, 9/30         http://genome-test.cse.ucsc.edu/cgi-bin/hgTrackUi?db=hg18&g=wgEncodeYaleChIPseq
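
Each URL listed above carries two query parameters, db and g. As a minimal sketch for anyone who wants to collect the track names programmatically, the Python below pulls both values out of a URL using only the standard library. The function name track_from_url is hypothetical, and reading db as the genome assembly and g as the track name is an assumption based on the values shown in the list.

```python
from urllib.parse import parse_qs, urlsplit


def track_from_url(url):
    """Return the (db, g) query values from a test-browser track URL.

    Hypothetical helper: assumes both parameters are present exactly
    as they appear in the URLs listed in this report.
    """
    query = parse_qs(urlsplit(url).query)
    return query["db"][0], query["g"][0]


if __name__ == "__main__":
    url = ("http://genome-test.cse.ucsc.edu/cgi-bin/hgTrackUi"
           "?db=hg18&g=wgEncodeDukeDnase")
    print(track_from_url(url))
```

A URL that lacks either parameter raises KeyError in this sketch, which is probably the right behavior when checking a list of links by hand.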

We expect additional submissions from Elnitski at NHGRI and from Yale imminently (waiting for the cell growth protocol). We also anticipate submissions from Broad and Duke soon.

Submission Pipeline User Interface

The production version of the submission pipeline, located at http://encodesubmit.ucsc.edu/, is up and running. We have made minor improvements to the process and terminology to simplify the user experience. Additionally, anticipating that submissions may stack up close to the data freeze deadline, we have instituted serialization of the submission processing. This should not prevent any lab from submitting its data or slow down the submission itself. However, when many labs submit data simultaneously, it may lengthen the turnaround time between submission and the data becoming viewable in the test browser. The sequential processing is intended to protect user-critical response times from excessive delay during periods of high demand.
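
This report does not describe how the serialization is implemented. Purely as an illustration of the idea, the Python sketch below accepts submissions immediately but processes them one at a time, in arrival order, on a single background worker. The class SerializedProcessor, its methods, and the process callback are all hypothetical names and do not reflect the actual pipeline code.

```python
import queue
import threading


class SerializedProcessor:
    """Accept submissions at once; process them one at a time, in order.

    Illustrative only: names and structure are assumptions, not the
    DCC pipeline's real design.
    """

    def __init__(self, process):
        self._process = process        # hypothetical per-submission work
        self._pending = queue.Queue()  # FIFO queue of waiting submissions
        self.completed = []            # submissions processed successfully
        self.failed = []               # (submission, error) pairs
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def submit(self, lab, name):
        """Queue a submission and return right away."""
        self._pending.put((lab, name))

    def _run(self):
        # A single worker drains the queue, so only one submission is
        # ever being processed at a given moment.
        while True:
            item = self._pending.get()
            try:
                self._process(*item)
                self.completed.append(item)
            except Exception as error:
                self.failed.append((item, error))
            finally:
                self._pending.task_done()

    def wait(self):
        """Block until every queued submission has been handled."""
        self._pending.join()


if __name__ == "__main__":
    processor = SerializedProcessor(lambda lab, name: None)
    for lab in ("Yale", "Riken", "Sanger"):
        processor.submit(lab, "example-submission")
    processor.wait()
    print(processor.completed)
```

The trade-off matches the one described above: submit returns immediately, so a lab is never blocked from submitting, but a submission that arrives behind several others waits longer before its processing starts.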

Status pages

The Agreements status page shows the current state of data agreements and submissions for each ENCODE lab.
A summary view of each submission can also be found in the Submissions status page.
For those with a submissions account, this information can also be seen in the Submissions pipeline All Submissions page.
Finally, the Experiment progress page should be kept up to date by the participating labs.

ENCODE Related Data

Two sets of external data related to the ENCODE project are being loaded into our database, and tracks are viewable now on our test browser:

Open Issues

Several issues are in the process of resolution:

  • There is an ongoing effort to establish appropriate data formats and displays for RNA-seq data and for other data types. (The ChIP-seq format and display conventions are well documented and are being used successfully.)
  • We are expecting a large volume of data submissions close to the data freeze deadline. Especially for labs that have not previously completed a submission, we anticipate that data formats will need revision and that there will be a learning curve. We strongly encourage labs that have not yet submitted data not to wait until the last minute.
  • Coordination between the NCBI repositories and the ENCODE DCC is under discussion. While we initially envisioned raw data deposits to NCBI (GEO, SRA) preceding deposit of the processed data to the DCC, further communication with NCBI indicates there may be a need for greater coordination. At this time, we will accept submissions with or without GEO or SRA accessions. We will report to the consortium when this interaction is more fully specified.