NOTE! This is a read-only copy of the ENCODE2 wiki.
Please go to the ENCODE3 wiki for current information.

ENCODE Alternative Data Submissions


Attic (Download Only)

Each composite track has a downloads directory that can be accessed through a GUI or through Apache. Before the attic concept was introduced, this directory contained a downloadable copy of all of the displayed data as well as non-displayable files such as fastQs. The term "attic" arose when labs began submitting data that was displayable (additional peaks, RNA elements, additional experiments) but that the UCSC browser was not going to be able to support displaying. The attic simply refers to additional data that can be associated with individual experiments but is not displayed. Files in the attic have metaData, are searchable and filterable, and are validated if they are a standard file type. They are not available for display in the track or for perusal in the table browser.

An example of the attic in use for additional experiments is the HAIB Methyl RRBS track. In this track, there are experiments that are the same (same cell, same treatment, etc.) except for the lab of origin or, for BioChain data, the specific donor information. There are also experiments that have more than 2 replicates. You can see 185 tracks available for display on the configuration page; however, there are 212 Sites files available on the downloads UI. For a specific example, look at cell MCF-7, where the cell was grown at three different labs, or at BC Skeletal Muscle, which has two different donors. To add additional experiments to the attic, the lab needs to work with their wrangler and specify in the DDF which files belong there.

An example of the attic in use for additional files is CSHL Long RNA-Seq. The list of views in this track includes both files that can be displayed, such as the Exons, and additional non-displayable files, such as the Protocols. To add non-displayed views, the wrangler needs to specify this in the DAF.

Supplemental Directory

Along with the downloads directory, a supplemental directory is also available for each composite. Labs can use these supplemental directories for additional protocols, images, validation documents, specific file format information or data analysis descriptions that apply at the composite (series) level rather than the experiment (sample) level. The content of this directory is not validated in any way, is not searchable from the browser, and does not have metadata. It can store any file type. It is accessible from the track description, the downloads UI and the Apache downloads server. An example is the CSHL Long RNA-Seq supplemental directory. The directory is available from the track description where, in this case, the reference is in the methods section. It is also available on the downloads UI page, where the supplemental directory is always listed near the top along with files.txt and the downloads server. On the Apache downloads server, it is described in the README and appears near the top of the list of available files. Currently, the best way to submit supplemental data is through the pipeline without a DAF or DDF; the wranglers will handle the submission as a special case.

Direct GEO submissions

Some data that is either not suited for the browser or that has been deemed validation data can be submitted directly to GEO. At least three options have been used here. The first is for microarray data, where the CEL files were submitted directly to GEO and given GEO Sample Accessions. The lab then provided UCSC with a mapping of the experimental variables to the GEO Sample Accession. Any data that the DCC had relating to these experiments was added to the GEO sample, and the GEO Sample Accession number was added to the metaData of all relevant files at UCSC. An example of this is the UW Affy Exon series GSE19090, where the GEO Sample Accession is associated with each experiment both in the columns of the Browser downloads UI and in the subtracks list of the Browser track configuration page, under the metaData arrow.
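The mapping step in the first option can be pictured with a short Python sketch. It is only an illustration under stated assumptions: the field names (`cell`, `treatment`, `replicate`, `geoSampleAccession`), the file names and the accession strings are hypothetical placeholders, not the actual metaData schema or real GEO accessions.

```python
# Hypothetical sketch: copy GEO Sample Accessions from a lab-provided mapping
# into the metadata of matching files. Field names and values are placeholders.

def experiment_key(record, variables):
    """Build a hashable key from the experimental variables of a record."""
    return tuple(record[v] for v in variables)


def add_geo_accessions(files, mapping, variables=("cell", "treatment", "replicate")):
    """Return copies of the file metadata, adding geoSampleAccession wherever
    the mapping contains an experiment with the same variables."""
    lookup = {
        experiment_key(row, variables): row["geoSampleAccession"]
        for row in mapping
    }
    updated = []
    for meta in files:
        meta = dict(meta)  # copy so the caller's records are not modified
        accession = lookup.get(experiment_key(meta, variables))
        if accession is not None:
            meta["geoSampleAccession"] = accession
        updated.append(meta)
    return updated


# Placeholder mapping, as a lab might provide it (accessions are NOT real).
lab_mapping = [
    {"cell": "MCF-7", "treatment": "None", "replicate": "1",
     "geoSampleAccession": "GSM_PLACEHOLDER_1"},
    {"cell": "MCF-7", "treatment": "None", "replicate": "2",
     "geoSampleAccession": "GSM_PLACEHOLDER_2"},
]

# Placeholder file metadata; the third file has no entry in the mapping.
file_metadata = [
    {"fileName": "example1.bed", "cell": "MCF-7", "treatment": "None", "replicate": "1"},
    {"fileName": "example2.bed", "cell": "MCF-7", "treatment": "None", "replicate": "2"},
    {"fileName": "example3.bed", "cell": "K562", "treatment": "None", "replicate": "1"},
]

annotated = add_geo_accessions(file_metadata, lab_mapping)
for meta in annotated:
    print(meta["fileName"], meta.get("geoSampleAccession", "(no GEO sample)"))
```

Files whose experimental variables do not appear in the mapping are left unchanged, which makes it easy to spot experiments that still lack a GEO sample.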

The second option is to add supplementary data to the GEO Series for the composite. This works very much like the supplemental directory on the UCSC Browser, in that the files are relevant to the whole series (composite) and not to a specific sample (experiment). The GEO Series number can then be added to the Verification section of the composite's track description.

A third option is for validation experiments that really should be a GEO Series of their own. In this case, the collection is submitted to GEO and gets its own GEO Series Accession with GEO Sample Accessions for the individual samples. The Series is then added to GEO's ENCODE project (mouse or human) and linked to the Series that it validates. The GEO Series Accession of the validation experiments can then be included in the track description under the Verification section and in the composite level metaData.

In all cases, the GEO submission is handled according to the GEO guidelines, and a special note must be made that the data is intended to go into the ENCODE super-series. The wranglers and the GEO staff need to coordinate on the submission.

Track Hubs

If labs have data that they would like to display in browser tracks but that is not production data, they can use Track Hubs to create their own composites. Information about how to make Track Hubs, along with examples of existing public hubs, can be found by clicking the track hubs button under the UCSC Genome Browser display. These hubs can be used to show data to collaborators, or a lab can request that its hub be made into a public hub that is available to all users.
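As a rough illustration of what a lab-created hub involves, the Python sketch below writes out the three small text files that a basic UCSC track hub is normally built from (hub.txt, genomes.txt and a trackDb.txt per assembly). The hub name, labels, e-mail address, assembly and data URLs are all hypothetical placeholders; the Track Hub documentation linked from the browser remains the authority on which settings are required.

```python
# Minimal sketch of the text files behind a basic track hub.
# Every name, label, e-mail address and URL below is a hypothetical placeholder.

def stanza(pairs):
    """Format (setting, value) pairs as one block of 'setting value' lines."""
    return "".join(f"{key} {value}\n" for key, value in pairs)


def build_hub_files(hub_name, short_label, long_label, email, assembly, tracks):
    """Return a dict mapping relative file paths to their text content."""
    hub_txt = stanza([
        ("hub", hub_name),
        ("shortLabel", short_label),
        ("longLabel", long_label),
        ("genomesFile", "genomes.txt"),
        ("email", email),
    ])
    genomes_txt = stanza([
        ("genome", assembly),
        ("trackDb", f"{assembly}/trackDb.txt"),
    ])
    # One stanza per track, separated by a blank line.
    track_db = "\n".join(
        stanza([
            ("track", t["name"]),
            ("bigDataUrl", t["url"]),
            ("shortLabel", t["short_label"]),
            ("longLabel", t["long_label"]),
            ("type", t["type"]),
        ])
        for t in tracks
    )
    return {
        "hub.txt": hub_txt,
        "genomes.txt": genomes_txt,
        f"{assembly}/trackDb.txt": track_db,
    }


hub_files = build_hub_files(
    hub_name="labValidationHub",
    short_label="Lab Validation",
    long_label="Non-production validation data from an example lab",
    email="wrangler@example.org",
    assembly="hg19",
    tracks=[
        {"name": "exampleSignal", "url": "http://example.org/data/signal.bigWig",
         "short_label": "Example Signal", "long_label": "Example signal track",
         "type": "bigWig"},
        {"name": "examplePeaks", "url": "http://example.org/data/peaks.bigBed",
         "short_label": "Example Peaks", "long_label": "Example peaks track",
         "type": "bigBed"},
    ],
)

for path, content in hub_files.items():
    print(f"--- {path} ---")
    print(content)
```

The generated files would be placed on a web server next to the data files, and the URL of hub.txt is what gets shared with collaborators.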

Comparison

Method                 | Displayed | MetaData | Searchable | Validated | File Types               | Associated with Browser Data
Attic                  | no        | yes      | yes        | yes       | limited to browser types | yes
Supplemental Directory | no        | no       | no         | no        | any                      | yes
Track Hubs             | yes       | no       | no         | no        | .bigWig .bam .bigBed     | no
GEO Direct             | no        | maybe    | no         | no        | any                      | maybe
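
The comparison above can also be expressed as a small lookup, which may help when deciding which method fits a given data set. The Python sketch below simply transcribes the table; the dictionary keys and the `candidates` helper are hypothetical names chosen for illustration.

```python
# The comparison table transcribed as data. Key names are illustrative only.
OPTIONS = {
    "Attic": {
        "displayed": "no", "metadata": "yes", "searchable": "yes",
        "validated": "yes", "file_types": "limited to browser types",
        "browser_associated": "yes",
    },
    "Supplemental Directory": {
        "displayed": "no", "metadata": "no", "searchable": "no",
        "validated": "no", "file_types": "any",
        "browser_associated": "yes",
    },
    "Track Hubs": {
        "displayed": "yes", "metadata": "no", "searchable": "no",
        "validated": "no", "file_types": ".bigWig .bam .bigBed",
        "browser_associated": "no",
    },
    "GEO Direct": {
        "displayed": "no", "metadata": "maybe", "searchable": "no",
        "validated": "no", "file_types": "any",
        "browser_associated": "maybe",
    },
}


def candidates(**requirements):
    """Return the methods whose table entries match every given requirement."""
    return [
        name
        for name, props in OPTIONS.items()
        if all(props[key] == value for key, value in requirements.items())
    ]


# Methods that accept any file type:
print(candidates(file_types="any"))
# Methods whose files are searchable:
print(candidates(searchable="yes"))
```

Because "maybe" is stored as its own value, a query for `metadata="yes"` does not return GEO Direct; whether that case applies has to be settled with the wrangler.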