
08-01-2009 Minutes


Minutes of AWG_CC Teleconference 08-01-09

1/2. Intro / Progress on action items (Rick Myers/Ewan Birney).

See minutes.

  • Previous Action: NHGRI to revisit development of standards with ENCODE groups.

The modENCODE Resources Working Group and representatives from the ENCODE Consortium have drafted standards for the validation of commercial histone modification antibodies. The draft standards are available on the wiki at Antibodies.

Otherwise this remains on NHGRI's to-do list.

Action: NHGRI to revisit development of standards with ENCODE groups.

  • Previous Action: Identify additional external datasets which could be brought in.

SNPs have already been discussed as a leading part of the tasks for the genome variation task group. The 1000 Genomes analysis has been released. Rick Myers asked whether this release included the SNPs discovered. Ewan stated that the end point of this analysis was a step before SNP calling, namely the Genotype Likelihood Format (GLF) data, which describe the likelihood of particular sequences at a given position. We need to call the SNPs from that data. We need Paul Flicek to post information on the format and on how to proceed.

Action: Ian/Ewan to organise coordination between Paul and Gavin Sherlock to understand these data.

Rick asked whether the GM12878 line had also been genotyped by the 1000 Genomes group. Ewan stated that it had, but that since the correlation between the genotypes called by genotyping and those called from sequence was extremely high (the exact percentage was not recorded), we should use the sequence information.

Ewan also noted that at the moment there is only data on SNPs, not CNVs.

Mike Snyder and Mark Gerstein are involved in the plans to call structural variants from the 1000 genomes data, and indicated that there were a number of CNV sets around at present, but consensus had not yet been reached.

Rick described his efforts to work with experts to identify transcription factors that might be usefully added to the ChIP-seq experiments. Later he requested interested persons to volunteer for the process of identifying what we are missing for K562 and the B cell line.

Action: Rick is seeking volunteers with familiarity with K562 and lymphoblast biology for this task.

3. Progress report from AWG. (Ian Dunham).

Minutes of the AWG calls are on the wiki. Since the last AWG_CC call the following have been the main activities:

  • Workshop
    • Participants were split up and distributed among task groups (see task groups) as follows:
      • Elements
      • RNA
      • Large-scale Behaviour
      • Comparative
      • Integration
      • Cell phenotyping
      • Genome variation
      • Statistics
      • Strategy
      • Annotation
    • Task groups reviewed available data and developed plans for analysis for the period between December 2008 and the March 2009 meeting.
  • Post-Workshop
    • Plans and timelines are on the wiki, together with pages for each task group.
    • We established an email tag system for communication within task groups while still within the AWG group. This will be reviewed on the Feb AWG_CC call.
    • We have held one phone call with progress reports from the Elements, Long Range behaviour and Integration task groups.
    • We have begun to establish a page for data flow coordination and file management.
    • Next AWG call will have further updates on progress.
    • The 1000 Genomes data release has taken place, including NA12878/GM12878 as discussed above.

4. Progress on data submission (Kate Rosenbloom).

Kate reported that there had been a major submission of 9 histone modifications, plus PolII and CTCF, for each of the tier 1 cell lines plus HUVEC and NHEK cells from the Bernstein lab. All are available on genome-test. There had also been a Duke DNase/FAIRE data resubmission.

Currently UCSC is prioritising getting the submitted data uploaded into the public release browser. They would also like the lists of expected data to be updated, as they are now out of date.

Action: Groups need to update their data submission plans.

Mike Snyder raised the issue of how to deal with resubmissions of analysis such as peak calls. Yale would like to resubmit some of their peak calls as they think they were originally too liberal.

Kate said these could be resubmitted and versioned, although in general UCSC prefers not to remove data. In this case the data was not yet public.

5. Realignment of reads for the purposes of analysis and the implications for data submissions (Ewan/Ian).

Ian explained that this issue had arisen because the analysis group wanted to run consistent alignments of reads to the genome for some analysis purposes (for instance, to remove any effect from the aligners when integrating data). There was therefore a question of how these new alignments should be stored and displayed so that others can see the datasets on which the analysis was performed. The aim was to agree a policy for data release of these realignments that both respected the original data producers' work and made the data available for others to see where the analysis came from. Ewan proposed that we should submit the realignments to UCSC (and GEO) on the submission of the paper describing their analysis. There was an issue with GEO regarding ownership of the data, since this was in effect a reanalysis of other groups' data. We would need to work out with GEO a mechanism to block submit the alignments but with authorship remaining with the submitting group.

Action: Resolved to adopt the proposal that we should submit the realignments to UCSC (and GEO) en bloc on the submission of the paper describing their analysis, with authorship remaining with the submitters.

During this discussion, Kate mentioned that the next genome build, hg19 or GRCh37, was scheduled for release in May to coincide with the Biology of Genomes meeting. All data sets will need to be realigned to this build at some point. There followed a discussion about how to deal with this. Because the build would not be available until May and was unlikely to be mature in the UCSC browser until 6 months after that, there was no enthusiasm for switching to this build at this stage or any time soon, which in any case is currently impossible.

Action: Resolved to continue working with hg18 for the current analysis.

Mike suggested that there may later be a situation whereby if we were delayed we might want to switch to hg19 and we should bear this in mind.

6. A.O.B.

Elise Feingold mentioned that NHGRI were beginning to work on the draft agenda for the March meeting. People with thoughts and ideas on the format of the meeting should communicate them to NHGRI. Ewan raised the question of whether it would be possible for there to be a session (afternoon) so that AWG members could convene together. Elise also requested input and recommendations on how to deliver integrative analysis sessions at the meeting.

Action: Suggestions for the March meeting format to Peter/Elise.

The next calls will be

  • 15th Jan noon EST = AWG call
  • 22nd Jan noon EST = AWG call
  • 5th Feb noon EST = AWG_CC call