NOTE! This is a read-only copy of the ENCODE2 wiki.
Please go to the ENCODE3 wiki for current information.

Observations on TF

Preamble

This page is for "manual" observations about Transcription Factors. Note that the quality metrics are already on a different page (Anshul's metrics and SPOT quality).

Each TF that we make an observation about should have its own section; add sections alphabetically so the index works naturally.

Potential Outlier datasets

If you find an "interesting" observation/hypothesis related to one of these datasets, you might want to make sure it is not due to data quality issues.

SET 1: FAILED DATASETS These datasets have very poor signal and reproducibility. These should be avoided in analyses unless you have really strong evidence that your results are not due to poor data quality. Some of these datasets have been officially revoked after the Jan Freeze.

wgEncodeOpenChromChipGm12878CmycAlnRep0.bam_VS_wgEncodeOpenChromChipGm12878InputAln.bam.regionPeak.gz

wgEncodeOpenChromChipHelas3CmycAlnRep0.bam_VS_wgEncodeOpenChromChipHelas3InputAln.bam.regionPeak.gz

wgEncodeOpenChromChipHuvecCmycAlnRep1.bam_VS_wgEncodeOpenChromChipHuvecInputAln.bam.regionPeak.gz

wgEncodeHaibTfbsGm12878Egr1Pcr2xAlnRep0.bam_VS_wgEncodeHaibTfbsGm12878ControlPcr2xAlnRep0.bam.regionPeak.gz

wgEncodeSydhTfbsGm12878Gcn5StdAlnRep0.bam_VS_wgEncodeSydhTfbsGm12878InputStdAlnRep1.bam.regionPeak.gz

wgEncodeSydhTfbsHelas3Gcn5StdAlnRep0.bam_VS_wgEncodeSydhTfbsHelas3InputStdAlnRep1.bam.regionPeak.gz

wgEncodeSydhTfbsGm12878Irf3StdAlnRep0.bam_VS_wgEncodeSydhTfbsGm12878InputStdAlnRep1.bam.regionPeak.gz

wgEncodeUchicagoTfbsK562Enr4a1ControlAlnRep0.bam_VS_wgEncodeUchicagoTfbsK562InputControlAlnRep0.bam.regionPeak.gz

wgEncodeSydhTfbsGm12878P300n15StdAlnRep0.bam_VS_wgEncodeSydhTfbsGm12878InputStdAlnRep1.bam.regionPeak.gz (One replicate was actually a TAF1 replicate)

wgEncodeSydhTfbsK562Pol2s2StdAlnRep0.bam_VS_wgEncodeSydhTfbsK562InputStdAlnRep0.bam.regionPeak.gz

wgEncodeSydhTfbsGm12878Spt20StdAlnRep0.bam_VS_wgEncodeSydhTfbsGm12878InputStdAlnRep1.bam.regionPeak.gz

wgEncodeSydhTfbsHelas3Spt20StdAlnRep0.bam_VS_wgEncodeSydhTfbsHelas3InputStdAlnRep1.bam.regionPeak.gz

wgEncodeSydhTfbsHepg2Srebp1PravastStdAlnRep0.bam_VS_wgEncodeSydhTfbsHepg2InputPravastStdAlnRep0.bam.regionPeak.gz

wgEncodeSydhTfbsHepg2Srebp2PravastStdAlnRep0.bam_VS_wgEncodeSydhTfbsHepg2InputPravastStdAlnRep0.bam.regionPeak.gz

wgEncodeSydhTfbsGm12878Stat1StdAlnRep0.bam_VS_wgEncodeSydhTfbsGm12878InputStdAlnRep1.bam.regionPeak.gz

wgEncodeSydhTfbsK562Xrcc4StdAlnRep0.bam_VS_wgEncodeSydhTfbsK562InputStdAlnRep0.bam.regionPeak.gz

wgEncodeSydhTfbsHek293bElk4UcdAlnRep0.bam_VS_wgEncodeSydhTfbsHek293InputUcdAln.bam.regionPeak.gz

wgEncodeUwTfbsHl60CtcfStdAlnRep1.bam_VS_wgEncodeUwTfbsHl60InputStdAlnRep1.bam.regionPeak.gz

wgEncodeUwTfbsGm12801CtcfStdAlnRep1.bam_VS_wgEncodeUwTfbsGm12801InputStdAlnRep1.bam.regionPeak.gz

wgEncodeHaibTfbsK562Hey1Pcr1xAlnRep0.bam_VS_wgEncodeHaibTfbsK562ControlPcr1xAlnRep0.bam.regionPeak.gz (Antibody provided by SantaCruz biotech is suspicious. Looks very much like TAF1 or POL2)

wgEncodeHaibTfbsHepg2Hey1V0416101AlnRep0.bam_VS_wgEncodeHaibTfbsHepg2ControlPcr1xAlnRep0.bam.regionPeak.gz (Antibody provided by SantaCruz biotech is suspicious. Looks very much like TAF1 or POL2)

wgEncodeSydhTfbsGm12878NfkbTnfaStdAlnRep0.bam_VS_wgEncodeSydhTfbsGm12878InputStdAlnRep1.bam.regionPeak.gz (One replicate was actually a NFKB-no treatment replicate)
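
The file names above follow a `<treatment>.bam_VS_<control>.bam.regionPeak.gz` pattern, so an analysis script can check its inputs against the failed set before using them. The following is only a minimal sketch: the `FAILED` set holds two entries copied from the list above as an illustration (a real check would load the full list), and the helper names `split_pair` and `is_failed` are hypothetical, not part of any ENCODE tool.

```python
import os

# Illustrative subset of SET 1; a real script would load the complete list.
FAILED = {
    "wgEncodeOpenChromChipGm12878CmycAlnRep0.bam_VS_wgEncodeOpenChromChipGm12878InputAln.bam.regionPeak.gz",
    "wgEncodeSydhTfbsK562Pol2s2StdAlnRep0.bam_VS_wgEncodeSydhTfbsK562InputStdAlnRep0.bam.regionPeak.gz",
}

SUFFIX = ".regionPeak.gz"


def split_pair(name):
    """Split a peak file name into (treatment BAM, control BAM)."""
    base = os.path.basename(name)
    if not base.endswith(SUFFIX) or "_VS_" not in base:
        raise ValueError("unexpected peak file name: %s" % base)
    treatment, control = base[: -len(SUFFIX)].split("_VS_", 1)
    return treatment, control


def is_failed(name, failed=FAILED):
    """Return True if the file (given with or without a directory) is in the failed set."""
    return os.path.basename(name) in failed
```

For example, `is_failed("/some/dir/" + name)` could be used to skip a dataset when looping over a directory of peak files; the directory path here is made up.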

SET 2: MEDIUM DATA QUALITY The datasets below are generally fine and can definitely be used in analyses. However, many of them have a low number of reliable peaks (< 1000) and low signal-to-noise. The low number of peaks alone could bias some analyses. NOTE: Some of the low numbers are due to the biology of the factor, i.e. the TF does in fact bind only a few locations. Other datasets are sub-optimal in the sense that they likely detect only a subset of the complete binding landscape of the factor, so deeper sequencing and/or better antibodies would very likely substantially increase the number of peaks for some of these datasets. In integrative analyses, you should take these characteristics into account.

wgEncodeSydhTfbsK562Atf3StdAlnRep0.bam_VS_wgEncodeSydhTfbsK562InputStdAlnRep0.bam.regionPeak.gz

wgEncodeSydhTfbsHelas3Baf170IggmusAlnRep0.bam_VS_wgEncodeSydhTfbsHelas3InputIggmusAlnRep0.bam.regionPeak.gz

wgEncodeHaibTfbsH1hescBcl11aPcr1xAlnRep0.bam_VS_wgEncodeHaibTfbsH1hescControlPcr1xAlnRep0.bam.regionPeak.gz

wgEncodeSydhTfbsHelas3Bdp1StdAlnRep0.bam_VS_wgEncodeSydhTfbsHelas3InputStdAlnRep1.bam.regionPeak.gz

wgEncodeSydhTfbsK562Bdp1StdAlnRep0.bam_VS_wgEncodeSydhTfbsK562InputStdAlnRep0.bam.regionPeak.gz

wgEncodeSydhTfbsGm12878Brca1cIggmusAlnRep0.bam_VS_wgEncodeSydhTfbsGm12878InputIggmusAlnRep0.bam.regionPeak.gz

wgEncodeSydhTfbsHelas3Brf1StdAlnRep0.bam_VS_wgEncodeSydhTfbsHelas3InputStdAlnRep1.bam.regionPeak.gz

wgEncodeSydhTfbsK562Brf1StdAlnRep0.bam_VS_wgEncodeSydhTfbsK562InputStdAlnRep0.bam.regionPeak.gz

wgEncodeSydhTfbsHelas3Brf2StdAlnRep0.bam_VS_wgEncodeSydhTfbsHelas3InputStdAlnRep1.bam.regionPeak.gz

wgEncodeSydhTfbsK562Brf2StdAlnRep0.bam_VS_wgEncodeSydhTfbsK562InputStdAlnRep0.bam.regionPeak.gz

wgEncodeSydhTfbsHelas3Brg1IggmusAlnRep0.bam_VS_wgEncodeSydhTfbsHelas3InputIggmusAlnRep0.bam.regionPeak.gz

wgEncodeSydhTfbsHepg2ErraForsklnStdAlnRep0.bam_VS_wgEncodeSydhTfbsHepg2InputForsklnStdAlnRep0.bam.regionPeak.gz

wgEncodeSydhTfbsK562bGata1UcdAlnRep0.bam_VS_wgEncodeSydhTfbsK562bInputUcdAlnRep1.bam.regionPeak.gz

wgEncodeSydhTfbsHepg2Grp20ForsklnStdAlnRep0.bam_VS_wgEncodeSydhTfbsHepg2InputForsklnStdAlnRep0.bam.regionPeak.gz

wgEncodeUchicagoTfbsK562Ehdac8ControlAlnRep0.bam_VS_wgEncodeUchicagoTfbsK562InputControlAlnRep0.bam.regionPeak.gz

wgEncodeSydhTfbsHepg2Hsf1ForsklnStdAlnRep0.bam_VS_wgEncodeSydhTfbsHepg2InputForsklnStdAlnRep0.bam.regionPeak.gz

wgEncodeSydhTfbsHepg2Irf3IggrabAlnRep0.bam_VS_wgEncodeSydhTfbsHepg2InputIggrabAln.bam.regionPeak.gz

wgEncodeSydhTfbsGm12878JundStdAlnRep0.bam_VS_wgEncodeSydhTfbsGm12878InputStdAlnRep1.bam.regionPeak.gz

wgEncodeSydhTfbsK562JundStdAlnRep0.bam_VS_wgEncodeSydhTfbsK562InputStdAlnRep0.bam.regionPeak.gz

wgEncodeSydhTfbsGm12878MaxStdAlnRep0.bam_VS_wgEncodeSydhTfbsGm12878InputStdAlnRep1.bam.regionPeak.gz

wgEncodeSydhTfbsK562NelfeStdAlnRep0.bam_VS_wgEncodeSydhTfbsK562InputStdAlnRep0.bam.regionPeak.gz

wgEncodeSydhTfbsGm12878Nfe2hStdAlnRep0.bam_VS_wgEncodeSydhTfbsGm12878InputStdAlnRep1.bam.regionPeak.gz

wgEncodeSydhTfbsGm10847NfkbIggrabAlnRep0.bam_VS_wgEncodeSydhTfbsGm10847InputIggrabAlnRep0.bam.regionPeak.gz

wgEncodeSydhTfbsGm12892NfkbIggrabAlnRep0.bam_VS_wgEncodeSydhTfbsGm12892InputIggrabAlnRep0.bam.regionPeak.gz

wgEncodeSydhTfbsGm18526NfkbIggrabAlnRep0.bam_VS_wgEncodeSydhTfbsGm18526InputIggrabAlnRep0.bam.regionPeak.gz

wgEncodeSydhTfbsHepg2Pgc1aForsklnStdAlnRep0.bam_VS_wgEncodeSydhTfbsHepg2InputForsklnStdAlnRep0.bam.regionPeak.gz

wgEncodeSydhTfbsGm10847Pol2IggmusAlnRep1.bam_VS_wgEncodeSydhTfbsGm10847InputIggmusAlnRep0.bam.regionPeak.gz

wgEncodeSydhTfbsHek293bPol2StdAlnRep0.bam_VS_wgEncodeSydhTfbsHek293bInputStdAlnRep1.bam.regionPeak.gz

wgEncodeSydhTfbsHepg2Pol2PravastStdAlnRep0.bam_VS_wgEncodeSydhTfbsHepg2InputPravastStdAlnRep0.bam.regionPeak.gz

wgEncodeBroadHistoneHelas3Pol2bStdAlnRep0.bam_VS_wgEncodeBroadHistoneHelas3ControlStdAlnRep0.bam.regionPeak.gz

wgEncodeBroadHistoneHuvecPol2bStdAlnRep0.bam_VS_wgEncodeBroadHistoneHuvecControlStdAlnRep0.bam.regionPeak.gz

wgEncodeBroadHistoneNhekPol2bStdAlnRep0.bam_VS_wgEncodeBroadHistoneNhekControlStdAlnRep0.bam.regionPeak.gz

wgEncodeSydhTfbsGm12878Pol3StdAlnRep0.bam_VS_wgEncodeSydhTfbsGm12878InputStdAlnRep1.bam.regionPeak.gz

wgEncodeSydhTfbsK562Pol3StdAlnRep0.bam_VS_wgEncodeSydhTfbsK562InputStdAlnRep0.bam.regionPeak.gz

wgEncodeSydhTfbsHepg2Srebp1InslnStdAlnRep0.bam_VS_wgEncodeSydhTfbsHepg2InputInslnStdAlnRep0.bam.regionPeak.gz

wgEncodeSydhTfbsGm12878Stat3IggmusAlnRep0.bam_VS_wgEncodeSydhTfbsGm12878InputIggmusAlnRep0.bam.regionPeak.gz

wgEncodeSydhTfbsGm12878Tr4StdAlnRep0.bam_VS_wgEncodeSydhTfbsGm12878InputStdAlnRep1.bam.regionPeak.gz

wgEncodeSydhTfbsK562bTr4UcdAlnRep0.bam_VS_wgEncodeSydhTfbsK562bInputUcdAlnRep1.bam.regionPeak.gz

wgEncodeSydhTfbsHelas3Znf274UcdAlnRep0.bam_VS_wgEncodeSydhTfbsHelas3InputUcdAln.bam.regionPeak.gz

wgEncodeSydhTfbsHepg2bZnf274UcdAlnRep0.bam_VS_wgEncodeSydhTfbsHepg2bInputUcdAlnRep1.bam.regionPeak.gz

wgEncodeSydhTfbsK562bZnf274UcdAlnRep0.bam_VS_wgEncodeSydhTfbsK562bInputUcdAlnRep1.bam.regionPeak.gz

wgEncodeSydhTfbsNt2d1Znf274UcdAlnRep0.bam_VS_wgEncodeSydhTfbsNt2d1InputUcdAlnRep1.bam.regionPeak.gz

wgEncodeSydhTfbsGm12878Zzz3StdAlnRep0.bam_VS_wgEncodeSydhTfbsGm12878InputStdAlnRep1.bam.regionPeak.gz

wgEncodeSydhTfbsHelas3Zzz3StdAlnRep0.bam_VS_wgEncodeSydhTfbsHelas3InputStdAlnRep1.bam.regionPeak.gz

wgEncodeSydhTfbsGm12878Yy1StdAlnRep0.bam_VS_wgEncodeSydhTfbsGm12878InputStdAlnRep1.bam.regionPeak.gz

wgEncodeSydhTfbsHek293bElk4UcdAlnRep0.bam_VS_wgEncodeSydhTfbsHek293InputUcdAln.bam.regionPeak.gz

wgEncodeSydhTfbsGm12878CfosStdAlnRep0.bam_VS_wgEncodeSydhTfbsGm12878InputStdAlnRep1.bam.regionPeak.gz

wgEncodeSydhTfbsK562CfosStdAlnRep0.bam_VS_wgEncodeSydhTfbsK562InputStdAlnRep0.bam.regionPeak.gz
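
To see whether a given dataset falls under the "< 1000 reliable peaks" caveat, one could count the entries in a peak file. This is a rough sketch under a strong assumption: the input is a gzipped, tab-delimited peak file that has already been thresholded to the reliable peaks, with one peak per line. A raw regionPeak file may hold a relaxed peak set, in which case its line count would not be the number of reliable peaks. The function names and the header prefixes that are skipped are illustrative choices.

```python
import gzip

LOW_PEAK_THRESHOLD = 1000  # cutoff quoted in the SET 2 description


def count_peaks(path):
    """Count non-empty lines that are not comment or track/browser header lines."""
    n = 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            stripped = line.strip()
            if stripped and not stripped.startswith(("#", "track", "browser")):
                n += 1
    return n


def has_few_peaks(path, threshold=LOW_PEAK_THRESHOLD):
    """Return True if the file holds fewer peaks than the threshold."""
    return count_peaks(path) < threshold
```

A low count alone does not say whether the cause is biology or data quality, so a flag from `has_few_peaks` is best treated as a prompt to look at the dataset more closely.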

ATF3

Ewan thinks this is a bad chip and/or outlier by histone analysis

BCL3

Ewan thinks this is a bad chip; so do Anshul and Peggy

BRF4

Ewan thinks this is a bad chip and/or an outlier by histone analysis

NR4A1

Definitely a failed chip (Anshul)

Ewan thinks this is a bad chip and/or an outlier by histone analysis

p300

Pouya: "Known" motif for p300 on http://www.broadinstitute.org/~pouyak/motif-disc/human/ likely does not represent a true DNA binding specificity for p300. The discovered motifs appear to be for cell-type specific regulators.

TR4

Ewan thinks this is a bad chip and/or outlier by histone analysis.

Peggy: Which TR4 dataset are we talking about? We have shown in our published paper (BMC Genomics. 2010 Dec 2;11:689.) that the K562 TR4 peaks are over the start site, in between the two nucleosome peaks. Does the histone analysis done by ENCODE not agree with this? I just looked at our TR4 data. We have peaks in 4 cell types and they all overlap significantly. The peaks are all right over the start site in K562, HepG2, GM12878, and HeLa. Note that these were all done so long ago that they are hg18; we should check to make sure the liftover worked correctly.

SIN3A

This shows high H3K79me2 signal. Mike notes this paper:


SIN3B and transcribed regions (SIN3B is a SIN3A paralog, I think):

Mol Cell Biol. 2011 Jan;31(1):54-62. Epub 2010 Nov 1. A novel mammalian complex containing Sin3B mitigates histone acetylation and RNA polymerase II progression within transcribed loci.


Jelinic P, Pellegrino J, David G.

Transcription requires the progression of RNA polymerase II (RNAP II) through a permissive chromatin structure. Recent studies of Saccharomyces cerevisiae have demonstrated that the yeast Sin3 protein contributes to the restoration of the repressed chromatin structure at actively transcribed loci. Yet, the mechanisms underlying the restoration of the repressive chromatin structure at transcribed loci and its significance in gene expression have not been investigated in mammals. We report here the identification of a mammalian complex containing the corepressor Sin3B, the histone deacetylase HDAC1, Mrg15, and the PHD finger-containing Pf1 and show that this complex plays important roles in regulation of transcription. We demonstrate that this complex localizes at discrete loci approximately 1 kb downstream of the transcription start site of transcribed genes, and this localization requires both Pf1's and Mrg15's interaction with chromatin. Inactivation of this mammalian complex promotes increased RNAP II progression within transcribed regions and subsequent increased transcription. Our results define a novel mammalian complex that contributes to the regulation of transcription and point to divergent uses of the Sin3 protein homologues throughout evolution in the modulation of transcription. PMID:21041482 PMCID: PMC3019848


Sin3 and H3K36 methylation

Mol Cancer. 2006 Jun 28;5:26. Identification and characterization of Smyd2: a split SET/MYND domain-containing histone H3 lysine 36-specific methyltransferase that interacts with the Sin3 histone deacetylase complex. Brown MA, Sims RJ 3rd, Gottlieb PD, Tucker PW. BACKGROUND: Disrupting the balance of histone lysine methylation alters the expression of genes involved in tumorigenesis including proto-oncogenes and cell cycle regulators. Methylation of lysine residues is commonly catalyzed by a family of proteins that contain the SET domain. Here, we report the identification and characterization of the SET domain-containing protein, Smyd2. RESULTS: Smyd2 mRNA is most highly expressed in heart and brain tissue, as demonstrated by northern analysis and in situ hybridization. Over-expressed Smyd2 localizes to the cytoplasm and the nucleus in 293T cells. Although accumulating evidence suggests that methylation of histone 3, lysine 36 (H3K36) is associated with actively transcribed genes, we show that the SET domain of Smyd2 mediates H3K36 dimethylation and that Smyd2 represses transcription from an SV40-luciferase reporter. Smyd2 associates specifically with the Sin3A histone deacetylase complex, which was recently linked to H3K36 methylation within the coding regions of active genes in yeast. Finally, we report that exogenous expression of Smyd2 suppresses cell proliferation. CONCLUSION: We propose that Sin3A-mediated deacetylation within the coding regions of active genes is directly linked to the histone methyltransferase activity of Smyd2. Moreover, Smyd2 appears to restrain cell proliferation, likely through direct modulation of chromatin structure. PMID:16805913 PMCID: PMC1524980


HDAC8

Definitely a failed chip (Anshul)

This is a histone deacetylase and has strongly biased histone modification patterns around it


HDACs and active gene bodies:

Cell. 2009 Sep 4;138(5):1019-31. Epub 2009 Aug 20. Genome-wide mapping of HATs and HDACs reveals distinct functions in active and inactive genes. Wang Z, Zang C, Cui K, Schones DE, Barski A, Peng W, Zhao K. Histone acetyltransferases (HATs) and deacetylases (HDACs) function antagonistically to control histone acetylation. As acetylation is a histone mark for active transcription, HATs have been associated with active and HDACs with inactive genes. We describe here genome-wide mapping of HATs and HDACs binding on chromatin and find that both are found at active genes with acetylated histones. Our data provide evidence that HATs and HDACs are both targeted to transcribed regions of active genes by phosphorylated RNA Pol II. Furthermore, the majority of HDACs in the human genome function to reset chromatin by removing acetylation at active genes. Inactive genes that are primed by MLL-mediated histone H3K4 methylation are subject to a dynamic cycle of acetylation and deacetylation by transient HAT/HDAC binding, preventing Pol II from binding to these genes but poising them for future activation. Silent genes without any H3K4 methylation signal show no evidence of being bound by HDACs. PMID:19698979 PMCID: PMC2750862