Epigenetics

Marinov GK, Kundaje A, Park PJ, Wold BJ. Large-scale quality analysis of published ChIP-seq data. G3 2014;4(2):209-23.

ChIP-seq has become the primary method for identifying in vivo protein-DNA interactions on a genome-wide scale, with nearly 800 publications involving the technique appearing in PubMed as of December 2012. Individually and in aggregate, these data are an important and information-rich resource. However, uncertainties about data quality confound their use by the wider research community. Recently, the Encyclopedia of DNA Elements (ENCODE) project developed and applied metrics to objectively measure ChIP-seq data quality. The ENCODE quality analysis was useful for flagging datasets for closer inspection, eliminating or replacing poor data, and for driving changes in experimental pipelines. There had been no similarly systematic quality analysis of the large and disparate body of published ChIP-seq profiles. Here, we report a uniform analysis of vertebrate transcription factor ChIP-seq datasets in the Gene Expression Omnibus (GEO) repository as of April 1, 2012. The majority (55%) of datasets scored as being highly successful, but a substantial minority (20%) were of apparently poor quality, and another ∼25% were of intermediate quality. We discuss how different uses of ChIP-seq data are affected by specific aspects of data quality, and we highlight exceptional instances for which the metric values should not be taken at face value. Unexpectedly, we discovered that a significant subset of control datasets (i.e., no immunoprecipitation and mock immunoprecipitation samples) display an enrichment structure similar to successful ChIP-seq data. This can, in turn, affect peak calling and data interpretation. Published datasets identified here as high-quality comprise a large group that users can draw on for large-scale integrated analysis. 
In the future, ChIP-seq quality assessment similar to that used here could guide experimentalists at early stages in a study, provide useful input in the publication process, and be used to stratify ChIP-seq data for different community-wide uses.

Gorchakov AA, Alekseyenko AA, Kharchenko P, Park PJ, Kuroda MI. Long-range spreading of dosage compensation in Drosophila captures transcribed autosomal genes inserted on X. Genes Dev 2009;23(19):2266-71.

Dosage compensation in Drosophila melanogaster males is achieved via targeting of male-specific lethal (MSL) complex to X-linked genes. This is proposed to involve sequence-specific recognition of the X at approximately 150-300 chromatin entry sites, and subsequent spreading to active genes. Here we ask whether the spreading step requires transcription and is sequence-independent. We find that MSL complex binds, acetylates, and up-regulates autosomal genes inserted on X, but only if transcriptionally active. We conclude that a long-sought specific DNA sequence within X-linked genes is not obligatory for MSL binding. Instead, linkage and transcription play the pivotal roles in MSL targeting irrespective of gene origin and DNA sequence.

Larschan E*, Bishop EP*, Kharchenko PV, Core LJ, Lis JT, Park PJ**, Kuroda MI**. X chromosome dosage compensation via enhanced transcriptional elongation in Drosophila. Nature 2011;471(7336):115-8.

The evolution of sex chromosomes has resulted in numerous species in which females inherit two X chromosomes but males have a single X, thus requiring dosage compensation. MSL (Male-specific lethal) complex increases transcription on the single X chromosome of Drosophila males to equalize expression of X-linked genes between the sexes. The biochemical mechanisms used for dosage compensation must function over a wide dynamic range of transcription levels and differential expression patterns. It has been proposed that the MSL complex regulates transcriptional elongation to control dosage compensation, a model subsequently supported by mapping of the MSL complex and MSL-dependent histone 4 lysine 16 acetylation to the bodies of X-linked genes in males, with a bias towards 3' ends. However, experimental analysis of MSL function at the mechanistic level has been challenging owing to the small magnitude of the chromosome-wide effect and the lack of an in vitro system for biochemical analysis. Here we use global run-on sequencing (GRO-seq) to examine the specific effect of the MSL complex on RNA Polymerase II (RNAP II) on a genome-wide level. Results indicate that the MSL complex enhances transcription by facilitating the progression of RNAP II across the bodies of active X-linked genes. Improving transcriptional output downstream of typical gene-specific controls may explain how dosage compensation can be imposed on the diverse set of genes along an entire chromosome.

Woo CJ, Kharchenko PV, Daheron L, Park PJ, Kingston RE. Variable requirements for DNA-binding proteins at polycomb-dependent repressive regions in human HOX clusters. Mol Cell Biol 2013;33(16):3274-85.

Polycomb group (PcG)-mediated repression is an evolutionarily conserved process critical for cell fate determination and maintenance of gene expression during embryonic development. However, the mechanisms underlying PcG recruitment in mammals remain unclear since few regulatory sites have been identified. We report two novel prospective PcG-dependent regulatory elements within the human HOXB and HOXC clusters and compare their repressive activities to a previously identified element in the HOXD cluster. These regions recruited the PcG proteins BMI1 and SUZ12 to a reporter construct in mesenchymal stem cells and conferred repression that was dependent upon PcG expression. Furthermore, we examined the potential of two DNA-binding proteins, JARID2 and YY1, to regulate PcG activity at these three elements. JARID2 has differential requirements, whereas YY1 appears to be required for repressive activity at all 3 sites. We conclude that distinct elements of the mammalian HOX clusters can recruit components of the PcG complexes and confer repression, similar to what has been seen in Drosophila. These elements, however, have diverse requirements for binding factors, which, combined with previous data on other loci, speaks to the complexity of PcG targeting in mammals.

Namekawa SH, Park PJ, Zhang L-F, Shima JE, McCarrey JR, Griswold MD, Lee JT. Postmeiotic sex chromatin in the male germline of mice. Curr Biol 2006;16(7):660-7.

In mammals, the X and Y chromosomes are subject to meiotic sex chromosome inactivation (MSCI) during prophase I in the male germline, but their status thereafter is currently unclear. An abundance of X-linked spermatogenesis genes has spawned the view that the X must be active. On the other hand, the idea that the imprinted paternal X of the early embryo may be preinactivated by MSCI suggests that silencing may persist longer. To clarify this issue, we establish a comprehensive X-expression profile during mouse spermatogenesis. Here, we discover that the X and Y occupy a novel compartment in the postmeiotic spermatid and adopt a non-Rabl configuration. We demonstrate that this postmeiotic sex chromatin (PMSC) persists throughout spermiogenesis into mature sperm and exhibits epigenetic similarity to the XY body. In the spermatid, 87% of X-linked genes remain suppressed postmeiotically, while autosomes are largely active. We conclude that chromosome-wide X silencing continues from meiosis to the end of spermiogenesis, and we discuss implications for proposed mechanisms of imprinted X-inactivation.

Riddle NC*, Minoda A*, Kharchenko PV*, Alekseyenko AA, Schwartz YB, Tolstorukov MY, Gorchakov AA, Jaffe JD, Kennedy C, Linder-Basso D, Peach SE, Shanower G, Zheng H, Kuroda MI, Pirrotta V, Park PJ, Elgin SCR**, Karpen GH**. Plasticity in patterns of histone modifications and chromosomal proteins in Drosophila heterochromatin. Genome Res 2011;21(2):147-63.

Eukaryotic genomes are packaged in two basic forms, euchromatin and heterochromatin. We have examined the composition and organization of Drosophila melanogaster heterochromatin in different cell types using ChIP-array analysis of histone modifications and chromosomal proteins. As anticipated, the pericentric heterochromatin and chromosome 4 are on average enriched for the "silencing" marks H3K9me2, H3K9me3, HP1a, and SU(VAR)3-9, and are generally depleted for marks associated with active transcription. The locations of the euchromatin-heterochromatin borders identified by these marks are similar in animal tissues and most cell lines, although the amount of heterochromatin is variable in some cell lines. Combinatorial analysis of chromatin patterns reveals distinct profiles for euchromatin, pericentric heterochromatin, and the 4th chromosome. Both silent and active protein-coding genes in heterochromatin display complex patterns of chromosomal proteins and histone modifications; a majority of the active genes exhibit both "activation" marks (e.g., H3K4me3 and H3K36me3) and "silencing" marks (e.g., H3K9me2 and HP1a). The hallmark of active genes in heterochromatic domains appears to be a loss of H3K9 methylation at the transcription start site. We also observe complex epigenomic profiles of intergenic regions, repeated transposable element (TE) sequences, and genes in the heterochromatic extensions. An unexpectedly large fraction of sequences in the euchromatic chromosome arms exhibits a heterochromatic chromatin signature, which differs in size, position, and impact on gene expression among cell types. We conclude that patterns of heterochromatin/euchromatin packaging show greater complexity and plasticity than anticipated. This comprehensive analysis provides a foundation for future studies of gene activity and chromosomal functions that are influenced by or dependent upon heterochromatin.

Alekseyenko AA, Ellison CE, Gorchakov AA, Zhou Q, Kaiser VB, Toda N, Walton Z, Peng S, Park PJ, Bachtrog D, Kuroda MI. Conservation and de novo acquisition of dosage compensation on newly evolved sex chromosomes in Drosophila. Genes Dev 2013;27(8):853-8.

Dosage compensation has arisen in response to the evolution of distinct male (XY) and female (XX) karyotypes. In Drosophila melanogaster, the MSL complex increases male X transcription approximately twofold. X-specific targeting is thought to occur through sequence-dependent binding to chromatin entry sites (CESs), followed by spreading in cis to active genes. We tested this model by asking how newly evolving sex chromosome arms in Drosophila miranda acquired dosage compensation. We found evidence for the creation of new CESs, with the analogous sequence and spacing as in D. melanogaster, providing strong support for the spreading model in the establishment of dosage compensation.

Peng S, Kuroda MI, Park PJ. Quantized correlation coefficient for measuring reproducibility of ChIP-chip data. BMC Bioinformatics 2010;11:399.

BACKGROUND: Chromatin immunoprecipitation followed by microarray hybridization (ChIP-chip) is used to study protein-DNA interactions and histone modifications on a genome-scale. To ensure data quality, these experiments are usually performed in replicates, and a correlation coefficient between replicates is often used to assess reproducibility. However, the correlation coefficient can be misleading because it is affected not only by the reproducibility of the signal but also by the amount of binding signal present in the data. RESULTS: We develop the Quantized correlation coefficient (QCC) that is much less dependent on the amount of signal. This involves discretization of data into a set of quantiles (quantization), a merging procedure to group the background probes, and recalculation of the Pearson correlation coefficient. This procedure reduces the influence of the background noise on the statistic, which then properly focuses more on the reproducibility of the signal. The performance of this procedure is tested in both simulated and real ChIP-chip data. For replicates with different levels of enrichment over background and coverage, we find that QCC reflects reproducibility more accurately and is more robust than the standard Pearson or Spearman correlation coefficients. The quantization and the merging procedure can also suggest a proper quantile threshold for separating signal from background for further analysis. CONCLUSIONS: To measure reproducibility of ChIP-chip data correctly, a correlation coefficient that is robust to the amount of signal present should be used. QCC is one such measure. The QCC statistic can also be applied in a variety of other contexts for measuring reproducibility, including analysis of array CGH data for DNA copy number and gene expression data.
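The three steps named in this abstract (quantization, merging of background quantiles, and recalculation of the Pearson coefficient) can be illustrated with a rough Python sketch. This is not the published QCC implementation: the function names, the rank-based binning, the default of 10 bins, and especially the fixed `background_bins` cutoff are assumptions made for illustration, whereas the abstract indicates that the paper's merging procedure helps suggest the threshold from the data.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def quantize(values, n_bins):
    """Replace each value by its rank-based quantile bin, 0 .. n_bins - 1.

    Ties are broken by input order, a simplification made for this sketch.
    """
    order = sorted(range(len(values)), key=values.__getitem__)
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def quantized_correlation(rep1, rep2, n_bins=10, background_bins=7):
    """Illustrative quantized correlation between two replicates.

    Assumption: the lowest `background_bins` quantile bins are treated as
    background and merged into a single level (0); higher bins keep their
    order as levels 1, 2, ... before Pearson is recomputed.
    """
    def levels(values):
        return [max(b - (background_bins - 1), 0)
                for b in quantize(values, n_bins)]

    return pearson(levels(rep1), levels(rep2))


# Hypothetical probe intensities: noisy background plus three bound probes.
rep_a = [0.1, 0.3, 0.2, 0.4, 0.5, 0.6, 0.7, 10, 20, 30]
rep_b = [0.7, 0.1, 0.5, 0.2, 0.6, 0.3, 0.4, 11, 19, 31]
print("raw Pearson:", pearson(rep_a, rep_b))
print("quantized:  ", quantized_correlation(rep_a, rep_b))
```

Because the background probes collapse to one level, their shuffled ordering between the two hypothetical replicates no longer contributes to the statistic; only the agreement of the upper quantiles does.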

Kang H, McElroy KA, Jung YL, Alekseyenko AA, Zee BM, Park PJ, Kuroda MI. Sex comb on midleg (Scm) is a functional link between PcG-repressive complexes in Drosophila. Genes Dev 2015;29(11):1136-50.

The Polycomb group (PcG) proteins are key regulators of development in Drosophila and are strongly implicated in human health and disease. How PcG complexes form repressive chromatin domains remains unclear. Using cross-linked affinity purifications of BioTAP-Polycomb (Pc) or BioTAP-Enhancer of zeste [E(z)], we captured all PcG-repressive complex 1 (PRC1) or PRC2 core components and Sex comb on midleg (Scm) as the only protein strongly enriched with both complexes. Although previously not linked to PRC2, we confirmed direct binding of Scm and PRC2 using recombinant protein expression and colocalization of Scm with PRC1, PRC2, and H3K27me3 in embryos and cultured cells using ChIP-seq (chromatin immunoprecipitation [ChIP] combined with deep sequencing). Furthermore, we found that RNAi knockdown of Scm and overexpression of the dominant-negative Scm-SAM (sterile α motif) domain both affected the binding pattern of E(z) on polytene chromosomes. Aberrant localization of the Scm-SAM domain in long contiguous regions on polytene chromosomes revealed its independent ability to spread on chromatin, consistent with its previously described ability to oligomerize in vitro. Pull-downs of BioTAP-Scm captured PRC1 and PRC2 and additional repressive complexes, including PhoRC, LINT, and CtBP. We propose that Scm is a key mediator connecting PRC1, PRC2, and transcriptional silencing. Combined with previous structural and genetic analyses, our results strongly suggest that Scm coordinates PcG complexes and polymerizes to produce broad domains of PcG silencing.

Sohn K-A*, Ho JWK*, Djordjevic D, Jeong H-H, Park PJ**, Kim JH**. hiHMM: Bayesian non-parametric joint inference of chromatin state maps. Bioinformatics 2015;31(13):2066-74.

MOTIVATION: Genome-wide mapping of chromatin states is essential for defining regulatory elements and inferring their activities in eukaryotic genomes. A number of hidden Markov model (HMM)-based methods have been developed to infer chromatin state maps from genome-wide histone modification data for an individual genome. To perform a principled comparison of evolutionarily distant epigenomes, we must consider species-specific biases such as differences in genome size, strength of signal enrichment and co-occurrence patterns of histone modifications. RESULTS: Here, we present a new Bayesian non-parametric method called hierarchically linked infinite HMM (hiHMM) to jointly infer chromatin state maps in multiple genomes (different species, cell types and developmental stages) using genome-wide histone modification data. This flexible framework provides a new way to learn a consistent definition of chromatin states across multiple genomes, thus facilitating a direct comparison among them. We demonstrate the utility of this method using synthetic data as well as multiple modENCODE ChIP-seq datasets. CONCLUSION: The hierarchical and Bayesian non-parametric formulation in our approach is an important extension to the current set of methodologies for comparative chromatin landscape analysis. AVAILABILITY AND IMPLEMENTATION: Source codes are available at https://github.com/kasohn/hiHMM. Chromatin data are available at http://encode-x.med.harvard.edu/data_sets/chromatin/.

Biagioli M*, Ferrari F*, Mendenhall EM, Zhang Y, Erdin S, Vijayvargia R, Vallabh SM, Solomos N, Manavalan P, Ragavendran A, Ozsolak F, Lee JM, Talkowski ME, Gusella JF, Macdonald ME, Park PJ, Seong IS. Htt CAG repeat expansion confers pleiotropic gains of mutant huntingtin function in chromatin regulation. Hum Mol Genet 2015;

The CAG repeat expansion in the Huntington's disease gene HTT extends a polyglutamine tract in mutant huntingtin that enhances its ability to facilitate polycomb repressive complex 2 (PRC2). To gain insight into this dominant gain of function, we mapped histone modifications genome-wide across an isogenic panel of mouse embryonic stem cell (ESC) and neuronal progenitor cell (NPC) lines, comparing the effects of Htt null and different size Htt CAG mutations. We found that Htt is required in ESC for the proper deposition of histone H3K27me3 at a subset of 'bivalent' loci but in NPC it is needed at 'bivalent' loci for both the proper maintenance and the appropriate removal of this mark. In contrast, Htt CAG size, though changing histone H3K27me3, is prominently associated with altered histone H3K4me3 at 'active' loci. The sets of ESC and NPC genes with altered histone marks delineated by the lack of huntingtin or the presence of mutant huntingtin, though distinct, are enriched in similar pathways with apoptosis specifically highlighted for the CAG mutation. Thus, the manner by which huntingtin function facilitates PRC2 may afford mutant huntingtin with multiple opportunities to impinge upon the broader machinery that orchestrates developmentally appropriate chromatin status.

Ho JWK*, Jung YL*, Liu T*, Alver BH, Lee S, Ikegami K, Sohn K-A, Minoda A, Tolstorukov MY, Appert A, Parker SCJ, Gu T, Kundaje A, Riddle NC, Bishop EP, Egelhofer TA, Hu S'en S, Alekseyenko AA, Rechtsteiner A, Asker D, Belsky JA, Bowman SK, Chen BQ, Chen RA-J, Day DS, Dong Y, Dose AC, Duan X, Epstein CB, Ercan S, Feingold EA, Ferrari F, Garrigues JM, Gehlenborg N, Good PJ, Haseley P, He D, Herrmann M, Hoffman MM, Jeffers TE, Kharchenko PV, Kolasinska-Zwierz P, Kotwaliwale CV, Kumar N, Langley SA, Larschan EN, Latorre I, Libbrecht MW, Lin X, Park R, Pazin MJ, Pham HN, Plachetka A, Qin B, Schwartz YB, Shoresh N, Stempor P, Vielle A, Wang C, Whittle CM, Xue H, Kingston RE, Kim JH, Bernstein BE, Dernburg AF, Pirrotta V, Kuroda MI, Noble WS, Tullius TD, Kellis M, MacAlpine DM**, Strome S**, Elgin SCR**, Liu XS**, Lieb JD**, Ahringer J**, Karpen GH**, Park PJ**. Comparative analysis of metazoan chromatin organization. Nature 2014;512(7515):449-52.

Genome function is dynamically regulated in part by chromatin, which consists of the histones, non-histone proteins and RNA molecules that package DNA. Studies in Caenorhabditis elegans and Drosophila melanogaster have contributed substantially to our understanding of molecular mechanisms of genome function in humans, and have revealed conservation of chromatin components and mechanisms. Nevertheless, the three organisms have markedly different genome sizes, chromosome architecture and gene organization. On human and fly chromosomes, for example, pericentric heterochromatin flanks single centromeres, whereas worm chromosomes have dispersed heterochromatin-like regions enriched in the distal chromosomal 'arms', and centromeres distributed along their lengths. To systematically investigate chromatin organization and associated gene regulation across species, we generated and analysed a large collection of genome-wide chromatin data sets from cell lines and developmental stages in worm, fly and human. Here we present over 800 new data sets from our ENCODE and modENCODE consortia, bringing the total to over 1,400. Comparison of combinatorial patterns of histone modifications, nuclear lamina-associated domains, organization of large-scale topological domains, chromatin environment at promoters and enhancers, nucleosome positioning, and DNA replication patterns reveals many conserved features of chromatin organization among the three organisms. We also find notable differences in the composition and locations of repressive chromatin. These data sets and analyses provide a rich resource for comparative and species-specific investigations of chromatin composition, organization and function.

Kann M, Ettou S*, Jung YL*, Lenz MO, Taglienti ME, Park PJ, Schermer B, Benzing T, Kreidberg JA. Genome-Wide Analysis of Wilms' Tumor 1-Controlled Gene Expression in Podocytes Reveals Key Regulatory Mechanisms. J Am Soc Nephrol 2015;26(9):2097-104.

The transcription factor Wilms' tumor suppressor 1 (WT1) is key to podocyte development and viability; however, WT1 transcriptional networks in podocytes remain elusive. We provide a comprehensive analysis of the genome-wide WT1 transcriptional network in podocytes in vivo using chromatin immunoprecipitation followed by sequencing (ChIPseq) and RNA sequencing techniques. Our data show a specific role for WT1 in regulating the podocyte-specific transcriptome through binding to both promoters and enhancers of target genes. Furthermore, we inferred a podocyte transcription factor network consisting of WT1, LMX1B, TCF21, Fox-class and TEAD family transcription factors, and MAFB that uses tissue-specific enhancers to control podocyte gene expression. In addition to previously described WT1-dependent target genes, ChIPseq identified novel WT1-dependent signaling systems. These targets included components of the Hippo signaling system, underscoring the power of genome-wide transcriptional-network analyses. Together, our data elucidate a comprehensive gene regulatory network in podocytes suggesting that WT1 gene regulatory function and podocyte cell-type specification can best be understood in the context of transcription factor-regulatory element network interplay.

Merlo P, Frost B, Peng S, Yang YJ, Park PJ, Feany M. p53 prevents neurodegeneration by regulating synaptic genes. Proc Natl Acad Sci U S A 2014;111(50):18055-60.

DNA damage has been implicated in neurodegenerative disorders, including Alzheimer's disease and other tauopathies, but the consequences of genotoxic stress to postmitotic neurons are poorly understood. Here we demonstrate that p53, a key mediator of the DNA damage response, plays a neuroprotective role in a Drosophila model of tauopathy. Further, through a whole-genome ChIP-chip analysis, we identify genes controlled by p53 in postmitotic neurons. We genetically validate a specific pathway, synaptic function, in p53-mediated neuroprotection. We then demonstrate that the control of synaptic genes by p53 is conserved in mammals. Collectively, our results implicate synaptic function as a central target in p53-dependent protection from neurodegeneration.

West JA*, Cook A*, Alver BH, Stadtfeld M, Deaton AM, Hochedlinger K, Park PJ**, Tolstorukov MY**, Kingston RE**. Nucleosomal occupancy changes locally over key regulatory regions during cell differentiation and reprogramming. Nat Commun 2014;5:4719.

Chromatin structure determines DNA accessibility. We compare nucleosome occupancy in mouse and human embryonic stem cells (ESCs), induced-pluripotent stem cells (iPSCs) and differentiated cell types using MNase-seq. To address variability inherent in this technique, we developed a bioinformatic approach to identify regions of difference (RoD) in nucleosome occupancy between pluripotent and somatic cells. Surprisingly, most chromatin remains unchanged; a majority of rearrangements appear to affect a single nucleosome. RoDs are enriched at genes and regulatory elements, including enhancers associated with pluripotency and differentiation. RoDs co-localize with binding sites of key developmental regulators, including the reprogramming factors Klf4, Oct4/Sox2 and c-Myc. Nucleosomal landscapes in ESC enhancers are extensively altered, exhibiting lower nucleosome occupancy in pluripotent cells than in somatic cells. Most changes are reset during reprogramming. We conclude that changes in nucleosome occupancy are a hallmark of cell differentiation and reprogramming and likely identify regulatory regions essential for these processes.
