BACKGROUND: For many genes, RNA polymerase II stably pauses before transitioning to productive elongation. Although polymerase II pausing has been shown to be a mechanism for regulating transcriptional activation, the extent to which it is involved in control of mammalian gene expression and its relationship to chromatin structure remain poorly understood. RESULTS: Here, we analyze 85 RNA polymerase II chromatin immunoprecipitation (ChIP)-sequencing experiments from 35 different murine and human samples, as well as related genome-wide datasets, to gain new insights into the relationship between polymerase II pausing and gene regulation. Across cell and tissue types, paused genes (pausing index > 2) comprise approximately 60 % of expressed genes and are repeatedly associated with specific biological functions. Paused genes also have lower cell-to-cell expression variability. Increased pausing has a non-linear effect on gene expression levels, with moderately paused genes being expressed more highly than other paused genes. The highest gene expression levels are often achieved through a novel pause-release mechanism driven by high polymerase II initiation. In three datasets examining the impact of extracellular signals, genes responsive to stimulus have slightly lower pausing index on average than non-responsive genes, and rapid gene activation is linked to conditional pause-release. Both chromatin structure and local sequence composition near the transcription start site influence pausing, with divergent features between mammals and Drosophila. Most notably, in mammals pausing is positively correlated with histone H2A.Z occupancy at promoters. CONCLUSIONS: Our results provide new insights into the contribution of RNA polymerase II pausing in mammalian gene regulation and chromatin structure.
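As a rough illustration of the pausing-index cutoff mentioned above, the sketch below computes a pausing index as the ratio of Pol II ChIP-seq read density near the transcription start site to read density over the gene body. The abstract states only the cutoff (pausing index > 2); the density-ratio definition, the function names, and the example region sizes are assumptions made for illustration and may differ from the windows used in the study.

```python
def pausing_index(promoter_reads, promoter_len, body_reads, body_len):
    """Ratio of Pol II read density near the TSS to density over the gene body.

    Assumption: this density-ratio definition and the choice of windows are
    illustrative; the abstract does not specify them.
    """
    if promoter_len <= 0 or body_len <= 0:
        raise ValueError("region lengths must be positive")
    promoter_density = promoter_reads / promoter_len
    body_density = body_reads / body_len
    if body_density == 0:
        # No gene-body signal: the ratio is undefined or unbounded.
        return float("inf") if promoter_density > 0 else float("nan")
    return promoter_density / body_density


def is_paused(index, threshold=2.0):
    """Apply the cutoff from the abstract: paused when the index exceeds 2."""
    return index > threshold


# Hypothetical gene: 300 reads in a 300 bp promoter window,
# 1,000 reads over a 10,000 bp gene body.
example_index = pausing_index(300, 300, 1000, 10000)
example_paused = is_paused(example_index)
```

A gene with no gene-body reads returns infinity or NaN here rather than raising, so callers can decide how to treat unexpressed genes.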
Whole-genome sequencing data allow detection of copy number variation (CNV) at high resolution. However, estimation based on read coverage along the genome suffers from bias due to GC content and other factors. Here, we develop an algorithm called BIC-seq2 that combines normalization of the data at the nucleotide level and Bayesian information criterion-based segmentation to detect both somatic and germline CNVs accurately. Analysis of simulation data showed that this method outperforms existing methods. We apply this algorithm to low coverage whole-genome sequencing data from peripheral blood of nearly a thousand patients across eleven cancer types in The Cancer Genome Atlas (TCGA) to identify cancer-predisposing CNV regions. We confirm known regions and discover new ones including those covering KMT2C, GOLPH3, ERBB2 and PLAG1. Analysis of colorectal cancer genomes in particular reveals novel recurrent CNVs including deletions at two chromatin-remodeling genes RERE and NPM2. This method will be useful to many researchers interested in profiling CNVs from whole-genome sequencing data.
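To make the idea of Bayesian information criterion-based segmentation concrete, the sketch below shows a generic BIC merge test for two adjacent genomic bins under a simple Poisson read-count model. It is not the BIC-seq2 algorithm: the Poisson model, the function names, the use of the total read count as the number of observations, and the `penalty` multiplier are all assumptions chosen for illustration, and the nucleotide-level normalization step is not modeled.

```python
import math


def poisson_loglik(count, length, rate):
    """Poisson log-likelihood of `count` reads in a bin of `length` bp.

    The log(count!) term is omitted because it is identical in both models
    being compared and therefore cancels.
    """
    mean = rate * length
    if mean == 0:
        return 0.0 if count == 0 else float("-inf")
    return count * math.log(mean) - mean


def bic(loglik, n_params, n_obs, penalty=1.0):
    """BIC = -2 log L + penalty * k * log(n); lower is better."""
    return -2.0 * loglik + penalty * n_params * math.log(n_obs)


def should_merge(c1, l1, c2, l2, n_obs, penalty=1.0):
    """Return True if one shared rate explains both bins at least as well
    (by BIC) as two separate rates."""
    separate = (poisson_loglik(c1, l1, c1 / l1)
                + poisson_loglik(c2, l2, c2 / l2))
    pooled_rate = (c1 + c2) / (l1 + l2)
    merged = (poisson_loglik(c1, l1, pooled_rate)
              + poisson_loglik(c2, l2, pooled_rate))
    return bic(merged, 1, n_obs, penalty) <= bic(separate, 2, n_obs, penalty)


# Hypothetical bins of 1,000 bp each: similar coverage merges,
# a threefold coverage change does not.
similar = should_merge(100, 1000, 105, 1000, n_obs=205)
different = should_merge(100, 1000, 300, 1000, n_obs=400)
```

Repeating such a test over neighboring bins until no merge lowers the BIC yields a segmentation; raising `penalty` favors fewer, longer segments.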
Although exome sequencing data are generated primarily to detect single-nucleotide variants and indels, they can also be used to identify a subset of genomic rearrangements whose breakpoints are located in or near exons. Using >4,600 tumor and normal pairs across 15 cancer types, we identified over 9,000 high confidence somatic rearrangements, including a large number of gene fusions. We find that the 5' fusion partners of functional fusions are often housekeeping genes, whereas the 3' fusion partners are enriched in tyrosine kinases. We establish the oncogenic potential of ROR1-DNAJC6 and CEP85L-ROS1 fusions by showing that they can promote cell proliferation in vitro and tumor formation in vivo. Furthermore, we found that ∼4% of the samples have massively rearranged chromosomes, many of which are associated with upregulation of oncogenes such as ERBB2 and TERT. Although the sensitivity of detecting structural alterations from exomes is considerably lower than that from whole genomes, this approach will be fruitful for the multitude of exomes that have been and will be generated, both in cancer and in other diseases.
Chromatin accessibility plays a fundamental role in gene regulation. Nucleosome placement, usually measured by quantifying protection of DNA from enzymatic digestion, can regulate accessibility. We introduce a metric that uses micrococcal nuclease (MNase) digestion in a novel manner to measure chromatin accessibility by combining information from several digests of increasing depths. This metric, MACC (MNase accessibility), quantifies the inherent heterogeneity of nucleosome accessibility in which some nucleosomes are seen preferentially at high MNase and some at low MNase. MACC interrogates each genomic locus, measuring both nucleosome location and accessibility in the same assay. MACC can be performed either with or without a histone immunoprecipitation step, and thereby compares histone and non-histone protection. We find that changes in accessibility at enhancers, promoters and other regulatory regions do not correlate with changes in nucleosome occupancy. Moreover, high nucleosome occupancy does not necessarily preclude high accessibility, which reveals novel principles of chromatin regulation.
BACKGROUND: While active LINE-1 (L1) elements possess the ability to mobilize flanking sequences to different genomic loci through a process termed transduction influencing genomic content and structure, an approach for detecting polymorphic germline non-reference transductions in massively-parallel sequencing data has been lacking. RESULTS: Here we present the computational approach TIGER (Transduction Inference in GERmline genomes), enabling the discovery of non-reference L1-mediated transductions by combining L1 discovery with detection of unique insertion sequences and detailed characterization of insertion sites. We employed TIGER to characterize polymorphic transductions in fifteen genomes from non-human primate species (chimpanzee, orangutan and rhesus macaque), as well as in a human genome. We achieved high accuracy as confirmed by PCR and two single molecule DNA sequencing techniques, and uncovered differences in relative rates of transduction between primate species. CONCLUSIONS: By enabling detection of polymorphic transductions, TIGER makes this form of relevant structural variation amenable for population and personal genome analysis.
During tumor evolution, cancer cells can accumulate numerous genetic alterations, ranging from single nucleotide mutations to whole-chromosomal changes. Although a great deal of progress has been made in the past decades in characterizing genomic alterations, recent cancer genome sequencing studies have provided a wealth of information on the detailed molecular profiles of such alterations in various types of cancers. Here, we review our current understanding of the mechanisms and consequences of cancer genome instability, focusing on the findings uncovered through analysis of exome and whole-genome sequencing data. These analyses have shown that most cancers have evidence of genome instability, and the degree of instability is variable within and between cancer types. Importantly, we describe some recent evidence supporting the idea that chromosomal instability could be a major driving force in tumorigenesis and cancer evolution, actively shaping the genomes of cancer cells to maximize their survival advantage. Expected final online publication date for the Annual Review of Pathology: Mechanisms of Disease Volume 11 is May 23, 2016. Please see http://www.annualreviews.org/catalog/pubdates.aspx for revised estimates.
Whether somatic mutations contribute functional diversity to brain cells is a long-standing question. Single-neuron genomics enables direct measurement of somatic mutation rates in human brain and promises to answer this question. A recent study (Upton et al., 2015) reported high rates of somatic LINE-1 element (L1) retrotransposition in the hippocampus and cerebral cortex that would have major implications for normal brain function, and further claimed these mutation events preferentially impact genes important for neuronal function. We identify errors in the single-cell sequencing approach, bioinformatic analysis, and validation methods that led to thousands of false-positive artifacts being mistakenly interpreted as somatic mutation events. Our reanalysis of the data supports a corrected mutation frequency (0.2 per cell) more than fifty-fold lower than reported, inconsistent with the authors' conclusion of 'ubiquitous' L1 mosaicism, but consistent with L1 elements mobilizing occasionally. Through consideration of the challenges and pitfalls identified, we provide a foundation and framework for designing single-cell genomics studies.
Whole-exome sequencing (WES) has become a standard method for detecting genetic variants in human diseases. Although the primary use of WES data has been the identification of single nucleotide variations and indels, these data also offer a possibility of detecting copy number variations (CNVs) at high resolution. However, WES data have uneven read coverage along the genome owing to the target capture step, and the development of a robust WES-based CNV tool is challenging. Here, we evaluate six WES somatic CNV detection tools: ADTEx, CONTRA, Control-FREEC, EXCAVATOR, ExomeCNV and Varscan2. Using WES data from 50 kidney chromophobe, 50 bladder urothelial carcinoma, and 50 stomach adenocarcinoma patients from The Cancer Genome Atlas, we compared the CNV calls from the six tools with a reference CNV set that was identified by both single nucleotide polymorphism array 6.0 and whole-genome sequencing data. We found that these algorithms gave highly variable results: visual inspection reveals significant differences between the WES-based segmentation profiles and the reference profile, as well as among the WES-based profiles. Using a 50% overlap criterion, 13-77% of WES CNV calls were covered by CNVs from the reference set, up to 21% of the copy gains were called as losses or vice versa, and dramatic differences in CNV sizes and CNV numbers were observed. Overall, ADTEx and EXCAVATOR had the best performance with relatively high precision and sensitivity. We suggest that the current algorithms for somatic CNV detection from WES data are limited in their performance and that more robust algorithms are needed.
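The 50% overlap criterion used in the comparison above can be sketched as follows: a WES call counts as covered when at least half of its length overlaps reference CNVs on the same chromosome with the same direction (gain or loss). The data structure, the function names, the half-open coordinates, and the choice to sum overlap across several non-overlapping reference segments are assumptions for illustration; the abstract does not specify these details.

```python
from typing import NamedTuple


class Cnv(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    kind: str   # "gain" or "loss"


def covered_fraction(call, reference_set):
    """Fraction of the call's length overlapped by matching reference CNVs.

    Assumes reference segments do not overlap one another.
    """
    length = call.end - call.start
    if length <= 0:
        raise ValueError("call must have positive length")
    overlap = 0
    for ref in reference_set:
        if ref.chrom != call.chrom or ref.kind != call.kind:
            continue
        overlap += max(0, min(call.end, ref.end) - max(call.start, ref.start))
    return overlap / length


def fraction_of_calls_covered(calls, reference_set, min_fraction=0.5):
    """Share of calls meeting the overlap criterion (a precision-like score)."""
    if not calls:
        return float("nan")
    hits = sum(covered_fraction(c, reference_set) >= min_fraction for c in calls)
    return hits / len(calls)


# Hypothetical calls and reference segments.
calls = [Cnv("chr1", 100, 200, "gain"), Cnv("chr1", 500, 600, "loss")]
reference = [Cnv("chr1", 150, 400, "gain"), Cnv("chr1", 590, 700, "loss")]
score = fraction_of_calls_covered(calls, reference)
```

Swapping the roles of the two sets (reference CNVs covered by calls) gives the corresponding sensitivity-like measure.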
Zheng S, Cherniack AD, Dewal N, Moffitt RA, Danilova L, Murray BA, Lerario AM, Else T, Knijnenburg TA, Ciriello G, Kim S, Assie G, Morozova O, Akbani R, Shih J, Hoadley KA, Choueiri TK, Waldmann J, Mete O, Robertson GA, Wu H-T, Raphael BJ, Shao L, Meyerson M, Demeure MJ, Beuschlein F, Gill AJ, Sidhu SB, Almeida MQ, Fragoso MCBV, Cope LM, Kebebew E, Habra MA, Whitsett TG, Bussey KJ, Rainey WE, Asa SL, Bertherat J, Fassnacht M, Wheeler DA, Hammer GD, Giordano TJ, Verhaak RGW. Comprehensive Pan-Genomic Characterization of Adrenocortical Carcinoma. Cancer Cell 2016;30(2):363.
The Polycomb group (PcG) proteins are key conserved regulators of development, initially discovered in Drosophila and now strongly implicated in human disease. Nevertheless, differing silencing properties between the Drosophila and mammalian PcG systems have been observed. While specific DNA targeting sites for PcG proteins called Polycomb response elements (PREs) have been identified only in Drosophila, involvement of non-coding RNAs for PcG targeting has been favored in mammals. Another difference lies in the distribution patterns of PcG proteins. In mouse and human cells, PcG proteins show broad distributions, significantly overlapping with H3K27me3 domains. In contrast, only sharp peaks on PRE regions are observed for most PcG proteins in Drosophila, raising the question of how large domains of H3K27me3, up to many tens of kilobases, are formed and maintained in Drosophila. In this Extra View, we provide evidence that PcG distributions on silent chromatin in Drosophila are considerably broader than previously detected. Using BioTAP-XL, a chromatin crosslinking and tandem affinity purification approach, we find a broad, rather than PRE-limited overlap of PcG proteins with H3K27me3, suggesting a conserved spreading mechanism for PcG in flies and mammals.
Intravenous leiomyomatosis is an unusual smooth muscle neoplasm with quasi-malignant intravascular growth but a histologically banal appearance. Herein, we report expression and molecular cytogenetic analyses of a series of 12 intravenous leiomyomatosis cases to better understand the pathogenesis of intravenous leiomyomatosis. All cases were analyzed for the expression of HMGA2, MDM2, and CDK4 proteins by immunohistochemistry based on our previous finding of der(14)t(12;14)(q14.3;q24) in intravenous leiomyomatosis. Seven of 12 (58%) intravenous leiomyomatosis cases expressed HMGA2, and none expressed MDM2 or CDK4. Colocalization of hybridization signals for probes from the HMGA2 locus (12q14.3) and from 14q24 by interphase fluorescence in situ hybridization (FISH) was detected in a mean of 89.2% of nuclei in HMGA2-positive cases by immunohistochemistry, but in only 12.4% of nuclei in negative cases, indicating an association of HMGA2 expression and this chromosomal rearrangement (P = 8.24 × 10^-10). Four HMGA2-positive cases had greater than two HMGA2 hybridization signals per cell. No cases showed loss of a hybridization signal by interphase FISH for the frequently deleted region of 7q22 in uterine leiomyomata. One intravenous leiomyomatosis case analyzed by array comparative genomic hybridization revealed complex copy number variations. Finally, expression profiling was performed on three intravenous leiomyomatosis cases. Interestingly, hierarchical cluster analysis of the expression profiles revealed segregation of the intravenous leiomyomatosis cases with leiomyosarcoma rather than with myometrium, uterine leiomyoma of the usual histological type, or plexiform leiomyoma. These findings suggest that intravenous leiomyomatosis cases share some molecular cytogenetic characteristics with uterine leiomyoma, and expression profiles similar to that of leiomyosarcoma cases, further supporting their intermediate, quasi-malignant behavior. Modern Pathology advance online publication, 19 February 2016; doi:10.1038/modpathol.2016.36.
Ceccarelli M, Barthel FP, Malta TM, Sabedot TS, Salama SR, Murray BA, Morozova O, Newton Y, Radenbaugh A, Pagnotta SM, Anjum S, Wang J, Manyam G, Zoppoli P, Ling S, Rao AA, Grifford M, Cherniack AD, Zhang H, Poisson L, Carlotti CG, da Tirapelli DPC, Rao A, Mikkelsen T, Lau CC, Yung AWK, Rabadan R, Huse J, Brat DJ, Lehman NL, Barnholtz-Sloan JS, Zheng S, Hess K, Rao G, Meyerson M, Beroukhim R, Cooper L, Akbani R, Wrensch M, Haussler D, Aldape KD, Laird PW, Gutmann DH, Noushmehr H, Iavarone A, Verhaak RGW. Molecular Profiling Reveals Biologically Discrete Subsets and Pathways of Progression in Diffuse Glioma. Cell 2016;164(3):550-63.
Therapy development for adult diffuse glioma is hindered by incomplete knowledge of somatic glioma driving alterations and suboptimal disease classification. We defined the complete set of genes associated with 1,122 diffuse grade II-III-IV gliomas from The Cancer Genome Atlas and used molecular profiles to improve disease classification, identify molecular correlations, and provide insights into the progression from low- to high-grade disease. Whole-genome sequencing data analysis determined that ATRX but not TERT promoter mutations are associated with increased telomere length. Recent advances in glioma classification based on IDH mutation and 1p/19q co-deletion status were recapitulated through analysis of DNA methylation profiles, which identified clinically relevant molecular subsets. A subtype of IDH mutant glioma was associated with DNA demethylation and poor outcome; a group of IDH-wild-type diffuse glioma showed molecular similarity to pilocytic astrocytoma and relatively favorable survival. Understanding of cohesive disease groups may aid improved clinical outcomes.
Chen F, Zhang Y, Şenbabaoğlu Y, Ciriello G, Yang L, Reznik E, Shuch B, Micevic G, De Velasco G, Shinbrot E, Noble MS, Lu Y, Covington KR, Xi L, Drummond JA, Muzny D, Kang H, Lee J, Tamboli P, Reuter V, Shelley CS, Kaipparettu BA, Bottaro DP, Godwin AK, Gibbs RA, Getz G, Kucherlapati R, Park PJ, Sander C, Henske EP, Zhou JH, Kwiatkowski DJ, Ho TH, Choueiri TK, Hsieh JJ, Akbani R, Mills GB, Hakimi AA, Wheeler DA, Creighton CJ. Multilevel Genomics-Based Taxonomy of Renal Cell Carcinoma. Cell Rep 2016;
On the basis of multidimensional and comprehensive molecular characterization (including DNA methylation and copy number, RNA, and protein expression), we classified 894 renal cell carcinomas (RCCs) of various histologic types into nine major genomic subtypes. Site of origin within the nephron was one major determinant in the classification, reflecting differences among clear cell, chromophobe, and papillary RCC. Widespread molecular changes associated with TFE3 gene fusion or chromatin modifier genes were present within a specific subtype and spanned multiple subtypes. Differences in patient survival and in alteration of specific pathways (including hypoxia, metabolism, MAP kinase, NRF2-ARE, Hippo, immune checkpoint, and PI3K/AKT/mTOR) could further distinguish the subtypes. Immune checkpoint markers and molecular signatures of T cell infiltrates were both highest in the subtype associated with aggressive clear cell RCC. Differences between the genomic subtypes suggest that therapeutic strategies could be tailored to each RCC disease subset.