It has long been hypothesized that aging and neurodegeneration are associated with somatic mutation in neurons; however, methodological hurdles have prevented testing this hypothesis directly. We used single-cell whole-genome sequencing to perform genome-wide somatic single-nucleotide variant (sSNV) identification on DNA from 161 single neurons from the prefrontal cortex and hippocampus of fifteen normal individuals (aged 4 months to 82 years) as well as nine individuals affected by early-onset neurodegeneration due to genetic disorders of DNA repair (Cockayne syndrome and Xeroderma pigmentosum). sSNVs increased approximately linearly with age in both areas (with a higher rate in hippocampus) and were more abundant in neurodegenerative disease. The accumulation of somatic mutations with age, which we term genosenium, shows age-related, region-related, and disease-related molecular signatures, and may be important in other human age-associated conditions.
Single cell whole-genome sequencing (scWGS) is providing novel insights into the nature of genetic heterogeneity in normal and diseased cells. However, the whole-genome amplification process required for scWGS introduces biases into the resulting sequencing that can confound downstream analysis. Here, we present a statistical method, with an accompanying package PaSD-qc (Power Spectral Density-qc), that evaluates the properties and quality of single cell libraries. It uses a modified power spectral density to assess amplification uniformity, amplicon size distribution, autocovariance and inter-sample consistency as well as to identify chromosomes with aberrant read-density profiles due either to copy alterations or poor amplification. These metrics provide a standard way to compare the quality of single cell samples as well as yield information necessary to improve variant calling strategies. We demonstrate the usefulness of this tool in comparing the properties of scWGS protocols, identifying potential chromosomal copy number variation, determining chromosomal and subchromosomal regions of poor amplification, and selecting high-quality libraries from low-coverage data for deep sequencing. The software is available free and open-source at https://github.com/parklab/PaSDqc.
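To illustrate the idea of using a power spectrum to assess amplification uniformity, here is a minimal sketch. It is not the PaSD-qc implementation: it computes a plain periodogram with a naive discrete Fourier transform, whereas the abstract describes a modified power spectral density. The function name `periodogram` and the toy read-depth track are hypothetical.

```python
import cmath
import math


def periodogram(depth):
    """Return the one-sided periodogram of a mean-centred read-depth track.

    depth: read depth per fixed-size genomic bin (hypothetical input format).
    The result holds the power at frequency indices k = 1 .. len(depth) // 2.
    """
    n = len(depth)
    mean = sum(depth) / n
    centred = [d - mean for d in depth]
    power = []
    for k in range(1, n // 2 + 1):
        # Naive DFT coefficient at frequency index k (fine for short tracks).
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centred)
        )
        power.append(abs(coeff) ** 2 / n)
    return power


# Toy track (made up): depth oscillating with a period of 8 bins,
# standing in for periodic amplification bias.
toy_depth = [30 + 10 * math.sin(2 * math.pi * t / 8) for t in range(64)]
toy_power = periodogram(toy_depth)

# Frequency index carrying the most power (list position 0 is k = 1).
peak_index = max(range(len(toy_power)), key=toy_power.__getitem__) + 1
print("dominant period in bins:", len(toy_depth) / peak_index)
```

A perfectly uniform track has no power at any frequency, so excess power at particular scales points to non-uniform coverage at those scales.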
Microsatellite instability (MSI) refers to the hypermutability of short repetitive sequences in the genome caused by impaired DNA mismatch repair. Although MSI has been studied for decades, the large amounts of sequencing data now available allow us to examine the molecular fingerprints of MSI in greater detail. Here, we analyse ∼8,000 exomes and ∼1,000 whole genomes of cancer patients across 23 cancer types. Our analysis reveals that the frequency of MSI events is highly variable within and across tumour types. We also identify genes in DNA repair and oncogenic pathways recurrently subject to MSI and uncover non-coding loci that frequently display MSI. Finally, we propose a highly accurate exome-based predictive model for the MSI phenotype. These results advance our understanding of the genomic drivers and consequences of MSI, and our comprehensive catalogue of tumour-type-specific MSI loci will enable panel-based MSI testing to identify patients who are likely to benefit from immunotherapy.
The 4D Nucleome Network aims to develop and apply approaches to map the structure and dynamics of the human and mouse genomes in space and time with the goal of gaining deeper mechanistic insights into how the nucleus is organized and functions. The project will develop and benchmark experimental and computational approaches for measuring genome conformation and nuclear organization, and investigate how these contribute to gene regulation and other genome functions. Validated experimental technologies will be combined with biophysical approaches to generate quantitative models of spatial genome organization in different biological states, both in cell populations and in single cells.
Regulatory decisions in Drosophila require Polycomb group (PcG) proteins to maintain the silent state and Trithorax group (TrxG) proteins to oppose silencing. Since PcG and TrxG are ubiquitous and lack apparent sequence specificity, a long-standing model is that targeting occurs via protein interactions; for instance, between repressors and PcG proteins. Instead, we found that Pc-repressive complex 1 (PRC1) purifies with coactivators Fs(1)h [female sterile (1) homeotic] and Enok/Br140 during embryogenesis. Fs(1)h is a TrxG member and the ortholog of BRD4, a bromodomain protein that binds to acetylated histones and is a key transcriptional coactivator in mammals. Enok and Br140, another bromodomain protein, are orthologous to subunits of a mammalian MOZ/MORF acetyltransferase complex. Here we confirm PRC1-Br140 and PRC1-Fs(1)h interactions and identify their genomic binding sites. PRC1-Br140 bind developmental genes in fly embryos, with analogous co-occupancy of PRC1 and a Br140 ortholog, BRD1, at bivalent loci in human embryonic stem (ES) cells. We propose that identification of PRC1-Br140 "bivalent complexes" in fly embryos supports and extends the bivalency model posited in mammalian cells, in which the coexistence of H3K4me3 and H3K27me3 at developmental promoters represents a poised transcriptional state. We further speculate that local competition between acetylation and deacetylation may play a critical role in the resolution of bivalent protein complexes during development.
Molecular profiling of actionable mutations in refractory cancer patients has the potential to enable "precision medicine," wherein individualized therapies are guided based on genomic profiling. The molecular-screening program was intended to route participants to different candidate drugs in trials based on clinical-sequencing reports. In this screening program, we used a custom target-enrichment panel consisting of cancer-related genes to interrogate single-nucleotide variants, insertions and deletions, copy number variants, and a subset of gene fusions. From August 2014 through April 2015, 654 patients consented to participate in the program at Samsung Medical Center. Of these patients, 588 passed the quality control process for the 381-gene cancer-panel test, and 418 patients were included in the final analysis as being eligible for any anticancer treatment (127 gastric cancer, 122 colorectal cancer, 62 pancreatic/biliary tract cancer, 67 sarcoma/other cancer, and 40 genitourinary cancer patients). Of the 418 patients, 55 (12%) harbored a biomarker that guided them to a biomarker-selected clinical trial, and 184 (44%) patients harbored at least one genomic alteration that was potentially targetable. This study demonstrated that the panel-based sequencing program resulted in an increased rate of trial enrollment of metastatic cancer patients into biomarker-selected clinical trials. Given the expanding list of biomarker-selected trials, the guidance percentage to matched trials is anticipated to increase.
Accurate detection of genomic alterations using high-throughput sequencing is an essential component of precision cancer medicine. We characterize the variant allele fractions (VAFs) of somatic single nucleotide variants and indels across 5095 clinical samples profiled using a custom panel, CancerSCAN. Our results demonstrate that a significant fraction of clinically actionable variants have low VAFs, often due to low tumor purity and treatment-induced mutations. The percentages of mutations under 5% VAF across hotspots in EGFR, KRAS, PIK3CA, and BRAF are 16%, 11%, 12%, and 10%, respectively, with 24% for EGFR T790M and 17% for PIK3CA E545. For clinical relevance, we describe two patients for whom targeted therapy achieved remission despite low VAF mutations. We also characterize the read depths necessary to achieve sensitivity and specificity comparable to current laboratory assays. These results show that capturing low VAF mutations at hotspots by sufficient sequencing coverage and carefully tuned algorithms is imperative for a clinical assay.
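The link between read depth and sensitivity for low-VAF variants can be sketched with a simple binomial sampling model. This is an illustration under stated assumptions, not the analysis used for CancerSCAN: it ignores sequencing error and assumes a variant is called when at least `min_alt_reads` supporting reads are seen. The function name and the threshold of 5 reads are hypothetical.

```python
from math import comb


def detection_probability(depth, vaf, min_alt_reads):
    """P(at least min_alt_reads variant reads) under Binomial(depth, vaf).

    Assumes each read independently carries the variant with probability
    vaf and that sequencing errors are negligible (a simplification).
    """
    p_missed = sum(
        comb(depth, i) * vaf**i * (1 - vaf) ** (depth - i)
        for i in range(min_alt_reads)
    )
    return 1.0 - p_missed


# How sensitivity for a 5% VAF variant changes with depth, using a
# hypothetical calling threshold of 5 supporting reads.
for depth in (100, 500, 1000):
    print(depth, round(detection_probability(depth, 0.05, 5), 3))
```

Under this model, sensitivity at a fixed VAF rises with depth, which is the qualitative point the abstract makes about sufficient coverage at hotspots.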
Release of promoter-proximally paused RNA polymerase II (RNAPII) is a recently recognized transcriptional regulatory checkpoint. The biological roles of RNAPII pause release and the mechanisms by which extracellular signals control it are incompletely understood. Here we show that VEGF stimulates RNAPII pause release by stimulating acetylation of ETS1, a master endothelial cell transcriptional regulator. In endothelial cells, ETS1 binds transcribed gene promoters and stimulates their expression by broadly increasing RNAPII pause release. VEGF enhances ETS1 chromatin occupancy and increases ETS1 acetylation, enhancing its binding to BRD4, which recruits the pause release machinery and increases RNAPII pause release. Endothelial cell angiogenic responses in vitro and in vivo require ETS1-mediated transduction of VEGF signaling to release paused RNAPII. Our results define an angiogenic pathway in which VEGF enhances ETS1-BRD4 interaction to broadly promote RNAPII pause release and drive angiogenesis. Promoter proximal RNAPII pausing is a rate-limiting transcriptional mechanism. Chen et al. show that this process is essential in angiogenesis by demonstrating that the endothelial master transcription factor ETS1 promotes global RNAPII pause release, and that this process is governed by VEGF.
Purpose: Histologic transformation of EGFR-mutant lung adenocarcinoma (LADC) into small-cell lung cancer (SCLC) has been described as one of the major mechanisms of resistance to epidermal growth factor receptor (EGFR) tyrosine kinase inhibitors (TKIs). However, the molecular pathogenesis is still unclear. Methods: We investigated 21 patients with advanced EGFR-mutant LADCs that were transformed into EGFR TKI-resistant SCLCs. Among them, whole genome sequencing was applied to nine tumors acquired at various time points from four patients to reconstruct their clonal evolutionary history and to detect genetic predictors for small-cell transformation. The findings were validated by immunohistochemistry in 210 lung cancer tissues. Results: We identified that EGFR TKI-resistant LADCs and SCLCs share a common clonal origin and undergo branched evolutionary trajectories. The clonal divergence of SCLC ancestors from the LADC cells occurred before the first EGFR TKI treatments, and the complete inactivation of both RB1 and TP53 was observed from the early LADC stages in sequenced tumors. We extended the findings by immunohistochemistry in the early-stage LADC tissues of 75 patients treated with EGFR TKIs; inactivation of both Rb and p53 was strikingly more frequent in the small-cell-transformed group than in the nontransformed group (82% v 3%; odds ratio, 131; 95% CI, 19.9 to 859). Among patients registered in a predefined cohort (n = 65), an EGFR-mutant LADC that harbored completely inactivated Rb and p53 had a 43× greater risk of small-cell transformation (relative risk, 42.8; 95% CI, 5.88 to 311). Branch-specific mutational signature analysis revealed that apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like (APOBEC)-induced hypermutation was frequent in the branches toward small-cell transformation. Conclusion: EGFR TKI-resistant SCLCs branch out early from the LADC clones that harbor completely inactivated RB1 and TP53. The evaluation of RB1 and TP53 status in EGFR TKI-treated LADCs is informative in predicting small-cell transformation.
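For readers unfamiliar with how an odds ratio and its confidence interval are derived from a 2×2 table, the following is a minimal sketch using the standard Wald interval on the log odds ratio. The abstract does not state which interval method the authors used, and the counts below are made up for illustration; they are not the study's data.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    z: normal quantile (1.96 gives an approximate 95% interval).
    All four counts must be non-zero for this formula.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, NOT taken from the study.
example_or, example_low, example_high = odds_ratio_wald_ci(10, 5, 4, 20)
print(f"OR = {example_or:.1f}, 95% CI {example_low:.2f} to {example_high:.2f}")
```

Small cell counts inflate the standard error of the log odds ratio, which is why intervals around large odds ratios from small cohorts tend to be very wide.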
Zhang Y, Kwok-Shing Ng P, Kucherlapati M, Chen F, Liu Y, Tsang YH, De Velasco G, Jeong KJ, Akbani R, Hadjipanayis A, Pantazi A, Bristow CA, Lee E, Mahadeshwar HS, Tang J, Zhang J, Yang L, Seth S, Lee S, Ren X, Song X, Sun H, Seidman J, Luquette LJ, Xi R, Chin L, Protopopov A, Westbrook TF, Shelley CS, Choueiri TK, Ittmann M, Van Waes C, Weinstein JN, Liang H, Henske EP, Godwin AK, Park PJ, Kucherlapati R, Scott KL, Mills GB, Kwiatkowski DJ, Creighton CJ. A Pan-Cancer Proteogenomic Atlas of PI3K/AKT/mTOR Pathway Alterations. Cancer Cell 2017;31(6):820-832.e3.
Molecular alterations involving the PI3K/AKT/mTOR pathway (including mutation, copy number, protein, or RNA) were examined across 11,219 human cancers representing 32 major types. Within specific mutated genes, frequency, mutation hotspot residues, in silico predictions, and functional assays were all informative in distinguishing the subset of genetic variants more likely to have functional relevance. Multiple oncogenic pathways including PI3K/AKT/mTOR converged on similar sets of downstream transcriptional targets. In addition to mutation, structural variations and partial copy losses involving PTEN and STK11 showed evidence for having functional relevance. A substantial fraction of cancers showed high mTOR pathway activity without an associated canonical genetic or genomic alteration, including cancers harboring IDH1 or VHL mutations, suggesting multiple mechanisms for pathway activation.
Blastocyst-derived embryonic stem cells (ESCs) and gonad-derived embryonic germ cells (EGCs) represent two classic types of pluripotent cell lines, yet their molecular equivalence remains incompletely understood. Here, we compare genome-wide methylation patterns between isogenic ESC and EGC lines to define epigenetic similarities and differences. Surprisingly, we find that sex rather than cell type drives methylation patterns in ESCs and EGCs. Cell fusion experiments further reveal that the ratio of X chromosomes to autosomes dictates methylation levels, with female hybrids being hypomethylated and male hybrids being hypermethylated. We show that the X-linked MAPK phosphatase DUSP9 is upregulated in female compared to male ESCs, and its heterozygous loss in female ESCs leads to male-like methylation levels. However, male and female blastocysts are similarly hypomethylated, indicating that sex-specific methylation differences arise in culture. Collectively, our data demonstrate the epigenetic similarity of sex-matched ESCs and EGCs and identify DUSP9 as a regulator of female-specific hypomethylation.
McConnell MJ, Moran JV, Abyzov A, Akbarian S, Bae T, Cortes-Ciriano I, Erwin JA, Fasching L, Flasch DA, Freed D, Ganz J, Jaffe AE, Kwan KY, Kwon M, Lodato MA, Mills RE, Paquola ACM, Rodin RE, Rosenbluh C, Sestan N, Sherman MA, Shin JH, Song S, Straub RE, Thorpe J, Weinberger DR, Urban AE, Zhou B, Gage FH, Lehner T, Senthil G, Walsh CA, Chess A, Courchesne E, Gleeson JG, Kidd JM, Park PJ, Pevsner J, Vaccarino FM, Brain Somatic Mosaicism Network BSM. Intersection of diverse neuronal genomes and neuropsychiatric disease: The Brain Somatic Mosaicism Network. Science 2017;356(6336).
Neuropsychiatric disorders have a complex genetic architecture. Human genetic population-based studies have identified numerous heritable sequence and structural genomic variants associated with susceptibility to neuropsychiatric disease. However, these germline variants do not fully account for disease risk. During brain development, progenitor cells undergo billions of cell divisions to generate the ~80 billion neurons in the brain. The failure to accurately repair DNA damage arising during replication, transcription, and cellular metabolism amid this dramatic cellular expansion can lead to somatic mutations. Somatic mutations that alter subsets of neuronal transcriptomes and proteomes can, in turn, affect cell proliferation and survival and lead to neurodevelopmental disorders. The long life span of individual neurons and the direct relationship between neural circuits and behavior suggest that somatic mutations in small populations of neurons can significantly affect individual neurodevelopment. The Brain Somatic Mosaicism Network has been founded to study somatic mosaicism both in neurotypical human brains and in the context of complex neuropsychiatric disorders.
In many next-generation sequencing (NGS) studies, multiple samples or data types are profiled for each individual. An important quality control (QC) step in these studies is to ensure that datasets from the same subject are properly paired. Given the heterogeneity of data types, file types and sequencing depths in a multi-dimensional study, a robust program that provides a standardized metric for genotype comparisons would be useful. Here, we describe NGSCheckMate, a user-friendly software package for verifying sample identities from FASTQ, BAM or VCF files. This tool uses a model-based method to compare allele read fractions at known single-nucleotide polymorphisms, considering depth-dependent behavior of similarity metrics for identical and unrelated samples. Our evaluation shows that NGSCheckMate is effective for a variety of data types, including exome sequencing, whole-genome sequencing, RNA-seq, ChIP-seq, targeted sequencing and single-cell whole-genome sequencing, with a minimal requirement for sequencing depth (>0.5X). An alignment-free module can be run directly on FASTQ files for a quick initial check. We recommend using this software as a QC step in NGS studies. AVAILABILITY: https://github.com/parklab/NGSCheckMate.
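The core idea of comparing allele read fractions at known SNPs can be sketched as follows. This is not NGSCheckMate's code or API: it assumes a plain Pearson correlation as the similarity metric and omits the depth-dependent model the abstract describes. The function name and all allele-fraction values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up allele fractions at the same five SNPs in three samples.
# Values near 0, 0.5 and 1 correspond to the three diploid genotypes.
vaf_sample_a = [0.0, 0.5, 1.0, 0.5, 0.0]
vaf_sample_b = [0.02, 0.48, 0.97, 0.55, 0.0]  # same individual, noisy
vaf_sample_c = [0.5, 0.0, 0.0, 1.0, 0.5]      # different individual

print("a vs b:", round(pearson(vaf_sample_a, vaf_sample_b), 3))
print("a vs c:", round(pearson(vaf_sample_a, vaf_sample_c), 3))
```

Samples from the same individual share genotypes and therefore correlate strongly, while unrelated samples do not. At low depth the observed fractions become noisier, which is why a depth-aware threshold, as the abstract describes, is needed in practice.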
Spt5 is an essential and conserved factor that functions in transcription and co-transcriptional processes. However, many aspects of the requirement for Spt5 in transcription are poorly understood. We have analyzed the consequences of Spt5 depletion in Schizosaccharomyces pombe using four genome-wide approaches. Our results demonstrate that Spt5 is crucial for a normal rate of RNA synthesis and distribution of RNAPII over transcription units. In the absence of Spt5, RNAPII localization changes dramatically, with reduced levels and a relative accumulation over the first ∼500 bp, suggesting that Spt5 is required for transcription past a barrier. Spt5 depletion also results in widespread antisense transcription initiating within this barrier region. Deletions of this region alter the distribution of RNAPII on the sense strand, suggesting that the barrier observed after Spt5 depletion is normally a site at which Spt5 stimulates elongation. Our results reveal a global requirement for Spt5 in transcription elongation.
Genes encoding subunits of SWI/SNF (BAF) chromatin remodelling complexes are collectively altered in over 20% of human malignancies, but the mechanisms by which these complexes alter chromatin to modulate transcription and cell fate are poorly understood. Utilizing mouse embryonic fibroblast and cancer cell line models, here we show via ChIP-seq and biochemical assays that SWI/SNF complexes are preferentially targeted to distal lineage specific enhancers and interact with p300 to modulate histone H3 lysine 27 acetylation. We identify a greater requirement for SWI/SNF at typical enhancers than at most super-enhancers and at enhancers in untranscribed regions than in transcribed regions. Our data further demonstrate that SWI/SNF-dependent distal enhancers are essential for controlling expression of genes linked to developmental processes. Our findings thus establish SWI/SNF complexes as regulators of the enhancer landscape and provide insight into the roles of SWI/SNF in cellular fate control.
SMARCB1 (also known as SNF5, INI1, and BAF47), a core subunit of the SWI/SNF (BAF) chromatin-remodeling complex, is inactivated in nearly all pediatric rhabdoid tumors. These aggressive cancers are among the most genomically stable, suggesting an epigenetic mechanism by which SMARCB1 loss drives transformation. Here we show that, despite having indistinguishable mutational landscapes, human rhabdoid tumors exhibit distinct enhancer H3K27ac signatures, which identify remnants of differentiation programs. We show that SMARCB1 is required for the integrity of SWI/SNF complexes and that its loss alters enhancer targeting, markedly impairing SWI/SNF binding to typical enhancers, particularly those required for differentiation, while maintaining SWI/SNF binding at super-enhancers. We show that these retained super-enhancers are essential for rhabdoid tumor survival, including some that are shared by all subtypes, such as SPRY1, and other lineage-specific super-enhancers, such as SOX2 in brain-derived rhabdoid tumors. Taken together, our findings identify a new chromatin-based epigenetic mechanism underlying the tumor-suppressive activity of SMARCB1.
Genes encoding subunits of SWI/SNF (BAF) chromatin-remodeling complexes are collectively mutated in ∼20% of all human cancers. Although ARID1A is the most frequent target of mutations, the mechanism by which its inactivation promotes tumorigenesis is unclear. Here we demonstrate that Arid1a functions as a tumor suppressor in the mouse colon, but not the small intestine, and that invasive ARID1A-deficient adenocarcinomas resemble human colorectal cancer (CRC). These tumors lack deregulation of APC/β-catenin signaling components, which are crucial gatekeepers in common forms of intestinal cancer. We find that ARID1A normally targets SWI/SNF complexes to enhancers, where they function in coordination with transcription factors to facilitate gene activation. ARID1B preserves SWI/SNF function in ARID1A-deficient cells, but defects in SWI/SNF targeting and control of enhancer activity cause extensive dysregulation of gene expression. These findings represent an advance in colon cancer modeling and implicate enhancer-mediated gene regulation as a principal tumor-suppressor function of ARID1A.
Chromatin plays a critical role in faithful implementation of gene expression programs. Different post-translational modifications (PTMs) of histone proteins reflect the underlying state of gene activity, and many chromatin proteins write, erase, bind, or are repelled by these histone marks. One such protein is UpSET, the Drosophila homolog of yeast Set3 and mammalian KMT2E (MLL5). Here, we show that UpSET is necessary for the proper balance between active and repressed states. Using CRISPR/Cas-9 editing, we generated S2 cells that are mutant for upSET. We found that loss of UpSET is tolerated in S2 cells, but that heterochromatin is misregulated, as evidenced by a strong decrease in H3K9me2 levels assessed by bulk histone PTM quantification. To test whether this finding was consistent in the whole organism, we deleted the upSET coding sequence using CRISPR/Cas-9, which we found to be lethal in both sexes in flies. We were able to rescue this lethality using a tagged upSET transgene, and found that UpSET protein localizes to transcriptional start sites (TSS) of active genes throughout the genome. Misregulated heterochromatin is apparent by suppressed position effect variegation of the w(m4) allele in heterozygous upSET-deleted flies. Using nascent-RNA sequencing in the upSET-mutant S2 lines, we show that this result applies to heterochromatin genes generally. Our findings support a critical role for UpSET in maintaining heterochromatin, perhaps by delimiting the active chromatin environment.
Cervical cancer remains one of the leading causes of cancer-related deaths worldwide. Here we report the extensive molecular characterization of 228 primary cervical cancers, the largest comprehensive genomic study of cervical cancer to date. We observed striking APOBEC mutagenesis patterns and identified SHKBP1, ERBB3, CASP8, HLA-A, and TGFBR2 as novel significantly mutated genes in cervical cancer. We also discovered novel amplifications in immune targets CD274/PD-L1 and PDCD1LG2/PD-L2, and the BCAR4 lncRNA that has been associated with response to lapatinib. HPV integration was observed in all HPV18-related cases and 76% of HPV16-related cases, and was associated with structural aberrations and increased target gene expression. We identified a unique set of endometrial-like cervical cancers, comprised predominantly of HPV-negative tumors with high frequencies of KRAS, ARID1A, and PTEN mutations. Integrative clustering of 178 samples identified Keratin-low Squamous, Keratin-high Squamous, and Adenocarcinoma-rich subgroups. These molecular analyses reveal new potential therapeutic targets for cervical cancers.