Goldman MJ*, Zhang J*, Fonseca NA*, Cortés-Ciriano I*, Xiang Q, Craft B, Piñeiro-Yáñez E, O'Connor BD, Bazant W, Barrera E, Muñoz-Pomer A, Petryszak R, Füllgrabe A, Al-Shahrour F, Keays M, Haussler D, Weinstein JN, Huber W, Valencia A, Park PJ, Papatheodorou I, Zhu J, Ferretti V, Vazquez M. A user guide for the online exploration and visualization of PCAWG data. Nat Commun 2020;11(1):3400.
The Pan-Cancer Analysis of Whole Genomes (PCAWG) project generated a vast amount of whole-genome cancer sequencing resource data. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2658 cancers across 38 tumor types, we provide a user's guide to the five publicly available online data exploration and visualization tools introduced in the PCAWG marker paper. These tools are ICGC Data Portal, UCSC Xena, Chromothripsis Explorer, Expression Atlas, and PCAWG-Scout. We detail use cases and analyses for each tool, show how they incorporate outside resources from the larger genomics ecosystem, and demonstrate how the tools can be used together to understand the biology of cancers more deeply. Together, the tools enable researchers to query the complex genomic PCAWG data dynamically and integrate external information, enabling and enhancing interpretation.
The three-dimensional conformation of a genome can be profiled using Hi-C, a technique that combines chromatin conformation capture with high-throughput sequencing. However, structural variations often yield features that can be mistaken for chromosomal interactions. Here, we describe a computational method, HiNT (Hi-C for copy Number variation and Translocation detection), which detects copy number variations and interchromosomal translocations within Hi-C data with breakpoints at single base-pair resolution. We demonstrate that HiNT outperforms existing methods on both simulated and real data. We also show that Hi-C can supplement whole-genome sequencing in structural variant detection by locating breakpoints in repetitive regions.
Combined PARP and immune checkpoint inhibition has yielded encouraging results in ovarian cancer, but predictive biomarkers are lacking. We performed immunogenomic profiling and highly multiplexed single-cell imaging on tumor samples from patients enrolled in a Phase I/II trial of niraparib and pembrolizumab in ovarian cancer (NCT02657889). We identify two determinants of response: mutational signature 3, reflecting defective homologous recombination DNA repair, and positive immune score as a surrogate of interferon-primed exhausted CD8+ T-cells in the tumor microenvironment. Presence of one or both features associates with an improved outcome while concurrent absence yields no responses. Single-cell spatial analysis reveals prominent interactions of exhausted CD8+ T-cells and PD-L1+ macrophages and PD-L1+ tumor cells as mechanistic determinants of response. Furthermore, spatial analysis of two extreme responders shows differential clustering of exhausted CD8+ T-cells with PD-L1+ macrophages in the first, and exhausted CD8+ T-cells with cancer cells harboring genomic PD-L1 and PD-L2 amplification in the second.
Cancer is driven by genetic change, and the advent of massively parallel sequencing has enabled systematic documentation of this variation at the whole-genome scale [1-3]. Here we report the integrative analysis of 2,658 whole-cancer genomes and their matching normal tissues across 38 tumour types from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). We describe the generation of the PCAWG resource, facilitated by international data sharing using compute clouds. On average, cancer genomes contained 4-5 driver mutations when combining coding and non-coding genomic elements; however, in around 5% of cases no drivers were identified, suggesting that cancer driver discovery is not yet complete. Chromothripsis, in which many clustered structural variants arise in a single catastrophic event, is frequently an early event in tumour evolution; in acral melanoma, for example, these events precede most somatic point mutations and affect several cancer-associated genes simultaneously. Cancers with abnormal telomere maintenance often originate from tissues with low replicative activity and show several mechanisms of preventing telomere attrition to critical levels. Common and rare germline variants affect patterns of somatic mutation, including point mutations, structural variants and somatic retrotransposition. A collection of papers from the PCAWG Consortium describes non-coding mutations that drive cancer beyond those in the TERT promoter [4]; identifies new signatures of mutational processes that cause base substitutions, small insertions and deletions and structural variation [5,6]; analyses timings and patterns of tumour evolution [7]; describes the diverse transcriptional consequences of somatic mutation on splicing, expression levels, fusion genes and promoter activity [8,9]; and evaluates a range of more-specialized features of cancer genomes [8,10-18].
A key mutational process in cancer is structural variation, in which rearrangements delete, amplify or reorder genomic segments that range in size from kilobases to whole chromosomes [1-7]. Here we develop methods to group, classify and describe somatic structural variants, using data from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA), which aggregated whole-genome sequencing data from 2,658 cancers across 38 tumour types [8]. Sixteen signatures of structural variation emerged. Deletions have a multimodal size distribution, assort unevenly across tumour types and patients, are enriched in late-replicating regions and correlate with inversions. Tandem duplications also have a multimodal size distribution, but are enriched in early-replicating regions, as are unbalanced translocations. Replication-based mechanisms of rearrangement generate varied chromosomal structures with low-level copy-number gains and frequent inverted rearrangements. One prominent structure consists of 2-7 templates copied from distinct regions of the genome strung together within one locus. Such cycles of templated insertions correlate with tandem duplications and, in liver cancer, frequently activate the telomerase gene TERT. A wide variety of rearrangement processes are active in cancer, which generate complex configurations of the genome upon which selection can act.
Rodriguez-Martin B, Alvarez EG, Baez-Ortega A, Zamora J, Supek F, Demeulemeester J, Santamarina M, Ju YS, Temes J, Garcia-Souto D, Detering H, Li Y, Rodriguez-Castro J, Dueso-Barroso A, Bruzos AL, Dentro SC, Blanco MG, Contino G, Ardeljan D, Tojo M, Roberts ND, Zumalave S, Edwards PAW, Weischenfeldt J, Puiggròs M, Chong Z, Chen K, Lee EA, Wala JA, Raine K, Butler A, Waszak SM, Navarro FCP, Schumacher SE, Monlong J, Maura F, Bolli N, Bourque G, Gerstein M, Park PJ, Wedge DC, Beroukhim R, Torrents D, Korbel JO, Martincorena I, Fitzgerald RC, Van Loo P, Kazazian HH, Burns KH, Group PCAWGSVW, Campbell PJ, Tubio JMC, Consortium PCAWG. Pan-cancer analysis of whole genomes identifies driver rearrangements promoted by LINE-1 retrotransposition. Nature Genetics 2020;52(3):306-319.
About half of all cancers have somatic integrations of retrotransposons. Here, to characterize their role in oncogenesis, we analyzed the patterns and mechanisms of somatic retrotransposition in 2,954 cancer genomes from 38 histological cancer subtypes within the framework of the Pan-Cancer Analysis of Whole Genomes (PCAWG) project. We identified 19,166 somatically acquired retrotransposition events, which affected 35% of samples and spanned a range of event types. Long interspersed nuclear element (LINE-1; L1 hereafter) insertions emerged as the most frequent type of somatic structural variation in esophageal adenocarcinoma, and the second most frequent in head-and-neck and colorectal cancers. Aberrant L1 integrations can delete megabase-scale regions of a chromosome, which sometimes leads to the removal of tumor-suppressor genes, and can induce complex translocations and large-scale duplications. Somatic retrotranspositions can also initiate breakage-fusion-bridge cycles, leading to high-level amplification of oncogenes. These observations illuminate a relevant role of L1 retrotransposition in remodeling the cancer genome, with potential implications for the development of human tumors.
Cancers require telomere maintenance mechanisms for unlimited replicative potential. They achieve this through TERT activation or alternative telomere lengthening associated with ATRX or DAXX loss. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, we dissect whole-genome sequencing data of over 2500 matched tumor-control samples from 36 different tumor types to characterize the genomic footprints of these mechanisms. While the telomere content of tumors with ATRX or DAXX mutations (ATRX/DAXXtrunc) is increased, tumors with TERT modifications show a moderate decrease of telomere content. One quarter of all tumor samples contain somatic integrations of telomeric sequences into non-telomeric DNA. This fraction is increased to 80% prevalence in ATRX/DAXXtrunc tumors, which carry an aberrant telomere variant repeat (TVR) distribution as another genomic marker. The latter feature includes enrichment or depletion of the previously undescribed singleton TVRs TTCGGG and TTTGGG, respectively. Our systematic analysis provides new insight into the recurrent genomic alterations associated with telomere maintenance mechanisms in cancer.
Single-cell Hi-C (scHi-C) allows the study of cell-to-cell variability in chromatin structure and dynamics. However, the high level of noise inherent in current scHi-C protocols necessitates careful assessment of data quality before biological conclusions can be drawn. Here we present GiniQC, which quantifies unevenness in the distribution of inter-chromosomal reads in the scHi-C contact matrix to measure the level of noise. Our examples show the utility of GiniQC in assessing the quality of scHi-C data as a complement to existing quality control measures. We also demonstrate how GiniQC can help inform the impact of various data processing steps on data quality.
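As a rough illustration of the idea of quantifying unevenness, the sketch below computes a plain Gini coefficient over a list of inter-chromosomal contact counts. It is not the GiniQC implementation: the function name `gini` and the input layout (one count per inter-chromosomal bin pair, zeros included) are assumptions made for this example, and any normalization GiniQC applies is not reproduced here.

```python
def gini(counts):
    """Gini coefficient of a list of non-negative counts.

    0.0 means the counts are spread perfectly evenly; values closer to 1.0
    mean the counts are concentrated in a few entries.
    """
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-based formula on the sorted values (ranks start at 1):
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical inter-chromosomal bin-pair counts for two cells.
even_cell = [1, 1, 1, 1]      # reads spread evenly over bin pairs
uneven_cell = [0, 0, 0, 4]    # reads concentrated in one bin pair

print(gini(even_cell))    # 0.0
print(gini(uneven_cell))  # 0.75
```

An even spread of reads gives a value near 0, and a concentrated spread gives a value nearer 1; how such a value maps onto a quality threshold is left to the published method.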
Kim J, Hu C, Moufawad El Achkar C, Black LE, Douville J, Larson A, Pendergast MK, Goldkind SF, Lee EA, Kuniholm A, Soucy A, Vaze J, Belur NR, Fredriksen K, Stojkovska I, Tsytsykova A, Armant M, DiDonato RL, Choi J, Cornelissen L, Pereira LM, Augustine EF, Genetti CA, Dies K, Barton B, Williams L, Goodlett BD, Riley BL, Pasternak A, Berry ER, Pflock KA, Chu S, Reed C, Tyndall K, Agrawal PB, Beggs AH, Grant EP, Urion DK, Snyder RO, Waisbren SE, Poduri A, Park PJ, Patterson A, Biffi A, Mazzulli JR, Bodamer O, Berde CB, Yu TW. Patient-Customized Oligonucleotide Therapy for a Rare Genetic Disease. N Engl J Med 2019;
Genome sequencing is often pivotal in the diagnosis of rare diseases, but many of these conditions lack specific treatments. We describe how molecular diagnosis of a rare, fatal neurodegenerative condition led to the rational design, testing, and manufacture of milasen, a splice-modulating antisense oligonucleotide drug tailored to a particular patient. Proof-of-concept experiments in cell lines from the patient served as the basis for launching an "N-of-1" study of milasen within 1 year after first contact with the patient. There were no serious adverse events, and treatment was associated with objective reduction in seizures (determined by electroencephalography and parental reporting). This study offers a possible template for the rapid development of patient-customized treatments. (Funded by Mila's Miracle Foundation and others.)
Recent advances in single cell technology have enabled dissection of cellular heterogeneity in great detail. However, analysis of single cell DNA sequencing data remains challenging due to bias and artifacts that arise during DNA extraction and whole-genome amplification, including allelic imbalance and dropout. Here, we present a framework for statistical estimation of allele-specific amplification imbalance at any given position in single cell whole-genome sequencing data by utilizing the allele frequencies of heterozygous single nucleotide polymorphisms in the neighborhood. The resulting allelic imbalance profile is critical for determining whether the variant allele fraction of an observed mutation is consistent with the expected fraction for a true variant. This method, implemented in SCAN-SNV (Single Cell ANalysis of SNVs), substantially improves the identification of somatic variants in single cells. Our allele balance framework is broadly applicable to genotype analysis of any variant type in any data that might exhibit allelic imbalance.
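The general idea of borrowing information from neighboring heterozygous SNPs can be sketched as a distance-weighted average of their allele fractions. This is only a toy illustration, not the statistical model used in SCAN-SNV: the Gaussian kernel, the `bandwidth` parameter, the function name and the input layout are all assumptions, and the SNP read counts are assumed to be already phased to the same haplotype.

```python
import math


def local_allele_balance(position, hsnps, bandwidth=10_000):
    """Estimate the allele fraction expected at `position`.

    hsnps: list of (pos, hap_reads, total_reads) for nearby heterozygous
    SNPs, where hap_reads counts reads from one chosen haplotype
    (phasing is assumed to be done upstream).
    Returns None when no informative SNP is available.
    """
    num = 0.0
    den = 0.0
    for pos, hap_reads, total_reads in hsnps:
        if total_reads == 0:
            continue
        # Closer SNPs and deeper-covered SNPs get more weight (assumed kernel).
        distance_weight = math.exp(-0.5 * ((pos - position) / bandwidth) ** 2)
        weight = distance_weight * total_reads
        num += weight * (hap_reads / total_reads)
        den += weight
    return num / den if den > 0 else None


# Hypothetical neighborhood: two SNPs flanking the candidate site at 50,000.
neighbors = [(45_000, 2, 10), (55_000, 4, 10)]
print(local_allele_balance(50_000, neighbors))
```

A candidate mutation whose observed variant allele fraction is far from the local estimate would be less consistent with a true heterozygous variant on that haplotype; the actual consistency test in the published method is not shown here.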
Mutations in BRCA1 and/or BRCA2 (BRCA1/2) are the most common indication of deficiency in the homologous recombination (HR) DNA repair pathway. However, recent genome-wide analyses have shown that the same pattern of mutations found in BRCA1/2-mutant tumors is also present in several other tumors. Here, we present a new computational tool called Signature Multivariate Analysis (SigMA), which can be used to accurately detect the mutational signature associated with HR deficiency from targeted gene panels. Whereas previous methods require whole-genome or whole-exome data, our method detects the HR-deficiency signature even from low mutation counts, by using a likelihood-based measure combined with machine-learning techniques. Cell lines that we identify as HR deficient show a significant response to poly (ADP-ribose) polymerase (PARP) inhibitors; patients with ovarian cancer whom we found to be HR deficient show a significantly longer overall survival with platinum regimens. By enabling panel-based identification of mutational signatures, our method substantially increases the number of patients that may be considered for treatments targeting HR deficiency.
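To give a feel for what a likelihood-based measure on low mutation counts can look like, the sketch below scores observed mutation-category counts against candidate signature spectra with a multinomial log-likelihood and picks the best-scoring one. It is a simplified illustration, not SigMA's implementation: the three-category spectra, the signature names and the function names are invented for the example, and the machine-learning step mentioned in the abstract is omitted.

```python
import math


def log_likelihood(counts, spectrum):
    """Multinomial log-likelihood of counts under a spectrum.

    The combinatorial constant is dropped because it is the same for every
    candidate spectrum and does not affect the comparison.
    """
    total = 0.0
    for count, prob in zip(counts, spectrum):
        if count == 0:
            continue
        if prob == 0:
            # Observed a category the spectrum says is impossible.
            return float("-inf")
        total += count * math.log(prob)
    return total


def most_likely_signature(counts, spectra):
    """Return the name of the spectrum with the highest log-likelihood."""
    return max(spectra, key=lambda name: log_likelihood(counts, spectra[name]))


# Hypothetical spectra over three mutation categories (each sums to 1).
spectra = {
    "sig_A": [0.7, 0.2, 0.1],
    "sig_B": [0.1, 0.2, 0.7],
}

print(most_likely_signature([5, 1, 0], spectra))  # sig_A
print(most_likely_signature([0, 1, 4], spectra))  # sig_B
```

With only a handful of mutations the likelihoods of competing spectra can be close, which is one reason a single best-match call like this would need further calibration in practice.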
Whole-genome sequencing of DNA from single cells has the potential to reshape our understanding of mutational heterogeneity in normal and diseased tissues. However, a major difficulty is distinguishing amplification artifacts from biologically derived somatic mutations. Here, we describe linked-read analysis (LiRA), a method that accurately identifies somatic single-nucleotide variants (sSNVs) by using read-level phasing with nearby germline heterozygous polymorphisms, thereby enabling the characterization of mutational signatures and estimation of somatic mutation rates in single cells.
Cancer is often seen as a disease of mutations and chromosomal abnormalities. However, some cancers, including pediatric rhabdoid tumors (RTs), lack recurrent alterations targetable by current drugs and need alternative, informed therapeutic options. To nominate potential targets, we performed a high-throughput small-molecule screen complemented by a genome-scale CRISPR-Cas9 gene-knockout screen in a large number of RT and control cell lines. These approaches converged to reveal several receptor tyrosine kinases (RTKs) as therapeutic targets, with RTK inhibition effective in suppressing RT cell growth in vitro and against a xenograft model in vivo. RT cell lines highly express and activate (phosphorylate) different RTKs, creating dependency without mutation or amplification. Downstream of RTK signaling, we identified PTPN11, encoding the pro-growth signaling protein SHP2, as a shared dependency across all RT cell lines. This study demonstrates that large-scale perturbational screening can uncover vulnerabilities in cancers with "quiet" genomes.
BACKGROUND: Genomic rearrangements exert a heavy influence on the molecular landscape of cancer. New analytical approaches integrating somatic structural variants (SSVs) with altered gene features represent a framework by which we can assign global significance to a core set of genes, analogous to established methods that identify genes non-randomly targeted by somatic mutation or copy number alteration. While recent studies have defined broad patterns of association involving gene transcription and nearby SSV breakpoints, global alterations in DNA methylation in the context of SSVs remain largely unexplored. RESULTS: By data integration of whole genome sequencing, RNA sequencing, and DNA methylation arrays from more than 1400 human cancers, we identify hundreds of genes and associated CpG islands (CGIs) for which the nearby presence of a somatic structural variant (SSV) breakpoint is recurrently associated with altered expression or DNA methylation, respectively, independently of copy number alterations. CGIs with SSV-associated increased methylation are predominantly promoter-associated, while CGIs with SSV-associated decreased methylation are enriched for gene body CGIs. Rearrangement of genomic regions normally having higher or lower methylation is often involved in SSV-associated CGI methylation alterations. Across cancers, the overall structural variation burden is associated with a global decrease in methylation, increased expression in methyltransferase genes and DNA damage response genes, and decreased immune cell infiltration. CONCLUSION: Genomic rearrangement appears to have a major role in shaping the cancer DNA methylome, to be considered alongside commonly accepted mechanisms including histone modifications and disruption of DNA methyltransferases.
We introduce Tibanna, an open-source software tool for automated execution of bioinformatics pipelines on Amazon Web Services (AWS). Tibanna accepts reproducible and portable pipeline standards including Common Workflow Language (CWL), Workflow Description Language (WDL) and Docker. It adopts a strategy of isolation and optimization of individual executions, combined with a serverless scheduling approach. Pipelines are executed and monitored using local commands or the Python Application Programming Interface (API) and cloud configuration is automatically handled. Tibanna is well suited for projects with a range of computational requirements, including those with large and widely fluctuating loads. Notably, it has been used to process terabytes of data for the 4D Nucleome (4DN) Network.
Mutational processes giving rise to lung adenocarcinomas (LADCs) in non-smokers remain elusive. We analyzed 138 LADC whole genomes, including 83 cases with minimal contribution of smoking-associated mutational signature. Genomic rearrangements were not correlated with smoking-associated mutations and frequently served as driver events of smoking-signature-low LADCs. Complex genomic rearrangements, including chromothripsis and chromoplexy, generated 74% of known fusion oncogenes, including EML4-ALK, CD74-ROS1, and KIF5B-RET. Unlike other collateral rearrangements, these fusion-oncogene-associated rearrangements were frequently copy-number-balanced, representing a genomic signature of early oncogenesis. Analysis of mutation timing revealed that fusions and point mutations of canonical oncogenes were often acquired in the early decades of life. During a long latency, cancer-related genes were disrupted or amplified by complex rearrangements. The genomic landscape was different between subgroups: EGFR-mutant LADCs had frequent whole-genome duplications with p53 mutations, whereas fusion-oncogene-driven LADCs had frequent SETD2 mutations. Our study highlights LADC oncogenesis driven by endogenous mutational processes.
Howard TP, Arnoff TE, Song MR, Giacomelli AO, Wang X, Hong AL, Dharia NV, Wang S, Vazquez F, Pham M-T, Morgan AM, Wachter F, Bird GH, Kugener G, Oberlick EM, Rees MG, Tiv HL, Hwang JH, Walsh KH, Cook A, Krill-Burger JM, Tsherniak A, Gokhale PC, Park PJ, Stegmaier K, Walensky LD, Hahn WC, Roberts CWM. MDM2 and MDM4 Are Therapeutic Vulnerabilities in Malignant Rhabdoid Tumors. Cancer Research 2019;79(9)
Malignant rhabdoid tumors (MRT) are highly aggressive pediatric cancers that respond poorly to current therapies. In this study, we screened several MRT cell lines with large-scale RNAi, CRISPR-Cas9, and small-molecule libraries to identify potential drug targets specific for these cancers. We discovered MDM2 and MDM4, the canonical negative regulators of p53, as significant vulnerabilities. Using two compounds currently in clinical development, idasanutlin (MDM2-specific) and ATSP-7041 (MDM2/4-dual), we show that MRT cells were more sensitive than other p53 wild-type cancer cell lines to inhibition of MDM2 alone as well as dual inhibition of MDM2/4. These compounds caused significant upregulation of the p53 pathway in MRT cells, and sensitivity was ablated by CRISPR-Cas9–mediated inactivation of TP53. We show that loss of SMARCB1, a subunit of the SWI/SNF (BAF) complex mutated in nearly all MRTs, sensitized cells to MDM2 and MDM2/4 inhibition by enhancing p53-mediated apoptosis. Both MDM2 and MDM2/4 inhibition slowed MRT xenograft growth in vivo, with a 5-day idasanutlin pulse causing marked regression of all xenografts, including durable complete responses in 50% of mice. Together, these studies identify a genetic connection between mutations in the SWI/SNF chromatin-remodeling complex and the tumor suppressor gene TP53 and provide preclinical evidence to support the targeting of MDM2 and MDM4 in this often-fatal pediatric cancer.
Glioblastoma is a malignant brain tumor characterized by rapid growth, diffuse invasion and therapeutic resistance. We recently used microRNA expression profiles to subclassify glioblastoma into five genetically and clinically distinct subclasses, and showed that microRNAs both define and contribute to the phenotypes of these subclasses. Here we show that miR-29a activates a multi-faceted growth and invasion program that promotes glioblastoma aggressiveness.
Cell behaviors are dictated by epigenetic and transcriptional programs. Little is known about how extracellular stimuli modulate these programs to reshape gene expression and control cell behavioral responses. Here, we interrogated the epigenetic and transcriptional response of endothelial cells to VEGFA treatment and found rapid chromatin changes that mediate broad transcriptomic alterations. VEGFA-responsive genes were associated with active promoters, but changes in promoter histone marks were not tightly linked to gene expression changes. VEGFA altered transcription factor occupancy and the distal epigenetic landscape, which profoundly contributed to VEGFA-dependent changes in gene expression. Integration of gene expression, dynamic enhancer, and transcription factor occupancy changes induced by VEGFA yielded a VEGFA-regulated transcriptional regulatory network, which revealed that the small MAF transcription factors are master regulators of the VEGFA transcriptional program and angiogenesis. Collectively these results revealed that extracellular stimuli rapidly reconfigure the chromatin landscape to coordinately regulate biological responses.