Publications

In Press
Luquette LJ, Bohrson CL, Sherman M, Park PJ. Identification of somatic mutations in single cell DNA sequencing data using a spatial model of allelic imbalance. Nature Communications In Press;
Cortés-Ciriano I, Lee JJ-K, Xi R, Jain D, Jung YL, Yang L, Gordenin D, Klimczak LJ, Zhang C-Z, Pellman DS, Park PJ. Comprehensive analysis of chromothripsis in 2,658 human cancers using whole-genome sequencing [Internet]. Nature Genetics In Press;
Chromothripsis is a newly discovered mutational phenomenon involving massive, clustered genomic rearrangements that occurs in cancer and other diseases. Recent studies in cancer suggest that chromothripsis may be far more common than initially inferred from low resolution DNA copy number data. Here, we analyze the patterns of chromothripsis across 2,658 tumors spanning 39 cancer types using whole-genome sequencing data. We find that chromothripsis events are pervasive across cancers, with a frequency of >50% in several cancer types. Whereas canonical chromothripsis profiles display oscillations between two copy number states, a considerable fraction of the events involves multiple chromosomes as well as additional structural alterations. In addition to non-homologous end-joining, we detect signatures of replicative processes and templated insertions. Chromothripsis contributes to oncogene amplification as well as to inactivation of genes such as mismatch-repair related genes. These findings show that chromothripsis is a major process driving genome evolution in human cancer.
Yang L, Wang S, Lee JJ-K, Lee S, Lee E, Shinbrot E, Wheeler DA, Kucherlapati R, Park PJ. An enhanced genetic model of colorectal cancer progression history. Genome Biology In Press;
2019
Gulhan DC, Lee JJ-K, Melloni GEM, Cortés-Ciriano I, Park PJ. Detecting the mutational signature of homologous recombination deficiency in clinical samples. Nature Genetics 2019;51(5):912-919.
Mutations in BRCA1 and/or BRCA2 (BRCA1/2) are the most common indication of deficiency in the homologous recombination (HR) DNA repair pathway. However, recent genome-wide analyses have shown that the same pattern of mutations found in BRCA1/2-mutant tumors is also present in several other tumors. Here, we present a new computational tool called Signature Multivariate Analysis (SigMA), which can be used to accurately detect the mutational signature associated with HR deficiency from targeted gene panels. Whereas previous methods require whole-genome or whole-exome data, our method detects the HR-deficiency signature even from low mutation counts, by using a likelihood-based measure combined with machine-learning techniques. Cell lines that we identify as HR deficient show a significant response to poly (ADP-ribose) polymerase (PARP) inhibitors; patients with ovarian cancer whom we found to be HR deficient show a significantly longer overall survival with platinum regimens. By enabling panel-based identification of mutational signatures, our method substantially increases the number of patients that may be considered for treatments targeting HR deficiency.
Bohrson CL, Barton AR, Lodato MA, Rodin RE, Luquette LJ, Viswanadham VV, Gulhan DC, Cortés-Ciriano I, Sherman MA, Kwon M, Coulter ME, Galor A, Walsh CA, Park PJ. Linked-read analysis identifies mutations in single-cell DNA-sequencing data. Nature Genetics 2019;51:749-754.
Whole-genome sequencing of DNA from single cells has the potential to reshape our understanding of mutational heterogeneity in normal and diseased tissues. However, a major difficulty is distinguishing amplification artifacts from biologically derived somatic mutations. Here, we describe linked-read analysis (LiRA), a method that accurately identifies somatic single-nucleotide variants (sSNVs) by using read-level phasing with nearby germline heterozygous polymorphisms, thereby enabling the characterization of mutational signatures and estimation of somatic mutation rates in single cells.
Lee S, Johnson J, Vitzthum C, Kirli K, Alver BH, Park PJ. Tibanna: software for scalable execution of portable pipelines on the cloud [Internet]. Bioinformatics 2019;
We introduce Tibanna, an open-source software tool for automated execution of bioinformatics pipelines on Amazon Web Services (AWS). Tibanna accepts reproducible and portable pipeline standards including Common Workflow Language (CWL), Workflow Description Language (WDL) and Docker. It adopts a strategy of isolation and optimization of individual executions, combined with a serverless scheduling approach. Pipelines are executed and monitored using local commands or the Python Application Programming Interface (API) and cloud configuration is automatically handled. Tibanna is well suited for projects with a range of computational requirements, including those with large and widely fluctuating loads. Notably, it has been used to process terabytes of data for the 4D Nucleome (4DN) Network.
Lee JJ-K, Park S, Park H, Kim S, Lee J, Lee J, Youk J, Yi K, An Y, Park IK, Kang CH, Chung DH, Kim TM, Jeon YK, Hong D, Park PJ, Ju YS, Kim YT. Tracing Oncogene Rearrangements in the Mutational History of Lung Adenocarcinoma [Internet]. Cell 2019;177(7):1842-1857.
Mutational processes giving rise to lung adenocarcinomas (LADCs) in non-smokers remain elusive. We analyzed 138 LADC whole genomes, including 83 cases with minimal contribution of smoking-associated mutational signature. Genomic rearrangements were not correlated with smoking-associated mutations and frequently served as driver events of smoking-signature-low LADCs. Complex genomic rearrangements, including chromothripsis and chromoplexy, generated 74% of known fusion oncogenes, including EML4-ALK, CD74-ROS1, and KIF5B-RET. Unlike other collateral rearrangements, these fusion-oncogene-associated rearrangements were frequently copy-number-balanced, representing a genomic signature of early oncogenesis. Analysis of mutation timing revealed that fusions and point mutations of canonical oncogenes were often acquired in the early decades of life. During a long latency, cancer-related genes were disrupted or amplified by complex rearrangements. The genomic landscape was different between subgroups: EGFR-mutant LADCs had frequent whole-genome duplications with p53 mutations, whereas fusion-oncogene-driven LADCs had frequent SETD2 mutations. Our study highlights LADC oncogenesis driven by endogenous mutational processes.
Howard TP, Arnoff TE, Song MR, Giacomelli AO, Wang X, Hong AL, Dharia NV, Wang S, Vazquez F, Pham M-T, Morgan AM, Wachter F, Bird GH, Kugener G, Oberlick EM, Rees MG, Tiv HL, Hwang JH, Walsh KH, Cook A, Krill-Burger JM, Tsherniak A, Gokhale PC, Park PJ, Stegmaier K, Walensky LD, Hahn WC, Roberts CWM. MDM2 and MDM4 Are Therapeutic Vulnerabilities in Malignant Rhabdoid Tumors. Cancer Research 2019;79(9).
Malignant rhabdoid tumors (MRT) are highly aggressive pediatric cancers that respond poorly to current therapies. In this study, we screened several MRT cell lines with large-scale RNAi, CRISPR-Cas9, and small-molecule libraries to identify potential drug targets specific for these cancers. We discovered MDM2 and MDM4, the canonical negative regulators of p53, as significant vulnerabilities. Using two compounds currently in clinical development, idasanutlin (MDM2-specific) and ATSP-7041 (MDM2/4-dual), we show that MRT cells were more sensitive than other p53 wild-type cancer cell lines to inhibition of MDM2 alone as well as dual inhibition of MDM2/4. These compounds caused significant upregulation of the p53 pathway in MRT cells, and sensitivity was ablated by CRISPR-Cas9–mediated inactivation of TP53. We show that loss of SMARCB1, a subunit of the SWI/SNF (BAF) complex mutated in nearly all MRTs, sensitized cells to MDM2 and MDM2/4 inhibition by enhancing p53-mediated apoptosis. Both MDM2 and MDM2/4 inhibition slowed MRT xenograft growth in vivo, with a 5-day idasanutlin pulse causing marked regression of all xenografts, including durable complete responses in 50% of mice. Together, these studies identify a genetic connection between mutations in the SWI/SNF chromatin-remodeling complex and the tumor suppressor gene TP53 and provide preclinical evidence to support the targeting of MDM2 and MDM4 in this often-fatal pediatric cancer.
Zhao Y, Huang W, Kim T-M, Jung Y, Menon LG, Xing H, Li H, Carroll RS, Park PJ, Yang HW, Johnson MD. MicroRNA-29a activates a multicomponent growth and invasion program in glioblastoma. Journal of Experimental & Clinical Cancer Research 2019;38(36).
Glioblastoma is a malignant brain tumor characterized by rapid growth, diffuse invasion and therapeutic resistance. We recently used microRNA expression profiles to subclassify glioblastoma into five genetically and clinically distinct subclasses, and showed that microRNAs both define and contribute to the phenotypes of these subclasses. Here we show that miR-29a activates a multi-faceted growth and invasion program that promotes glioblastoma aggressiveness.
Wang S, Chen J, Garcia SP, Liang X, Zhang F, Yan P, Yu H, Wei W, Li Z, Wang J, Le H, Han Z, Luo X, Day DS, Stevens SM, Zhang Y, Park PJ, Liu Z-jie, Sun K, Yuan G-C, Pu WT, Zhang B. A dynamic and integrated epigenetic program at distal regions orchestrates transcriptional responses to VEGFA. Genome Research 2019;29:193-207.
Cell behaviors are dictated by epigenetic and transcriptional programs. Little is known about how extracellular stimuli modulate these programs to reshape gene expression and control cell behavioral responses. Here, we interrogated the epigenetic and transcriptional response of endothelial cells to VEGFA treatment and found rapid chromatin changes that mediate broad transcriptomic alterations. VEGFA-responsive genes were associated with active promoters, but changes in promoter histone marks were not tightly linked to gene expression changes. VEGFA altered transcription factor occupancy and the distal epigenetic landscape, which profoundly contributed to VEGFA-dependent changes in gene expression. Integration of gene expression, dynamic enhancer, and transcription factor occupancy changes induced by VEGFA yielded a VEGFA-regulated transcriptional regulatory network, which revealed that the small MAF transcription factors are master regulators of the VEGFA transcriptional program and angiogenesis. Collectively these results revealed that extracellular stimuli rapidly reconfigure the chromatin landscape to coordinately regulate biological responses.
Wang X, Wang S, Troisi EC, Howard TP, Haswell JR, Wolf BK, Hawk WH, Ramos P, Oberlick EM, Tzvetkov EP, Vazquez F, Hahn WC, Park PJ**, Roberts CWM**. BRD9 defines a SWI/SNF sub-complex and constitutes a specific vulnerability in malignant rhabdoid tumors. Nature Communications 2019;
Bromodomain-containing protein 9 (BRD9) is a recently identified subunit of SWI/SNF(BAF) chromatin remodeling complexes, yet its function is poorly understood. Here, using a genome-wide CRISPR-Cas9 screen, we show that BRD9 is a specific vulnerability in pediatric malignant rhabdoid tumors (RTs), which are driven by inactivation of the SMARCB1 subunit of SWI/SNF. We find that BRD9 exists in a unique SWI/SNF sub-complex that lacks SMARCB1, which has been considered a core subunit. While SMARCB1-containing SWI/SNF complexes are bound preferentially at enhancers, we show that BRD9-containing complexes exist at both promoters and enhancers. Mechanistically, we show that SMARCB1 loss causes increased BRD9 incorporation into SWI/SNF thus providing insight into BRD9 vulnerability in RTs. Underlying the dependency, while its bromodomain is dispensable, the DUF3512 domain of BRD9 is essential for SWI/SNF integrity in the absence of SMARCB1. Collectively, our results reveal a BRD9-containing SWI/SNF subcomplex is required for the survival of SMARCB1-mutant RTs.
2018
Lodato MA*, Rodin RE*, Bohrson CL*, Coulter ME*, Barton AR*, Kwon M*, Sherman MA, Vitzthum CM, Luquette LJ, Yandava C, Yang P, Chittenden TW, Hatem NE, Ryu SC, Woodworth MB, Park PJ**, Walsh CA**. Aging and neurodegeneration are associated with increased mutations in single human neurons. Science 2018;359(6375):555-559.
It has long been hypothesized that aging and neurodegeneration are associated with somatic mutation in neurons; however, methodological hurdles have prevented testing this hypothesis directly. We used single-cell whole-genome sequencing to perform genome-wide somatic single-nucleotide variant (sSNV) identification on DNA from 161 single neurons from the prefrontal cortex and hippocampus of fifteen normal individuals (aged 4 months to 82 years) as well as nine individuals affected by early-onset neurodegeneration due to genetic disorders of DNA repair (Cockayne syndrome and Xeroderma pigmentosum). sSNVs increased approximately linearly with age in both areas (with a higher rate in hippocampus) and were more abundant in neurodegenerative disease. The accumulation of somatic mutations with age, which we term genosenium, shows age-related, region-related, and disease-related molecular signatures, and may be important in other human age-associated conditions.
Kerpedjiev P, Abdennur N, Lekschas F, McCallum C, Dinkla K, Strobelt H, Luber JM, Ouellette SB, Azhir A, Kumar N, Hwang J, Lee S, Alver BH, Pfister H, Mirny LA, Park PJ, Gehlenborg N. HiGlass: web-based visual exploration and analysis of genome interaction maps. Genome Biol 2018;19(1):125.
We present HiGlass, an open source visualization tool built on web technologies that provides a rich interface for rapid, multiplex, and multiscale navigation of 2D genomic maps alongside 1D genomic tracks, allowing users to combine various data types, synchronize multiple visualization modalities, and share fully customizable views with others. We demonstrate its utility in exploring different experimental conditions, comparing the results of analyses, and creating interactive snapshots to share with collaborators and the broader public. HiGlass is accessible online at http://higlass.io and is also available as a containerized application that can be run on any platform.
Fan J*, Lee HO*, Lee S, Ryu DE, Lee S, Xue C, Kim SJ, Kim K, Barkas N, Park PJ, Park WY, Kharchenko PV. Linking transcriptional and genetic tumor heterogeneity through allele analysis of single-cell RNA-seq data. Genome Research 2018;28(8):1217-1227.
Characterization of intratumoral heterogeneity is critical to cancer therapy, as the presence of phenotypically diverse cell populations commonly fuels relapse and resistance to treatment. Although genetic variation is a well-studied source of intratumoral heterogeneity, the functional impact of most genetic alterations remains unclear. Even less understood is the relative importance of other factors influencing heterogeneity, such as epigenetic state or tumor microenvironment. To investigate the relationship between genetic and transcriptional heterogeneity in a context of cancer progression, we devised a computational approach called HoneyBADGER to identify copy number variation and loss of heterozygosity in individual cells from single-cell RNA-sequencing data. By integrating allele and normalized expression information, HoneyBADGER is able to identify and infer the presence of subclone-specific alterations in individual cells and reconstruct the underlying subclonal architecture. By examining several tumor types, we show that HoneyBADGER is effective at identifying deletions, amplifications, and copy-neutral loss-of-heterozygosity events and is capable of robustly identifying subclonal focal alterations as small as 10 megabases. We further apply HoneyBADGER to analyze single cells from a progressive multiple myeloma patient to identify major genetic subclones that exhibit distinct transcriptional signatures relevant to cancer progression. Other prominent transcriptional subpopulations within these tumors did not line up with the genetic subclonal structure and were likely driven by alternative, nonclonal mechanisms. These results highlight the need for integrative analysis to understand the molecular and phenotypic heterogeneity in cancer.
Zhang L, Ettou S, Khalid M, Taglienti M, Jain D, Jung YL, Seager C, Liu Y, Ng KH, Park PJ, Kreidberg JA. EED, a member of the polycomb group, is required for nephron differentiation and the maintenance of nephron progenitor cells. Development 2018;145(14).
Epigenetic regulation of gene expression has a crucial role allowing for the self-renewal and differentiation of stem and progenitor populations during organogenesis. The mammalian kidney maintains a population of self-renewing stem cells that differentiate to give rise to thousands of nephrons, which are the functional units that carry out filtration to maintain physiological homeostasis. The polycomb repressive complex 2 (PRC2) epigenetically represses gene expression during development by placing the H3K27me3 mark on histone H3 at promoter and enhancer sites, resulting in gene silencing. To understand the role of PRC2 in nephron differentiation, we conditionally inactivated the Eed gene, which encodes a nonredundant component of the PRC2 complex, in nephron progenitor cells. Resultant kidneys were smaller and showed premature loss of progenitor cells. The progenitors in Eed mutant mice that were induced to differentiate did not develop into properly formed nephrons. Lhx1, normally expressed in the renal vesicle, was overexpressed in kidneys of Eed mutant mice. Thus, PRC2 has a crucial role in suppressing the expression of genes that maintain the progenitor state, allowing nephron differentiation to proceed.
Holm IA, Agrawal PB, Ceyhan-Birsoy O, Christensen KD, Fayer S, Frankel LA, Genetti CA, Krier JB, LaMay RC, Levy HL, McGuire AL, Parad RB, Park PJ, Pereira S, Rehm HL, Schwartz TS, Waisbren SE, Yu TW, Team BSP, Green RC, Beggs AH. The BabySeq project: implementing genomic sequencing in newborns. BMC Pediatrics 2018;18(1):225.

BACKGROUND:

The greatest opportunity for lifelong impact of genomic sequencing is during the newborn period. The "BabySeq Project" is a randomized trial that explores the medical, behavioral, and economic impacts of integrating genomic sequencing into the care of healthy and sick newborns.

METHODS:

Families of newborns are enrolled from Boston Children's Hospital and Brigham and Women's Hospital nurseries, and half are randomized to receive genomic sequencing and a report that includes monogenic disease variants, recessive carrier variants for childhood onset or actionable disorders, and pharmacogenomic variants. All families participate in a disclosure session, which includes the return of results for those in the sequencing arm. Outcomes are collected through review of medical records and surveys of parents and health care providers and include the rationale for choice of genes and variants to report; what genomic data adds to the medical management of sick and healthy babies; and the medical, behavioral, and economic impacts of integrating genomic sequencing into the care of healthy and sick newborns.

DISCUSSION:

The BabySeq Project will provide empirical data about the risks, benefits and costs of newborn genomic sequencing and will inform policy decisions related to universal genomic screening of newborns.

TRIAL REGISTRATION:

The study is registered in ClinicalTrials.gov Identifier: NCT02422511. Registration date: 10 April 2015.

KEYWORDS:

Ethical, legal, social implications; Methods; Newborn screening; Newborn sequencing; Randomized trial; Whole exome sequencing

Zhang Y, Yang L, Kucherlapati M, Chen F, Hadjipanayis A, Pantazi A, Bristow CA, Lee EA, Mahadeshwar HS, Tang J, Zhang J, Seth S, Lee S, Ren X, Song X, Sun H, Seidman J, Luquette LJ, Xi R, Chin L, Protopopov A, Li W, Park PJ, Kucherlapati R, Creighton CJ. A Pan-Cancer Compendium of Genes Deregulated by Somatic Genomic Rearrangement across More Than 1,400 Cases. Cell Reports 2018;24(2):515-527.
A systematic cataloging of genes affected by genomic rearrangement, using multiple patient cohorts and cancer types, can provide insight into cancer-relevant alterations outside of exomes. By integrative analysis of whole-genome sequencing (predominantly low pass) and gene expression data from 1,448 cancers involving 18 histopathological types in The Cancer Genome Atlas, we identified hundreds of genes for which the nearby presence (within 100 kb) of a somatic structural variant (SV) breakpoint is associated with altered expression. While genomic rearrangements are associated with widespread copy-number alteration (CNA) patterns, approximately 1,100 genes, including overexpressed cancer driver genes (e.g., TERT, ERBB2, CDK12, CDK4) and underexpressed tumor suppressors (e.g., TP53, RB1, PTEN, STK11), show SV-associated deregulation independent of CNA. SVs associated with the disruption of topologically associated domains, enhancer hijacking, or fusion transcripts are implicated in gene upregulation. For cancer-relevant pathways, SVs considerably expand our understanding of how genes are affected beyond point mutation or CNA.
Dou Y*, Gold HD*, Luquette LJ*, Park PJ. Detecting Somatic Mutations in Normal Cells. Trends in Genetics 2018;35(7):545-557.
Somatic mutations have been studied extensively in the context of cancer. Recent studies have demonstrated that high-throughput sequencing data can be used to detect somatic mutations in non-tumor cells. Analysis of such mutations allows us to better understand the mutational processes in normal cells, explore cell lineages in development, and examine potential associations with age-related disease. We describe here approaches for characterizing somatic mutations in normal and non-tumor disease tissues. We discuss several experimental designs and common pitfalls in somatic mutation detection, as well as more recent developments such as phasing and linked-read technology. With the dramatically increasing numbers of samples undergoing genome sequencing, bioinformatic analysis will enable the characterization of somatic mutations and their impact on non-cancer tissues.
Bailey MH, Tokheim C, Porta-Pardo E, Sengupta S, Bertrand D, Weerasinghe A, Colaprico A, Wendl MC, Kim J, Reardon B, Ng PK, Jeong KJ, Cao S, Wang Z, Gao J, Gao Q, Wang F, Liu EM, Mularoni L, Rubio-Perez C, Nagarajan N, Cortes-Ciriano I, Zhou DC, Liang WW, Hess JM, Yellapantula VD, Tamborero D, Gonzalez-Perez A, Suphavilai C, Ko JY, Khurana E, Park PJ, Van Allen EM, Liang H, MC3 Working Group, Lawrence MS, Godzik A, Lopez-Bigas N, Stuart J, Wheeler D, Getz G, Chen K, Lazar AJ, Mills GB, Karchin R, Ding L. Comprehensive Characterization of Cancer Driver Genes and Mutations. Cell 2018;173(2):371-385.
Identifying molecular cancer drivers is critical for precision oncology. Multiple advanced algorithms to identify drivers now exist, but systematic attempts to combine and optimize them on large datasets are few. We report a PanCancer and PanSoftware analysis spanning 9,423 tumor exomes (comprising all 33 of The Cancer Genome Atlas projects) and using 26 computational tools to catalog driver genes and mutations. We identify 299 driver genes with implications regarding their anatomical sites and cancer/cell types. Sequence- and structure-based analyses identified >3,400 putative missense driver mutations supported by multiple lines of evidence. Experimental validation confirmed 60%-85% of predicted mutations as likely drivers. We found that >300 MSI tumors are associated with high PD-1/PD-L1, and 57% of tumors analyzed harbor putative clinically actionable events. Our study represents the most comprehensive discovery of cancer genes and mutations to date and will serve as a blueprint for future biological and clinical endeavors.
Sherman MA, Barton AR, Lodato MA, Vitzthum C, Coulter ME, Walsh CA, Park PJ. PaSD-qc: quality control for single cell whole-genome sequencing data using power spectral density estimation. Nucleic Acids Research 2018;46(4):e20.
Single cell whole-genome sequencing (scWGS) is providing novel insights into the nature of genetic heterogeneity in normal and diseased cells. However, the whole-genome amplification process required for scWGS introduces biases into the resulting sequencing that can confound downstream analysis. Here, we present a statistical method, with an accompanying package PaSD-qc (Power Spectral Density-qc), that evaluates the properties and quality of single cell libraries. It uses a modified power spectral density to assess amplification uniformity, amplicon size distribution, autocovariance and inter-sample consistency as well as to identify chromosomes with aberrant read-density profiles due either to copy alterations or poor amplification. These metrics provide a standard way to compare the quality of single cell samples as well as yield information necessary to improve variant calling strategies. We demonstrate the usefulness of this tool in comparing the properties of scWGS protocols, identifying potential chromosomal copy number variation, determining chromosomal and subchromosomal regions of poor amplification, and selecting high-quality libraries from low-coverage data for deep sequencing. The software is available free and open-source at https://github.com/parklab/PaSDqc.
