Primary

2024
Ganz J, Luquette LJ, Bizzotto S, Miller MB, Zhou Z, Bohrson CL, Jin H, Tran AV, Viswanadham VV, McDonough G, Brown K, Chahine Y, Chhouk B, Galor A, Park PJ, Walsh CA. Contrasting somatic mutation patterns in aging human neurons and oligodendrocytes. Cell 2024.
Characterizing somatic mutations in the brain is important for disentangling the complex mechanisms of aging, yet little is known about mutational patterns in different brain cell types. Here, we performed whole-genome sequencing (WGS) of 86 single oligodendrocytes, 20 mixed glia, and 56 single neurons from neurotypical individuals spanning 0.4–104 years of age and identified >92,000 somatic single-nucleotide variants (sSNVs) and small insertions/deletions (indels). Although both cell types accumulate somatic mutations linearly with age, oligodendrocytes accumulated sSNVs 81% faster than neurons and indels 28% slower than neurons. Correlation of mutations with single-nucleus RNA profiles and chromatin accessibility from the same brains revealed that oligodendrocyte mutations are enriched in inactive genomic regions and are distributed across the genome similarly to mutations in brain cancers. In contrast, neuronal mutations are enriched in open, transcriptionally active chromatin. These stark differences suggest an assortment of active mutagenic processes in oligodendrocytes and neurons.
Jin H, Gulhan DC, Geiger B, Ben-Isvy D, Geng D, Ljungström V, Park PJ. Accurate and sensitive mutational signature analysis with MuSiCal. Nature Genetics 2024.
Mutational signature analysis is a recent computational approach for interpreting somatic mutations in the genome. Its application to cancer data has enhanced our understanding of mutational forces driving tumorigenesis and demonstrated its potential to inform prognosis and treatment decisions. However, methodological challenges remain for discovering new signatures and assigning proper weights to existing signatures, thereby hindering broader clinical applications. Here we present Mutational Signature Calculator (MuSiCal), a rigorous analytical framework with algorithms that solve major problems in the standard workflow. Our simulation studies demonstrate that MuSiCal outperforms state-of-the-art algorithms for both signature discovery and assignment. By reanalyzing more than 2,700 cancer genomes, we provide an improved catalog of signatures and their assignments, discover nine indel signatures absent in the current catalog, resolve long-standing issues with the ambiguous ‘flat’ signatures and give insights into signatures with unknown etiologies. We expect MuSiCal and the improved catalog to be a step towards establishing best practices for mutational signature analysis.
2023
Gao T, Kastriti ME, Ljungström V, Heinzel A, Tischler AS, Oberbauer R, Loh P-R, Adameyko I, Park PJ**, Kharchenko P**. A pan-tissue survey of mosaic chromosomal alterations in 948 individuals. Nature Genetics 2023.
Genetic mutations accumulate in an organism’s body throughout its lifetime. While somatic single-nucleotide variants have been well characterized in the human body, the patterns and consequences of large chromosomal alterations in normal tissues remain largely unknown. Here, we present a pan-tissue survey of mosaic chromosomal alterations (mCAs) in 948 healthy individuals from the Genotype-Tissue Expression project, augmenting RNA-based allelic imbalance estimation with haplotype phasing. We found that approximately a quarter of the individuals carry a clonally-expanded mCA in at least one tissue, with incidence strongly correlated with age. The prevalence and genome-wide patterns of mCAs vary considerably across tissue types, suggesting tissue-specific mutagenic exposure and selection pressures. The mCA landscapes in normal adrenal and pituitary glands resemble those in tumors arising from these tissues, whereas the same is not true for the esophagus and skin. Together, our findings show a widespread age-dependent emergence of mCAs across normal human tissues with intricate connections to tumorigenesis.
L'Yi S, Maziec D, Stevens V, Manz T, Veit A, Berselli M, Park PJ**, Głodzik D**, Gehlenborg N**. Chromoscope: interactive multiscale visualization for structural variation in human genomes. Nature Methods 2023.
Chu C, Lin EW, Tran A, Jin H, Ho NI, Veit A, Cortes-Ciriano I, Burns KH, Ting DT, Park PJ. The landscape of human SVA retrotransposons. Nucleic Acids Research 2023.
SINE-VNTR-Alu (SVA) retrotransposons are evolutionarily young and still-active transposable elements (TEs) in the human genome. Several pathogenic SVA insertions have been identified that directly mutate host genes to cause neurodegenerative and other types of diseases. However, due to their sequence heterogeneity and complex structures as well as limitations in sequencing techniques and analysis, SVA insertions have been less well studied compared to other mobile element insertions. Here, we identified polymorphic SVA insertions from 3646 whole-genome sequencing (WGS) samples of >150 diverse populations and constructed a polymorphic SVA insertion reference catalog. Using 20 long-read samples, we also assembled reference and polymorphic SVA sequences and characterized the internal hexamer/variable-number-tandem-repeat (VNTR) expansions as well as differing SVA activity for SVA subfamilies and human populations. In addition, we developed a module to annotate both reference and polymorphic SVA copies. By characterizing the landscape of both reference and polymorphic SVA retrotransposons, our study enables more accurate genotyping of these elements and facilitates the discovery of pathogenic SVA insertions.
Kim J, Woo S, de Gusmao CM, Zhao B, Chin DH, DiDonato RL, Nguyen MA, Nakayama T, Hu CA, Soucy A, Kuniholm A, Thornton JK, Riccardi O, Friedman DA, Moufawad El Achkar C, Dash Z, Cornelissen L, Donado C, Faour KNW, Bush LW, Suslovitch V, Lentucci C, Park PJ, Lee EA, Patterson A, Philippakis AA, Margus B, Berde CB, Yu TW. A framework for individualized splice-switching oligonucleotide therapy. Nature 2023;619:828-836.
Splice-switching antisense oligonucleotides (ASOs) could be used to treat a subset of individuals with genetic diseases [1], but the systematic identification of such individuals remains a challenge. Here we performed whole-genome sequencing analyses to characterize genetic variation in 235 individuals (from 209 families) with ataxia-telangiectasia, a severely debilitating and life-threatening recessive genetic disorder [2,3], yielding a complete molecular diagnosis in almost all individuals. We developed a predictive taxonomy to assess the amenability of each individual to splice-switching ASO intervention; 9% and 6% of the individuals had variants that were ‘probably’ or ‘possibly’ amenable to ASO splice modulation, respectively. Most amenable variants were in deep intronic regions that are inaccessible to exon-targeted sequencing. We developed ASOs that successfully rescued mis-splicing and ATM cellular signalling in patient fibroblasts for two recurrent variants. In a pilot clinical study, one of these ASOs was used to treat a child who had been diagnosed with ataxia-telangiectasia soon after birth, and showed good tolerability without serious adverse events for three years. Our study provides a framework for the prospective identification of individuals with genetic diseases who might benefit from a therapeutic approach involving splice-switching ASOs.
Lee JJ-K, Jung YL, Cheong T-C, Valle-Inclan JE, Chong C, Gulhan DC, Ljungström V, Jin H, Viswanadham VV, Watson EV, Cortés-Ciriano I, Elledge SJ, Chiarle R, Pellman D, Park PJ. ERα-associated translocations underlie oncogene amplifications in breast cancer. Nature 2023.

Focal copy-number amplification is an oncogenic event. Although recent studies have revealed the complex structure [1,2,3] and the evolutionary trajectories [4] of oncogene amplicons, their origin remains poorly understood. Here we show that focal amplifications in breast cancer frequently derive from a mechanism, which we term translocation–bridge amplification, involving inter-chromosomal translocations that lead to dicentric chromosome bridge formation and breakage. In 780 breast cancer genomes, we observe that focal amplifications are frequently connected to each other by inter-chromosomal translocations at their boundaries. Subsequent analysis indicates the following model: the oncogene neighbourhood is translocated in G1 creating a dicentric chromosome, the dicentric chromosome is replicated, and as dicentric sister chromosomes segregate during mitosis, a chromosome bridge is formed and then broken, with fragments often being circularized in extrachromosomal DNAs. This model explains the amplifications of key oncogenes, including ERBB2 and CCND1. Recurrent amplification boundaries and rearrangement hotspots correlate with oestrogen receptor binding in breast cancer cells. Experimentally, oestrogen treatment induces DNA double-strand breaks in the oestrogen receptor target regions that are repaired by translocations, suggesting a role of oestrogen in generating the initial translocations. A pan-cancer analysis reveals tissue-specific biases in mechanisms initiating focal amplifications, with the breakage–fusion–bridge cycle prevalent in some and the translocation–bridge amplification in others, probably owing to the different timing of DNA break repair. Our results identify a common mode of oncogene amplification and propose oestrogen as its mechanistic origin in breast cancer.

News coverage on this paper: Harvard Medical School News
Cortes-Ciriano I, Steele CD, Piculell K, Al-Ibraheemi A, Eulo V, Bui MM, Chatzipli A, Dickson BC, Borcherding DC, Feber A, Galor A, , Jones KB, Jordan JT, Kim RH, Lindsay D, Miller C, Nishida Y, Proszek PZ, Serrano J, Sundby TR, Szymanski JJ, Ullrich NJ, Viskochil D, Wang X, Snuderl M, Park PJ, Flanagan AM, Hirbe AC, Pillay N, Miller DT. Genomic patterns of malignant peripheral nerve sheath tumor (MPNST) evolution correlate with clinical outcome and are detectable in cell-free DNA. Cancer Discovery 2023;13(3):654-671.

Malignant peripheral nerve sheath tumor (MPNST), an aggressive soft-tissue sarcoma, occurs in people with neurofibromatosis type 1 (NF1) and sporadically. Whole-genome and multiregional exome sequencing, transcriptomic, and methylation profiling of 95 tumor samples revealed the order of genomic events in tumor evolution. Following biallelic inactivation of NF1, loss of CDKN2A or TP53 with or without inactivation of polycomb repressive complex 2 (PRC2) leads to extensive somatic copy-number aberrations (SCNA). Distinct pathways of tumor evolution are associated with inactivation of PRC2 genes and H3K27 trimethylation (H3K27me3) status. Tumors with H3K27me3 loss evolve through extensive chromosomal losses followed by whole-genome doubling and chromosome 8 amplification, and show lower levels of immune cell infiltration. Retention of H3K27me3 leads to extensive genomic instability, but an immune cell-rich phenotype. Specific SCNAs detected in both tumor samples and cell-free DNA (cfDNA) act as a surrogate for H3K27me3 loss and immune infiltration, and predict prognosis.

Significance:

MPNST is the most common cause of death and morbidity for individuals with NF1, a relatively common tumor predisposition syndrome. Our results suggest that somatic copy-number and methylation profiling of tumor or cfDNA could serve as a biomarker for early diagnosis and to stratify patients into prognostic and treatment-related subgroups.

2022
Batalini F, Gulhan DC, Mao V, Tran A, Polak M, Xiong N, Tayob N, Tung NM, Winer EP, Mayer EL, Knappskog S, Lønning PE, Matulonis UA, Konstantinopoulos PA, Solit DB, Won H, Eikesdal HP, Park PJ, Wulf GM. Mutational Signature 3 Detected from Clinical Panel Sequencing is Associated with Responses to Olaparib in Breast and Ovarian Cancers. Clinical Cancer Research 2022;28(21):4714-4723.

Purpose: The identification of patients with homologous recombination deficiency (HRD) beyond BRCA1/2 mutations is an urgent task, as they may benefit from PARP inhibitors. We previously developed SigMA, a method that detects mutational signature 3 (Sig3), a signature associated with HRD, and that can reliably detect HRD from the limited sequencing data derived from gene-focused clinical panel sequencing.

Experimental design: We apply this method to patients from two independent datasets: (i) high-grade serous ovarian cancer and triple-negative breast cancer (TNBC) from a phase Ib trial of the PARP inhibitor olaparib in combination with the PI3K inhibitor buparlisib (BKM120; NCT01623349), and (ii) TNBC patients who received neoadjuvant olaparib in the phase II PETREMAC trial (NCT02624973).

Results: We find that Sig3 as detected by SigMA is positively associated with improved progression-free survival and objective responses. In addition, comparison of Sig3 detection in panel and exome-sequencing data from the same patient samples demonstrated highly concordant results and superior performance in comparison with the genomic instability score.

Conclusions: Our analyses demonstrate that HRD can be detected reliably from panel-sequencing data that are obtained as part of routine clinical care, and that this approach can identify patients beyond those with germline BRCA1/2mut who might benefit from PARP inhibitors. Prospective clinical utility testing is warranted.

Luquette LJ, Miller MB, Zhou Z, Bohrson CL, Zhao Y, Jin H, Gulhan D, Ganz J, Bizzotto S, Kirkham S, Hochepied T, Libert C, Galor A, Kim J, Lodato MA, Garaycoechea JI, Gawad C, West J, Walsh CA, Park PJ. Single-cell genome sequencing of human neurons identifies somatic point mutation and indel enrichment in regulatory elements. Nature Genetics 2022;54:1564-1571.
Accurate somatic mutation detection from single-cell DNA sequencing is challenging due to amplification-related artifacts. To reduce this artifact burden, an improved amplification technique, primary template-directed amplification (PTA), was recently introduced. We analyzed whole-genome sequencing data from 52 PTA-amplified single neurons using SCAN2, a new genotyper we developed to leverage mutation signatures and allele balance in identifying somatic single-nucleotide variants (SNVs) and small insertions and deletions (indels) in PTA data. Our analysis confirms an increase in nonclonal somatic mutation in single neurons with age, but revises the estimated rate of this accumulation to 16 SNVs per year. We also identify artifacts in other amplification methods. Most importantly, we show that somatic indels increase by at least three per year per neuron and are enriched in functional regions of the genome such as enhancers and promoters. Our data suggest that indels in gene-regulatory elements have a considerable effect on genome integrity in human neurons.
Reiff SB, Schroeder AJ, Kırlı K, Cosolo A, Bakker C, Mercado L, Lee S, Veit AD, Balashov AK, Vitzthum C, Ronchetti W, Pitman KM, Johnson J, Ehmsen SR, Kerpedjiev P, Abdennur N, Imakaev M, Öztürk SU, Çamoğlu U, Mirny LA, Gehlenborg N*, Alver BH*, Park PJ*. The 4D Nucleome Data Portal as a resource for searching and visualizing curated nucleomics data. Nature Communications 2022;13(1):2365.
The 4D Nucleome (4DN) Network aims to elucidate the complex structure and organization of chromosomes in the nucleus and the impact of their disruption in disease biology. We present the 4DN Data Portal (https://data.4dnucleome.org/), a repository for datasets generated in the 4DN network and relevant external datasets. Datasets were generated with a wide range of experiments, including chromosome conformation capture assays such as Hi-C and other innovative sequencing and microscopy-based assays probing chromosome architecture. Altogether, the 4DN data portal hosts more than 1,800 experiment sets and 36,000 files. Results of sequencing-based assays from different laboratories are uniformly processed and quality-controlled. The portal interface allows easy browsing, filtering, and bulk downloads, and the integrated HiGlass genome browser allows interactive visualization and comparison of multiple datasets. The 4DN data portal represents a primary resource for chromosome contact and other nuclear architecture data for the scientific community.
Cortés-Ciriano I, Gulhan DC, Lee JJ-K, Melloni GEM, Park PJ*. Computational analysis of cancer genome sequencing data. Nature Reviews Genetics 2022;23(5):298-314.
Distilling biologically meaningful information from cancer genome sequencing data requires comprehensive identification of somatic alterations using rigorous computational methods. As the amount and complexity of sequencing data have increased, so has the number of tools for analysing them. Here, we describe the main steps involved in the bioinformatic analysis of cancer genomes, review key algorithmic developments and highlight popular tools and emerging technologies. These tools include those that identify point mutations, copy number alterations, structural variations and mutational signatures in cancer genomes. We also discuss issues in experimental design, the strengths and limitations of sequencing modalities and methodological challenges for the future.
Lee S, Bakker C, Vitzthum C, Alver BH, Park PJ*. Pairs and Pairix: a file format and a tool for efficient storage and retrieval for Hi-C read pairs. Bioinformatics 2022.
SUMMARY: As the amount of three-dimensional chromosomal interaction data continues to increase, storing and accessing such data efficiently becomes paramount. We introduce Pairs, a block-compressed text file format for storing paired genomic coordinates from Hi-C data, and Pairix, an open-source C application to index and query Pairs files. Pairix (also available in Python and R) extends the functionalities of Tabix to paired coordinates data. We have also developed PairsQC, a collapsible HTML quality control report generator for Pairs files. AVAILABILITY: The format specification and source code are available at https://github.com/4dn-dcic/pairix, https://github.com/4dn-dcic/Rpairix and https://github.com/4dn-dcic/pairsqc.
2021
Chu C, Borges-Monroy R, Viswanadham VV, Lee S, Li H, Lee EA**, Park PJ**. Comprehensive identification of transposable element insertions using multiple sequencing technologies. Nature Communications 2021;12(1):3836.
Transposable elements (TEs) help shape the structure and function of the human genome. When inserted into some locations, TEs may disrupt gene regulation and cause diseases. Here, we present xTea (x-Transposable element analyzer), a tool for identifying TE insertions in whole-genome sequencing data. Whereas existing methods are mostly designed for short-read data, xTea can be applied to both short-read and long-read data. Our analysis shows that xTea outperforms other short read-based methods for both germline and somatic TE insertion discovery. With long-read data, we created a catalogue of polymorphic insertions with full assembly and annotation of insertional sequences for various types of retroelements, including pseudogenes and endogenous retroviruses. Notably, we find that individual genomes have an average of nine groups of full-length L1s in centromeres, suggesting that centromeres and other highly repetitive regions such as telomeres are a significant yet unexplored source of active L1s. xTea is available at https://github.com/parklab/xTea.
Luquette LJ, Park PJ. Somatic mutation accumulation seen through a single-molecule lens. Cell Research 2021;31(9):949-950.
Cook JH*, Melloni GEM*, Gulhan DC, Park PJ**, Haigis KM**. The origins and genetic interactions of KRAS mutations are allele- and tissue-specific. Nature Communications 2021;12:1808.
Mutational activation of KRAS promotes the initiation and progression of cancers, especially in the colorectum, pancreas, lung, and blood plasma, with varying prevalence of specific activating missense mutations. Although epidemiological studies connect specific alleles to clinical outcomes, the mechanisms underlying the distinct clinical characteristics of mutant KRAS alleles are unclear. Here, we analyze 13,492 samples from these four tumor types to examine allele- and tissue-specific genetic properties associated with oncogenic KRAS mutations. The prevalence of known mutagenic mechanisms partially explains the observed spectrum of KRAS activating mutations. However, there are substantial differences between the observed and predicted frequencies for many alleles, suggesting that biological selection underlies the tissue-specific frequencies of mutant alleles. Consistent with experimental studies that have identified distinct signaling properties associated with each mutant form of KRAS, our genetic analysis reveals that each KRAS allele is associated with a distinct tissue-specific co-mutation network. Moreover, we identify tissue-specific genetic dependencies associated with specific mutant KRAS alleles. Overall, this analysis demonstrates that the genetic interactions of oncogenic KRAS mutations are allele- and tissue-specific, underscoring the complexity that drives their clinical consequences.
Bizzotto S*, Dou Y*, Ganz J*, Doan RN, Kwon M, Bohrson CL, Kim SN, Bae T, Abyzov A, NIMH Brain Somatic Mosaicism Network, Park PJ**, Walsh CA**. Landmarks of human embryonic development inscribed in somatic mutations. Science 2021;371(6535):1249-1253.
Although cell lineage information is fundamental to understanding organismal development, very little direct information is available for humans. We performed high-depth (250×) whole-genome sequencing of multiple tissues from three individuals to identify hundreds of somatic single-nucleotide variants (sSNVs). Using these variants as "endogenous barcodes" in single cells, we reconstructed early embryonic cell divisions. Targeted sequencing of clonal sSNVs in different organs (about 25,000×) and in more than 1000 cortical single cells, as well as single-nucleus RNA sequencing and single-nucleus assay for transposase-accessible chromatin sequencing of ~100,000 cortical single cells, demonstrated asymmetric contributions of early progenitors to extraembryonic tissues, distinct germ layers, and organs. Our data suggest onset of gastrulation at an effective progenitor pool of about 170 cells and about 50 to 100 founders for the forebrain. Thus, mosaic mutations provide a permanent record of human embryonic development at very high resolution.
Kwon M, Lee S, Berselli M, Chu C, Park PJ. BamSnap: a lightweight viewer for sequencing reads in BAM files. Bioinformatics 2021;37(2):263-264.
SUMMARY: Despite the improvement in variant detection algorithms, visual inspection of the read-level data remains an essential step for accurate identification of variants in genome analysis. We developed BamSnap, an efficient BAM file viewer utilizing a graphics library and BAM indexing. In contrast to existing viewers, BamSnap can generate high-quality snapshots rapidly, with customized tracks and layout. As an example, we produced read-level images at 1000 genomic loci for >2500 whole-genomes. AVAILABILITY: BamSnap is freely available at https://github.com/parklab/bamsnap. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Jain D, Chu C, Alver BH, Lee S, Lee EA, Park PJ*. HiTea: a computational pipeline to identify non-reference transposable element insertions in Hi-C data. Bioinformatics 2021;37(8):1045-1051.
Hi-C is a common technique for assessing 3D chromatin conformation. Recent studies have shown that long-range interaction information in Hi-C data can be used to generate chromosome-length genome assemblies and identify large-scale structural variations. Here, we demonstrate the use of Hi-C data in detecting mobile transposable element (TE) insertions genome-wide. Our pipeline Hi-C-based TE analyzer (HiTea) capitalizes on clipped Hi-C reads and is aided by a high proportion of discordant read pairs in Hi-C data to detect insertions of three major families of active human TEs. Despite the uneven genome coverage in Hi-C data, HiTea is competitive with the existing callers based on whole-genome sequencing (WGS) data and can supplement the WGS-based characterization of the TE-insertion landscape. We employ the pipeline to identify TE-insertions from human cell-line Hi-C samples. AVAILABILITY AND IMPLEMENTATION: HiTea is available at https://github.com/parklab/HiTea and as a Docker image. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Rodin RE*, Dou Y*, Kwon M, Sherman MA, D'Gama AM, Doan RN, Rento LM, Girskis KM, Bohrson CL, Kim SN, Nadig A, Luquette LJ, Gulhan DC, Brain Somatic Mosaicism Network, Park PJ**, Walsh CA**. The landscape of somatic mutation in cerebral cortex of autistic and neurotypical individuals revealed by ultra-deep whole-genome sequencing. Nature Neuroscience 2021;24(2):176-185.
We characterize the landscape of somatic mutations (mutations occurring after fertilization) in the human brain using ultra-deep (~250×) whole-genome sequencing of prefrontal cortex from 59 donors with autism spectrum disorder (ASD) and 15 control donors. We observe a mean of 26 somatic single-nucleotide variants per brain present in ≥4% of cells, with enrichment of mutations in coding and putative regulatory regions. Our analysis reveals that the first cell division after fertilization produces ~3.4 mutations, followed by 2-3 mutations in subsequent generations. This suggests that a typical individual possesses ~80 somatic single-nucleotide variants present in ≥2% of cells (comparable to the number of de novo germline mutations per generation), with about half of individuals having at least one potentially function-altering somatic mutation somewhere in the cortex. ASD brains show an excess of somatic mutations in neural enhancer sequences compared with controls, suggesting that mosaic enhancer mutations may contribute to ASD risk.
