Primary

Ganz J*, Luquette LJ*, Bizzotto S*, Miller MB, Zhou Z, Bohrson CL, Jin H, Tran AV, Viswanadham VV, McDonough G, Brown K, Chahine Y, Chhouk B, Galor A, Park PJ**, Walsh CA**. Contrasting somatic mutation patterns in aging human neurons and oligodendrocytes. Cell 2024.
Characterizing somatic mutations in the brain is important for disentangling the complex mechanisms of aging, yet little is known about mutational patterns in different brain cell types. Here, we performed whole-genome sequencing (WGS) of 86 single oligodendrocytes, 20 mixed glia, and 56 single neurons from neurotypical individuals spanning 0.4–104 years of age and identified >92,000 somatic single-nucleotide variants (sSNVs) and small insertions/deletions (indels). Although both cell types accumulate somatic mutations linearly with age, oligodendrocytes accumulated sSNVs 81% faster than neurons and indels 28% slower than neurons. Correlation of mutations with single-nucleus RNA profiles and chromatin accessibility from the same brains revealed that oligodendrocyte mutations are enriched in inactive genomic regions and are distributed across the genome similarly to mutations in brain cancers. In contrast, neuronal mutations are enriched in open, transcriptionally active chromatin. These stark differences suggest an assortment of active mutagenic processes in oligodendrocytes and neurons.
Jin H, Gulhan DC, Geiger B, Ben-Isvy D, Geng D, Ljungström V, Park PJ. Accurate and sensitive mutational signature analysis with MuSiCal. Nature Genetics 2024.
Mutational signature analysis is a recent computational approach for interpreting somatic mutations in the genome. Its application to cancer data has enhanced our understanding of mutational forces driving tumorigenesis and demonstrated its potential to inform prognosis and treatment decisions. However, methodological challenges remain for discovering new signatures and assigning proper weights to existing signatures, thereby hindering broader clinical applications. Here we present Mutational Signature Calculator (MuSiCal), a rigorous analytical framework with algorithms that solve major problems in the standard workflow. Our simulation studies demonstrate that MuSiCal outperforms state-of-the-art algorithms for both signature discovery and assignment. By reanalyzing more than 2,700 cancer genomes, we provide an improved catalog of signatures and their assignments, discover nine indel signatures absent in the current catalog, resolve long-standing issues with the ambiguous ‘flat’ signatures and give insights into signatures with unknown etiologies. We expect MuSiCal and the improved catalog to be a step towards establishing best practices for mutational signature analysis.
Gao T, Kastriti ME, Ljungström V, Heinzel A, Tischler AS, Oberbauer R, Loh P-R, Adameyko I, Park PJ**, Kharchenko P**. A pan-tissue survey of mosaic chromosomal alterations in 948 individuals. Nature Genetics 2023.
Genetic mutations accumulate in an organism’s body throughout its lifetime. While somatic single-nucleotide variants have been well characterized in the human body, the patterns and consequences of large chromosomal alterations in normal tissues remain largely unknown. Here, we present a pan-tissue survey of mosaic chromosomal alterations (mCAs) in 948 healthy individuals from the Genotype-Tissue Expression project, augmenting RNA-based allelic imbalance estimation with haplotype phasing. We found that approximately a quarter of the individuals carry a clonally-expanded mCA in at least one tissue, with incidence strongly correlated with age. The prevalence and genome-wide patterns of mCAs vary considerably across tissue types, suggesting tissue-specific mutagenic exposure and selection pressures. The mCA landscapes in normal adrenal and pituitary glands resemble those in tumors arising from these tissues, whereas the same is not true for the esophagus and skin. Together, our findings show a widespread age-dependent emergence of mCAs across normal human tissues with intricate connections to tumorigenesis.
Chu C, Lin EW, Tran A, Jin H, Ho NI, Veit A, Cortes-Ciriano I, Burns KH, Ting DT, Park PJ. The landscape of human SVA retrotransposons. Nucleic Acids Research 2023.
SINE-VNTR-Alu (SVA) retrotransposons are evolutionarily young and still-active transposable elements (TEs) in the human genome. Several pathogenic SVA insertions have been identified that directly mutate host genes to cause neurodegenerative and other types of diseases. However, due to their sequence heterogeneity and complex structures as well as limitations in sequencing techniques and analysis, SVA insertions have been less well studied compared to other mobile element insertions. Here, we identified polymorphic SVA insertions from 3646 whole-genome sequencing (WGS) samples of >150 diverse populations and constructed a polymorphic SVA insertion reference catalog. Using 20 long-read samples, we also assembled reference and polymorphic SVA sequences and characterized the internal hexamer/variable-number-tandem-repeat (VNTR) expansions as well as differing SVA activity for SVA subfamilies and human populations. In addition, we developed a module to annotate both reference and polymorphic SVA copies. By characterizing the landscape of both reference and polymorphic SVA retrotransposons, our study enables more accurate genotyping of these elements and facilitates the discovery of pathogenic SVA insertions.
Kim J, Woo S, de Gusmao CM, Zhao B, Chin DH, DiDonato RL, Nguyen MA, Nakayama T, Hu CA, Soucy A, Kuniholm A, Thornton JK, Riccardi O, Friedman DA, Moufawad El Achkar C, Dash Z, Cornelissen L, Donado C, Faour KNW, Bush LW, Suslovitch V, Lentucci C, Park PJ, Lee EA, Patterson A, Philippakis AA, Margus B, Berde CB, Yu TW. A framework for individualized splice-switching oligonucleotide therapy. Nature 2023;619:828-836.
Splice-switching antisense oligonucleotides (ASOs) could be used to treat a subset of individuals with genetic diseases, but the systematic identification of such individuals remains a challenge. Here we performed whole-genome sequencing analyses to characterize genetic variation in 235 individuals (from 209 families) with ataxia-telangiectasia, a severely debilitating and life-threatening recessive genetic disorder, yielding a complete molecular diagnosis in almost all individuals. We developed a predictive taxonomy to assess the amenability of each individual to splice-switching ASO intervention; 9% and 6% of the individuals had variants that were ‘probably’ or ‘possibly’ amenable to ASO splice modulation, respectively. Most amenable variants were in deep intronic regions that are inaccessible to exon-targeted sequencing. We developed ASOs that successfully rescued mis-splicing and ATM cellular signalling in patient fibroblasts for two recurrent variants. In a pilot clinical study, one of these ASOs was used to treat a child who had been diagnosed with ataxia-telangiectasia soon after birth, and showed good tolerability without serious adverse events for three years. Our study provides a framework for the prospective identification of individuals with genetic diseases who might benefit from a therapeutic approach involving splice-switching ASOs.
Lee JJ-K, Jung YL, Cheong T-C, Valle-Inclan JE, Chong C, Gulhan DC, Ljungström V, Jin H, Viswanadham VV, Watson EV, Cortés-Ciriano I, Elledge SJ, Chiarle R, Pellman D, Park PJ. ERα-associated translocations underlie oncogene amplifications in breast cancer [Internet]. Nature 2023. Harvard Medical School News.

Focal copy-number amplification is an oncogenic event. Although recent studies have revealed the complex structure and the evolutionary trajectories of oncogene amplicons, their origin remains poorly understood. Here we show that focal amplifications in breast cancer frequently derive from a mechanism, which we term translocation–bridge amplification, involving inter-chromosomal translocations that lead to dicentric chromosome bridge formation and breakage. In 780 breast cancer genomes, we observe that focal amplifications are frequently connected to each other by inter-chromosomal translocations at their boundaries. Subsequent analysis indicates the following model: the oncogene neighbourhood is translocated in G1 creating a dicentric chromosome, the dicentric chromosome is replicated, and as dicentric sister chromosomes segregate during mitosis, a chromosome bridge is formed and then broken, with fragments often being circularized in extrachromosomal DNAs. This model explains the amplifications of key oncogenes, including ERBB2 and CCND1. Recurrent amplification boundaries and rearrangement hotspots correlate with oestrogen receptor binding in breast cancer cells. Experimentally, oestrogen treatment induces DNA double-strand breaks in the oestrogen receptor target regions that are repaired by translocations, suggesting a role of oestrogen in generating the initial translocations. A pan-cancer analysis reveals tissue-specific biases in mechanisms initiating focal amplifications, with the breakage–fusion–bridge cycle prevalent in some and the translocation–bridge amplification in others, probably owing to the different timing of DNA break repair. Our results identify a common mode of oncogene amplification and propose oestrogen as its mechanistic origin in breast cancer.

Cortes-Ciriano I, Steele CD, Piculell K, Al-Ibraheemi A, Eulo V, Bui MM, Chatzipli A, Dickson BC, Borcherding DC, Feber A, Galor A, Jones KB, Jordan JT, Kim RH, Lindsay D, Miller C, Nishida Y, Proszek PZ, Serrano J, Sundby TR, Szymanski JJ, Ullrich NJ, Viskochil D, Wang X, Snuderl M, Park PJ, Flanagan AM, Hirbe AC, Pillay N, Miller DT. Genomic patterns of malignant peripheral nerve sheath tumor (MPNST) evolution correlate with clinical outcome and are detectable in cell-free DNA. Cancer Discovery 2023;13(3):654-671.

Malignant peripheral nerve sheath tumor (MPNST), an aggressive soft-tissue sarcoma, occurs in people with neurofibromatosis type 1 (NF1) and sporadically. Whole-genome and multiregional exome sequencing, transcriptomic, and methylation profiling of 95 tumor samples revealed the order of genomic events in tumor evolution. Following biallelic inactivation of NF1, loss of CDKN2A or TP53 with or without inactivation of polycomb repressive complex 2 (PRC2) leads to extensive somatic copy-number aberrations (SCNA). Distinct pathways of tumor evolution are associated with inactivation of PRC2 genes and H3K27 trimethylation (H3K27me3) status. Tumors with H3K27me3 loss evolve through extensive chromosomal losses followed by whole-genome doubling and chromosome 8 amplification, and show lower levels of immune cell infiltration. Retention of H3K27me3 leads to extensive genomic instability, but an immune cell-rich phenotype. Specific SCNAs detected in both tumor samples and cell-free DNA (cfDNA) act as a surrogate for H3K27me3 loss and immune infiltration, and predict prognosis.

Significance:

MPNST is the most common cause of death and morbidity for individuals with NF1, a relatively common tumor predisposition syndrome. Our results suggest that somatic copy-number and methylation profiling of tumor or cfDNA could serve as a biomarker for early diagnosis and to stratify patients into prognostic and treatment-related subgroups.

Batalini F, Gulhan DC, Mao V, Tran A, Polak M, Xiong N, Tayob N, Tung NM, Winer EP, Mayer EL, Knappskog S, Lønning PE, Matulonis UA, Konstantinopoulos PA, Solit DB, Won H, Eikesdal HP, Park PJ, Wulf GM. Mutational Signature 3 Detected from Clinical Panel Sequencing is Associated with Responses to Olaparib in Breast and Ovarian Cancers. Clinical Cancer Research 2022;28(21):4714-4723.

Purpose: The identification of patients with homologous recombination deficiency (HRD) beyond BRCA1/2 mutations is an urgent task, as they may benefit from PARP inhibitors. We have previously developed a method to detect mutational signature 3 (Sig3), termed SigMA, associated with HRD from clinical panel sequencing data, that is able to reliably detect HRD from the limited sequencing data derived from gene-focused panel sequencing.

Experimental design: We apply this method to patients from two independent datasets: (i) high-grade serous ovarian cancer and triple-negative breast cancer (TNBC) from a phase Ib trial of the PARP inhibitor olaparib in combination with the PI3K inhibitor buparlisib (BKM120; NCT01623349), and (ii) TNBC patients who received neoadjuvant olaparib in the phase II PETREMAC trial (NCT02624973).

Results: We find that Sig3 as detected by SigMA is positively associated with improved progression-free survival and objective responses. In addition, comparison of Sig3 detection in panel and exome-sequencing data from the same patient samples demonstrated highly concordant results and superior performance in comparison with the genomic instability score.

Conclusions: Our analyses demonstrate that HRD can be detected reliably from panel-sequencing data that are obtained as part of routine clinical care, and that this approach can identify patients beyond those with germline BRCA1/2mut who might benefit from PARP inhibitors. Prospective clinical utility testing is warranted.

Luquette LJ*, Miller MB*, Zhou Z*, Bohrson CL, Zhao Y, Jin H, Gulhan D, Ganz J, Bizzotto S, Kirkham S, Hochepied T, Libert C, Galor A, Kim J, Lodato MA, Garaycoechea JI, Gawad C, West J, Walsh CA**, Park PJ**. Single-cell genome sequencing of human neurons identifies somatic point mutation and indel enrichment in regulatory elements. Nature Genetics 2022;54:1564-1571.
Accurate somatic mutation detection from single-cell DNA sequencing is challenging due to amplification-related artifacts. To reduce this artifact burden, an improved amplification technique, primary template-directed amplification (PTA), was recently introduced. We analyzed whole-genome sequencing data from 52 PTA-amplified single neurons using SCAN2, a new genotyper we developed to leverage mutation signatures and allele balance in identifying somatic single-nucleotide variants (SNVs) and small insertions and deletions (indels) in PTA data. Our analysis confirms an increase in nonclonal somatic mutation in single neurons with age, but revises the estimated rate of this accumulation to 16 SNVs per year. We also identify artifacts in other amplification methods. Most importantly, we show that somatic indels increase by at least three per year per neuron and are enriched in functional regions of the genome such as enhancers and promoters. Our data suggest that indels in gene-regulatory elements have a considerable effect on genome integrity in human neurons.
Lee S, Bakker C, Vitzthum C, Alver BH, Park PJ*. Pairs and Pairix: a file format and a tool for efficient storage and retrieval for Hi-C read pairs. Bioinformatics 2022.
SUMMARY: As the amount of three-dimensional chromosomal interaction data continues to increase, storing and accessing such data efficiently becomes paramount. We introduce Pairs, a block-compressed text file format for storing paired genomic coordinates from Hi-C data, and Pairix, an open-source C application to index and query Pairs files. Pairix (also available in Python and R) extends the functionalities of Tabix to paired coordinates data. We have also developed PairsQC, a collapsible HTML quality control report generator for Pairs files. AVAILABILITY: The format specification and source code are available at https://github.com/4dn-dcic/pairix, https://github.com/4dn-dcic/Rpairix and https://github.com/4dn-dcic/pairsqc.
Reiff SB, Schroeder AJ, Kırlı K, Cosolo A, Bakker C, Mercado L, Lee S, Veit AD, Balashov AK, Vitzthum C, Ronchetti W, Pitman KM, Johnson J, Ehmsen SR, Kerpedjiev P, Abdennur N, Imakaev M, Öztürk SU, Çamoğlu U, Mirny LA, Gehlenborg N*, Alver BH*, Park PJ*. The 4D Nucleome Data Portal as a resource for searching and visualizing curated nucleomics data. Nature Communications 2022;13(1):2365.
The 4D Nucleome (4DN) Network aims to elucidate the complex structure and organization of chromosomes in the nucleus and the impact of their disruption in disease biology. We present the 4DN Data Portal (https://data.4dnucleome.org/), a repository for datasets generated in the 4DN network and relevant external datasets. Datasets were generated with a wide range of experiments, including chromosome conformation capture assays such as Hi-C and other innovative sequencing and microscopy-based assays probing chromosome architecture. All together, the 4DN data portal hosts more than 1800 experiment sets and 36000 files. Results of sequencing-based assays from different laboratories are uniformly processed and quality-controlled. The portal interface allows easy browsing, filtering, and bulk downloads, and the integrated HiGlass genome browser allows interactive visualization and comparison of multiple datasets. The 4DN data portal represents a primary resource for chromosome contact and other nuclear architecture data for the scientific community.
Cortés-Ciriano I, Gulhan DC, Lee JJ-K, Melloni GEM, Park PJ*. Computational analysis of cancer genome sequencing data. Nature Reviews Genetics 2022;23(5):298-314.
Distilling biologically meaningful information from cancer genome sequencing data requires comprehensive identification of somatic alterations using rigorous computational methods. As the amount and complexity of sequencing data have increased, so has the number of tools for analysing them. Here, we describe the main steps involved in the bioinformatic analysis of cancer genomes, review key algorithmic developments and highlight popular tools and emerging technologies. These tools include those that identify point mutations, copy number alterations, structural variations and mutational signatures in cancer genomes. We also discuss issues in experimental design, the strengths and limitations of sequencing modalities and methodological challenges for the future.
Chu C, Borges-Monroy R, Viswanadham VV, Lee S, Li H, Lee EA**, Park PJ**. Comprehensive identification of transposable element insertions using multiple sequencing technologies. Nat Commun 2021;12(1):3836.
Transposable elements (TEs) help shape the structure and function of the human genome. When inserted into some locations, TEs may disrupt gene regulation and cause diseases. Here, we present xTea (x-Transposable element analyzer), a tool for identifying TE insertions in whole-genome sequencing data. Whereas existing methods are mostly designed for short-read data, xTea can be applied to both short-read and long-read data. Our analysis shows that xTea outperforms other short read-based methods for both germline and somatic TE insertion discovery. With long-read data, we created a catalogue of polymorphic insertions with full assembly and annotation of insertional sequences for various types of retroelements, including pseudogenes and endogenous retroviruses. Notably, we find that individual genomes have an average of nine groups of full-length L1s in centromeres, suggesting that centromeres and other highly repetitive regions such as telomeres are a significant yet unexplored source of active L1s. xTea is available at https://github.com/parklab/xTea.
