Publications by Year: 2022

Batalini F, Gulhan DC, Mao V, Tran A, Polak M, Xiong N, Tayob N, Tung NM, Winer EP, Mayer EL, Knappskog S, Lønning PE, Matulonis UA, Konstantinopoulos PA, Solit DB, Won H, Eikesdal HP, Park PJ, Wulf GM. Mutational Signature 3 Detected from Clinical Panel Sequencing is Associated with Responses to Olaparib in Breast and Ovarian Cancers. Clinical Cancer Research 2022;28(21):4714-4723.

Purpose: The identification of patients with homologous recombination deficiency (HRD) beyond BRCA1/2 mutations is an urgent task, as they may benefit from PARP inhibitors. We previously developed a method, termed SigMA, that detects mutational signature 3 (Sig3), which is associated with HRD, and that can reliably detect HRD from the limited data produced by gene-focused clinical panel sequencing.

Experimental design: We apply this method to patients from two independent datasets: (i) high-grade serous ovarian cancer and triple-negative breast cancer (TNBC) from a phase Ib trial of the PARP inhibitor olaparib in combination with the PI3K inhibitor buparlisib (BKM120; NCT01623349), and (ii) TNBC patients who received neoadjuvant olaparib in the phase II PETREMAC trial (NCT02624973).

Results: We find that Sig3 as detected by SigMA is positively associated with improved progression-free survival and objective responses. In addition, comparison of Sig3 detection in panel and exome-sequencing data from the same patient samples demonstrated highly concordant results and superior performance in comparison with the genomic instability score.

Conclusions: Our analyses demonstrate that HRD can be detected reliably from panel-sequencing data that are obtained as part of routine clinical care, and that this approach can identify patients beyond those with germline BRCA1/2 mutations who might benefit from PARP inhibitors. Prospective clinical utility testing is warranted.

Luquette LJ, Miller MB, Zhou Z, Bohrson CL, Zhao Y, Jin H, Gulhan D, Ganz J, Bizzotto S, Kirkham S, Hochepied T, Libert C, Galor A, Kim J, Lodato MA, Garaycoechea JI, Gawad C, West J, Walsh CA, Park PJ. Single-cell genome sequencing of human neurons identifies somatic point mutation and indel enrichment in regulatory elements. Nature Genetics 2022;54:1564-1571.
Accurate somatic mutation detection from single-cell DNA sequencing is challenging due to amplification-related artifacts. To reduce this artifact burden, an improved amplification technique, primary template-directed amplification (PTA), was recently introduced. We analyzed whole-genome sequencing data from 52 PTA-amplified single neurons using SCAN2, a new genotyper we developed to leverage mutation signatures and allele balance in identifying somatic single-nucleotide variants (SNVs) and small insertions and deletions (indels) in PTA data. Our analysis confirms an increase in nonclonal somatic mutation in single neurons with age, but revises the estimated rate of this accumulation to 16 SNVs per year. We also identify artifacts in other amplification methods. Most importantly, we show that somatic indels increase by at least three per year per neuron and are enriched in functional regions of the genome such as enhancers and promoters. Our data suggest that indels in gene-regulatory elements have a considerable effect on genome integrity in human neurons.
Bae T, Fasching L, Wang Y, Shin JH, Suvakov M, Jang Y, Norton S, Dias C, Mariani J, Jourdon A, Wu F, Panda A, Pattni R, Chahine Y, Yeh R, Roberts RC, Huttner A, Kleinman JE, Hyde TM, Straub RE, Walsh CA, Brain Somatic Mosaicism Network BSM, Urban AE, Leckman JF, Weinberger DR, Vaccarino FM, Abyzov A. Analysis of somatic mutations in 131 human brains reveals aging-associated hypermutability. Science 2022;377(6605):511-517.
We analyzed 131 human brains (44 neurotypical, 19 with Tourette syndrome, 9 with schizophrenia, and 59 with autism) for somatic mutations after whole genome sequencing to a depth of more than 200×. Typically, brains had 20 to 60 detectable single-nucleotide mutations, but ~6% of brains harbored hundreds of somatic mutations. Hypermutability was associated with age and damaging mutations in genes implicated in cancers and, in some brains, reflected in vivo clonal expansions. Somatic duplications, likely arising during development, were found in ~5% of normal and diseased brains, reflecting background mutagenesis. Brains with autism were associated with mutations creating putative transcription factor binding motifs in enhancer-like regions in the developing brain. The top-ranked affected motifs corresponded to MEIS (myeloid ectopic viral integration site) transcription factors, suggesting a potential link between their involvement in gene regulation and autism.
Reiff SB, Schroeder AJ, Kırlı K, Cosolo A, Bakker C, Mercado L, Lee S, Veit AD, Balashov AK, Vitzthum C, Ronchetti W, Pitman KM, Johnson J, Ehmsen SR, Kerpedjiev P, Abdennur N, Imakaev M, Öztürk SU, Çamoğlu U, Mirny LA, Gehlenborg N*, Alver BH*, Park PJ*. The 4D Nucleome Data Portal as a resource for searching and visualizing curated nucleomics data. Nature Communications 2022;13(1):2365.
The 4D Nucleome (4DN) Network aims to elucidate the complex structure and organization of chromosomes in the nucleus and the impact of their disruption in disease biology. We present the 4DN Data Portal, a repository for datasets generated in the 4DN network and relevant external datasets. Datasets were generated with a wide range of experiments, including chromosome conformation capture assays such as Hi-C and other innovative sequencing and microscopy-based assays probing chromosome architecture. Altogether, the 4DN data portal hosts more than 1800 experiment sets and 36000 files. Results of sequencing-based assays from different laboratories are uniformly processed and quality-controlled. The portal interface allows easy browsing, filtering, and bulk downloads, and the integrated HiGlass genome browser allows interactive visualization and comparison of multiple datasets. The 4DN data portal represents a primary resource for chromosome contact and other nuclear architecture data for the scientific community.
Cortés-Ciriano I, Gulhan DC, Lee JJ-K, Melloni GEM, Park PJ*. Computational analysis of cancer genome sequencing data. Nature Reviews Genetics 2022;23(5):298-314.
Distilling biologically meaningful information from cancer genome sequencing data requires comprehensive identification of somatic alterations using rigorous computational methods. As the amount and complexity of sequencing data have increased, so has the number of tools for analysing them. Here, we describe the main steps involved in the bioinformatic analysis of cancer genomes, review key algorithmic developments and highlight popular tools and emerging technologies. These tools include those that identify point mutations, copy number alterations, structural variations and mutational signatures in cancer genomes. We also discuss issues in experimental design, the strengths and limitations of sequencing modalities and methodological challenges for the future.
Lee S, Bakker C, Vitzthum C, Alver BH, Park PJ*. Pairs and Pairix: a file format and a tool for efficient storage and retrieval for Hi-C read pairs. Bioinformatics 2022.
SUMMARY: As the amount of three-dimensional chromosomal interaction data continues to increase, storing and accessing such data efficiently becomes paramount. We introduce Pairs, a block-compressed text file format for storing paired genomic coordinates from Hi-C data, and Pairix, an open-source C application to index and query Pairs files. Pairix (also available in Python and R) extends the functionalities of Tabix to paired coordinates data. We have also developed PairsQC, a collapsible HTML quality control report generator for Pairs files. AVAILABILITY: The format specification and source code are available at, and
Rajurkar M, Parikh AR, Solovyov A, You E, Kulkarni AS, Chu C, Xu KH, Jaicks C, Taylor MS, Wu C, Alexander KA, Good CR, Szabolcs A, Gerstberger S, Tran AV, Xu N, Ebright RY, Van Seventer EE, Vo KD, Tai EC, Lu C, Joseph-Chazan J, Raabe MJ, Nieman LT, Desai N, Arora KS, Ligorio M, Thapar V, Cohen L, Garden PM, Senussi Y, Zheng H, Allen JN, Blaszkowsky LS, Clark JW, Goyal L, Wo JY, Ryan DP, Corcoran RB, Deshpande V, Rivera MN, Aryee MJ, Hong TS, Berger SL, Walt DR, Burns KH, Park PJ, Greenbaum BD, Ting DT. Reverse Transcriptase Inhibition Disrupts Repeat Element Life Cycle in Colorectal Cancer. Cancer Discov 2022.
Altered RNA expression of repetitive sequences and retrotransposition are frequently seen in colorectal cancer (CRC) implicating a functional importance of repeat activity in cancer progression. We show the nucleoside reverse transcriptase inhibitor 3TC targets activities of these repeat elements in CRC pre-clinical models with a preferential effect in P53 mutant cell lines linked with direct binding of P53 to repeat elements. We translate these findings to a human Phase 2 trial of single agent 3TC treatment in metastatic CRC with demonstration of clinical benefit in 9 of 32 patients. Analysis of 3TC effects on CRC tumorspheres demonstrates accumulation of immunogenic RNA:DNA hybrids linked with induction of interferon response genes and DNA damage response. Epigenetic and DNA damaging agents induce repeat RNAs and have enhanced cytotoxicity with 3TC. These findings identify a vulnerability in CRC by targeting the viral mimicry of repeat elements.
Jin Z, Huang W, Shen N, Li J, Wang X, Dong J, Park PJ, Xi R. Single-cell gene fusion detection by scFusion. Nat Commun 2022;13(1):1084.
Gene fusions can play important roles in tumor initiation and progression. While fusion detection so far has been from bulk samples, full-length single-cell RNA sequencing (scRNA-seq) offers the possibility of detecting gene fusions at the single-cell level. However, scRNA-seq data have a high noise level and contain various technical artifacts that can lead to spurious fusion discoveries. Here, we present a computational tool, scFusion, for gene fusion detection based on scRNA-seq. We evaluate the performance of scFusion using simulated and five real scRNA-seq datasets and find that scFusion can efficiently and sensitively detect fusions with a low false discovery rate. In a T cell dataset, scFusion detects the invariant TCR gene recombinations in mucosal-associated invariant T cells that many methods developed for bulk data fail to detect; in a multiple myeloma dataset, scFusion detects the known recurrent fusion IgH-WHSC1, which is associated with overexpression of the WHSC1 oncogene. Our results demonstrate that scFusion can be used to investigate cellular heterogeneity of gene fusions and their transcriptional impact at the single-cell level.
Breuss MW, Yang X, Schlachetzki JCM, Antaki D, Lana AJ, Xu X, Chung C, Chai G, Stanley V, Song Q, Newmeyer TF, Nguyen A, O'Brien S, Hoeksema MA, Cao B, Nott A, McEvoy-Venneri J, Pasillas MP, Barton ST, Copeland BR, Nahas S, Van Der Kraan L, Ding Y, Network NIMHBSM, Glass CK, Gleeson JG. Somatic mosaicism reveals clonal distributions of neocortical development. Nature 2022;604(7907):689-696.
The structure of the human neocortex underlies species-specific traits and reflects intricate developmental programs. Here we sought to reconstruct processes that occur during early development by sampling adult human tissues. We analysed neocortical clones in a post-mortem human brain through a comprehensive assessment of brain somatic mosaicism, acting as neutral lineage recorders (refs. 1, 2). We combined the sampling of 25 distinct anatomic locations with deep whole-genome sequencing in a neurotypical deceased individual and confirmed results with 5 samples collected from each of three additional donors. We identified 259 bona fide mosaic variants from the index case, then deconvolved distinct geographical, cell-type and clade organizations across the brain and other organs. We found that clones derived after the accumulation of 90-200 progenitors in the cerebral cortex tended to respect the midline axis, well before the anterior-posterior or ventral-dorsal axes, representing a secondary hierarchy following the overall patterning of forebrain and hindbrain domains. Clones across neocortically derived cells were consistent with a dual origin from both dorsal and ventral cellular populations, similar to rodents, whereas the microglia lineage appeared distinct from other resident brain cells. Our data provide a comprehensive analysis of brain somatic mosaicism across the neocortex and demonstrate cellular origins and progenitor distribution patterns within the human brain.