Publications by Year: 2022

Reiff SB, Schroeder AJ, Kırlı K, Cosolo A, Bakker C, Lee S, Veit AD, Balashov AK, Vitzthum C, Ronchetti W, Pitman KM, Johnson J, Ehmsen SR, Kerpedjiev P, Abdennur N, Imakaev M, Öztürk SU, Çamoğlu U, Mirny LA, Gehlenborg N*, Alver BH*, Park PJ*. The 4D Nucleome Data Portal as a resource for searching and visualizing curated nucleomics data. Nat Commun 2022;13(1):2365.
The 4D Nucleome (4DN) Network aims to elucidate the complex structure and organization of chromosomes in the nucleus and the impact of their disruption in disease biology. We present the 4DN Data Portal, a repository for datasets generated in the 4DN network and relevant external datasets. Datasets were generated with a wide range of experiments, including chromosome conformation capture assays such as Hi-C and other innovative sequencing and microscopy-based assays probing chromosome architecture. Altogether, the 4DN data portal hosts more than 1800 experiment sets and 36000 files. Results of sequencing-based assays from different laboratories are uniformly processed and quality-controlled. The portal interface allows easy browsing, filtering, and bulk downloads, and the integrated HiGlass genome browser allows interactive visualization and comparison of multiple datasets. The 4DN data portal represents a primary resource for chromosome contact and other nuclear architecture data for the scientific community.
Cortés-Ciriano I, Gulhan DC, Lee JJ-K, Melloni GEM, Park PJ*. Computational analysis of cancer genome sequencing data. Nat Rev Genet 2022;23(5):298-314.
Distilling biologically meaningful information from cancer genome sequencing data requires comprehensive identification of somatic alterations using rigorous computational methods. As the amount and complexity of sequencing data have increased, so has the number of tools for analysing them. Here, we describe the main steps involved in the bioinformatic analysis of cancer genomes, review key algorithmic developments and highlight popular tools and emerging technologies. These tools include those that identify point mutations, copy number alterations, structural variations and mutational signatures in cancer genomes. We also discuss issues in experimental design, the strengths and limitations of sequencing modalities and methodological challenges for the future.
Lee S, Bakker C, Vitzthum C, Alver BH, Park PJ*. Pairs and Pairix: a file format and a tool for efficient storage and retrieval for Hi-C read pairs. Bioinformatics 2022.
SUMMARY: As the amount of three-dimensional chromosomal interaction data continues to increase, storing and accessing such data efficiently becomes paramount. We introduce Pairs, a block-compressed text file format for storing paired genomic coordinates from Hi-C data, and Pairix, an open-source C application to index and query Pairs files. Pairix (also available in Python and R) extends the functionalities of Tabix to paired coordinates data. We have also developed PairsQC, a collapsible HTML quality control report generator for Pairs files. AVAILABILITY: The format specification and source code are available at, and
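To illustrate what "paired genomic coordinates" means in practice, here is a minimal Python sketch that parses Pairs-style text and answers a two-dimensional range query. It assumes the seven leading columns readID, chr1, pos1, chr2, pos2, strand1, strand2 (tab-separated, with header lines starting with "#"); the `Pair` class, `parse_pairs`, and `query` are hypothetical helpers, not part of Pairix or its Python/R bindings. The query is a plain linear scan, whereas Pairix answers the same kind of question using a block-compressed index, as described above.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Pair:
    # Hypothetical record type holding the seven leading Pairs columns.
    read_id: str
    chr1: str
    pos1: int
    chr2: str
    pos2: int
    strand1: str
    strand2: str


def parse_pairs(lines: Iterable[str]) -> Iterator[Pair]:
    """Yield Pair records from Pairs-style text, skipping '#' header lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        # Only the first seven columns are read; any extra columns are ignored.
        yield Pair(f[0], f[1], int(f[2]), f[3], int(f[4]), f[5], f[6])


def query(pairs: Iterable[Pair], chr1: str, start1: int, end1: int,
          chr2: str, start2: int, end2: int) -> List[Pair]:
    """Return pairs whose first end lies in chr1:start1-end1 and whose
    second end lies in chr2:start2-end2 (inclusive), by linear scan."""
    return [p for p in pairs
            if p.chr1 == chr1 and start1 <= p.pos1 <= end1
            and p.chr2 == chr2 and start2 <= p.pos2 <= end2]


# Made-up example records, for illustration only.
SAMPLE = """## pairs format v1.0
#columns: readID chr1 pos1 chr2 pos2 strand1 strand2
r1\tchr1\t100\tchr1\t5000\t+\t-
r2\tchr1\t250\tchr2\t800\t-\t+
r3\tchr2\t900\tchr2\t1200\t+\t+
"""

records = list(parse_pairs(SAMPLE.splitlines()))
hits = query(records, "chr1", 1, 1000, "chr2", 1, 1000)
print([p.read_id for p in hits])  # ['r2']
```

Only r2 has one end on chr1 within 1-1000 and the other on chr2 within 1-1000. Scanning every record is fine for a toy example; indexing is what makes such queries practical on files with hundreds of millions of read pairs.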
Rajurkar M, Parikh AR, Solovyov A, You E, Kulkarni AS, Chu C, Xu KH, Jaicks C, Taylor MS, Wu C, Alexander KA, Good CR, Szabolcs A, Gerstberger S, Tran AV, Xu N, Ebright RY, Van Seventer EE, Vo KD, Tai EC, Lu C, Joseph-Chazan J, Raabe MJ, Nieman LT, Desai N, Arora KS, Ligorio M, Thapar V, Cohen L, Garden PM, Senussi Y, Zheng H, Allen JN, Blaszkowsky LS, Clark JW, Goyal L, Wo JY, Ryan DP, Corcoran RB, Deshpande V, Rivera MN, Aryee MJ, Hong TS, Berger SL, Walt DR, Burns KH, Park PJ, Greenbaum BD, Ting DT. Reverse Transcriptase Inhibition Disrupts Repeat Element Life Cycle in Colorectal Cancer. Cancer Discov 2022.
Altered RNA expression of repetitive sequences and retrotransposition are frequently seen in colorectal cancer (CRC) implicating a functional importance of repeat activity in cancer progression. We show the nucleoside reverse transcriptase inhibitor 3TC targets activities of these repeat elements in CRC pre-clinical models with a preferential effect in P53 mutant cell lines linked with direct binding of P53 to repeat elements. We translate these findings to a human Phase 2 trial of single agent 3TC treatment in metastatic CRC with demonstration of clinical benefit in 9 of 32 patients. Analysis of 3TC effects on CRC tumorspheres demonstrates accumulation of immunogenic RNA:DNA hybrids linked with induction of interferon response genes and DNA damage response. Epigenetic and DNA damaging agents induce repeat RNAs and have enhanced cytotoxicity with 3TC. These findings identify a vulnerability in CRC by targeting the viral mimicry of repeat elements.
Jin Z, Huang W, Shen N, Li J, Wang X, Dong J, Park PJ, Xi R. Single-cell gene fusion detection by scFusion. Nat Commun 2022;13(1):1084.
Gene fusions can play important roles in tumor initiation and progression. While fusion detection so far has been from bulk samples, full-length single-cell RNA sequencing (scRNA-seq) offers the possibility of detecting gene fusions at the single-cell level. However, scRNA-seq data have a high noise level and contain various technical artifacts that can lead to spurious fusion discoveries. Here, we present a computational tool, scFusion, for gene fusion detection based on scRNA-seq. We evaluate the performance of scFusion using simulated and five real scRNA-seq datasets and find that scFusion can efficiently and sensitively detect fusions with a low false discovery rate. In a T cell dataset, scFusion detects the invariant TCR gene recombinations in mucosal-associated invariant T cells that many methods developed for bulk data fail to detect; in a multiple myeloma dataset, scFusion detects the known recurrent fusion IgH-WHSC1, which is associated with overexpression of the WHSC1 oncogene. Our results demonstrate that scFusion can be used to investigate cellular heterogeneity of gene fusions and their transcriptional impact at the single-cell level.
Breuss MW, Yang X, Schlachetzki JCM, Antaki D, Lana AJ, Xu X, Chung C, Chai G, Stanley V, Song Q, Newmeyer TF, Nguyen A, O'Brien S, Hoeksema MA, Cao B, Nott A, McEvoy-Venneri J, Pasillas MP, Barton ST, Copeland BR, Nahas S, Van Der Kraan L, Ding Y, Glass CK, Gleeson JG. Somatic mosaicism reveals clonal distributions of neocortical development. Nature 2022;604(7907):689-696.
The structure of the human neocortex underlies species-specific traits and reflects intricate developmental programs. Here we sought to reconstruct processes that occur during early development by sampling adult human tissues. We analysed neocortical clones in a post-mortem human brain through a comprehensive assessment of brain somatic mosaicism, acting as neutral lineage recorders [1,2]. We combined the sampling of 25 distinct anatomic locations with deep whole-genome sequencing in a neurotypical deceased individual and confirmed results with 5 samples collected from each of three additional donors. We identified 259 bona fide mosaic variants from the index case, then deconvolved distinct geographical, cell-type and clade organizations across the brain and other organs. We found that clones derived after the accumulation of 90-200 progenitors in the cerebral cortex tended to respect the midline axis, well before the anterior-posterior or ventral-dorsal axes, representing a secondary hierarchy following the overall patterning of forebrain and hindbrain domains. Clones across neocortically derived cells were consistent with a dual origin from both dorsal and ventral cellular populations, similar to rodents, whereas the microglia lineage appeared distinct from other resident brain cells. Our data provide a comprehensive analysis of brain somatic mosaicism across the neocortex and demonstrate cellular origins and progenitor distribution patterns within the human brain.