Purpose: The identification of patients with homologous recombination deficiency (HRD) beyond BRCA1/2 mutations is an urgent task, as they may benefit from PARP inhibitors. We have previously developed a method to detect mutational signature 3 (Sig3), termed SigMA, associated with HRD from clinical panel sequencing data, that is able to reliably detect HRD from the limited sequencing data derived from gene-focused panel sequencing.
Experimental design: We apply this method to patients from two independent datasets: (i) high-grade serous ovarian cancer and triple-negative breast cancer (TNBC) from a phase Ib trial of the PARP inhibitor olaparib in combination with the PI3K inhibitor buparlisib (BKM120; NCT01623349), and (ii) TNBC patients who received neoadjuvant olaparib in the phase II PETREMAC trial (NCT02624973).
Results: We find that Sig3 as detected by SigMA is positively associated with improved progression-free survival and objective responses. In addition, comparison of Sig3 detection in panel and exome-sequencing data from the same patient samples demonstrated highly concordant results and superior performance in comparison with the genomic instability score.
Conclusions: Our analyses demonstrate that HRD can be detected reliably from panel-sequencing data that are obtained as part of routine clinical care, and that this approach can identify patients beyond those with germline BRCA1/2mut who might benefit from PARP inhibitors. Prospective clinical utility testing is warranted.
Accurate somatic mutation detection from single-cell DNA sequencing is challenging due to amplification-related artifacts. To reduce this artifact burden, an improved amplification technique, primary template-directed amplification (PTA), was recently introduced. We analyzed whole-genome sequencing data from 52 PTA-amplified single neurons using SCAN2, a new genotyper we developed to leverage mutation signatures and allele balance in identifying somatic single-nucleotide variants (SNVs) and small insertions and deletions (indels) in PTA data. Our analysis confirms an increase in nonclonal somatic mutation in single neurons with age, but revises the estimated rate of this accumulation to 16 SNVs per year. We also identify artifacts in other amplification methods. Most importantly, we show that somatic indels increase by at least three per year per neuron and are enriched in functional regions of the genome such as enhancers and promoters. Our data suggest that indels in gene-regulatory elements have a considerable effect on genome integrity in human neurons.
We analyzed 131 human brains (44 neurotypical, 19 with Tourette syndrome, 9 with schizophrenia, and 59 with autism) for somatic mutations after whole genome sequencing to a depth of more than 200×. Typically, brains had 20 to 60 detectable single-nucleotide mutations, but ~6% of brains harbored hundreds of somatic mutations. Hypermutability was associated with age and damaging mutations in genes implicated in cancers and, in some brains, reflected in vivo clonal expansions. Somatic duplications, likely arising during development, were found in ~5% of normal and diseased brains, reflecting background mutagenesis. Brains with autism were associated with mutations creating putative transcription factor binding motifs in enhancer-like regions in the developing brain. The top-ranked affected motifs corresponded to MEIS (myeloid ectopic viral integration site) transcription factors, suggesting a potential link between their involvement in gene regulation and autism.
The 4D Nucleome (4DN) Network aims to elucidate the complex structure and organization of chromosomes in the nucleus and the impact of their disruption in disease biology. We present the 4DN Data Portal (https://data.4dnucleome.org/), a repository for datasets generated in the 4DN network and relevant external datasets. Datasets were generated with a wide range of experiments, including chromosome conformation capture assays such as Hi-C and other innovative sequencing and microscopy-based assays probing chromosome architecture. Altogether, the 4DN data portal hosts more than 1800 experiment sets and 36000 files. Results of sequencing-based assays from different laboratories are uniformly processed and quality-controlled. The portal interface allows easy browsing, filtering, and bulk downloads, and the integrated HiGlass genome browser allows interactive visualization and comparison of multiple datasets. The 4DN data portal represents a primary resource for chromosome contact and other nuclear architecture data for the scientific community.
Distilling biologically meaningful information from cancer genome sequencing data requires comprehensive identification of somatic alterations using rigorous computational methods. As the amount and complexity of sequencing data have increased, so has the number of tools for analysing them. Here, we describe the main steps involved in the bioinformatic analysis of cancer genomes, review key algorithmic developments and highlight popular tools and emerging technologies. These tools include those that identify point mutations, copy number alterations, structural variations and mutational signatures in cancer genomes. We also discuss issues in experimental design, the strengths and limitations of sequencing modalities and methodological challenges for the future.
SUMMARY: As the amount of three-dimensional chromosomal interaction data continues to increase, storing and accessing such data efficiently becomes paramount. We introduce Pairs, a block-compressed text file format for storing paired genomic coordinates from Hi-C data, and Pairix, an open-source C application to index and query Pairs files. Pairix (also available in Python and R) extends the functionalities of Tabix to paired coordinates data. We have also developed PairsQC, a collapsible HTML quality control report generator for Pairs files. AVAILABILITY: The format specification and source code are available at https://github.com/4dn-dcic/pairix, https://github.com/4dn-dcic/Rpairix and https://github.com/4dn-dcic/pairsqc.
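To make the idea of "paired genomic coordinates" concrete, the following is a minimal Python sketch of reading Pairs-style text and running a two-dimensional range query. It assumes the commonly documented column layout (readID, chr1, pos1, chr2, pos2, strand1, strand2) and uses a made-up three-read example. The function names `read_pairs` and `query_2d` are hypothetical, and the linear scan stands in for the indexed lookup that Pairix performs on block-compressed files; it is not the Pairix implementation or API.

```python
import io

# Hypothetical in-memory example; real Pairs files are block-compressed
# and queried through a Pairix index rather than scanned line by line.
EXAMPLE = """\
## pairs format v1.0
#columns: readID chr1 pos1 chr2 pos2 strand1 strand2
r1\tchr1\t10000\tchr1\t20000\t+\t-
r2\tchr1\t15000\tchr2\t30000\t+\t+
r3\tchr2\t5000\tchr2\t9000\t-\t+
"""


def read_pairs(handle):
    """Yield one dict per data line, keyed by the names in the #columns header."""
    # Assumed default column order, used if no #columns line is present.
    columns = ["readID", "chr1", "pos1", "chr2", "pos2", "strand1", "strand2"]
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#columns:"):
            columns = line.split(":", 1)[1].split()
            continue
        if line.startswith("#"):
            continue  # other header lines are skipped in this sketch
        record = dict(zip(columns, line.split("\t")))
        record["pos1"] = int(record["pos1"])
        record["pos2"] = int(record["pos2"])
        yield record


def query_2d(records, chrom1, start1, end1, chrom2, start2, end2):
    """Return records whose two ends fall inside both ranges (linear scan)."""
    return [
        r for r in records
        if r["chr1"] == chrom1 and start1 <= r["pos1"] <= end1
        and r["chr2"] == chrom2 and start2 <= r["pos2"] <= end2
    ]


hits = query_2d(read_pairs(io.StringIO(EXAMPLE)),
                "chr1", 1, 16000, "chr2", 1, 50000)
print([r["readID"] for r in hits])  # prints ['r2']
```

An index over both coordinates is what makes this kind of query fast on real Hi-C data, where a linear scan over all read pairs would be impractical.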
Rajurkar M, Parikh AR, Solovyov A, You E, Kulkarni AS, Chu C, Xu KH, Jaicks C, Taylor MS, Wu C, Alexander KA, Good CR, Szabolcs A, Gerstberger S, Tran AV, Xu N, Ebright RY, Van Seventer EE, Vo KD, Tai EC, Lu C, Joseph-Chazan J, Raabe MJ, Nieman LT, Desai N, Arora KS, Ligorio M, Thapar V, Cohen L, Garden PM, Senussi Y, Zheng H, Allen JN, Blaszkowsky LS, Clark JW, Goyal L, Wo JY, Ryan DP, Corcoran RB, Deshpande V, Rivera MN, Aryee MJ, Hong TS, Berger SL, Walt DR, Burns KH, Park PJ, Greenbaum BD, Ting DT. Reverse Transcriptase Inhibition Disrupts Repeat Element Life Cycle in Colorectal Cancer. Cancer Discov 2022;
Altered RNA expression of repetitive sequences and retrotransposition are frequently seen in colorectal cancer (CRC) implicating a functional importance of repeat activity in cancer progression. We show the nucleoside reverse transcriptase inhibitor 3TC targets activities of these repeat elements in CRC pre-clinical models with a preferential effect in P53 mutant cell lines linked with direct binding of P53 to repeat elements. We translate these findings to a human Phase 2 trial of single agent 3TC treatment in metastatic CRC with demonstration of clinical benefit in 9 of 32 patients. Analysis of 3TC effects on CRC tumorspheres demonstrates accumulation of immunogenic RNA:DNA hybrids linked with induction of interferon response genes and DNA damage response. Epigenetic and DNA damaging agents induce repeat RNAs and have enhanced cytotoxicity with 3TC. These findings identify a vulnerability in CRC by targeting the viral mimicry of repeat elements.
Gene fusions can play important roles in tumor initiation and progression. While fusion detection so far has been from bulk samples, full-length single-cell RNA sequencing (scRNA-seq) offers the possibility of detecting gene fusions at the single-cell level. However, scRNA-seq data have a high noise level and contain various technical artifacts that can lead to spurious fusion discoveries. Here, we present a computational tool, scFusion, for gene fusion detection based on scRNA-seq. We evaluate the performance of scFusion using simulated and five real scRNA-seq datasets and find that scFusion can efficiently and sensitively detect fusions with a low false discovery rate. In a T cell dataset, scFusion detects the invariant TCR gene recombinations in mucosal-associated invariant T cells that many methods developed for bulk data fail to detect; in a multiple myeloma dataset, scFusion detects the known recurrent fusion IgH-WHSC1, which is associated with overexpression of the WHSC1 oncogene. Our results demonstrate that scFusion can be used to investigate cellular heterogeneity of gene fusions and their transcriptional impact at the single-cell level.
The structure of the human neocortex underlies species-specific traits and reflects intricate developmental programs. Here we sought to reconstruct processes that occur during early development by sampling adult human tissues. We analysed neocortical clones in a post-mortem human brain through a comprehensive assessment of brain somatic mosaicism, acting as neutral lineage recorders. We combined the sampling of 25 distinct anatomic locations with deep whole-genome sequencing in a neurotypical deceased individual and confirmed results with 5 samples collected from each of three additional donors. We identified 259 bona fide mosaic variants from the index case, then deconvolved distinct geographical, cell-type and clade organizations across the brain and other organs. We found that clones derived after the accumulation of 90-200 progenitors in the cerebral cortex tended to respect the midline axis, well before the anterior-posterior or ventral-dorsal axes, representing a secondary hierarchy following the overall patterning of forebrain and hindbrain domains. Clones across neocortically derived cells were consistent with a dual origin from both dorsal and ventral cellular populations, similar to rodents, whereas the microglia lineage appeared distinct from other resident brain cells. Our data provide a comprehensive analysis of brain somatic mosaicism across the neocortex and demonstrate cellular origins and progenitor distribution patterns within the human brain.
For quality, interpretation, reproducibility and sharing value, microscopy images should be accompanied by detailed descriptions of the conditions that were used to produce them. Micro-Meta App is an intuitive, highly interoperable, open-source software tool that was developed in the context of the 4D Nucleome (4DN) consortium and is designed to facilitate the extraction and collection of relevant microscopy metadata as specified by the recent 4DN-BINA-OME tiered-system of Microscopy Metadata specifications. In addition to substantially lowering the burden of quality assurance, the visual nature of Micro-Meta App makes it particularly suited for training purposes.
BACKGROUND: Retrotransposons have been implicated as causes of Mendelian disease, but their role in autism spectrum disorder (ASD) has not been systematically defined, because they are only called with adequate sensitivity from whole genome sequencing (WGS) data and a large enough cohort for this analysis has only recently become available. RESULTS: We analyzed WGS data from a cohort of 2288 ASD families from the Simons Simplex Collection by establishing a scalable computational pipeline for retrotransposon insertion detection. We report 86,154 polymorphic retrotransposon insertions (including >60% not previously reported) and 158 de novo retrotransposition events. The overall burden of de novo events was similar between ASD individuals and unaffected siblings, with 1 de novo insertion per 29, 117, and 206 births for Alu, L1, and SVA respectively, and 1 de novo insertion per 21 births total. However, ASD cases showed more de novo L1 insertions than expected in ASD genes. Additionally, we observed exonic insertions in loss-of-function intolerant genes, including a likely pathogenic exonic insertion in CSDE1, only in ASD individuals. CONCLUSIONS: These findings suggest a modest, but important, impact of intronic and exonic retrotransposon insertions in ASD, show the importance of WGS for their analysis, and highlight the utility of specific bioinformatic tools for high-throughput detection of retrotransposon insertions.
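The per-family rates quoted above can be checked against the combined figure with a few lines of Python. This is only a consistency check on the rounded numbers in the abstract, under the assumption that the three per-birth rates simply add; it does not reproduce the study's own calculation.

```python
from fractions import Fraction

# Births per de novo insertion, as quoted in the abstract (rounded values).
births_per_insertion = {"Alu": 29, "L1": 117, "SVA": 206}

# Assumption: rates for the three families are additive.
total_rate = sum(Fraction(1, n) for n in births_per_insertion.values())
births_per_any = float(1 / total_rate)

print(f"about 1 de novo insertion per {births_per_any:.1f} births")
```

The sum of the three rates corresponds to roughly one insertion per 21 births, in line with the reported total.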
The vertebrate retina is generated by retinal progenitor cells (RPCs), which produce >100 cell types. Although some RPCs produce many cell types, other RPCs produce restricted types of daughter cells, such as a cone photoreceptor and a horizontal cell (HC). We used genome-wide assays of chromatin structure to compare the profiles of a restricted cone/HC RPC and those of other RPCs in chicks. These data nominated regions of regulatory activity, which were tested in tissue, leading to the identification of many cis-regulatory modules (CRMs) active in cone/HC RPCs and developing cones. Two transcription factors, Otx2 and Oc1, were found to bind to many of these CRMs, including those near genes important for cone development and function, and their binding sites were required for activity. We also found that Otx2 has a predicted autoregulatory CRM. These results suggest that Otx2, Oc1 and possibly other Onecut proteins have a broad role in coordinating cone development and function. The many newly discovered CRMs for cones are potentially useful reagents for gene therapy of cone diseases.
Wang Y, Bae T, Thorpe J, Sherman MA, Jones AG, Cho S, Daily K, Dou Y, Ganz J, Galor A, Lobon I, Pattni R, Rosenbluh C, Tomasi S, Tomasini L, Yang X, Zhou B, Akbarian S, Ball LL, Bizzotto S, Emery SB, Doan R, Fasching L, Jang Y, Juan D, Lizano E, Luquette LJ, Moldovan JB, Narurkar R, Oetjens MT, Rodin RE, Sekar S, Shin JH, Soriano E, Straub RE, Zhou W, Chess A, Gleeson JG, Marquès-Bonet T, Park PJ, Peters MA, Pevsner J, Walsh CA, Weinberger DR, Vaccarino FM, Moran JV, Urban AE, Kidd JM, Mills RE, Abyzov A. Comprehensive identification of somatic nucleotide variants in human brain tissue. Genome Biol 2021;22(1):92.
BACKGROUND: Post-zygotic mutations incurred during DNA replication, DNA repair, and other cellular processes lead to somatic mosaicism. Somatic mosaicism is an established cause of various diseases, including cancers. However, detecting mosaic variants in DNA from non-cancerous somatic tissues poses significant challenges, particularly if the variants are present in only a small fraction of cells. RESULTS: Here, the Brain Somatic Mosaicism Network conducts a coordinated, multi-institutional study to examine the ability of existing methods to detect simulated somatic single-nucleotide variants (SNVs) in DNA mixing experiments, generate multiple replicates of whole-genome sequencing data from the dorsolateral prefrontal cortex, other brain regions, dura mater, and dural fibroblasts of a single neurotypical individual, devise strategies to discover somatic SNVs, and apply various approaches to validate somatic SNVs. These efforts lead to the identification of 43 bona fide somatic SNVs that range in variant allele fractions from ~0.005 to ~0.28. Guided by these results, we devise best practices for calling mosaic SNVs from 250× whole-genome sequencing data in the accessible portion of the human genome that achieve 90% specificity and sensitivity. Finally, we demonstrate that analysis of multiple bulk DNA samples from a single individual allows the reconstruction of early developmental cell lineage trees. CONCLUSIONS: This study provides a unified set of best practices to detect somatic SNVs in non-cancerous tissues. The data and methods are freely available to the scientific community and should serve as a guide to assess the contributions of somatic SNVs to neuropsychiatric diseases.
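A short calculation illustrates why variants at the low end of this allele-fraction range are hard to detect in a single 250× sample. The sketch below assumes an idealized binomial sampling model with no sequencing error, and the three-read threshold is a hypothetical choice for illustration; neither is taken from the study's calling or validation procedure.

```python
from math import comb


def prob_at_least(k, depth, vaf):
    """P(X >= k) for X ~ Binomial(depth, vaf): chance of seeing at least k variant reads."""
    below = sum(comb(depth, i) * vaf**i * (1 - vaf) ** (depth - i)
                for i in range(k))
    return 1.0 - below


DEPTH = 250          # sequencing depth quoted in the abstract
MIN_ALT_READS = 3    # hypothetical minimum number of supporting reads

for vaf in (0.005, 0.28):  # approximate ends of the reported VAF range
    expected = DEPTH * vaf
    p = prob_at_least(MIN_ALT_READS, DEPTH, vaf)
    print(f"VAF {vaf}: expected variant reads = {expected:.2f}, "
          f"P(at least {MIN_ALT_READS}) = {p:.3f}")
```

Under this model a variant at an allele fraction of 0.005 is expected to contribute only about one read at 250× depth, whereas one at 0.28 contributes about 70. This is consistent with the abstract's point that variants present in a small fraction of cells are especially challenging.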
Transposable elements (TEs) help shape the structure and function of the human genome. When inserted into some locations, TEs may disrupt gene regulation and cause diseases. Here, we present xTea (x-Transposable element analyzer), a tool for identifying TE insertions in whole-genome sequencing data. Whereas existing methods are mostly designed for short-read data, xTea can be applied to both short-read and long-read data. Our analysis shows that xTea outperforms other short read-based methods for both germline and somatic TE insertion discovery. With long-read data, we created a catalogue of polymorphic insertions with full assembly and annotation of insertional sequences for various types of retroelements, including pseudogenes and endogenous retroviruses. Notably, we find that individual genomes have an average of nine groups of full-length L1s in centromeres, suggesting that centromeres and other highly repetitive regions such as telomeres are a significant yet unexplored source of active L1s. xTea is available at https://github.com/parklab/xTea.
Histone chaperones are critical for controlling chromatin integrity during transcription, DNA replication, and DNA repair. Three conserved and essential chaperones, Spt6, Spn1/Iws1, and FACT, associate with elongating RNA polymerase II and interact with each other physically and/or functionally; however, there is little understanding of their individual functions or their relationships with each other. In this study, we selected for suppressors of a temperature-sensitive spt6 mutation that disrupts the Spt6-Spn1 physical interaction and that also causes both transcription and chromatin defects. This selection identified novel mutations in FACT. Surprisingly, suppression by FACT did not restore the Spt6-Spn1 interaction, based on coimmunoprecipitation, ChIP, and mass spectrometry experiments. Furthermore, suppression by FACT bypassed the complete loss of Spn1. Interestingly, the FACT suppressor mutations cluster along the FACT-nucleosome interface, suggesting that they alter FACT-nucleosome interactions. In agreement with this observation, we showed that the spt6 mutation that disrupts the Spt6-Spn1 interaction caused an elevated level of FACT association with chromatin, while the FACT suppressors reduced the level of FACT-chromatin association, thereby restoring a normal Spt6-FACT balance on chromatin. Taken together, these studies reveal previously unknown regulation between histone chaperones that is critical for their essential in vivo functions.
Negative elongation factor (NELF) is a critical transcriptional regulator that stabilizes paused RNA polymerase to permit rapid gene expression changes in response to environmental cues. Although NELF is essential for embryonic development, its role in adult stem cells remains unclear. In this study, through a muscle-stem-cell-specific deletion, we showed that NELF is required for efficient muscle regeneration and stem cell pool replenishment. Mechanistic studies using PRO-seq, single-cell trajectory analyses and myofiber cultures revealed that NELF works at a specific stage of regeneration whereby it modulates p53 signaling to permit massive expansion of muscle progenitors. Strikingly, transplantation experiments indicated that these progenitors are also necessary for stem cell pool repopulation, implying that they are able to return to quiescence. Thus, we identified a critical role for NELF in the expansion of muscle progenitors in response to injury and revealed that progenitors returning to quiescence are major contributors to the stem cell pool repopulation.
Mutational activation of KRAS promotes the initiation and progression of cancers, especially in the colorectum, pancreas, lung, and blood plasma, with varying prevalence of specific activating missense mutations. Although epidemiological studies connect specific alleles to clinical outcomes, the mechanisms underlying the distinct clinical characteristics of mutant KRAS alleles are unclear. Here, we analyze 13,492 samples from these four tumor types to examine allele- and tissue-specific genetic properties associated with oncogenic KRAS mutations. The prevalence of known mutagenic mechanisms partially explains the observed spectrum of KRAS activating mutations. However, there are substantial differences between the observed and predicted frequencies for many alleles, suggesting that biological selection underlies the tissue-specific frequencies of mutant alleles. Consistent with experimental studies that have identified distinct signaling properties associated with each mutant form of KRAS, our genetic analysis reveals that each KRAS allele is associated with a distinct tissue-specific comutation network. Moreover, we identify tissue-specific genetic dependencies associated with specific mutant KRAS alleles. Overall, this analysis demonstrates that the genetic interactions of oncogenic KRAS mutations are allele- and tissue-specific, underscoring the complexity that drives their clinical consequences.
Although cell lineage information is fundamental to understanding organismal development, very little direct information is available for humans. We performed high-depth (250×) whole-genome sequencing of multiple tissues from three individuals to identify hundreds of somatic single-nucleotide variants (sSNVs). Using these variants as "endogenous barcodes" in single cells, we reconstructed early embryonic cell divisions. Targeted sequencing of clonal sSNVs in different organs (about 25,000×) and in more than 1000 cortical single cells, as well as single-nucleus RNA sequencing and single-nucleus assay for transposase-accessible chromatin sequencing of ~100,000 cortical single cells, demonstrated asymmetric contributions of early progenitors to extraembryonic tissues, distinct germ layers, and organs. Our data suggest onset of gastrulation at an effective progenitor pool of about 170 cells and about 50 to 100 founders for the forebrain. Thus, mosaic mutations provide a permanent record of human embryonic development at very high resolution.
SUMMARY: Despite the improvement in variant detection algorithms, visual inspection of the read-level data remains an essential step for accurate identification of variants in genome analysis. We developed BamSnap, an efficient BAM file viewer utilizing a graphics library and BAM indexing. In contrast to existing viewers, BamSnap can generate high-quality snapshots rapidly, with customized tracks and layout. As an example, we produced read-level images at 1000 genomic loci for >2500 whole-genomes. AVAILABILITY: BamSnap is freely available at https://github.com/parklab/bamsnap. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.