The vertebrate retina is generated by retinal progenitor cells (RPCs), which produce >100 cell types. Although some RPCs produce many cell types, other RPCs produce restricted types of daughter cells, such as a cone photoreceptor and a horizontal cell (HC). We used genome-wide assays of chromatin structure to compare the profiles of a restricted cone/HC RPC and those of other RPCs in chicks. These data nominated regions of regulatory activity, which were tested in tissue, leading to the identification of many cis-regulatory modules (CRMs) active in cone/HC RPCs and developing cones. Two transcription factors, Otx2 and Oc1, were found to bind to many of these CRMs, including those near genes important for cone development and function, and their binding sites were required for activity. We also found that Otx2 has a predicted autoregulatory CRM. These results suggest that Otx2, Oc1 and possibly other Onecut proteins have a broad role in coordinating cone development and function. The many newly discovered CRMs for cones are potentially useful reagents for gene therapy of cone diseases.
Wang Y, Bae T, Thorpe J, Sherman MA, Jones AG, Cho S, Daily K, Dou Y, Ganz J, Galor A, Lobon I, Pattni R, Rosenbluh C, Tomasi S, Tomasini L, Yang X, Zhou B, Akbarian S, Ball LL, Bizzotto S, Emery SB, Doan R, Fasching L, Jang Y, Juan D, Lizano E, Luquette LJ, Moldovan JB, Narurkar R, Oetjens MT, Rodin RE, Sekar S, Shin JH, Soriano E, Straub RE, Zhou W, Chess A, Gleeson JG, Marquès-Bonet T, Park PJ, Peters MA, Pevsner J, Walsh CA, Weinberger DR, Vaccarino FM, Moran JV, Urban AE, Kidd JM, Mills RE, Abyzov A. Comprehensive identification of somatic nucleotide variants in human brain tissue. Genome Biol 2021;22(1):92.
BACKGROUND: Post-zygotic mutations incurred during DNA replication, DNA repair, and other cellular processes lead to somatic mosaicism. Somatic mosaicism is an established cause of various diseases, including cancers. However, detecting mosaic variants in DNA from non-cancerous somatic tissues poses significant challenges, particularly if the variants are present only in a small fraction of cells. RESULTS: Here, the Brain Somatic Mosaicism Network conducts a coordinated, multi-institutional study to examine the ability of existing methods to detect simulated somatic single-nucleotide variants (SNVs) in DNA mixing experiments, generate multiple replicates of whole-genome sequencing data from the dorsolateral prefrontal cortex, other brain regions, dura mater, and dural fibroblasts of a single neurotypical individual, devise strategies to discover somatic SNVs, and apply various approaches to validate somatic SNVs. These efforts lead to the identification of 43 bona fide somatic SNVs that range in variant allele fractions from ~0.005 to ~0.28. Guided by these results, we devise best practices for calling mosaic SNVs from 250× whole-genome sequencing data in the accessible portion of the human genome that achieve 90% specificity and sensitivity. Finally, we demonstrate that analysis of multiple bulk DNA samples from a single individual allows the reconstruction of early developmental cell lineage trees. CONCLUSIONS: This study provides a unified set of best practices to detect somatic SNVs in non-cancerous tissues. The data and methods are freely available to the scientific community and should serve as a guide to assess the contributions of somatic SNVs to neuropsychiatric diseases.
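To make concrete why variants at the low end of the reported allele-fraction range are hard to detect, the sketch below works through the read-sampling arithmetic at 250× depth. It is an illustration only, not the calling procedure used in the study: it assumes that alternate-allele reads follow a simple binomial model, that a heterozygous variant in a diploid region is carried by roughly 2 × VAF of cells, and it uses a hypothetical threshold of three supporting reads (`MIN_ALT_READS`), none of which come from the abstract.

```python
from math import comb

DEPTH = 250          # sequencing depth quoted in the abstract
MIN_ALT_READS = 3    # hypothetical evidence threshold, chosen for illustration


def expected_alt_reads(depth, vaf):
    """Expected number of reads carrying the variant allele."""
    return depth * vaf


def prob_at_least(k, depth, vaf):
    """P(X >= k) for X ~ Binomial(depth, vaf): chance of seeing k or more alt reads."""
    p_below = sum(
        comb(depth, i) * vaf**i * (1 - vaf) ** (depth - i) for i in range(k)
    )
    return 1.0 - p_below


if __name__ == "__main__":
    for vaf in (0.005, 0.28):  # approximate VAF range reported in the abstract
        # Assumption: heterozygous variant in a diploid region.
        cell_fraction = 2 * vaf
        print(
            f"VAF={vaf}: ~{cell_fraction:.1%} of cells, "
            f"expected alt reads={expected_alt_reads(DEPTH, vaf):.2f}, "
            f"P(>= {MIN_ALT_READS} alt reads)="
            f"{prob_at_least(MIN_ALT_READS, DEPTH, vaf):.3f}"
        )
```

Under this model a variant at VAF ~0.005 is expected to appear in only one or two reads at 250×, which is on the order of the sequencing-error background. That is one way to see why replicates and orthogonal validation matter for such variants.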
Transposable elements (TEs) help shape the structure and function of the human genome. When inserted into some locations, TEs may disrupt gene regulation and cause diseases. Here, we present xTea (x-Transposable element analyzer), a tool for identifying TE insertions in whole-genome sequencing data. Whereas existing methods are mostly designed for short-read data, xTea can be applied to both short-read and long-read data. Our analysis shows that xTea outperforms other short read-based methods for both germline and somatic TE insertion discovery. With long-read data, we created a catalogue of polymorphic insertions with full assembly and annotation of insertional sequences for various types of retroelements, including pseudogenes and endogenous retroviruses. Notably, we find that individual genomes have an average of nine groups of full-length L1s in centromeres, suggesting that centromeres and other highly repetitive regions such as telomeres are a significant yet unexplored source of active L1s. xTea is available at https://github.com/parklab/xTea .
Histone chaperones are critical for controlling chromatin integrity during transcription, DNA replication, and DNA repair. Three conserved and essential chaperones, Spt6, Spn1/Iws1, and FACT, associate with elongating RNA polymerase II and interact with each other physically and/or functionally; however, there is little understanding of their individual functions or their relationships with each other. In this study, we selected for suppressors of a temperature-sensitive spt6 mutation that disrupts the Spt6-Spn1 physical interaction and that also causes both transcription and chromatin defects. This selection identified novel mutations in FACT. Surprisingly, suppression by FACT did not restore the Spt6-Spn1 interaction, based on coimmunoprecipitation, ChIP, and mass spectrometry experiments. Furthermore, suppression by FACT bypassed the complete loss of Spn1. Interestingly, the FACT suppressor mutations cluster along the FACT-nucleosome interface, suggesting that they alter FACT-nucleosome interactions. In agreement with this observation, we showed that the spt6 mutation that disrupts the Spt6-Spn1 interaction caused an elevated level of FACT association with chromatin, while the FACT suppressors reduced the level of FACT-chromatin association, thereby restoring a normal Spt6-FACT balance on chromatin. Taken together, these studies reveal previously unknown regulation between histone chaperones that is critical for their essential in vivo functions.
Negative elongation factor (NELF) is a critical transcriptional regulator that stabilizes paused RNA polymerase to permit rapid gene expression changes in response to environmental cues. Although NELF is essential for embryonic development, its role in adult stem cells remains unclear. In this study, through a muscle-stem-cell-specific deletion, we showed that NELF is required for efficient muscle regeneration and stem cell pool replenishment. Mechanistic studies using PRO-seq, single-cell trajectory analyses, and myofiber cultures revealed that NELF works at a specific stage of regeneration whereby it modulates p53 signaling to permit massive expansion of muscle progenitors. Strikingly, transplantation experiments indicated that these progenitors are also necessary for stem cell pool repopulation, implying that they are able to return to quiescence. Thus, we identified a critical role for NELF in the expansion of muscle progenitors in response to injury and revealed that progenitors returning to quiescence are major contributors to the stem cell pool repopulation.
Mutational activation of KRAS promotes the initiation and progression of cancers, especially in the colorectum, pancreas, lung, and blood plasma, with varying prevalence of specific activating missense mutations. Although epidemiological studies connect specific alleles to clinical outcomes, the mechanisms underlying the distinct clinical characteristics of mutant KRAS alleles are unclear. Here, we analyze 13,492 samples from these four tumor types to examine allele- and tissue-specific genetic properties associated with oncogenic KRAS mutations. The prevalence of known mutagenic mechanisms partially explains the observed spectrum of KRAS activating mutations. However, there are substantial differences between the observed and predicted frequencies for many alleles, suggesting that biological selection underlies the tissue-specific frequencies of mutant alleles. Consistent with experimental studies that have identified distinct signaling properties associated with each mutant form of KRAS, our genetic analysis reveals that each KRAS allele is associated with a distinct tissue-specific comutation network. Moreover, we identify tissue-specific genetic dependencies associated with specific mutant KRAS alleles. Overall, this analysis demonstrates that the genetic interactions of oncogenic KRAS mutations are allele- and tissue-specific, underscoring the complexity that drives their clinical consequences.
Although cell lineage information is fundamental to understanding organismal development, very little direct information is available for humans. We performed high-depth (250×) whole-genome sequencing of multiple tissues from three individuals to identify hundreds of somatic single-nucleotide variants (sSNVs). Using these variants as "endogenous barcodes" in single cells, we reconstructed early embryonic cell divisions. Targeted sequencing of clonal sSNVs in different organs (about 25,000×) and in more than 1000 cortical single cells, as well as single-nucleus RNA sequencing and single-nucleus assay for transposase-accessible chromatin sequencing of ~100,000 cortical single cells, demonstrated asymmetric contributions of early progenitors to extraembryonic tissues, distinct germ layers, and organs. Our data suggest onset of gastrulation at an effective progenitor pool of about 170 cells and about 50 to 100 founders for the forebrain. Thus, mosaic mutations provide a permanent record of human embryonic development at very high resolution.
SUMMARY: Despite the improvement in variant detection algorithms, visual inspection of the read-level data remains an essential step for accurate identification of variants in genome analysis. We developed BamSnap, an efficient BAM file viewer utilizing a graphics library and BAM indexing. In contrast to existing viewers, BamSnap can generate high-quality snapshots rapidly, with customized tracks and layout. As an example, we produced read-level images at 1000 genomic loci for >2500 whole-genomes. AVAILABILITY: BamSnap is freely available at https://github.com/parklab/bamsnap. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Idiopathic normal pressure hydrocephalus (iNPH) is a neurological disorder that occurs in about 1% of individuals over age 60 and is characterized by enlarged cerebral ventricles, gait difficulty, incontinence, and cognitive decline. The cause and pathophysiology of iNPH are largely unknown. We performed whole exome sequencing of DNA obtained from 53 unrelated iNPH patients. Two recurrent heterozygous loss of function deletions in CWH43 were observed in 15% of iNPH patients and were significantly enriched 6.6-fold and 2.7-fold, respectively, when compared to the general population. Cwh43 modifies the lipid anchor of glycosylphosphatidylinositol-anchored proteins. Mice heterozygous for CWH43 deletion appeared grossly normal but displayed hydrocephalus, gait and balance abnormalities, decreased numbers of ependymal cilia, and decreased localization of glycosylphosphatidylinositol-anchored proteins to the apical surfaces of choroid plexus and ependymal cells. Our findings provide novel mechanistic insights into the origins of iNPH and demonstrate that it represents a distinct disease entity.
Homologous recombination (HR)-deficient cancers are sensitive to poly-ADP ribose polymerase inhibitors (PARPi), which have shown clinical efficacy in the treatment of high-grade serous cancers (HGSC). However, the majority of patients will relapse, and acquired PARPi resistance is emerging as a pressing clinical problem. Here we generated seven single-cell clones with acquired PARPi resistance derived from a PARPi-sensitive TP53 -/- and BRCA1 -/- epithelial cell line generated using CRISPR/Cas9. These clones showed diverse resistance mechanisms, and some clones presented with multiple mechanisms of resistance at the same time. Genomic analysis of the clones revealed unique transcriptional and mutational profiles and increased genomic instability in comparison with a PARPi-sensitive cell line. Clonal evolutionary analyses suggested that acquired PARPi resistance arose via clonal selection from an intrinsically unstable and heterogenous cell population in the sensitive cell line, which contained preexisting drug-tolerant cells. Similarly, clonal and spatial heterogeneity was observed in tumor biopsies from a patient with BRCA1-mutant HGSC and acquired PARPi resistance. In an imaging-based drug screen, the clones showed heterogenous responses to targeted therapeutic agents, indicating that not all PARPi-resistant clones can be targeted with just one therapy. Furthermore, PARPi-resistant clones showed mechanism-dependent vulnerabilities to the selected agents, demonstrating that a deeper understanding of the mechanisms of resistance could lead to improved targeting and biomarkers for HGSC with acquired PARPi resistance. SIGNIFICANCE: This study shows that BRCA1-deficient cells can give rise to multiple genomically and functionally heterogenous PARPi-resistant clones, which are associated with various vulnerabilities that can be targeted in a mechanism-specific manner.
Hi-C is a common technique for assessing 3D chromatin conformation. Recent studies have shown that long-range interaction information in Hi-C data can be used to generate chromosome-length genome assemblies and identify large-scale structural variations. Here, we demonstrate the use of Hi-C data in detecting mobile transposable element (TE) insertions genome-wide. Our pipeline Hi-C-based TE analyzer (HiTea) capitalizes on clipped Hi-C reads and is aided by a high proportion of discordant read pairs in Hi-C data to detect insertions of three major families of active human TEs. Despite the uneven genome coverage in Hi-C data, HiTea is competitive with the existing callers based on whole-genome sequencing (WGS) data and can supplement the WGS-based characterization of the TE-insertion landscape. We employ the pipeline to identify TE-insertions from human cell-line Hi-C samples. AVAILABILITY AND IMPLEMENTATION: HiTea is available at https://github.com/parklab/HiTea and as a Docker image. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
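The abstract notes that the pipeline capitalizes on clipped reads. As a rough illustration of what "clipped" means at the alignment level, the sketch below extracts soft-clip lengths from a SAM CIGAR string and flags reads whose clip is long enough to be informative. This is not HiTea's implementation: the function names and the 20-base threshold (`min_clip`) are hypothetical, and a real pipeline would also consider the clipped sequence itself, mapping quality, and Hi-C ligation junctions.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_lengths(cigar):
    """Return (leading, trailing) soft-clip lengths for a CIGAR string."""
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) can only sit at the outer ends, outside any soft clip;
    # drop them so the soft clips become the first/last operations.
    ops = [(length, op) for length, op in ops if op != "H"]
    if not ops:
        return (0, 0)
    leading = ops[0][0] if ops[0][1] == "S" else 0
    trailing = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return (leading, trailing)


def is_clipped(cigar, min_clip=20):
    """True if either end is soft-clipped by at least min_clip bases (hypothetical cutoff)."""
    return max(soft_clip_lengths(cigar)) >= min_clip


if __name__ == "__main__":
    for cigar in ("100M", "20S80M", "5H10S70M15S"):
        print(cigar, soft_clip_lengths(cigar), is_clipped(cigar))
```

Reads flagged this way are candidates whose unaligned portion could then be compared against TE consensus sequences, which is the general idea behind clip-based insertion detection.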
We characterize the landscape of somatic mutations (mutations occurring after fertilization) in the human brain using ultra-deep (~250×) whole-genome sequencing of prefrontal cortex from 59 donors with autism spectrum disorder (ASD) and 15 control donors. We observe a mean of 26 somatic single-nucleotide variants per brain present in ≥4% of cells, with enrichment of mutations in coding and putative regulatory regions. Our analysis reveals that the first cell division after fertilization produces ~3.4 mutations, followed by 2-3 mutations in subsequent generations. This suggests that a typical individual possesses ~80 somatic single-nucleotide variants present in ≥2% of cells, a number comparable to the number of de novo germline mutations per generation, with about half of individuals having at least one potentially function-altering somatic mutation somewhere in the cortex. ASD brains show an excess of somatic mutations in neural enhancer sequences compared with controls, suggesting that mosaic enhancer mutations may contribute to ASD risk.
Although germline de novo copy number variants (CNVs) are known causes of autism spectrum disorder (ASD), the contribution of mosaic (early-developmental) copy number variants (mCNVs) has not been explored. In this study, we assessed the contribution of mCNVs to ASD by ascertaining mCNVs in genotype array intensity data from 12,077 probands with ASD and 5,500 unaffected siblings. We detected 46 mCNVs in probands and 19 mCNVs in siblings, affecting 2.8-73.8% of cells. Probands carried a significant burden of large (>4-Mb) mCNVs, which were detected in 25 probands but only one sibling (odds ratio = 11.4, 95% confidence interval = 1.5-84.2, P = 7.4 × 10). Event size positively correlated with severity of ASD symptoms (P = 0.016). Surprisingly, we did not observe mosaic analogues of the short de novo CNVs recurrently observed in ASD (e.g., 16p11.2). We further experimentally validated two mCNVs in postmortem brain tissue from 59 additional probands. These results indicate that mCNVs contribute a previously unexplained component of ASD risk.
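The odds ratio quoted above can be checked from the counts in the abstract. The sketch below builds the 2×2 table (25 of 12,077 probands versus 1 of 5,500 siblings carrying a large mCNV) and computes the odds ratio with a Wald (log-scale) 95% confidence interval. The abstract does not say which interval method or test the authors used, so the Wald interval and the choice of denominators are assumptions made here for illustration; under them, the result comes out close to the reported 11.4 (1.5-84.2).

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.959964):
    """Odds ratio and Wald 95% CI for a 2x2 table.

    a: exposed cases, b: unexposed cases, c: exposed controls, d: unexposed controls.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts taken from the abstract; the table layout is an assumption.
PROBANDS_TOTAL, PROBANDS_LARGE = 12077, 25
SIBLINGS_TOTAL, SIBLINGS_LARGE = 5500, 1

if __name__ == "__main__":
    odds_ratio, lower, upper = odds_ratio_wald_ci(
        PROBANDS_LARGE,
        PROBANDS_TOTAL - PROBANDS_LARGE,
        SIBLINGS_LARGE,
        SIBLINGS_TOTAL - SIBLINGS_LARGE,
    )
    print(f"OR = {odds_ratio:.1f}, 95% CI = {lower:.1f}-{upper:.1f}")
```

The wide interval follows directly from the single carrier among siblings: the 1/c term dominates the standard error, so the estimate is sensitive to that one count.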
A large amount of genomic data for profiling three-dimensional genome architecture have accumulated from large-scale consortium projects as well as from individual laboratories. In this review, we summarize recent landmark datasets and collections in the field. We describe the challenges in collection, annotation, and analysis of these data, particularly for integration of sequencing and microscopy data. We introduce efforts from consortia and independent groups to harmonize diverse datasets. As the resolution and throughput of sequencing and imaging technologies continue to increase, more efficient utilization and integration of collected data will be critical for a better understanding of nuclear architecture.