The 4D Nucleome (4DN) Network aims to elucidate the complex structure and organization of chromosomes in the nucleus and the impact of their disruption in disease biology. We present the 4DN Data Portal (https://data.4dnucleome.org/), a repository for datasets generated in the 4DN Network and relevant external datasets. Datasets were generated with a wide range of experiments, including chromosome conformation capture assays such as Hi-C and other innovative sequencing and microscopy-based assays probing chromosome architecture. Altogether, the 4DN Data Portal hosts more than 1800 experiment sets and 36000 files. Results of sequencing-based assays from different laboratories are uniformly processed and quality-controlled. The portal interface allows easy browsing, filtering, and bulk downloads, and the integrated HiGlass genome browser allows interactive visualization and comparison of multiple datasets. The 4DN Data Portal represents a primary resource for chromosome contact and other nuclear architecture data for the scientific community.
Gene fusions can play important roles in tumor initiation and progression. While fusion detection so far has been from bulk samples, full-length single-cell RNA sequencing (scRNA-seq) offers the possibility of detecting gene fusions at the single-cell level. However, scRNA-seq data have a high noise level and contain various technical artifacts that can lead to spurious fusion discoveries. Here, we present a computational tool, scFusion, for gene fusion detection based on scRNA-seq. We evaluate the performance of scFusion using simulated and five real scRNA-seq datasets and find that scFusion can efficiently and sensitively detect fusions with a low false discovery rate. In a T cell dataset, scFusion detects the invariant TCR gene recombinations in mucosal-associated invariant T cells that many methods developed for bulk data fail to detect; in a multiple myeloma dataset, scFusion detects the known recurrent fusion IgH-WHSC1, which is associated with overexpression of the WHSC1 oncogene. Our results demonstrate that scFusion can be used to investigate cellular heterogeneity of gene fusions and their transcriptional impact at the single-cell level.
Transposable elements (TEs) help shape the structure and function of the human genome. When inserted into some locations, TEs may disrupt gene regulation and cause diseases. Here, we present xTea (x-Transposable element analyzer), a tool for identifying TE insertions in whole-genome sequencing data. Whereas existing methods are mostly designed for short-read data, xTea can be applied to both short-read and long-read data. Our analysis shows that xTea outperforms other short read-based methods for both germline and somatic TE insertion discovery. With long-read data, we created a catalogue of polymorphic insertions with full assembly and annotation of insertional sequences for various types of retroelements, including pseudogenes and endogenous retroviruses. Notably, we find that individual genomes have an average of nine groups of full-length L1s in centromeres, suggesting that centromeres and other highly repetitive regions such as telomeres are a significant yet unexplored source of active L1s. xTea is available at https://github.com/parklab/xTea.
Mutational activation of KRAS promotes the initiation and progression of cancers, especially in the colorectum, pancreas, lung, and blood plasma, with varying prevalence of specific activating missense mutations. Although epidemiological studies connect specific alleles to clinical outcomes, the mechanisms underlying the distinct clinical characteristics of mutant KRAS alleles are unclear. Here, we analyze 13,492 samples from these four tumor types to examine allele- and tissue-specific genetic properties associated with oncogenic KRAS mutations. The prevalence of known mutagenic mechanisms partially explains the observed spectrum of KRAS activating mutations. However, there are substantial differences between the observed and predicted frequencies for many alleles, suggesting that biological selection underlies the tissue-specific frequencies of mutant alleles. Consistent with experimental studies that have identified distinct signaling properties associated with each mutant form of KRAS, our genetic analysis reveals that each KRAS allele is associated with a distinct tissue-specific comutation network. Moreover, we identify tissue-specific genetic dependencies associated with specific mutant KRAS alleles. Overall, this analysis demonstrates that the genetic interactions of oncogenic KRAS mutations are allele- and tissue-specific, underscoring the complexity that drives their clinical consequences.
Goldman MJ*, Zhang J*, Fonseca NA*, Cortés-Ciriano I*, Xiang Q, Craft B, Piñeiro-Yáñez E, O'Connor BD, Bazant W, Barrera E, Muñoz-Pomer A, Petryszak R, Füllgrabe A, Al-Shahrour F, Keays M, Haussler D, Weinstein JN, Huber W, Valencia A, Park PJ, Papatheodorou I, Zhu J, Ferretti V, Vazquez M. A user guide for the online exploration and visualization of PCAWG data. Nat Commun 2020;11(1):3400.
The Pan-Cancer Analysis of Whole Genomes (PCAWG) project generated a vast amount of whole-genome cancer sequencing resource data. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2658 cancers across 38 tumor types, we provide a user's guide to the five publicly available online data exploration and visualization tools introduced in the PCAWG marker paper. These tools are ICGC Data Portal, UCSC Xena, Chromothripsis Explorer, Expression Atlas, and PCAWG-Scout. We detail use cases and analyses for each tool, show how they incorporate outside resources from the larger genomics ecosystem, and demonstrate how the tools can be used together to understand the biology of cancers more deeply. Together, the tools enable researchers to query the complex genomic PCAWG data dynamically and integrate external information, enabling and enhancing interpretation.
Combined PARP and immune checkpoint inhibition has yielded encouraging results in ovarian cancer, but predictive biomarkers are lacking. We performed immunogenomic profiling and highly multiplexed single-cell imaging on tumor samples from patients enrolled in a Phase I/II trial of niraparib and pembrolizumab in ovarian cancer (NCT02657889). We identify two determinants of response: mutational signature 3, reflecting defective homologous recombination DNA repair, and a positive immune score as a surrogate of interferon-primed exhausted CD8+ T-cells in the tumor microenvironment. Presence of one or both features associates with an improved outcome, while concurrent absence yields no responses. Single-cell spatial analysis reveals prominent interactions of exhausted CD8+ T-cells with PD-L1+ macrophages and PD-L1+ tumor cells as mechanistic determinants of response. Furthermore, spatial analysis of two extreme responders shows differential clustering of exhausted CD8+ T-cells with PD-L1+ macrophages in the first, and exhausted CD8+ T-cells with cancer cells harboring genomic PD-L1 and PD-L2 amplification in the second.
Cancers require telomere maintenance mechanisms for unlimited replicative potential. They achieve this through TERT activation or alternative telomere lengthening associated with ATRX or DAXX loss. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, we dissect whole-genome sequencing data of over 2500 matched tumor-control samples from 36 different tumor types to characterize the genomic footprints of these mechanisms. While the telomere content of tumors with ATRX or DAXX mutations (ATRX/DAXXtrunc) is increased, tumors with TERT modifications show a moderate decrease of telomere content. One quarter of all tumor samples contain somatic integrations of telomeric sequences into non-telomeric DNA. This fraction is increased to 80% prevalence in ATRX/DAXXtrunc tumors, which carry an aberrant telomere variant repeat (TVR) distribution as another genomic marker. The latter feature includes enrichment or depletion of the previously undescribed singleton TVRs TTCGGG and TTTGGG, respectively. Our systematic analysis provides new insight into the recurrent genomic alterations associated with telomere maintenance mechanisms in cancer.
Recent advances in single cell technology have enabled dissection of cellular heterogeneity in great detail. However, analysis of single cell DNA sequencing data remains challenging due to bias and artifacts that arise during DNA extraction and whole-genome amplification, including allelic imbalance and dropout. Here, we present a framework for statistical estimation of allele-specific amplification imbalance at any given position in single cell whole-genome sequencing data by utilizing the allele frequencies of heterozygous single nucleotide polymorphisms in the neighborhood. The resulting allelic imbalance profile is critical for determining whether the variant allele fraction of an observed mutation is consistent with the expected fraction for a true variant. This method, implemented in SCAN-SNV (Single Cell ANalysis of SNVs), substantially improves the identification of somatic variants in single cells. Our allele balance framework is broadly applicable to genotype analysis of any variant type in any data that might exhibit allelic imbalance.
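The idea of estimating allele balance at a site from nearby heterozygous SNPs can be illustrated with a deliberately simplified sketch. This is not the statistical model implemented in SCAN-SNV: it swaps in a plain Gaussian-kernel, depth-weighted average, and the function names, the `bandwidth` parameter, and the assumption that SNP read counts are already phased to one haplotype are all hypothetical.

```python
from math import exp


def local_allele_balance(site, het_snps, bandwidth=10_000.0):
    """Kernel-weighted fraction of reads from haplotype 1 near `site`.

    het_snps: iterable of (position, hap1_reads, total_reads) tuples for
    heterozygous SNPs, assumed (hypothetically) to be phased so that
    hap1_reads always counts the same haplotype.
    bandwidth: kernel width in base pairs (an illustrative default).
    """
    weighted_hap1 = 0.0
    weighted_total = 0.0
    for position, hap1_reads, total_reads in het_snps:
        if total_reads == 0:
            continue  # uncovered SNPs carry no information
        # Gaussian kernel: closer SNPs contribute more to the estimate.
        weight = exp(-0.5 * ((position - site) / bandwidth) ** 2)
        weighted_hap1 += weight * hap1_reads
        weighted_total += weight * total_reads
    if weighted_total == 0.0:
        raise ValueError("no informative heterozygous SNPs near this site")
    return weighted_hap1 / weighted_total


def expected_variant_fractions(balance):
    """A heterozygous variant sits on one haplotype or the other, so its
    expected allele fraction is either `balance` or `1 - balance`."""
    return balance, 1.0 - balance
```

For example, `local_allele_balance(1000, [(900, 8, 10), (1100, 8, 10)])` gives a balance of about 0.8, so a candidate mutation at that site would be expected near a fraction of 0.8 or 0.2 rather than 0.5.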
Bromodomain-containing protein 9 (BRD9) is a recently identified subunit of SWI/SNF (BAF) chromatin remodeling complexes, yet its function is poorly understood. Here, using a genome-wide CRISPR-Cas9 screen, we show that BRD9 is a specific vulnerability in pediatric malignant rhabdoid tumors (RTs), which are driven by inactivation of the SMARCB1 subunit of SWI/SNF. We find that BRD9 exists in a unique SWI/SNF sub-complex that lacks SMARCB1, which has been considered a core subunit. While SMARCB1-containing SWI/SNF complexes are bound preferentially at enhancers, we show that BRD9-containing complexes exist at both promoters and enhancers. Mechanistically, we show that SMARCB1 loss causes increased BRD9 incorporation into SWI/SNF, thus providing insight into BRD9 vulnerability in RTs. Underlying the dependency, while its bromodomain is dispensable, the DUF3512 domain of BRD9 is essential for SWI/SNF integrity in the absence of SMARCB1. Collectively, our results reveal that a BRD9-containing SWI/SNF sub-complex is required for the survival of SMARCB1-mutant RTs.
Microsatellite instability (MSI) refers to the hypermutability of short repetitive sequences in the genome caused by impaired DNA mismatch repair. Although MSI has been studied for decades, the large amounts of sequencing data now available allow us to examine the molecular fingerprints of MSI in greater detail. Here, we analyse ∼8,000 exomes and ∼1,000 whole genomes of cancer patients across 23 cancer types. Our analysis reveals that the frequency of MSI events is highly variable within and across tumour types. We also identify genes in DNA repair and oncogenic pathways recurrently subject to MSI and uncover non-coding loci that frequently display MSI. Finally, we propose a highly accurate exome-based predictive model for the MSI phenotype. These results advance our understanding of the genomic drivers and consequences of MSI, and our comprehensive catalogue of tumour-type-specific MSI loci will enable panel-based MSI testing to identify patients who are likely to benefit from immunotherapy.
Accurate detection of genomic alterations using high-throughput sequencing is an essential component of precision cancer medicine. We characterize the variant allele fractions (VAFs) of somatic single nucleotide variants and indels across 5095 clinical samples profiled using a custom panel, CancerSCAN. Our results demonstrate that a significant fraction of clinically actionable variants have low VAFs, often due to low tumor purity and treatment-induced mutations. The percentages of mutations under 5% VAF across hotspots in EGFR, KRAS, PIK3CA, and BRAF are 16%, 11%, 12%, and 10%, respectively, with 24% for EGFR T790M and 17% for PIK3CA E545. For clinical relevance, we describe two patients for whom targeted therapy achieved remission despite low VAF mutations. We also characterize the read depths necessary to achieve sensitivity and specificity comparable to current laboratory assays. These results show that capturing low VAF mutations at hotspots by sufficient sequencing coverage and carefully tuned algorithms is imperative for a clinical assay.
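The link between read depth and sensitivity for low-VAF mutations can be made concrete with a back-of-the-envelope binomial calculation. The sketch below is not the analysis performed for CancerSCAN: it assumes variant-supporting reads follow Binomial(depth, VAF), ignores sequencing error and specificity entirely, and uses hypothetical names and thresholds (`min_alt_reads`, `target`).

```python
from math import comb


def detection_probability(depth, vaf, min_alt_reads):
    """P(at least `min_alt_reads` variant reads) under Binomial(depth, vaf)."""
    p_too_few = sum(
        comb(depth, k) * vaf**k * (1.0 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_too_few


def min_depth_for_sensitivity(vaf, min_alt_reads, target=0.95, max_depth=20_000):
    """Smallest depth at which the detection probability reaches `target`.

    Returns None if the target is not reached by `max_depth`.
    """
    for depth in range(min_alt_reads, max_depth + 1):
        if detection_probability(depth, vaf, min_alt_reads) >= target:
            return depth
    return None
```

Under these assumptions, calling `min_depth_for_sensitivity(0.05, 5)` estimates the depth at which a 5% VAF mutation yields at least five supporting reads in 95% of samples. A real assay would also need to model background error rates to keep specificity acceptable.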
Release of promoter-proximally paused RNA polymerase II (RNAPII) is a recently recognized transcriptional regulatory checkpoint. The biological roles of RNAPII pause release and the mechanisms by which extracellular signals control it are incompletely understood. Here we show that VEGF stimulates RNAPII pause release by stimulating acetylation of ETS1, a master endothelial cell transcriptional regulator. In endothelial cells, ETS1 binds transcribed gene promoters and stimulates their expression by broadly increasing RNAPII pause release. VEGF enhances ETS1 chromatin occupancy and increases ETS1 acetylation, enhancing its binding to BRD4, which recruits the pause release machinery and increases RNAPII pause release. Endothelial cell angiogenic responses in vitro and in vivo require ETS1-mediated transduction of VEGF signaling to release paused RNAPII. Our results define an angiogenic pathway in which VEGF enhances ETS1-BRD4 interaction to broadly promote RNAPII pause release and drive angiogenesis. Promoter proximal RNAPII pausing is a rate-limiting transcriptional mechanism. Chen et al. show that this process is essential in angiogenesis by demonstrating that the endothelial master transcription factor ETS1 promotes global RNAPII pause release, and that this process is governed by VEGF.
Genes encoding subunits of SWI/SNF (BAF) chromatin remodelling complexes are collectively altered in over 20% of human malignancies, but the mechanisms by which these complexes alter chromatin to modulate transcription and cell fate are poorly understood. Utilizing mouse embryonic fibroblast and cancer cell line models, here we show via ChIP-seq and biochemical assays that SWI/SNF complexes are preferentially targeted to distal lineage specific enhancers and interact with p300 to modulate histone H3 lysine 27 acetylation. We identify a greater requirement for SWI/SNF at typical enhancers than at most super-enhancers and at enhancers in untranscribed regions than in transcribed regions. Our data further demonstrate that SWI/SNF-dependent distal enhancers are essential for controlling expression of genes linked to developmental processes. Our findings thus establish SWI/SNF complexes as regulators of the enhancer landscape and provide insight into the roles of SWI/SNF in cellular fate control.
Chromatin accessibility plays a fundamental role in gene regulation. Nucleosome placement, usually measured by quantifying protection of DNA from enzymatic digestion, can regulate accessibility. We introduce a metric that uses micrococcal nuclease (MNase) digestion in a novel manner to measure chromatin accessibility by combining information from several digests of increasing depths. This metric, MACC (MNase accessibility), quantifies the inherent heterogeneity of nucleosome accessibility in which some nucleosomes are seen preferentially at high MNase and some at low MNase. MACC interrogates each genomic locus, measuring both nucleosome location and accessibility in the same assay. MACC can be performed either with or without a histone immunoprecipitation step, and thereby compares histone and non-histone protection. We find that changes in accessibility at enhancers, promoters and other regulatory regions do not correlate with changes in nucleosome occupancy. Moreover, high nucleosome occupancy does not necessarily preclude high accessibility, which reveals novel principles of chromatin regulation.
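One plausible way to combine several digests of increasing depth into a single per-locus number is to fit a line to coverage against digestion level. The sketch below is an assumption for illustration, not the published MACC definition: it takes the least-squares slope of normalized read counts in one genomic bin against log MNase concentration, and the function name, inputs, and sign convention are hypothetical.

```python
from math import log


def titration_slope(mnase_units, normalized_counts):
    """Least-squares slope of normalized counts versus log(MNase units).

    mnase_units: MNase amounts for each digest (must be positive).
    normalized_counts: depth-normalized read counts for one genomic bin,
    in the same order as `mnase_units`.
    """
    if len(mnase_units) != len(normalized_counts) or len(mnase_units) < 2:
        raise ValueError("need at least two matched digests")
    xs = [log(units) for units in mnase_units]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(normalized_counts) / len(normalized_counts)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("digests must differ in MNase concentration")
    sxy = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(xs, normalized_counts)
    )
    return sxy / sxx
```

In this sketch, a bin whose coverage falls as digestion deepens gets a negative slope and a bin whose coverage rises gets a positive one. How such a slope would be scaled or signed to match the MACC score is not specified in the abstract.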
Chromatin structure determines DNA accessibility. We compare nucleosome occupancy in mouse and human embryonic stem cells (ESCs), induced-pluripotent stem cells (iPSCs) and differentiated cell types using MNase-seq. To address variability inherent in this technique, we developed a bioinformatic approach to identify regions of difference (RoD) in nucleosome occupancy between pluripotent and somatic cells. Surprisingly, most chromatin remains unchanged; a majority of rearrangements appear to affect a single nucleosome. RoDs are enriched at genes and regulatory elements, including enhancers associated with pluripotency and differentiation. RoDs co-localize with binding sites of key developmental regulators, including the reprogramming factors Klf4, Oct4/Sox2 and c-Myc. Nucleosomal landscapes in ESC enhancers are extensively altered, exhibiting lower nucleosome occupancy in pluripotent cells than in somatic cells. Most changes are reset during reprogramming. We conclude that changes in nucleosome occupancy are a hallmark of cell differentiation and reprogramming and likely identify regulatory regions essential for these processes.