The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.
Landt SG, Marinov GK, Kundaje A, Kheradpour P, Pauli F, Batzoglou S, Bernstein BE, Bickel P, Brown JB, Cayting P, Chen Y, DeSalvo G, Epstein C, Fisher-Aylor KI, Euskirchen G, Gerstein M, Gertz J, Hartemink AJ, Hoffman MM, Iyer VR, Jung YL, Karmakar S, Kellis M, Kharchenko PV, Li Q, Liu T, Liu SX, Ma L, Milosavljevic A, Myers RM, Park PJ, Pazin MJ, Perry MD, Raha D, Reddy TE, Rozowsky J, Shoresh N, Sidow A, Slattery M, Stamatoyannopoulos JA, Tolstorukov MY, White KP, Xi S, Farnham PJ, Lieb JD, Wold BJ, Snyder M. ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res 2012;22(9):1813-31.
Chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) has become a valuable and widely used approach for mapping the genomic location of transcription-factor binding and histone modifications in living cells. Despite its widespread use, there are considerable differences in how these experiments are conducted, how the results are scored and evaluated for quality, and how the data and metadata are archived for public use. These practices affect the quality and utility of any global ChIP experiment. Through our experience in performing ChIP-seq experiments, the ENCODE and modENCODE consortia have developed a set of working standards and guidelines for ChIP experiments that are updated routinely. The current guidelines address antibody validation, experimental replication, sequencing depth, data and metadata reporting, and data quality assessment. We discuss how ChIP quality, assessed in these ways, affects different uses of ChIP-seq data. All data sets used in the analysis have been deposited for public viewing and downloading at the ENCODE (http://encodeproject.org/ENCODE/) and modENCODE (http://www.modencode.org/) portals.
Lung squamous cell carcinoma is a common type of lung cancer, causing approximately 400,000 deaths per year worldwide. Genomic alterations in squamous cell lung cancers have not been comprehensively characterized, and no molecularly targeted agents have been specifically developed for its treatment. As part of The Cancer Genome Atlas, here we profile 178 lung squamous cell carcinomas to provide a comprehensive landscape of genomic and epigenomic alterations. We show that the tumour type is characterized by complex genomic alterations, with a mean of 360 exonic mutations, 165 genomic rearrangements, and 323 segments of copy number alteration per tumour. We find statistically recurrent mutations in 11 genes, including mutation of TP53 in nearly all specimens. Previously unreported loss-of-function mutations are seen in the HLA-A class I major histocompatibility gene. Significantly altered pathways included NFE2L2 and KEAP1 in 34%, squamous differentiation genes in 44%, phosphatidylinositol-3-OH kinase pathway genes in 47%, and CDKN2A and RB1 in 72% of tumours. We identified a potential therapeutic target in most tumours, offering new avenues of investigation for the treatment of squamous cell lung cancers.
To characterize somatic alterations in colorectal carcinoma, we conducted a genome-scale analysis of 276 samples, analysing exome sequence, DNA copy number, promoter methylation and messenger RNA and microRNA expression. A subset of these samples (97) underwent low-depth-of-coverage whole-genome sequencing. In total, 16% of colorectal carcinomas were found to be hypermutated: three-quarters of these had the expected high microsatellite instability, usually with hypermethylation and MLH1 silencing, and one-quarter had somatic mismatch-repair gene and polymerase ε (POLE) mutations. Excluding the hypermutated cancers, colon and rectum cancers were found to have considerably similar patterns of genomic alteration. Twenty-four genes were significantly mutated, and in addition to the expected APC, TP53, SMAD4, PIK3CA and KRAS mutations, we found frequent mutations in ARID1A, SOX9 and FAM123B. Recurrent copy-number alterations include potentially drug-targetable amplifications of ERBB2 and newly discovered amplification of IGF2. Recurrent chromosomal translocations include the fusion of NAV2 and WNT pathway member TCF7L1. Integrative analyses suggest new markers for aggressive colorectal carcinoma and an important role for MYC-directed transcriptional activation and repression.
We analysed primary breast cancers by genomic DNA copy number arrays, DNA methylation, exome sequencing, messenger RNA arrays, microRNA sequencing and reverse-phase protein arrays. Our ability to integrate information across platforms provided key insights into previously defined gene expression subtypes and demonstrated the existence of four main breast cancer classes when combining data from five platforms, each of which shows significant molecular heterogeneity. Somatic mutations in only three genes (TP53, PIK3CA and GATA3) occurred at >10% incidence across all breast cancers; however, there were numerous subtype-associated and novel gene mutations including the enrichment of specific mutations in GATA3, PIK3CA and MAP3K1 with the luminal A subtype. We identified two novel protein-expression-defined subgroups, possibly produced by stromal/microenvironmental elements, and integrated analyses identified specific signalling pathways dominant in each molecular subtype including a HER2/phosphorylated HER2/EGFR/phosphorylated EGFR signature within the HER2-enriched expression subtype. Comparison of basal-like breast tumours with high-grade serous ovarian tumours showed many molecular commonalities, indicating a related aetiology and similar therapeutic opportunities. The biological finding of the four main breast cancer subtypes caused by different subsets of genetic and epigenetic abnormalities raises the hypothesis that much of the clinically observable plasticity and heterogeneity occurs within, and not across, these major biological subtypes of breast cancer.
Uterine leiomyomata (UL), the most common neoplasm in reproductive-age women, are classified into distinct genetic subgroups based on recurrent chromosome abnormalities. To develop a molecular signature of UL with t(12;14)(q14-q15;q23-q24), we took advantage of the multiple UL arising as independent clonal lesions within a single uterus. We compared genome-wide expression levels of t(12;14) UL to non-t(12;14) UL from each of nine women in a paired analysis, with each sample weighted for the percentage of t(12;14) cells to adjust for mosaicism with normal cells. This resulted in a transcriptional profile that confirmed HMGA2, known to be overexpressed in t(12;14) UL, as the most significantly altered gene. Pathway analysis of the differentially expressed genes showed significant association with cell proliferation, particularly G1/S checkpoint regulation. This is consistent with the known larger size of t(12;14) UL relative to karyotypically normal UL or to UL in the deletion 7q22 subgroup. Unsupervised hierarchical clustering demonstrated that patient variability is relatively dominant to the distinction of t(12;14) UL compared with non-t(12;14) UL or of t(12;14) UL compared with del(7q) UL. The paired design we employed is therefore important to produce an accurate t(12;14) UL-specific gene list by removing the confounding effects of genotype and environment. Interestingly, myometrium not only clustered away from the tumors, but generally separated based on associated t(12;14) versus del(7q) status. Nine genes were identified whose expression can distinguish the myometrium origin. This suggests an underlying constitutional genetic predisposition to these somatic changes which could potentially lead to improved personalized management and treatment.
Sex chromosome dosage compensation in Drosophila provides a model for understanding how chromatin organization can modulate coordinate gene regulation. Male Drosophila increase the transcript levels of genes on the single male X approximately two-fold to equal the gene expression in females, which have two X-chromosomes. Dosage compensation is mediated by the Male-Specific Lethal (MSL) histone acetyltransferase complex. Five core components of the MSL complex were identified by genetic screens for genes that are specifically required for male viability and are dispensable for females. However, because dosage compensation must interface with the general transcriptional machinery, it is likely that identifying additional regulators that are not strictly male-specific will be key to understanding the process at a mechanistic level. Such regulators would not have been recovered from previous male-specific lethal screening strategies. Therefore, we have performed a cell culture-based, genome-wide RNAi screen to search for factors required for MSL targeting or function. Here we focus on the discovery of proteins that function to promote MSL complex recruitment to "chromatin entry sites," which are proposed to be the initial sites of MSL targeting. We find that components of the NSL (Non-specific lethal) complex, and a previously unstudied zinc-finger protein, facilitate MSL targeting and display a striking enrichment at MSL entry sites. Identification of these factors provides new insight into how MSL complex establishes the specialized hyperactive chromatin required for dosage compensation in Drosophila.
OBJECTIVE: Short bowel syndrome remains a condition of high morbidity and mortality, and current therapeutic options carry significant side effects. To identify new treatments we focused on postresection changes in microRNAs (short noncoding RNAs that suppress target genes) and suggest a previously undiscovered role for microRNA-125a (mir-125a) in intestinal adaptation. METHODS: Rats underwent either 80% massive small bowel resection or transection and were harvested after 48 hours. Jejunum was harvested for microRNA microarrays, laser capture microdissection, and RNA and protein analysis. Mir-125a was overexpressed in intestinal epithelium-6 (crypt-derived) cells (IEC-6), and effects on proliferation and apoptosis were determined using MTS and flow cytometry. Expression of potential targets of mir-125a in rat jejunum and IEC-6 cells was determined using quantitative real-time polymerase chain reaction (RNA) and Western blotting (protein). RESULTS: Resection upregulated mir-125a and mir-214 by 2.4-fold and 3.2-fold, respectively. Highest levels of expression were noted in the crypt fraction. Mir-125a overexpression induced apoptosis and resultant growth arrest in IEC-6 cells. The expression of the prosurvival Bcl-2 family member Mcl-1 was downregulated both in mir-125a-overexpressing IEC-6 cells and in jejunum of resected rats, confirming Mcl-1 as a previously undiscovered target of mir-125a. CONCLUSIONS: Upregulation of mir-125a suppresses the prosurvival protein Mcl-1, producing the increase in apoptosis known to accompany the proliferative changes characteristic of intestinal adaptation. Our data highlight a potential role for microRNAs as mediators of the adaptive process and may facilitate the development of new therapeutic options for short bowel syndrome.
Chromatin environments differ greatly within a eukaryotic genome, depending on expression state, chromosomal location, and nuclear position. In genomic regions characterized by high repeat content and high gene density, chromatin structure must silence transposable elements but permit expression of embedded genes. We have investigated one such region, chromosome 4 of Drosophila melanogaster. Using chromatin-immunoprecipitation followed by microarray (ChIP-chip) analysis, we examined enrichment patterns of 20 histone modifications and 25 chromosomal proteins in S2 and BG3 cells, as well as the changes in several marks resulting from mutations in key proteins. Active genes on chromosome 4 are distinct from those in euchromatin or pericentric heterochromatin: while there is a depletion of silencing marks at the transcription start sites (TSSs), HP1a and H3K9me3, but not H3K9me2, are enriched strongly over gene bodies. Intriguingly, genes on chromosome 4 are less frequently associated with paused polymerase. However, when the chromatin is altered by depleting HP1a or POF, the RNA pol II enrichment patterns of many chromosome 4 genes shift, showing a significant decrease over gene bodies but not at TSSs, accompanied by lower expression of those genes. Chromosome 4 genes have a low incidence of TRL/GAGA factor binding sites and a low T(m) downstream of the TSS, characteristics that could contribute to a low incidence of RNA polymerase pausing. Our data also indicate that EGG and POF jointly regulate H3K9 methylation and promote HP1a binding over gene bodies, while HP1a targeting and H3K9 methylation are maintained at the repeats by an independent mechanism. The HP1a-enriched, POF-associated chromatin structure over the gene bodies may represent one type of adaptation for genes embedded in repetitive DNA.
Chromatin insulator elements and associated proteins have been proposed to partition eukaryotic genomes into sets of independently regulated domains. Here we test this hypothesis by quantitative genome-wide analysis of insulator protein binding to Drosophila chromatin. We find distinct combinatorial binding of insulator proteins to different classes of sites and uncover a novel type of insulator element that binds CP190 but not any other known insulator proteins. Functional characterization of different classes of binding sites indicates that only a small fraction act as robust insulators in standard enhancer-blocking assays. We show that insulators restrict the spreading of the H3K27me3 mark but only at a small number of Polycomb target regions and only to prevent repressive histone methylation within adjacent genes that are already transcriptionally inactive. RNAi knockdown of insulator proteins in cultured cells does not lead to major alterations in genome expression. Taken together, these observations argue against the concept of a genome partitioned by specialized boundary elements and suggest that insulators are reserved for specific regulation of selected genes.
Mutations of the NF2 gene on chromosome 22q are thought to initiate tumorigenesis in nearly 50% of meningiomas, and 22q deletion is the earliest and most frequent large-scale chromosomal abnormality observed in these tumors. In aggressive meningiomas, 22q deletions are generally accompanied by the presence of large-scale segmental abnormalities involving other chromosomes, but the reasons for this association are unknown. We find that large-scale chromosomal alterations accumulate during meningioma progression primarily in tumors harboring 22q deletions, suggesting 22q-associated chromosomal instability. Here we show frequent codeletion of the DNA repair and tumor suppressor gene, CHEK2, in combination with NF2 on chromosome 22q in a majority of aggressive meningiomas. In addition, tumor-specific splicing of CHEK2 in meningioma leads to decreased functional Chk2 protein expression. We show that enforced Chk2 knockdown in meningioma cells decreases DNA repair. Furthermore, Chk2 depletion increases centrosome amplification, thereby promoting chromosomal instability. Taken together, these data indicate that alternative splicing and frequent codeletion of CHEK2 and NF2 contribute to the genomic instability and associated development of aggressive biologic behavior in meningiomas.
The generation of induced pluripotent stem cells (iPSCs) often results in aberrant epigenetic silencing of the imprinted Dlk1-Dio3 gene cluster, compromising the ability to generate entirely iPSC-derived adult mice ('all-iPSC mice'). Here, we show that reprogramming in the presence of ascorbic acid attenuates hypermethylation of Dlk1-Dio3 by enabling a chromatin configuration that interferes with binding of the de novo DNA methyltransferase Dnmt3a. This approach allowed us to generate all-iPSC mice from mature B cells, which have until now failed to support the development of exclusively iPSC-derived postnatal animals. Our data show that transcription factor-mediated reprogramming can endow a defined, terminally differentiated cell type with a developmental potential equivalent to that of embryonic stem cells. More generally, these findings indicate that culture conditions during cellular reprogramming can strongly influence the epigenetic and biological properties of the resultant iPSCs.
MicroRNAs (miRNAs) are endogenous noncoding RNA molecules that are involved in post-transcriptional gene silencing. Using global miRNA expression profiling, we found miR-21, -155, and -18a to be highly upregulated in rat kidneys following tubular injury induced by ischemia/reperfusion (I/R) or gentamicin administration. miR-21 and -155 also showed decreased expression patterns in blood and urinary supernatants in both models of kidney injury. Furthermore, urinary levels of miR-21 increased 1.2-fold in patients with clinical diagnosis of acute kidney injury (AKI) (n = 22) as compared with healthy volunteers (n = 25) (p < 0.05), and miR-155 decreased 1.5-fold in patients with AKI (p < 0.01). We identified 29 messenger RNA core targets of these 3 miRNAs using the context likelihood of relatedness algorithm and found these predicted gene targets to be highly enriched for genes associated with apoptosis or cell proliferation. Taken together, these results suggest that miR-21 and -155 could potentially serve as translational biomarkers for detection of AKI and may play a critical role in the pathogenesis of kidney injury and the tissue repair process.
Variation in chromatin composition and organization often reflects differences in genome function. Histone variants, for example, replace canonical histones to contribute to regulation of numerous nuclear processes including transcription, DNA repair, and chromosome segregation. Here we focus on H2A.Bbd, a rapidly evolving variant found in mammals but not in invertebrates. We report that in human cells, nucleosomes bearing H2A.Bbd form unconventional chromatin structures enriched within actively transcribed genes and characterized by shorter DNA protection and nucleosome spacing. Analysis of transcriptional profiles from cells depleted for H2A.Bbd demonstrated widespread changes in gene expression with a net downregulation of transcription and disruption of normal mRNA splicing patterns. In particular, we observed changes in exon inclusion rates and increased presence of intronic sequences in mRNA products upon H2A.Bbd depletion. Taken together, our results indicate that H2A.Bbd is involved in formation of a specific chromatin structure that facilitates both transcription and initial mRNA processing.
PURPOSE: To facilitate the identification of genes associated with cataract and other ocular defects, the authors developed and validated a computational tool termed iSyTE (integrated Systems Tool for Eye gene discovery; http://bioinformatics.udel.edu/Research/iSyTE). iSyTE uses a mouse embryonic lens gene expression data set as a bioinformatics filter to select candidate genes from human or mouse genomic regions implicated in disease and to prioritize them for further mutational and functional analyses. METHODS: Microarray gene expression profiles were obtained for microdissected embryonic mouse lens at three key developmental time points in the transition from the embryonic day (E)10.5 stage of lens placode invagination to E12.5 lens primary fiber cell differentiation. Differentially regulated genes were identified by in silico comparison of lens gene expression profiles with those of whole embryo body (WB) lacking ocular tissue. RESULTS: Gene set analysis demonstrated that this strategy effectively removes highly expressed but nonspecific housekeeping genes from lens tissue expression profiles, allowing identification of less highly expressed lens disease-associated genes. Among 24 previously mapped human genomic intervals containing genes associated with isolated congenital cataract, the mutant gene is ranked within the top two iSyTE-selected candidates in approximately 88% of cases. Finally, in situ hybridization confirmed lens expression of several novel iSyTE-identified genes. CONCLUSIONS: iSyTE is a publicly available Web resource that can be used to prioritize candidate genes within mapped genomic intervals associated with congenital cataract for further investigation. Extension of this approach to other ocular tissue components will facilitate eye disease gene discovery.
Transposable elements (TEs) are abundant in the human genome, and some are capable of generating new insertions through RNA intermediates. In cancer, the disruption of cellular mechanisms that normally suppress TE activity may facilitate mutagenic retrotranspositions. We performed single-nucleotide resolution analysis of TE insertions in 43 high-coverage whole-genome sequencing data sets from five cancer types. We identified 194 high-confidence somatic TE insertions, as well as thousands of polymorphic TE insertions in matched normal genomes. Somatic insertions were present in epithelial tumors but not in blood or brain cancers. Somatic L1 insertions tend to occur in genes that are commonly mutated in cancer, disrupt the expression of the target genes, and are biased toward regions of cancer-specific DNA hypomethylation, highlighting their potential impact in tumorigenesis.
Glioblastoma contains a hierarchy of stem-like cancer cells, but how this hierarchy is established is unclear. Here, we show that asymmetric Numb localization specifies glioblastoma stem-like cell (GSC) fate in a manner that does not require Notch inhibition. Numb is asymmetrically localized to CD133-hi GSCs. The predominant Numb isoform, Numb4, decreases Notch and promotes a CD133-hi, radial glial-like phenotype. However, upregulation of a novel Numb isoform, Numb4 delta 7 (Numb4d7), increases Notch and AKT activation while nevertheless maintaining CD133-hi fate specification. Numb knockdown increases Notch and promotes growth while favoring a CD133-lo, glial progenitor-like phenotype. We report the novel finding that Numb4 (but not Numb4d7) promotes SCF(Fbw7) ubiquitin ligase assembly and activation to increase Notch degradation. However, both Numb isoforms decrease epidermal growth factor receptor (EGFR) expression, thereby regulating GSC fate. Small molecule inhibition of EGFR activity phenocopies the effect of Numb on CD133 and Pax6. Clinically, homozygous NUMB deletions and low Numb mRNA expression occur primarily in a subgroup of proneural glioblastomas. Higher Numb expression is found in classical and mesenchymal glioblastomas and correlates with decreased survival. Thus, decreased Numb promotes glioblastoma growth, but the remaining Numb establishes a phenotypically diverse stem-like cell hierarchy that increases tumor aggressiveness and therapeutic resistance.
BACKGROUND: Recent withdrawals of major drugs have highlighted the critical importance of drug safety surveillance in the postmarketing phase. Limitations of spontaneous report data have led drug safety professionals to pursue alternative postmarketing surveillance approaches based on healthcare administrative claims data. These data are typically analysed by comparing the adverse event rates associated with a drug of interest to those of a single comparable reference drug. OBJECTIVE: The aim of this study was to determine whether adverse event detection can be improved by incorporating information from multiple reference drugs. We developed a pharmacological network model that implemented this approach and evaluated its performance. METHODS: We describe two approaches for incorporating information from multiple reference drugs. The first, reported previously, combines a set of related drugs into a single reference cohort. The second is a novel pharmacoepidemiological network model, which integrates multiple pair-wise comparisons across an entire set of related drugs into a unified consensus safety score for each drug. We also implemented a single reference drug approach for comparison with both multi-drug approaches. All approaches were applied within a sequential analysis framework, incorporating new information as it became available and addressing the issue of multiple testing over time. We evaluated all these approaches using statin (HMG-CoA reductase inhibitor) safety data from a large healthcare insurer in the US covering April 2000 through March 2005. RESULTS: We found that both multiple reference drug approaches offer earlier detection (6-13 months) than the single reference drug approach, without triggering additional false positives. CONCLUSIONS: Such combined approaches have the potential to be used with existing healthcare databases to improve the surveillance of therapeutics in the postmarketing phase over single-comparator methods. The proposed network approach also provides an integrated visualization framework enabling decision makers to understand the key high-level safety relationships amongst a group of related drugs.