Lee E, Iskow R, Yang L, Gokcumen O, Haseley P, Luquette LJ, Lohr JG, Harris CC, Ding L, Wilson RK, Wheeler DA, Gibbs RA, Kucherlapati R, Lee C, Kharchenko PV**, Park PJ**, Cancer Genome Atlas Research Network TCGA. Landscape of somatic retrotransposition in human cancers. Science 2012;337(6097):967-71.

Transposable elements (TEs) are abundant in the human genome, and some are capable of generating new insertions through RNA intermediates. In cancer, the disruption of cellular mechanisms that normally suppress TE activity may facilitate mutagenic retrotranspositions. We performed single-nucleotide resolution analysis of TE insertions in 43 high-coverage whole-genome sequencing data sets from five cancer types. We identified 194 high-confidence somatic TE insertions, as well as thousands of polymorphic TE insertions in matched normal genomes. Somatic insertions were present in epithelial tumors but not in blood or brain cancers. Somatic L1 insertions tend to occur in genes that are commonly mutated in cancer, disrupt the expression of the target genes, and are biased toward regions of cancer-specific DNA hypomethylation, highlighting their potential impact in tumorigenesis.

Jiang X, Xing H, Kim T-M, Jung Y, Huang W, Yang HW, Song S, Park PJ, Carroll RS, Johnson MD. Numb regulates glioma stem cell fate and growth by altering epidermal growth factor receptor and Skp1-Cullin-F-box ubiquitin ligase activity. Stem Cells 2012;30(7):1313-26.

Glioblastoma contains a hierarchy of stem-like cancer cells, but how this hierarchy is established is unclear. Here, we show that asymmetric Numb localization specifies glioblastoma stem-like cell (GSC) fate in a manner that does not require Notch inhibition. Numb is asymmetrically localized to CD133-hi GSCs. The predominant Numb isoform, Numb4, decreases Notch and promotes a CD133-hi, radial glial-like phenotype. However, upregulation of a novel Numb isoform, Numb4 delta 7 (Numb4d7), increases Notch and AKT activation while nevertheless maintaining CD133-hi fate specification. Numb knockdown increases Notch and promotes growth while favoring a CD133-lo, glial progenitor-like phenotype. We report the novel finding that Numb4 (but not Numb4d7) promotes SCF(Fbw7) ubiquitin ligase assembly and activation to increase Notch degradation. However, both Numb isoforms decrease epidermal growth factor receptor (EGFR) expression, thereby regulating GSC fate. Small molecule inhibition of EGFR activity phenocopies the effect of Numb on CD133 and Pax6. Clinically, homozygous NUMB deletions and low Numb mRNA expression occur primarily in a subgroup of proneural glioblastomas. Higher Numb expression is found in classical and mesenchymal glioblastomas and correlates with decreased survival. Thus, decreased Numb promotes glioblastoma growth, but the remaining Numb establishes a phenotypically diverse stem-like cell hierarchy that increases tumor aggressiveness and therapeutic resistance.

Reis BY, Olson KL, Tian L, Bohn RL, Brownstein JS, Park PJ, Cziraky MJ, Wilson MD, Mandl KD. A pharmacoepidemiological network model for drug safety surveillance: statins and rhabdomyolysis. Drug Saf 2012;35(5):395-406.

BACKGROUND: Recent withdrawals of major drugs have highlighted the critical importance of drug safety surveillance in the postmarketing phase. Limitations of spontaneous report data have led drug safety professionals to pursue alternative postmarketing surveillance approaches based on healthcare administrative claims data. These data are typically analysed by comparing the adverse event rates associated with a drug of interest to those of a single comparable reference drug. OBJECTIVE: The aim of this study was to determine whether adverse event detection can be improved by incorporating information from multiple reference drugs. We developed a pharmacological network model that implemented this approach and evaluated its performance. METHODS: We studied whether adverse event detection can be improved by incorporating information from multiple reference drugs, and describe two approaches for doing so. The first, reported previously, combines a set of related drugs into a single reference cohort. The second is a novel pharmacoepidemiological network model, which integrates multiple pair-wise comparisons across an entire set of related drugs into a unified consensus safety score for each drug. We also implemented a single reference drug approach for comparison with both multi-drug approaches. All approaches were applied within a sequential analysis framework, incorporating new information as it became available and addressing the issue of multiple testing over time. We evaluated all these approaches using statin (HMG-CoA reductase inhibitors) safety data from a large healthcare insurer in the US covering April 2000 through March 2005. RESULTS: We found that both multiple reference drug approaches offer earlier detection (6-13 months) than the single reference drug approach, without triggering additional false positives. 
CONCLUSIONS: Such combined approaches have the potential to be used with existing healthcare databases to improve the surveillance of therapeutics in the postmarketing phase over single-comparator methods. The proposed network approach also provides an integrated visualization framework enabling decision makers to understand the key high-level safety relationships amongst a group of related drugs.

Alekseyenko AA*, Ho JWK*, Peng S*, Gelbart M, Tolstorukov MY, Plachetka A, Kharchenko PV, Jung YL, Gorchakov AA, Larschan E, Gu T, Minoda A, Riddle NC, Schwartz YB, Elgin SCR, Karpen GH, Pirrotta V, Kuroda MI**, Park PJ**. Sequence-specific targeting of dosage compensation in Drosophila favors an active chromatin context. PLoS Genet 2012;8(4):e1002646.

The Drosophila MSL complex mediates dosage compensation by increasing transcription of the single X chromosome in males approximately two-fold. This is accomplished through recognition of the X chromosome and subsequent acetylation of histone H4K16 on X-linked genes. Initial binding to the X is thought to occur at "entry sites" that contain a consensus sequence motif ("MSL recognition element" or MRE). However, this motif is only ∼2-fold enriched on X, and only a fraction of the motifs on X are initially targeted. Here we ask whether chromatin context could distinguish between utilized and non-utilized copies of the motif, by comparing their relative enrichment for histone modifications and chromosomal proteins mapped in the modENCODE project. Through a comparative analysis of the chromatin features in male S2 cells (which contain MSL complex) and female Kc cells (which lack the complex), we find that the presence of active chromatin modifications, together with an elevated local GC content in the surrounding sequences, has strong predictive value for functional MSL entry sites, independent of MSL binding. We tested these sites for function in Kc cells by RNAi knockdown of Sxl, resulting in induction of MSL complex. We show that ectopic MSL expression in Kc cells leads to H4K16 acetylation around these sites and a relative increase in X chromosome transcription. Collectively, our results support a model in which a pre-existing active chromatin environment, coincident with H3K36me3, contributes to MSL entry site selection. The consequences of MSL targeting of the male X chromosome include an increase in nucleosome lability, enrichment for H4K16 acetylation and JIL-1 kinase, and depletion of linker histone H1 on active X-linked genes. Our analysis can serve as a model for identifying chromatin and local sequence features that may contribute to selection of functional protein binding sites in the genome.

Evrony GD*, Cai X*, Lee E, Hills BL, Elhosary PC, Lehmann HS, Parker JJ, Atabay KD, Gilmore EC, Poduri A, Park PJ, Walsh CA. Single-neuron sequencing analysis of L1 retrotransposition and somatic mutation in the human brain. Cell 2012;151(3):483-96.

A major unanswered question in neuroscience is whether there exists genomic variability between individual neurons of the brain, contributing to functional diversity or to an unexplained burden of neurological disease. To address this question, we developed a method to amplify genomes of single neurons from human brains. Because recent reports suggest frequent LINE-1 (L1) retrotransposition in human brains, we performed genome-wide L1 insertion profiling of 300 single neurons from cerebral cortex and caudate nucleus of three normal individuals, recovering >80% of germline insertions from single neurons. While we find somatic L1 insertions, we estimate <0.6 unique somatic insertions per neuron, and most neurons lack detectable somatic insertions, suggesting that L1 is not a major generator of neuronal diversity in cortex and caudate. We then genotyped single cortical cells to characterize the mosaicism of a somatic AKT3 mutation identified in a child with hemimegalencephaly. Single-neuron sequencing allows systematic assessment of genomic diversity in the human brain.

Xi R, Lee S, Park PJ. A survey of copy-number variation detection tools based on high-throughput sequencing data. Curr Protoc Hum Genet 2012;Chapter 7:Unit7.19.

Copy-number variation (CNV) is a major class of genomic variation with potentially important functional consequences in both normal and diseased populations. Remarkable advances in development of next-generation sequencing (NGS) platforms provide an unprecedented opportunity for accurate, high-resolution characterization of CNVs. In this unit, we give an overview of available computational tools for detection of CNVs and discuss comparative advantages and disadvantages of different approaches.

Tan X, Hu L, Luquette LJ, Gao G, Liu Y, Qu H, Xi R, Lu ZJ, Park PJ, Elledge SJ. Systematic identification of synergistic drug pairs targeting HIV. Nat Biotechnol 2012;30(11):1125-30.

The systematic identification of effective drug combinations has been hindered by the unavailability of methods that can explore the large combinatorial search space of drug interactions. Here we present multiplex screening for interacting compounds (MuSIC), which expedites the comprehensive assessment of pairwise compound interactions. We examined ∼500,000 drug pairs from 1,000 US Food and Drug Administration (FDA)-approved or clinically tested drugs and identified drugs that synergize to inhibit HIV replication. Our analysis reveals an enrichment of anti-inflammatory drugs in drug combinations that synergize against HIV. As inflammation accompanies HIV infection, these findings indicate that inhibiting inflammation could curb HIV propagation. Multiple drug pairs identified in this study, including various glucocorticoids and nitazoxanide (NTZ), synergize by targeting different steps in the HIV life cycle. MuSIC can be applied to a wide variety of disease-relevant screens to facilitate efficient identification of compound combinations.
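The ∼500,000 figure is the number of unordered pairs that can be drawn from 1,000 drugs, assuming each pair is counted once and no drug is paired with itself:

$$\binom{1000}{2} = \frac{1000 \times 999}{2} = 499{,}500 \approx 500{,}000$$

This quadratic growth in the number of pairs is the combinatorial search space that the abstract describes as the obstacle to exhaustive combination screening.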

O'Connell DJ*, Ho JWK*, Mammoto T, Turbe-Doan A, O'Connell JT, Haseley PS, Koo S, Kamiya N, Ingber DE, Park PJ, Maas RL. A Wnt-bmp feedback circuit controls intertissue signaling dynamics in tooth organogenesis. Science Signaling 2012;5(206):ra4.

Many vertebrate organs form through the sequential and reciprocal exchange of signaling molecules between juxtaposed epithelial and mesenchymal tissues. We undertook a systems biology approach that combined the generation and analysis of large-scale spatiotemporal gene expression data with mouse genetic experiments to gain insight into the mechanisms that control epithelial-mesenchymal signaling interactions in the developing mouse molar tooth. We showed that the shift in instructive signaling potential from dental epithelium to dental mesenchyme was accompanied by temporally coordinated genome-wide changes in gene expression in both compartments. To identify the mechanism responsible, we developed a probabilistic technique that integrates regulatory evidence from gene expression data and from the literature to reconstruct a gene regulatory network for the epithelial and mesenchymal compartments in early tooth development. By integrating these epithelial and mesenchymal gene regulatory networks through the action of diffusible extracellular signaling molecules, we identified a key epithelial-mesenchymal intertissue Wnt-Bmp (bone morphogenetic protein) feedback circuit. We then validated this circuit in vivo with compound genetic mutations in mice that disrupted this circuit. Moreover, mathematical modeling demonstrated that the structure of the circuit accounted for the observed reciprocal signaling dynamics. Thus, we have identified a critical signaling circuit that controls the coordinated genome-wide expression changes and reciprocal signaling molecule dynamics that occur in interacting epithelial and mesenchymal compartments during organogenesis.

Cancer Genome Atlas Research Network. Integrated genomic analyses of ovarian carcinoma. Nature 2011;474(7353):609-15.

A catalogue of molecular aberrations that cause ovarian cancer is critical for developing and deploying therapies that will improve patients' lives. The Cancer Genome Atlas project has analysed messenger RNA expression, microRNA expression, promoter methylation and DNA copy number in 489 high-grade serous ovarian adenocarcinomas and the DNA sequences of exons from coding genes in 316 of these tumours. Here we report that high-grade serous ovarian cancer is characterized by TP53 mutations in almost all tumours (96%); low prevalence but statistically recurrent somatic mutations in nine further genes including NF1, BRCA1, BRCA2, RB1 and CDK12; 113 significant focal DNA copy number aberrations; and promoter methylation events involving 168 genes. Analyses delineated four ovarian cancer transcriptional subtypes, three microRNA subtypes, four promoter methylation subtypes and a transcriptional signature associated with survival duration, and shed new light on the impact that tumours with BRCA1/2 (BRCA1 or BRCA2) and CCNE1 aberrations have on survival. Pathway analyses suggested that homologous recombination is defective in about half of the tumours analysed, and that NOTCH and FOXM1 signalling are involved in serous ovarian cancer pathophysiology.

Egelhofer TA*, Minoda A*, Klugman S*, Lee K, Kolasinska-Zwierz P, Alekseyenko AA, Cheung M-S, Day DS, Gadel S, Gorchakov AA, Gu T, Kharchenko PV, Kuan S, Latorre I, Linder-Basso D, Luu Y, Ngo Q, Perry M, Rechtsteiner A, Riddle NC, Schwartz YB, Shanower GA, Vielle A, Ahringer J, Elgin SCR, Kuroda MI, Pirrotta V, Ren B, Strome S, Park PJ**, Karpen GH**, Hawkins RD**, Lieb JD**. An assessment of histone-modification antibody quality. Nat Struct Mol Biol 2011;18(1):91-3.

We have tested the specificity and utility of more than 200 antibodies raised against 57 different histone modifications in Drosophila melanogaster, Caenorhabditis elegans and human cells. Although most antibodies performed well, more than 25% failed specificity tests by dot blot or western blot. Among specific antibodies, more than 20% failed in chromatin immunoprecipitation experiments. We advise rigorous testing of histone-modification antibodies before use, and we provide a website for posting new test results (

Kharchenko PV, Xi R, Park PJ. Evidence for dosage compensation between the X chromosome and autosomes in mammals. Nat Genet 2011;43(12):1167-9; author reply 1171-2.

Zacharek SJ, Fillmore CM, Lau AN, Gludish DW, Chou A, Ho JWK, Zamponi R, Gazit R, Bock C, Jäger N, Smith ZD, Kim T-M, Saunders AH, Wong J, Lee J-H, Roach RR, Rossi DJ, Meissner A, Gimelbrant AA, Park PJ, Kim CF. Lung stem cell self-renewal relies on BMI1-dependent control of expression at imprinted loci. Cell Stem Cell 2011;9(3):272-81.

BMI1 is required for the self-renewal of stem cells in many tissues, including the lung epithelial stem cells known as bronchioalveolar stem cells (BASCs). Imprinted genes, which are expressed from only the maternally or paternally inherited allele, are known to regulate developmental processes, but their role in adult cells remains a fundamental question. Many imprinted genes were derepressed in Bmi1 knockout mice, and knockdown of Cdkn1c (p57) and other imprinted genes partially rescued the self-renewal defect of Bmi1 mutant lung cells. Expression of p57 and other imprinted genes was required for lung cell self-renewal in culture and correlated with repair of lung epithelial cell injury in vivo. Our data suggest that BMI1-dependent regulation of expressed alleles at imprinted loci, distinct from imprinting per se, is required for control of lung stem cells. We anticipate that the regulation and function of imprinted genes are crucial for self-renewal in diverse adult tissue-specific stem cells.

Riddle NC*, Minoda A*, Kharchenko PV*, Alekseyenko AA, Schwartz YB, Tolstorukov MY, Gorchakov AA, Jaffe JD, Kennedy C, Linder-Basso D, Peach SE, Shanower G, Zheng H, Kuroda MI, Pirrotta V, Park PJ, Elgin SCR**, Karpen GH**. Plasticity in patterns of histone modifications and chromosomal proteins in Drosophila heterochromatin. Genome Res 2011;21(2):147-63.

Eukaryotic genomes are packaged in two basic forms, euchromatin and heterochromatin. We have examined the composition and organization of Drosophila melanogaster heterochromatin in different cell types using ChIP-array analysis of histone modifications and chromosomal proteins. As anticipated, the pericentric heterochromatin and chromosome 4 are on average enriched for the "silencing" marks H3K9me2, H3K9me3, HP1a, and SU(VAR)3-9, and are generally depleted for marks associated with active transcription. The locations of the euchromatin-heterochromatin borders identified by these marks are similar in animal tissues and most cell lines, although the amount of heterochromatin is variable in some cell lines. Combinatorial analysis of chromatin patterns reveals distinct profiles for euchromatin, pericentric heterochromatin, and the 4th chromosome. Both silent and active protein-coding genes in heterochromatin display complex patterns of chromosomal proteins and histone modifications; a majority of the active genes exhibit both "activation" marks (e.g., H3K4me3 and H3K36me3) and "silencing" marks (e.g., H3K9me2 and HP1a). The hallmark of active genes in heterochromatic domains appears to be a loss of H3K9 methylation at the transcription start site. We also observe complex epigenomic profiles of intergenic regions, repeated transposable element (TE) sequences, and genes in the heterochromatic extensions. An unexpectedly large fraction of sequences in the euchromatic chromosome arms exhibits a heterochromatic chromatin signature, which differs in size, position, and impact on gene expression among cell types. We conclude that patterns of heterochromatin/euchromatin packaging show greater complexity and plasticity than anticipated. This comprehensive analysis provides a foundation for future studies of gene activity and chromosomal functions that are influenced by or dependent upon heterochromatin.

Larschan E*, Bishop EP*, Kharchenko PV, Core LJ, Lis JT, Park PJ**, Kuroda MI**. X chromosome dosage compensation via enhanced transcriptional elongation in Drosophila. Nature 2011;471(7336):115-8.

The evolution of sex chromosomes has resulted in numerous species in which females inherit two X chromosomes but males have a single X, thus requiring dosage compensation. MSL (Male-specific lethal) complex increases transcription on the single X chromosome of Drosophila males to equalize expression of X-linked genes between the sexes. The biochemical mechanisms used for dosage compensation must function over a wide dynamic range of transcription levels and differential expression patterns. It has been proposed that the MSL complex regulates transcriptional elongation to control dosage compensation, a model subsequently supported by mapping of the MSL complex and MSL-dependent histone 4 lysine 16 acetylation to the bodies of X-linked genes in males, with a bias towards 3' ends. However, experimental analysis of MSL function at the mechanistic level has been challenging owing to the small magnitude of the chromosome-wide effect and the lack of an in vitro system for biochemical analysis. Here we use global run-on sequencing (GRO-seq) to examine the specific effect of the MSL complex on RNA Polymerase II (RNAP II) on a genome-wide level. Results indicate that the MSL complex enhances transcription by facilitating the progression of RNAP II across the bodies of active X-linked genes. Improving transcriptional output downstream of typical gene-specific controls may explain how dosage compensation can be imposed on the diverse set of genes along an entire chromosome.

Kim T-M, Park PJ. Advances in analysis of transcriptional regulatory networks. Wiley Interdiscip Rev Syst Biol Med 2011;3(1):21-35.

A transcriptional regulatory network represents a molecular framework in which developmental or environmental cues are transformed into differential expression of genes. Transcriptional regulation is mediated by the combinatorial interplay between cis-regulatory DNA elements and trans-acting transcription factors, and is perhaps the most important mechanism for controlling gene expression. Recent innovations, most notably the method for detecting protein-DNA interactions genome-wide, can help provide a comprehensive catalog of cis-regulatory elements and their interaction with given trans-acting factors in a given condition. A transcriptional regulatory network that integrates such information can lead to a systems-level understanding of regulatory mechanisms. In this review, we will highlight the key aspects of current knowledge on eukaryotic transcriptional regulation, especially on known transcription factors and their interacting regulatory elements. Then we will review some recent technical advances for genome-wide mapping of DNA-protein interactions based on high-throughput sequencing. Finally, we will discuss the types of biological insights that can be obtained from a network-level understanding of transcription regulation as well as future challenges in the field.

Anchan RM, Quaas P, Gerami-Naini B, Bartake H, Griffin A, Zhou Y, Day DS, Eaton JL, George LL, Naber C, Turbe-Doan A, Park PJ, Hornstein MD, Maas RL. Amniocytes can serve a dual function as a source of iPS cells and feeder layers. Hum Mol Genet 2011;20(5):962-74.

Clinical barriers to stem-cell therapy include the need for efficient derivation of histocompatible stem cells and the zoonotic risk inherent to human stem-cell xenoculture on mouse feeder cells. We describe a system for efficiently deriving induced pluripotent stem (iPS) cells from human and mouse amniocytes, and for maintaining the pluripotency of these iPS cells on mitotically inactivated feeder layers prepared from the same amniocytes. Both cellular components of this system are thus autologous to a single donor. Moreover, the use of human feeder cells reduces the risk of zoonosis. Generation of iPS cells using retroviral vectors from short- or long-term cultured human and mouse amniocytes using four factors, or two factors in mouse, occurs in 5-7 days with 0.5% efficiency. This efficiency is greater than that reported for mouse and human fibroblasts using similar viral infection approaches, and does not appear to result from selective reprogramming of Oct4(+) or c-Kit(+) amniocyte subpopulations. Derivation of amniocyte-derived iPS (AdiPS) cell colonies, which express pluripotency markers and exhibit appropriate microarray expression and DNA methylation properties, was facilitated by live immunostaining. AdiPS cells also generate embryoid bodies in vitro and teratomas in vivo. Furthermore, mouse and human amniocytes can serve as feeder layers for iPS cells and for mouse and human embryonic stem (ES) cells. Thus, human amniocytes provide an efficient source of autologous iPS cells and, as feeder cells, can also maintain iPS and ES cell pluripotency without the safety concerns associated with xenoculture.

Ho JWK, Bishop EP, Kharchenko PV, Nègre N, White KP, Park PJ. ChIP-chip versus ChIP-seq: lessons for experimental design and data analysis. BMC Genomics 2011;12:134.

BACKGROUND: Chromatin immunoprecipitation (ChIP) followed by microarray hybridization (ChIP-chip) or high-throughput sequencing (ChIP-seq) allows genome-wide discovery of protein-DNA interactions such as transcription factor binding and histone modifications. Previous reports only compared a small number of profiles, and little has been done to compare histone modification profiles generated by the two technologies or to assess the impact of input DNA libraries in ChIP-seq analysis. Here, we performed a systematic analysis of a modENCODE dataset consisting of 31 pairs of ChIP-chip/ChIP-seq profiles of the coactivator CBP, RNA polymerase II (RNA PolII), and six histone modifications across four developmental stages of Drosophila melanogaster. RESULTS: Both technologies produce highly reproducible profiles within each platform; ChIP-seq generally produces profiles with a better signal-to-noise ratio and allows detection of more peaks and narrower peaks. The set of peaks identified by the two technologies can be significantly different, but the extent to which they differ varies depending on the factor and the analysis algorithm. Importantly, we found that there is significant variation among multiple sequencing profiles of input DNA libraries and that this variation most likely arises from differences in both experimental conditions and sequencing depth. We further show that using an inappropriate input DNA profile can impact the average signal profiles around genomic features and peak calling results, highlighting the importance of having high-quality input DNA data for normalization in ChIP-seq analysis. CONCLUSIONS: Our findings highlight the biases present in each of the platforms, show the variability that can arise from both technology and analysis methods, and emphasize the importance of obtaining high-quality, deeply sequenced input DNA libraries for ChIP-seq analysis.
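To illustrate why the choice of input library matters, here is a minimal sketch of one common way to score ChIP-seq enrichment: a per-bin log2 ratio of ChIP reads to input reads after rescaling the input to the same total depth. This is an assumed, generic normalization, not necessarily the procedure used in the paper; the function name `log2_enrichment`, the fixed-bin layout, and the pseudocount are illustrative choices.

```python
import math
from typing import List


def log2_enrichment(chip: List[int], inp: List[int], pseudocount: float = 1.0) -> List[float]:
    """Per-bin log2(ChIP / input) after scaling input to the ChIP library depth.

    chip, inp: read counts in the same genomic bins (hypothetical pre-binned data).
    pseudocount: added to both sides so empty bins do not produce log(0).
    """
    if len(chip) != len(inp):
        raise ValueError("ChIP and input must cover the same bins")
    chip_total, input_total = sum(chip), sum(inp)
    if chip_total == 0 or input_total == 0:
        raise ValueError("both libraries need at least one read")
    # Rescale input counts so both libraries have the same effective depth.
    scale = chip_total / input_total
    return [
        math.log2((c + pseudocount) / (i * scale + pseudocount))
        for c, i in zip(chip, inp)
    ]


chip = [10, 50, 10, 10]
print(log2_enrichment(chip, [20, 20, 20, 20]))  # flat input
print(log2_enrichment(chip, [10, 40, 15, 15]))  # input with a local bias in bin 1
```

With the same ChIP counts, the second input profile carries extra background in bin 1 and so lowers that bin's score, a small-scale version of how an inappropriate input profile can shift signal averages and peak calls.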

Kharchenko PV, Alekseyenko AA, Schwartz YB, Minoda A, Riddle NC, Ernst J, Sabo PJ, Larschan E, Gorchakov AA, Gu T, Linder-Basso D, Plachetka A, Shanower G, Tolstorukov MY, Luquette LJ, Xi R, Jung YL, Park RW, Bishop EP, Canfield TK, Sandstrom R, Thurman RE, MacAlpine DM, Stamatoyannopoulos JA, Kellis M, Elgin SCR, Kuroda MI, Pirrotta V, Karpen GH**, Park PJ**. Comprehensive analysis of the chromatin landscape in Drosophila melanogaster. Nature 2011;471(7339):480-5.

Chromatin is composed of DNA and a variety of modified histones and non-histone proteins, which have an impact on cell differentiation, gene regulation and other key cellular processes. Here we present a genome-wide chromatin landscape for Drosophila melanogaster based on eighteen histone modifications, summarized by nine prevalent combinatorial patterns. Integrative analysis with other data (non-histone chromatin proteins, DNase I hypersensitivity, GRO-Seq reads produced by engaged polymerase, short/long RNA products) reveals discrete characteristics of chromosomes, genes, regulatory elements and other functional domains. We find that active genes display distinct chromatin signatures that are correlated with disparate gene lengths, exon patterns, regulatory functions and genomic contexts. We also demonstrate a diversity of signatures among Polycomb targets that include a subset with paused polymerase. This systematic profiling and integrative analysis of chromatin signatures provides insights into how genomic elements are regulated, and will serve as a resource for future experimental investigations of genome structure and function.

Xi R, Hadjipanayis AG, Luquette LJ, Kim T-M, Lee E, Zhang J, Johnson MD, Muzny DM, Wheeler DA, Gibbs RA, Kucherlapati R, Park PJ. Copy number variation detection in whole-genome sequencing data using the Bayesian information criterion. Proc Natl Acad Sci U S A 2011;108(46):E1128-36.

DNA copy number variations (CNVs) play an important role in the pathogenesis and progression of cancer and confer susceptibility to a variety of human disorders. Array comparative genomic hybridization has been used widely to identify CNVs genome wide, but next-generation sequencing technology provides an opportunity to characterize CNVs genome wide with unprecedented resolution. In this study, we developed an algorithm to detect CNVs from whole-genome sequencing data and applied it to a newly sequenced glioblastoma genome with a matched control. This read-depth algorithm, called BIC-seq, can accurately and efficiently identify CNVs via minimizing the Bayesian information criterion. Using BIC-seq, we identified hundreds of CNVs as small as 40 bp in the cancer genome sequenced at 10× coverage, whereas we could only detect large CNVs (> 15 kb) in the array comparative genomic hybridization profiles for the same genome. Eighty percent (12/15) of the small variants tested (110 bp to 14 kb) were experimentally validated by quantitative PCR, demonstrating high sensitivity and true positive rate of the algorithm. We also extended the algorithm to detect recurrent CNVs in multiple samples as well as to derive error bars for breakpoints using a Gibbs sampling approach. We propose this statistical approach as a principled yet practical and efficient method to estimate CNVs in whole-genome sequencing data.
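The idea of calling CNVs by minimizing the Bayesian information criterion can be illustrated with a simplified sketch; it is not the published BIC-seq implementation. It assumes tumor and matched-normal reads have already been counted in fixed bins, models each segment by a binomial tumor-read fraction, and greedily merges adjacent segments while a BIC-style score (negative log-likelihood plus `lam` times the number of segments times the log of the total read count) keeps decreasing. The names `segment_bins`, `bic_score` and `lam` are hypothetical.

```python
import math
from typing import List, Tuple

# (start_bin, end_bin_exclusive, tumor_reads, normal_reads)
Segment = Tuple[int, int, int, int]


def _seg_loglik(t: int, n: int) -> float:
    """Binomial log-likelihood (constant dropped) at the MLE tumor fraction t / (t + n)."""
    total = t + n
    ll = 0.0
    if t > 0:
        ll += t * math.log(t / total)
    if n > 0:
        ll += n * math.log(n / total)
    return ll


def bic_score(segments: List[Segment], total_reads: int, lam: float) -> float:
    """Negative log-likelihood plus lam * (number of segments) * log(total reads)."""
    loglik = sum(_seg_loglik(t, n) for _, _, t, n in segments)
    return -loglik + lam * len(segments) * math.log(total_reads)


def segment_bins(tumor: List[int], normal: List[int], lam: float = 1.0) -> List[Segment]:
    """Greedily merge adjacent bins while the BIC-style score decreases."""
    if len(tumor) != len(normal) or not tumor:
        raise ValueError("need matching, non-empty tumor and normal bin counts")
    total = sum(tumor) + sum(normal)
    if total == 0:
        raise ValueError("no reads to segment")
    segs: List[Segment] = [(i, i + 1, t, n) for i, (t, n) in enumerate(zip(tumor, normal))]
    current = bic_score(segs, total, lam)
    while len(segs) > 1:
        best_score, best_segs = math.inf, segs
        # Try every adjacent merge and keep the one with the lowest score.
        for j in range(len(segs) - 1):
            a, b = segs[j], segs[j + 1]
            merged: Segment = (a[0], b[1], a[2] + b[2], a[3] + b[3])
            candidate = segs[:j] + [merged] + segs[j + 2:]
            score = bic_score(candidate, total, lam)
            if score < best_score:
                best_score, best_segs = score, candidate
        if best_score >= current:
            break  # no merge lowers the score, so stop
        current, segs = best_score, best_segs
    return segs


# Toy data: the tumor read depth doubles from bin 4 onward (a simulated gain).
tumor = [50, 50, 50, 50, 100, 100, 100, 100]
normal = [50] * 8
print(segment_bins(tumor, normal))
```

Merging two bins with the same tumor fraction leaves the likelihood unchanged and saves one penalty term, while merging across a ratio change costs likelihood; `lam` trades these off. Recomputing the full score for every candidate is fine for a toy example but would need incremental updates at genome scale.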

Kim T-M, Huang W, Park R, Park PJ**, Johnson MD**. A developmental taxonomy of glioblastoma defined and maintained by MicroRNAs. Cancer Res 2011;71(9):3387-99.

mRNA expression profiling has suggested the existence of multiple glioblastoma subclasses, but their number and characteristics vary among studies and the etiology underlying their development is unclear. In this study, we analyzed 261 microRNA expression profiles from The Cancer Genome Atlas (TCGA), identifying five clinically and genetically distinct subclasses of glioblastoma that each related to a different neural precursor cell type. These microRNA-based glioblastoma subclasses displayed microRNA and mRNA expression signatures resembling those of radial glia, oligoneuronal precursors, neuronal precursors, neuroepithelial/neural crest precursors, or astrocyte precursors. Each subclass was determined to be clinically distinct, based on the significant differences they displayed in terms of patient race, age, treatment response, and survival. We also identified several microRNAs as potent regulators of subclass-specific gene expression networks in glioblastoma. Foremost among these is miR-9, which suppresses mesenchymal differentiation in glioblastoma by downregulating expression of JAK kinases and inhibiting activation of STAT3. Our findings suggest that microRNAs are important determinants of glioblastoma subclasses through their ability to regulate developmental growth and differentiation programs in several transformed neural precursor cell types. Taken together, our results define developmental microRNA expression signatures that both characterize and contribute to the phenotypic diversity of glioblastoma subclasses, thereby providing an expanded framework for understanding the pathogenesis of glioblastoma in a human neurodevelopmental context.