Publications

2008
Tolstorukov MY**, Choudhary V, Olson WK, Zhurkin VB, Park PJ**. nuScore: a web-interface for nucleosome positioning predictions. Bioinformatics 2008;24(12):1456-8.

SUMMARY: Sequence-directed mapping of nucleosome positions is of major biological interest. Here, we present a web interface for estimation of the affinity of the histone core to DNA and prediction of nucleosome arrangement on a given sequence. Our approach is based on assessment of the energy cost of imposing the deformations required to wrap DNA around the histone surface. The interface allows the user to specify a number of options such as selecting from several structural templates for threading calculations and adding random sequences to the analysis. AVAILABILITY: The nuScore interface is freely available for use at http://compbio.med.harvard.edu/nuScore. CONTACT: peter_park@harvard.edu; tolstorukov@gmail.com SUPPLEMENTARY INFORMATION: The site contains a user manual, a description of the methodology and examples.

Alekseyenko AA, Peng S, Larschan E, Gorchakov AA, Lee O-K, Kharchenko P, McGrath SD, Wang CI, Mardis ER, Park PJ, Kuroda MI. A sequence motif within chromatin entry sites directs MSL establishment on the Drosophila X chromosome. Cell 2008;134(4):599-609.

The Drosophila MSL complex associates with active genes specifically on the male X chromosome to acetylate histone H4 at lysine 16 and increase expression approximately 2-fold. To date, no DNA sequence has been discovered to explain the specificity of MSL binding. We hypothesized that sequence-specific targeting occurs at "chromatin entry sites," but the majority of sites are sequence independent. Here we characterize 150 potential entry sites by ChIP-chip and ChIP-seq and discover a GA-rich MSL recognition element (MRE). The motif is only slightly enriched on the X chromosome (approximately 2-fold), but this is doubled when considering its preferential location within or 3' to active genes (>4-fold enrichment). When inserted on an autosome, a newly identified site can direct local MSL spreading to flanking active genes. These results provide strong evidence for both sequence-dependent and -independent steps in MSL targeting of dosage compensation to the male X chromosome.

Dermody JL, Dreyfuss JM, Villén J, Ogundipe B, Gygi SP, Park PJ, Ponticelli AS, Moore CL, Buratowski S, Bucheli ME. Unphosphorylated SR-like protein Npl3 stimulates RNA polymerase II elongation. PLoS One 2008;3(9):e3273.

The production of a functional mRNA is regulated at every step of transcription. An area that is not well understood is the transition of RNA polymerase II from elongation to termination. The S. cerevisiae SR-like protein Npl3 functions to negatively regulate transcription termination by antagonizing the binding of polyA/termination proteins to the mRNA. In this study, Npl3 is shown to interact with the CTD and to have a direct stimulatory effect on the elongation activity of the polymerase. The interaction is inhibited by phosphorylation of Npl3. In addition, Casein Kinase 2 was found to be required for the phosphorylation of Npl3 and to affect its ability to compete against Rna15 (Cleavage Factor I) for binding to polyA signals. Our results suggest that phosphorylation of Npl3 promotes its dissociation from the mRNA/RNAP II and contributes to the association of the polyA/termination factor Rna15. This work defines a novel role for Npl3 in elongation and its regulation by phosphorylation.

2007
Larschan E, Alekseyenko AA, Gortchakov AA, Peng S, Li B, Yang P, Workman JL, Park PJ, Kuroda MI. MSL complex is attracted to genes marked by H3K36 trimethylation using a sequence-independent mechanism. Mol Cell 2007;28(1):121-33.

In Drosophila, X chromosome dosage compensation requires the male-specific lethal (MSL) complex, which associates with actively transcribed genes on the single male X chromosome to upregulate transcription approximately 2-fold. We found that on the male X chromosome, or when MSL complex is ectopically localized to an autosome, histone H3K36 trimethylation (H3K36me3) is a strong predictor of MSL binding. We isolated mutants lacking Set2, the H3K36me3 methyltransferase, and found that Set2 is an essential gene in both sexes of Drosophila. In set2 mutant males, MSL complex maintains X specificity but exhibits reduced binding to target genes. Furthermore, recombinant MSL3 protein preferentially binds nucleosomes marked by H3K36me3 in vitro. Our results support a model in which MSL complex uses high-affinity sites to initially recognize the X chromosome and then associates with many of its targets through sequence-independent features of transcribed genes.

Liu M, Liberzon A, Kong SW, Lai WR, Park PJ, Kohane IS, Kasif S. Network-based analysis of affected biological processes in type 2 diabetes models. PLoS Genet 2007;3(6):e96.

Type 2 diabetes mellitus is a complex disorder associated with multiple genetic, epigenetic, developmental, and environmental factors. Animal models of type 2 diabetes differ based on diet, drug treatment, and gene knockouts, and yet all display the clinical hallmarks of hyperglycemia and insulin resistance in peripheral tissue. The recent advances in gene-expression microarray technologies present an unprecedented opportunity to study type 2 diabetes mellitus at a genome-wide scale and across different models. To date, a key challenge has been to identify the biological processes or signaling pathways that play significant roles in the disorder. Here, using a network-based analysis methodology, we identified two sets of genes, associated with insulin signaling and a network of nuclear receptors, which are recurrent in a statistically significant number of diabetes and insulin resistance models and transcriptionally altered across diverse tissue types. We additionally identified a network of protein-protein interactions between members from the two gene sets that may facilitate signaling between them. Taken together, the results illustrate the benefits of integrating high-throughput microarray studies, together with protein-protein interaction networks, in elucidating the underlying biological processes associated with a complex disorder.

Peng S, Alekseyenko AA, Larschan E, Kuroda MI, Park PJ. Normalization and experimental design for ChIP-chip data. BMC Bioinformatics 2007;8:219.

BACKGROUND: Chromatin immunoprecipitation on tiling arrays (ChIP-chip) has been widely used to investigate the DNA binding sites for a variety of proteins on a genome-wide scale. However, several issues in the processing and analysis of ChIP-chip data have not been resolved fully, including the effect of background (mock control) subtraction and normalization within and across arrays. RESULTS: The binding profiles of Drosophila male-specific lethal (MSL) complex on a tiling array provide a unique opportunity for investigating these topics, as it is known to bind on the X chromosome but not on the autosomes. These large bound and control regions on the same array allow clear evaluation of analytical methods. We introduce a novel normalization scheme specifically designed for ChIP-chip data from dual-channel arrays and demonstrate that this step is critical for correcting systematic dye-bias that may exist in the data. Subtraction of the mock (non-specific antibody or no antibody) control data is generally needed to eliminate the bias, but appropriate normalization obviates the need for mock experiments and increases the correlation among replicates. The idea underlying the normalization can be used subsequently to estimate the background noise level in each array for normalization across arrays. We demonstrate the effectiveness of the methods with the MSL complex binding data and other publicly available data. CONCLUSION: Proper normalization is essential for ChIP-chip experiments. The proposed normalization technique can correct systematic errors and compensate for the lack of mock control data, thus reducing the experimental cost and producing more accurate results.

2006
Kuo WP, Liu F, Trimarchi J, Punzo C, Lombardi M, Sarang J, Whipple ME, Maysuria M, Serikawa K, Lee SY, McCrann D, Kang J, Shearstone JR, Burke J, Park DJ, Wang X, Rector TL, Ricciardi-Castagnoli P, Perrin S, Choi S, Bumgarner R, Kim JH, Short GF, Freeman MW, Seed B, Jensen R, Church GM, Hovig E, Cepko CL, Park P, Ohno-Machado L, Jenssen T-K. A sequence-oriented comparison of gene expression measurements across different hybridization-based technologies. Nat Biotechnol 2006;24(7):832-40.

Over the last decade, gene expression microarrays have had a profound impact on biomedical research. The diversity of platforms and analytical methods available to researchers has made the comparison of data from multiple platforms challenging. In this study, we describe a framework for comparisons across platforms and laboratories. We have attempted to include nearly all the available commercial and 'in-house' platforms. Using probe sequences matched at the exon level improved consistency of measurements across the different microarray platforms compared to annotation-based matches. Generally, consistency was good for highly expressed genes and variable for genes with lower expression values, as confirmed by quantitative real-time PCR (QRT-PCR). Concordance of measurements was higher between laboratories on the same platform than across platforms. We demonstrate that, after stringent preprocessing, commercial arrays were more consistent than in-house arrays, and by most measures, one-dye platforms were more consistent than two-dye platforms.

Alekseyenko AA, Larschan E, Lai WR, Park PJ**, Kuroda MI**. High-resolution ChIP-chip analysis reveals that the Drosophila MSL complex selectively identifies active genes on the male X chromosome. Genes Dev 2006;20(7):848-57.

X-chromosome dosage compensation in Drosophila requires the male-specific lethal (MSL) complex, which up-regulates gene expression from the single male X chromosome. Here, we define X-chromosome-specific MSL binding at high resolution in two male cell lines and in late-stage embryos. We find that the MSL complex is highly enriched over most expressed genes, with binding biased toward the 3' end of transcription units. The binding patterns are largely similar in the distinct cell types, with approximately 600 genes clearly bound in all three cases. Genes identified as clearly bound in one cell type and not in another indicate that attraction of MSL complex correlates with expression state. Thus, sequence alone is not sufficient to explain MSL targeting. We propose that the MSL complex recognizes most X-linked genes, but only in the context of chromatin factors or modifications indicative of active transcription. Distinguishing expressed genes from the bulk of the genome is likely to be an important function common to many chromatin organizing and modifying activities.

Larschan E, Alekseyenko AA, Lai WR, Park PJ, Kuroda MI. MSL complex associates with clusters of actively transcribed genes along the Drosophila male X chromosome. Cold Spring Harb Symp Quant Biol 2006;71:385-94.

Dosage compensation in Drosophila serves as a model system for understanding the targeting of chromatin-modifying complexes to their sites of action. The MSL (male-specific lethal) complex up-regulates transcription of the single male X chromosome, thereby equalizing levels of transcription of X-linked genes between the sexes. Recruitment of the MSL complex to its binding sites on the male X chromosome requires each of the MSL proteins and at least one of the two large noncoding roX RNAs. To better understand how the MSL complex specifically targets the X chromosome, we have defined the binding using high-resolution genomic tiling arrays. Our results indicate that the MSL complex largely associates with transcribed genes that are present in clusters along the X chromosome. We hypothesize that after initial recruitment of the MSL complex to the X chromosome by unknown mechanisms, nascent transcripts or chromatin marks associated with active transcription attract the MSL complex to its final targets. Defining MSL-complex-binding sites will provide a tool for understanding functions of large noncoding RNAs that have remained elusive.

Kong SW, Pu WT, Park PJ. A multivariate approach for integrating genome-wide expression data and biological knowledge. Bioinformatics 2006;22(19):2373-80.

MOTIVATION: Several statistical methods that combine analysis of differential gene expression with biological knowledge databases have been proposed for a more rapid interpretation of expression data. However, most such methods are based on a series of univariate statistical tests and do not properly account for the complex structure of gene interactions. RESULTS: We present a simple yet effective multivariate statistical procedure for assessing the correlation between a subspace defined by a group of genes and a binary phenotype. A subspace is deemed significant if the samples corresponding to different phenotypes are well separated in that subspace. The separation is measured using Hotelling's T² statistic, which captures the covariance structure of the subspace. When the dimension of the subspace is larger than that of the sample space, we project the original data to a smaller orthonormal subspace. We use this method to search through functional pathway subspaces defined by Reactome, KEGG, BioCarta and Gene Ontology. To demonstrate its performance, we apply this method to the data from two published studies, and visualize the results in the principal component space.
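For reference, the textbook two-sample form of Hotelling's T² is shown below. The notation is an assumption and is not taken from the paper: n₁ and n₂ are the numbers of samples in the two phenotype groups, x̄₁ and x̄₂ are the group mean vectors over the genes in the set, and S₁ and S₂ are the group sample covariance matrices. The pooled matrix S_p has rank at most n₁ + n₂ − 2, so it cannot be inverted when the gene set is larger than that. This is the situation the projection step described above is meant to handle; the projection itself is not shown.

```latex
T^2 = \frac{n_1 n_2}{n_1 + n_2}\,
      (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)^{\top}\,
      \mathbf{S}_p^{-1}\,
      (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2),
\qquad
\mathbf{S}_p = \frac{(n_1 - 1)\,\mathbf{S}_1 + (n_2 - 1)\,\mathbf{S}_2}{n_1 + n_2 - 2}
```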

Namekawa SH, Park PJ, Zhang L-F, Shima JE, McCarrey JR, Griswold MD, Lee JT. Postmeiotic sex chromatin in the male germline of mice. Curr Biol 2006;16(7):660-7.

In mammals, the X and Y chromosomes are subject to meiotic sex chromosome inactivation (MSCI) during prophase I in the male germline, but their status thereafter is currently unclear. An abundance of X-linked spermatogenesis genes has spawned the view that the X must be active. On the other hand, the idea that the imprinted paternal X of the early embryo may be preinactivated by MSCI suggests that silencing may persist longer. To clarify this issue, we establish a comprehensive X-expression profile during mouse spermatogenesis. Here, we discover that the X and Y occupy a novel compartment in the postmeiotic spermatid and adopt a non-Rabl configuration. We demonstrate that this postmeiotic sex chromatin (PMSC) persists throughout spermiogenesis into mature sperm and exhibits epigenetic similarity to the XY body. In the spermatid, 87% of X-linked genes remain suppressed postmeiotically, while autosomes are largely active. We conclude that chromosome-wide X silencing continues from meiosis to the end of spermiogenesis, and we discuss implications for proposed mechanisms of imprinted X-inactivation.

Yoon SS, Segal NH, Park PJ, Detwiller KY, Fernando NT, Ryeom SW, Brennan MF, Singer S. Angiogenic profile of soft tissue sarcomas based on analysis of circulating factors and microarray gene expression. J Surg Res 2006;135(2):282-90.

BACKGROUND: Broader understanding of diverse angiogenic pathways in a particular cancer can lead to better utilization of anti-angiogenic therapies. The aim of this study was to develop profiles of angiogenesis-related gene and protein expression for various histologic subtypes of soft tissue sarcomas (STS) growing in different sites. MATERIALS AND METHODS: Plasma levels of vascular endothelial growth factor (VEGF), basic fibroblast growth factor (bFGF), angiopoietin 2 (Ang2), and leptin were determined in 108 patients with primary STS. Gene expression patterns were analyzed in 38 STS samples and 13 normal tissues using oligonucleotide microarrays. RESULTS: VEGF and bFGF plasma levels were elevated 10- to 13-fold in STS patients compared to controls. VEGF levels were broadly elevated, while bFGF levels were higher in patients with fibrosarcomas and leiomyosarcomas. Ang2 levels correlated with tumor size and were most elevated for tumors located in the trunk, while leptin levels were highest in patients with liposarcomas. Hierarchical clustering of microarray data based on angiogenesis-related gene expression demonstrated that histologic subtypes of STS often shared similar expression patterns, and these patterns were distinctly different from those of normal tissues. Matrix metalloproteinase 2, platelet-derived growth factor receptor alpha, and Notch 4 were among several genes that were up-regulated at least 7-fold in STS. CONCLUSIONS: STS demonstrate significant heterogeneity in their angiogenic profiles based on size, histologic subtype, and location of tumor growth, which may have implications for anti-angiogenic strategies. Comparison of STS to normal tissues reveals a panel of upregulated genes that may be targets for future therapies.

Liu F*, Park PJ*, Lai W, Maher E, Chakravarti A, Durso L, Jiang X, Yu Y, Brosius A, Thomas M, Chin L, Brennan C, DePinho RA, Kohane I, Carroll RS, Black PM, Johnson MD. A genome-wide screen reveals functional gene clusters in the cancer genome and identifies EphA2 as a mitogen in glioblastoma. Cancer Res 2006;66(22):10815-23.

A novel genome-wide screen that combines patient outcome analysis with array comparative genomic hybridization and mRNA expression profiling was developed to identify genes with copy number alterations, aberrant mRNA expression, and relevance to survival in glioblastoma. The method led to the discovery of physical gene clusters within the cancer genome with boundaries defined by physical proximity, correlated mRNA expression patterns, and survival relatedness. These boundaries delineate a novel genomic interval called the functional common region (FCR). Many FCRs contained genes of high biological relevance to cancer and were used to pinpoint functionally significant DNA alterations that were too small or infrequent to be reliably identified using standard algorithms. One such FCR contained the EphA2 receptor tyrosine kinase. Validation experiments showed that EphA2 mRNA overexpression correlated inversely with patient survival in a panel of 21 glioblastomas, and ligand-mediated EphA2 receptor activation increased glioblastoma proliferation and tumor growth via a mitogen-activated protein kinase-dependent pathway. This novel genome-wide approach greatly expanded the list of target genes in glioblastoma and represents a powerful new strategy to identify the upstream determinants of tumor phenotype in a range of human cancers.

2005
Greenberg SA, Pinkus JL, Pinkus GS, Burleson T, Sanoudou D, Tawil R, Barohn RJ, Saperstein DS, Briemberg HR, Ericsson M, Park P, Amato AA. Interferon-alpha/beta-mediated innate immune mechanisms in dermatomyositis. Ann Neurol 2005;57(5):664-78.

Dermatomyositis has been modeled as an autoimmune disease largely mediated by the adaptive immune system, including a local humorally mediated response with B and T helper cell muscle infiltration, antibody and complement-mediated injury of capillaries, and perifascicular atrophy of muscle fibers caused by ischemia. To further understand the pathophysiology of dermatomyositis, we used microarrays, computational methods, immunohistochemistry and electron microscopy to study muscle specimens from 67 patients, 54 with inflammatory myopathies, 14 with dermatomyositis. In dermatomyositis, genes induced by interferon-alpha/beta were highly overexpressed, and immunohistochemistry for the interferon-alpha/beta inducible protein MxA showed dense staining of perifascicular and sometimes all myofibers in 8/14 patients and of capillaries in 13/14 patients. Of 36 patients with other inflammatory myopathies, 1 patient had faint MxA staining of myofibers and 3 of capillaries. Plasmacytoid dendritic cells, potent CD4+ cellular sources of interferon-alpha, are present in substantial numbers in dermatomyositis and may account for most of the cells previously identified as T helper cells. In addition to an adaptive immune response, an innate immune response characterized by plasmacytoid dendritic cell infiltration and interferon-alpha/beta inducible gene and protein expression may be an important part of the pathogenesis of dermatomyositis, as it appears to be in systemic lupus erythematosus.

Park PJ. Gene Expression Data and Survival Analysis. In: Shoemaker JS, Lin SM, eds. Methods of Microarray Data Analysis IV. New York: Springer; 2005.

Tian L, Greenberg SA, Kong SW, Altschuler J, Kohane IS, Park PJ. Discovering statistically significant pathways in expression profiling studies. Proc Natl Acad Sci U S A 2005;102(38):13544-9.

Accurate and rapid identification of perturbed pathways through the analysis of genome-wide expression profiles facilitates the generation of biological hypotheses. We propose a statistical framework for determining whether a specified group of genes for a pathway has a coordinated association with a phenotype of interest. Several issues concerning proper hypothesis-testing procedures are clarified. In particular, it is shown that the differences in the correlation structure of each set of genes can lead to a biased comparison among gene sets unless a normalization procedure is applied. We propose statistical tests for two important but different aspects of association for each group of genes. This approach has more statistical power than currently available methods and can result in the discovery of statistically significant pathways that are not detected by other methods. This method is applied to data sets involving diabetes, inflammatory myopathies, and Alzheimer's disease, using gene sets we compiled from various public databases. In the case of inflammatory myopathies, we have correctly identified the known cytotoxic T lymphocyte-mediated autoimmunity in inclusion body myositis. Furthermore, we predicted the presence of dendritic cells in inclusion body myositis and of an IFN-alpha/beta response in dermatomyositis, neither of which was previously described. These predictions have been subsequently corroborated by immunohistochemistry.
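The general idea of testing a whole gene set against a phenotype can be illustrated with a sample-label permutation test. This is a minimal sketch under stated assumptions, not the paper's procedure: the set score is assumed to be the mean of per-gene Welch t-statistics, and significance comes from shuffling the phenotype labels of whole samples, which leaves the gene-gene correlation structure intact. The function names are hypothetical.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch two-sample t-statistic for one gene (group a vs group b)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def gene_set_score(expr, labels):
    """Hypothetical set score: mean t-statistic over the genes in the set.

    expr   -- list of rows, one per gene, each a list of per-sample values
    labels -- list of 0/1 phenotype labels, one per sample
    """
    scores = []
    for row in expr:
        a = [x for x, lab in zip(row, labels) if lab == 1]
        b = [x for x, lab in zip(row, labels) if lab == 0]
        scores.append(welch_t(a, b))
    return mean(scores)


def permutation_pvalue(expr, labels, n_perm=999, seed=0):
    """Two-sided p-value obtained by permuting sample labels.

    Shuffling labels moves whole samples, so genes that are correlated
    with each other stay correlated under the null distribution.
    """
    rng = random.Random(seed)
    observed = abs(gene_set_score(expr, labels))
    perm = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if abs(gene_set_score(expr, perm)) >= observed:
            hits += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return (hits + 1) / (n_perm + 1)
```

For example, a two-gene set whose values are clearly higher in the five samples labelled 1 receives a small p-value; with only 10 samples there are just 252 distinct splits, which bounds how small the p-value can become.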

Hamada FN, Park PJ, Gordadze PR, Kuroda MI. Global regulation of X chromosomal genes by the MSL complex in Drosophila melanogaster. Genes Dev 2005;19(19):2289-94.

A long-standing model postulates that X-chromosome dosage compensation in Drosophila occurs by twofold up-regulation of the single male X, but previous data cannot exclude an alternative model, in which male autosomes are down-regulated to balance gene expression. To distinguish between the two models, we used RNA interference to deplete Male-Specific Lethal (MSL) complexes from male-like tissue culture cells. We found that expression of many genes from the X chromosome decreased, while expression from the autosomes was largely unchanged. We conclude that the primary role of the MSL complex is to up-regulate the male X chromosome.

Lai WR, Johnson MD, Kucherlapati R, Park PJ. Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data. Bioinformatics 2005;21(19):3763-70.

MOTIVATION: Array Comparative Genomic Hybridization (CGH) can reveal chromosomal aberrations in the genomic DNA. These amplifications and deletions at the DNA level are important in the pathogenesis of cancer and other diseases. While a large number of approaches have been proposed for analyzing the large array CGH datasets, the relative merits of these methods in practice are not clear. RESULTS: We compare 11 different algorithms for analyzing array CGH data. These include both segment detection methods and smoothing methods, based on diverse techniques such as mixture models, Hidden Markov Models, maximum likelihood, regression, wavelets and genetic algorithms. We compute the Receiver Operating Characteristic (ROC) curves using simulated data to quantify sensitivity and specificity for various levels of signal-to-noise ratio and different sizes of abnormalities. We also characterize their performance on chromosomal regions of interest in a real dataset obtained from patients with Glioblastoma Multiforme. While comparisons of this type are difficult due to possibly sub-optimal choice of parameters in the methods, they nevertheless reveal general characteristics that are helpful to the biological investigator.
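The ROC evaluation on simulated data can be sketched as follows. The inputs are a per-probe score from any CGH method and the true aberration status known from the simulation. The sketch sweeps a threshold over the scores and records the false- and true-positive rates at each step. It assumes that higher scores mean "more likely aberrant" and is an illustration only, not code from the study.

```python
def roc_points(scores, truth):
    """Return (FPR, TPR) pairs, one per distinct threshold, highest first.

    scores -- per-probe evidence of aberration (higher = more aberrant)
    truth  -- per-probe booleans from the simulation (True = aberrant)
    """
    pos = sum(truth)
    neg = len(truth) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both aberrant and normal probes")
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        called = [s >= thr for s in scores]  # probes called aberrant
        tp = sum(c and t for c, t in zip(called, truth))
        fp = sum(c and not t for c, t in zip(called, truth))
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# A method that ranks both aberrant probes above both normal ones
# traces the ideal curve and scores AUC = 1.0.
pts = roc_points([0.9, 0.8, 0.3, 0.2], [True, True, False, False])
```

Repeating this for several signal-to-noise ratios and aberration sizes yields one curve per setting. Methods can then be compared on sensitivity and specificity rather than on a single operating point.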

Kong SW, Hwang K-B, Kim RD, Zhang B-T, Greenberg SA, Kohane IS, Park PJ. CrossChip: a system supporting comparative analysis of different generations of Affymetrix arrays. Bioinformatics 2005;21(9):2116-7.

SUMMARY: To increase compatibility between different generations of Affymetrix GeneChip arrays, we propose a method of filtering probes based on their sequences. Our method is implemented as a web-based service for downloading necessary materials for converting the raw data files (*.CEL) for comparative analysis. The user can specify the appropriate level of filtering by setting the criteria for the minimum overlap length between probe sequences and the minimum number of usable probe pairs per probe set. Our website supports a within-species comparison for human and mouse GeneChip arrays. AVAILABILITY: http://www.crosschip.org
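The two user-set criteria can be illustrated with a small filter. This sketch assumes that the overlap length between each probe and its matched probe on the other array generation has already been computed. The dictionary layout, the probe-set IDs in the example, and the function name are hypothetical; they do not describe CrossChip's actual file formats or code.

```python
def filter_probe_sets(overlaps, min_overlap, min_pairs):
    """Keep probes with enough sequence overlap, then drop thin probe sets.

    overlaps    -- dict: probe-set ID -> list of overlap lengths (bases),
                   one per probe pair, between matched probes on two arrays
    min_overlap -- minimum overlap for a probe pair to count as usable
    min_pairs   -- minimum number of usable probe pairs a probe set must keep
    Returns a dict: probe-set ID -> indices of its usable probe pairs.
    """
    kept = {}
    for probe_set, lengths in overlaps.items():
        usable = [i for i, n in enumerate(lengths) if n >= min_overlap]
        if len(usable) >= min_pairs:
            kept[probe_set] = usable
    return kept


# Hypothetical example: "setB" keeps only one usable pair and is dropped.
kept = filter_probe_sets(
    {"setA": [25, 25, 20, 12], "setB": [25, 8, 5]},
    min_overlap=20,
    min_pairs=3,
)
```

Raising `min_overlap` makes the retained probes more directly comparable across array generations. Raising `min_pairs` removes probe sets whose summary would rest on too few probes. Stricter settings therefore trade coverage for comparability.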

2004
Park PJ, Hou TY. Multiscale Numerical Methods for Singularly Perturbed Convection-Diffusion Equations. International Journal of Computational Methods 2004;1(1):17-65.

We present an efficient and robust approach in the finite element framework for numerical solutions that exhibit multiscale behavior, with applications to singularly perturbed convection-diffusion problems. The first type of equation we study is the convection-dominated convection-diffusion equation, with periodic or random coefficients; the second type of equation is an elliptic equation with singularities due to discontinuous coefficients and non-smooth boundaries. In both cases, standard methods for purely hyperbolic or elliptic problems perform poorly due to sharp boundary and internal layers in the solution.

We propose a framework in which the finite element basis functions are designed to capture the local small-scale behavior correctly. When the structure of the layers can be determined locally, we apply the multiscale finite element method, in which we solve the corresponding homogeneous equation on each element to capture the small scale features of the differential operator. We demonstrate the effectiveness of this method by computing the enhanced diffusivity scaling for a passive scalar in the cellular flow. We also carry out the asymptotic error analysis for its convergence rate and perform numerical experiments for verification. For a random flow with nonlocal layer structure, we use a variational principle to gain additional information in our attempt to design asymptotic basis functions. We also apply the same framework to elliptic equations with discontinuous coefficients or non-smooth boundaries. In that case, we construct local basis functions near singularities using the infinite element method in order to resolve extreme singularities. Numerical results on problems with various singularities confirm the efficiency and accuracy of this approach.
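As an orientation aid, a generic statement of the multiscale finite element idea is given below, written for a convection-dominated model problem. The notation is an assumption: ε is the small diffusion coefficient, b(x) is the velocity field, K is a mesh element, and ψ_i is the standard piecewise-linear nodal function. Each basis function solves the homogeneous equation on its element, as the abstract describes. The boundary data actually used for these local problems in the paper may differ from the simple choice shown here.

```latex
-\varepsilon\,\Delta u + \mathbf{b}(\mathbf{x})\cdot\nabla u = f
  \quad \text{in } \Omega, \qquad 0 < \varepsilon \ll 1,
\\[4pt]
-\varepsilon\,\Delta \phi_i + \mathbf{b}(\mathbf{x})\cdot\nabla \phi_i = 0
  \quad \text{in } K, \qquad \phi_i = \psi_i \quad \text{on } \partial K .
```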
