Publications

2003
Kuo WP, Mendez E, Chen C, Whipple ME, Farell G, Agoff N, Park PJ. Functional relationships between gene pairs in oral squamous cell carcinoma. AMIA Annu Symp Proc 2003;:371-5.

We developed a novel method for the discovery of functional relationships between pairs of genes based on gene expression profiles generated from microarrays. This approach examines all possible pairs of genes and identifies those in which the relationship between the two genes changes in different diseases or conditions. In contrast to previous methods that have focused on differentially expressed genes, this method attempts to find changes in the correlation between genes. These changes may be indicative of the functional relationships related to a disease mechanism. We demonstrate the utility of this approach by applying it to an oral squamous cell carcinoma (OSCC) microarray data set. Our results suggest new directions for future experimental investigations.
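A minimal sketch of the idea in this abstract, assuming Pearson correlation as the measure of the relationship between two genes and the absolute difference in correlation as the measure of change. The abstract specifies neither, so both choices, along with the function names and the toy data, are illustrative assumptions rather than the authors' method.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def correlation_changes(cond_a, cond_b):
    """Rank all gene pairs by how much their correlation differs between two conditions.

    cond_a and cond_b map a gene name to its expression values across the
    samples of that condition. Returns tuples of
    (absolute change, gene 1, gene 2, r in condition A, r in condition B),
    largest change first.
    """
    genes = sorted(set(cond_a) & set(cond_b))
    changes = []
    for g1, g2 in combinations(genes, 2):
        r_a = pearson(cond_a[g1], cond_a[g2])
        r_b = pearson(cond_b[g1], cond_b[g2])
        changes.append((abs(r_a - r_b), g1, g2, r_a, r_b))
    changes.sort(reverse=True)
    return changes


# Toy data (hypothetical): g1 and g2 move together in one condition
# and in opposite directions in the other.
normal = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [5, 3, 6, 2]}
tumor = {"g1": [1, 2, 3, 4], "g2": [8, 6, 4, 2], "g3": [5, 3, 6, 2]}

ranked = correlation_changes(normal, tumor)
for change, g1, g2, r_a, r_b in ranked:
    print(f"{g1}-{g2}: r_normal={r_a:.2f} r_tumor={r_b:.2f} change={change:.2f}")
```

Examining all pairs costs on the order of the square of the number of genes, so a real microarray data set would call for a vectorized implementation and a significance assessment that this sketch omits.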

Koopman LA, Kopcow HD, Rybalov B, Boyson JE, Orange JS, Schatz F, Masch R, Lockwood CJ, Schachter AD, Park PJ, Strominger JL. Human decidual natural killer cells are a unique NK cell subset with immunomodulatory potential. J Exp Med 2003;198(8):1201-12.

Natural killer cells constitute 50-90% of lymphocytes in human uterine decidua in early pregnancy. Here, CD56(bright) uterine decidual NK (dNK) cells were compared with the CD56(bright) and CD56(dim) peripheral NK cell subsets by microarray analysis, with verification of results by flow cytometry and RT-PCR. Among the approximately 10,000 genes studied, 278 genes showed at least a threefold change with P ≤ 0.001 when comparing the dNK and peripheral NK cell subsets, most displaying increased expression in dNK cells. The largest number of these encoded surface proteins, including the unusual lectinlike receptors NKG2E and Ly-49L, several killer cell Ig-like receptors, the integrin subunits alpha(D), alpha(X), beta1, and beta5, and multiple tetraspanins (CD9, CD151, CD53, CD63, and TSPAN-5). Additionally, two secreted proteins, galectin-1 and progestagen-associated protein 14, known to have immunomodulatory functions, were selectively expressed in dNK cells.

Kuo WP, Whipple ME, Jenssen T-K, Todd R, Epstein JB, Ohno-Machado L, Sonis ST, Park PJ. Microarrays and clinical dentistry. J Am Dent Assoc 2003;134(4):456-62.

BACKGROUND: The Human Genome Project, or HGP, has inspired a great deal of exciting biology recently by enabling the development of new technologies that will be essential for understanding the different types of abnormalities in diseases related to the oral cavity. LITERATURE REVIEWED: The authors review current literature pertaining to the advanced microarray technologies arising from the HGP and how they can contribute to dentistry. This technology has become a standard tool for monitoring activities of genes at both academic and pharmaceutical research institutions. RESULTS: With the availability of the DNA sequences for the entire human genome, attention now is focused on understanding various diseases at the genome level. Deciphering the molecular behavior of genetically encoded proteins is crucial to obtaining a more comprehensive picture of disease processes. Important progress has been made using microarrays, which have been shown to be effective in identifying gene expression patterns and variations that correlate with cellular development, physiology and function. Arrays can be used to classify tissue samples accurately based on molecular profiles and to select candidate genes related to a number of cancers, including oral cancer. This type of oral genetic approach will aid in the understanding of disease progression, thus improving diagnosis and treatment for patients. CLINICAL IMPLICATIONS: Microarrays hold much promise for the analysis of diseases in the oral cavity. As the technology evolves, dentists may see these tools as screening tests for better managing patients' dental care.

Blackshaw S, Kuo WP, Park PJ, Tsujikawa M, Gunnersen JM, Scott HS, Boon W-M, Tan S-S, Cepko CL. MicroSAGE is highly representative and reproducible but reveals major differences in gene expression among samples obtained from similar tissues. Genome Biol 2003;4(3):R17.

BACKGROUND: Serial analysis of gene expression using small amounts of starting material (microSAGE) has not yet been conclusively shown to be representative, reproducible or accurate. RESULTS: We show that microSAGE is highly representative, reproducible and accurate, but that pronounced differences in gene expression are seen between tissue samples taken from different individuals. CONCLUSIONS: MicroSAGE is a reliable method of comprehensively profiling differences in gene expression among samples, but care should be taken in generalizing results obtained from libraries constructed from tissue obtained from different individuals and/or processed or stored differently.

2002
Park PJ, Butte AJ, Kohane IS. Comparing expression profiles of genes with similar promoter regions. Bioinformatics 2002;18(12):1576-84.

MOTIVATION: Gene regulatory elements are often predicted by seeking common sequences in the promoter regions of genes that are clustered together based on their expression profiles. We consider the problem in the opposite direction: we seek to find the genes that have similar promoter regions and determine the extent to which these genes have similar expression profiles. RESULTS: We use the data sets from experiments on Saccharomyces cerevisiae. Our similarity measure for the promoter regions is based on the set of common mapped or putative transcription factor binding sites and other regulatory elements in the upstream region of the genes, as contained in the Saccharomyces cerevisiae Promoter Database. We pair up the genes with high similarity scores and compare their expression levels in time-course experiment data. We find that genes with similar promoter regions on the average have significantly higher correlation, but it can vary widely depending on the genes. This confirms that the presence of similar regulatory elements often does not correspond to similarity in expression profiles and indicates that finding transcription factor binding sites or other regulatory elements starting with the expression patterns may be limited in many cases. Regardless of the correlation, the degree to which the profiles agree under different experimental conditions can be examined to derive hypotheses concerning the role of common regulatory elements. Overall, we find that considering the relationship between the promoter regions and the expression profiles starting with the regulatory elements is a difficult but useful process that can provide valuable insights.
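The pairing step described in this abstract can be sketched as follows, assuming a Jaccard index over each gene's set of regulatory elements as the promoter similarity score. The abstract says only that the measure is based on the set of common binding sites, so the Jaccard index, the threshold, and all gene and site names below are hypothetical stand-ins.

```python
from itertools import combinations
from math import sqrt


def jaccard(sites_a, sites_b):
    """Shared regulatory elements divided by all elements found in either promoter."""
    a, b = set(sites_a), set(sites_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def similar_promoter_pairs(sites, profiles, threshold=0.5):
    """Pair genes whose promoter similarity meets the threshold and report
    the correlation of their expression profiles.

    Returns tuples of (gene 1, gene 2, promoter similarity, expression correlation).
    """
    pairs = []
    for g1, g2 in combinations(sorted(sites), 2):
        similarity = jaccard(sites[g1], sites[g2])
        if similarity >= threshold:
            correlation = pearson(profiles[g1], profiles[g2])
            pairs.append((g1, g2, similarity, correlation))
    return pairs


# Hypothetical binding-site annotations and time-course profiles.
sites = {
    "geneA": {"siteX", "siteY"},
    "geneB": {"siteX", "siteY", "siteZ"},
    "geneC": {"siteW"},
}
profiles = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8],
    "geneC": [4, 3, 2, 1],
}

pairs = similar_promoter_pairs(sites, profiles)
for g1, g2, similarity, correlation in pairs:
    print(f"{g1}-{g2}: promoter similarity={similarity:.2f}, correlation={correlation:.2f}")
```

Starting from the promoters rather than from expression clusters means the correlation is an outcome to inspect, not a selection criterion, which matches the direction of analysis the abstract describes.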

Kuo WP, Jenssen T-K, Park PJ, Lingen MW, Hasina R, Ohno-Machado L. Gene expression levels in different stages of progression in oral squamous cell carcinoma. Proc AMIA Symp 2002;:415-9.

Oral squamous cell carcinoma (OSCC) is one of the most common cancer types worldwide. The prognosis for patients with this disease is generally poor and little is known about its progression. Gene expression studies may provide important insights into the molecular mechanisms of this disease. We analyzed gene expression data from a small panel of patients diagnosed with OSCC. Even with only 13 patient samples we were able to find genes with significant differences in expression levels between normal, dysplasia, and cancer samples. The largest differences in expression were generally found between normal and cancer samples, but significant differences were also found for several genes between dysplasia and the other two sample types. We also represent the significance levels of differentially expressed genes on the chromosome domain. The genes and genetic features we examine are potentially important factors on the molecular level in the progression of OSCC.

Park PJ, Tian L, Kohane IS. Linking gene expression data with patient survival times using partial least squares. Bioinformatics 2002;18 Suppl 1:S120-7.

There is an increasing need to link the large amount of genotypic data, gathered using microarrays for example, with various phenotypic data from patients. The classification problem in which gene expression data serve as predictors and a class label phenotype as the binary outcome variable has been examined extensively, but there has been less emphasis in dealing with other types of phenotypic data. In particular, patient survival times with censoring are often not used directly as a response variable due to the complications that arise from censoring. We show that the issues involving censored data can be circumvented by reformulating the problem as a standard Poisson regression problem. The procedure for solving the transformed problem is a combination of two approaches: partial least squares, a regression technique that is especially effective when there is severe collinearity due to a large number of predictors, and generalized linear regression, which extends standard linear regression to deal with various types of response variables. The linear combinations of the original variables identified by the method are highly correlated with the patient survival times and at the same time account for the variability in the covariates. The algorithm is fast, as it does not involve any matrix decompositions in the iterations. We apply our method to data sets from lung carcinoma and diffuse large B-cell lymphoma studies to verify its effectiveness.

Kuruvilla FG, Park PJ, Schreiber SL. Vector algebra in the analysis of genome-wide expression data. Genome Biol 2002;3(3):RESEARCH0011.

BACKGROUND: Data from thousands of transcription-profiling experiments in organisms ranging from yeast to humans are now publicly available. How best to analyze these data remains an important challenge. A variety of tools have been used for this purpose, including hierarchical clustering, self-organizing maps and principal components analysis. In particular, concepts from vector algebra have proven useful in the study of genome-wide expression data. RESULTS: Here we present a framework based on vector algebra for the analysis of transcription profiles that is geometrically intuitive and computationally efficient. Concepts in vector algebra such as angles, magnitudes, subspaces, singular value decomposition, bases and projections have natural and powerful interpretations in the analysis of microarray data. Angles in particular offer a rigorous method of defining 'similarity' and are useful in evaluating the claims of a microarray-based study. We present a sample analysis of cells treated with rapamycin, an immunosuppressant whose effects have been extensively studied with microarrays. In addition, the algebraic concept of a basis for a space affords the opportunity to simplify data analysis and uncover a limited number of expression vectors to span the transcriptional range of cell behavior. CONCLUSIONS: This framework represents a compact, powerful and scalable construction for analysis and computation. As the amount of microarray data in the public domain grows, these vector-based methods are relevant in determining statistical significance. These approaches are also well suited to extract biologically meaningful information in the analysis of signaling networks.
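Two of the vector concepts named in this abstract, angles and projections, can be illustrated with a short sketch. It treats each expression profile as a plain list of numbers; the function names and the sample vectors are illustrative assumptions, not taken from the paper.

```python
from math import acos, degrees, sqrt


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean magnitude of a vector."""
    return sqrt(dot(u, u))


def angle_degrees(u, v):
    """Angle between two nonzero expression vectors, in degrees.

    0 means the profiles point the same way, 90 means they are
    orthogonal, and 180 means they are opposite.
    """
    cosine = dot(u, v) / (norm(u) * norm(v))
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding outside [-1, 1]
    return degrees(acos(cosine))


def project(u, v):
    """Component of u that lies along the direction of a nonzero vector v."""
    scale = dot(u, v) / dot(v, v)
    return [scale * b for b in v]


# Hypothetical log-ratio profiles across three conditions.
treated = [1.0, 2.0, 3.0]
scaled = [2.0, 4.0, 6.0]
inverse = [-1.0, -2.0, -3.0]

print(angle_degrees(treated, scaled))
print(angle_degrees(treated, inverse))
print(project([3.0, 4.0], [1.0, 0.0]))
```

Because the angle ignores magnitude, two profiles with the same shape but different overall intensity count as similar, which is one reason an angle can serve as a similarity measure for expression data.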

2001
Park PJ, Pagano M, Bonetti M. A nonparametric scoring algorithm for identifying informative genes from microarray data. Pac Symp Biocomput 2001;6:52-63.

Microarray data routinely contain gene expression levels of thousands of genes. In the context of medical diagnostics, an important problem is to find the genes that are correlated with given phenotypes. These genes may reveal insights into biological processes and may be used to predict the phenotypes of new samples. In most cases, while the gene expression levels are available for a large number of genes, only a small fraction of these genes may be informative in classification with statistical significance. We introduce a nonparametric scoring algorithm that assigns a score to each gene based on samples with known classes. Based on these scores, we can find a small set of genes which are informative of their class, and subsequent analysis can be carried out with this set. This procedure is robust to outliers and different normalization schemes, and immediately reduces the size of the data with little loss of information. We study the properties of this algorithm and apply it to the data set from cancer patients. We quantify the information in a given set of genes by comparing its distribution of the score statistics to a set of distributions generated by permutations that preserve the correlation structure among the genes.
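The abstract does not spell out the score, so the sketch below substitutes a simple rank-based count (the number of class-1/class-0 sample pairs that are out of order, equivalent to a Mann-Whitney U statistic) as a stand-in. It also illustrates the permutation idea: sample labels are shuffled once per permutation and every gene is rescored with the same shuffled labels, which leaves the correlation among genes intact. All names and data are hypothetical.

```python
import random


def rank_score(values, labels):
    """Count (class-1, class-0) sample pairs in which the class-1 value is lower.

    The score depends only on the ordering of the values, so it is unaffected
    by outliers or by monotone rescaling. Scores near 0 or near the maximum
    (n1 * n0) indicate a gene that separates the classes well.
    """
    ones = [v for v, label in zip(values, labels) if label == 1]
    zeros = [v for v, label in zip(values, labels) if label == 0]
    return sum(1 for a in ones for b in zeros if a < b)


def score_genes(expr, labels):
    """Score every gene against the same set of sample labels."""
    return {gene: rank_score(values, labels) for gene, values in expr.items()}


def permutation_scores(expr, labels, n_perm=200, seed=0):
    """Null score sets obtained by shuffling sample labels.

    Each permutation relabels whole samples, so all genes are rescored
    together and the gene-gene correlation structure is preserved.
    """
    rng = random.Random(seed)
    shuffled = list(labels)
    results = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        results.append(score_genes(expr, shuffled))
    return results


# Hypothetical data: six samples, three per class.
expr = {
    "geneA": [1, 2, 3, 10, 11, 12],  # cleanly separates the classes
    "geneB": [5, 1, 6, 2, 7, 3],     # mixed ordering
}
labels = [0, 0, 0, 1, 1, 1]

scores = score_genes(expr, labels)
null_scores = permutation_scores(expr, labels)
print(scores)
```

Comparing the observed scores with the distribution across `null_scores` gives a rough sense of how many extreme scores would arise by chance. The comparison used in the paper may differ.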

