Kulldorff M, Tango M, Park PJ.
Power Comparisons for Disease Clustering Tests. Computational Statistics and Data Analysis 2003;42(4):665-684.
AbstractMany different methods have been proposed to test for geographical disease clustering, and more generally, for spatial clustering of any type of observations while adjusting for an inhomogeneous background population generating the observations. Despite the many proposed test statistics, there has been few formal comparisons conducted. We present a collection of 1,220,000 simulated benchmark data sets generated under 51 different cluster models and the null hypothesis, to be used for power evaluations. We then use these data sets to compare the power of the spatial scan statistic, the maximized excess events test and the nonparametric M statistic. All have good power, the first having an advantage for localized hot-spot type clusters and the second for global clustering where randomly located cases generate other cases close by. By making the simulated data sets publicly available, new tests can easily be compared with previously evaluated tests by analyzing the same benchmark data.
Park PJ, Kahane IS, Kim JH.
Rank-Based Nonlinear Normalization of Oligonucleotide Arrays. Genomics & Informatics 2003;1(2):94-100.
AbstractMOTIVATION: Many have observed a nonlinear relationship between the signal intensity and the transcript abundance in microarray data. The first step in analyzing the data is to normalize it properly, and this should include a correction for the nonlinearity. The commonly used linear normalization schemes do not address this problem. RESULTS: Nonlinearity is present in both cDNA and oligonucleotide arrays, but we concentrate on the latter in this paper. Across a set of chips, we identify those genes whose within-chip ranks are relatively constant compared to other genes of similar intensity. For each gene, we compute the sum of the squares of the differences in its within-chip ranks between every pair of chips as our statistic and we select a small fraction of the genes with the minimal changes in ranks at each intensity level. These genes are most likely to be non-differentially expressed and are subsequently used in the normalization procedure. This method is a generalization of the rank-invariant normalization (Li and Wong, 2001), using all available chips rather than two at a time to gather more information, while using the chip that is least likely to be affected by nonlinear effects as the reference chip. The assumption in our method is that there are at least a small number of nondifferentially expressed genes across the intensity range. The normalized expression values can be substantially different from the unnormalized values and may result in altered down-stream analysis.
Kuo WP, Mendez E, Chen C, Whipple ME, Farell G, Agoff N, Park PJ.
Functional relationships between gene pairs in oral squamous cell carcinoma. AMIA Annu Symp Proc 2003;:371-5.
Abstract
We developed a novel method for the discovery of functional relationships between pairs of genes based on gene expression profiles generated from microarrays. This approach examines all possible pairs of genes and identifies those in which the relationship between the two genes changes in different diseases or conditions. In contrast to previous methods that have focused on differentially expressed genes, this method attempts to find changes in the correlation between genes. These changes may be indicative of the functional relationships related to a disease mechanism. We demonstrate the utility of this approach by applying it to an oral squamous cell carcinoma (OSCC) microarray data set. Our results suggest new directions for future experimental investigations.
pdf