Research

The overall aim of our computational genomics laboratory is to understand genetic and epigenetic mechanisms related to disease processes using high-throughput sequencing data.

Mutational processes in cancer and normal cells - How do mutations arise in the human genome? The large amount of data generated by high-throughput sequencing technology allows us to identify both germline and somatic mutations in the DNA. We have developed novel algorithms for accurate identification of sequence variations (single nucleotide variants and indels) and structural variations (e.g., copy number changes, complex rearrangements, transposable element insertions, microsatellite instability, viral insertions) in the genome. Based on the patterns of these mutations, we aim to infer the mechanisms that operate in normal cells, as well as in the steps that lead to tumorigenesis.

3D structure of the genome - How do chromatin organization and higher-order structure of the genome impact gene regulation? We have long been interested in chromatin structure and function. For instance, we have been part of the ENCODE (Encyclopedia of DNA Elements) project to profile a large number of histone modifications and chromatin-associated proteins to understand the relationship between chromatin structure and gene function, especially for the model organism Drosophila melanogaster. We have led the cross-species (human-fly-worm) chromatin analysis for the consortium (a supplementary website accompanies the final paper). More recently, we have been interested in utilizing Hi-C and related assays to map the three-dimensional contacts in the nucleus. These types of epigenetic information help us better understand how genes are activated or suppressed and assess the consequences of non-coding variants.

Methods development - How do we derive insights from a large amount of complex genomic data? We develop innovative computational approaches to analyze data generated using the latest genomics technologies (some datasets are hundreds of terabytes in size). We are particularly interested in the analysis of whole-genome sequencing datasets, including those from single cells. Our algorithms and software packages are used by numerous research groups and companies around the world.

Representative publications (adapted from NIH biosketch):

Characterization of structural variation in cancer genomes - We have led the development and application of computational approaches for identification of structural alterations based on exome and whole-genome sequencing data. Our work has revealed the role of retrotransposition events, complex structural alterations, microsatellite instability, and large-scale chromosomal amplifications/deletions in tumorigenesis. We have also made significant contributions to The Cancer Genome Atlas (TCGA).

  1. Lee JJK, Jung YL, Cheong TC, Valle-Inclan JE, Chu C, Gulhan DC, Ljungstrom V, Jin H, Viswanadham VV, Watson EV, Cortes-Ciriano I, Elledge SJ, Chiarle R, Pellman D, Park PJ (2023) ERα-associated translocations underlie oncogene amplifications in breast cancer, Nature, 618:1024-1032.
  2. Cortés-Ciriano I, Gulhan DC, Lee JJ, Melloni GEM, Park PJ. (2022) Computational analysis of cancer genome sequencing data. Nat Rev Genet, 23:298-314
  3. Yang L, Luquette LJ, Gehlenborg N, Xi R, Haseley PS, Hsieh C, Zhang C, Ren X, Protopopov A, Chin L, Kucherlapati R, Lee C, Park PJ. (2013) Diverse mechanisms of somatic structural variations in human cancer genomes, Cell, 153:919-29. 
  4. Kim TM, Laird PW, Park PJ. (2013) The landscape of microsatellite instability in colorectal and endometrial cancer genomes, Cell, 155:858-68. 
  5. Davoli T*, Xu AW*, Mengwasser KE, Sack LM, Yoon JC, Park PJ, Elledge SJ. (2013) Cumulative haploinsufficiency and triplosensitivity drive aneuploidy patterns and shape the cancer genome, Cell, 155:948-962. 
  6. Lee E, Iskow R, Yang L, Gokcumen O, Haseley P, Luquette LJ, Lohr JG, Harris CC, Ding L, Wilson RK, Wheeler DA, Gibbs RA, Kucherlapati R, Lee C, Kharchenko PV**, Park PJ**, and The Cancer Genome Atlas Research Network (2012) Landscape of somatic retrotransposition in human cancers, Science, 337:967-71.  
  7. The Cancer Genome Atlas Research Network (2008-2015): we performed structural variation analysis for about a dozen consortium marker papers in Nature and Cell.

Somatic mutations in the brain - Recent papers have suggested that some neurodevelopmental and neurodegenerative diseases may be caused by somatic mutations. We have employed single-cell technology to address some of these questions, demonstrating that neurons in the brain of a phenotypically normal individual contain a large number of single nucleotide mutations as well as some transposon insertions.

  1. Luquette LJ*, Miller MB*, Zhou Z*, Bohrson CL, Zhao Y, Jin H, Gulhan D, Ganz J, Bizzotto S, Kirkham S, Hochepied T, Libert C, Galor A, Kim J, Lodato MA, Garaycoechea JI, Gawad C, West J, Walsh CA**, Park PJ** (2022) Single-cell genome sequencing of human neurons reveals somatic point mutation and indel enrichment in regulatory elements, Nat Genet, 54:1564-1571
  2. Bizzotto S*, Dou Y*, Ganz J*, Doan RN, Kwon M, Bohrson CL, Kim SN, Bae T, Abyzov A, NIMH Brain Somatic Mosaicism Network; Park PJ**, Walsh CA** (2021). Landmarks of human embryonic development inscribed in somatic mutations. Science, 371:1249-1253
  3. Dou Y, Kwon M, Rodin RE, Cortes-Ciriano I, Doan R, Luquette LJ, Galor A, Bohrson CL, Walsh CA, Park PJ. (2020) MosaicForecast: accurate detection of mosaic variants in sequencing data without matched controls. Nat Biotech, 38:314-319
  4. Bohrson CL, Barton AR, Lodato MA, Rodin RE, Luquette LJ, Viswanadham VV, Gulhan DC, Cortés-Ciriano I, Sherman MA, Kwon M, Coulter ME, Galor A, Walsh CA, Park PJ. (2019) Linked-read analysis identifies mutations in single-cell DNA-sequencing data. Nat Genet, 51:749-754
  5. Lodato MA*, Rodin RE*, Bohrson CL*, Coulter ME*, Barton AR*, Kwon M*, Sherman MA, Vitzthum CM, Luquette LJ, Yandava C, Yang P, Chittenden TW, Hatem NE, Ryu SC, Woodworth MB, Park PJ**, Walsh CA** (2018). Aging and neurodegeneration are associated with increased mutations in single human neurons. Science, 359:555-559
  6. Lodato MA*, Woodworth MB*, Lee S*, Evrony GD, Mehta BK, Karger A, Lee S, Chittenden TW, D'Gama AM, Cai X, Luquette LJ, Lee E, Park PJ**, Walsh CA**. (2015) Somatic mutation in single human neurons tracks developmental and transcriptional history. Science, 350:94-8.
  7. Evrony GD*, Lee E*, Mehta BK, Benjamini Y, Johnson RM, Cai X, Yang L, Haseley P, Lehmann HS, Park PJ**, Walsh CA**. (2015) Cell lineage analysis in human brain using endogenous retroelements. Neuron 85:49-59.
  8. Evrony GD*, Lee E*, Park PJ**, Walsh CA**. (2016) Resolving rates of mutation in the brain using single-neuron genomics. eLife, 5:e12966

Large-scale analysis of chromatin modification and epigenetic data - We have developed one of the first algorithms for identification of regions of histone modifications and DNA-protein interactions. We have subsequently developed methods for joint analysis of multiple histone modifications to define "chromatin states". This concept has become an important tool in identifying functionally important regions in the human genome and genomes of model organisms. We have also analyzed other chromatin features and their impact on gene regulation.

  1. Ho JWK*, Jung YL*, Liu T*, Alver BH, Lee S, Ikegami K, Sohn KA, Minoda A, Tolstorukov MY, Appert A, Parker SCJ, Gu T, Kundaje A, Riddle NC, Bishop E, Egelhofer TA, Hu SS, Alekseyenko AA, Rechtsteiner A, Asker D, Belsky JA, Bowman SK, Chen QB, Chen RA, Day DS, Dong Y, Dose AC, Duan X, Epstein CB, Ercan S, Feingold EA, Ferrari F, Garrigues JM, Gehlenborg N, Good PJ, Haseley P, He D, Herrmann M, Hoffman MM, Jeffers TE, Kharchenko PV, Kolasinska-Zwierz P, Kotwaliwale CV, Kumar N, Langley SA, Larschan EN, Latorre I, Libbrecht MW, Lin X, Park R, Pazin MJ, Pham HN, Plachetka A, Qin B, Schwartz YB, Shoresh N, Stempor P, Vielle A, Wang C, Whittle CM, Xue H, Kingston RE, Kim JH, Bernstein BE, Dernburg AF, Pirrotta V, Kuroda MI, Noble WS, Tullius TD, Kellis M, MacAlpine DM, Strome S, Elgin SCR, Liu XS, Lieb JD, Ahringer J, Karpen GH, Park PJ. (2014) Comparative analysis of metazoan chromatin organization, Nature, 512:449-52
  2. Kharchenko PV, Alekseyenko AA, Schwartz YB, Minoda A, Riddle NC, Ernst J, Sabo PJ, Larschan E, Gorchakov AA, Gu T, Linder-Basso D, Plachetka A, Shanower G, Tolstorukov MY, Luquette LJ, Xi R, Jung YL, Park RW, Bishop EP, Canfield TP, Sandstrom R, Thurman RE, MacAlpine DM, Stamatoyannopoulos JA, Kellis M, Elgin SCR, Kuroda MI, Pirrotta V, Karpen GH**, Park PJ**. (2011) Comprehensive analysis of the chromatin landscape in Drosophila melanogaster, Nature, 471:480-5.
  3. Tolstorukov MY, Volfovsky N, Stephens RM, Park PJ (2011) Impact of chromatin structure on sequence variability in the human genome, Nature Structural & Molecular Biology, 18:510-5.
  4. Park PJ. (2009) ChIP-seq: advantages and challenges of a maturing technology, Nature Reviews Genetics, 10:669-80.  
  5. Kharchenko PV, Tolstorukov MY, Park PJ. (2008) Design and analysis of protein binding experiments with ChIP-sequencing, Nature Biotechnology, 26:1351-9.

Epigenetic regulation in Drosophila dosage compensation - We have engaged in several collaborative efforts in epigenetics research. With Mitzi Kuroda in the Department of Genetics at Harvard Medical School, we have applied genome analysis to understand the mechanisms behind the dosage compensation process of the Drosophila X chromosome, a model system for coordinate gene regulation on a genome scale. Our analyses have uncovered the motif that directs spreading of the key complex in this process and identified the mechanism of transcriptional elongation for up-regulation of the X chromosome genes.

  1. Alekseyenko AA*, Peng S*, Larschan E, Gorchakov AA, Lee OK, Kharchenko PV, McGrath S, Wang CI, Mardis E, Park PJ**, Kuroda MI**. (2008) A sequence motif within chromatin entry sites directs MSL establishment on the Drosophila X chromosome, Cell, 134:599-609.
  2. Larschan E*, Bishop EP*, Kharchenko PV, Core L, Lis JT, Park PJ**, Kuroda MI** (2011) X chromosome dosage compensation via enhanced transcriptional elongation in Drosophila, Nature, 471:115-8. 
  3. Kharchenko PV, Xi R, Park PJ. (2011) Evidence for dosage compensation between the X and autosomes in mammals, Nature Genetics, 43:1167-9.
  4. Ferrari F*, Jung YL*, Kharchenko PV, Plachetka A, Alekseyenko AA, Kuroda MI**, Park PJ**. (2013) Comment on "Drosophila dosage compensation involves enhanced Pol II recruitment to male X-linked promoters", Science, 340:273. [technical comment]
  5. Ferrari F, Alekseyenko AA, Park PJ, Kuroda MI. (2014) Transcriptional control of a whole chromosome: emerging models for dosage compensation. Nature Structural & Molecular Biology 21:118-25.

Epigenetic regulation in stem cells and cancer - We have also applied our expertise in epigenetic analysis to other important problems in stem cells and cancer. These include analysis of nucleosome dynamics as well as long-range regulatory relationships using three-dimensional chromatin interaction data in ES and iPS cells. Given that recent studies in cancer have shown enrichment of point mutations in chromatin regulators, we have also explored the impact of such mutations on the epigenetic landscape.

  1. Wang X*, Lee RS*, Alver BH*, Haswell JR, Wang S, Mieczkowski J, Drier Y, Gillespie SM, Archer TC, Wu JN, Tzvetkov EP, Troisi EC, Pomeroy SL, Biegel JA, Tolstorukov MY, Bernstein BE**, Park PJ**, Roberts CWM**. (2017) SMARCB1-mediated SWI/SNF complex function is essential for enhancer regulation. Nature Genetics 49:289-295.
  2. De Los Angeles A*, Ferrari F*, Fujiwara Y, Mathieu R, Lee S, Lee S, Tu HC, Ross S, Chou S, Nguyen M, Wu Z, Theunissen TW, Powell BE, Imsoonthornruksa S, Chen J, Borkent M, Krupalnik V, Lujan E, Wernig M, Hanna JH, Hochedlinger K, Pei D, Jaenisch R, Deng H, Orkin SH, Park PJ**, Daley GQ**. (2015) Failure to replicate the STAP cell phenomenon, Nature, 525:E6-9. 
  3. Choi J*, Lee S*, Mallard W, Clement K, Tagliazucchi GM, Lim H, Choi IY, Ferrari F, Tsankov AM, Pop R, Lee G, Rinn JL, Meissner A, Park PJ**, Hochedlinger K**. (2015) A comparison of genetically matched cell lines reveals the equivalence of human iPSCs and ESCs, Nature Biotechnology, 33:1173-81.
  4. Apostolou E, Ferrari F, Walsh RM, Bar-Nur O, Stadtfeld M, Cheloufi S, Stuart HT, Polo JM, Ohsumi TK, Borowsky ML, Kharchenko PV, Park PJ**, Hochedlinger K**. (2013) Genome-wide Chromatin Interactions of the Nanog Locus in Pluripotency, Differentiation, and Reprogramming, Cell Stem Cell, 12:699-712.
  5. Tolstorukov MY*, Sansam CG*, Lu P, Koellhoffer EC, Helming KC, Alver BH, Tillman EJ, Evans JA, Wilson BG, Park PJ**, Roberts CWM**. (2013) The Swi/Snf tumor suppressor complex establishes nucleosome occupancy at target promoters, Proc Natl Acad Sci USA, 110:10165-70.

Development of bioinformatics methods - We have developed software packages that are used by a multitude of researchers around the world, and we are continuing to improve them. These include a statistically rigorous algorithm for identifying activated pathways in expression data, various methods for detecting genomic alterations in cancer data, and a system for visualization of complex genomic datasets.

  1. Streit M*, Lex A*, Gratzl S, Partl C, Schmalstieg D, Pfister H, Park PJ**, Gehlenborg N**. (2014) Guided visual exploration of genomic stratifications in cancer, Nature Methods, 11:884-5. 
  2. Xi R, Hadjipanayis AG, Luquette LJ, Kim TM, Lee E, Zhang J, Johnson MD, Muzny DM, Wheeler DA, Gibbs RA, Kucherlapati R, Park PJ. (2011) Copy number variation detection in whole-genome sequencing data using the Bayesian information criterion, Proc Natl Acad Sci USA, 108:E1128-36.
  3. Tian L, Greenberg SA, Kong SW, Altschuler J, Kohane IS, Park PJ. (2005) Discovering statistically significant pathways in expression profiling studies, Proc Natl Acad Sci USA, 102:13544-9.
  4. Lai W, Johnson MJ, Park PJ. (2005) Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data, Bioinformatics 21:3763-70.
  5. Kim R, Park PJ. (2004) Improving identification of differentially expressed genes using public databases. Genome Biology 5:R70.