Please report all software questions and issues here.


Note: Our software packages are free for academic use. For commercial use, please contact the Harvard Office of Technology Development, with a cc to Dr. Park. Patents have been filed for some algorithms.


SigMA
For detection of homologous recombination deficiency (Signature 3) from panel, exome, or whole-genome data. See Gulhan et al, Nature Genetics, 2019.

MosaicForecast
For identification of somatic mutations from bulk whole-genome sequencing data. See Dou et al, in press.

LiRA
For identification of somatic mutations from single-cell whole-genome DNA sequencing data using phasing information (calls are made only for phasable regions). See Bohrson et al, Nature Genetics, 2019.

Single-cell genotyper for whole-genome sequencing data (calls are made across the whole genome). See Luquette et al, Nature Communications, 2019.

ShatterSeek
For identification of chromothripsis events from whole-genome sequencing data. See Cortés-Ciriano et al, Nature Genetics, in press.

Tibanna
For running genomic pipelines on Amazon Web Services (AWS). It supports CWL and WDL (with Docker), Snakemake (with conda), and custom Docker or shell commands. See Lee et al, Bioinformatics, 2019.

xTEA (available soon)
An improved version of our software package TEA for detecting transposable elements from whole-genome sequencing data.

HiNT
For identification of copy number and structural variants from Hi-C data.

Whole-genome analysis

Meerkat
This program identifies structural variants from whole-genome sequencing data using patterns of discordant read clusters.

Tea (Transposable Element Analyzer)
This is a tool for detection of retrotransposition events in whole-genome sequencing data.

BIC-seq2 (Bayesian Information Criterion-seq)
This program (R code) identifies copy number variants in whole-genome sequencing data. The original BIC-seq paper was published in 2011 (Xi et al, PNAS, 2011), and the updated version, BIC-seq2, was published in 2016 (Xi et al, Nucleic Acids Research, 2016). The main difference is that BIC-seq2 does not require a paired control, so it can be used for both somatic and germline CNV calls.
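To illustrate the general idea of BIC-based segmentation of read counts, here is a simplified sketch. It is not BIC-seq2's implementation. It assumes per-bin read counts plus a hypothetical vector of expected counts (e.g., from a control or a normalization model). Each segment is modeled with a single Poisson rate ratio, and adjacent segments are merged greedily for as long as merging lowers the Bayesian information criterion. The `penalty` argument is a hypothetical knob scaling the log(n) term.

```python
import math
from typing import List, Tuple


def seg_loglik(obs: float, exp: float) -> float:
    """Poisson log-likelihood of one segment at its MLE rate ratio obs/exp,
    dropping terms that are identical for every segmentation."""
    if obs == 0:
        return 0.0
    return obs * math.log(obs / exp) - obs


def bic_merge(counts: List[int], expected: List[float],
              penalty: float = 1.0) -> List[Tuple[int, int, float]]:
    """Greedily merge adjacent bins while doing so lowers the BIC.

    Returns (start, end_exclusive, observed/expected ratio) per segment.
    `penalty` is a hypothetical tuning knob, not a BIC-seq2 option.
    """
    if len(counts) != len(expected) or not counts:
        raise ValueError("counts and expected must be non-empty and equal length")
    log_n = math.log(len(counts))
    # Each segment is [start, end_exclusive, observed_sum, expected_sum].
    segs = [[i, i + 1, float(c), float(e)]
            for i, (c, e) in enumerate(zip(counts, expected))]
    while len(segs) > 1:
        best_delta, best_j = 0.0, -1
        for j in range(len(segs) - 1):
            a, b = segs[j], segs[j + 1]
            # Log-likelihood lost by forcing both segments to share one rate.
            loss = (seg_loglik(a[2], a[3]) + seg_loglik(b[2], b[3])
                    - seg_loglik(a[2] + b[2], a[3] + b[3]))
            # BIC change: +2*loss from the fit, -penalty*log(n) for one fewer parameter.
            delta = 2.0 * loss - penalty * log_n
            if delta < best_delta:
                best_delta, best_j = delta, j
        if best_j < 0:
            break  # no remaining merge lowers the BIC
        a, b = segs[best_j], segs[best_j + 1]
        segs[best_j:best_j + 2] = [[a[0], b[1], a[2] + b[2], a[3] + b[3]]]
    return [(s[0], s[1], s[2] / s[3]) for s in segs]
```

For example, `bic_merge([10] * 5 + [20] * 5, [10] * 10)` returns `[(0, 5, 1.0), (5, 10, 2.0)]`. The two-fold step is kept as a breakpoint, while bins within each flat stretch are merged.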


Antibody Validation Database
This site contains antibody validation data from the ENCODE, modENCODE, and Roadmap Epigenomics projects. See Egelhofer et al (2011) for details.

ChIP-seq analysis (SPP)
This R package by Peter Kharchenko implements tools for analyzing sequencing data from chromatin immunoprecipitation (ChIP-seq) experiments. It includes normalization of the binding profile, detection of enriched regions, and estimation of the read depth needed to saturate the detection of binding sites. See Kharchenko et al, Nature Biotechnology (2008) for details.

Repeat enrichment estimator
This tool measures the enrichment of annotated repeat types in ChIP-seq data.
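The page does not say how enrichment is computed. One common way to express it, assumed here for illustration only, is the fraction of ChIP reads falling in each repeat type divided by the corresponding fraction of input (control) reads. The function name and the pseudocount are hypothetical.

```python
from typing import Dict


def repeat_enrichment(chip_counts: Dict[str, int], chip_total: int,
                      input_counts: Dict[str, int], input_total: int,
                      pseudocount: float = 1.0) -> Dict[str, float]:
    """Ratio of the ChIP read fraction to the input read fraction per repeat type.

    A pseudocount keeps repeat types with zero reads in either sample finite.
    """
    if chip_total <= 0 or input_total <= 0:
        raise ValueError("library sizes must be positive")
    enrichment = {}
    for rtype in set(chip_counts) | set(input_counts):
        chip_frac = (chip_counts.get(rtype, 0) + pseudocount) / chip_total
        input_frac = (input_counts.get(rtype, 0) + pseudocount) / input_total
        enrichment[rtype] = chip_frac / input_frac
    return enrichment
```

Normalizing by each library's total read count lets ChIP and input libraries of different depths be compared directly. A value above 1 means the repeat type is over-represented in the ChIP sample.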

Quantized correlation coefficient (QCC)
This R package computes a robust measure of the reproducibility of ChIP-chip data.

ChIP-chip normalization
The R package (v1.0.1) is available here, along with an instruction sheet (included in the package).

RNA-seq data

This program quantifies gene and mRNA isoform expression levels from RNA-seq data. 

Data Visualization

Nozzle
A report generation toolkit for data analysis pipelines.

Also available on CRAN. The source code is available at

StratomeX: Visual Analysis of Large-Scale Heterogeneous Genomics Data for Cancer Subtype Characterization
This is a visual exploration tool for identifying and characterizing clusters and correlations in genomics data.

modENCODE chromatin data browser
This website allows one to explore the enrichment profiles of histone marks and chromosomal proteins in the Drosophila genome.


NGSCheckMate
Software for validating sample identity in next-generation sequencing studies, within and across data types. It works with a variety of data types, including whole-genome, whole-exome, RNA-seq, ChIP-seq, and targeted panel sequencing.
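As we understand the published approach, sample identity can be checked by comparing variant allele fractions (VAFs) at common SNP sites. Samples from the same individual share genotypes, so their VAFs are highly correlated. The sketch below only illustrates that idea and is not NGSCheckMate's code. The input layout and `min_depth` are assumptions, and NGSCheckMate's own depth-dependent decision thresholds are not reproduced.

```python
import math
from typing import Dict, Tuple

# Per-sample data: SNP site ID -> (alt-allele reads, total reads).
SiteCounts = Dict[str, Tuple[int, int]]


def vaf_correlation(a: SiteCounts, b: SiteCounts, min_depth: int = 10) -> float:
    """Pearson correlation of variant allele fractions at SNP sites that are
    covered by at least `min_depth` reads in both samples."""
    sites = [s for s in a.keys() & b.keys()
             if a[s][1] >= min_depth and b[s][1] >= min_depth]
    if len(sites) < 2:
        raise ValueError("need at least two shared, adequately covered sites")
    x = [a[s][0] / a[s][1] for s in sites]
    y = [b[s][0] / b[s][1] for s in sites]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var_x = sum((xi - mx) ** 2 for xi in x)
    var_y = sum((yi - my) ** 2 for yi in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("VAFs are constant in one sample; correlation undefined")
    return cov / math.sqrt(var_x * var_y)
```

Two libraries from the same person, even at different depths or from different assays, should score near 1. Unrelated samples typically score much lower.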

CGHweb
This tool provides an interface for applying several popular algorithms to segment a copy number profile from CGH (comparative genomic hybridization) data. It generates a heatmap panel of the segmented profiles for each method, as well as a consensus profile. The clickable heatmap can be moved along the chromosome and zoomed in or out. It also reports the running time of each algorithm and provides the numerical values of the segmented profiles for download.
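The description does not say how the consensus profile is built. One simple choice, assumed here purely for illustration, is to take the per-probe median of the segmented values across methods. The method names below are hypothetical placeholders.

```python
from statistics import median
from typing import Dict, List


def consensus_profile(profiles: Dict[str, List[float]]) -> List[float]:
    """Per-probe median of segmented log-ratio profiles from several methods.

    All profiles must cover the same probes in the same order.
    """
    lengths = {len(p) for p in profiles.values()}
    if len(lengths) != 1:
        raise ValueError("need at least one profile, all of equal length")
    n = lengths.pop()
    return [median(p[i] for p in profiles.values()) for i in range(n)]


segmented = {
    "method_a": [0.0, 0.0, 0.8],
    "method_b": [0.1, 0.0, 0.7],
    "method_c": [0.0, 0.2, 0.9],
}
print(consensus_profile(segmented))  # [0.0, 0.0, 0.8]
```

The median is robust to a single method that over- or under-calls a region, which is the usual motivation for a consensus across segmentation algorithms.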

nuScore
This tool estimates the affinity of the histone core for DNA and predicts nucleosome arrangement on a given sequence. The algorithm is based on the energy cost of imposing the deformations required to wrap DNA around the histone surface.

This tool is for integrating data from multiple generations of Affymetrix GeneChips. Matching probes using Affymetrix's "best-match" approach is inadequate for most analyses. This tool lets the user derive a list of similar probes based on user-specified criteria for probe sequence similarity and the minimum number of probe pairs required for each probe set.
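The two user-specified criteria described above can be sketched as follows. A probe from one chip generation counts as matching a probe from another if their sequences differ at no more than `max_mismatches` positions (Hamming distance). Two probe sets are considered matched if at least `min_pairs` such one-to-one probe matches exist. The function names, parameter names, and the greedy pairing are assumptions made for illustration, not the tool's actual procedure.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length probe sequences."""
    if len(a) != len(b):
        raise ValueError("probes must have equal length")
    return sum(ca != cb for ca, cb in zip(a, b))


def probe_sets_match(set_a: List[str], set_b: List[str],
                     max_mismatches: int = 1, min_pairs: int = 2) -> bool:
    """True if at least `min_pairs` probes in set_a can each be paired with a
    distinct probe in set_b within `max_mismatches` (greedy pairing)."""
    used = set()
    matched = 0
    for pa in set_a:
        for j, pb in enumerate(set_b):
            if j not in used and hamming(pa, pb) <= max_mismatches:
                used.add(j)  # each probe in set_b is paired at most once
                matched += 1
                break
    return matched >= min_pairs
```

Tightening `max_mismatches` or raising `min_pairs` yields fewer but more reliable cross-generation matches. Greedy pairing can undercount compared with an optimal assignment, which is acceptable for a sketch.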

sigPathway (for finding significant pathways from microarray data)
For the R package, please see the Supplementary Material page for the Tian et al article (Proc Natl Acad Sci, 2005).