Quality control

NGSCheckMate
Software for validating sample identity in next-generation sequencing studies, within and across data types. It works with a variety of data types, including whole-genome, whole-exome, RNA-seq, ChIP-seq, and targeted panel sequencing.

Whole-genome analysis

Meerkat
This program identifies structural variations from whole-genome sequencing data using patterns of discordant read clusters.
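
As a rough illustration of what "discordant read clusters" can mean, the sketch below flags read pairs whose insert size falls outside an expected range and groups nearby ones on the same chromosome. It is a minimal sketch under stated assumptions, not Meerkat's actual algorithm: the `ReadPair` fields, the insert-size bounds, and the `max_gap` and `min_support` parameters are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReadPair:
    chrom: str
    start: int        # leftmost mapped position of the pair (hypothetical field)
    insert_size: int  # reference distance spanned by the pair (hypothetical field)


def is_discordant(pair: ReadPair, min_insert: int = 200, max_insert: int = 600) -> bool:
    """A pair is treated as discordant if its insert size is outside the expected range."""
    return pair.insert_size < min_insert or pair.insert_size > max_insert


def cluster_discordant(pairs: List[ReadPair], max_gap: int = 500,
                       min_support: int = 2) -> List[List[ReadPair]]:
    """Group discordant pairs on the same chromosome whose starts are within max_gap."""
    discordant = sorted((p for p in pairs if is_discordant(p)),
                        key=lambda p: (p.chrom, p.start))
    clusters: List[List[ReadPair]] = []
    current: List[ReadPair] = []
    for p in discordant:
        # Close the current cluster when the chromosome changes or the gap is too large.
        if current and (p.chrom != current[-1].chrom
                        or p.start - current[-1].start > max_gap):
            if len(current) >= min_support:
                clusters.append(current)
            current = []
        current.append(p)
    if len(current) >= min_support:
        clusters.append(current)
    return clusters
```

Requiring a minimum number of supporting pairs per cluster is a common way to suppress isolated mapping artifacts; the thresholds here are placeholders and would need tuning to the library's insert-size distribution.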

Tea (Transposable Element Analyzer)
A tool for detecting retrotransposition events in whole-genome sequencing data.

BIC-seq (Bayesian Information Criteria-seq)
This program (R code) identifies copy number variants in whole-genome sequencing data. The original paper was published in 2011 (Xi et al, PNAS, 2011). An updated version, BIC-seq2, was published in 2016 (Xi et al, Nucleic Acids Research, 2016). The main difference is that BIC-seq2 does not require a paired control, so it can be used for both somatic and germline CNV calls.
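
To give a feel for how a Bayesian information criterion can drive segmentation, the sketch below compares the BIC of keeping two adjacent bins separate against the BIC of merging them, using tumor and control read counts per bin. This is a generic illustration under stated assumptions, not the published BIC-seq implementation: the Bernoulli read model, the function names, and the `penalty` tuning parameter are all chosen here for illustration, and each bin is assumed to contain at least one read.

```python
import math
from typing import List, Tuple

Bin = Tuple[int, int]  # (tumor_reads, control_reads); hypothetical representation


def bernoulli_loglik(tumor: int, control: int) -> float:
    """Maximized log-likelihood when each read is tumor with probability tumor/total."""
    total = tumor + control
    loglik = 0.0
    for count in (tumor, control):
        if count > 0:  # treat 0 * log(0) as 0
            loglik += count * math.log(count / total)
    return loglik


def bic(segments: List[Bin], penalty: float = 1.0) -> float:
    """BIC = -2 * log-likelihood + penalty * (number of segments) * log(total reads)."""
    n_reads = sum(t + c for t, c in segments)
    loglik = sum(bernoulli_loglik(t, c) for t, c in segments)
    return -2.0 * loglik + penalty * len(segments) * math.log(n_reads)


def should_merge(left: Bin, right: Bin, penalty: float = 1.0) -> bool:
    """Merge two adjacent bins if the merged model has a BIC no worse than the split one."""
    merged = (left[0] + right[0], left[1] + right[1])
    return bic([merged], penalty) <= bic([left, right], penalty)
```

Bins with similar tumor-to-control ratios tend to be merged because the extra parameter is not worth its penalty, while bins with clearly different ratios stay separate and mark a candidate breakpoint.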

Gene expression

This program quantifies gene and mRNA isoform expression levels from RNA-seq data. 


Antibody Validation Database
This site contains antibody validation data from the ENCODE, modENCODE, and Epigenome Roadmap projects. See Egelhofer et al (2011) for details.

ChIP-seq analysis (SPP)
This R package by Peter Kharchenko implements tools for analysis of sequencing data from chromatin immunoprecipitation experiments. It includes normalization of the binding profile, detection of enriched regions, and an estimate of read depth needed to achieve saturation of binding sites. See Kharchenko et al, Nature Biotechnology (2008) for details.

Repeat enrichment estimator
This tool measures the enrichment of annotated repeat types in ChIP-seq data.
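
One simple way to think about repeat enrichment is as the log2 ratio between the fraction of ChIP reads and the fraction of input reads assigned to each repeat type. The sketch below shows that calculation only; it is an assumption made for illustration, not the tool's documented method, and the function name, the dictionary inputs, and the `pseudocount` parameter are hypothetical.

```python
import math
from typing import Dict


def repeat_enrichment(chip_counts: Dict[str, int],
                      input_counts: Dict[str, int],
                      pseudocount: float = 1.0) -> Dict[str, float]:
    """Return log2(ChIP fraction / input fraction) for every repeat type.

    A positive pseudocount is needed whenever a repeat type can have zero reads
    in either sample; otherwise the ratio is undefined.
    """
    repeat_types = sorted(set(chip_counts) | set(input_counts))
    chip_total = sum(chip_counts.values()) + pseudocount * len(repeat_types)
    input_total = sum(input_counts.values()) + pseudocount * len(repeat_types)
    scores = {}
    for repeat_type in repeat_types:
        chip_frac = (chip_counts.get(repeat_type, 0) + pseudocount) / chip_total
        input_frac = (input_counts.get(repeat_type, 0) + pseudocount) / input_total
        scores[repeat_type] = math.log2(chip_frac / input_frac)
    return scores
```

Working with fractions rather than raw counts keeps the score comparable between samples sequenced to different depths.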

Quantized correlation coefficient (QCC)
This R package computes a robust measurement of the reproducibility of ChIP-chip data.

ChIP-chip normalization
The R package is available here (v1.0.1). Here is the instruction sheet (part of the package).

Data Visualization

Nozzle
A report generation toolkit for data analysis pipelines.

Also available at CRAN: The source code is available at

StratomeX: Visual Analysis of Large-Scale Heterogeneous Genomics Data for Cancer Subtype
This is a visual exploration tool for identification and characterization of clusters and correlations in genomics data.

modENCODE chromatin data browser
This website allows one to explore the enrichment profiles of histone marks and chromosomal proteins in the Drosophila genome.


CGHweb
This tool provides an interface to apply several popular algorithms to segment a copy-number profile from CGH (comparative genomic hybridization) data. It generates a heatmap panel of the segmented profiles for each method as well as a consensus profile. The clickable heatmap can be moved along the chromosome and zoomed in or out. It also displays the time that each algorithm took and provides numerical values of the segmented profiles for download.

nuScore
This tool estimates the affinity of the histone core for DNA and predicts nucleosome arrangement on a given sequence. The algorithm is based on assessing the energy cost of imposing the deformations required to wrap DNA around the histone surface.

This tool is for integrating data from multiple generations of Affymetrix GeneChips. Matching probes based on the Affymetrix "best-match" is inadequate for most analyses. This tool allows the user to derive a list of similar probes based on user-specified criteria for probe sequence similarity and the minimum number of probe pairs needed for each probe set.
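
The two criteria described above (a sequence-similarity threshold and a minimum number of matching probe pairs per probe set) can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's actual procedure: similarity is taken here to be the fraction of identical bases between equal-length probes, pairing is greedy and one-to-one, and all function and parameter names are hypothetical.

```python
from typing import List


def similarity(probe_a: str, probe_b: str) -> float:
    """Fraction of identical positions; probes of different or zero length score 0."""
    if not probe_a or len(probe_a) != len(probe_b):
        return 0.0
    matches = sum(x == y for x, y in zip(probe_a, probe_b))
    return matches / len(probe_a)


def count_matching_pairs(set_a: List[str], set_b: List[str],
                         min_similarity: float) -> int:
    """Greedily pair each probe in set_a with one unused, sufficiently similar probe in set_b."""
    used = set()
    n_pairs = 0
    for probe_a in set_a:
        for j, probe_b in enumerate(set_b):
            if j not in used and similarity(probe_a, probe_b) >= min_similarity:
                used.add(j)
                n_pairs += 1
                break
    return n_pairs


def probe_sets_match(set_a: List[str], set_b: List[str],
                     min_similarity: float = 1.0, min_pairs: int = 3) -> bool:
    """Two probe sets are considered comparable if enough probe pairs pass the threshold."""
    return count_matching_pairs(set_a, set_b, min_similarity) >= min_pairs
```

Raising `min_similarity` or `min_pairs` yields a smaller but more reliable list of probe sets that can be compared across chip generations.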

sigPathway (for finding significant pathways from microarray data)
For the R package, please see the Supplementary Material page for the Tian et al article (Proc Natl Acad Sci, 2005).