Resources and challenges for integrative analysis of nuclear architecture data

doi:10.1016/j.gde.2020.12.009

Current Opinion in Genetics & Development

Volume 67, April 2021, Pages 103-110

A large amount of genomic data for profiling three-dimensional genome architecture has accumulated from large-scale consortium projects as well as from individual laboratories. In this review, we summarize recent landmark datasets and collections in the field. We describe the challenges in collection, annotation, and analysis of these data, particularly for integration of sequencing and microscopy data. We introduce efforts from consortia and independent groups to harmonize diverse datasets. As the resolution and throughput of sequencing and imaging technologies continue to increase, more efficient utilization and integration of collected data will be critical for a better understanding of nuclear architecture.

Introduction

The rapid pace of technology development in genome and epigenome profiling has led to major advances in our understanding of genome architecture and function. The initial techniques for measuring three-dimensional interactions among genomic loci based on chromosome conformation capture [1, 2, 3] have matured in terms of protocol optimization and have led to the development of numerous related techniques, for example, enriching for interactions with a protein of interest [4,5]. Aided by decreasing sequencing cost, researchers can now produce high-quality data that allow for more sensitive detection of long-range interactions.

In addition to published data from individual laboratories, the US National Institutes of Health (NIH) and agencies of other governments have launched consortium efforts to systematically profile epigenomes across many cell lines and tissue types, generating a large amount of data, including 3D interaction data. These data provide an opportunity for researchers to engage in integrative analysis that combines their own DNA, RNA, and/or local epigenetic data with publicly available 3D interaction data.

In this review, we will first summarize the resources currently available for those interested in 3D data analysis. Then, we will describe several challenges in collection, curation, and integration of data, as well as steps that can be taken to maximize the value of the data resources for the scientific community. We will focus on nuclear architecture data, but the issues and approaches are also relevant for other data types.

Landmark nuclear architecture datasets

Here, we highlight several datasets that represent key advances in terms of data quality and resolution. For chromosome conformation capture assays, advances in experimental protocols have improved the spatial resolution of long-range interactions. The first Hi-C maps with more than a billion reads, generated using in situ Hi-C, were published in 2014, providing resolution reaching 1 kb and identifying ∼10,000 loops anchored by CTCF [6]. A subsequent dataset with a similar resolution was in mouse, resolving dynamic

Databases for nuclear architecture and epigenomics data

The largest coordinated initiative focusing on 3D genome architecture is the 4D Nucleome Network (the authors are associated with the Data Coordination and Integration Center of this project) [19••]. This initiative aims to understand the principles underlying nuclear organization in space and time (hence the ‘4D’), the role of nuclear organization in gene expression and cellular function, and the impact of changing nuclear organization in various diseases. 4D Nucleome in Phase I (2015–2020)

Data visualization tools

Exploratory analysis of Hi-C or other 3D interaction data typically begins with visual inspection of the interaction matrix, which shows the estimated frequency of interactions between every pair of loci. These datasets are large: the minimum number of reads required for a Hi-C experiment in the 4D Nucleome consortium is 600 million, whereas a standard RNA-seq experiment may contain on the order of 10–40 million reads. Thus, a tool that allows visualization of the interaction maps quickly without having
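To make the notion of an interaction matrix concrete, the following is a minimal sketch of how contacts on a single chromosome might be binned into a symmetric count matrix. The function name, the input layout (pairs of base-pair positions), and the toy values are assumptions made for illustration; real pipelines use dedicated sparse, multi-resolution file formats and apply normalization, neither of which is shown here.

```python
def bin_contacts(contacts, bin_size, n_bins):
    """Count contacts per pair of fixed-size bins on one chromosome.

    contacts: iterable of (pos1, pos2) base-pair positions (hypothetical input).
    Returns an n_bins x n_bins symmetric matrix of raw counts as nested lists.
    """
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in contacts:
        i, j = pos1 // bin_size, pos2 // bin_size
        if i >= n_bins or j >= n_bins:
            continue  # ignore contacts that fall outside the binned region
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror off-diagonal counts to keep symmetry
    return matrix


# Toy example: four contacts in a 1 kb region split into 250 bp bins.
toy_contacts = [(100, 250), (120, 900), (950, 130), (400, 420)]
toy_matrix = bin_contacts(toy_contacts, bin_size=250, n_bins=4)
for row in toy_matrix:
    print(row)
```

A dense matrix like this grows quadratically with the number of bins, which is one reason why fast viewers for real datasets typically rely on sparse storage and precomputed resolutions rather than on nested lists.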

Challenges and best practices in the analysis of chromatin interaction data

To ensure the validity of a study based on chromatin interaction data, evaluation of data quality and reproducibility is essential. In addition to the common statistics on read alignments, several additional measures specific to 3D data are often informative, such as the fraction of valid pairs, the ratio between intra-chromosomal and inter-chromosomal contacts, and the fraction of short-range compared to long-range interactions [38]. Many Hi-C analysis pipelines, such as HiC-Pro [39], generate
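As an illustration of the 3D-specific quality measures named above, the sketch below computes the fraction of valid pairs, the ratio of intra-chromosomal to inter-chromosomal contacts, and the fraction of long-range intra-chromosomal contacts from a list of read pairs. The record layout, the `valid` flag, and the 20 kb distance cutoff are assumptions chosen for this example; they are not the output format or thresholds of HiC-Pro or any other specific pipeline.

```python
def hic_qc_summary(pairs, long_range_cutoff=20_000):
    """Summarize simple Hi-C quality measures from read-pair records.

    pairs: list of dicts with keys chrom1, pos1, chrom2, pos2, valid
           (a hypothetical layout; 'valid' is assumed to be set upstream).
    long_range_cutoff: distance in bp separating short- from long-range
           intra-chromosomal contacts (an assumed, adjustable threshold).
    """
    valid = [p for p in pairs if p["valid"]]
    cis = [p for p in valid if p["chrom1"] == p["chrom2"]]
    trans_count = len(valid) - len(cis)
    long_cis = [p for p in cis
                if abs(p["pos2"] - p["pos1"]) >= long_range_cutoff]
    return {
        "valid_fraction": len(valid) / len(pairs) if pairs else 0.0,
        "cis_trans_ratio": len(cis) / trans_count if trans_count else float("inf"),
        "long_range_cis_fraction": len(long_cis) / len(cis) if cis else 0.0,
    }


# Toy input: five read pairs, one of which was flagged as invalid.
toy_pairs = [
    {"chrom1": "chr1", "pos1": 1_000, "chrom2": "chr1", "pos2": 1_500, "valid": True},
    {"chrom1": "chr1", "pos1": 1_000, "chrom2": "chr1", "pos2": 51_000, "valid": True},
    {"chrom1": "chr2", "pos1": 5_000, "chrom2": "chr2", "pos2": 35_000, "valid": True},
    {"chrom1": "chr1", "pos1": 2_000, "chrom2": "chr3", "pos2": 9_000, "valid": True},
    {"chrom1": "chr1", "pos1": 3_000, "chrom2": "chr1", "pos2": 3_100, "valid": False},
]
qc = hic_qc_summary(toy_pairs)
print(qc)
```

Comparing such summaries across replicates and against values reported for similar experiments is usually more informative than interpreting any single number in isolation.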

Opportunities and challenges for data reuse

The key datasets highlighted above and the hundreds of other published datasets present many opportunities for deriving new insights without the need to perform expensive experiments. For instance, a cancer biologist may have found a recurrent non-coding mutation in colorectal cancers that, based on the histone mark H3K27ac or H3K4me1, appears to be in an enhancer region. To identify which genes may be regulated by the enhancer, she could generate her own data. Alternatively, she could first
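One way such a question could be approached with existing data is sketched below: given a list of published chromatin loops, report the genes whose promoters fall in the anchor opposite to the one containing the variant. All coordinates, gene names, and data layouts here are hypothetical placeholders, and the overlap logic is deliberately simple; it is an illustration of the idea rather than a validated method for assigning enhancers to genes.

```python
def overlaps(a, b):
    """True if two half-open intervals (chrom, start, end) share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def candidate_target_genes(variant, loops, promoters):
    """List genes whose promoter sits in the loop anchor facing the variant.

    variant:   (chrom, start, end) interval of the non-coding mutation.
    loops:     list of (anchor1, anchor2), each anchor a (chrom, start, end).
    promoters: dict mapping gene name to a (chrom, start, end) interval.
    """
    hits = set()
    for anchor1, anchor2 in loops:
        # Check both orientations, since the variant may lie in either anchor.
        for near, far in ((anchor1, anchor2), (anchor2, anchor1)):
            if overlaps(variant, near):
                hits.update(gene for gene, region in promoters.items()
                            if overlaps(region, far))
    return sorted(hits)


# Hypothetical inputs for illustration only.
variant = ("chr8", 1_000_500, 1_000_501)
loops = [
    (("chr8", 1_000_000, 1_005_000), ("chr8", 1_500_000, 1_505_000)),
    (("chr8", 2_000_000, 2_005_000), ("chr8", 2_300_000, 2_305_000)),
]
promoters = {
    "GENE_A": ("chr8", 1_502_000, 1_503_000),
    "GENE_B": ("chr8", 2_301_000, 2_302_000),
}
targets = candidate_target_genes(variant, loops, promoters)
print(targets)
```

In practice the loops would need to come from a relevant cell type and be lifted to the same genome build as the variant, which is where the metadata issues discussed in this review become important.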

Importance of metadata collection for reproducible science

To take full advantage of the existing data, proper metadata (‘data about data’) must be available at the repositories. Lack of proper metadata is one of the main factors that hinder reproducibility of published results. To increase scientific rigor and transparency, NIH has implemented policies that emphasize the ‘FAIR’ principle: findability, accessibility, interoperability, and reusability [48••]. The idea behind this principle is to encourage data producers and publishers to provide
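A simple, automatable step toward this goal is to check each submitted record for required metadata fields before it is accepted. The sketch below shows the idea; the field names are hypothetical and are not taken from the schema of the 4DN, ENCODE, or any other repository.

```python
# Hypothetical list of required fields; a real repository defines its own schema.
REQUIRED_FIELDS = (
    "assay",
    "organism",
    "biosample",
    "genome_build",
    "protocol_reference",
)


def missing_metadata(record):
    """Return the required fields that are absent or blank in a record."""
    return [
        field for field in REQUIRED_FIELDS
        if not str(record.get(field) or "").strip()
    ]


# Example record with one blank field and one field left out entirely.
example_record = {
    "assay": "in situ Hi-C",
    "organism": "human",
    "biosample": "GM12878",
    "genome_build": "",
}
problems = missing_metadata(example_record)
print(problems)
```

Presence checks like this catch only the most basic omissions; making values interoperable additionally requires controlled vocabularies, which this sketch does not attempt.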

Collecting imaging data

Collection, curation, and re-analysis of microscopy data present additional challenges to the ones we have outlined above for genomic assays. Whereas sequencing experiments have common data formats (e.g. FASTQ) and common coordinate systems (genome builds), microscopy experiments are diverse in many aspects, including image resolutions, biological sample preparation methods, imaging modalities, and data formats. Imaging experiments are sometimes performed with extensive protocol variations even

Conclusion

In recent years, we have seen major advances in our understanding of nuclear architecture, aided by the increase in the resolution and throughput with which we can probe chromatin organization. An important byproduct of these advances is the set of high-quality datasets that have been generated. We have highlighted some datasets that provide the highest resolutions of genomic interactions to date. We have described how data portals such as those by 4DN and ENCODE increase the utility of datasets with

Conflict of interest statement

Nothing declared.

References and recommended reading

Papers of particular interest, published within the period of review, have been highlighted as:

• of special interest
•• of outstanding interest

Acknowledgement

This work was supported by the National Institutes of Health (U01CA200059).

References (50)

S.S.P. Rao et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell (2014)

B. Bonev et al. Multiscale 3D genome rewiring during mouse neural development. Cell (2017)

T.H.S. Hsieh et al. Resolving the 3D landscape of transcription-linked mammalian chromatin folding. Mol Cell (2020)

T. Nagano et al. Cell-cycle dynamics of chromosomal organization at single-cell resolution. Nature (2017)

H.D. Ou et al. ChromEMT: visualizing 3D chromatin structure and compaction in interphase and mitotic cells. Science (2017)

D. Yang et al. 3DIV: a 3D-genome interaction viewer and database. Nucleic Acids Res (2018)

Y. Wang et al. The 3D genome browser: a web-based browser for visualizing 3D genome organization and long-range chromatin interactions. Genome Biol (2018)

E. Yaffe et al. Probabilistic modeling of Hi-C contact maps eliminates systematic biases to characterize global chromosomal architecture. Nat Genet (2011)

B.R. Lajoie et al. The hitchhiker’s guide to Hi-C analysis: practical guidelines. Methods (2015)

C.A. Horton et al. GiniQC: a measure for quantifying noise in single-cell Hi-C data. Bioinformatics (2020)

J. Dekker et al. Capturing chromosome conformation. Science (2002)

E. Lieberman-Aiden et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science (2009)

T. Nagano et al. Single-cell Hi-C reveals cell-to-cell variability in chromosome structure. Nature (2013)

M.J. Fullwood et al. ChIP-based methods for the identification of long-range chromatin interactions. J Cell Biochem (2009)

M.R. Mumbach et al. HiChIP: efficient and sensitive analysis of protein-directed genome architecture. Nat Methods (2016)

Y. Zhang et al. Transcriptionally active HERV-H retrotransposons demarcate topologically associating domains in human pluripotent stem cells. Nat Genet (2019)

N. Krietenstein et al. Ultrastructural details of mammalian chromosome architecture. Mol Cell (2020)

I.M. Flyamer et al. Single-nucleus Hi-C reveals unique chromatin reorganization at oocyte-to-zygote transition. Nature (2017)

L. Tan et al. Three-dimensional genome structures of single diploid human cells. Science (2018)

E.H. Finn et al. Extensive heterogeneity and intrinsic variation in spatial genome organization. Cell (2019)

G. Nir et al. Walking along chromosomes with super-resolution imaging, contact maps, and integrative modeling. PLoS Genet (2018)

S. Wang et al. Spatial organization of chromatin domains and compartments in single chromosomes. Science (2016)

H.Q. Nguyen et al. 3D mapping and accelerated super-resolution imaging of the human genome using in situ sequencing. Nat Methods (2020)

J. Dekker et al. The 4D nucleome project. Nature (2017)

F. Abascal et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature (2020)
