  • Review Article

Computational analysis of cancer genome sequencing data

Abstract

Distilling biologically meaningful information from cancer genome sequencing data requires comprehensive identification of somatic alterations using rigorous computational methods. As the amount and complexity of sequencing data have increased, so has the number of tools for analysing them. Here, we describe the main steps involved in the bioinformatic analysis of cancer genomes, review key algorithmic developments and highlight popular tools and emerging technologies. These tools include those that identify point mutations, copy number alterations, structural variations and mutational signatures in cancer genomes. We also discuss issues in experimental design, the strengths and limitations of sequencing modalities and methodological challenges for the future.

Fig. 1: Workflow for cancer genome analysis.
Fig. 2: Identifying variants and artefacts in sequencing reads.
Fig. 3: Mutational signature analysis of cancer genomes.
Fig. 4: Impact of different copy number alterations on read depth and BAF profiles.
Fig. 5: Examples of detecting somatic structural variants from patterns of paired-end reads.

References

  1. Bailey, M. H. et al. Comprehensive characterization of cancer driver genes and mutations. Cell 173, 371–385.e18 (2018). This study reports the analysis of nearly 10,000 exomes from TCGA, identifying ~300 cancer driver genes and finding that more than half of the samples have potentially actionable events.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  2. Campbell, P. J. et al. Pan-cancer analysis of whole genomes. Nature 578, 82–93 (2020). This is the flagship paper for an international effort to analyse WGS data from 2,658 primary tumours, describing the consortium’s variant calling steps as well as reporting the landscape of somatic mutation especially for structural variation.

    Article  CAS  Google Scholar 

  3. Martínez-Jiménez, F. et al. A compendium of mutational cancer driver genes. Nat. Rev. Cancer 20, 555–572 (2020).

    Article  CAS  PubMed  Google Scholar 

  4. Rheinbay, E. et al. Analyses of non-coding somatic drivers in 2,658 cancer whole genomes. Nature 578, 102–111 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  5. Ma, X. et al. Pan-cancer genome and transcriptome analyses of 1,699 paediatric leukaemias and solid tumours. Nature 555, 371–376 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  6. Gröbner, S. N. et al. The landscape of genomic alterations across childhood cancers. Nature 555, 321–327 (2018).

    Article  CAS  PubMed  Google Scholar 

  7. Zehir, A. et al. Mutational landscape of metastatic cancer revealed from prospective clinical sequencing of 10,000 patients. Nat. Med. 23, 703–713 (2017). This study describes the analysis of panel sequencing data from a prospective clinical sequencing initiative to demonstrate the clinical utility of tumour molecular profiling.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  8. Priestley, P. et al. Pan-cancer whole-genome analyses of metastatic solid tumours. Nature 575, 210–216 (2019). This paper reports the mutational landscape of >2,500 metastatic tumours, finding genetic variants that may be used to stratify patients towards therapies for >60% of the cases.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  9. Pleasance, E. et al. Pan-cancer analysis of advanced patient tumors reveals interactions between therapy and genomic landscapes. Nat. Cancer 1, 452–468 (2020).

    Article  CAS  PubMed  Google Scholar 

  10. Koche, R. P. et al. Extrachromosomal circular DNA drives oncogenic genome remodeling in neuroblastoma. Nat. Genet. 52, 29–34 (2020).

    Article  CAS  PubMed  Google Scholar 

  11. Andor, N. et al. Pan-cancer analysis of the extent and consequences of intratumor heterogeneity. Nat. Med. 22, 105–113 (2016).

    Article  CAS  PubMed  Google Scholar 

  12. Weghorn, D. & Sunyaev, S. Bayesian inference of negative and positive selection in human cancers. Nat. Genet. 49, 1785–1788 (2017).

    Article  CAS  PubMed  Google Scholar 

  13. Marty, R. et al. MHC-I genotype restricts the oncogenic mutational landscape. Cell 171, 1272–1283.e15 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  14. Martincorena, I. et al. Universal patterns of selection in cancer and somatic tissues. Cell 171, 1029–1041.e21 (2017). This paper examines the selection pressures on somatic single-nucleotide mutations, finding near-complete absence of negative selection.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  15. Hu, Z. et al. Quantitative evidence for early metastatic seeding in colorectal cancer. Nat. Genet. 51, 1113–1122 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  16. Zhang, X. & Meyerson, M. Illuminating the noncoding genome in cancer. Nat. Cancer 1, 864–872 (2020).

    Article  PubMed  Google Scholar 

  17. McGranahan, N. & Swanton, C. Clonal heterogeneity and tumor evolution: past, present, and the future. Cell 168, 613–628 (2017).

    Article  CAS  PubMed  Google Scholar 

  18. Li, Y. et al. Patterns of somatic structural variation in human cancer genomes. Nature 578, 112–121 (2020). This study describes comprehensive identification and classification of SVs based on WGS data from >2,600 tumours, and reports 16 structural variation signatures and their characteristics.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  19. Cortés-Ciriano, I. et al. Comprehensive analysis of chromothripsis in 2,658 human cancers using whole-genome sequencing. Nat. Genet. 52, 331–341 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  20. Sieverling, L. et al. Genomic footprints of activated telomere maintenance mechanisms in cancer. Nat. Commun. 11, 733 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  21. Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  22. Gerstung, M. et al. The evolutionary history of 2,658 cancers. Nature 578, 122–128 (2020). This paper reports a large-scale analysis of the timing of point mutations and CNAs, and describes the common trajectories of tumour development across multiple tumour types.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  23. Calabrese, C. et al. Genomic basis for RNA alterations in cancer. Nature 578, 129–136 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  24. Yuan, Y. et al. Comprehensive molecular characterization of mitochondrial genomes in human cancers. Nat. Genet. 52, 342–352 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  25. Rodriguez-Martin, B. et al. Pan-cancer analysis of whole genomes identifies driver rearrangements promoted by LINE-1 retrotransposition. Nat. Genet. 52, 306–319 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  26. Zapatka, M. et al. The landscape of viral associations in human cancers. Nat. Genet. 52, 320–330 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  27. Meyerson, M., Gabriel, S. & Getz, G. Advances in understanding cancer genomes through second-generation sequencing. Nat. Rev. Genet. 11, 685–696 (2010).

    Article  CAS  PubMed  Google Scholar 

  28. Ding, L., Wendl, M. C., McMichael, J. F. & Raphael, B. J. Expanding the computational toolbox for mining cancer genomes. Nat. Rev. Genet. 15, 556–570 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  29. Ho, S. S., Urban, A. E. & Mills, R. E. Structural variation in the sequencing era. Nat. Rev. Genet. 21, 171–189 (2020).

    Article  CAS  PubMed  Google Scholar 

  30. Dixon, J. R. et al. Integrative detection and analysis of structural variation in cancer genomes. Nat. Genet. 50, 1388–1398 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  31. Liu, X. S. & Mardis, E. R. Applications of immunogenomics to cancer. Cell 168, 600–612 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  32. Gawad, C., Koh, W. & Quake, S. R. Single-cell genome sequencing: current state of the science. Nat. Rev. Genet. 17, 175–188 (2016).

    Article  CAS  PubMed  Google Scholar 

  33. Castro, L. N. G., Tirosh, I. & Suvà, M. L. Decoding cancer biology one cell at a time. Cancer Discov. 11, 960–970 (2021).

    Article  CAS  PubMed Central  Google Scholar 

  34. Lim, B., Lin, Y. & Navin, N. Advancing cancer research and medicine with single-cell genomics. Cancer Cell 37, 456–470 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  35. Chakravarty, D. & Solit, D. B. Clinical cancer genomic profiling. Nat. Rev. Genet. 22, 483–501 (2021).

    Article  CAS  PubMed  Google Scholar 

  36. Logsdon, G. A., Vollger, M. R. & Eichler, E. E. Long-read human genome sequencing and its applications. Nat. Rev. Genet. 21, 597–614 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  37. Amarasinghe, S. L. et al. Opportunities and challenges in long-read sequencing data analysis. Genome Biol. 21, 1–16 (2020).

    Article  Google Scholar 

  38. Cescon, D. W., Bratman, S. V., Chan, S. M. & Siu, L. L. Circulating tumor DNA and liquid biopsy in oncology. Nat. Cancer 1, 276–290 (2020).

    Article  CAS  PubMed  Google Scholar 

  39. Cock, P. J. A., Fields, C. J., Goto, N., Heuer, M. L. & Rice, P. M. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res. 38, 1767–1771 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  40. Fritz, M. H. Y., Leinonen, R., Cochrane, G. & Birney, E. Efficient storage of high throughput DNA sequencing data using reference-based compression. Genome Res. 21, 734–740 (2011).

    Article  CAS  Google Scholar 

  41. Costello, M. et al. Characterization and remediation of sample index swaps by non-redundant dual indexing on massively parallel sequencing platforms. BMC Genomics 19, 332 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  42. Andrews, S. FastQC: a quality control tool for high throughput sequence data. Babraham Bioinformatics http://www.bioinformatics.babraham.ac.uk/projects/fastqc (2010).

  43. Rausch, T., Hsi-Yang Fritz, M., Korbel, J. O. & Benes, V. Alfred: interactive multi-sample BAM alignment statistics, feature counting and feature annotation for long- and short-read sequencing. Bioinformatics 35, 2489–2491 (2019).

    Article  CAS  PubMed  Google Scholar 

  44. Okonechnikov, K., Conesa, A. & García-Alcalde, F. Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics 32, 292–294 (2016).

    Article  CAS  PubMed  Google Scholar 

  45. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  46. Garrison, E. et al. Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nat. Biotechnol. 36, 875–879 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  47. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).

    Article  CAS  PubMed  Google Scholar 

  48. Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357–360 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  49. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  50. Schneider, V. A. et al. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Res. 27, 849–864 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  51. Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).

    Article  CAS  PubMed  Google Scholar 

  52. Gao, G. F. et al. Before and after: comparison of legacy and harmonized TCGA Genomic Data Commons’ data. Cell Syst. 9, 24–34.e10 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  53. Logsdon, G. A. et al. The structure, function and evolution of a complete human chromosome 8. Nature 593, 101–107 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  54. Martincorena, I. & Campbell, P. J. Somatic mutation in cancer and normal cells. Science 349, 1483–1489 (2015).

    Article  CAS  PubMed  Google Scholar 

  55. Cortes-Ciriano, I., Lee, S., Park, W.-Y. Y., Kim, T.-M. M. & Park, P. J. A molecular portrait of microsatellite instability across multiple cancers. Nat. Commun. 8, 15180 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  56. Griffith, M. et al. Optimizing cancer genome sequencing and analysis. Cell Syst. 1, 210–223 (2015). This study examines the impact of different experimental and computational strategies in characterization of a complex tumour and provides a resource of validation data for 200,000 SNVs.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  57. Xu, C. A review of somatic single nucleotide variant calling algorithms for next-generation sequencing data. Comput. Struct. Biotechnol. J. 16, 15–24 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  58. Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31, 213–219 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  59. Jones, D. et al. cgpCaVEManWrapper: simple execution of CaVEMan in order to detect somatic single nucleotide variants in NGS data. Curr. Protoc. Bioinformatics 56, 15.10.1–15.10.18 (2016).

    Article  Google Scholar 

  60. Kim, S. et al. Strelka2: fast and accurate calling of germline and somatic variants. Nat. Methods 15, 591–594 (2018).

    Article  CAS  PubMed  Google Scholar 

  61. Lai, Z. et al. VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research. Nucleic Acids Res. 44, e108 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  62. Fan, Y. et al. MuSE: accounting for tumor heterogeneity using a sample-specific error model improves sensitivity and specificity in mutation calling from sequencing data. Genome Biol. 17, 178 (2016).

    Article  PubMed  PubMed Central  Google Scholar 

  63. Sherry, S. T. et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, 308–311 (2001).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  64. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  65. Jones, S. et al. Personalized genomic analyses for cancer mutation discovery and interpretation. Sci. Transl Med. 7, 283ra53 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  66. Forbes, S. A. et al. COSMIC: somatic cancer genetics at high-resolution. Nucleic Acids Res. 45, D777–D783 (2017).

    Article  CAS  PubMed  Google Scholar 

  67. O’Rawe, J. et al. Low concordance of multiple variant-calling pipelines: practical implications for exome and genome sequencing. Genome Med. 5, 28 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  68. Krøigård, A. B., Thomassen, M., Lænkholm, A. V., Kruse, T. A. & Larsen, M. J. Evaluation of nine somatic variant callers for detection of somatic mutations in exome and targeted deep sequencing data. PLoS ONE 11, e0151664 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  69. Wang, Q. et al. Detecting somatic point mutations in cancer genome sequencing data: a comparison of mutation callers. Genome Med. 5, 91 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  70. Ewing, A. D. et al. Combining tumor genome simulation with crowdsourcing to benchmark somatic single-nucleotide-variant detection. Nat. Methods 12, 623–630 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  71. Alioto, T. S. et al. A comprehensive assessment of somatic mutation detection in cancer using whole-genome sequencing. Nat. Commun. 6, 10001 (2015).

    Article  CAS  PubMed  Google Scholar 

  72. Ellrott, K. et al. Scalable open science approach for mutation calling of tumor exomes using multiple genomic pipelines. Cell Syst. 6, 271–281.e7 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  73. Callari, M. et al. Intersect-then-combine approach: improving the performance of somatic variant calling in whole exome sequencing data using multiple aligners and callers. Genome Med. 9, 35 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  74. Huang, W. et al. SMuRF: portable and accurate ensemble prediction of somatic mutations. Bioinformatics 35, 3157–3159 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  75. Wood, D. E. et al. A machine learning approach for somatic mutation discovery. Sci. Transl Med. 10, eaar7939 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  76. Ding, J. et al. Feature-based classifiers for somatic mutation detection in tumour-normal paired sequencing data. Bioinformatics 28, 167–175 (2012).

    Article  CAS  PubMed  Google Scholar 

  77. Cantarel, B. L. et al. BAYSIC: a Bayesian method for combining sets of genome variants with improved specificity and sensitivity. BMC Bioinformatics 15, 104 (2014).

    Article  PubMed  PubMed Central  Google Scholar 

  78. Fang, L. T. et al. An ensemble approach to accurately detect somatic mutations using SomaticSeq. Genome Biol. 16, 197 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  79. Poplin, R. et al. A universal SNP and small-indel variant caller using deep neural networks. Nat. Biotechnol. 36, 983 (2018).

    Article  CAS  PubMed  Google Scholar 

  80. Sahraeian, S. M. E. et al. Deep convolutional neural networks for accurate somatic mutation detection. Nat. Commun. 10, 1041 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  81. Torracinta, R. et al. Adaptive somatic mutations calls with deep learning and semi-simulated data. Preprint at bioRxiv https://doi.org/10.1101/079087 (2016).

    Article  Google Scholar 

  82. Dou, Y. et al. Accurate detection of mosaic variants in sequencing data without matched controls. Nat. Biotechnol. 38, 314–319 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  83. Li, H. & Wren, J. Toward better understanding of artifacts in variant calling from high-coverage samples. Bioinformatics 30, 2843–2851 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  84. Poplin, R. et al. Scaling accurate genetic variant discovery to tens of thousands of samples. Preprint at bioRxiv https://doi.org/10.1101/201178 (2017).

    Article  Google Scholar 

  85. Wala, J. A. et al. SvABA: genome-wide detection of structural variants and indels by local assembly. Genome Res. 28, 581–591 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  86. Rimmer, A. et al. Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications. Nat. Genet. 46, 912–918 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  87. Greenman, C. et al. Patterns of somatic mutation in human cancer genomes. Nature 446, 153–158 (2007).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  88. Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013). This study introduces a computational framework for the discovery of driver genes that accounts for the variable mutation rates across the genome.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  89. Schuster-Böckler, B. & Lehner, B. Chromatin organization is a major influence on regional mutation rates in human cancer cells. Nature 488, 504–507 (2012).

    Article  CAS  PubMed  Google Scholar 

  90. Polak, P. et al. Cell-of-origin chromatin organization shapes the mutational landscape of cancer. Nature 518, 360–364 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  91. Supek, F. & Lehner, B. Differential DNA mismatch repair underlies mutation rate variation across the human genome. Nature 521, 81–84 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  92. Katainen, R. et al. CTCF/cohesin-binding sites are frequently mutated in cancer. Nat. Genet. 47, 818–821 (2015).

    Article  CAS  PubMed  Google Scholar 

  93. Sabarinathan, R., Mularoni, L., Deu-Pons, J., Gonzalez-Perez, A. & Lopez-Bigas, N. Nucleotide excision repair is impaired by binding of transcription factors to DNA. Nature 532, 264–267 (2016).

    Article  CAS  PubMed  Google Scholar 

  94. Gonzalez-Perez, A., Sabarinathan, R. & Lopez-Bigas, N. Local determinants of the mutational landscape of the human genome. Cell 177, 101–114 (2019).

    Article  CAS  PubMed  Google Scholar 

  95. Dietlein, F. et al. Identification of cancer driver genes based on nucleotide context. Nat. Genet. 52, 208–218 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  96. Nissim, S. et al. Mutations in RABL3 alter KRAS prenylation and are associated with hereditary pancreatic cancer. Nat. Genet. 51, 1308–1314 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  97. Hess, J. M. et al. Passenger hotspot mutations in cancer. Cancer Cell 36, 288–301.e14 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  98. Tamborero, D., Gonzalez-Perez, A. & Lopez-Bigas, N. OncodriveCLUST: exploiting the positional clustering of somatic mutations to identify cancer genes. Bioinformatics 29, 2238–2244 (2013).

    Article  CAS  PubMed  Google Scholar 

  99. Niu, B. et al. Protein-structure-guided discovery of functional mutations across 19 cancer types. Nat. Genet. 48, 827–837 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  100. Reimand, J. & Bader, G. D. Systematic analysis of somatic mutations in phosphorylation signaling predicts novel cancer drivers. Mol. Syst. Biol. 9, 637 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  101. Zhu, H. et al. Candidate cancer driver mutations in distal regulatory elements and long-range chromatin interaction networks. Mol. Cell 77, 1307–1321.e10 (2020).

    Article  CAS  PubMed  Google Scholar 

  102. Khurana, E. et al. Integrative annotation of variants from 1092 humans: application to cancer genomics. Science 342, 1235587 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  103. Reva, B., Antipin, Y. & Sander, C. Predicting the functional impact of protein mutations: application to cancer genomics. Nucleic Acids Res. 39, e118–e118 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  104. Buisson, R. et al. Passenger hotspot mutations in cancer driven by APOBEC3A and mesoscale genomic features. Science 364, eaaw2872 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  105. McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  106. Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 6, 80–92 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  107. Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  108. McCarthy, D. J. et al. Choice of transcripts and software has a large effect on variant annotation. Genome Med. 6, 26 (2014).

    Article  PubMed  PubMed Central  Google Scholar 

  109. Yen, J. L. et al. A variant by any name: quantifying annotation discordance across tools and clinical databases. Genome Med. 9, 7 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  110. Landrum, M. J. et al. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 42, D980–D985 (2014).

    Article  CAS  PubMed  Google Scholar 

  111. Chakravarty, D. et al. OncoKB: a precision oncology knowledge base. JCO Precis. Oncol. 1, 1–16 (2017).

    Google Scholar 

  112. Khurana, E. et al. Role of non-coding sequence variants in cancer. Nat. Rev. Genet. 17, 93–108 (2016).

    Article  CAS  PubMed  Google Scholar 

  113. Liu, Y. et al. Discovery of regulatory noncoding variants in individual cancer genomes by using cis-X. Nat. Genet. 52, 811–818 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  114. Kanagawa, T. Bias and artifacts in multitemplate polymerase chain reactions (PCR). J. Biosci. Bioeng. 96, 317–323 (2003).

    Article  CAS  PubMed  Google Scholar 

  115. Buckley, A. R. et al. Pan-cancer analysis reveals technical artifacts in TCGA germline variant calls. BMC Genomics 18, 458 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  116. Costello, M. et al. Discovery and characterization of artifactual mutations in deep coverage targeted capture sequencing data due to oxidative DNA damage during sample preparation. Nucleic Acids Res. 41, e67 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  117. Do, H. & Dobrovic, A. Sequence artifacts in DNA from formalin-fixed tissues: causes and strategies for minimization. Clin. Chem. 61, 64–71 (2015).

    Article  CAS  PubMed  Google Scholar 

  118. Frampton, G. M. et al. Development and validation of a clinical cancer genomic profiling test based on massively parallel DNA sequencing. Nat. Biotechnol. 31, 1023–1031 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  119. Kerick, M. et al. Targeted high throughput sequencing in clinical cancer settings: formaldehyde fixed-paraffin embedded (FFPE) tumor tissues, input amount and tumor heterogeneity. BMC Med. Genomics 4, 68 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  120. Van Allen, E. M. et al. Whole-exome sequencing and clinical interpretation of formalin-fixed, paraffin-embedded tumor samples to guide precision cancer medicine. Nat. Med. 20, 682–688 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  121. Robbe, P. et al. Clinical whole-genome sequencing from routine formalin-fixed, paraffin-embedded specimens: pilot study for the 100,000 Genomes Project. Genet. Med. 20, 1196–1205 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  122. Cibulskis, K. et al. ContEst: estimating cross-contamination of human samples in next-generation sequencing data. Bioinformatics 27, 2601–2602 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  123. Fiévet, A. et al. ART-DeCo: easy tool for detection and characterization of cross-contamination of DNA samples in diagnostic next-generation sequencing analysis. Eur. J. Hum. Genet. 27, 792–800 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  124. Bergmann, E. A., Chen, B. J., Arora, K., Vacic, V. & Zody, M. C. Conpair: concordance and contamination estimator for matched tumor–normal pairs. Bioinformatics 32, 3196–3198 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  125. Chun, H. & Kim, S. BAMixChecker: an automated checkup tool for matched sample pairs in NGS cohort. Bioinformatics 35, 4806–4808 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  126. Lee, S. S. et al. NGSCheckMate: software for validating sample identity in next-generation sequencing studies within and across data types. Nucleic Acids Res. 45, e103 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  127. Schröder, J., Corbin, V. & Papenfuss, A. T. HYSYS: have you swapped your samples? Bioinformatics 33, 596–598 (2017).

    Article  PubMed  Google Scholar 

  128. Taylor-Weiner, A. et al. DeTiN: overcoming tumor-in-normal contamination. Nat. Methods 15, 531–534 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  129. Nik-Zainal, S. et al. Mutational processes molding the genomes of 21 breast cancers. Cell 149, 979–993 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  130. Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013). This is the first comprehensive study on mutational signatures, describing >20 mutational processes operative in >7,000 tumours using mutational signature analysis.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  131. Nik-Zainal, S. et al. Landscape of somatic mutations in 560 breast cancer whole-genome sequences. Nature 534, 47–54 (2016). This study identifies mutational signatures in breast cancers, including the rearrangement signatures associated with BRCA1/2 mutations that can serve as a biomarker of homologous recombination deficiency.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

Acknowledgements

This work was supported by grants from EMBL (to I.C-C.) and the Harvard Ludwig Center (to P.J.P.) and an award from the Cancer Research UK Grand Challenge and the Mark Foundation for Cancer Research to the SPECIFICANCER team.

Author information

Contributions

All authors contributed to all aspects of the article.

Corresponding author

Correspondence to Peter J. Park.

Ethics declarations

Competing interests

D.C.G. and P.J.P. have filed a patent application on SigMA. I.C-C., J.J.-K.L. and G.E.M.M. declare no competing interests.

Additional information

Peer review information

Nature Reviews Genetics thanks M. Peifer, J. Zhang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

1000 Genomes Project: https://www.internationalgenome.org/home

Catalogue of Somatic Mutations In Cancer: https://cancer.sanger.ac.uk/cosmic

cBioPortal: http://www.cbioportal.org/

ClinVar: https://www.ncbi.nlm.nih.gov/clinvar/

COSMIC Mutational Signatures: https://cancer.sanger.ac.uk/signatures

OncoKB: https://www.oncokb.org/

Pan-Cancer Analysis of Whole Genomes (PCAWG) project: https://dcc.icgc.org/pcawg

The Cancer Genome Atlas (TCGA): https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga

Glossary

‘Driver’ mutations

Somatic alterations of the DNA sequence that confer a selective advantage to the cells harbouring them.

Mutational signatures

Distinct patterns of mutational spectra, often associated with specific mutational processes.

Mapping quality

A measure of confidence that a sequencing read originated from the aligned position.

Single-nucleotide variants

(SNVs). Changes in the sequence of the DNA involving one base pair.

Variant allele fraction

(VAF). The number of reads supporting a candidate mutation divided by the read depth at that position.
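
As a minimal sketch of this definition, the Python snippet below computes a VAF from read counts. The function names are illustrative and do not come from any tool discussed here. The second helper is an added assumption: it uses the commonly applied relationship between expected VAF, tumour purity and copy number for a clonal mutation, takes a normal copy number of 2 by default, and ignores sequencing error and subclonality.

```python
def variant_allele_fraction(alt_reads: int, read_depth: int) -> float:
    """VAF = reads supporting the candidate mutation / read depth at that position."""
    if read_depth <= 0:
        raise ValueError("read depth must be positive")
    if not 0 <= alt_reads <= read_depth:
        raise ValueError("alt_reads must lie between 0 and read_depth")
    return alt_reads / read_depth


def expected_vaf(purity: float, mutated_copies: int,
                 tumour_copy_number: int, normal_copy_number: int = 2) -> float:
    """Expected VAF of a clonal mutation (illustrative assumption, see lead-in).

    Mutated alleles come only from cancer cells; the denominator counts all
    alleles contributed by cancer cells and by admixed normal cells.
    """
    tumour_alleles = purity * tumour_copy_number
    normal_alleles = (1.0 - purity) * normal_copy_number
    return purity * mutated_copies / (tumour_alleles + normal_alleles)


if __name__ == "__main__":
    print(variant_allele_fraction(12, 48))
    print(expected_vaf(purity=0.6, mutated_copies=1, tumour_copy_number=2))
```

Under these assumptions, a heterozygous clonal mutation in a diploid region is expected at a VAF of half the tumour purity, which is why low-purity samples need deeper sequencing to detect the same mutation.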

Read depth

Number of sequenced reads at a genomic position.

Tumour purity

The fraction of cancer cells in the sequenced sample.

Tumour ploidy

The amount of DNA that cancer cells contain, usually estimated for the major clone in a tumour sample.

Panel of normals

A set of ‘normal’ samples that are used as a control to remove germline variants in a population as part of somatic variant calling.

GC content

Percentage of bases in a genomic region that are either guanine (G) or cytosine (C).
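
For illustration, GC content can be computed directly from a sequence string. This sketch assumes an upper- or lower-case DNA string and, as a design choice not stated in the definition above, excludes ambiguous bases (anything other than A, C, G or T) from the denominator. The function name is hypothetical.

```python
def gc_content(sequence: str) -> float:
    """Percentage of unambiguous bases in `sequence` that are G or C."""
    seq = sequence.upper()
    gc = sum(1 for base in seq if base in "GC")
    at = sum(1 for base in seq if base in "AT")
    called = gc + at  # ambiguous bases such as N are ignored (assumption)
    if called == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / called


if __name__ == "__main__":
    print(gc_content("GGCCAATT"))
```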

Strand bias

In the context of variant calling, the presence of the variant allele in either forward or reverse reads with a frequency higher than expected for binomial sampling.
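
One simple way to quantify this is an exact two-sided binomial test on the strand of the variant-supporting reads. The sketch below is an assumption-laden illustration, not the statistic used by any particular variant caller: the function name is hypothetical, and `forward_fraction` is the expected proportion of forward-strand reads (0.5 by default, although in practice it could be estimated from all reads at the site).

```python
from math import comb


def strand_bias_p_value(alt_forward: int, alt_reverse: int,
                        forward_fraction: float = 0.5) -> float:
    """Exact two-sided binomial p-value for the strand split of variant reads."""
    n = alt_forward + alt_reverse
    if n == 0:
        return 1.0  # no variant reads, so no evidence of bias

    def pmf(k: int) -> float:
        # Probability of k forward-strand reads out of n under binomial sampling.
        return comb(n, k) * forward_fraction ** k * (1.0 - forward_fraction) ** (n - k)

    observed = pmf(alt_forward)
    # Sum the probabilities of all outcomes at least as unlikely as the observed one.
    total = sum(pmf(k) for k in range(n + 1) if pmf(k) <= observed * (1.0 + 1e-9))
    return min(1.0, total)


if __name__ == "__main__":
    print(strand_bias_p_value(10, 0))  # all variant reads on one strand
    print(strand_bias_p_value(5, 5))   # perfectly balanced
```

A small p-value indicates that the variant allele is concentrated on one strand more than binomial sampling would predict, which is a common hallmark of artefacts.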

Read ‘pile-up’

Text-based format that represents the base calls in sequencing reads aligned to a reference genome.

Quantile–quantile plots

A graphical method used to compare two probability distributions by plotting the quantiles of one distribution against the same quantiles of a second distribution.

‘Clock-like’ signatures

Mutational signatures corresponding to mutations that accumulate at a steady rate in normal somatic cells.

Non-negative least squares

(NNLS). A method for finding the optimal non-negative coefficients for a set of predefined vectors such that their weighted sum is as close as possible to another given vector. In signature analysis, it is used to calculate the contribution of each signature to the mutational spectrum of a sample.
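
The idea can be sketched in a few lines of dependency-free Python. Note the assumptions: this example uses a simple multiplicative-update scheme, which keeps coefficients non-negative when all inputs are non-negative, rather than the active-set algorithm found in standard numerical libraries. The toy signatures have four mutation channels, and all names and numbers are invented for illustration.

```python
def fit_exposures(signatures, spectrum, iterations=500, eps=1e-12):
    """Estimate non-negative signature contributions to a mutational spectrum.

    signatures: list of signature vectors (non-negative, same length as spectrum)
    spectrum:   observed mutation counts per channel (non-negative)
    """
    k, n = len(signatures), len(spectrum)
    # Pre-compute S^T v and S^T S, where the columns of S are the signatures.
    stv = [sum(sig[c] * spectrum[c] for c in range(n)) for sig in signatures]
    gram = [[sum(a[c] * b[c] for c in range(n)) for b in signatures]
            for a in signatures]
    x = [1.0] * k  # strictly positive starting point
    for _ in range(iterations):
        gx = [sum(gram[i][j] * x[j] for j in range(k)) for i in range(k)]
        # The multiplicative update never changes the sign of a coefficient.
        x = [x[i] * stv[i] / (gx[i] + eps) for i in range(k)]
    return x


if __name__ == "__main__":
    sig_a = [0.7, 0.1, 0.1, 0.1]
    sig_b = [0.1, 0.1, 0.1, 0.7]
    # Spectrum built as 30 mutations from sig_a plus 10 from sig_b.
    spectrum = [22.0, 4.0, 4.0, 10.0]
    print(fit_exposures([sig_a, sig_b], spectrum))
```

Because the toy spectrum is an exact mixture of the two signatures, the fitted contributions approach the values used to construct it. With real data the fit is approximate, and the residual indicates how well the predefined signatures explain the sample.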

Enhancer hijacking

Juxtaposition of an active enhancer element from a distant locus into the proximity of another gene, usually caused by a genomic rearrangement, leading to gene activation.

Enhancer amplification

Increased copy number of regulatory regions (enhancers) that leads to the overexpression of target genes.

Loss of heterozygosity

(LOH). Loss of one allele in biallelic regions, which often results from a somatic deletion.

B-allele frequency

(BAF). The fraction of sequencing reads supporting one allele at a heterozygous single-nucleotide polymorphism (SNP) with respect to the total read depth at that position.
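
A short sketch makes the definition concrete and shows why BAF shifts away from 0.5 when copy number changes. The function names are hypothetical. The second helper is an added assumption: it supposes that admixed normal cells are heterozygous (one B allele out of two copies) and that all cancer cells share the same allele-specific copy number.

```python
def b_allele_frequency(b_reads: int, read_depth: int) -> float:
    """BAF = reads supporting the B allele / total read depth at a heterozygous SNP."""
    if read_depth <= 0:
        raise ValueError("read depth must be positive")
    return b_reads / read_depth


def expected_baf(purity: float, b_copies: int, total_copies: int) -> float:
    """Expected BAF in a tumour-normal mixture (illustrative assumption, see lead-in)."""
    normal_fraction = 1.0 - purity
    b_alleles = purity * b_copies + normal_fraction * 1      # normal cells carry 1 B allele
    all_alleles = purity * total_copies + normal_fraction * 2  # normal cells carry 2 copies
    return b_alleles / all_alleles


if __name__ == "__main__":
    print(b_allele_frequency(30, 60))
    # Loss of the A allele (LOH by deletion) in a sample with 50% purity.
    print(expected_baf(purity=0.5, b_copies=1, total_copies=1))
```

Under these assumptions, a balanced region stays at 0.5 regardless of purity, whereas LOH moves the expected BAF towards 0 or 1 by an amount that depends on purity.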

Split reads

Reads containing two contiguous DNA sequences mapping to non-adjacent regions in the reference genome.

Discordant read pairs

Pairs of sequencing reads that do not map to the reference genome with the expected forward–reverse orientation or insert size, suggesting the presence of structural variation.
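
The definition above can be illustrated with a small classifier for a single read pair. This is a simplified sketch under stated assumptions: a standard paired-end library in which concordant pairs are forward-reverse, the distance between mate start positions stands in for insert size, and the `max_insert` threshold is arbitrary. The labels follow conventional interpretations of each pattern. Real structural variant callers combine many pairs with split reads and read depth.

```python
def classify_pair(chrom1: str, pos1: int, strand1: str,
                  chrom2: str, pos2: int, strand2: str,
                  max_insert: int = 600) -> str:
    """Label one read pair by mate orientation and apparent insert size."""
    if chrom1 != chrom2:
        return "interchromosomal"
    # Order the two mates by genomic position.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pos1, strand1), (pos2, strand2)]
    )
    if left_strand == right_strand:
        return "inversion-like"    # forward-forward or reverse-reverse
    if left_strand == "-" and right_strand == "+":
        return "duplication-like"  # everted (reverse-forward) pair
    if right_pos - left_pos > max_insert:
        return "deletion-like"     # correct orientation, mates too far apart
    return "concordant"


if __name__ == "__main__":
    print(classify_pair("chr1", 1000, "+", "chr1", 1300, "-"))
    print(classify_pair("chr1", 1000, "+", "chr1", 50000, "-"))
```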

Cite this article

Cortés-Ciriano, I., Gulhan, D.C., Lee, J.JK. et al. Computational analysis of cancer genome sequencing data. Nat Rev Genet 23, 298–314 (2022). https://doi.org/10.1038/s41576-021-00431-y

  • DOI: https://doi.org/10.1038/s41576-021-00431-y
