Pipeline Olympics: continuable benchmarking of computational workflows for DNA methylation sequencing data against an experimental gold standard
Lin YY, Breuer K, Weichenhan D, Lafrenz P, Sarnataro A, Wilk A, Chepeleva M, Mücke O, Schönung M, Petermann F, Kensche PR, Weiser L, Thommen F, Giacomelli G, Nordstroem K, Gonzalez-Avalos E, Merkel A, Kretzmer H, Fischer J, Krämer S, Iskar M, Wolf S, Buchhalter I, Esteller M, Lawerenz C, Twardziok S, Zapatka M, Hovestadt V, Schlesner M, Schulz MH, Hoffmann S, Gerhauser C, Walter J, Hartmann M, Lipka DB, Assenov Y, Bock C, Plass C, Toth R, Lutsik P.
Nucleic Acids Res
DNA methylation is a widely studied epigenetic mark and a powerful biomarker of cell type, age, environmental exposures, and disease. Whole-genome sequencing following selective conversion of unmethylated cytosines into thymines via bisulfite treatment or enzymatic methods remains the reference method for DNA methylation profiling genome-wide. While numerous software tools facilitate processing of DNA methylation sequencing reads, a comprehensive benchmarking study has been lacking. In this study, we systematically compared complete computational workflows for processing DNA methylation sequencing data using a dedicated benchmarking dataset generated with five whole-genome profiling protocols. As an evaluation reference, we employed accurate locus-specific measurements from our previous benchmark of targeted DNA methylation assays. Based on this experimental gold-standard assessment and multiple performance metrics, we identified workflows that consistently demonstrated superior performance and revealed major workflow development trends. To ensure the long-term utility of our benchmark, we implemented an interactive workflow execution and data presentation platform, adaptable to user-defined criteria and readily expandable to future software.