HiCMC: High-Efficiency Contact Matrix Compressor

Publication: Contribution to journal › Article › Research › Peer-reviewed

Authors

  • Yeremia Gunawan Adhisantoso
  • Tim Körner
  • Fabian Müntefering
  • Jörn Ostermann
  • Jan Voges

External organizations

  • University of Navarra
  • IdiSNA

Details

Original language: English
Article number: 296
Number of pages: 15
Journal: BMC BIOINFORMATICS
Volume: 25
Issue number: 1
Early online date: 10 Sep 2024
Publication status: Published - 2024

Abstract

Background: Chromosome organization plays an important role in biological processes such as replication, regulation, and transcription. One way to study the relationship between chromosome structure and its biological functions is through Hi-C studies, a genome-wide method for capturing chromosome conformation. Such studies generate vast amounts of data. The problem is exacerbated by the fact that chromosome organization is dynamic, requiring snapshots at different points in time, further increasing the amount of data to be stored. We present a novel approach called the High-Efficiency Contact Matrix Compressor (HiCMC) for efficient compression of Hi-C data.

Results: By modeling the underlying structures found in the contact matrix, such as compartments and domains, HiCMC outperforms the state-of-the-art method CMC by approximately 8% and the other state-of-the-art methods cooler, LZMA, and bzip2 by over 50% across multiple cell lines and contact matrix resolutions. In addition, HiCMC integrates domain-specific information into the compressed bitstreams that it generates, and this information can be used to speed up downstream analyses.

Conclusion: HiCMC is a novel compression approach that utilizes intrinsic properties of the contact matrix, such as compartments and domains. It achieves better compression than the state-of-the-art methods. HiCMC is available at https://github.com/sXperfect/hicmc.

Cite this

HiCMC: High-Efficiency Contact Matrix Compressor. / Adhisantoso, Yeremia Gunawan; Körner, Tim; Müntefering, Fabian et al.
In: BMC BIOINFORMATICS, Vol. 25, No. 1, 296, 2024.

Adhisantoso, YG, Körner, T, Müntefering, F, Ostermann, J & Voges, J 2024, 'HiCMC: High-Efficiency Contact Matrix Compressor', BMC BIOINFORMATICS, vol. 25, no. 1, 296. https://doi.org/10.1186/s12859-024-05907-2
Adhisantoso, Y. G., Körner, T., Müntefering, F., Ostermann, J., & Voges, J. (2024). HiCMC: High-Efficiency Contact Matrix Compressor. BMC BIOINFORMATICS, 25(1), Article 296. https://doi.org/10.1186/s12859-024-05907-2
Adhisantoso YG, Körner T, Müntefering F, Ostermann J, Voges J. HiCMC: High-Efficiency Contact Matrix Compressor. BMC BIOINFORMATICS. 2024;25(1):296. Epub 2024 Sep 10. doi: 10.1186/s12859-024-05907-2
Adhisantoso, Yeremia Gunawan; Körner, Tim; Müntefering, Fabian et al. / HiCMC: High-Efficiency Contact Matrix Compressor. In: BMC BIOINFORMATICS. 2024; Vol. 25, No. 1.

BibTeX

@article{c7a8285d132443d0add8848dc4cf9d00,
title = "HiCMC: High-Efficiency Contact Matrix Compressor",
keywords = "3C, Compression, Contact matrix, Hi-C",
author = "Adhisantoso, {Yeremia Gunawan} and Tim K{\"o}rner and Fabian M{\"u}ntefering and J{\"o}rn Ostermann and Jan Voges",
note = "Publisher Copyright: {\textcopyright} The Author(s) 2024.",
year = "2024",
doi = "10.1186/s12859-024-05907-2",
language = "English",
volume = "25",
journal = "BMC BIOINFORMATICS",
issn = "1471-2105",
publisher = "BioMed Central Ltd.",
number = "1",

}

RIS

TY - JOUR
T1 - HiCMC
T2 - High-Efficiency Contact Matrix Compressor
AU - Adhisantoso, Yeremia Gunawan
AU - Körner, Tim
AU - Müntefering, Fabian
AU - Ostermann, Jörn
AU - Voges, Jan
N1 - Publisher Copyright: © The Author(s) 2024.
PY - 2024
Y1 - 2024
KW - 3C
KW - Compression
KW - Contact matrix
KW - Hi-C
UR - http://www.scopus.com/inward/record.url?scp=85203473959&partnerID=8YFLogxK
U2 - 10.1186/s12859-024-05907-2
DO - 10.1186/s12859-024-05907-2
M3 - Article
C2 - 39256681
AN - SCOPUS:85203473959
VL - 25
JO - BMC BIOINFORMATICS
JF - BMC BIOINFORMATICS
SN - 1471-2105
IS - 1
M1 - 296
ER -
