GABAC: An arithmetic coding solution for genomic data

Research output: Contribution to journal › Article › Research › peer review

Authors

  • Jan Voges
  • Tom Paridaens
  • Fabian Müntefering
  • Liudmila S. Mainzer
  • Brian Bliss
  • Mingyu Yang
  • Idoia Ochoa
  • Jan Fostier
  • Jörn Ostermann
  • Mikel Hernaez

Research Organisations

External Research Organisations

  • Ghent University
  • University of Illinois at Urbana-Champaign

Details

Original language: English
Pages (from-to): 2275-2277
Number of pages: 3
Journal: BIOINFORMATICS
Volume: 36
Issue number: 7
Early online date: 12 Dec 2019
Publication status: Published - 1 Apr 2020

Abstract

Motivation: In an effort to provide a response to the ever-expanding generation of genomic data, the International Organization for Standardization (ISO) is designing a new solution for the representation, compression and management of genomic sequencing data: the Moving Picture Experts Group (MPEG)-G standard. This paper discusses the first implementation of an MPEG-G compliant entropy codec: GABAC. GABAC combines proven coding technologies, such as context-adaptive binary arithmetic coding, binarization schemes and transformations, into a straightforward solution for the compression of sequencing data.

Results: We demonstrate that GABAC outperforms well-established (entropy) codecs in a significant set of cases and thus can serve as an extension for existing genomic compression solutions, such as CRAM.

Cite this

GABAC: An arithmetic coding solution for genomic data. / Voges, Jan; Paridaens, Tom; Müntefering, Fabian et al.
In: BIOINFORMATICS, Vol. 36, No. 7, 01.04.2020, p. 2275-2277.


Voges, J, Paridaens, T, Müntefering, F, Mainzer, LS, Bliss, B, Yang, M, Ochoa, I, Fostier, J, Ostermann, J & Hernaez, M 2020, 'GABAC: An arithmetic coding solution for genomic data', BIOINFORMATICS, vol. 36, no. 7, pp. 2275-2277. https://doi.org/10.1093/bioinformatics/btz922, https://doi.org/10.15488/10852
Voges, J., Paridaens, T., Müntefering, F., Mainzer, L. S., Bliss, B., Yang, M., Ochoa, I., Fostier, J., Ostermann, J., & Hernaez, M. (2020). GABAC: An arithmetic coding solution for genomic data. BIOINFORMATICS, 36(7), 2275-2277. https://doi.org/10.1093/bioinformatics/btz922, https://doi.org/10.15488/10852
Voges J, Paridaens T, Müntefering F, Mainzer LS, Bliss B, Yang M et al. GABAC: An arithmetic coding solution for genomic data. BIOINFORMATICS. 2020 Apr 1;36(7):2275-2277. Epub 2019 Dec 12. doi: 10.1093/bioinformatics/btz922, 10.15488/10852
Voges, Jan ; Paridaens, Tom ; Müntefering, Fabian et al. / GABAC : An arithmetic coding solution for genomic data. In: BIOINFORMATICS. 2020 ; Vol. 36, No. 7. pp. 2275-2277.
@article{99a202abfa084898bdad2d09d7402ee1,
title = "GABAC: An arithmetic coding solution for genomic data",
abstract = "Motivation: In an effort to provide a response to the ever-expanding generation of genomic data, the International Organization for Standardization (ISO) is designing a new solution for the representation, compression and management of genomic sequencing data: the Moving Picture Experts Group (MPEG)-G standard. This paper discusses the first implementation of an MPEG-G compliant entropy codec: GABAC. GABAC combines proven coding technologies, such as context-adaptive binary arithmetic coding, binarization schemes and transformations, into a straightforward solution for the compression of sequencing data. Results: We demonstrate that GABAC outperforms well-established (entropy) codecs in a significant set of cases and thus can serve as an extension for existing genomic compression solutions, such as CRAM.",
author = "Jan Voges and Tom Paridaens and Fabian M{\"u}ntefering and Mainzer, {Liudmila S.} and Brian Bliss and Mingyu Yang and Idoia Ochoa and Jan Fostier and J{\"o}rn Ostermann and Mikel Hernaez",
note = "Funding information: This work has been partially supported by grants 2018-182798 and 2018-182799 from the Chan Zuckerberg Initiative DAF, a donor advised fund of the Silicon Valley Community Foundation, a Strategic Research Initiative from UIUC and the Mayo Clinic Center for Individualized Medicine, and the Todd and Karen Wanek Program for Hypoplastic Left Heart Syndrome. This work was a part of the Mayo Clinic and Illinois Strategic Alliance for Technology-Based Healthcare.",
year = "2020",
month = apr,
day = "1",
doi = "10.1093/bioinformatics/btz922",
language = "English",
volume = "36",
pages = "2275--2277",
journal = "BIOINFORMATICS",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "7",
}

TY - JOUR

T1 - GABAC

T2 - An arithmetic coding solution for genomic data

AU - Voges, Jan

AU - Paridaens, Tom

AU - Müntefering, Fabian

AU - Mainzer, Liudmila S.

AU - Bliss, Brian

AU - Yang, Mingyu

AU - Ochoa, Idoia

AU - Fostier, Jan

AU - Ostermann, Jörn

AU - Hernaez, Mikel

N1 - Funding information: This work has been partially supported by grants 2018-182798 and 2018-182799 from the Chan Zuckerberg Initiative DAF, a donor advised fund of the Silicon Valley Community Foundation, a Strategic Research Initiative from UIUC and the Mayo Clinic Center for Individualized Medicine, and the Todd and Karen Wanek Program for Hypoplastic Left Heart Syndrome. This work was a part of the Mayo Clinic and Illinois Strategic Alliance for Technology-Based Healthcare.

PY - 2020/4/1

Y1 - 2020/4/1

N2 - Motivation: In an effort to provide a response to the ever-expanding generation of genomic data, the International Organization for Standardization (ISO) is designing a new solution for the representation, compression and management of genomic sequencing data: the Moving Picture Experts Group (MPEG)-G standard. This paper discusses the first implementation of an MPEG-G compliant entropy codec: GABAC. GABAC combines proven coding technologies, such as context-adaptive binary arithmetic coding, binarization schemes and transformations, into a straightforward solution for the compression of sequencing data. Results: We demonstrate that GABAC outperforms well-established (entropy) codecs in a significant set of cases and thus can serve as an extension for existing genomic compression solutions, such as CRAM.

AB - Motivation: In an effort to provide a response to the ever-expanding generation of genomic data, the International Organization for Standardization (ISO) is designing a new solution for the representation, compression and management of genomic sequencing data: the Moving Picture Experts Group (MPEG)-G standard. This paper discusses the first implementation of an MPEG-G compliant entropy codec: GABAC. GABAC combines proven coding technologies, such as context-adaptive binary arithmetic coding, binarization schemes and transformations, into a straightforward solution for the compression of sequencing data. Results: We demonstrate that GABAC outperforms well-established (entropy) codecs in a significant set of cases and thus can serve as an extension for existing genomic compression solutions, such as CRAM.

UR - http://www.scopus.com/inward/record.url?scp=85083073632&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btz922

DO - 10.1093/bioinformatics/btz922

M3 - Article

C2 - 31830243

AN - SCOPUS:85083073632

VL - 36

SP - 2275

EP - 2277

JO - BIOINFORMATICS

JF - BIOINFORMATICS

SN - 1367-4803

IS - 7

ER -
