An Introduction to MPEG-G: The First Open ISO/IEC Standard for the Compression and Exchange of Genomic Sequencing Data.

Research output: Contribution to journal, Article, Research, peer review

Authors

  • Jan Voges
  • Mikel Hernaez
  • Marco Mattavelli
  • Jörn Ostermann

External Research Organisations

  • University of Illinois at Urbana-Champaign
  • University of Navarra
  • École polytechnique fédérale de Lausanne (EPFL)

Details

Original language: English
Article number: 9455132
Pages (from-to): 1607-1622
Number of pages: 16
Journal: Proc. IEEE
Volume: 109
Issue number: 9
Publication status: Published - Sept 2021

Abstract

The development and progress of high-throughput sequencing technologies have transformed the sequencing of DNA from a scientific research challenge to practice. With the release of the latest generation of sequencing machines, the cost of sequencing a whole human genome has dropped to less than $600. Such achievements open the door to personalized medicine, where it is expected that genomic information of patients will be analyzed as a standard practice. However, the associated costs, related to storing, transmitting, and processing the large volumes of data, are already comparable to the costs of sequencing. To support the design of new and interoperable solutions for the representation, compression, and management of genomic sequencing data, the Moving Picture Experts Group (MPEG) jointly with working group 5 of ISO/TC276 'Biotechnology' has started to produce the ISO/IEC 23092 series, known as MPEG-G. MPEG-G does not only offer higher levels of compression compared with the state of the art but it also provides new functionalities, such as built-in support for random access in the compressed domain, support for data protection mechanisms, flexible storage, and streaming capabilities. MPEG-G only specifies the decoding syntax of compressed bitstreams, as well as a file format and a transport format. This allows for the development of new encoding solutions with higher degrees of optimization while maintaining compatibility with any existing MPEG-G decoder.

Keywords

    Bioinformatics, computational biology, data compression, DNA, genomics, standardization

Cite this

An Introduction to MPEG-G: The First Open ISO/IEC Standard for the Compression and Exchange of Genomic Sequencing Data. / Voges, Jan; Hernaez, Mikel; Mattavelli, Marco et al.
In: Proc. IEEE, Vol. 109, No. 9, 9455132, 09.2021, p. 1607-1622.

Voges J, Hernaez M, Mattavelli M, Ostermann J. An Introduction to MPEG-G: The First Open ISO/IEC Standard for the Compression and Exchange of Genomic Sequencing Data. Proc. IEEE. 2021 Sept;109(9):1607-1622. 9455132. doi: 10.1109/JPROC.2021.3082027
Voges, Jan ; Hernaez, Mikel ; Mattavelli, Marco et al. / An Introduction to MPEG-G : The First Open ISO/IEC Standard for the Compression and Exchange of Genomic Sequencing Data. In: Proc. IEEE. 2021 ; Vol. 109, No. 9. pp. 1607-1622.
@article{fd403181a252400b8cc2c37215f0a746,
title = "An Introduction to MPEG-G: The First Open ISO/IEC Standard for the Compression and Exchange of Genomic Sequencing Data.",
abstract = "The development and progress of high-throughput sequencing technologies have transformed the sequencing of DNA from a scientific research challenge to practice. With the release of the latest generation of sequencing machines, the cost of sequencing a whole human genome has dropped to less than \$600. Such achievements open the door to personalized medicine, where it is expected that genomic information of patients will be analyzed as a standard practice. However, the associated costs, related to storing, transmitting, and processing the large volumes of data, are already comparable to the costs of sequencing. To support the design of new and interoperable solutions for the representation, compression, and management of genomic sequencing data, the Moving Picture Experts Group (MPEG) jointly with working group 5 of ISO/TC276 'Biotechnology' has started to produce the ISO/IEC 23092 series, known as MPEG-G. MPEG-G does not only offer higher levels of compression compared with the state of the art but it also provides new functionalities, such as built-in support for random access in the compressed domain, support for data protection mechanisms, flexible storage, and streaming capabilities. MPEG-G only specifies the decoding syntax of compressed bitstreams, as well as a file format and a transport format. This allows for the development of new encoding solutions with higher degrees of optimization while maintaining compatibility with any existing MPEG-G decoder.",
keywords = "Bioinformatics, computational biology, data compression, DNA, genomics, standardization",
author = "Jan Voges and Mikel Hernaez and Marco Mattavelli and J{\"o}rn Ostermann",
note = "Acknowledgment: The development of the MPEG-G specification is a collaborative effort. The following people contributed to the actual MPEG-G development: Junaid J. Ahmad, Claudio Alberti, Simone Casale-Brunet, Patrick Cheung, Jaime Delgado, Jan Fostier, Silvia Llorente, Liudmila S. Mainzer, Fabian M{\"u}ntefering, Daniel Naro, Ibrahim Numanagi{\'c}, Idoia Ochoa, Tom Paridaens, Massimo Ravasi, Daniele Renzi, Paolo Ribeca, and Giorgio Zoia. MPEG received additional input from other experts, including Bonnie Berger, Noah Daniels, Nicolas Guex, Christian Iseli, Raymond Krasinski, Christian Rohlfing, S. Cenk Sahinalp, and Ioannis Xenarios.",
year = "2021",
month = sep,
doi = "10.1109/JPROC.2021.3082027",
language = "English",
volume = "109",
pages = "1607--1622",
journal = "Proc. IEEE",
issn = "1558-2256",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "9",

}

TY - JOUR

T1 - An Introduction to MPEG-G

T2 - The First Open ISO/IEC Standard for the Compression and Exchange of Genomic Sequencing Data.

AU - Voges, Jan

AU - Hernaez, Mikel

AU - Mattavelli, Marco

AU - Ostermann, Jörn

N1 - Acknowledgment: The development of the MPEG-G specification is a collaborative effort. The following people contributed to the actual MPEG-G development: Junaid J. Ahmad, Claudio Alberti, Simone Casale-Brunet, Patrick Cheung, Jaime Delgado, Jan Fostier, Silvia Llorente, Liudmila S. Mainzer, Fabian Müntefering, Daniel Naro, Ibrahim Numanagić, Idoia Ochoa, Tom Paridaens, Massimo Ravasi, Daniele Renzi, Paolo Ribeca, and Giorgio Zoia. MPEG received additional input from other experts, including Bonnie Berger, Noah Daniels, Nicolas Guex, Christian Iseli, Raymond Krasinski, Christian Rohlfing, S. Cenk Sahinalp, and Ioannis Xenarios.

PY - 2021/9

Y1 - 2021/9

AB - The development and progress of high-throughput sequencing technologies have transformed the sequencing of DNA from a scientific research challenge to practice. With the release of the latest generation of sequencing machines, the cost of sequencing a whole human genome has dropped to less than $600. Such achievements open the door to personalized medicine, where it is expected that genomic information of patients will be analyzed as a standard practice. However, the associated costs, related to storing, transmitting, and processing the large volumes of data, are already comparable to the costs of sequencing. To support the design of new and interoperable solutions for the representation, compression, and management of genomic sequencing data, the Moving Picture Experts Group (MPEG) jointly with working group 5 of ISO/TC276 'Biotechnology' has started to produce the ISO/IEC 23092 series, known as MPEG-G. MPEG-G does not only offer higher levels of compression compared with the state of the art but it also provides new functionalities, such as built-in support for random access in the compressed domain, support for data protection mechanisms, flexible storage, and streaming capabilities. MPEG-G only specifies the decoding syntax of compressed bitstreams, as well as a file format and a transport format. This allows for the development of new encoding solutions with higher degrees of optimization while maintaining compatibility with any existing MPEG-G decoder.

KW - Bioinformatics

KW - computational biology

KW - data compression

KW - DNA

KW - genomics

KW - standardization

UR - http://www.scopus.com/inward/record.url?scp=85112220633&partnerID=8YFLogxK

U2 - 10.1109/JPROC.2021.3082027

DO - 10.1109/JPROC.2021.3082027

M3 - Article

VL - 109

SP - 1607

EP - 1622

JO - Proc. IEEE

JF - Proc. IEEE

SN - 1558-2256

IS - 9

M1 - 9455132

ER -
