Predictive Coding of Aligned Next-Generation Sequencing Data

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Details

Original language: English
Title of host publication: Proceedings - DCC 2016
Subtitle of host publication: 2016 Data Compression Conference
Editors: Michael W. Marcellin, Ali Bilgin, Joan Serra-Sagrista, James A. Storer
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 241-250
Number of pages: 10
ISBN (electronic): 9781509018536
Publication status: Published - Dec 2016
Event: 2016 Data Compression Conference, DCC 2016 - Snowbird, United States
Duration: 29 Mar 2016 to 1 Apr 2016

Publication series

Name: Data Compression Conference Proceedings
ISSN (Print): 1068-0314

Abstract

Due to novel high-throughput next-generation sequencing technologies, the sequencing of huge amounts of genetic information has become affordable. On account of this flood of data, IT costs have become a major obstacle compared to sequencing costs. High-performance compression of genomic data is required to reduce the storage size and transmission costs. The high coverage inherent in next-generation sequencing technologies produces highly redundant data. This paper describes a compression algorithm for aligned sequence reads. The proposed algorithm combines alignment information to implicitly assemble local parts of the donor genome in order to compress the sequence reads. In contrast to other algorithms, the proposed compressor does not need a reference to encode sequence reads. Compression is performed on-the-fly using solely a sliding window (i.e. a permanently updated short-time memory) as context for the prediction of sequence reads. The algorithm yields compression results on par with or better than the state-of-the-art, compressing the data down to 1.9% of the original size at speeds of up to 60 MB/s and with a minute memory consumption of only several kilobytes, fitting in today's level 1 CPU caches.
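
The sliding-window idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the majority-vote predictor, the (position, sequence) read representation, the residual format, and the names predict_read, encode_reads, decode_reads and window_size are all assumptions made for illustration. A real coder would also handle CIGAR operations and quality values and would pass the residuals to an entropy coder. The sketch only shows how a small window of recent reads can stand in for a reference, so that each new read is stored as its mismatches against a prediction.

```python
from collections import Counter, deque


def predict_read(window, pos, length):
    """Predict each base of a read from the overlapping reads in the window.

    window holds (start, sequence) pairs. The prediction is a per-position
    majority vote; positions not covered by any window read yield None.
    """
    prediction = []
    for p in range(pos, pos + length):
        votes = Counter(
            seq[p - start]
            for start, seq in window
            if start <= p < start + len(seq)
        )
        prediction.append(votes.most_common(1)[0][0] if votes else None)
    return prediction


def encode_reads(reads, window_size=4):
    """Encode position-sorted reads as (pos, length, residual) triples.

    The residual lists (offset, base) pairs where the read differs from the
    prediction (hypothetical format, no entropy coding).
    """
    window = deque(maxlen=window_size)  # short-time memory, updated per read
    encoded = []
    for pos, seq in reads:
        predicted = predict_read(window, pos, len(seq))
        residual = [
            (i, base)
            for i, (base, guess) in enumerate(zip(seq, predicted))
            if base != guess
        ]
        encoded.append((pos, len(seq), residual))
        window.append((pos, seq))
    return encoded


def decode_reads(encoded, window_size=4):
    """Invert encode_reads by rebuilding the same window on the decoder side."""
    window = deque(maxlen=window_size)
    reads = []
    for pos, length, residual in encoded:
        predicted = predict_read(window, pos, length)
        for i, base in residual:
            predicted[i] = base  # residual covers every wrong or missing base
        seq = "".join(predicted)
        window.append((pos, seq))
        reads.append((pos, seq))
    return reads
```

Because the decoder rebuilds the window from reads it has already decoded, no reference sequence has to be transmitted. With high coverage most bases are predicted correctly, so the residuals stay short.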

Cite this

Predictive Coding of Aligned Next-Generation Sequencing Data. / Voges, Jan; Munderloh, Marco; Ostermann, Jörn.
Proceedings - DCC 2016: 2016 Data Compression Conference. ed. / Michael W. Marcellin; Ali Bilgin; Joan Serra-Sagrista; James A. Storer. Institute of Electrical and Electronics Engineers Inc., 2016. p. 241-250 7786168 (Data Compression Conference Proceedings).

Voges, J, Munderloh, M & Ostermann, J 2016, Predictive Coding of Aligned Next-Generation Sequencing Data. in MW Marcellin, A Bilgin, J Serra-Sagrista & JA Storer (eds), Proceedings - DCC 2016: 2016 Data Compression Conference, 7786168, Data Compression Conference Proceedings, Institute of Electrical and Electronics Engineers Inc., pp. 241-250, 2016 Data Compression Conference, DCC 2016, Snowbird, United States, 29 Mar 2016. https://doi.org/10.1109/dcc.2016.98
Voges, J., Munderloh, M., & Ostermann, J. (2016). Predictive Coding of Aligned Next-Generation Sequencing Data. In M. W. Marcellin, A. Bilgin, J. Serra-Sagrista, & J. A. Storer (Eds.), Proceedings - DCC 2016: 2016 Data Compression Conference (pp. 241-250). Article 7786168 (Data Compression Conference Proceedings). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/dcc.2016.98
Voges J, Munderloh M, Ostermann J. Predictive Coding of Aligned Next-Generation Sequencing Data. In Marcellin MW, Bilgin A, Serra-Sagrista J, Storer JA, editors, Proceedings - DCC 2016: 2016 Data Compression Conference. Institute of Electrical and Electronics Engineers Inc. 2016. p. 241-250. 7786168. (Data Compression Conference Proceedings). doi: 10.1109/dcc.2016.98
Voges, Jan ; Munderloh, Marco ; Ostermann, Jörn. / Predictive Coding of Aligned Next-Generation Sequencing Data. Proceedings - DCC 2016: 2016 Data Compression Conference. editor / Michael W. Marcellin ; Ali Bilgin ; Joan Serra-Sagrista ; James A. Storer. Institute of Electrical and Electronics Engineers Inc., 2016. pp. 241-250 (Data Compression Conference Proceedings).
@inproceedings{034b35c71af94857a23cec3d67ef3f97,
title = "Predictive Coding of Aligned Next-Generation Sequencing Data",
abstract = "Due to novel high-throughput next-generation sequencing technologies, the sequencing of huge amounts of genetic information has become affordable. On account of this flood of data, IT costs have become a major obstacle compared to sequencing costs. High-performance compression of genomic data is required to reduce the storage size and transmission costs. The high coverage inherent in next-generation sequencing technologies produces highly redundant data. This paper describes a compression algorithm for aligned sequence reads. The proposed algorithm combines alignment information to implicitly assemble local parts of the donor genome in order to compress the sequence reads. In contrast to other algorithms, the proposed compressor does not need a reference to encode sequence reads. Compression is performed on-the-fly using solely a sliding window (i.e. a permanently updated short-time memory) as context for the prediction of sequence reads. The algorithm yields compression results on par with or better than the state-of-the-art, compressing the data down to 1.9\% of the original size at speeds of up to 60 MB/s and with a minute memory consumption of only several kilobytes, fitting in today's level 1 CPU caches.",
author = "Jan Voges and Marco Munderloh and J{\"o}rn Ostermann",
year = "2016",
month = dec,
doi = "10.1109/dcc.2016.98",
language = "English",
series = "Data Compression Conference Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "241--250",
editor = "Marcellin, {Michael W.} and Ali Bilgin and Joan Serra-Sagrista and Storer, {James A.}",
booktitle = "Proceedings - DCC 2016",
address = "United States",
note = "2016 Data Compression Conference, DCC 2016 ; Conference date: 29-03-2016 Through 01-04-2016",

}

TY - GEN

T1 - Predictive Coding of Aligned Next-Generation Sequencing Data

AU - Voges, Jan

AU - Munderloh, Marco

AU - Ostermann, Jörn

PY - 2016/12

Y1 - 2016/12

N2 - Due to novel high-throughput next-generation sequencing technologies, the sequencing of huge amounts of genetic information has become affordable. On account of this flood of data, IT costs have become a major obstacle compared to sequencing costs. High-performance compression of genomic data is required to reduce the storage size and transmission costs. The high coverage inherent in next-generation sequencing technologies produces highly redundant data. This paper describes a compression algorithm for aligned sequence reads. The proposed algorithm combines alignment information to implicitly assemble local parts of the donor genome in order to compress the sequence reads. In contrast to other algorithms, the proposed compressor does not need a reference to encode sequence reads. Compression is performed on-the-fly using solely a sliding window (i.e. a permanently updated short-time memory) as context for the prediction of sequence reads. The algorithm yields compression results on par with or better than the state-of-the-art, compressing the data down to 1.9% of the original size at speeds of up to 60 MB/s and with a minute memory consumption of only several kilobytes, fitting in today's level 1 CPU caches.

AB - Due to novel high-throughput next-generation sequencing technologies, the sequencing of huge amounts of genetic information has become affordable. On account of this flood of data, IT costs have become a major obstacle compared to sequencing costs. High-performance compression of genomic data is required to reduce the storage size and transmission costs. The high coverage inherent in next-generation sequencing technologies produces highly redundant data. This paper describes a compression algorithm for aligned sequence reads. The proposed algorithm combines alignment information to implicitly assemble local parts of the donor genome in order to compress the sequence reads. In contrast to other algorithms, the proposed compressor does not need a reference to encode sequence reads. Compression is performed on-the-fly using solely a sliding window (i.e. a permanently updated short-time memory) as context for the prediction of sequence reads. The algorithm yields compression results on par with or better than the state-of-the-art, compressing the data down to 1.9% of the original size at speeds of up to 60 MB/s and with a minute memory consumption of only several kilobytes, fitting in today's level 1 CPU caches.

UR - http://www.scopus.com/inward/record.url?scp=85010064563&partnerID=8YFLogxK

U2 - 10.1109/dcc.2016.98

DO - 10.1109/dcc.2016.98

M3 - Conference contribution

AN - SCOPUS:85010064563

T3 - Data Compression Conference Proceedings

SP - 241

EP - 250

BT - Proceedings - DCC 2016

A2 - Marcellin, Michael W.

A2 - Bilgin, Ali

A2 - Serra-Sagrista, Joan

A2 - Storer, James A.

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2016 Data Compression Conference, DCC 2016

Y2 - 29 March 2016 through 1 April 2016

ER -
