Predictive Coding of Aligned Next-Generation Sequencing Data

Publication: Contribution to book/report/anthology/conference proceedings › Conference paper › Research › Peer-reviewed

Authors

Jan Voges, Marco Munderloh, Jörn Ostermann

Details

Original language: English
Title of host publication: Proceedings - DCC 2016
Subtitle: 2016 Data Compression Conference
Editors: Michael W. Marcellin, Ali Bilgin, Joan Serra-Sagrista, James A. Storer
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 241-250
Number of pages: 10
ISBN (electronic): 9781509018536
Publication status: Published - Dec 2016
Event: 2016 Data Compression Conference, DCC 2016 - Snowbird, United States
Duration: 29 Mar 2016 to 1 Apr 2016

Publication series

Name: Data Compression Conference Proceedings
ISSN (Print): 1068-0314

Abstract

Due to novel high-throughput next-generation sequencing technologies, the sequencing of huge amounts of genetic information has become affordable. On account of this flood of data, IT costs have become a major obstacle compared to sequencing costs. High-performance compression of genomic data is required to reduce the storage size and transmission costs. The high coverage inherent in next-generation sequencing technologies produces highly redundant data. This paper describes a compression algorithm for aligned sequence reads. The proposed algorithm combines alignment information to implicitly assemble local parts of the donor genome in order to compress the sequence reads. In contrast to other algorithms, the proposed compressor does not need a reference to encode sequence reads. Compression is performed on-the-fly using solely a sliding window (i.e. a permanently updated short-time memory) as context for the prediction of sequence reads. The algorithm yields compression results on par with or better than the state of the art, compressing the data down to 1.9% of the original size at speeds of up to 60 MB/s and with a minute memory consumption of only several kilobytes, fitting in today's level 1 CPU caches.
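
The sliding-window idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names `predict`, `encode`, and `decode`, the per-position majority vote, the window size of 4 reads, and the restriction to reads without insertions or deletions are all assumptions made for illustration, and the entropy-coding stage of a real compressor is omitted. Each read is predicted only from the most recent reads held in the window, and only the bases that differ from the prediction are stored.

```python
from collections import Counter, deque


def predict(window, pos, length):
    """Majority-vote one base per position from the reads in the window.

    Returns a list of bases, with None where no window read overlaps.
    """
    votes = [Counter() for _ in range(length)]
    for wpos, wseq in window:
        for i, base in enumerate(wseq):
            j = wpos + i - pos  # offset of this base inside the new read
            if 0 <= j < length:
                votes[j][base] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]


def encode(reads, window_size=4):
    """Turn (position, sequence) reads into (position, length, residual) records."""
    window = deque(maxlen=window_size)  # short-time memory, no reference genome
    records = []
    for pos, seq in reads:
        pred = predict(window, pos, len(seq))
        # Keep only the bases the window failed to predict.
        residual = [(i, b) for i, (b, p) in enumerate(zip(seq, pred)) if b != p]
        records.append((pos, len(seq), residual))
        window.append((pos, seq))
    return records


def decode(records, window_size=4):
    """Rebuild the reads by repeating the encoder's predictions."""
    window = deque(maxlen=window_size)
    reads = []
    for pos, length, residual in records:
        seq = predict(window, pos, length)
        for i, b in residual:
            seq[i] = b  # patch mispredicted or unpredicted bases
        read = "".join(seq)
        reads.append((pos, read))
        window.append((pos, read))
    return reads


# Hypothetical overlapping reads, sorted by mapping position.
reads = [(0, "ACGTAC"), (2, "GTACGG"), (3, "TACGGA"), (4, "ACTGAT")]
records = encode(reads)
restored = decode(records)
```

Because the decoder rebuilds the same window from the reads it has already reconstructed, both sides make identical predictions and the residuals are enough to recover every read. With high coverage, most bases of a new read overlap earlier reads, so the residuals stay short.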


Cite this

Predictive Coding of Aligned Next-Generation Sequencing Data. / Voges, Jan; Munderloh, Marco; Ostermann, Jörn.
Proceedings - DCC 2016: 2016 Data Compression Conference. ed. / Michael W. Marcellin; Ali Bilgin; Joan Serra-Sagrista; James A. Storer. Institute of Electrical and Electronics Engineers Inc., 2016. pp. 241-250 7786168 (Data Compression Conference Proceedings).


Voges, J, Munderloh, M & Ostermann, J 2016, Predictive Coding of Aligned Next-Generation Sequencing Data. in MW Marcellin, A Bilgin, J Serra-Sagrista & JA Storer (eds), Proceedings - DCC 2016: 2016 Data Compression Conference., 7786168, Data Compression Conference Proceedings, Institute of Electrical and Electronics Engineers Inc., pp. 241-250, 2016 Data Compression Conference, DCC 2016, Snowbird, United States, 29 Mar 2016. https://doi.org/10.1109/dcc.2016.98
Voges, J., Munderloh, M., & Ostermann, J. (2016). Predictive Coding of Aligned Next-Generation Sequencing Data. In M. W. Marcellin, A. Bilgin, J. Serra-Sagrista, & J. A. Storer (Eds.), Proceedings - DCC 2016: 2016 Data Compression Conference (pp. 241-250). Article 7786168 (Data Compression Conference Proceedings). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/dcc.2016.98
Voges J, Munderloh M, Ostermann J. Predictive Coding of Aligned Next-Generation Sequencing Data. In Marcellin MW, Bilgin A, Serra-Sagrista J, Storer JA, editors, Proceedings - DCC 2016: 2016 Data Compression Conference. Institute of Electrical and Electronics Engineers Inc. 2016. p. 241-250. 7786168. (Data Compression Conference Proceedings). doi: 10.1109/dcc.2016.98
Voges, Jan ; Munderloh, Marco ; Ostermann, Jörn. / Predictive Coding of Aligned Next-Generation Sequencing Data. Proceedings - DCC 2016: 2016 Data Compression Conference. ed. / Michael W. Marcellin ; Ali Bilgin ; Joan Serra-Sagrista ; James A. Storer. Institute of Electrical and Electronics Engineers Inc., 2016. pp. 241-250 (Data Compression Conference Proceedings).
@inproceedings{034b35c71af94857a23cec3d67ef3f97,
title = "Predictive Coding of Aligned Next-Generation Sequencing Data",
abstract = "Due to novel high-throughput next-generation sequencing technologies, the sequencing of huge amounts of genetic information has become affordable. On account of this flood of data, IT costs have become a major obstacle compared to sequencing costs. High-performance compression of genomic data is required to reduce the storage size and transmission costs. The high coverage inherent in next-generation sequencing technologies produces highly redundant data. This paper describes a compression algorithm for aligned sequence reads. The proposed algorithm combines alignment information to implicitly assemble local parts of the donor genome in order to compress the sequence reads. In contrast to other algorithms, the proposed compressor does not need a reference to encode sequence reads. Compression is performed on-the-fly using solely a sliding window (i.e. a permanently updated short-time memory) as context for the prediction of sequence reads. The algorithm yields compression results on par with or better than the state of the art, compressing the data down to 1.9% of the original size at speeds of up to 60 MB/s and with a minute memory consumption of only several kilobytes, fitting in today's level 1 CPU caches.",
author = "Jan Voges and Marco Munderloh and J{\"o}rn Ostermann",
year = "2016",
month = dec,
doi = "10.1109/dcc.2016.98",
language = "English",
series = "Data Compression Conference Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "241--250",
editor = "Marcellin, {Michael W.} and Ali Bilgin and Joan Serra-Sagrista and Storer, {James A.}",
booktitle = "Proceedings - DCC 2016",
address = "United States",
note = "2016 Data Compression Conference, DCC 2016 ; Conference date: 29-03-2016 Through 01-04-2016",

}


TY - GEN

T1 - Predictive Coding of Aligned Next-Generation Sequencing Data

AU - Voges, Jan

AU - Munderloh, Marco

AU - Ostermann, Jörn

PY - 2016/12

Y1 - 2016/12

N2 - Due to novel high-throughput next-generation sequencing technologies, the sequencing of huge amounts of genetic information has become affordable. On account of this flood of data, IT costs have become a major obstacle compared to sequencing costs. High-performance compression of genomic data is required to reduce the storage size and transmission costs. The high coverage inherent in next-generation sequencing technologies produces highly redundant data. This paper describes a compression algorithm for aligned sequence reads. The proposed algorithm combines alignment information to implicitly assemble local parts of the donor genome in order to compress the sequence reads. In contrast to other algorithms, the proposed compressor does not need a reference to encode sequence reads. Compression is performed on-the-fly using solely a sliding window (i.e. a permanently updated short-time memory) as context for the prediction of sequence reads. The algorithm yields compression results on par with or better than the state of the art, compressing the data down to 1.9% of the original size at speeds of up to 60 MB/s and with a minute memory consumption of only several kilobytes, fitting in today's level 1 CPU caches.

AB - Due to novel high-throughput next-generation sequencing technologies, the sequencing of huge amounts of genetic information has become affordable. On account of this flood of data, IT costs have become a major obstacle compared to sequencing costs. High-performance compression of genomic data is required to reduce the storage size and transmission costs. The high coverage inherent in next-generation sequencing technologies produces highly redundant data. This paper describes a compression algorithm for aligned sequence reads. The proposed algorithm combines alignment information to implicitly assemble local parts of the donor genome in order to compress the sequence reads. In contrast to other algorithms, the proposed compressor does not need a reference to encode sequence reads. Compression is performed on-the-fly using solely a sliding window (i.e. a permanently updated short-time memory) as context for the prediction of sequence reads. The algorithm yields compression results on par with or better than the state of the art, compressing the data down to 1.9% of the original size at speeds of up to 60 MB/s and with a minute memory consumption of only several kilobytes, fitting in today's level 1 CPU caches.

UR - http://www.scopus.com/inward/record.url?scp=85010064563&partnerID=8YFLogxK

U2 - 10.1109/dcc.2016.98

DO - 10.1109/dcc.2016.98

M3 - Conference contribution

AN - SCOPUS:85010064563

T3 - Data Compression Conference Proceedings

SP - 241

EP - 250

BT - Proceedings - DCC 2016

A2 - Marcellin, Michael W.

A2 - Bilgin, Ali

A2 - Serra-Sagrista, Joan

A2 - Storer, James A.

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2016 Data Compression Conference, DCC 2016

Y2 - 29 March 2016 through 1 April 2016

ER -
