Details
Original language | English |
---|---|
Title of host publication | Proceedings - DCC 2016 |
Subtitle of host publication | 2016 Data Compression Conference |
Editors | Michael W. Marcellin, Ali Bilgin, Joan Serra-Sagrista, James A. Storer |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 241-250 |
Number of pages | 10 |
ISBN (electronic) | 9781509018536 |
Publication status | Published - Dec 2016 |
Event | 2016 Data Compression Conference, DCC 2016 - Snowbird, United States. Duration: 29 Mar 2016 → 1 Apr 2016 |
Publication series
Name | Data Compression Conference Proceedings |
---|---|
ISSN (Print) | 1068-0314 |
Abstract
Due to novel high-throughput next-generation sequencing technologies, the sequencing of huge amounts of genetic information has become affordable. On account of this flood of data, IT costs have become a major obstacle compared to sequencing costs. High-performance compression of genomic data is required to reduce the storage size and transmission costs. The high coverage inherent in next-generation sequencing technologies produces highly redundant data. This paper describes a compression algorithm for aligned sequence reads. The proposed algorithm combines alignment information to implicitly assemble local parts of the donor genome in order to compress the sequence reads. In contrast to other algorithms, the proposed compressor does not need a reference to encode sequence reads. Compression is performed on-the-fly using solely a sliding window (i.e. a permanently updated short-time memory) as context for the prediction of sequence reads. The algorithm yields compression results on par with or better than the state of the art, compressing the data down to 1.9% of the original size at speeds of up to 60 MB/s and with a minute memory consumption of only several kilobytes, fitting in today's level 1 CPU caches.
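The sliding-window idea in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: it assumes ungapped alignments given as hypothetical `(position, sequence)` pairs, predicts each read by a per-position majority vote over the last few reads, and stores only the positions where the prediction fails. The names `predict`, `encode`, `decode` and the parameter `window_size` are illustrative assumptions, and the entropy coding of the residuals that a real compressor would need is left out.

```python
from collections import Counter, deque


def predict(window, pos, length):
    """Predict a read at `pos` by majority vote over overlapping window reads.

    Returns a list of bases, with None where no read in the window overlaps.
    """
    votes = [Counter() for _ in range(length)]
    for wpos, wseq in window:
        for i, base in enumerate(wseq):
            j = wpos + i - pos  # offset of this base inside the new read
            if 0 <= j < length:
                votes[j][base] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]


def encode(reads, window_size=4):
    """Turn (pos, seq) reads into (pos, length, residual) records.

    Assumption: reads are ungapped and sorted by position; no reference is used.
    """
    window = deque(maxlen=window_size)  # short-term memory, oldest read dropped
    records = []
    for pos, seq in reads:
        pred = predict(window, pos, len(seq))
        # Keep only the bases the window failed to predict.
        residual = [(i, b) for i, (b, p) in enumerate(zip(seq, pred)) if b != p]
        records.append((pos, len(seq), residual))
        window.append((pos, seq))
    return records


def decode(records, window_size=4):
    """Invert encode() by rebuilding the same window on the decoder side."""
    window = deque(maxlen=window_size)
    reads = []
    for pos, length, residual in records:
        bases = predict(window, pos, length)
        for i, b in residual:  # residuals also fill unpredicted (None) slots
            bases[i] = b
        seq = "".join(bases)
        window.append((pos, seq))
        reads.append((pos, seq))
    return reads
```

Because the decoder rebuilds the same window from already-decoded reads, no reference sequence has to be transmitted. With highly overlapping reads, most residual lists shrink to the few bases at the leading edge of each read plus any mismatches.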
ASJC Scopus subject areas
- Computer Science(all)
- Computer Networks and Communications
Cite this
Predictive Coding of Aligned Next-Generation Sequencing Data. / Voges, Jan; Munderloh, Marco; Ostermann, Jörn. Proceedings - DCC 2016: 2016 Data Compression Conference. ed. / Michael W. Marcellin; Ali Bilgin; Joan Serra-Sagrista; James A. Storer. Institute of Electrical and Electronics Engineers Inc., 2016. p. 241-250 7786168 (Data Compression Conference Proceedings).
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Predictive Coding of Aligned Next-Generation Sequencing Data
AU - Voges, Jan
AU - Munderloh, Marco
AU - Ostermann, Jörn
PY - 2016/12
Y1 - 2016/12
UR - http://www.scopus.com/inward/record.url?scp=85010064563&partnerID=8YFLogxK
U2 - 10.1109/dcc.2016.98
DO - 10.1109/dcc.2016.98
M3 - Conference contribution
AN - SCOPUS:85010064563
T3 - Data Compression Conference Proceedings
SP - 241
EP - 250
BT - Proceedings - DCC 2016
A2 - Marcellin, Michael W.
A2 - Bilgin, Ali
A2 - Serra-Sagrista, Joan
A2 - Storer, James A.
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2016 Data Compression Conference, DCC 2016
Y2 - 29 March 2016 through 1 April 2016
ER -