2016 Data Compression Conference (DCC)

chapter

Leveraging CABAC for No-Reference Compression of Genomic Data with Random Access Support

Tom Paridaens, Jens Panneel, Wesley De Neve, Peter Lambert, et al.

2016 Data Compression Conference (DCC) > 625

In previous work, the authors developed a modular no-reference framework that compresses FASTA files by applying a predict-and-residue method, as used in video coding. We extended this framework with support for Context-Adaptive Binary Arithmetic Coding (CABAC), while at the same time preserving random access functionality and offering support for the full IUB/IUPAC nucleic acid codes alphabet.

chapter

Computing LZ77 in Run-Compressed Space

Alberto Policriti, Nicola Prezza

2016 Data Compression Conference (DCC) > 23 - 32

In this paper, we show that the LZ77 factorization of a text T ∈ Σ^n can be computed in O(R log n) bits of working space and O(n log R) time, R being the number of runs in the Burrows-Wheeler transform of T (reversed). For (extremely) repetitive inputs, the working space can be as low as O(log n) bits: exponentially smaller than the text itself. Hence, our result finds important applications in the...
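To make the two quantities in this abstract concrete, the following is a minimal sketch, not the authors' algorithm: it computes a greedy LZ77 factorization and the run count R of the BWT of the reversed text by brute force, using space and time far above the bounds stated in the abstract. The function names (`bwt`, `count_runs`, `lz77_factorize`), the `"\0"` sentinel, and the choice of LZ77 variant (longest previous match, overlaps allowed, no trailing character) are assumptions made for illustration.

```python
def bwt(text: str, sentinel: str = "\0") -> str:
    """Naive Burrows-Wheeler transform via sorted rotations.

    Illustration only: this uses quadratic space, unlike the
    run-compressed approach described in the abstract.
    The sentinel is assumed not to occur in `text`.
    """
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_runs(s: str) -> int:
    """Count maximal runs of equal characters in s."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


def lz77_factorize(text: str) -> list[str]:
    """Greedy LZ77 factorization (one common variant, assumed here).

    Each phrase is the longest prefix of the remaining text that also
    starts at some earlier position (overlaps allowed), or a single
    character if no such prefix exists.
    """
    factors = []
    i, n = 0, len(text)
    while i < n:
        longest = 0
        for j in range(i):  # try every earlier starting position
            k = 0
            while i + k < n and text[j + k] == text[i + k]:
                k += 1
            longest = max(longest, k)
        longest = max(longest, 1)  # a new character forms its own phrase
        factors.append(text[i:i + longest])
        i += longest
    return factors


if __name__ == "__main__":
    t = "ab" * 8
    print("factors:", lz77_factorize(t))
    print("R =", count_runs(bwt(t[::-1])))
```

On a highly repetitive input such as `"ab" * 8`, both the number of LZ77 phrases and R stay small while n grows, which is the regime the abstract highlights.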

chapter

An Evaluation Framework for Lossy Compression of Genome Sequencing Quality Values

Claudio Alberti, Noah Daniels, Mikel Hernaez, Jan Voges, et al.

2016 Data Compression Conference (DCC) > 221 - 230

This paper provides the specification and an initial validation of an evaluation framework for the comparison of lossy compressors of genome sequencing quality values. The goal is to define reference data, test sets, tools and metrics that shall be used to evaluate the impact of lossy compression of quality values on human genome variant calling. The functionality of the framework is validated referring...

chapter

Efficient Compression of Genomic Sequences

Diogo Pratas, Armando J. Pinho, Paulo J. S. G. Ferreira

2016 Data Compression Conference (DCC) > 231 - 240

The number of genomic sequences is growing substantially. Besides discarding part of the data, the only efficient possibility for coping with this trend is data compression. We present an efficient compressor for genomic sequences, allowing both reference-free and referential compression. This compressor uses a mixture of context models of several orders, according to two model classes: reference...

chapter

A Cluster-Based Approach to Compression of Quality Scores

Mikel Hernaez, Idoia Ochoa, Tsachy Weissman

2016 Data Compression Conference (DCC) > 261 - 270

Massive amounts of sequencing data are being generated thanks to advances in sequencing technology and a dramatic drop in the sequencing cost. Storing and sharing this large data has become a major bottleneck in the discovery and analysis of genetic variants that are used for medical inference. As such, lossless compression of this data has been proposed. Of the compressed data, more than 70% correspond...

chapter

Burrows-Wheeler Transform for Terabases

Jouni Siren

2016 Data Compression Conference (DCC) > 211 - 220

In order to avoid the reference bias introduced by mapping reads to a reference genome, bioinformaticians are investigating reference-free methods for analyzing sequenced genomes. With large projects sequencing thousands of individuals, this raises the need for tools capable of handling terabases of sequence data. A key method is the Burrows-Wheeler transform (BWT), which is widely used for compressing...

chapter

Predictive Coding of Aligned Next-Generation Sequencing Data

Jan Voges, Marco Munderloh, Jorn Ostermann

2016 Data Compression Conference (DCC) > 241 - 250

Due to novel high-throughput next-generation sequencing technologies, the sequencing of huge amounts of genetic information has become affordable. On account of this flood of data, IT costs have become a major obstacle compared to sequencing costs. High-performance compression of genomic data is required to reduce the storage size and transmission costs. The high coverage inherent in next-generation...

chapter

CS2A: A Compressed Suffix Array-Based Method for Short Read Alignment

Hongwei Huo, Zhigang Sun, Shuangjiang Li, Jeffrey Scott Vitter, et al.

2016 Data Compression Conference (DCC) > 271 - 278

Next-generation sequencing technologies generate enormous amounts of short reads, which poses a significant computational challenge for short read alignment. Furthermore, because of sequence polymorphisms in a population, repetitive sequences, and sequencing errors, there still exist difficulties in correctly aligning all reads. We propose a space-efficient compressed suffix array-based method for short...
