Improving HTML Compression

P. Skibinski

doi:10.1109/DCC.2008.74

Source

Data Compression Conference (DCC 2008), p. 545

Abstract

The primary objective of our research was to design an efficient way of compressing HTML documents, in order to reduce Internet traffic or the storage requirements of HTML data. In our work we present the lossless HTML transform (LHT), which aims to improve lossless HTML compression in combination with existing general-purpose compressors. The main components of our algorithm are a static or semi-static dictionary of frequent alphanumerical phrases, and binary encoding of popular patterns such as numbers, dates, or IP addresses. Alphanumerical phrases are not limited to "words" in the conventional sense: they can be XML tags, XML entities, URL addresses, e-mail addresses, and runs of spaces. We have developed two versions of LHT, static and semi-static, and each has a disadvantage. Static LHT uses a fixed English dictionary that is required for both compression and decompression. Semi-static LHT creates a dictionary in a first pass and stores it within the compressed file; because it requires two passes over the input file, it does not support streams as input (offline compression only).
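The semi-static idea described in the abstract can be illustrated with a small sketch. This is not the author's LHT implementation: the tokenizer pattern, the private-use codewords, the JSON header, and the names `build_dictionary`, `lht_like_encode`, and `lht_like_decode` are all assumptions made for illustration, and the binary encoding of numbers, dates, and IP addresses is omitted. The sketch only shows the two-pass structure (the first pass collects frequent phrases, the second substitutes them) and the dictionary being stored with the output so that decoding needs no external data.

```python
import json
import re
import zlib
from collections import Counter

# Hypothetical phrase tokenizer: tag starts, entities, words, runs of spaces.
TOKEN = re.compile(r"</?[A-Za-z][A-Za-z0-9]*|&[A-Za-z]+;|[A-Za-z]+| {2,}")

# Codewords are single characters from the Unicode private-use area
# (an assumption of this sketch, not the paper's codeword scheme).
PUA_START, PUA_END = 0xE000, 0xF8FF


def build_dictionary(text, max_entries=64, min_count=2, min_len=3):
    """First pass: collect the most frequent phrases in the input."""
    counts = Counter(m.group(0) for m in TOKEN.finditer(text))
    frequent = [t for t, c in counts.most_common()
                if c >= min_count and len(t) >= min_len]
    return frequent[:max_entries]


def lht_like_encode(text):
    """Second pass: replace dictionary phrases with one-character codewords.

    The dictionary is stored as a one-line JSON header in front of the body.
    """
    if any(PUA_START <= ord(ch) <= PUA_END for ch in text):
        raise ValueError("input already uses the codeword range")
    dictionary = build_dictionary(text)
    codes = {phrase: chr(PUA_START + i) for i, phrase in enumerate(dictionary)}
    body = TOKEN.sub(lambda m: codes.get(m.group(0), m.group(0)), text)
    # json.dumps escapes newlines, so the header is exactly one line.
    return json.dumps(dictionary) + "\n" + body


def lht_like_decode(data):
    """Invert lht_like_encode using the dictionary stored in the header."""
    header, body = data.split("\n", 1)
    dictionary = json.loads(header)
    limit = PUA_START + len(dictionary)
    return "".join(
        dictionary[ord(ch) - PUA_START] if PUA_START <= ord(ch) < limit else ch
        for ch in body
    )


def compress(text):
    """Apply the transform, then a general-purpose compressor (zlib here)."""
    return zlib.compress(lht_like_encode(text).encode("utf-8"))


def decompress(blob):
    return lht_like_decode(zlib.decompress(blob).decode("utf-8"))
```

For example, `decompress(compress(page))` returns `page` unchanged for any string `page` that contains no private-use characters. Because `build_dictionary` must read the whole input before `lht_like_encode` can substitute anything, the sketch shares the limitation the abstract notes for semi-static LHT: it cannot process a stream in a single pass.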

Identifiers

Book ISSN: 1068-0314
Book ISBN: 978-0-7695-3121-2
DOI: 10.1109/DCC.2008.74

Keywords

transforms; data compression; document handling; hypermedia markup languages; compressed file; HTML document compression; lossless HTML transform; static dictionary; frequent alphanumerical phrase; binary encoding; semistatic LHT; fixed English dictionary; dictionary; lossless compression; HTML

Additional information

Data set: ieee

Publisher

IEEE

INFONA - science communication portal
