
I understand that compression methods may be split into two main sets:

  1. global
  2. local

The first set works regardless of the data being processed, i.e., such methods do not rely on any characteristic of the data, and thus need not perform any preprocessing on the dataset before the compression itself. Local methods, on the other hand, analyze the data, extracting information that usually improves the compression rate.

While reading about some of these methods, I noticed that unary coding is not universal, which surprised me, since I thought "globality" and "universality" referred to the same thing. Unary coding does not rely on characteristics of the data to produce its encoding (i.e., it is a global method), so it should be global/universal, shouldn't it?
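For concreteness, unary coding is fully data-independent: under one common convention, each nonnegative integer n is encoded as n ones followed by a terminating zero, with no reference to the statistics of the input. A minimal sketch (the function name is illustrative):

    def unary_encode(n: int) -> str:
        # One common convention: n ones followed by a terminating zero.
        return "1" * n + "0"

    print([unary_encode(n) for n in range(4)])  # ['0', '10', '110', '1110']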

My primary questions:

  • What is the difference between universal and global methods?
  • Aren't these classifications synonyms?

1 Answer


Consider the following chunk of data:

1010010110100101

Universal - these are generic compression algorithms that are data-agnostic. A crude version of run-length encoding would fall into this category. The advantage is that it is very fast to compress and decompress. The downside is that it can be extremely ineffective, depending on the data being compressed.

1111111111111111 -> 16 1 (lucky case)

1010010110100101 -> 1010010110100101 (unlucky case)
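A minimal sketch of such a crude run-length encoder (the function name and output format are illustrative, not taken from any particular library):

    from itertools import groupby

    def rle_encode(bits: str) -> str:
        # Emit "<run length> <symbol>" for each maximal run of equal symbols.
        return ", ".join(f"{len(list(g))} {s}" for s, g in groupby(bits))

    print(rle_encode("1111111111111111"))  # lucky case:   "16 1"
    print(rle_encode("1010010110100101"))  # unlucky case: 13 short runs, longer than the input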

Local - this method considers smaller segments of a fixed length, say 4, looks for patterns, and compresses them. E.g., this data contains only two distinct 4-bit patterns, 1010 and 0101. These patterns can be represented as 0 and 1, and the compressed output is a table of the mappings plus the short sequence 0101. This has the potential to yield a much smaller compressed size; see the sketch after the example below.

1010010110100101 -> 1010 0101 1010 0101 -> 0101 (0=1010,1=0101)
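A sketch of this fixed-block scheme, assuming a block size of 4 and a toy dictionary format (all names are illustrative):

    def local_encode(bits: str, block: int = 4):
        # Split into fixed-size blocks, map each distinct block to a short code.
        chunks = [bits[i:i + block] for i in range(0, len(bits), block)]
        table = {}
        for c in chunks:
            table.setdefault(c, str(len(table)))  # toy codes "0", "1", ...; fine for <= 10 distinct blocks
        encoded = "".join(table[c] for c in chunks)
        return encoded, {v: k for k, v in table.items()}

    encoded, mapping = local_encode("1010010110100101")
    print(encoded, mapping)  # -> 0101 {'0': '1010', '1': '0101'}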

Global - this method looks at the entire data and searches for the optimal, or at least much better, patterns with which to compress it. The example data consists of a single repeated pattern, 10100101; representing each occurrence as 0 gives 00, stored along with the mapping table. This has the potential to achieve the smallest possible compressed size, but it is also computationally the heaviest.

1010010110100101 -> 10100101 10100101 -> 00 (0=10100101)
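One naive way to realize this "global" search is to brute-force every block size that evenly divides the data, reuse local_encode from the sketch above, and keep the cheapest result. This is only a sketch of the idea under a crude cost model, not an optimal algorithm:

    def global_encode(bits: str):
        # Try every block size that evenly divides the data; keep the cheapest.
        best = None
        for block in range(1, len(bits) + 1):
            if len(bits) % block:
                continue
            encoded, mapping = local_encode(bits, block)
            # Rough cost proxy: characters in the encoded string plus the table.
            cost = len(encoded) + sum(len(k) + len(v) for k, v in mapping.items())
            if best is None or cost < best[0]:
                best = (cost, encoded, mapping)
        return best[1], best[2]

    encoded, mapping = global_encode("1010010110100101")
    print(encoded, mapping)  # -> 00 {'0': '10100101'}

Note how the exhaustive search is what makes this variant expensive: the local scheme commits to one fixed block size, while the global one pays to evaluate them all.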
