dog.png | 125 bytes | Few but likely
noise.png | 246 bytes | Many but unlikely
As you walk around the world looking at all the dogs and things, the images that fall on your image sensor are much more likely to come from the relatively small set of interesting images with visual structure than from the very large set of images that look like random noise.
This is because the world is sparse. Visual structure consists of repetition and redundancy, such as the large area of empty space around the dog, or the number of similar pixels lined up in a row to make the dog's leg.
Lossless data compression works by exploiting sparseness, the structure and redundancy of information, to produce a representation with fewer bits. A compression method expects a certain kind of structure in the input; it reduces the size of inputs that have that kind of structure, at the expense of increasing the size of inputs that do not. No compression method can reduce the size of all possible inputs, because there are fewer short bit strings than long ones, so some inputs must map to outputs at least as long as themselves.
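As a rough illustration of that trade-off, here is a minimal sketch of run-length encoding, one very simple lossless method. It is not how PNG compresses images; the function names `rle_encode` and `rle_decode` and the two sample rows are made up for this example. The method expects runs of identical pixels, so a row with runs shrinks and a row without them doubles in size.

```python
def rle_encode(pixels: bytes) -> bytes:
    """Run-length encode as (run_length, value) byte pairs; runs are capped at 255."""
    out = bytearray()
    i = 0
    while i < len(pixels):
        value = pixels[i]
        run = 1
        # Extend the run while the next pixel matches and the count still fits in a byte.
        while i + run < len(pixels) and pixels[i + run] == value and run < 255:
            run += 1
        out += bytes((run, value))
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Invert rle_encode by expanding each (run_length, value) pair."""
    out = bytearray()
    for k in range(0, len(encoded), 2):
        run, value = encoded[k], encoded[k + 1]
        out += bytes([value]) * run
    return bytes(out)


# Hypothetical row with structure: empty space, a short run of "leg" pixels, more empty space.
structured_row = bytes([0] * 20 + [200] * 8 + [0] * 20)
# Hypothetical row with no runs: every pixel differs from its neighbour.
noisy_row = bytes((i * 37) % 256 for i in range(48))

structured_size = len(rle_encode(structured_row))
noisy_size = len(rle_encode(noisy_row))

print(len(structured_row), "->", structured_size)  # 48 -> 6
print(len(noisy_row), "->", noisy_size)            # 48 -> 96
```

Both rows decode back to exactly what went in, so nothing is lost. The structured row pays for its savings with the noisy row's growth.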
Compression and intelligence are related, because both have to do with finding patterns and structure in information. It takes some intelligence to recognise that an image contains the patterns of a dog. It is sparseness that makes compression and intelligence possible.