3

I am looking for a way to calculate the entropy of an image for a Capture The Flag challenge. I understand the theory of entropy, and I have tried a couple of things:

  1. Converting the JPG image to a binary image and calculating the entropy of the binary text.
  2. Viewing the images and guessing which one has the greater entropy.

I don't know how to solve it. What can I do?

Mike Edward Moras

1 Answer

-1

You don't need the probability distribution of images to calculate the information entropy of an image; the entropy is computed from the frequencies of the symbols in the data itself. For example, we can calculate the entropy of a single symbol, say [@]: it has an entropy of 0. If we add a second distinct symbol, [@#], the entropy of the two distinct symbols is 1.0 bit. If we add a third, [@#$], the entropy of the three distinct symbols is about 1.58 bits (log2 of 3). Interestingly, if you add redundant characters to your string, you reduce its entropy, because you are not adding anything new that we did not already know: [@#$@] comes out at 1.5 bits.
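The numbers above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `shannon_entropy` is my own, and it assumes each character is one symbol and that entropy is measured in bits (log base 2).

```python
from collections import Counter
from math import log2

def shannon_entropy(symbols):
    """Shannon entropy, in bits per symbol, of the empirical symbol frequencies."""
    counts = Counter(symbols)          # how often each distinct symbol occurs
    total = sum(counts.values())
    # sum over symbols of p * log2(1/p), with p = count / total
    return sum((c / total) * log2(total / c) for c in counts.values())

print(shannon_entropy("@"))               # 0.0
print(shannon_entropy("@#"))              # 1.0
print(round(shannon_entropy("@#$"), 2))   # 1.58
print(shannon_entropy("@#$@"))            # 1.5
```

The last call shows the redundancy effect: repeating a symbol already in the string makes the frequencies uneven, so the per-symbol entropy drops.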

Consequently, we can apply the same principle to image analysis; we do it all the time. You just vectorize the pixel data, read the flattened vector into a dataframe, and then run the entropy calculation on it. In my case, I use the scipy library for Python (scipy.stats.entropy). We have analyzed many images using this technique, and on average an image has an entropy somewhere between 3 and 4. You can test the concept by calculating the entropy of a pure white image (it should be zero, or very close to it) versus a more complicated image with graphs, people, chairs, cars, etc. (it should be upwards of 4).
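Here is a rough sketch of that flatten-then-count idea using only the standard library. It is not the exact pipeline described above: the function name `image_entropy` and the two synthetic 16x16 test images are made up for illustration, the image is assumed to be 8-bit grayscale given as rows of integers, and loading pixels from a real file with an imaging library is left out.

```python
from collections import Counter
from itertools import chain
from math import log

def image_entropy(rows, base=2.0):
    """Entropy of the gray-level histogram of a 2D image given as rows of ints.

    base=2 gives bits per pixel; base=e gives nats.
    """
    flat = list(chain.from_iterable(rows))   # flatten the 2D pixel grid
    counts = Counter(flat)                   # histogram: gray level -> count
    total = len(flat)
    # sum of p * ln(1/p), then convert from nats to the requested base
    return sum((c / total) * log(total / c) for c in counts.values()) / log(base)

# Hypothetical test images (assumed 8-bit grayscale, 16x16 pixels)
white = [[255] * 16 for _ in range(16)]                       # a single gray level
ramp = [[16 * r + c for c in range(16)] for r in range(16)]   # each of 256 levels once

print(image_entropy(white))  # 0.0
print(image_entropy(ramp))   # close to 8.0, the maximum for 256 gray levels
```

If you use scipy instead, pass the histogram counts rather than the raw pixel values: scipy.stats.entropy treats its input as a (possibly unnormalized) distribution and does not count values for you. It also uses the natural logarithm unless you pass base=2, so check which unit a quoted entropy value is in before comparing it with your own.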

A fun way to cross-check this is to compare the entropy of an image of a string with the entropy of the string itself. They won't be exactly the same: the image of a string comes out slightly higher, especially in writing systems where a single character is made up of several separate marks (e.g., Farsi, kanji), but in English they are pretty close.