Is LUKS Anti-Forensic information splitter (AFsplit) indistinguishable from random data?

Question

I want to know if AFsplit(random_data_input, stripes, digestmod=sha) output is indistinguishable from random data.

The attacker only has access to AFsplit output data.

EDIT: This question is too vague, as the answer depends on the randomness assumptions made about random_input_data. Keep this in mind when reading the proposed answers.

The purpose is to use this function in a deniable encryption implementation.

About AFsplit: AFsplit provides security against forensic recovery of data from disk bad blocks. AFsplit splits data among several disk blocks so that, if a block is remapped to the reserved "bad block" area, that block doesn't contain the complete sensitive information (in particular, a revoked key slot). LUKS uses AFsplit to store the key slots.
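As a rough illustration of how far the split material spreads, here is the arithmetic under assumed parameters: a 32-byte (256-bit) key, the 4000 stripes used by the LUKS1 reference implementation, and 512-byte sectors. Under these assumptions a single remapped sector holds only a small fraction of the stripes.

```python
import math

KEY_BYTES = 32       # assumption: 256-bit master key
STRIPES = 4000       # stripe count used by the LUKS1 reference implementation
SECTOR_BYTES = 512   # assumption: classic 512-byte sectors

split_bytes = KEY_BYTES * STRIPES                  # AFsplit output size in bytes
sectors = math.ceil(split_bytes / SECTOR_BYTES)    # sectors spanned on disk
stripes_per_sector = SECTOR_BYTES // KEY_BYTES     # stripes that fit in one sector

print(split_bytes, sectors, stripes_per_sector)    # 128000 250 16
```

So losing one sector to the bad-block area leaks at most 16 of the 4000 stripes in this configuration.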

Note: I'm not talking about LUKS itself, which of course has a plaintext header. I just want to focus on this particular function.

Below is a Python implementation of AFsplit (Python 2, using the legacy PyCrypto library):

import sha, string, math, struct
from Crypto.Util.randpool import RandomPool
from Crypto.Cipher import XOR
def _xor(a, b):
    """Internal function to perform XOR on two strings a and b"""

    xor = XOR.new(a)
    return xor.encrypt(b)

def _diffuse(block, size, digest):
    """Internal function to diffuse information inside a buffer"""

    # Compute the number of full blocks, and the size of the leftover block
    full_blocks = int(math.floor(float(len(block)) / float(digest.digest_size)))
    padding = len(block) % digest.digest_size

    # hash the full blocks
    ret = ""
    for i in range(0, full_blocks):

        hash = digest.new()
        hash.update(struct.pack(">I", i))
        hash.update(block[i*digest.digest_size:(i+1)*digest.digest_size])
        ret += hash.digest()

    # Hash the remaining data
    if padding > 0:
        hash = digest.new()
        hash.update(struct.pack(">I", full_blocks))
        hash.update(block[full_blocks * digest.digest_size:])
        ret += hash.digest()[:padding]

    return ret

def AFSplit(data, stripes, digestmod=sha):
    """AF-Split data using digestmod.  Returned data size will be len(data) * stripes"""

    blockSize = len(data)

    rand = RandomPool()

    bufblock = "\x00" * blockSize

    ret = ""
    for i in range(0, stripes-1):

        # Get some random data
        rand.randomize()
        rand.stir()
        r = rand.get_bytes(blockSize)
        if rand.entropy < 0:
            print "Warning: RandomPool entropy dropped below 0"

        ret += r
        bufblock = _xor(r, bufblock)
        bufblock = _diffuse(bufblock, blockSize, digestmod)
        rand.add_event(bufblock)

    ret += _xor(bufblock, data)
    return ret
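For readers without the legacy PyCrypto package, here is a minimal Python 3 sketch of the same algorithm, plus the inverse AFmerge step. It is not the code from the question or from cryptsetup: it assumes `hashlib` algorithm names (e.g. "sha1", "sha256") in place of the `digestmod` module, replaces `RandomPool` with `secrets.token_bytes`, and drops the `add_event` feedback, which a CSPRNG does not need. The names `af_split` and `af_merge` are illustrative only.

```python
import hashlib
import secrets
import struct


def _xor(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def _diffuse(block: bytes, hash_name: str) -> bytes:
    """Hash each digest-sized chunk, prefixed by its big-endian 32-bit index.
    The digest of a final short chunk is truncated to that chunk's length."""
    size = hashlib.new(hash_name).digest_size
    out = []
    for start in range(0, len(block), size):
        chunk = block[start:start + size]
        h = hashlib.new(hash_name, struct.pack(">I", start // size) + chunk)
        out.append(h.digest()[:len(chunk)])
    return b"".join(out)


def af_split(data: bytes, stripes: int, hash_name: str = "sha1") -> bytes:
    """Return len(data) * stripes bytes: stripes - 1 random stripes, then masked data."""
    buf = bytes(len(data))                   # all-zero accumulator
    out = []
    for _ in range(stripes - 1):
        r = secrets.token_bytes(len(data))   # CSPRNG output for one stripe
        out.append(r)
        buf = _diffuse(_xor(r, buf), hash_name)
    out.append(_xor(buf, data))              # last stripe = mask XOR data
    return b"".join(out)


def af_merge(material: bytes, size: int, stripes: int, hash_name: str = "sha1") -> bytes:
    """Recompute the mask from the leading stripes and remove it from the last one."""
    buf = bytes(size)
    for i in range(stripes - 1):
        buf = _diffuse(_xor(material[i * size:(i + 1) * size], buf), hash_name)
    return _xor(buf, material[(stripes - 1) * size:stripes * size])


key = secrets.token_bytes(32)
material = af_split(key, 4000)
assert len(material) == 32 * 4000
assert af_merge(material, 32, 4000) == key
```

Collecting the stripes in a list and joining once avoids the quadratic cost of repeated `+=` on a growing string.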

Mike Edward Moras · Answer 1 · 2017-08-18T22:37:13.897

Is LUKS Anti-Forensic information splitter (AFsplit) indistinguishable from random data?

No.

AFSplitting is merely meant to provide diffusion, not cryptographically secure randomness.

When you check the LUKS On-Disk Format Specification (e.g. the PDF of v1.1.1), you'll notice it states:

LUKS uses anti-forensic information splitting as specified in [Fru05b]. The underlaying diffusion function shall be SHA1 for the reference implementation, but can be changed exactly as described in the remarks above.

Even though SHA-1 (broken due to collision attacks) is used as the diffusion function in the reference implementation, that doesn't mean the AFSplitting function provides anything more than that: diffusion.

While its output definitely has a (let's just call it) random character, you should not rely on it to provide cryptographically secure randomness indistinguishable from random data. That's not what AFSplitting (by itself) is intended to provide. While diffusion surely adds to the random appearance, there's a big difference between “diffusion” and “cryptographically secure randomness indistinguishable from random data”; the two terms are hardly interchangeable.
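To make the distinction concrete, here is a small sketch of the `_diffuse` step, re-implemented with `hashlib` and assuming SHA-1 as in the reference implementation. It is deterministic, so it cannot add entropy: the same input always gives the same output. What it does provide is diffusion: flipping a single input bit typically changes many output bits.

```python
import hashlib
import struct


def diffuse(block: bytes, hash_name: str = "sha1") -> bytes:
    """Chunk-wise hashing as in _diffuse: H(index || chunk), truncated to the chunk length."""
    size = hashlib.new(hash_name).digest_size
    out = []
    for start in range(0, len(block), size):
        chunk = block[start:start + size]
        digest = hashlib.new(hash_name, struct.pack(">I", start // size) + chunk).digest()
        out.append(digest[:len(chunk)])
    return b"".join(out)


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


block = bytes(20)                    # hypothetical all-zero 20-byte input
flipped = b"\x01" + bytes(19)        # the same input with one bit flipped

# Deterministic: hashing the same input twice gives the same bytes, so no entropy is added.
assert diffuse(block) == diffuse(block)

# Diffusion: a one-bit input change spreads over the output.
changed = bit_difference(diffuse(block), diffuse(flipped))
print(f"{changed} of {8 * len(block)} output bits changed")
```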

And I'm not even considering practical breaks against SHA-1, which might or might not turn the answer into a clear “no”… but that's another question.

EDIT

To clarify this in more depth and clear up some confusion in the asker's comments:

Let's recall the question as asked:

I want to know if AFsplit(random_data_input, stripes, digestmod=sha) output is indistinguishable from random data, not knowing random_data_input.

and

… I just want to focus on this particular function.

The question didn't ask whether PBKDF2 is indistinguishable from random data, nor did it assume the input to AFSplit to be indistinguishable from random data.

Therefore, when focusing merely on the AFSplit functionality (as the question asks), one can and must cryptanalytically consider distinguishable data inputs too. Otherwise, the input to AFSplit would have needed to be defined as “exclusively data indistinguishable from random data”, but the question doesn't restrict the input to such data when asking what AFSplitting outputs.

In LUKS, the user password is entered and processed by PBKDF2 (which provides the randomness that should be indistinguishable from random data). The master key is then split by the AFsplitter into a number of stripes via diffusion (using SHA-1 or an alternative hash).

So any data indistinguishable from random data needs to be produced outside the AFSplit function. If you feed AFSplit distinguishable data, AFSplit won't magically turn it into data indistinguishable from random data; it merely diffuses what you feed it. That's all you get when keeping the focus on this particular function (and it is the main reason for “no” as an answer). How the input to AFSplit should be produced in a LUKS implementation is another story. It goes well beyond the AFSplit function (which is merely the diffusion-providing piece of the LUKS cake) and therefore well beyond what was asked, which was to focus on this particular function.
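A minimal sketch of that point, assuming a Python 3 re-implementation with SHA-1 and `secrets` as the RNG (`af_split` and `unmask` are illustrative names, not LUKS APIs). Because the attacker sees every stripe, they can recompute the mask and XOR it off, so any structure in the input (here, a hypothetical key made of repeated `A` bytes) shows up directly in the recovered value, and a trivial test tells it apart from random data.

```python
import hashlib
import secrets
import struct


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def diffuse(block: bytes) -> bytes:
    """SHA-1 chunk-wise diffusion, modelled on the question's _diffuse."""
    out = []
    for start in range(0, len(block), 20):
        chunk = block[start:start + 20]
        out.append(hashlib.sha1(struct.pack(">I", start // 20) + chunk).digest()[:len(chunk)])
    return b"".join(out)


def af_split(data: bytes, stripes: int) -> bytes:
    buf, out = bytes(len(data)), []
    for _ in range(stripes - 1):
        r = secrets.token_bytes(len(data))
        out.append(r)
        buf = diffuse(xor(r, buf))
    return b"".join(out) + xor(buf, data)


def unmask(observed: bytes, size: int, stripes: int) -> bytes:
    """Attacker side: recompute the mask from the visible leading stripes."""
    buf = bytes(size)
    for i in range(stripes - 1):
        buf = diffuse(xor(observed[i * size:(i + 1) * size], buf))
    return xor(buf, observed[(stripes - 1) * size:])


structured = b"A" * 32                 # hypothetical, clearly non-random input
observed = af_split(structured, 8)     # everything the attacker sees
recovered = unmask(observed, 32, 8)
print(recovered == structured)         # True
print(len(set(recovered)))             # 1: a single repeated byte value
```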

score -2 · Answer 2 · edited Aug 15 '17 at 13:13


This technique spreads a small piece of data all over the disk sector. The sector is partitioned into several random stripes, and you need all of them to get the data back. So, yes, the sector has a random appearance.

edited Aug 15 '17 at 13:13

Mike Edward Moras


answered Apr 16 '16 at 08:07

ddddavidee


KrisWebDev · Accepted Answer · 2017-08-19T09:55:59.627

EDIT: clarified the hypotheses.

I was initially too lazy to do the code analysis, but it had to be done to provide an objective answer. The mechanism of this function is indeed not widely known.

The function itself doesn't provide an output indistinguishable from random data.

However, if and only if these 3 conditions are met, the output is indistinguishable from random data:

The implementation uses a secure RNG (random number generator). Note that Python Crypto.Util.randpool is deprecated!
AFSplit input data is indistinguishable from random data.
The attacker only has access to AFsplit output data.

Note that it is unnecessary for the AFSplit digest (hashing algorithm) to provide output indistinguishable from random data: the choice of hashing algorithm adds nothing to the randomness.

AFsplit is commonly used (by LUKS) to split a randomly generated key, which is why the second condition is usually met; it was also implied in the question by the random_data_input variable name.

Demonstration

Human-readable notation, not Python: . means concatenation, ⊕ means XOR, NULL means the 4-byte big-endian block index 0 that _diffuse prepends (i.e. four NUL bytes), and [:3] means the first 3 characters:

AFSplit("D₁D₂D₃", 3, sha256) = "R₁R₂R₃R₄R₅R₆".("S₄S₅S₆"⊕"D₁D₂D₃")

Where:

"D₁D₂D₃" is the chosen input sequence of 3 characters (example key).
3 is the chosen number of stripes (example): 2 random stripes followed by the final masked stripe.
"R₁R₂R₃R₄R₅R₆" is a sequence of 6 random characters generated by the RNG (one 3-character stripe per loop iteration).
"S₁S₂S₃" = _diffuse("R₁R₂R₃", 3, sha256) = sha256(NULL."R₁R₂R₃")[:3]. This is the result of the 1st _diffuse iteration.
"S₄S₅S₆" = _diffuse("R₄R₅R₆"⊕"S₁S₂S₃", 3, sha256) = sha256(NULL."R₄⊕S₁"."R₅⊕S₂"."R₆⊕S₃")[:3]. This is the result of the 2nd _diffuse iteration.
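The worked example can be checked numerically. This sketch assumes SHA-256 and fixed byte values standing in for the RNG output and the key (all hypothetical); it builds the 9-byte output from the formulas for "S₁S₂S₃" and "S₄S₅S₆", then recovers "D₁D₂D₃" from the output bytes alone, which is what AFmerge does.

```python
import hashlib


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


INDEX0 = (0).to_bytes(4, "big")    # the "NULL" prefix: struct.pack(">I", 0)

R123 = bytes([0x11, 0x22, 0x33])   # hypothetical first random stripe
R456 = bytes([0x44, 0x55, 0x66])   # hypothetical second random stripe
D = b"key"                         # hypothetical 3-byte key D1 D2 D3

S123 = hashlib.sha256(INDEX0 + R123).digest()[:3]             # 1st _diffuse
S456 = hashlib.sha256(INDEX0 + xor(R456, S123)).digest()[:3]  # 2nd _diffuse
output = R123 + R456 + xor(S456, D)   # 3 stripes: 2 random + 1 masked


def merge(out: bytes) -> bytes:
    """AFmerge for this 3-byte, 3-stripe case, using only the output bytes."""
    s1 = hashlib.sha256(INDEX0 + out[0:3]).digest()[:3]
    s2 = hashlib.sha256(INDEX0 + xor(out[3:6], s1)).digest()[:3]
    return xor(out[6:9], s2)


print(len(output), merge(output))   # 9 b'key'
```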

Note that the _diffuse function behaves a bit differently when len(data) >= digest.digest_size: it hashes the input in digest-sized chunks, each prefixed with its own block index. This is not taken into account in this example; the conclusion is the same anyway.
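A sketch of the chunked case, assuming SHA-1 (20-byte digest) and a hypothetical 48-byte buffer: the two full chunks are hashed with indices 0 and 1, and the 8-byte remainder with index 2, its digest truncated to 8 bytes so the output keeps the input length.

```python
import hashlib
import struct


def diffuse(block: bytes, hash_name: str = "sha1") -> bytes:
    """Hash digest-sized chunks of block, each prefixed with its big-endian 32-bit index."""
    size = hashlib.new(hash_name).digest_size
    out = []
    for start in range(0, len(block), size):
        chunk = block[start:start + size]
        digest = hashlib.new(hash_name, struct.pack(">I", start // size) + chunk).digest()
        out.append(digest[:len(chunk)])    # only the final short chunk is truncated
    return b"".join(out)


block = bytes(range(48))   # hypothetical 48-byte buffer: 20 + 20 + 8 bytes for SHA-1

# The same result written out chunk by chunk:
expected = (
    hashlib.sha1(struct.pack(">I", 0) + block[0:20]).digest()
    + hashlib.sha1(struct.pack(">I", 1) + block[20:40]).digest()
    + hashlib.sha1(struct.pack(">I", 2) + block[40:48]).digest()[:8]
)
print(diffuse(block) == expected, len(diffuse(block)))   # True 48
```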

In more comprehensible terms, AFsplit output contains:

The leading stripes "R₁R₂R₃R₄R₅R₆", containing random data generated by the PRNG. Their total length is (stripes − 1) × length(random_data_input).
The last stripe ("S₄S₅S₆"⊕"D₁D₂D₃"). Its length is equal to length(random_data_input). This last stripe is the XOR of:
1. A hash combination "S₄S₅S₆" of the randomly-generated leading stripes
2. The input data "D₁D₂D₃"

So if some (but not all) stripes are left behind in a disk bad block, those stripes alone are not sufficient to apply the inverse AFmerge function, which recomputes the hash combination of the leading stripes and XORs it away to recover random_input_data. That's the very purpose of AFsplit/AFmerge: you need ALL of the AFsplit output to apply AFmerge and reconstitute random_input_data.
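A sketch of this all-or-nothing property, again assuming a Python 3 re-implementation with SHA-1 and `secrets` (`af_split` and `af_merge` are illustrative names). If a single stripe is missing, modelled here by overwriting it with zeros, the merge yields an unrelated value instead of the key.

```python
import hashlib
import secrets
import struct


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def diffuse(block: bytes) -> bytes:
    """SHA-1 chunk-wise diffusion, modelled on the question's _diffuse."""
    out = []
    for start in range(0, len(block), 20):
        chunk = block[start:start + 20]
        out.append(hashlib.sha1(struct.pack(">I", start // 20) + chunk).digest()[:len(chunk)])
    return b"".join(out)


def af_split(data: bytes, stripes: int) -> bytes:
    buf, out = bytes(len(data)), []
    for _ in range(stripes - 1):
        r = secrets.token_bytes(len(data))
        out.append(r)
        buf = diffuse(xor(r, buf))
    return b"".join(out) + xor(buf, data)


def af_merge(material: bytes, size: int, stripes: int) -> bytes:
    buf = bytes(size)
    for i in range(stripes - 1):
        buf = diffuse(xor(material[i * size:(i + 1) * size], buf))
    return xor(buf, material[(stripes - 1) * size:stripes * size])


key = secrets.token_bytes(32)                      # hypothetical 256-bit key
material = bytearray(af_split(key, 4000))
print(af_merge(bytes(material), 32, 4000) == key)  # True: every stripe present

material[100 * 32:101 * 32] = bytes(32)            # stripe 100 "lost" (zeroed)
print(af_merge(bytes(material), 32, 4000) == key)  # False, barring a negligible-probability coincidence
```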

To come back to the question:

The leading stripes are generated by the PRNG, so they are only as random as the PRNG output.
Regarding the last stripe "S₄S₅S₆"⊕"D₁D₂D₃":
1. The hash combination "S₄S₅S₆" can be reconstituted from the known leading output stripes "R₁R₂R₃R₄R₅R₆", so the hashing adds no entropy/randomness to this last stripe.
2. The entropy/randomness of random_input_data "D₁D₂D₃" is maximal: by hypothesis, random_input_data is indistinguishable from random data.
3. Since "S₄S₅S₆" is fixed by the visible leading stripes, the entropy of the XOR result equals the entropy of "D₁D₂D₃", so the result is indistinguishable from random data. See: Mixing Entropy Sources by XOR?
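Point 3 can be checked exhaustively on a single byte. In this sketch the mask byte is a hypothetical stand-in for "S₄S₅S₆", which the attacker can compute: XOR with a fixed, known value is a bijection, so it keeps a uniform input uniform and leaves a low-entropy input exactly as low-entropy as before.

```python
from collections import Counter

MASK = 0x5A  # hypothetical known mask byte (computable from the leading stripes)

# Uniform input: every byte value appears once, and so does every output value.
uniform_out = Counter(d ^ MASK for d in range(256))
print(len(uniform_out), max(uniform_out.values()))   # 256 1

# Low-entropy input: only lowercase ASCII letters, 26 possible values.
letters_out = Counter(d ^ MASK for d in b"abcdefghijklmnopqrstuvwxyz")
print(len(letters_out))                               # 26
```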

So the answer is yes, if and only if the 3 conditions above are met.
