Increased CRC collision probability when adding bits to input message

Question

The Scenario

I have a message string I need to transport over a wireless network that may be unreliable. This message string is about 100 bits long, and is packaged with an 8-bit CRC. When the message is received, it's validated with the CRC and unpacked. I would like to add a few "message version" bits to the message, but there's no room available.

Solution Attempt

Since I can't add a version number to the message, I would like to add the message version bits to the input string when computing the CRC, but only transmit the basic message and CRC as before (without the version bits). When the message is received, I'll compute the CRC for Version 1. If it fails, I'll try computing the CRC for Version 2. If that fails, the message has been garbled. If either one succeeds, then I know the version number.
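For concreteness, the scheme described above could be sketched as follows. This is only an illustration under assumptions that are not stated in the question: the CRC uses 0x2F as the generator with an implicit x^8 term, zero initial value, no bit reflection and no final XOR; the payload is padded to whole bytes; and the version is a single hypothetical tag byte appended to the CRC input. The names `crc8`, `pack`, `unpack` and `VERSIONS` are made up for this sketch.

```python
POLY = 0x2F  # generator from the question; x^8 term implicit (assumed convention)
VERSIONS = (1, 2)  # hypothetical one-byte version tags, never transmitted


def crc8(data: bytes, poly: int = POLY) -> int:
    """Bitwise MSB-first CRC-8, zero initial value, no final XOR (assumed)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def pack(payload: bytes, version: int) -> bytes:
    """Build the frame: payload plus one CRC byte.

    The version byte is fed into the CRC but is not part of the frame.
    """
    return payload + bytes([crc8(payload + bytes([version]))])


def unpack(frame: bytes):
    """Return (version, payload) for the first version whose CRC matches,
    or None if no version matches (frame treated as garbled)."""
    payload, received = frame[:-1], frame[-1]
    for version in VERSIONS:
        if crc8(payload + bytes([version])) == received:
            return version, payload
    return None
```

Usage would look like `unpack(pack(payload, 2))`, which yields `(2, payload)` when the frame arrives intact. Note that the receiver accepts a frame if either version matches, which is the point the answers below pick up on.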

Question

The problem is that I need to prove (or at the very least, convince myself with math) that increasing the number of bits in the input message doesn't significantly increase the probability of a CRC collision.

I do not need a perfect answer here, since I understand that network simulation and choice of the CRC polynomial play huge roles. I'm using 0x2F for my 8-bit CRC polynomial, if that matters. For network error, I think it's reasonable to assume the error noise is really random (unless you think otherwise... all advice welcome).

My Attempt

Source: http://web.mit.edu/course/16/16.682/www/leca.pdf (See section "Performance of CRC")

When using the equations in this document, I find that the probability of detecting a transmission failure depends only on the number of bits in my CRC (r=8, in this case), so P = 1 - 2^(-r) = 99.6%. But surely the number of input bits matters in some way, right? I've seen this in other papers too, but I can't figure out if a simplification is being made. Help!

Thanks!

score 11 · Accepted Answer · answered May 08 '13 at 15:45

First of all, if your goal is to keep the garbled messages to "once every hundred years", well, you already don't meet that goal, even before the change. With an 8 bit CRC, a random change has a probability 1/256 of being accepted; hence if your wireless network has a transmission error at least once every three months (which, to me, sounds like an unrealistically reliable transmission mechanism), you get an undetected garbling more often than "once every hundred years".
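To make that arithmetic explicit, here is a rough estimate. It assumes exactly one transmission error every three months and that each error produces a uniformly random CRC mismatch; both are simplifying assumptions, not measurements.

$$4\ \tfrac{\text{errors}}{\text{year}} \times 100\ \text{years} = 400\ \text{errors}, \qquad 400 \times \frac{1}{256} \approx 1.56\ \text{expected undetected garblings per hundred years}.$$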

Now, what you're doing is effectively assigning two different CRC (or hash) values to each message; when you receive a message and its CRC (hash), you'll accept the message if either of those two values appears in the received CRC field, and reject the message if the field holds any of the other 254 values. Hence, you're raising the probability of an undetected error from 1 in 256 to 1 in 128. The actual values of the bits don't really change this, as long as we model the garbling process as 'random'.
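Under the same random-garbling model, and writing $a$ for the number of CRC values the receiver accepts (a symbol introduced here only for illustration), the figures work out as:

$$P_{\text{undetected}} = \frac{a}{2^{r}}, \qquad r = 8:\quad a = 1 \Rightarrow \frac{1}{256} \approx 0.39\%, \qquad a = 2 \Rightarrow \frac{2}{256} = \frac{1}{128} \approx 0.78\%.$$

So the detection probability drops from about 99.6% to about 99.2%, and under this model it does not depend on the message length.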

Now, the CRC algorithm has the nice property that if the error is limited to an n-bit burst (that is, all bits outside of that n-bit region are received correctly, with n=8 in your case), then it'll always detect the error. This is a nice property if the garbling process will tend to affect only limited portions of the entire message. However, I don't see how you can maintain that property (even with n=7), while maintaining backwards compatibility (which I suspect is important).

So, the bottom line is that your idea will make undetected errors more likely; however you're already not meeting your stated goal. You may need to rethink this protocol more drastically if your reliability requirements are a hard requirement.

score 6 · Answer 2 · edited Apr 13 '17 at 12:48

This started as a comment to @Poncho's fine answer, and grew over the 600-char limit. The point is: a careful choice of the definition of V2 messages can keep some of the existing capabilities of the original CRC to always detect some kinds of errors.

Foremost, we are interested in short error bursts (where all bits in error are within a small number of consecutive bits, a fair model of some errors likely to occur under many practical circumstances); and, marginally, errors affecting an odd number of bits (this is desirable in some communication contexts using a descrambler with the property that any one-bit error at the physical level expands to a fixed error burst with an odd number of bits; and, if all else was pointless, it at least ensures that any single-bit error is detected).

I'll assume the CRC for V1 is such that the remainder of the polynomial representing the message (including the CRC portion thereof), when divided by a binary reduction polynomial $P$ (with degree $d$ and constant term $1$), is some constant independent of message content (at least for a given message length, in the absence of error). Any textbook CRC, and most of those used in communication contexts, are built with this property. Such a CRC always detects error bursts that only affect bits within a segment of at most $d$ bits in the message and CRC. This holds because for every polynomial $Q$ of degree less than $d$ and constant term $1$ (representing the error burst), and for all $n\in \mathbb N$ (representing the position of the last bit in error from the last bit of the CRC), $(Q\cdot x^n)\bmod P\ne 0$ (proof is easy by induction). Also, such a CRC detects any error affecting an odd number of bits when the reduction polynomial $P$ has an even number of terms.
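The burst property can also be checked by brute force for a given polynomial. The sketch below represents GF(2) polynomials as Python integers (bit $i$ is the coefficient of $x^i$); the helper names `poly_mod` and `detects_all_bursts` are made up here. Reading the question's 0x2F as $x^8+x^5+x^3+x^2+x+1$ (that is, 0x12F with the leading term written out) is an assumption about notation.

```python
def poly_mod(value: int, poly: int) -> int:
    """Remainder of GF(2) polynomial `value` divided by `poly`.

    Both are ints used as bit vectors: bit i is the coefficient of x^i.
    """
    degree = poly.bit_length() - 1
    while value.bit_length() > degree:
        # Cancel the current leading term of `value`.
        value ^= poly << (value.bit_length() - 1 - degree)
    return value


def detects_all_bursts(poly: int, max_shift: int) -> bool:
    """Check (Q * x^n) mod P != 0 for every Q with degree < d and
    constant term 1, and every shift n in 0..max_shift."""
    d = poly.bit_length() - 1
    for q in range(1, 1 << d, 2):  # odd integers: constant term is 1
        for n in range(max_shift + 1):
            if poly_mod(q << n, poly) == 0:
                return False
    return True


if __name__ == "__main__":
    # 0x12F: the question's polynomial with the x^8 term explicit (assumed).
    print(detects_all_bursts(0x12F, 100))
```

A burst spanning $d+1$ bits can slip through, the simplest case being an error pattern equal to a shifted copy of $P$ itself, which is why the guarantee stops at $d$ bits.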

At the very least, we want to ensure that, in the absence of transmission error, a V2 message cannot be confused with a V1 message, and vice versa. This is achieved simply if, in V2 messages, the field used as CRC for V1 is the exclusive-OR of the correct CRC for V1 and some non-zero $d$-bit constant $K$. This construction can be expressed in terms of polynomials: we add the polynomial of degree at most $d-1$ with binary coefficients representing $K$ to the polynomial representing the CRC. We'd like to choose $K$ in a manner that optimizes the capability to detect our special kinds of errors.

This is sometimes possible. For example, if the reduction polynomial is $P=x^8+1$ (that is, if the CRC for V1 reduces to bytewide exclusive-OR, up to some initialization value), we should choose $K=x^7+x^6+x^5+x^4+x^3+x^2+x+1$ (that is, use a CRC for V2 that is the bytewide complement of the CRC for V1). This will catch any error where all bits affected are within $7$ consecutive bits (versus $8$ consecutive bits without the introduction of V2). And, because both $P$ and $K$ have an even positive number of bits set, this also detects any error involving an odd number of bits.

With the reduction polynomial $P=x^8+x^2+x+1$ used by ATM communication, we should choose $K=x^7+x^6+x^5+x^4+x^3+x^2+1$; that will also catch any error where all bits affected are within $7$ consecutive bits. This is because $(K\cdot x)\bmod P=K$, hence for any polynomial $Q$ of degree $6$ or less with non-zero constant term (representing a burst of error of length at most $7$), and any integer $n$ (representing the position of the error), $(Q\cdot x^n)\bmod P\ne K$ (proof by induction). Hence the error $Q$ cannot change a valid V1 message into a valid V2 message, or vice versa; while the aforementioned CRC property of always detecting short error bursts ensures that the error $Q$ can't go undetected without a version change.
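Both choices of $K$ above can be sanity-checked by exhaustive search. In the sketch below, polynomials are Python integers (bit $i$ is the coefficient of $x^i$), a frame of 108 bits is assumed (the question's roughly 100-bit message plus the 8-bit CRC), and the helper names `poly_mod` and `burst_safe` are invented for illustration. An error is counted as harmful if its syndrome is $0$ (undetected, same version) or $K$ (accepted as the other version).

```python
def poly_mod(value: int, poly: int) -> int:
    """Remainder of GF(2) polynomial `value` divided by `poly`."""
    degree = poly.bit_length() - 1
    while value.bit_length() > degree:
        value ^= poly << (value.bit_length() - 1 - degree)
    return value


def burst_safe(poly: int, k: int, burst_len: int, total_bits: int) -> bool:
    """True if no error confined to `burst_len` consecutive bits of a
    `total_bits`-bit frame has syndrome 0 or k."""
    for q in range(1, 1 << burst_len, 2):  # bursts with constant term 1
        # Slide the burst across every position that fits in the frame.
        for n in range(total_bits - q.bit_length() + 1):
            if poly_mod(q << n, poly) in (0, k):
                return False
    return True


if __name__ == "__main__":
    FRAME_BITS = 108  # assumed: 100 message bits + 8 CRC bits
    for name, poly, k in (("x^8+1", 0x101, 0xFF), ("ATM", 0x107, 0xFD)):
        fixed = poly_mod(k << 1, poly) == k  # does (K*x) mod P equal K?
        print(name, fixed,
              burst_safe(poly, k, 7, FRAME_BITS),
              burst_safe(poly, k, 8, FRAME_BITS))
```

For both pairs the 8-bit case fails for a simple reason: the burst whose bit pattern equals $K$ itself, placed over the CRC field, has syndrome $K$.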

For primitive reduction polynomials $P$ and long messages, no matter how we choose $K$, a single-bit error can change a V2 message to V1, or vice versa. But we can still choose the constant $K$ such that, up to a certain message size, short error bursts are always detected. Unless I err, with the primitive reduction polynomial $P=x^8+x^4+x^3+x^2+1$ used by AES3 (no relation to the block cipher AES), we can use $K=x^7+x^3+x^2+1$, and messages up to 148 bits (including CRC) are fully protected against errors within a burst of at most 3 bits. This looks quite like what the question asks for.
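A claim like this one is easy to test by search. The sketch below steps the syndrome of each short burst through successive bit positions and reports the smallest frame length at which some burst has syndrome $0$ or $K$. The function name `first_unsafe_length`, the `limit` parameter, and the integer encoding of polynomials (bit $i$ is the coefficient of $x^i$, so the AES3 polynomial is 0x11D and $K$ is 0x8D) are choices made for this illustration.

```python
def first_unsafe_length(poly: int, k: int, burst_len: int, limit: int = 512):
    """Smallest frame length (in bits, CRC included) at which some error
    burst of at most `burst_len` bits has syndrome 0 or k.

    Returns None if no such burst is found within `limit` shifts.
    Assumes burst_len <= degree of poly, so each burst is already reduced.
    """
    d = poly.bit_length() - 1
    top = 1 << d
    best = None
    for q in range(1, 1 << burst_len, 2):  # bursts with constant term 1
        s = q  # syndrome of the burst at shift n = 0
        for n in range(limit):
            if s in (0, k):
                length = n + q.bit_length()  # frame must hold the top bit
                if best is None or length < best:
                    best = length
                break
            # Multiply the syndrome by x modulo poly (one LFSR step).
            s <<= 1
            if s & top:
                s ^= poly
    return best


if __name__ == "__main__":
    # AES3 polynomial x^8+x^4+x^3+x^2+1 with K = x^7+x^3+x^2+1, 3-bit bursts.
    print(first_unsafe_length(0x11D, 0x8D, 3))
```

Any frame shorter than the returned length is protected against bursts of the given size in this model. For the ATM pair from the previous paragraph with bursts of at most 7 bits, the search finds nothing, consistent with $(K\cdot x)\bmod P=K$.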

Further, we can build V2 messages by first appending some other error detection code (or MAC, to become topical for crypto.se), then appending the CRC for V1 XOR-ed with $K$ as above. This makes it unlikely that a V1 message is changed to V2, and the undetected error rate for long random errors can be back to almost that for V1 (with most undetected errors transforming a message of either version into another V1 message). Still further, once a source is known to use V2, perhaps a receiver can refuse V1 messages from that source, and then we'll have V2 more robust than V1.

score 2 · Answer 3 · answered May 13 '13 at 18:04

One problem not mentioned here is that CRC collisions are a certainty. If you were using a cryptographically secure hash, you would never encounter a false positive where both solutions were possible. In this scheme, every 256 messages would yield identical CRC values, and your different versions would be indistinguishable.

You might be able to "stutter" an internal number in the message to break the collisions when they occur, e.g. incrementing a record number.

Instead, could you restrict a common internal number to serve as version number? If transaction number were always a positive integer in bytes 8-11, could you use a negative transaction number to indicate version 2?

score 1 · Answer 4 · edited Apr 13 '17 at 12:48


CRCs are not cryptographically secure. If you need cryptographic security, replace the CRC with a message authentication code (MAC).

If you don't need cryptographic security, then your question is off-topic for Crypto.SE and you should probably flag it to ask the moderators to migrate it to Computer Science.SE.


answered May 08 '13 at 20:19

D.W.
