
In my question "Authenticating data generated by a particular build of an open source program", Dave Cary requested that I post a question describing my real problem at a high level, rather than the partial solution and abstract approach I took. So here goes:

I need to develop a program that collects raw scores at a competition, computes the winning scores, and saves the results so they can be transmitted to a server for display and validation. The requirements:

- The results must be in some human-readable form (preferably XML).
- There must be some way to trace which version of the program generated a set of results.
- There must be complete transparency in how the winning scores are processed and computed.
- The server will only accept correctly computed results.
- The server should be able to distinguish results computed by an official build of the program from those computed by a wildcat build.
- The machines running the program are essentially offline.
- Not all machines will be updated with new versions at the same time.
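For concreteness, the transmitted results file might look something like the sketch below. The element names, attributes, and helper function are all hypothetical, but it shows human-readable XML carrying a program version and a build hash for traceability:

```python
import hashlib
import xml.etree.ElementTree as ET

# Hypothetical result-file format: the structure and attribute names are
# illustrative, not a finished design.
def build_results_xml(raw_scores, program_version, build_hash):
    root = ET.Element("results", attrib={
        "program-version": program_version,  # e.g. a release tag
        "build-sha256": build_hash,          # hash of the official binary
    })
    # Winning scores first: sort competitors by descending score.
    for competitor, score in sorted(raw_scores.items(), key=lambda kv: -kv[1]):
        entry = ET.SubElement(root, "entry", name=competitor)
        entry.set("score", str(score))
    # A digest over the serialized entries (computed before the digest
    # attribute itself is attached) lets the server detect tampering in
    # transit -- it does NOT prove which build produced the data.
    payload = ET.tostring(root, encoding="utf-8")
    root.set("payload-sha256", hashlib.sha256(payload).hexdigest())
    return ET.tostring(root, encoding="unicode")

xml_text = build_results_xml({"alice": 9, "bob": 7}, "1.4.2", "deadbeef")
```

Note that the digest only protects integrity in transit; distinguishing official from wildcat builds is exactly the hard part discussed below.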

Ants

1 Answer

This question comes up often enough in the context of cryptography that it probably is relevant in a practical sense. I suspect we'll hear even more about it if homomorphic encryption raises interest in "computation in an adversarial setting".

It's not just theoretically unsolvable. A great many software development organizations have tried to keep data secure on someone else's computer, and all of them have failed. The only exceptions are cases where no one cared enough about the data to make the effort, and that is not reliably predictable in advance.

Consider the great lengths online game companies go to in order to discourage cheaters. Yet cheating remains a problem even with closed (and continuously online) systems like the PlayStation 3. Even trusted hardware modules fall to reverse engineers with "hardware-based liquid chemical and gas technologies in a lab setting to probe with specialized needles to build tungsten bridges".

The only reasonable goal is to deploy enough obfuscation and misdirection that it delays the inevitable long enough to be worth its cost. As you say in the comments, perhaps you can make it more worthwhile for someone to pursue the contest legitimately.

Obfuscation may occasionally be worthwhile, but its costs and benefits are notoriously difficult to quantify in advance. For one thing, it's somewhat antithetical to your goal of "complete transparency". How can the participants know that the obfuscated code is something they should trust to run on their own computer? Furthermore, it's likely that some would find the challenge of hacking to be more interesting than the competition itself.

That's why every hard-core security professional will advise you not to attempt this when it really matters. Here are some things not to do:

- Don't have the application initialize a PRNG from a wide variety of build-dependent data (e.g., a key supplied at build time) as well as system-dependent data.
- Don't incrementally mix data from intermediate processing steps into the PRNG: e.g., all program inputs, their timestamps, and the arguments of every function call.
- Don't include in the transmitted results all the runtime-specific data used to initialize the PRNG, the raw input, the timestamps, and some amount of final PRNG output.
- Don't use self-modifying code.
- Don't take hashes of the executable memory pages at various points in time.
- Don't fork the process periodically and take a crash dump, which is then included in the submitted data.
- Don't make a custom gcc back-end targeting a custom VM runtime that is revealed only at the time of the competition.
- Don't include a bunch of other misdirecting code that ultimately amounts to meaningless random junk in the output.
- Don't use hardware exceptions and signals to effect ordinary control transfer.
- Don't develop another program to verify the results produced by the official build, the source of which is revealed only after the competition has closed.
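For concreteness, here is roughly what the (inadvisable) PRNG-mixing scheme above could look like. Everything here is an illustrative assumption: the class name, the key, and the nonce are made up, and the runtime nonce would be transmitted with the results so a verifier can replay the trace:

```python
import hashlib
import hmac

# Illustrative sketch only: a keyed running hash seeded from a build-time
# secret, with intermediate computation data folded in as it happens.
class TraceablePRNG:
    def __init__(self, build_key: bytes, runtime_nonce: bytes):
        # build_key: secret baked into the official build.
        # runtime_nonce: system-dependent data, included in the transmitted
        # results so the server can recompute the same state.
        self.state = hashlib.sha256(build_key + runtime_nonce).digest()

    def mix(self, data: bytes) -> None:
        # Fold intermediate data (inputs, timestamps, function arguments...)
        # into the evolving state.
        self.state = hmac.new(self.state, data, hashlib.sha256).digest()

    def tag(self) -> str:
        # Final PRNG output to include alongside the submitted results.
        return self.state.hex()

prng = TraceablePRNG(b"official-build-key", b"nonce-from-this-run")
prng.mix(b"raw scores: alice=9 bob=7")
prng.mix(b"winner: alice")
tag = prng.tag()
```

A verifier holding the build key can replay the mix sequence from the transmitted data and check the tag, and a wildcat build without the key produces a different tag. But the key ships inside the binary, which is exactly why this only slows an attacker down.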

You don't want to do any of that because all of it can be defeated by a skilled reverse engineer in a few hours or days at most.

Read the history of Microsoft PatchGuard, a software-only system for preventing unauthorized modifications to a running kernel. Version 3, for example, held out for exactly one weekend.

Marsh Ray