
I've got an application (detailed below) that calls for the use of a cipher that is commutative. I've been doing some googling & reading, and there are two algorithms that seem to get mentioned in these discussions--SRA (not RSA) and Pohlig-Hellman exponentiation. (Supposedly SRA is a variant of RSA?). I need something strong, so XOR will not serve.

I'd hoped to run across some actual software that implemented one of these guys so I could play around with them & demonstrate a scheme I have in mind, but I have not been able to find any. It looks to me like these things are not in the source code that comes with Schneier's Applied Cryptography, frx. Can anyone recommend a source for implementations of either SRA or Pohlig-Hellman?


Here's how I'm thinking of using this tech if I can get my hands on it. I work for a healthcare provider A, which operates in the same geographic area as our colleague/competitor provider B. We know that we share some patients, but we don't know which, or even how many people we have in common.

We could just swap lists of, say, the SSNs of our patients & find out who we share, except that we don't really trust each other to treat this sensitive data w/the care that it's due, and wow would people freak out and hate us if it got out that we were making free w/their SSNs, etc., etc. So--please take for granted that sending raw SSNs is neither practical nor desirable.

But here's what we maybe can do. I grab my commutative-cipher software, generate an encryption key, save it off somewhere safe, and produce a file of encrypted SSNs (each SSN encrypted separately with my key). My colleague at provider B does likewise. Then we swap encrypted lists, and each encrypt the other guy's encrypted SSNs with our own keys. Then we swap the doubly-encrypted lists. Because the cipher is commutative, encrypting with key A first and then key B should give the same results as encrypting with key B first and then key A. Which means that any values that are shared on the two doubly-encrypted lists represent the same SSN, and therefore a shared patient. So we know exactly who we share, without exposing the SSNs of our entire patient list.
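The flow described above can be sketched in Python using Pohlig-Hellman style exponentiation ($x \mapsto x^k \bmod p$), one of the two candidates mentioned. This is a toy under stated assumptions: the modulus is the Mersenne prime $2^{127}-1$, and `to_group`, `keygen`, `enc` and `shared_patients` are hypothetical names, with `to_group` an assumed encoding that hashes each SSN into a nonzero residue mod $p$. It only shows that the doubly-encrypted values line up for shared SSNs; it is not a vetted protocol.

```python
import hashlib
import math
import secrets

# Toy modulus: the Mersenne prime 2**127 - 1 (assumption for illustration;
# a real deployment would need a much larger prime and a prime-order subgroup).
P = 2**127 - 1


def keygen() -> int:
    """Secret exponent k with gcd(k, p - 1) == 1, so x -> x^k is a permutation."""
    while True:
        k = secrets.randbelow(P - 3) + 2  # k in [2, p - 2]
        if math.gcd(k, P - 1) == 1:
            return k


def to_group(ssn: str) -> int:
    """Hypothetical encoding: hash the SSN to a nonzero residue mod p."""
    h = int.from_bytes(hashlib.sha256(ssn.encode()).digest(), "big")
    return h % (P - 1) + 1  # in [1, p - 1]


def enc(k: int, x: int) -> int:
    """'Encrypt' by modular exponentiation; commutative in the key."""
    return pow(x, k, P)


def shared_patients(ssns_a: list[str], ssns_b: list[str]) -> list[str]:
    ka, kb = keygen(), keygen()  # each provider keeps its own key secret
    # Round 1: each provider encrypts its own list and sends it across.
    a_once = [enc(ka, to_group(s)) for s in ssns_a]
    b_once = [enc(kb, to_group(s)) for s in ssns_b]
    # Round 2: each provider encrypts the other's list with its own key.
    a_twice = [enc(kb, c) for c in a_once]  # computed by B, returned to A in order
    b_twice = {enc(ka, c) for c in b_once}  # computed by A
    # (x^ka)^kb == (x^kb)^ka, so a shared SSN yields equal values on both lists.
    return [s for s, c in zip(ssns_a, a_twice) if c in b_twice]


print(shared_patients(["123-45-6789", "111-22-3333", "555-66-7777"],
                      ["555-66-7777", "999-88-7777", "123-45-6789"]))
# → ['123-45-6789', '555-66-7777']
```

Because B returns A's doubly-encrypted list in the original order, A can map each hit back to its own SSN without B ever seeing that SSN in the clear.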

kelalaka
Roy Pardee

5 Answers


To clarify a little, you need "encryption" that is commutative, deterministic (otherwise the commutative ciphertexts won't necessarily match even if the plaintexts do), and has a private encryption function (otherwise given that it is deterministic and the plaintext space is small, an exhaustive search would be possible). For example, Elgamal is commutative but it is also randomized and public key. Encryption is in quotes because, since you never decrypt, you actually just need a trapdoor function with those properties.
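To make the exhaustive-search point concrete: if the function is deterministic and anyone can evaluate it, an attacker can simply evaluate it on every candidate plaintext and compare. The sketch below uses unsalted SHA-256 as a stand-in public function and, to keep it fast, a hypothetical 4-digit space (for example, the last four digits of an SSN); the full 9-digit SSN space has at most $10^9$ values, which is also small enough to enumerate. `public_f` and `invert_by_enumeration` are illustrative names.

```python
import hashlib
from typing import Optional


def public_f(x: str) -> str:
    """Stand-in for a deterministic function that anyone can evaluate."""
    return hashlib.sha256(x.encode()).hexdigest()


def invert_by_enumeration(target: str, digits: int = 4) -> Optional[str]:
    """Try every plaintext in the (small) space until the output matches."""
    for n in range(10**digits):
        candidate = f"{n:0{digits}d}"  # zero-padded, e.g. "0042"
        if public_f(candidate) == target:
            return candidate
    return None


leaked = public_f("6789")
print(invert_by_enumeration(leaked))  # → 6789
```

A secret key in the function (as in Pohlig-Hellman or SRA with undisclosed exponents) is what blocks this: without the key, an outsider cannot compute the function on candidates.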

Pohlig-Hellman and SRA are suitable.

As far as implementations go, I do not know of any. However Pohlig-Hellman would be easy to implement with just a number theory library (it is just a single modular exponentiation -- you don't need to decrypt in your application). If you have a DSA library that has a function to generate a public key from a secret key, you can use that as the encryption function.
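A minimal Pohlig-Hellman sketch that needs nothing beyond Python's built-in `pow`, assuming Python 3.8+ for the modular inverse `pow(e, -1, m)`. The prime is a toy choice (the Mersenne prime $2^{127}-1$) and the function names are hypothetical. Decryption is included only to show that the key really is invertible; as noted, the set-intersection application never decrypts.

```python
import math
import secrets

P = 2**127 - 1  # toy prime modulus (assumption); use a far larger one in practice


def ph_keygen(p: int = P) -> tuple[int, int]:
    """Return (e, d) with e * d == 1 (mod p - 1)."""
    while True:
        e = secrets.randbelow(p - 3) + 2
        if math.gcd(e, p - 1) == 1:
            return e, pow(e, -1, p - 1)


def ph_encrypt(e: int, m: int, p: int = P) -> int:
    return pow(m, e, p)  # the single modular exponentiation


def ph_decrypt(d: int, c: int, p: int = P) -> int:
    # m^(e*d) = m^(1 + t(p-1)) = m (mod p) by Fermat's little theorem.
    return pow(c, d, p)


m = 123456789
ea, da = ph_keygen()
eb, db = ph_keygen()
c_ab = ph_encrypt(eb, ph_encrypt(ea, m))
c_ba = ph_encrypt(ea, ph_encrypt(eb, m))
print(c_ab == c_ba)                          # → True (commutative)
print(ph_decrypt(db, ph_decrypt(da, c_ab)))  # → 123456789
```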

Similarly, SRA is just RSA with two key pairs. If you could call the key generation function of an RSA implementation in such a way as to get $e,d$ when you supply $p,q$, then that would work.
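A sketch of SRA along these lines: both parties share one modulus $n = pq$, and each derives its own secret pair $(e, d)$ from $p$ and $q$, as RSA key generation would. The primes here are the well-known Mersenne primes $2^{61}-1$ and $2^{89}-1$, chosen only because they are certainly prime; that makes $n$ trivially factorable, so this is purely illustrative. `sra_keygen` and `sra` are hypothetical names.

```python
import math
import secrets

# Toy primes (well-known Mersenne primes), so n offers no real security.
p, q = 2**61 - 1, 2**89 - 1
n, phi = p * q, (p - 1) * (q - 1)


def sra_keygen() -> tuple[int, int]:
    """Each party picks its own secret e coprime to phi(n) and derives d."""
    while True:
        e = secrets.randbelow(phi - 3) + 3
        if math.gcd(e, phi) == 1:
            return e, pow(e, -1, phi)


def sra(k: int, x: int) -> int:
    """Apply exponent k modulo the shared n (encrypt with e, decrypt with d)."""
    return pow(x, k, n)


ea, da = sra_keygen()
eb, db = sra_keygen()
m = 42424242
both = sra(eb, sra(ea, m))
print(both == sra(ea, sra(eb, m)))  # → True
print(sra(da, sra(db, both)))       # → 42424242
```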

Both of these ciphers are old and were developed before certain attacks were known. For Pohlig-Hellman, you should work in a multiplicative subgroup and this requires encoding the plaintext. SRA could leak the Jacobi symbol of the plaintext like RSA (I think even if $e$ is not known?).
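Both caveats can be checked numerically. On the Jacobi symbol: every valid exponent is odd (it must be coprime to the even $\varphi(n)$ or $p-1$), and $\left(\frac{m^e}{n}\right) = \left(\frac{m}{n}\right)^e$, so the symbol passes through unchanged whether or not $e$ is known. On the subgroup: one way to encode into a prime-order subgroup is to square the encoded plaintext modulo a safe prime $p = 2q+1$, which lands it among the quadratic residues, where the symbol is always $+1$ and reveals nothing. The sketch uses the toy SRA modulus built from Mersenne primes and a tiny safe prime, 2039 = 2·1019 + 1, purely for illustration; squaring is 2-to-1 ($x$ and $p-x$ collide), which only matters if the encoded values are not hashes into a large group.

```python
def jacobi(a: int, n: int) -> int:
    """Jacobi symbol (a/n) for odd n > 0, via quadratic reciprocity."""
    assert n > 0 and n % 2 == 1
    a %= n
    result = 1
    while a:
        while a % 2 == 0:        # pull out factors of 2
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a              # reciprocity swap
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


# SRA leak: the Jacobi symbol of m^e mod n equals that of m for any odd e.
n = (2**61 - 1) * (2**89 - 1)  # toy modulus (assumption)
e = 65537                      # any odd exponent shows the same effect
leaks = all(jacobi(pow(m, e, n), n) == jacobi(m, n) for m in range(2, 200))

# Pohlig-Hellman mitigation: squaring mod a safe prime maps into the QR subgroup.
SAFE_P = 2039                  # 2039 = 2 * 1019 + 1, both prime (toy size)


def encode_qr(x: int) -> int:
    return pow(x, 2, SAFE_P)


def euler_criterion(y: int) -> int:
    """1 for nonzero quadratic residues, SAFE_P - 1 for non-residues."""
    return pow(y, (SAFE_P - 1) // 2, SAFE_P)


all_qr = all(euler_criterion(encode_qr(x)) == 1 for x in range(1, SAFE_P))
print(leaks, all_qr)  # → True True
```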

Finally, the application you are implementing is called "private set intersection." There are lots and lots of papers on how to do it, and potentially some implementations of the whole protocol. There are some issues with the way you are implementing it if one of the parties is malicious.

If both parties are honest (they only submit true values) and the exchange is fair (neither party learns the result before the other), it seems OK. Otherwise there are all sorts of attacks, which could be resolved by using a private set intersection protocol designed for exactly this purpose.

PulpSpy

The right solution is to use a private set intersection protocol. Using Pohlig-Hellman is not the best approach. This problem has been studied in the literature, and there are sophisticated and effective solutions available; they go by the name "private set intersection".

In other words, you are asking the wrong question. You have a particular application, and you made the assumption that Pohlig-Hellman (a particular mechanism) is the right solution to that application. That assumption turns out not to be valid. Fortunately, you told us about your particular application -- good call, that was the right thing to do! In the future, when consulting with cryptographers, you may have better results if you start off by explaining the application, without making any assumptions about what the best mechanism will necessarily be.

D.W.

If you consider each SSN as kind of like a (low-entropy) password, what you are trying to do seems very similar to what people implementing zero-knowledge password proof are trying to do.

Would the source code at any of these locations help?

ZK-SSH - A Zero Knowledge Implementation for OpenSSH http://zk-ssh.cms.ac/

Clipperz zero-knowledge web application http://code.google.com/p/clipperz/

Timed Zero Knowledge Proof (TZKP) http://rtds.cs.tamu.edu/tzkp.php

David Cary

I've heard of some people with a similar problem, who have come up with a clever solution. There's a trusted organization that maintains a suppression list -- a "Do Not Email" list of email addresses, and that organization wants to tell a few bulk emailers not to send email to any of the addresses on that list.

However, the trusted organization doesn't want to simply give a copy of that list of email addresses to those bulk emailers -- there's the risk that one or more of those bulk emailers will "accidentally" (?) take that list and start sending spam to everyone on the list.

So, here's the clever bit -- the trusted organization only gives out MD5 hashes, one for each email address on their list. It never gives out the email addresses themselves.
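The suppression-list idea amounts to publishing digests instead of addresses and letting each mailer check its own addresses against them. A sketch using Python's `hashlib.md5`; the `digest` helper and the rule of stripping and lowercasing each address first are assumptions (both sides must normalize identically, or matches are missed).

```python
import hashlib


def digest(email: str) -> str:
    # Normalization rule is an assumption; both parties must apply the same one.
    return hashlib.md5(email.strip().lower().encode()).hexdigest()


# Trusted organization: publishes only the digests, never the addresses.
suppression = {digest(e) for e in ["alice@example.com", "bob@example.org"]}

# Bulk emailer: drops any address whose digest appears in the published set.
mailing_list = ["Alice@Example.com", "carol@example.net"]
to_send = [e for e in mailing_list if digest(e) not in suppression]
print(to_send)  # → ['carol@example.net']
```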

Would that work for your application?

In other words, try this:

  • You generate a random string R1, and give it to your colleague at provider B.
  • For each of your own patients' SSNs, generate a hash H1 by applying SHA-256 to the concatenation of R1 and the SSN -- perhaps using sha256sum.
  • Send each "hashed SSN" H1 to your colleague.
  • Discard R1 and the "hashed SSNs" that you generated from it.

Meanwhile, your colleague does the same, so you end up with another random string R2, and a long list of "hashed SSN" values.

  • For each of your own patients' SSNs:
      • generate a hash H2 by applying SHA-256 to the concatenation of R2 and the SSN;
      • compare that value H2 to every value in the list your colleague gave you. If H2 matches any of the "hashed SSN" values that your colleague handed to you, then you have a hit: your colleague probably has the same SSN, and therefore a shared patient.
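The steps above can be sketched with Python's `hashlib` and `secrets`. The 16-byte salt length and the function names `hashed_list` and `matches` are illustrative assumptions; only provider B's side of the exchange (R2 and its hashed list) is shown, since A's side is symmetric. Note that the random string is handed to the other provider together with the hashes, so it acts as a salt rather than a secret key.

```python
import hashlib
import secrets


def hashed_list(salt: bytes, ssns: list[str]) -> set[str]:
    """SHA-256 of salt || SSN for every SSN, as sent to the other provider."""
    return {hashlib.sha256(salt + s.encode()).hexdigest() for s in ssns}


def matches(my_ssns: list[str], their_salt: bytes, their_hashes: set[str]) -> list[str]:
    """Recompute each of my SSNs with their salt and look for hits."""
    return [s for s in my_ssns
            if hashlib.sha256(their_salt + s.encode()).hexdigest() in their_hashes]


# Provider B: random string R2 plus hashed SSNs, both handed to provider A.
r2 = secrets.token_bytes(16)
b_hashes = hashed_list(r2, ["555-66-7777", "999-88-7777", "123-45-6789"])

# Provider A: check its own SSNs against B's list.
print(matches(["123-45-6789", "111-22-3333", "555-66-7777"], r2, b_hashes))
# → ['123-45-6789', '555-66-7777']
```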

So now you know the exact SSN of every patient you share, without exposing the SSN of any patient.

(I'm tempted to generate a fresh random string Rn for each and every SSN, and send paired "Rn, hashed SSN" pairs to your colleague -- is that overkill?)

David Cary

I suppose there are some commutative asymmetric ciphers out there.

$\newcommand\Enc{\operatorname{Enc}}\newcommand\KStr{\operatorname{KStr}}$But if you have longer messages, I would consider using a synchronous (binary) stream cipher. This is an algorithm which just outputs a long stream of pseudorandom bits, which then can be XOR-ed with the plaintext or ciphertext to get the other one: $\Enc_k(P) = \KStr(k) \oplus P$.

Use a stream cipher with random/unique initialization vector (a new one for each record) to make the keys reusable.

For two keys $A$ and $B$ and a plaintext $P$, we have then

$$ \Enc_A(\Enc_B(P)) = \KStr(A) \oplus \KStr(B) \oplus P = \KStr(B) \oplus \KStr(A) \oplus P = \Enc_B(\Enc_A(P)),$$

i.e. just your commutativity.
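A sketch of that identity in Python. SHAKE-256 from `hashlib`, keyed by key || IV, is used here as a stand-in keystream generator for KStr; that is an assumption for illustration, not a recommendation of a particular stream cipher, and `kstr` and `enc` are hypothetical names. Both layers must use the same IV for the two keystreams to cancel in either order.

```python
import hashlib
import secrets


def kstr(key: bytes, iv: bytes, length: int) -> bytes:
    """Hypothetical keystream: SHAKE-256 over key || iv, truncated to length."""
    return hashlib.shake_256(key + iv).digest(length)


def enc(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Enc_k(P) = KStr(k) XOR P; applying it again with the same key decrypts."""
    ks = kstr(key, iv, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))


a, b = secrets.token_bytes(32), secrets.token_bytes(32)
iv = secrets.token_bytes(16)  # one IV per record, shared by both layers
p = b"a longer record than a single SSN"

print(enc(a, iv, enc(b, iv, p)) == enc(b, iv, enc(a, iv, p)))  # → True
```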

After actually reading your use case description, it looks like it won't work easily for your use case: you would need to manage the initialization vectors in a way that the same SSN is always encrypted with the same IV, without giving away something about the SSN by the IV. Sorry.

Paŭlo Ebermann