
I have a basic understanding of encryption, and I came back to the topic because of an interesting site that encrypts financial data using homomorphic encryption (HE). I would be happy for any input from the community here.

They don't say precisely which method they use. In their blog they mention the Fan and Vercauteren scheme on the one hand and order-preserving symmetric encryption on the other.

They say that because addition and multiplication are "preserved" under HE, one can apply machine learning algorithms (which usually use all kinds of operations, not only polynomial ones).

My question: if data is encrypted, then data that originally lived on the real line is usually mapped into the algebraic structure of a ring. So if we are given those ring elements, we have to perform the operations that are defined on that ring, and we cannot (!) apply the usual real-number operations that the ML algorithms consist of.

Is this true for HE? Is it true for order-preserving symmetric encryption?

EDIT, adding an example, as I am not a crypto pro at all: say I am given the following data: $$ (0.2,0.1,0.5,0); (0.1,0.2,0.3,1); (0.02,0.7,0.33,1) $$ plus several thousand more rows (and, in my application, more columns). In this example the first 3 entries are inputs and the 4th one is the target. All I know is that the inputs were encrypted (with either HE or order-preserving symmetric encryption), and I see that each column has exactly 1001 unique values, which makes me think that the data are not real numbers but points on some grid or elements of a finite ring. Suppose I interpret the inputs as real numbers and run the usual ML algorithms (logistic regression and more complex ones). Is this mathematically sound, or am I doing complete nonsense because the data are not real numbers but objects in an algebraic structure that does not allow the usual $+$, $\times$, $\ldots$?

Richi W

1 Answer


Homomorphic Encryption on Reals
In theory, homomorphic encryption can be done on real numbers. This answer describes two options you have when dealing with real numbers or with operations that produce real numbers. Kristin Lauter is doing some of the cutting-edge research in this area. In a recent paper, CryptoNets: Applying Neural Networks to Encrypted Data with High Throughput and Accuracy, the authors (who include Lauter) state:

One thing to note is that the encryption scheme does not support floating-point numbers. Instead, we use fixed precision real numbers by converting them to integers by proper scaling, although there are also other ways to do this (Dowlin et al., 2015).

They go on to say:

The neural network takes as its input a vector of real numbers and, through a series of additions, multiplications, and other real functions, it computes its outputs, which are also real numbers. However, the homomorphic encryption scheme works over the ring $R_n^t := \mathbb{Z}_t [x] /(x^n + 1)$. This means that some conversion process between real numbers and elements of $R_n^t$ is needed.

So they are not operating on real numbers directly. Instead, they encode the reals as ring elements (using scaling, as mentioned in the other answer I linked to), compute on the encrypted encodings, and then decode after decryption. Section III.B of the Dowlin paper discusses encoding methods for real numbers. Real numbers are encountered everywhere in data analytics, so I expect there to be continued research into how best to operate on reals using homomorphic encryption. In other words, it is an active area of research, not something for which you can simply pull a tool out of your hat and be on your way. Some thought and care must be taken to make sure the encoding method you pick doesn't invalidate the analysis techniques you apply afterwards.
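
To make the scaling idea concrete, here is a minimal sketch of fixed-point encoding into integers modulo $t$. It contains no encryption at all and only shows the encode/compute/decode bookkeeping. The names `SCALE`, `T`, `encode` and `decode` are my own assumptions for illustration and are not taken from the CryptoNets or Dowlin papers, which use polynomial encodings in $R_n^t$ rather than plain integers.

```python
SCALE = 1000        # hypothetical precision: three decimal digits
T = 2**31 - 1       # hypothetical plaintext modulus t

def encode(x, scale=SCALE, t=T):
    """Map a real number to an integer mod t by fixed-point scaling."""
    return round(x * scale) % t

def decode(m, scale=SCALE, t=T):
    """Map an integer mod t back to a real; large residues are read as negatives."""
    if m > t // 2:
        m -= t
    return m / scale

a, b = encode(0.2), encode(-0.35)

# Addition in the ring matches addition of the reals at the same scale.
sum_real = decode((a + b) % T)

# Multiplication multiplies the scales, so decoding must divide by SCALE**2.
prod_real = decode((a * b) % T, scale=SCALE * SCALE)

print(sum_real, prod_real)
```

Note that every multiplication grows the scale, so after enough of them the scaled value exceeds $t$ and wraps around, at which point the decoded result is meaningless. That is one reason the depth of the computation has to be planned together with the encoding.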

Order-Preserving Symmetric Encryption
The claim in the blog post is

Simpler schemes like order-preserving symmetric encryption also allow strong security in certain settings, and are easy to use with out of the box machine learning tools.

The author doesn't appear to go into much more detail than that, which leaves me wondering what they really mean. Order-preserving symmetric encryption is all about encrypting data in such a way that the natural ordering of the plaintexts is preserved. This lets you do comparisons given only the ciphertexts, which in turn enables some useful operations. Specifically, the paper the post links to mentions range queries on encrypted numbers. The 2004 SIGMOD paper by Agrawal et al. describes how their OPE scheme can be applied directly to IEEE 754 single-precision floating-point numbers.

Not being a machine learning expert, I can't tell you how useful it would be for machine learning to have only comparisons, operations like min and max, and range queries. I guess that if your machine learning tools require only range queries and comparisons, OPE could be used with "out of the box machine learning tools", but something tells me that would leave much to be desired in terms of data analysis.
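
As a rough illustration of what "order-preserving" does and does not give you, here is a toy sketch. The function `make_toy_ope` is hypothetical and is not a secure OPE scheme or the construction from any of the papers above: it is just a seeded, strictly increasing lookup table over a domain of 1001 grid points (chosen to match the question's 1001 unique values).

```python
import random

def make_toy_ope(domain_size, seed=0):
    """Build a strictly increasing table: plaintext i -> ciphertext table[i].
    Toy for illustration only; this is NOT a secure OPE scheme."""
    rng = random.Random(seed)
    table, c = [], 0
    for _ in range(domain_size):
        c += rng.randint(1, 1000)   # positive gap keeps the map strictly increasing
        table.append(c)
    return table

table = make_toy_ope(1001)

def enc(i):
    return table[i]

xs = [200, 100, 500, 20]            # plaintext grid indices
cts = [enc(x) for x in xs]          # their "ciphertexts"

# Sorting by ciphertext gives the same ranking as sorting by plaintext,
# so comparisons, min, max and range queries all carry over.
rank_plain = sorted(range(len(xs)), key=lambda i: xs[i])
rank_cipher = sorted(range(len(cts)), key=lambda i: cts[i])
order_kept = rank_plain == rank_cipher

# Arithmetic does not carry over: these two numbers will generally differ.
print(enc(100) + enc(200), enc(300))
```

So anything that depends only on the ranking of the values behaves the same on ciphertexts as on plaintexts, while sums, means, dot products and distances computed on the ciphertexts have no fixed relation to the same quantities on the plaintexts.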

mikeazo