
According to its mathematical definition, a randomized algorithm $M: D\rightarrow R$ satisfies $\epsilon$-differential privacy if, for all adjacent datasets $x, y \in D$ (that is, datasets $x$ and $y$ that differ in only one record) and any set $S \subseteq Range(M)$, $\Pr(M(x) \in S) \leq e^{\epsilon} \cdot \Pr(M(y) \in S)$.

The additive one is shown in this question: Differential privacy definition.

Dr. Dwork explains the advantage of using the multiplicative definition over the additive one in the Microsoft Research Lecture 2 (at about 2'50''). In short, the multiplicative definition rules out the possibility that an individual's record is randomly selected and published.
However, I struggle to understand her meaning. I would appreciate any help in understanding this definition!

AleksanderCH
Coco_viva

2 Answers


In short, the multiplicative definition rules out the possibility that an individual's record is randomly selected and published.

Consider a malicious algorithm $M^*$ that picks a random individual's record from the input database (of size $n$) and outputs it. A good definition of differential privacy should not consider $M^*$ secure, because it compromises the privacy of a random individual.

However, the additive differential privacy definition regards $M^*$ as a secure algorithm for $\epsilon=1/n$. To see this, consider two input databases $X,Y$ (both of size $n$) that differ in a single element $d$ (where $d\in X$ but $d\not\in Y$). The output distributions of $M^*(X)$ and $M^*(Y)$ agree except on $d$ and on the record that replaces it in $Y$, each of which carries probability mass $1/n$. Hence $\Pr[M^*(X)\in S]\leq\Pr[M^*(Y)\in S]+1/n$ holds for any subset $S\subseteq R$, i.e., $M^*$ satisfies additive $1/n$-differential privacy.
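To make the additive bound concrete, here is a minimal Python sketch that brute-forces the worst-case gap over every subset $S$ for a tiny database. The record names and the helper names (`output_distribution`, `max_additive_gap`) are made up for illustration, and the records are assumed to be distinct strings.

```python
from fractions import Fraction
from itertools import chain, combinations

def output_distribution(database):
    """Exact output distribution of M*: each record is published with probability 1/n."""
    n = len(database)
    dist = {}
    for record in database:
        dist[record] = dist.get(record, Fraction(0)) + Fraction(1, n)
    return dist

def all_subsets(items):
    """Every subset of `items`, as tuples (including the empty set)."""
    items = list(items)
    return chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))

def max_additive_gap(db_x, db_y):
    """Max over all S of Pr[M*(X) in S] - Pr[M*(Y) in S], by brute force."""
    px, py = output_distribution(db_x), output_distribution(db_y)
    universe = set(px) | set(py)
    return max(
        sum(px.get(r, Fraction(0)) for r in s) - sum(py.get(r, Fraction(0)) for r in s)
        for s in all_subsets(universe)
    )

# Hypothetical adjacent databases of size n = 4; "dave" plays the role of d.
X = ["alice", "bob", "carol", "dave"]
Y = ["alice", "bob", "carol", "erin"]

gap = max_additive_gap(X, Y)
print(gap, gap <= Fraction(1, len(X)))
```

The worst case is reached at $S=\{d\}$, where the gap is exactly $1/n$, so the additive condition is met with equality even though $M^*$ publishes a raw record.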

The above issue is fixed by the standard multiplicative definition. Note that for $S=\{d\}$, there is no $\epsilon$ such that $\Pr[M^*(X)\in S]\leq\Pr[M^*(Y)\in S]\cdot e^\epsilon$ because $\Pr[M^*(X)=d]=1/n>0$ and $\Pr[M^*(Y)=d]=0$. That is, $M^*$ does not satisfy the standard $\epsilon$-differential privacy.
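The failure of the multiplicative condition can be checked the same way. The sketch below (hypothetical helper names and toy records again) computes the smallest $\epsilon$ that would work for a pair of databases and returns infinity when no finite value exists. It only looks at single outputs, which is enough on a finite output space because the worst ratio over sets $S$ is attained at a single element.

```python
import math
from fractions import Fraction

def output_distribution(database):
    """Exact output distribution of M*: each record is published with probability 1/n."""
    n = len(database)
    dist = {}
    for record in database:
        dist[record] = dist.get(record, Fraction(0)) + Fraction(1, n)
    return dist

def smallest_epsilon(db_x, db_y):
    """Smallest eps with Pr[M*(X)=r] <= e^eps * Pr[M*(Y)=r] for every record r.

    Returns math.inf when some record has positive probability under X
    but zero probability under Y, i.e. when no finite eps exists.
    """
    px, py = output_distribution(db_x), output_distribution(db_y)
    worst = 0.0
    for record, p in px.items():
        q = py.get(record, Fraction(0))
        if q == 0:
            return math.inf
        worst = max(worst, math.log(float(p / q)))
    return worst

# Hypothetical adjacent databases; "dave" plays the role of d.
X = ["alice", "bob", "carol", "dave"]
Y = ["alice", "bob", "carol", "erin"]

eps_needed = smallest_epsilon(X, Y)
print(eps_needed)
```

Because $d$ can be output from $X$ but never from $Y$, the required $\epsilon$ is infinite no matter how large $n$ is.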

Shan Chen

In Salil Vadhan's textbook The Complexity of Differential Privacy, the author states in Section 1.4:

The choice of a multiplicative measure of closeness between distributions is important, and we will discuss the reasons for it later. It is technically more convenient to use $e^\epsilon$ instead of $(1 + \epsilon)$, because the former behaves more nicely under multiplication $e^{\epsilon_1} \cdot e^{\epsilon_2} = e^{\epsilon_1 + \epsilon_2}$.

You can find a very nice general discussion of why the definition is the way it is in Section 1.6.

One advantage of using $e^\epsilon$ instead of $(1+\epsilon)$, as far as I understand, is that the composition guarantees (see Lemma 2.2) behave nicely. For instance, if you have an $\epsilon_1$-differentially private mechanism $M_1$ and an $\epsilon_2$-differentially private mechanism $M_2$, then the mechanism $M(x) = (M_1(x), M_2(x))$ is $(\epsilon_1 + \epsilon_2)$-differentially private.
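As a rough illustration of the composition claim, the following Python sketch uses randomized response on a single bit as a stand-in for $M_1$ and $M_2$ (my choice of mechanism, not something taken from the textbook) and assumes the two mechanisms use independent randomness. The function names are hypothetical.

```python
import math
from itertools import product

def randomized_response(eps):
    """Output distribution P[output | input bit] of randomized response.

    The true bit is reported with probability e^eps / (1 + e^eps).
    """
    p_truth = math.exp(eps) / (1.0 + math.exp(eps))
    return {bit: {bit: p_truth, 1 - bit: 1.0 - p_truth} for bit in (0, 1)}

def privacy_loss(dist):
    """Smallest eps for which the mechanism is eps-DP on the adjacent inputs 0 and 1.

    On a finite output space it suffices to check single outputs.
    """
    return max(abs(math.log(dist[0][o] / dist[1][o])) for o in dist[0])

def compose(dist1, dist2):
    """Distribution of (M1(x), M2(x)) when M1 and M2 use independent randomness."""
    return {
        x: {(o1, o2): dist1[x][o1] * dist2[x][o2]
            for o1, o2 in product(dist1[x], dist2[x])}
        for x in (0, 1)
    }

eps1, eps2 = 0.3, 0.7
m1, m2 = randomized_response(eps1), randomized_response(eps2)
composed_loss = privacy_loss(compose(m1, m2))
print(privacy_loss(m1), privacy_loss(m2), composed_loss)
```

For this particular pair the bound is tight: the privacy loss of the composed mechanism equals $\epsilon_1 + \epsilon_2$ up to floating-point error, which is exactly the $e^{\epsilon_1} \cdot e^{\epsilon_2} = e^{\epsilon_1 + \epsilon_2}$ behavior mentioned in the quote.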

Cryptonaut