
Call a family $F$ of subsets of $S=\{1,2,\ldots,n\}$ distinguishing if for every two distinct subsets $A,B$ of $S$ there exists $X\in F$ so that $|A \cap X|\ne |B \cap X|$. Show that there exists such a distinguishing family $F$ of $S$ of size $|F | \leq \dfrac{(2 + o(1)) n}{\log_3n}$, where $o(1)$ denotes a quantity that tends to $0$ as $n\to\infty$.

A friend asked me this; he is having trouble solving it. He came across it while trying to learn the probabilistic method. Can someone help? Thanks.

shadow10
  • 5,737
  • This seems to be similar to a coin weighings problem in exploiting a ternary expansion to distinguish all $A\neq B$ with a minimum number of tests $X$. – hardmath Jul 28 '14 at 17:10
  • Could you explain a bit? I don't know what you are talking about. Note it was my friend who was learning the method, not me. I only have some elementary knowledge. :P @hardmath – shadow10 Jul 28 '14 at 17:13
  • We need to distinguish all $2^n$ possible subsets using size comparisons $|A\cap X|$ versus $|B\cap X|$, such that having all comparisons equal implies $A=B$. A lower bound on the number of comparisons needed is $n/\log_2 n$, and the claim in the question is that we can come within a constant factor of this. – hardmath Jul 28 '14 at 17:38
  • A quite different problem with similar nomenclature is Katona's Problem about the minimal size of a separating family of sets. Here $\log_3 n$ makes an appearance in the solution, due independently to Yao (1976) and Mao-Cheng (1982). A short homework exercise sketches the solution. – hardmath Jul 29 '14 at 21:44
  • It is easy to find a distinguishing family with $n$ subsets, e.g. take all the singleton subsets of $\{1,\ldots,n\}$. The smallest example with fewer than $n$ subsets seems to be $\{1,2,3,4\}$, for which a family with three subsets can be distinguishing, e.g. $\{\{1,2,3\},\{1,2,4\},\{1,3,4\}\}$. – hardmath Jul 30 '14 at 10:32
  • The problem posed here (which continues to frustrate my efforts to find a widely used name in the literature) is a non-adaptive search in the sense that the family $\mathcal{F}$ is chosen ahead of time and doesn't change with test results for any $X\in \mathcal{F}$. An adaptive version of the problem was posed in this previous Math.SE Question. – hardmath Aug 04 '14 at 22:23

2 Answers


What follows is a partial answer with a reasonable prospect of being polished into a fuller one.

A family of subsets $\mathcal{F}$ of the finite set $[1,\ldots,n]$ is a distinguishing family iff for any two distinct $A,B \subseteq [1,\ldots,n]$, there exists $X \in \mathcal{F}$ such that:

$$ |A\cap X| \neq |B\cap X| $$

The claim is there exists a distinguishing family $\mathcal{F}$ satisfying:

$$ |\mathcal{F}| \leq \frac{(2 + o(1)) n}{\log_3 n} $$

Letting $|\mathcal{F}| = m$, an equivalent statement is:

$$ \frac{n}{m \log_2 n} \geq \frac{1}{2} \log_3 2 - o(1) $$

with $\frac{1}{2} \log_3 2 \approx 0.31546$.

We are able to show that there are infinitely many $n$ for which this bound can be met.

The problem of finding a small(est) distinguishing family $\mathcal{F}$ can be restated in terms of a binary (zero/one) $m\times n$ matrix $M$ such that multiplication by $M$ is an injective (one-to-one) map from $\{0,1\}^n$ into $\mathbb{N}^m$. Here the $m$ rows of $M$ are binary representations of the $m$ sets $X_i$ belonging to $\mathcal{F}$.

If $M_{ij} = 1$ when $j \in X_i$ and zero otherwise, and if $\vec{b}$ is a column s.t. $b_i = 1$ when $i \in B \subseteq [1,\ldots,n]$ and zero otherwise, then:

$$ M\vec{b} = (|B\cap X_i| : i=1,\ldots,m)^T $$

Since every subset $B \subseteq [1,\ldots,n]$ is represented by some $\vec{b} \in \{0,1\}^n$, $\mathcal{F}$ is a distinguishing family if and only if all the images $M\vec{b}$ are distinct, i.e. $|M\{0,1\}^n| = 2^n$.

Injectivity of multiplication by $M$ on $\{0,1\}^n$ is a weaker condition than $M$ having rank $n$, which would give injectivity on all of $\mathbb{N}^n$. A smallest binary matrix with injective multiplication and $\operatorname{rank}(M) < n$ seems to be one like this:

$$ M = \begin{pmatrix} 1 & 1 & 1 & 0 \\ 1 & 1 & 0 & 1 \\ 1 & 0 & 1 & 1 \end{pmatrix} $$

which corresponds to distinguishing family $\{\{1,2,3\},\{1,2,4\},\{1,3,4\}\}$ of subsets of $\{1,2,3,4\}$.
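For a family this small the distinguishing property can be checked by brute force. The following is a minimal sketch, not part of the original argument; the function name `is_distinguishing` is made up for illustration. It computes the vector $(|B\cap X_i|)_i$ for every $B\subseteq\{1,\ldots,n\}$ and reports whether all $2^n$ vectors are distinct.

```python
from itertools import product

def is_distinguishing(family, n):
    """Return True if B -> (|B ∩ X| for X in family) is injective on subsets of {1..n}."""
    seen = set()
    for bits in product((0, 1), repeat=n):
        subset = {j + 1 for j in range(n) if bits[j]}
        # One coordinate per set in the family: this is the vector M·b.
        signature = tuple(len(subset & X) for X in family)
        if signature in seen:
            return False  # two different subsets share all intersection sizes
        seen.add(signature)
    return True

family = [{1, 2, 3}, {1, 2, 4}, {1, 3, 4}]
print(is_distinguishing(family, 4))
```

The search is exponential in $n$, so it is only useful as a sanity check on small examples.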

Formulating the problem in matrix multiplication terms allows algebraic methods to be used. In particular a slightly different problem is considered in connection with Code Division Multiple Access (CDMA) systems in a 2009 paper by Sh. Dashmiz, P. Pad, F. Marvasti:

New Bounds for Binary and Ternary Overloaded CDMA

An explicit construction is given there for antipodal binary $m\times n$ matrices $A$ that are injective on $\{\pm 1\}^n$, where by antipodal binary we mean the entries of $A$ are drawn from $\{\pm 1\}$. Given an $m\times n$ solution, the construction provides one that is $2m\times (2n + m - 1)$.

Multiplication by any $m\times n$ matrix $A$ being injective on $\{\pm 1\}^n$ implies that it is also injective on $\{0,1\}^n$. To show this, apply the injective affine mapping $\vec{b} \to 2\vec{b}-\vec{1}$, so that if multiplication by $A$ of two distinct binary vectors in $\{0,1\}^n$ gives equal images, then already multiplication by $A$ of two distinct vectors in $\{\pm 1\}^n$ would have given equal images. The converse argument is similar, and henceforth we will refer to injective matrix multiplication without mention of whether $\{\pm 1\}^n$ or $\{0,1\}^n$ is the underlying domain.

Further: If $A$ is an $m\times n$ antipodal binary matrix with injective multiplication, then an $(m+1)\times n$ 0,1-binary matrix $M$ with injective multiplication can be formed from $A$ by appending a row $[1,1,\ldots,1]$ and using elementary row operations to clear any negative entries in $A$. Moreover the antipodal binary matrix can always be chosen to have already a row of all ones, so the need to add an extra row when converting from antipodal binary to 0,1-binary matrix can always be avoided.
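As a sketch of why this conversion preserves injectivity (the symbols $a_i$ and $r_i$ are introduced here only for illustration): if $a_i$ is a row of $A$ with entries in $\{\pm 1\}$, adding the all-ones row $\vec{1}^T$ and halving gives a row $r_i = \frac{1}{2}(a_i + \vec{1}^T)$ with entries in $\{0,1\}$, and for every $\vec{b} \in \{0,1\}^n$:

$$ r_i \vec{b} = \tfrac{1}{2}\left(a_i \vec{b} + \vec{1}^T \vec{b}\right) \quad\Longrightarrow\quad a_i \vec{b} = 2\, r_i \vec{b} - \vec{1}^T \vec{b} $$

Since $\vec{1}^T\vec{b}$ is itself one of the coordinates of $M\vec{b}$, the vector $A\vec{b}$ can be recovered from $M\vec{b}$, so two distinct binary vectors with the same image under $M$ would also have the same image under $A$.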

Starting from a variety of $m\times n$ injective binary matrices, the ratio $\frac{n}{m \log_2 n}$ generated by the recursion $(m,n) \to (2m,2n+m-1)$ remains above $\frac{1}{2} \log_3 2$. However these recursive sequences have gaps in the $n$ values represented, and it is not yet clear that the bound can be established for all the intermediate values.
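The behaviour of the ratio along one such sequence can be tabulated numerically. The sketch below only iterates the dimensions $(m,n) \to (2m, 2n+m-1)$ and does not construct any matrices. The starting pair $(2,2)$ is an assumption chosen for illustration (the $2\times 2$ matrix with rows $[1,1]$ and $[1,-1]$ is invertible, hence injective), and the function names are hypothetical.

```python
from math import log, log2

def ratio(m, n):
    """The quantity n / (m * log2(n)) compared against (1/2) * log_3(2)."""
    return n / (m * log2(n))

def recursion_ratios(m, n, steps):
    """Follow (m, n) -> (2m, 2n + m - 1) and record the ratio at each step."""
    rows = []
    for _ in range(steps):
        rows.append((m, n, ratio(m, n)))
        m, n = 2 * m, 2 * n + m - 1  # right-hand side uses the old m
    return rows

target = 0.5 * log(2) / log(3)  # (1/2) * log_3(2)
for m, n, r in recursion_ratios(2, 2, 8):
    print(m, n, round(r, 4), r > target)
```

Only the values $n = 2, 5, 13, 33, \ldots$ are reached from this starting pair, which illustrates the gaps mentioned above.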

hardmath
  • 37,715

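Note that $F$ distinguishes every pair $(A,B)$ of distinct sets if and only if $F$ distinguishes every pair of distinct sets with $A\cap B=\varnothing$: replacing $A$ and $B$ by $A\setminus B$ and $B\setminus A$ subtracts the same amount $|A\cap B\cap X|$ from both $|A\cap X|$ and $|B\cap X|$.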

To choose $F$, let us choose $f$ subsets independently and uniformly at random from all $2^n$ possible subsets. Fix a particular pair $(A,B)$ with $A\cap B=\varnothing$ and $|A|+|B|\ge 1$. For a single such random set $X$, we have \begin{align} P(|A\cap X|=|B\cap X|) &=2^{-(|A|+|B|)}\sum_{k=0}^{\min(|A|,|B|)}\binom{|A|}k\binom{|B|}k \\&=2^{-(|A|+|B|)}\sum_{k=0}^{\min(|A|,|B|)}\binom{|A|}k\binom{|B|}{|B|-k} \\&=2^{-(|A|+|B|)}\binom{|A|+|B|}{|B|} \\&\le C(|A|+|B|)^{-1/2} \end{align} for some constant $C$. The third equality is Vandermonde's identity. The last inequality is true since the largest binomial coefficient is the central one, and $2^{-k}\binom{k}{\lfloor k/2\rfloor}\sim \sqrt{2/\pi}\,k^{-1/2}$ by Stirling's approximation. In fact $C=\sqrt{2/\pi}$ works for every $k\ge 1$, but the exact value does not matter.
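The closed form for this probability can be checked against direct enumeration for small sets. This is a minimal sketch with hypothetical function names. It takes $A$ and $B$ to be disjoint with sizes `a` and `b`, and it uses the specific value $C=\sqrt{2/\pi}$ as an assumed constant when testing the final bound.

```python
from itertools import product
from math import comb, pi, sqrt

def collision_probability_bruteforce(a, b):
    """P(|A∩X| = |B∩X|) by enumerating how X meets the a + b elements of A ∪ B."""
    hits = 0
    for bits in product((0, 1), repeat=a + b):
        # The first a bits say which elements of A lie in X, the rest which of B.
        if sum(bits[:a]) == sum(bits[a:]):
            hits += 1
    return hits / 2 ** (a + b)

def collision_probability_formula(a, b):
    """Closed form 2^-(a+b) * binom(a+b, b) from Vandermonde's identity."""
    return comb(a + b, b) / 2 ** (a + b)

for a, b in [(1, 0), (1, 1), (2, 1), (3, 3), (4, 2)]:
    p = collision_probability_formula(a, b)
    bound = sqrt(2 / pi) / sqrt(a + b)  # assumed constant C = sqrt(2/pi)
    print(a, b, p, collision_probability_bruteforce(a, b), p <= bound)
```

Elements of $X$ outside $A\cup B$ do not affect either intersection, which is why the enumeration only ranges over $a+b$ bits.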

Then, the probability that $|A\cap X|=|B\cap X|$ for all of the randomly chosen $X\in F$ is at most $C^f(|A|+|B|)^{-f/2}$. It is also at most $2^{-f}$ whenever $|A|+|B|\ge 1$, because $2^{-k}\binom{k}{\lfloor k/2\rfloor}\le \frac12$ for every $k\ge 1$. Now, we upper bound the expected number of disjoint pairs $(A,B)$, not both empty, for which $|A\cap X|=|B\cap X|$ for all $X\in F$. Since there are $\binom{n}k2^k$ disjoint pairs $(A,B)$ for which $|A|+|B|=k$, the expected number of "bad" pairs is at most $$ \sum_{k=1}^n\binom nk2^k\,\min\left(2^{-f},\,C^f k^{-f/2}\right). $$ Split this sum at $k_0=n/\log^2 n$, where $\log$ is the natural logarithm. For $k\le k_0$ we have $\binom nk2^k\le (2n)^{k_0}$, so these terms contribute at most $n(2n)^{k_0}\,2^{-f}=e^{(1+o(1))n/\log n}\,2^{-f}$. For $k>k_0$ we use $k^{-f/2}<k_0^{-f/2}$ together with $\sum_{k}\binom nk2^k=3^n$, so these terms contribute at most $$ C^f\cdot 3^n\, k_0^{-f/2}=3^n\left(\frac{C'n}{\log^2 n}\right)^{-f/2}, $$ where $C'=C^{-2}$ is just a different constant.

For the probabilistic method to work, the expected number of bad pairs needs to be less than $1$. The second contribution is less than $\frac12$ as soon as $$f> \frac{2(n\log3+\log 2)}{\log n-2\log\log n+\log C'}=\frac{2n}{\log_3 n}\cdot (1+o(1)).$$ For large $n$, any such $f$ also satisfies $f\ge 2n\log 3/\log n$, hence $2^{-f}\le e^{-(2\log 2\log 3)\,n/\log n}$ with $2\log 2\log 3\approx 1.52>1$, so the first contribution tends to $0$ and is less than $\frac12$ as well. Therefore, setting $f$ equal to the smallest integer exceeding the displayed bound, which is $(2+o(1))\,n/\log_3 n$, ensures the expected number of bad disjoint pairs is less than $1$, so some choice of $F$ has no bad pairs and is therefore distinguishing.
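For small $n$ the expected number of bad pairs can be computed exactly instead of bounded, which gives a feel for how large $f$ has to be. The sketch below is an illustration under stated assumptions, not part of the proof. The function names are hypothetical, it counts ordered disjoint pairs $(A,B)$ that are not both empty, and it uses the exact single-set probability $2^{-(a+b)}\binom{a+b}{b}$ with $a=|A|$ and $b=|B|$.

```python
from math import comb, log

def expected_bad_pairs(n, f):
    """Exact expected number of ordered disjoint pairs left unseparated by f random sets."""
    total = 0.0
    for a in range(n + 1):
        for b in range(n - a + 1):
            if a + b == 0:
                continue  # skip the pair (empty, empty)
            count = comb(n, a) * comb(n - a, b)   # ways to choose disjoint A, B
            p = comb(a + b, b) / 2 ** (a + b)     # one random X fails to separate them
            total += count * p ** f               # all f independent sets fail
    return total

def smallest_f(n):
    """Smallest f for which the expectation drops below 1."""
    f = 1
    while expected_bad_pairs(n, f) >= 1:
        f += 1
    return f

for n in (4, 8, 16, 32):
    print(n, smallest_f(n), 2 * n * log(3) / log(n))
```

The last column is $2n/\log_3 n$ without the $o(1)$ correction, so for small $n$ it should be read only as a rough comparison with the asymptotic statement.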

Mike Earnest
  • 84,902