
This question is motivated by this one: Show a specially defined matrix is positive definite.

Let us take it from the beginning.

Let $S=\{s_1,s_2,\ldots,s_N\}$ be a finite set.

Let $E_k \ (k=1,\ldots,n)$ be a family of $n$ subsets of $S$; any of the three cases $n<N$, $n=N$, $n>N$ may occur.

Let us consider the classical encoding of a subset $F \subseteq S$ by a sequence $f \in \{0,1\}^N$ with

$$f(k)=1 \iff s_k \in F.$$

Let us first define matrix $C$ (like "Common") by its entries:

$$C_{ij}=|E_i \cap E_j|$$

(where $|\cdot|$ denotes the number of elements).

Proposition: $C$ is psd (positive semi-definite).

Proof: $C=EE^T$, where $E$ is the $n \times N$ matrix whose $i$th row is the code of the subset $E_i$ (as described above).
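
As a quick sanity check, here is a minimal numerical sketch of this factorization and of the resulting psd property (names are mine, using numpy):

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 10, 6                            # |S| = N, number of subsets E_1..E_n

E = rng.integers(0, 2, size=(n, N))     # row i = 0/1 code of subset E_i
C = E @ E.T                             # C[i, j] = |E_i ∩ E_j|

print(np.linalg.eigvalsh(C))            # all >= 0 (up to round-off): C is psd
```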

Let us now define a kind of normalized version of matrix $C$:

$$A_{ij}=\frac{|E_i \cap E_j|}{|E_i \cup E_j|}$$

Proposition: $A$ is psd.

Proof: see the nice answer by @kimchilover to the question mentioned at the beginning.

Actually, $A_{ij}$ has been given a name: it is called the "Jaccard index of resemblance" between the sets $E_i$ and $E_j$ (https://en.wikipedia.org/wiki/Jaccard_index); its computation is discussed in (Jaccard index, matrix notation).
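
In matrix notation, this computation can be sketched as follows (illustrative code, names are mine; by inclusion–exclusion, $|E_i \cup E_j| = |E_i| + |E_j| - |E_i \cap E_j|$):

```python
import numpy as np

rng = np.random.default_rng(0)
E = rng.integers(0, 2, size=(6, 10))     # rows = 0/1 codes of subsets E_1..E_6
E[E.sum(axis=1) == 0, 0] = 1             # avoid empty subsets, so every union is nonempty
C = E @ E.T                              # C[i, j] = |E_i ∩ E_j|

sizes = C.diagonal()                     # |E_i| = C[i, i]
U = sizes[:, None] + sizes[None, :] - C  # U[i, j] = |E_i ∪ E_j| (inclusion–exclusion)
A = C / U                                # Jaccard index matrix

print(np.linalg.eigvalsh(A))             # all >= 0: A is psd
```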

Like matrix $C$, matrix $A$ has a certain number of interesting properties.

Let us now introduce a third matrix $B$ with entries :

$$B_{ij}=\frac{1}{|E_i \cup E_j|}$$

A conjecture, based on numerical tests, is that $B$ is psd like $A$ and $C$.
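
A rough way to probe this conjecture numerically (random trials only, not a proof; names are mine):

```python
import numpy as np

rng = np.random.default_rng(1)
worst = 0.0
for trial in range(1000):
    E = rng.integers(0, 2, size=(6, 10))
    E[E.sum(axis=1) == 0, 0] = 1         # avoid empty subsets, so every union is nonempty
    C = E @ E.T
    sizes = C.diagonal()
    B = 1.0 / (sizes[:, None] + sizes[None, :] - C)  # B[i, j] = 1 / |E_i ∪ E_j|
    worst = min(worst, np.linalg.eigvalsh(B).min())
print(worst)                             # ≈ 0 up to round-off: B looks psd in every trial
```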

See as well the interesting and dense answer by @darij grinberg.

Questions:

a) Does somebody have valuable references about the matrices $A$, $B$, and $C$? In particular about their spectra, the fact that there is usually a strongly dominant eigenvalue, with a certain interpretation of the associated dominant eigenvector, etc.?

b) Are there connections with correlation matrices?

c) Are there connections with graphs associated with these matrices?

Remark: Gower's distance is somewhat related. The original article, J. C. Gower, "A general coefficient of similarity and some of its properties", Biometrics 27 (1971), 857–874, can be found here.

Edit 1: A connection with neural networks.

Edit 2: "Jaccard index" has another name: "Tanimoto index", especially in chemistry applications.

Edit 3: (2023/01/31) In the case $E_k = [1,k]$ (closed integer interval), matrix $A$ is a Lehmer matrix.
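
A small illustration of this case (a sketch; with $E_k=\{1,\ldots,k\}$ one has $|E_i \cap E_j| = \min(i,j)$ and $|E_i \cup E_j| = \max(i,j)$):

```python
import numpy as np

n = 5
i, j = np.indices((n, n)) + 1            # 1-based row and column indices
A = np.minimum(i, j) / np.maximum(i, j)  # the n x n Lehmer matrix

print(A)
print(np.linalg.eigvalsh(A))             # all eigenvalues > 0
```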

Jean Marie

1 Answer


Let's first prove that the matrix $(|E_i \cap E_j|)$ is positive semi-definite.

We'll prove the following variant: if $v_i$, $i=1,\ldots,n$, are vectors with entries $\ge 0$ in some fixed $\mathbb{R}^l$ ($l$-component vectors), then the matrix $$M := (|\min (v_i, v_j)|)_{1\le i,j\le n}$$ is positive semi-definite.

Note that $\min$ of two vectors in $\mathbb{R}^l$ is done component-wise, and for a vector $w$ we denote by $|w|$ the sum of its components.

Now, the matrix $M$ is a sum of $l$ matrices $M_k$, $1\le k \le l$, of the form $$M_k = (\min (v_{ik}, v_{jk}))_{ij}.$$ It is enough to show that each $M_k$ is positive semi-definite.

So it's enough to show that for numbers $s_1$, $\ldots$, $s_n\ge 0$ the matrix $$(\min (s_i, s_j))_{ij}$$ is positive semi-definite.

A moment's thought (conjugation by a permutation matrix) convinces us that it is enough to treat the case $s_1< s_2 <\ldots < s_n$ (ties are then handled by a continuity argument).

Let us now notice that, with the convention $u_0 = 0$, we have the formula
$$\det \left(u_{\min(i,j)}\right)_{1\le i,j\le n}= \prod_{i=1}^n (u_{i} - u_{i-1})$$
(subtract row $i-1$ from row $i$, for $i = n, \ldots, 2$, to obtain an upper triangular matrix).
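
A minimal numerical check of this determinant identity (names are mine):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
u = np.sort(rng.random(n))                          # 0 < u_1 < ... < u_n
i, j = np.indices((n, n))
M = u[np.minimum(i, j)]                             # M[i, j] = u_{min(i, j)}

lhs = np.linalg.det(M)
rhs = np.prod(np.diff(np.concatenate(([0.0], u))))  # prod of (u_i - u_{i-1}), u_0 = 0
print(np.isclose(lhs, rhs))                         # True
```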

We conclude that the determinant $\det (\min (s_i, s_j))$ is $\ge 0$. Since every principal submatrix of $M_k$ has the same form, all principal minors of $M_k$ are $\ge 0$, and so the matrix is positive semi-definite.

Adding up, we conclude that $M= \sum_{k=1}^l M_k$ is positive semi-definite.

Now we are done with the case $(|E_i\cap E_j|)$: taking $v_i \in \{0,1\}^N$ to be the code of $E_i$, the vector $\min(v_i, v_j)$ is the code of $E_i \cap E_j$, hence $|E_i \cap E_j| = |\min(v_i, v_j)|$.

Now consider the case of the matrix $(1/|E_i \cup E_j|)$. Let us prove that for $n$ vectors $v_i$ in $\mathbb{R}^l$ with components $>0$, the $n\times n$ matrix $$\left (\frac{1}{|\max(v_i, v_j)|}\right)_{ij}$$ is positive semi-definite. (The $0/1$ codes only have components $\ge 0$, but the argument below works as soon as every $|\max(v_i, v_j)|$ is positive, i.e. as soon as every union $E_i \cup E_j$ is nonempty.)

If the vectors $v_i$ had a single component, this would be the previous case, since $\frac{1}{\max(s_i, s_j)} = \min\left(\frac{1}{s_i}, \frac{1}{s_j}\right)$. But for $l>1$ it is not so clear. Moreover, we do not even know how to order the vectors $v_i$ simultaneously in every coordinate. There is apparently no magic formula for the determinant. What to do? Algebra alone is helpless here...

However... we have the Laplace transform!

Recall that we have

$$\frac{1}{s} = \int_0^{\infty} e^{-s t}\, dt \qquad (s>0).$$
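
For concreteness, a crude Riemann-sum check of this identity (illustration only):

```python
import numpy as np

s, dt = 2.0, 1e-4
t = np.arange(0.0, 40.0, dt)               # truncate the integral at t = 40
print(np.sum(np.exp(-s * t)) * dt, 1 / s)  # both approximately 0.5
```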

Now consider $n$ real numbers $x_1, \ldots, x_n$. We want to show that the sum $$S=\sum_{i,j} \frac{1}{|\max(v_i, v_j)|}\, x_i x_j \ge 0.$$

Recall that $|\max (v_i, v_j)| = \sum_{k=1}^l \max (v_{ik}, v_{jk})$. Therefore,

$$S= \int_0^{\infty}\sum_{i,j} e^{-\sum_{k=1}^l \max( v_{ik}, v_{jk})\, t } \,x_i x_j \; dt.$$

Let us note that for each fixed $t \ge 0$, since $e^{-\max(a,b)t} = \min(e^{-at}, e^{-bt})$, we have $$\sum_{i,j} e^{-\sum_{k=1}^l \max( v_{ik}, v_{jk})t } x_i x_j = \sum_{i,j} \prod_{k=1}^l \min (e^{- v_{ik} t}, e^{- v_{jk} t})\, x_i x_j.$$

Let us note that above we have a quadratic form applied to $(x_1, \ldots, x_n)$, and the matrix giving the quadratic form is a Hadamard product of the matrices $$(\min (e^{- v_{ik} t}, e^{- v_{jk} t}))_{ij}$$ (there are $l$ factors in the Hadamard product, one for each $k$; $t$ is fixed, a parameter for now). But we already know that such matrices are positive semi-definite. Moreover, the Hadamard product of psd matrices is psd (Schur product theorem). We conclude that for each $t$, the above expression is $\ge 0$. Now integrate. We are done.
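
Here is a sketch of this step in code (names are mine): for a fixed $t$, the Hadamard product of the $l$ min-matrices is psd, as the Schur product theorem predicts:

```python
import numpy as np

rng = np.random.default_rng(3)
n, l, t = 5, 3, 0.7
v = rng.random((n, l)) + 0.1             # n vectors in R^l with positive components

H = np.ones((n, n))
for k in range(l):                       # one min-matrix factor per coordinate k
    w = np.exp(-v[:, k] * t)
    H *= np.minimum(w[:, None], w[None, :])   # Hadamard (entrywise) product

print(np.linalg.eigvalsh(H))             # all >= 0 for this fixed t
```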

Note: one can see right away a generalization by using, instead of the function $s\mapsto \frac{1}{s}$, any other completely monotone function. Moreover, instead of vectors $v_i$ we could consider positive functions $f_i$ on some measure space. For instance, we get the following result: let $f_1, \ldots, f_n$ be positive functions on a measure space $(X, \mu)$. Then for every $\alpha>0$ the matrix $$\left(\left[\int_X \sup (f_i, f_j)\, d\mu\right]^{-\alpha}\right)_{ij}$$

is positive semi-definite.
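
A numerical illustration of the discrete case (a sketch; counting measure on $\{1,\ldots,l\}$, random positive $f_i$, and $\alpha = 1/2$):

```python
import numpy as np

rng = np.random.default_rng(4)
n, l, alpha = 5, 4, 0.5
f = rng.random((n, l)) + 0.1                              # positive "functions" on l points

G = np.maximum(f[:, None, :], f[None, :, :]).sum(axis=2)  # integral of sup(f_i, f_j)
M = G ** (-alpha)

print(np.linalg.eigvalsh(M))                              # all >= 0
```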

Taking the Schur product, we conclude that the matrix $$\left(\frac{|v_i \land v_j|}{|v_i \lor v_j|}\right)_{ij}$$ is psd. Its diagonal entries are $1$. Therefore its $n$ eigenvalues are $\ge 0$ and sum to $n$. If the supports of the $v_i$ are pairwise disjoint, we get the identity matrix. If all the $v_i$'s are equal, we get the matrix with all entries $1$, with eigenvalues $(n, 0, \ldots, 0)$. It is not clear whether every set of $n$ positive numbers with sum $n$ can be achieved as the spectrum of a matrix $\left(\frac{|E_i \cap E_j|}{|E_i \cup E_j|}\right)_{ij}$.
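
These spectral facts are easy to observe numerically (a sketch with random subsets; names are mine):

```python
import numpy as np

rng = np.random.default_rng(5)
n, N = 6, 12
E = rng.integers(0, 2, size=(n, N))
E[E.sum(axis=1) == 0, 0] = 1             # make sure no subset is empty

C = E @ E.T
sizes = C.diagonal()
A = C / (sizes[:, None] + sizes[None, :] - C)   # Jaccard matrix, unit diagonal

ev = np.linalg.eigvalsh(A)
print(ev, ev.sum())                      # eigenvalues >= 0, summing to n = 6
```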

orangeskid