
Why does the Laplacian matrix need normalization, and why the square-root power of the degree matrix? The symmetric normalized Laplacian matrix is defined as $$\ L = D^{1/2}AD^{-1/2}$$ where $L$ is the Laplacian matrix and $A$ is the adjacency matrix. Element $A_{ij}$ represents a measure of the similarity between data points with indices $i$ and $j$. $D$ is the diagonal degree matrix, defined as $\ D_{jj} = \sum\limits_i A_{ij}$; in other words, its $j$-th diagonal entry is the sum of the similarities from node $j$ to all its neighboring nodes $i$.

I can't figure out some questions about the formula:

  • Why do we need to normalize the Laplacian matrix like that? Why the $\ {1/2}$-power?
  • Is there any special reason?
  • Are there any references?
Blade
  • Note that "Laplacian" tag refers to a differential operator, where you seem to have in mind its use in (undirected) graph algorithms, where it is more of a difference operator. You should supply more context, because the answer to "why" one wants to define the symmetric normalized Laplacian matrix as shown depends on the purpose to which it is put. – hardmath Jan 21 '15 at 13:26
  • Perhaps a way to frame your Question is, what are the eigenvalues of the Laplacian matrix in comparison with the "symmetric normalized" Laplacian matrix? – hardmath Jan 21 '15 at 13:29
  • In spectral clustering, a graph is transformed into three matrices: 1. the adjacency matrix $A$; 2. the diagonal matrix $D$, where $d_{ii}$ is the degree of node $i$; 3. the Laplacian $L = D - W$. We compute the eigenvectors of $L$ and choose $m$ of them, each eigenvector serving as one axis, so the $m$ eigenvectors give the new coordinates of the data points: e.g., in the eigenvector matrix $E$, row $i$ represents data point $i$ and each column represents an eigenvector. Then we use k-means to cluster the data in the new coordinates. There is no specific definition of the eigenvalues. – John Buger Jan 21 '15 at 13:41
  • The Laplacian matrix $L$ is used as a dimensionality reduction tool in spectral clustering. – John Buger Jan 21 '15 at 13:47
  • Most references, like this article http://www.math.ucsd.edu/~fan/cbms.pdf (p. 5), show the normalization formula but don't explain it. It seems to be a property of the Laplacian matrix in general, not something domain-dependent. – John Buger Jan 21 '15 at 13:57
  • I think you have omitted a minus sign in the exponent of the first matrix factor. Should it be $D^{-1/2}$? – hardmath Jan 22 '15 at 17:30
  • The equation is not correct. It should be $$\ L = D^{-1/2}AD^{-1/2}$$ – seralouk Jan 29 '21 at 09:23

3 Answers


The unnormalized Laplacian arises in the approximation of the minimization of RatioCut, while the normalized Laplacian arises in the approximation of the minimization of NCut.

Basically, in the unnormalized case you optimize an objective relative to the number of nodes in each cluster, while in the normalized case you optimize it relative to the volume of each cluster.

The square root comes from this: $f^\top(D - A)f = g^\top(I-D^{-1/2}AD^{-1/2})g$ where $g=D^{1/2}f$.

So, if your optimization method relies on a symmetric PSD matrix, you use the symmetric normalized Laplacian, and then remove the bias by computing $D^{-1/2}g$ to recover $f$.


More insight into RatioCut and NCut.

Let $G=(V,E)$ be a graph with vertex set $V=\{v_1,\dots,v_n\}$ and edge set $E$, and let $w_{ij}$ denote the positive weight on the edge between $v_i$ and $v_j$. Let $\{A_i:1 \le i \le k\}$ be a disjoint partition of $V$. Define $$ Cut(A_i:1\le i\le k)\triangleq\frac{1}{2}\sum_{c=1}^k \sum_{i \in A_c,\,j\in \bar A_c}w_{ij} $$ $$ RatioCut(A_i:1\le i \le k)\triangleq\sum_{i=1}^k\frac{Cut(A_i,\bar A_i)}{|A_i|} $$ $$ NCut(A_i:1\le i \le k)\triangleq\sum_{i=1}^k\frac{Cut(A_i,\bar A_i)}{vol(A_i)} $$ where $|A_i|$ is the number of vertices in $A_i$ and $vol(A_i)$ is the sum of the degrees of the vertices in $A_i$.

RatioCut: Let $$\tag{1}\label{1} f_i=\begin{cases}\sqrt{|\bar A|/|A|} & \text{if } v_i\in A\\ -\sqrt{|A|/|\bar A|} & \text{if } v_i \in \bar A\end{cases} $$

\begin{align*} f^\top L f & = \frac{1}{2}\sum_{i,j=1}^n w_{ij}(f_i-f_j)^2 \\ & = \frac{1}{2}\sum_{i\in A,j\in \bar A}w_{ij}\left(\sqrt{\frac{|\bar A|}{|A|}}+\sqrt{\frac{|A|}{|\bar A|}}\right)^2 + \frac{1}{2}\sum_{i\in \bar A,j\in A}w_{ij}\left(-\sqrt{\frac{|\bar A|}{|A|}}-\sqrt{\frac{|A|}{|\bar A|}}\right)^2 \\ &=Cut(A,\bar A)\left(\frac{|\bar A|}{|A|}+\frac{|A|}{|\bar A|}+2\right)\\ &=Cut(A,\bar A)\left(\frac{|A|+|\bar A|}{|A|}+\frac{|A|+|\bar A|}{|\bar A|}\right)\\ &=|V|\,RatioCut(A,\bar A) \end{align*}

You can also see that $\sum_{i=1}^n f_i=0$, so $f\perp \mathbb 1$.

So minimizing RatioCut is equivalent to the following problem: $$ \min_{A\subset V}f^\top L f\quad\text{subject to } f\perp\mathbb 1,\ f_i\text{ as defined in Eq.\eqref{1}},\ \|f\|^2=n $$

NCut: For this case, define $$\tag{2}\label{2} f_i=\begin{cases} \sqrt{\frac{vol(\bar A)}{vol(A)}} &\text{if $v_i\in A$}\\ -\sqrt{\frac{vol(A)}{vol(\bar A)}} &\text{if $v_i\in \bar A$}\\ \end{cases} $$

Similar to above, we have $Df\perp \mathbb 1$, $f^\top D f=vol(V)$ and $f^\top L f=vol(V)NCut(A,\bar A)$. Minimizing NCut is equivalent to $$ \min_{A\subset V} f^\top L f \text{ subject to $f$ as in Eq.\eqref{2}, $Df\perp \mathbb 1$ and $f^\top Df=vol(V)$} $$ Substituting $g=D^{1/2}f$ we get $$ \min_{A\subset V} g^\top D^{-1/2}LD^{-1/2} g \text{ subject to $g=D^{1/2}f$, $f$ as in Eq.\eqref{2}, $g\perp D^{1/2}\mathbb 1$ and $\|g\|^2=vol(V)$} $$

Then observe that $D^{-1/2}LD^{-1/2}$ is the symmetric normalized Laplacian.

mather
davcha
  • Why $D^{1/2}$ ? Why not $D^{1/3}$ or something else? – John Buger Jan 21 '15 at 14:43
  • Because $D^{-1}A$ is the transition matrix of the underlying Markov chain, whereas $D^{-2/3}A$ is nothing like that. – davcha Jan 21 '15 at 14:57
  • Sorry for my poor math... could you give more detail? Why does $D^{1/2}$ represent the transformation from RatioCut to NCut? – John Buger Jan 21 '15 at 15:09
  • I'm no longer in front of a computer. I'll edit my answer later to give you more details. – davcha Jan 21 '15 at 15:32
  • Why $f\perp \mathbb 1$? Why $\sum_{i=1}^n f_i=0$? Why do we need to keep that property? Why do we need the constraints $|f|^2=\sqrt{n}$ and $|g|^2=vol(V)$? – John Buger Jan 22 '15 at 13:22
  • Consider two vectors $u$ and $v$. $u\perp v\iff u\cdot v=\sum_{i=1}^n u_i v_i=0$. Now, since $\mathbb 1$ is the vector of all ones, $f\cdot \mathbb 1 = \sum_{i=1}^n f_i$. – davcha Jan 22 '15 at 13:24
  • You need to keep this property for the relaxation you're doing. In practice, $f$ is not restricted to particular values (and this leads to a possibly loose approximation). If you remove this condition, $f=\mathbb 1$ always minimizes your problem. Also, $|f|^2=\sqrt{n}$ may not be required... but then $f=\mathbb 0$ also minimizes the problem. – davcha Jan 22 '15 at 13:26
  • Does $Df\perp \mathbb 1$ serve to transform $|A|$ into $vol(A)$? And does the $D^{1/2}$ in $g=D^{1/2}f$ serve to keep $D^{-1/2}LD^{-1/2}$ in $g^\top D^{-1/2}LD^{-1/2} g$ a symmetric matrix? – John Buger Jan 22 '15 at 13:40
  • This is also related to the Rayleigh-Ritz theorem. The substitution $g=D^{1/2}f$, and everything else, is there so we can use this theorem to solve our problem. – davcha Jan 22 '15 at 13:47
  • Sorry... I can't find where the substitution $g=D^{1/2}f$ is in the article; can I get a hint? – John Buger Jan 22 '15 at 14:26
  • Look at Luxburg 2007, A Tutorial on Spectral Clustering – davcha Jan 22 '15 at 14:38
  • I learned spectral clustering from that article, but I can't figure out why $D^{1/2}$ appears. You mention it comes from the Rayleigh-Ritz theorem, but I'm still stuck. – John Buger Jan 22 '15 at 15:42
  • Are you using $|A|$ to denote the number of nodes in one partition? I understand that $|\cdot|$ sometimes denotes cardinality, but when applied to a matrix, people usually think of determinant. That'll be extra confusing once people keep reading and see $vol(A)$ because the determinant captures volume. Can you please clarify your notation? – RMurphy Mar 12 '18 at 21:40
  • Yeah, the problem is that $A_i$ is a set here... – davcha Mar 12 '18 at 21:43
  • @davcha OK. I think it's fair to say you never stated that $A$ was the adjmat, even if that's what it meant in earlier posts. So the notation is fine, but these comments of mine may be helpful to someone in the future.. – RMurphy Mar 13 '18 at 19:35
  • For RatioCut it should be $||f||^2 = n$ so $||f|| = \sqrt{n}$ right? – curiousgeorge Oct 05 '20 at 01:00
  • Yep, that's a typo – davcha Nov 14 '20 at 19:02

The notion of the normalized Laplacian matrix was given by Fan R. K. Chung in her book Spectral Graph Theory. Two basic reasons for its wide acceptance are that its eigenvalues are consistent with the eigenvalues in spectral geometry and in stochastic processes, and that many results which were only known for regular graphs can be generalized to all graphs. The Laplacian $L=D-A$ works well for regular graphs, but the normalized Laplacian $\mathcal{L}=D^{-1/2}LD^{-1/2}=D^{-1/2}(D-A)D^{-1/2}=I-D^{-1/2}AD^{-1/2}$ works well not only for regular but also for irregular graphs. The first few pages of Chung's book will answer your question in detail.

Crystal

Note that the relaxed version of the NCut problem is the minimization of the generalized eigenproblem

$Lx = \lambda Dx$

Consult the paper by Shi and Malik for a justification.

To solve this problem, i.e. to find eigenvector $x$ and eigenvalue $\lambda$, multiply both sides of this equation by $D^{-1/2}$:

$D^{-1/2}Lx=\lambda D^{1/2}x$

This equation can be rewritten as

$D^{-1/2}LD^{-1/2}D^{1/2}x=\lambda D^{1/2}x$

Setting $y=D^{1/2}x$ you obtain

$D^{-1/2}LD^{-1/2}y=\lambda y$

that is you reduced the generalized eigenproblem $Lx=\lambda Dx$ to the standard eigenproblem $\mathcal{L} y = \lambda y$, where $\mathcal{L} = D^{-1/2}LD^{-1/2}$ is the normalized Laplacian matrix.

Knowing eigenvector $y$ you can find the eigenvector $x=D^{-1/2}y$.

  • I think your first equation is already wrong since the right side of your equation is missing a - sign in the exponent of D – v.tralala Jan 03 '23 at 17:42