What is the definition of "convex relaxation" in clustering?

Question

I have following text from a paper i am trying to understand:

enter image description here

I don't understand what does below sentence refers to as being convex/non-convex

The problem is that even though the objectives (1) and (2) are convex the constraint that K is valid is certainly not convex.

And also i don't understand relax term in the line above.

Can anybody kindly explain me those concepts?

I'll leave this here just in case it helps: https://en.wikipedia.org/wiki/Convex_function — , Dec 01 '14 at 16:01

Michael Grant · Accepted Answer · 2014-12-01T15:41:43.803

Here's what's happening.

First of all: at risk of re-stating something you already know, a convex optimization problem typically involves minimizing a convex function of the variables, subject to the constraint that the variables must lie in a convex set. Notice that the term "convex" is used in three ways: to describe the optimization model, to describe the function, and to describe the set. Obviously, these three notions of convexity are tightly related. Nevertheless, I'm pointing out these distinct uses because this paper is weaving them all in here in one tight paragraph.

Originally, the problem was formulated as an unconstrained optimization problem: $$\begin{array}{ll} \text{minimize}_{\mathcal{C}} & \| A - K(\mathcal{C}) \|_1 \end{array}$$ where $K$ is a function whose input is a clustering $\mathcal{C}$ of the $n$ points in question, and whose output is an $n\times n$ incidence matrix: $K_{uv}=1$ if points $u$ and $v$ are in the same cluster, and $0$ otherwise. This is a nice way to express the problem conceptually, but it's not that useful computationally. (How exactly are you describing the clustering relationship $\mathcal{C}$? And how do you conceive of a "search direction" over the set of these clusterings?")

The first thing they did is write it this way: $$\begin{array}{ll} \text{minimize}_{\mathcal{C},\bar{K}} & \| A - \bar{K} \|_1 \\ \text{subject to} & \bar{K} = K(\mathcal{C}) \end{array}$$ The use of the bar in $\bar{K}$ is my addition, to make it easier to differentiate between the matrix variable $\bar{K}$ and the clustering-incidence mapping function $K$. Now $\bar{K}$ is just an $n\times n$ matrix. Still, the presence of $\mathcal{C}$ as an optimization "variable" is useful conceptually but problematic computationally. So they chose to eliminate it: $$\begin{array}{ll} \text{minimize}_\bar{K} & \| A - \bar{K} \|_1 \\ \text{subject to} & \bar{K} \in \left\{ K(\mathcal{C}) ~\middle|~ \text{all possible clusterings $\mathcal{C}$} \right\} \end{array}$$ Or, as they put it in (3), $$\begin{array}{ll} \text{minimize}_\bar{K} & \| A - \bar{K} \|_1 \\ \text{subject to} & \bar{K} \text{ is a valid clustering matrix.} \end{array}$$ Valid clustering (incidence) matrices can have only $\{0,1\}$ elements. Not only that, they must be permutations of block diagonal matrices where each diagonal block has all ones. (You'll have to convince yourself of that by reading the paper.)

So this is how they arrived at a computational version of their optimization model:

The objective function $\|A-\bar{K}\|_1$ is a convex function of $\bar{K}$. That is what the paper is referring to when they say "the objectives (1) and (2) are convex."
The set of valid $\bar{K}$ clustering (incidence) matrices is a non-convex set. That is what they are referring to when they say "the constraint that $K$ is valid is certainly not convex."
Because the constraint set is not convex, this is not a convex optimization problem.

And this is a problem, because non-convex optimization problems are, in general, intractable.

Their solution is to relax the constraint to make it convex; that is, they are going to replace the constraint with one that is strictly weaker, but convex. Every valid clustering matrix $K$ will still satisfy the relaxed constraint, but so will a lot of other, invalid matrices. For instance, any matrix $\alpha \bar{K}_1 + (1-\alpha)\bar{K}_2$, where $\bar{K}_1$ and $\bar{K}_2$ are valid clustering matrices and $0\leq\alpha\leq 1$, will be included in this relaxed set.

This relaxed model will now be convex by design, which will make it possible to solve it efficiently. If we are lucky, the solution to this relaxed problem will happen to be a valid incidence matrix---and if that is the case, then you have found the global solution to your original, non-convex problem. But again, that's if we are lucky. More likely, the solution to the relaxed problem will not be a valid incidence matrix. But it might be a good initial starting point for a local search for such a matrix.

What is the definition of "convex relaxation" in clustering?

1 Answers1