Questions tagged [clustering]

Clustering is grouping (partitioning) a set of objects so that items in the same group are more similar to each other than to items in different groups, where the notion of similarity may be variously defined.

Clustering is a task of grouping (partitioning) a set of objects so that items in the same group are more similar (closer) to each other than to items in different groups. Often the notion of similarity is expressed as a distance measure, with greater distance conveying less similarity. The study of clustering algorithms (cluster analysis) originated in the social sciences but has become important in statistical data analysis (data mining) and in machine learning.

Examples of clustering algorithms include $K$-means and self-organizing maps.
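Below is a minimal sketch of the first of these: Lloyd's algorithm for $K$-means in plain Python. It assumes points are tuples of numbers and that similarity is measured by squared Euclidean distance. The initialisation (k randomly chosen data points) and the name `kmeans` are illustrative choices, not a reference implementation.

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate an assignment step and a center-update step."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialise with k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        # Update step: move each center to the mean of the points assigned to it.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # an empty cluster keeps its old center
        if new_centers == centers:
            break  # converged: the assignments can no longer change
        centers = new_centers
    return labels, centers

labels, centers = kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 2)
print(labels, centers)
```

Each iteration cannot increase the within-cluster sum of squared distances, which is why the loop terminates, although it may stop at a local rather than a global optimum.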

322 questions

1 answer

Size of connected regions on a randomly-colored infinite chessboard

Consider an infinite chessboard where each square is colored white with probability $p$ and black with probability $1-p$. Suppose without loss of generality that the square at $(0,0)$ is white. We can consider the entire connected region $W$ of…

asked Nov 17 '13 at 19:55

MJD
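A simulation can make this setup concrete. The sketch below assumes that squares are connected when they share an edge, and it conditions on the origin being white as in the question. Since an infinite board cannot be simulated, it only explores the window $[-n, n]^2$, so the returned size is a lower bound on $|W|$. The function name `white_region_size` is hypothetical.

```python
import random
from collections import deque

def white_region_size(p, n, seed=None):
    """Number of white squares connected to (0, 0) inside the window [-n, n]^2,
    assuming edge-sharing (4-neighbour) adjacency."""
    rng = random.Random(seed)
    color = {}  # colors are sampled lazily, once per square; True means white

    def is_white(cell):
        if cell == (0, 0):
            return True  # the origin is white by assumption
        if cell not in color:
            color[cell] = rng.random() < p
        return color[cell]

    # Breadth-first search over white squares starting from the origin.
    seen = {(0, 0)}
    queue = deque([(0, 0)])
    while queue:
        x, y = queue.popleft()
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if max(abs(nb[0]), abs(nb[1])) <= n and nb not in seen and is_white(nb):
                seen.add(nb)
                queue.append(nb)
    return len(seen)

print([white_region_size(0.5, 50, seed=s) for s in range(5)])
```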

1 answer

Theoretical link between the graph diffusion/heat kernel and spectral clustering

The graph diffusion kernel of a graph is the exponential of its Laplacian $\exp(-\beta L)$ (or a similar expression depending on how you define the kernel). If you have labels on some vertices, you can get labels on the rest of the vertices by a…

linear-algebra graph-theory heat-equation spectral-graph-theory clustering

asked Nov 24 '15 at 17:14

highBandWidth
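For a small graph the kernel can be computed directly. Because the Laplacian is symmetric, writing $L = V \operatorname{diag}(\lambda_i) V^\top$ gives $\exp(-\beta L) = V \operatorname{diag}(e^{-\beta\lambda_i}) V^\top$, so the kernel has the same eigenvectors as $L$ (the vectors spectral clustering uses). The sketch assumes an undirected graph given by a symmetric adjacency matrix and the unnormalised Laplacian $L = D - A$; `heat_kernel` is a hypothetical name.

```python
import numpy as np

def heat_kernel(A, beta):
    """Diffusion kernel exp(-beta * L) with the unnormalised Laplacian L = D - A."""
    A = np.asarray(A, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    # Symmetric eigendecomposition: L = V diag(lam) V^T.
    lam, V = np.linalg.eigh(L)
    # Apply exp(-beta * x) to the eigenvalues and transform back.
    return V @ np.diag(np.exp(-beta * lam)) @ V.T

# Path graph on three vertices.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(heat_kernel(A, 0.5))
```

Since $L\mathbf{1} = 0$, every row of the kernel sums to 1, and $\beta = 0$ gives the identity.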

2 answers

Measure of "how much diagonal" a matrix is

I have a (biological) computational system that outputs square matrices. Sometimes, these matrices are diagonal-like, with higher values at and around the diagonal. I would like to have some summary measure of how "much diagonal" a matrix is, so…

matrices clustering

asked Aug 10 '15 at 22:39

lourencoj

0 answers

Balanced linear partitioning of a set of points in $\mathbb{R}^d$

Suppose we have a set of points in $\mathbb{R}^d$ and, for a given constant $\epsilon>0$, we want to find a hyperplane that divides the dataset into two balanced partitions and such that the number of points that are $\epsilon$-close to the hyperplane is…

optimization integer-programming non-convex-optimization clustering

asked Jun 14 '19 at 19:44

kvphxga
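To check a candidate hyperplane $\{x : w^\top x = b\}$ against both criteria, one can count the points on each side and the points within distance $\epsilon$ of it, using $|w^\top x - b| / \|w\|$ as the point-to-hyperplane distance. This is only an evaluation helper (the name `hyperplane_stats` is hypothetical). It does not search for the hyperplane, and points lying exactly on it are counted on neither side.

```python
import math

def hyperplane_stats(points, w, b, eps):
    """Return (points on the positive side, points on the negative side,
    points within Euclidean distance eps) for the hyperplane w.x = b."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    pos = neg = close = 0
    for x in points:
        s = sum(wi * xi for wi, xi in zip(w, x)) - b  # signed, unnormalised offset
        if abs(s) / norm <= eps:  # distance from x to the hyperplane
            close += 1
        if s > 0:
            pos += 1
        elif s < 0:
            neg += 1
    return pos, neg, close

print(hyperplane_stats([(0, 0), (2, 0), (-2, 0), (0.05, 1)], (1, 0), 0, 0.1))
```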

1 answer

Why do we use the Laplacian matrix in Spectral Clustering?

When we perform spectral clustering, given a similarity matrix $S$, we define the Laplacian matrix $L$ (normalized or unnormalized). Then, we do eigenvalue decomposition on $L$ and get its eigenvector matrix. Why do we do eigenvalue decomposition on…

laplacian spectral-graph-theory clustering graph-laplacian

asked Dec 01 '19 at 08:28

big_shw
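The pipeline described in this question (similarity matrix, Laplacian, eigendecomposition) can be sketched for the two-cluster case. The vertices are split by the sign of the eigenvector belonging to the second-smallest eigenvalue of the unnormalised Laplacian $L = D - S$, known as the Fiedler vector. The sketch assumes $S$ is symmetric and nonnegative, and `spectral_bipartition` is a hypothetical name. For $k$ clusters one would typically run $k$-means on the rows of the first $k$ eigenvectors instead.

```python
import numpy as np

def spectral_bipartition(S):
    """Two-way spectral clustering with the unnormalised Laplacian L = D - S."""
    S = np.asarray(S, dtype=float)
    L = np.diag(S.sum(axis=1)) - S
    lam, V = np.linalg.eigh(L)  # eigenvalues come back in ascending order
    fiedler = V[:, 1]           # eigenvector of the second-smallest eigenvalue
    return (fiedler > 0).astype(int)

# Two triangles joined by one weak edge (weight 0.1).
S = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    S[i, j] = S[j, i] = 1.0
S[2, 3] = S[3, 2] = 0.1
print(spectral_bipartition(S))
```

The smallest eigenvalue of $L$ is always 0 with a constant eigenvector, which carries no grouping information. That is why the split uses the next eigenvector.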

0 answers

Optimization / personalization within clusters

I have the following optimization problem: I have a (random and very noisy) objective function $f(A, P)$, where $A$ is a vector of "observable" parameters of the input and $P$ is the vector of parameters that I can control. I'd like to find $P(A)$ for every…

reference-request optimization machine-learning clustering

asked Jan 16 '14 at 02:35

jkff

2 answers

Clustering algorithm to cluster objects based on their relation weight

I have $n$ words and their relatedness weights, which give me an $n\times n$ matrix. I'm going to use this for a search algorithm, but the problem is that I need to cluster the entered keywords based on their pairwise relation. So let's say if the keywords…

algorithms clustering

asked Dec 09 '11 at 16:26

Tohid

2 answers

What is the difference between an array and a vector?

Okay, so I'm doing a little bit of vector calculus at university (mainly with neural networks and k-means clustering for cluster analysis in a 3-dimensional field or hyperplane), and from what I understand (forgive me, I'm not sure how to format…

vector-spaces clustering

asked Sep 24 '12 at 06:46

classicjonesynz

1 answer

Measure of the clusters quality in a graph

Suppose we have a graph $G=(V,E)$ with $n$ non-overlapping subgraphs, the clusters $C_1, C_2, \dots, C_n$, which cover the graph: $C_1 \cup \dots \cup C_n = G$. I'm looking for a good metric to measure the quality of these clusters. Let's call it…

graph-theory clustering

asked Oct 27 '14 at 17:51

user2024621

2 answers

How to see that K-means objective is non-convex?

I'm trying to prove that the objective of the K-means clustering algorithm is non-convex. The objective is given as $J(U,Z) = \|X-UZ\|_F^2$, with $X \in\mathbb{R}^{m\times n}$, $U\in \mathbb{R}^{m\times k}$, $Z \in \{0,1\}^{k\times n}$. $Z$ represents…

optimization convex-analysis data-analysis clustering

asked Aug 09 '13 at 08:44

user1086105
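One standard way to see the non-convexity numerically uses the symmetry under relabelling of centers. The sketch evaluates $J(U,Z)$ with the optimal $Z$ for each $U$ on a made-up 1-D dataset (the data and helper names are illustrative). Two center matrices that differ only by swapping columns both give cost 0, while their midpoint, with both centers at 5, gives cost 100, so $\min_Z J(U,Z)$ violates the convexity inequality in $U$. Separately, the constraint set $\{0,1\}^{k\times n}$ is itself not convex. This is an illustration, not a proof.

```python
import numpy as np

def kmeans_objective(X, U, Z):
    """J(U, Z) = ||X - U Z||_F^2: X is m x n (one point per column),
    U is m x k (one center per column), Z is k x n (one-hot columns)."""
    return float(np.linalg.norm(X - U @ Z, "fro") ** 2)

def best_assignment(X, U):
    """For fixed centers U, the optimal Z assigns each column of X to its nearest center."""
    d = ((X[:, None, :] - U[:, :, None]) ** 2).sum(axis=0)  # k x n squared distances
    Z = np.zeros((U.shape[1], X.shape[1]))
    Z[d.argmin(axis=0), np.arange(X.shape[1])] = 1.0
    return Z

def cost(U):
    return kmeans_objective(X, U, best_assignment(X, U))

X = np.array([[0.0, 0.0, 10.0, 10.0]])  # 1-D data, two obvious groups
U1 = np.array([[0.0, 10.0]])            # centers in one order
U2 = np.array([[10.0, 0.0]])            # the same centers, swapped
Um = (U1 + U2) / 2                      # midpoint: both centers at 5

print(cost(U1), cost(U2), cost(Um))  # prints 0.0 0.0 100.0
```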

1 answer

Mutual Information for clustering

I'm working on a document clustering application and decided to use Normalized Mutual Information as one of the measures of effectiveness. But I don't really understand how to implement this in that situation. In…

statistics data-analysis pattern-recognition clustering data-mining

asked Jul 07 '13 at 11:50

user1315305
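Here is a minimal sketch of one common definition, $\mathrm{NMI}(A,B) = I(A;B)/\sqrt{H(A)\,H(B)}$. It is computed from the contingency counts of two labelings of the same documents, for example the cluster assignments and the reference classes. Other normalisations, such as dividing by the arithmetic mean or the maximum of the two entropies, are also in use. scikit-learn's `sklearn.metrics.normalized_mutual_info_score` exposes this choice through its `average_method` parameter. The name `nmi` and the handling of single-cluster labelings are assumptions of this sketch.

```python
import math
from collections import Counter

def nmi(labels_a, labels_b):
    """Normalised mutual information I(A;B) / sqrt(H(A) H(B)) between two labelings."""
    n = len(labels_a)
    ca, cb = Counter(labels_a), Counter(labels_b)
    cab = Counter(zip(labels_a, labels_b))  # contingency table as pair counts
    # Mutual information: sum over cells of p(a,b) * log(p(a,b) / (p(a) p(b))).
    mi = sum(c / n * math.log(c * n / (ca[a] * cb[b])) for (a, b), c in cab.items())
    ha = -sum(c / n * math.log(c / n) for c in ca.values())
    hb = -sum(c / n * math.log(c / n) for c in cb.values())
    if ha == 0 or hb == 0:
        return 1.0 if ha == hb else 0.0  # assumed convention for single-cluster labelings
    return mi / math.sqrt(ha * hb)

clusters = [0, 0, 1, 1, 1]
classes = ["sport", "sport", "politics", "politics", "sport"]
print(nmi(clusters, classes))
```

NMI does not depend on the label names, only on how the documents are grouped, so renaming clusters leaves the score unchanged.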

1 answer

What are the use cases related to cluster analysis of different distance metrics?

I'm trying to use different distance metrics like Euclidean, Manhattan, cosine, and Chebyshev, among others, in my k-means algorithm to calculate distances between the data points and the centers. In what situation would one distance…

metric-spaces euclidean-geometry examples-counterexamples applications clustering

asked Jan 23 '13 at 13:52

Prateek Kulkarni
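For reference, the four distances named in the question can be written as below for two points of equal dimension. Cosine distance is taken here as $1 - \cos\theta$, a common convention that is not a metric in the strict sense. For $x=(0,0)$ and $y=(3,4)$, the Euclidean, Manhattan and Chebyshev distances are 5, 7 and 4. One design point: the usual $k$-means update (moving each center to the mean of its points) minimises squared Euclidean distance. With another distance the mean is generally not the best center; for Manhattan distance the coordinate-wise median is, which gives $k$-medians.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def chebyshev(x, y):
    # Largest coordinate-wise difference.
    return max(abs(a - b) for a, b in zip(x, y))

def cosine_distance(x, y):
    # 1 - cosine similarity; depends only on direction, not on vector length.
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return 1 - dot / (nx * ny)

x, y = (0, 0), (3, 4)
print(euclidean(x, y), manhattan(x, y), chebyshev(x, y))
```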

1 answer

What is the definition of "convex relaxation" in clustering?

I have the following text from a paper I am trying to understand, and I don't understand what the sentence below refers to as being convex/non-convex: The problem is that even though the objectives (1) and (2) are convex the constraint that K is valid…

convex-analysis convex-optimization clustering non-convex-optimization relaxations

asked Nov 30 '14 at 11:29

dariush

0 answers

Looking for an algo to "sorta" diagonalize a similarity matrix.

I've got a big fat similarity matrix. The rows and columns represent people, and the values represent some positive measure of their closeness (0 meaning no connection at all). The n-th row and n-th column correspond to the same person - thus the…

matrices algorithms clustering

asked Sep 17 '14 at 21:24

John Berryman

2 answers

Sampling with an "oversampling" factor, in K-Means||

I'm trying to understand K-Means||, a scalable version of K-Means++, which itself is an "improved" version of the clustering algorithm K-Means. Please find here the link to K-Means||…

statistics probability-theory sampling clustering

asked Aug 05 '14 at 12:13

syoussry
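As a hedged sketch of where the oversampling factor enters, here is one sampling round as K-Means|| is usually described. Given the current center set $C$ and cost $\phi_X(C) = \sum_x d^2(x, C)$, each point $x$ is kept independently with probability $\min(1, \ell\, d^2(x,C)/\phi_X(C))$, so about $\ell$ points are expected per round. The sampled points are added to $C$, and after several rounds the candidates are weighted and reclustered down to $k$ centers. Those later steps are omitted here, and the function names are hypothetical.

```python
import random

def sq_dist(x, c):
    return sum((a - b) ** 2 for a, b in zip(x, c))

def oversample_round(points, centers, ell, rng):
    """One K-Means|| sampling round: keep each point independently with
    probability min(1, ell * d^2(x, C) / phi), where phi = sum_x d^2(x, C)."""
    d2 = [min(sq_dist(x, c) for c in centers) for x in points]
    phi = sum(d2)
    if phi == 0:
        return []  # every point already coincides with a center
    return [x for x, d in zip(points, d2) if rng.random() < min(1.0, ell * d / phi)]

rng = random.Random(0)
pts = [(0, 0), (1, 0), (0, 1), (9, 9), (10, 9), (9, 10)]
centers = [rng.choice(pts)]  # start from one uniformly random point
for _ in range(3):
    centers += oversample_round(pts, centers, ell=2, rng=rng)
print(centers)
```

Because the probabilities sum to at most $\ell$, a larger oversampling factor gathers more candidate centers per round, trading extra candidates for fewer rounds.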
