Highest Voted 'clustering' Questions - Computer Science Stack Exchange

30 votes, 4 answers

What exactly is the difference between supervised and unsupervised learning?

I am trying to understand clustering methods. What I think I have understood: in supervised learning, the categories/labels that data is assigned to are known before computation. So, the labels, classes or categories are being used in order to "learn" the…

machine-learning data-mining clustering

asked Jul 25 '12 at 12:22 by Prot
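To make the distinction concrete, here is a small pure-Python sketch on hypothetical 2-D data. `fit_supervised` is handed the labels and simply computes one centroid per class (a nearest-centroid classifier), while `fit_unsupervised` runs plain k-means (Lloyd's algorithm) and has to discover the groups with no labels at all. Both function names are illustrative, not from any library.

```python
import math
import random

def mean(points):
    # Component-wise average of a non-empty list of equal-length tuples.
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def nearest(point, centers):
    # Index of the center closest to `point` in Euclidean distance.
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def fit_supervised(points, labels):
    # Supervised: the labels are given, so each class centroid is computed directly.
    classes = sorted(set(labels))
    return {c: mean([p for p, l in zip(points, labels) if l == c]) for c in classes}

def fit_unsupervised(points, k, iters=20, seed=0):
    # Unsupervised (Lloyd's k-means): no labels, groups are found from the data alone.
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Keep the previous center if a group ends up empty.
        centers = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
    return centers
```

On well-separated data both can end up with the same centroids; the difference is that k-means found the groups itself, and its cluster indices carry no predefined meaning.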

8 votes, 1 answer

Under what conditions is K-means clustering transformation-invariant?

Given a set of data points $X = \{x_1, x_2, \ldots, x_m\}$ where $x_i \in \mathbb{R}^d$ we run K-means on $X$ and obtain the clusters $c_1, c_2, \ldots, c_k$. Now, if we create a new dataset $Y = \{y_1, y_2, \ldots, y_m\}$ where $y_i = Ax_i + b$ and…

algorithms clustering

asked Oct 19 '16 at 16:13 by Ana Echavarria
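For Euclidean k-means, one case where invariance does hold is when $A$ is a rotation (an orthogonal matrix) times a uniform scale: every distance is multiplied by the same factor, so nearest-center comparisons are unchanged, and the mean of the transformed points is the transformed mean. Lloyd's iterations started from correspondingly transformed initial centers therefore produce the same labels. The sketch below checks this numerically on a small hypothetical 2-D dataset; `lloyd` and `similarity` are illustrative names. A general, non-uniform $A$ can change which center is nearest, so no such guarantee applies there.

```python
import math

def lloyd(points, centers, iters=20):
    # Plain Lloyd iterations from the given initial centers; returns final labels.
    centers = [tuple(c) for c in centers]
    for _ in range(iters):
        labels = [min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
                  for p in points]
        new_centers = []
        for i in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == i]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[i])  # keep an empty cluster's center
        centers = new_centers
    return labels

def similarity(p, theta, s, b):
    # y = s * R(theta) x + b: rotation by theta, uniform scale s, translation b.
    x, y = p
    c, t = math.cos(theta), math.sin(theta)
    return (s * (c * x - t * y) + b[0], s * (t * x + c * y) + b[1])
```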

6 votes, 1 answer

How to compare/cluster millions of strings?

I have around 1,000,000 strings of variable length (from 200 to 50,000) that can contain 5 characters (A, B, C, D, E). What I actually want is to cluster them together if they are similar enough. By similar enough I mean they have an edit distance…

algorithms reference-request strings clustering

asked Jul 10 '17 at 12:23 by Ivan
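A minimal sketch of the pairwise check, assuming plain Levenshtein distance (unit-cost insertions, deletions and substitutions) and a hypothetical threshold `max_dist`. It uses the standard two-row dynamic program and stops early once every entry of a row exceeds the threshold, since row minima never decrease. For two strings of 50,000 characters a single comparison can still be about 2.5 × 10^9 cell updates, and comparing all pairs of 1,000,000 strings is roughly 5 × 10^11 calls, so this is a reference check rather than a scalable solution.

```python
def within_edit_distance(a, b, max_dist):
    # Levenshtein distance via two DP rows; returns it if <= max_dist, else None.
    if abs(len(a) - len(b)) > max_dist:
        return None  # the length difference alone already exceeds the threshold
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, 1):
            cur[j] = min(prev[j] + 1,               # delete ca
                         cur[j - 1] + 1,            # insert cb
                         prev[j - 1] + (ca != cb))  # substitute or match
        if min(cur) > max_dist:
            return None  # row minima are non-decreasing, so stop early
        prev = cur
    return prev[-1] if prev[-1] <= max_dist else None
```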

5 votes, 1 answer

How is the (local) clustering coefficient defined for vertices with degree 1?

We want to compute the clustering coefficient $C$ for an undirected graph $G = (V, E)$. The clustering coefficient $C$ for a graph $G$ is the average over all local clustering coefficients $C_i$, where $C_i$ is the local clustering coefficient of…

graphs clustering

asked Oct 11 '16 at 13:39 by confusedstudent
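For a vertex $i$ with degree $k_i$ and $e_i$ edges among its neighbours, $C_i = 2e_i / (k_i(k_i - 1))$, which is $0/0$ when $k_i < 2$. The value for such vertices is therefore a convention: a common one, which NetworkX's `clustering` function follows, is to set it to 0, and another is to leave those vertices out of the average. This sketch makes the choice explicit; the adjacency-set representation and the function names are assumptions for illustration.

```python
from itertools import combinations

def local_clustering(adj, v):
    # adj: dict mapping vertex -> set of neighbours (undirected, no self-loops).
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return None  # 2*e/(k*(k-1)) is 0/0 here, so any value is a convention
    links = sum(1 for u, w in combinations(nbrs, 2) if w in adj[u])
    return 2 * links / (k * (k - 1))

def average_clustering(adj, low_degree="zero"):
    # low_degree="zero": count degree<2 vertices as 0; "skip": exclude them.
    values = []
    for v in adj:
        c = local_clustering(adj, v)
        if c is None:
            if low_degree == "skip":
                continue
            c = 0.0
        values.append(c)
    return sum(values) / len(values) if values else 0.0
```

On a triangle with one pendant vertex attached, counting the pendant vertex as 0 gives an average of 7/12, while skipping it gives 7/9, so the convention visibly changes $C$.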

5 votes, 1 answer

k-means clustered data: how to label newly incoming data

I have a data set with labels that were produced by a $k$-means clustering algorithm. Now there is some data (with the same data structure) from another source, and I wonder what the most sensible way is to label this new, as yet unseen data. I was…

machine-learning classification clustering svm

asked Mar 05 '19 at 14:11 by Uli Niklas
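One straightforward option, sketched below under the assumption that the centroids from the original run are still available, is to reuse the k-means assignment rule itself: give each new point the label of its nearest centroid. `assign_to_centroids` is a hypothetical helper name. Any scaling or other preprocessing applied before the original clustering would have to be applied to the new data in the same way.

```python
import math

def assign_to_centroids(centroids, new_points):
    # Label each new point with the index of its nearest centroid, the same
    # rule k-means uses in its own assignment step.
    return [min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in new_points]
```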

4 votes, 0 answers

Finding the "most modular" subset of graph vertices, i.e. one that minimizes disagreement inside and outside

Let $G = (V, E)$ be a graph. I want to find the subset of vertices of $G$ that minimizes a certain modularity cost. In our setting, the modularity cost of a subset $X$ is defined as the number of non-edges within $X$ plus the number of edges from…

algorithms graphs np-hard clustering

asked Sep 17 '18 at 19:48 by Manuel Lafond

4 votes, 2 answers

Creating Best Clusters of Objects Based on Distance Between Them

I have an array of images and a function that calculates the distance between two images. I wish to cluster the images based on this distance, so that each cluster contains images that are all at a short distance from each other. So only the…

clustering

asked Jul 29 '18 at 14:12 by meaning-matters
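A simple greedy sketch of the requirement that every pair of images in a cluster be close, assuming a hypothetical cutoff `threshold` and any symmetric `distance` function (the question's image distance would be plugged in there). It is order-dependent and does not guarantee the fewest clusters; it only guarantees that every cluster it builds satisfies the pairwise condition.

```python
def greedy_threshold_clusters(items, distance, threshold):
    # Put each item into the first cluster whose members are ALL within
    # `threshold` of it; if no such cluster exists, open a new one.
    clusters = []
    for x in items:
        for c in clusters:
            if all(distance(x, y) <= threshold for y in c):
                c.append(x)
                break
        else:
            clusters.append([x])
    return clusters
```

With numbers standing in for images and absolute difference as the distance, `greedy_threshold_clusters([1, 2, 3, 10, 11, 30], lambda a, b: abs(a - b), 2)` groups `[1, 2, 3]`, `[10, 11]` and `[30]`.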

4 votes, 0 answers

Persistent Homology vs Clustering Methods

How do persistent homology and clustering methods for data point clouds differ? I'm specifically interested in the application to gene expression data of cancer patients, but any example works. I understand that a hierarchical clustering method…

clustering

asked Jul 08 '16 at 14:10 by Emil_Longshore

4 votes, 2 answers

Reduce k-means to Integer Programming

The k-means problem amounts to computing $\underset{\mathbf{S}}{\operatorname{argmin}} \sum_{i=1}^{k} \sum_{\mathbf{x}_j \in \mathbf{S}_i} \lVert \mathbf{x}_j - \boldsymbol{\mu}_i \rVert^2$ for some observations…

complexity-theory reductions integer-programming clustering

asked Sep 03 '14 at 20:09 by user13675
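As a reference point for any integer-programming formulation, the sketch below evaluates the standard k-means objective (minimize the sum of squared Euclidean distances from each point to its cluster mean) for a given assignment, and finds the exact optimum by enumerating all $k^n$ assignments. The function names are illustrative, and the enumeration is only feasible for a handful of points; its use is checking a formulation on tiny instances.

```python
import itertools
import math

def kmeans_cost(points, assignment, k):
    # Standard k-means objective for a fixed assignment: sum of squared
    # Euclidean distances from each point to the mean of its cluster.
    cost = 0.0
    for i in range(k):
        members = [p for p, a in zip(points, assignment) if a == i]
        if not members:
            continue  # an empty cluster contributes nothing
        mu = tuple(sum(c) / len(members) for c in zip(*members))
        cost += sum(math.dist(p, mu) ** 2 for p in members)
    return cost

def brute_force_kmeans(points, k):
    # Exact argmin by enumerating all k**n assignments; tiny inputs only.
    return min(itertools.product(range(k), repeat=len(points)),
               key=lambda a: kmeans_cost(points, a, k))
```

For the four points $(0,0), (0,2), (10,0), (10,2)$ and $k = 2$, the optimum pairs the two left points and the two right points, with cost 4.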

4 votes, 0 answers

How to find a dynamic programming solution that minimizes the sum of the diameters of two clusters?

I asked a question at this link, where I suggested a greedy algorithm for this problem: suppose we are given $2n$ points in the plane and want to partition them into two clusters $C_1$, $C_2$ such that each cluster contains exactly $n$ points and we…

algorithms dynamic-programming computational-geometry clustering

asked Mar 05 '22 at 10:33 by All
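This is not the dynamic program the question asks for, but an exhaustive baseline for the same objective (two clusters of exactly $n$ points each, minimizing the sum of their Euclidean diameters) can be useful for testing a proposed DP or greedy algorithm on small inputs. It tries all $\binom{2n}{n}$ choices of $C_1$; the function names are illustrative.

```python
import itertools
import math

def diameter(pts):
    # Largest pairwise Euclidean distance (0 for fewer than two points).
    return max((math.dist(p, q) for p, q in itertools.combinations(pts, 2)),
               default=0.0)

def best_balanced_split(points):
    # Exhaustive search over every choice of n of the 2n points for C_1.
    n = len(points) // 2
    best = None
    for idx in itertools.combinations(range(len(points)), n):
        chosen = set(idx)
        c1 = [points[i] for i in idx]
        c2 = [points[i] for i in range(len(points)) if i not in chosen]
        cost = diameter(c1) + diameter(c2)
        if best is None or cost < best[0]:
            best = (cost, c1, c2)
    return best
```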

4 votes, 0 answers

K-means, but normalized and with max

Given points $x_1, \ldots, x_n$ in the Euclidean space and $K \in \mathbb N$, I'm interested in the following objective. Partition the points into $K$ clusters $C_1, \ldots, C_K$ so that: $$\max_{i \in [K]} \frac{1}{|C_i|}\sum_{j \in C_i} \|x_j -…

approximation clustering

asked Dec 20 '21 at 19:58 by Dmitry

4 votes, 1 answer

How to group intervals which overlap by some amount?

I have an algorithm that generates a list of intervals. The algorithm is run $m$ times. Let's denote the intervals as tuples $(s_1, e_1), (s_2, e_2), \ldots, (s_n, e_n)$. It is possible to add the run ID to the tuple (though I don't think it helps). The goal is to…

clustering intervals

asked Aug 05 '20 at 15:18 by mibm
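The excerpt is cut off before stating the exact goal, so the sketch below assumes one possible rule: sort the $(s, e)$ intervals by start and let an interval join the current group if it overlaps that group's span by at least a hypothetical `min_overlap`. Other rules (for example, requiring sufficient overlap with every member) would produce different groups.

```python
def group_by_overlap(intervals, min_overlap):
    # Sweep intervals sorted by start. Because starts are sorted, the overlap
    # of (s, e) with the current group's span is min(e, span_end) - s.
    groups = []
    span_end = None
    for s, e in sorted(intervals):
        if groups and min(e, span_end) - s >= min_overlap:
            groups[-1].append((s, e))
            span_end = max(span_end, e)
        else:
            groups.append([(s, e)])
            span_end = e
    return groups
```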

4 votes, 1 answer

How to calculate the minimum number of groups when combining groups that have limited capacity?

I need to group cars (and their passengers) with other cars, and I don't know how to approach this problem. For example, suppose I have 3 cars: car A with 7 seats and 2 passengers (3/7 because of the driver), car B at 2/2, and car C at 1/3. The most wasteful…

graphs trees network-flow clustering group-theory

asked Dec 06 '18 at 19:54 by Ricardo Jesus

3 votes, 1 answer

(DROP) Data Reduction Algorithm: how does it work?

I am studying a PhD framework whose purpose is to reduce the dataset to the most representative samples for training a classifier. Maybe I am missing something, but I could not understand a specific part. Basically, this is Algorithm 1…

graphs machine-learning reinforcement-learning clustering

asked Aug 20 '18 at 06:22 by rej

3 votes, 0 answers

What is the definition of a "Clustering Feature" in BIRCH algorithm?

The paper for BIRCH (a clustering algorithm) defines a Clustering Feature (CF), but the notation is unclear (cf. PDF page 3 / section 4). A cluster contains $N$ $d$-dimensional entries $\{\vec{X}_1, \vec{X}_2, \dots, \vec{X}_N\}$…

algorithms data-mining clustering

asked Jun 11 '18 at 12:58 by c11o
2
