Questions tagged [clustering]

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, and information retrieval.

Cluster analysis is the task of grouping objects into subsets (called clusters) so that observations in the same cluster are similar in some sense, while observations in different clusters are dissimilar.

In machine learning and data mining, clustering is a method of unsupervised learning used to discover hidden structure in unlabeled data, and is commonly used in exploratory data analysis. Popular algorithms include k-means, expectation maximization (EM), spectral clustering, correlation clustering and hierarchical clustering.
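
As a rough illustration of the first algorithm named above, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. The function name `kmeans`, its parameters, and the toy `points` list are made up for this example and do not come from any library; a real project would more likely use an existing implementation.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Minimal Lloyd's algorithm on a list of equal-length numeric tuples.

    Hypothetical helper for illustration only (not a library API).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as initial centroids
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if updated == centroids:  # nothing moved, so the assignment is stable
            break
        centroids = updated
    return centroids, clusters


# Toy, made-up data: two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))
```

No labels are supplied anywhere: the grouping comes only from the distances between points, which is what makes this unsupervised.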

Related topics: classification, pattern-recognition, knowledge discovery, taxonomy. Not to be confused with cluster computing.

1381 questions

202 votes

13 answers

K-Means clustering for mixed numeric and categorical data

My data set contains a number of numeric attributes and one categorical. Say, NumericAttr1, NumericAttr2, ..., NumericAttrN, CategoricalAttr, where CategoricalAttr takes one of three possible values: CategoricalAttrValue1, CategoricalAttrValue2 or…

data-mining clustering octave k-means categorical-data

asked May 14 '14 at 05:58

IgorS

9 answers

Clustering geo location coordinates (lat,long pairs)

What is the right approach and clustering algorithm for geolocation clustering? I'm using the following code to cluster geolocation coordinates: import numpy as np import matplotlib.pyplot as plt from scipy.cluster.vq import kmeans2,…

machine-learning python clustering k-means geospatial

asked Jul 17 '14 at 09:50

rokpoto.com

6 answers

Calculating KL Divergence in Python

I am rather new to this and can't say I have a complete understanding of the theoretical concepts behind this. I am trying to calculate the KL Divergence between several lists of points in Python. I am using this to try and do this. The problem that…

python clustering scikit-learn

asked Dec 08 '15 at 10:37

Nanda
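
For readers landing on this question, the quantity itself is short to write down. The sketch below assumes the "lists of points" have already been turned into discrete probability distributions over the same bins (that conversion is not shown and depends on the data); the function name `kl_divergence` is made up for this example.

```python
import math


def kl_divergence(p, q):
    """KL(P || Q) = sum_i p_i * log(p_i / q_i), in nats.

    Assumes p and q are discrete distributions over the same bins,
    each summing to 1. Illustrative helper, not a library function.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # 0 * log(0 / q) is taken as 0 by convention
        if qi == 0:
            return math.inf  # P puts mass where Q has none
        total += pi * math.log(pi / qi)
    return total


p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q), kl_divergence(q, p))
```

The two printed values differ because KL divergence is not symmetric, so it is not a distance metric in the usual sense.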

5 answers

Is it necessary to standardize your data before clustering?

Is it necessary to standardize your data before clustering? In the scikit-learn example about DBSCAN, they do this in the line: X = StandardScaler().fit_transform(X) But I do not understand why it is necessary. After all, clustering does…

python clustering anomaly-detection

asked Aug 06 '15 at 20:58

makansij
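
A small sketch of why scaling can matter for distance-based clustering. The data is invented (an income-like column and an age-like column), and `standardize` is a hypothetical helper that applies a z-score with the population standard deviation, which is the same idea as the `StandardScaler` call quoted in the question.

```python
import math
from statistics import mean, pstdev


def standardize(rows):
    """Z-score every column: (x - column mean) / column population std."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [tuple((x - m) / s for x, m, s in zip(r, mus, sigmas)) for r in rows]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Made-up rows: (income, age). The income scale is far larger than the age scale.
rows = [(30000.0, 25.0), (30500.0, 60.0), (60000.0, 26.0)]
scaled = standardize(rows)

# Raw distances are driven almost entirely by income, so the 35-year age gap
# between rows 0 and 1 hardly registers.
print(euclidean(rows[0], rows[1]), euclidean(rows[0], rows[2]))
# After scaling, both columns contribute on a comparable footing.
print(euclidean(scaled[0], scaled[1]), euclidean(scaled[0], scaled[2]))
```

Whether to standardize still depends on the data: if the features already share meaningful units, rescaling can remove information rather than add it.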

4 answers

When to use cosine similarity over Euclidean similarity

In NLP, people tend to use cosine similarity to measure document/text distances. I want to hear what people think of the following two scenarios, and which to pick: cosine similarity or Euclidean? Overview of the task set: The task is to compute…

machine-learning nlp clustering similarity

asked Feb 12 '18 at 13:31

Logan
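
To make the contrast concrete, here is a minimal sketch with invented term-count vectors. Both function names are made up for this example. The second vector is the first one repeated ten times over, as with a document that says the same thing at ten times the length.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (assumes neither is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Made-up term counts: same word proportions, very different lengths.
doc = (1.0, 2.0, 0.0)
doc_x10 = (10.0, 20.0, 0.0)

print(cosine_similarity(doc, doc_x10))   # direction only, ignores magnitude
print(euclidean_distance(doc, doc_x10))  # grows with the length difference
```

Cosine similarity treats the two documents as pointing in the same direction, while Euclidean distance treats them as far apart. That is one reason cosine is often preferred when document length should not matter.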

8 answers

Best practical algorithm for sentence similarity

I have two sentences, S1 and S2, both of which have a word count (usually) below 15. What are the most practically useful and successful (machine learning) algorithms, which are possibly easy to implement (neural network is ok, unless the architecture…

nlp clustering word2vec similarity

asked Nov 23 '17 at 14:40

DaveTheAl

1 answer

What is the best Keras model for multi-class classification?

I am working on research where I need to classify one of three events, WINNER=(win, draw, lose). WINNER LEAGUE HOME AWAY MATCH_HOME MATCH_DRAW MATCH_AWAY MATCH_U2_50 MATCH_O2_50 3 13 550 571 1.86 3.34 …

python neural-network classification clustering keras

asked Feb 01 '16 at 15:18

SpanishBoy

1 answer

Word2Vec vs. Sentence2Vec vs. Doc2Vec

I recently came across the terms Word2Vec, Sentence2Vec and Doc2Vec and am kind of confused, as I am new to vector semantics. Can someone please elaborate on the differences between these methods in simple words? What are the most suitable tasks for each…

machine-learning data-mining clustering nlp unsupervised-learning

asked Jun 30 '17 at 07:05

Smith

2 answers

How to deal with time series which change in seasonality or other patterns?

Background I'm working on a time series data set of energy meter readings. The length of the series varies by meter - for some I have several years, others only a few months, etc. Many display significant seasonality, and often multiple layers -…

data-mining clustering time-series beginner

asked Dec 22 '14 at 03:30

Jo Douglass

3 answers

K-means incoherent behaviour choosing K with Elbow method, BIC, variance explained and silhouette

I'm trying to cluster some vectors with 90 features with K-means. Since this algorithm asks me for the number of clusters, I want to validate my choice with some nice math. I expect to have from 8 to 10 clusters. The features are Z-score scaled. Elbow…

clustering k-means

asked Jul 20 '15 at 08:03

marcodena

5 answers

Clustering based on similarity scores

Assume that we have a set of elements E and a similarity (not distance) function sim(ei, ej) between two elements ei,ej ∈ E. How could we (efficiently) cluster the elements of E, using sim? k-means, for example, requires a given k, Canopy…

clustering algorithms similarity

asked May 16 '14 at 14:26

vefthym

4 answers

K-means: What are some good ways to choose an efficient set of initial centroids?

When a random initialization of centroids is used, different runs of K-means produce different total SSEs, and the initialization is crucial to the performance of the algorithm. What are some effective approaches toward solving this problem? Recent approaches are…

data-mining clustering k-means

asked Apr 30 '15 at 13:42

ngub05
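
A sketch of the effect described in this question, plus one simple and common remedy: run K-means from several random initializations and keep the run with the lowest SSE. This is only the restart idea, not a smarter seeding scheme such as k-means++. All names (`kmeans_sse`, `best_of_restarts`) and the data are invented for the example.

```python
import random


def sq_dist(p, c):
    return sum((a - b) ** 2 for a, b in zip(p, c))


def kmeans_sse(points, k, seed, iters=100):
    """Run Lloyd's algorithm from one random initialization; return (SSE, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # the random initialization under study
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        updated = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    # Total SSE: squared distance from each point to its nearest final centroid.
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return sse, centroids


def best_of_restarts(points, k, n_restarts=10):
    """Try seeds 0..n_restarts-1 and keep the run with the lowest SSE."""
    runs = [kmeans_sse(points, k, seed) for seed in range(n_restarts)]
    best = min(runs, key=lambda run: run[0])
    return best, [sse for sse, _ in runs]


# Made-up 1-D data with three obvious groups.
points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,), (20.0,), (21.0,), (22.0,)]
(best_sse, best_centroids), all_sses = best_of_restarts(points, k=3)
print(sorted(all_sses))
print(best_sse, sorted(best_centroids))
```

The SSE values across seeds may or may not differ on a given dataset. Restarts only guarantee that the kept run is at least as good as any single run that was tried.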

2 answers

K-means vs. online K-means

K-means is a well-known algorithm for clustering, but there is also an online variation of the algorithm (online K-means). What are the pros and cons of these approaches, and when should each be preferred?

clustering algorithms k-means

asked Jun 18 '14 at 19:48

Rubens

1 answer

Algorithms for text clustering

I have a problem of clustering a huge number of sentences into groups by their meanings. This is similar to a problem where you have lots of sentences and want to group them by their meanings. What algorithms are suggested to do this? I don't know…

clustering text-mining algorithms scikit-learn

asked Aug 15 '14 at 13:10

Andrey Rubliov

2 answers

Clustering unique visitors by useragent, ip, session_id

Given website access data in the form session_id, ip, user_agent, and optionally timestamp, following the conditions below, how would you best cluster the sessions into unique visitors? session_id: is an id given to every new visitor. It does not…

clustering

asked May 15 '14 at 09:04

AdrianBR
