Clustering customer data stored in ElasticSearch

Question

I have a bunch of customer profiles stored in an Elasticsearch cluster. These profiles are currently used to create target groups for our email subscriptions.

Target groups are currently formed manually using Elasticsearch's faceted search capabilities (for example, get all male customers aged 23 with one car and three children).

How could I search for interesting groups automatically - using data science, machine learning, clustering or something else?

The R programming language seems to be a good tool for this task, but I can't work out a methodology for this kind of group search. One solution is to somehow find the largest clusters of customers and use them as target groups, so the question is:

How can I automatically choose the largest clusters of similar customers (similar by parameters that I don't know at this moment)?

For example: my program will connect to Elasticsearch and export the customer data to CSV, and an R script will then find that a large portion of customers are male with no children and that another large portion have a car and brown eyes.

score 6 · Accepted Answer · edited Oct 21 '22 at 03:12

One algorithm that can be used for this is the k-means clustering algorithm.

Basically:

Randomly choose k datapoints from your set, $m_1$, ..., $m_k$.
Until convergence:
1. Assign your data points to k clusters, where cluster i is the set of points for which $m_i$ is the closest of your current means.
2. Replace each $m_i$ by the mean of all points assigned to cluster i.
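
The steps above can be sketched in a few lines of code. This is only a minimal illustration in plain Python (not R), and it assumes each customer has already been encoded as a numeric vector, for example by turning categorical fields into 0/1 columns. The names `squared_distance`, `kmeans`, `max_iter`, `seed`, and the toy `customers` data are made up for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length numeric vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Run k-means once from a random start; returns (means, labels)."""
    rng = random.Random(seed)
    # Initialisation: pick k distinct data points as the starting means.
    means = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Step 1: assign every point to the index of its closest mean.
        new_labels = [
            min(range(k), key=lambda i: squared_distance(p, means[i]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Step 2: replace each mean by the average of its assigned points.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # an empty cluster keeps its previous mean
                means[i] = [sum(col) / len(members) for col in zip(*members)]
    return means, labels


# Hypothetical toy data: two well-separated groups of 2-D points.
customers = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
]
means, labels = kmeans(customers, k=2)
```

Here `labels[j]` is the cluster index of `customers[j]`, so counting how often each index occurs gives the cluster sizes, which is what you would use to pick the largest groups.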

It is good practice to repeat this algorithm several times from different random starting points, then choose the outcome that minimizes the total distance between the points of each cluster i and its center $m_i$.
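
One common way to make "minimizes distances" precise is the within-cluster sum of squared distances. Writing $C_i$ for the set of points assigned to cluster i (this notation is an assumption, not part of the answer above), the quantity compared across runs would be:

$$J = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - m_i \rVert^2$$

The run with the smallest $J$ is the one to keep.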

Of course, you have to choose k before you start; you can use cross-validation to select this parameter, though.
