Map similar clusters across two different cluster sets

Question

I have two sets of clusters (Cluster_set_1 and Cluster_set_2) obtained by analysing two different sets of newspaper articles.

Each cluster contains words or phrases, as in the examples below.

C1 in Cluster_set_1: Energy, Fuel, Oil, Mining
C2 in Cluster_set_1: school, education, students, schools, million, read
...
...
etc.

C1 in Cluster_set_2: Gas, oil, pipeline
C2 in Cluster_set_2: program, business, management, information, reports
...
...
etc.

Now I want to find clusters in the two cluster sets that are similar or related, based on the words and phrases they contain, as in the example below.

Example:
The cluster 'Energy, Fuel, Oil, Mining' in Cluster_set_1 is most similar/related to
the cluster 'Gas, oil, pipeline' in Cluster_set_2.

Reason: both clusters include words/phrases related to energy.

Since I am dealing with two separate sets of clusters, what would be a suitable approach for connecting the clusters across the two sets?

score 3 · Answer 1 · answered Dec 19 '17 at 14:07

I would investigate a soft clustering approach using HDBSCAN. Some reading on text clustering here, which includes a discussion of clustering with DBSCAN, HDBSCAN's non-hierarchical precursor.
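
Before setting up HDBSCAN, a quick lexical baseline can show what the cross-set mapping should look like. The sketch below is not HDBSCAN and not a soft clustering method. It is an assumed baseline that pairs each cluster in one set with the cluster in the other set that has the highest Jaccard word overlap, using the example clusters from the question. The function names (`normalize`, `jaccard`, `best_matches`) are hypothetical.

```python
def normalize(cluster):
    """Lowercase and strip each term so 'Oil' and 'oil' compare equal."""
    return {term.strip().lower() for term in cluster}


def jaccard(a, b):
    """Jaccard similarity: |intersection| / |union| of the two term sets."""
    a, b = normalize(a), normalize(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_matches(set_1, set_2):
    """Map each cluster name in set_1 to (best cluster name in set_2, score).

    A cluster with no overlapping term at all is mapped to (None, 0.0).
    """
    matches = {}
    for name_1, terms_1 in set_1.items():
        scored = [(jaccard(terms_1, terms_2), name_2)
                  for name_2, terms_2 in set_2.items()]
        score, name_2 = max(scored)
        matches[name_1] = (name_2, score) if score > 0 else (None, 0.0)
    return matches


# Example clusters copied from the question.
cluster_set_1 = {
    "C1": ["Energy", "Fuel", "Oil", "Mining"],
    "C2": ["school", "education", "students", "schools", "million", "read"],
}
cluster_set_2 = {
    "C1": ["Gas", "oil", "pipeline"],
    "C2": ["program", "business", "management", "information", "reports"],
}

matches = best_matches(cluster_set_1, cluster_set_2)
for name_1, (name_2, score) in matches.items():
    print(name_1, "->", name_2, round(score, 3))
```

This baseline links the two energy clusters only because they literally share "oil". It cannot see that "Energy" and "Gas" are related, so clusters with no shared word stay unmatched. Capturing that kind of relatedness would need word embeddings, which is where clustering the pooled word vectors with a soft method such as HDBSCAN comes in.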

Matt
