
I have two sets of clusters (Cluster_set_1 and Cluster_set_2) obtained by analysing two different sets of newspaper articles.

Each cluster consists of words or phrases, as in the examples below.

C1 in Cluster_set_1: Energy, Fuel, Oil, Mining
C2 in Cluster_set_1: school, education, students, schools, million, read
...
...
etc.

C1 in Cluster_set_2: Gas, oil, pipeline
C2 in Cluster_set_2: program, business, management, information, reports
...
...
etc.

Now I want to find clusters across the two cluster sets that are similar/related, based on the words or phrases they contain, as in the example below.

Example: 
The cluster 'Energy, Fuel, Oil, Mining' in Cluster_set_1 is most similar/related to
the cluster 'Gas, oil, pipeline' in Cluster_set_2.

Reason: both include words or phrases related to energy.

Since I am dealing with two separate sets of clusters, what would be a suitable approach for connecting the clusters across the two sets?

Stephen Rauch
Volka

1 Answer


I would investigate a soft clustering approach using HDBSCAN. There is some reading on text clustering here, which includes a discussion of clustering with DBSCAN, HDBSCAN's non-hierarchical precursor.
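Whichever clustering method is used, the matching step in the question still needs some similarity measure between a cluster in one set and a cluster in the other. The following is only a minimal baseline sketch, assuming plain word overlap (Jaccard similarity) as that measure. It is not HDBSCAN and not soft clustering, and the function names (`normalize`, `jaccard`, `best_matches`) are made up for illustration. The data is the example from the question.

```python
def normalize(cluster):
    """Lowercase and strip each term so 'Oil' and 'oil' compare equal."""
    return {term.strip().lower() for term in cluster}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    a, b = normalize(a), normalize(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_matches(set_1, set_2):
    """For each cluster in set_1, find the most similar cluster in set_2."""
    matches = {}
    for name_1, terms_1 in set_1.items():
        # Score every candidate in set_2 and keep the highest-scoring one.
        name_2, score = max(
            ((n2, jaccard(terms_1, t2)) for n2, t2 in set_2.items()),
            key=lambda pair: pair[1],
        )
        matches[name_1] = (name_2, score)
    return matches


# Example clusters taken from the question.
cluster_set_1 = {
    "C1": ["Energy", "Fuel", "Oil", "Mining"],
    "C2": ["school", "education", "students", "schools", "million", "read"],
}
cluster_set_2 = {
    "C1": ["Gas", "oil", "pipeline"],
    "C2": ["program", "business", "management", "information", "reports"],
}

matches = best_matches(cluster_set_1, cluster_set_2)
for name_1, (name_2, score) in matches.items():
    print(f"Cluster_set_1 {name_1} -> Cluster_set_2 {name_2} (score {score:.2f})")
```

With this data, C1 in Cluster_set_1 pairs with C1 in Cluster_set_2 because they share "oil". A score of 0 means no shared terms, so such a pairing should be treated as "no match" rather than as a real link. Exact word overlap also cannot see that "Fuel" and "Gas" are related, which is the main limitation of this baseline.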

Matt