Given similarity scores of datasets, find dataset clusters?

Question

I have data in the format [dataset_A, dataset_B, similarity], where each row records the similarity score obtained by comparing two different datasets. A score has been generated for every unique pair of datasets (~7e6 pairs).

I am trying to determine how to cluster these datasets using just the similarity scores.

Something like K-means can't be used directly, because it expects a feature vector for each item, whereas all I have is a similarity score for each compared pair.

Is there an existing method that can be used for this?

Edit: reworded for clarity

Edit 2: removed references to labels, since that was causing confusion

score 0 · Accepted Answer · answered Oct 11 '22 at 07:38


Thanks to the clues from the comments and this Stack Exchange post outlining graph clustering and modularity scores, I was able to find a suitable method. I went with the NetworkX implementation of the Louvain communities algorithm.
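For anyone landing here later, a minimal sketch of how this can look, not necessarily the exact code I ran. It assumes each dataset becomes a graph node and each similarity score becomes an edge weight. The toy `pairs` list, the `cluster_by_similarity` helper, and the `min_similarity` cutoff are all hypothetical and only there for illustration; `louvain_communities` itself requires a reasonably recent NetworkX release.

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities

# Hypothetical toy data: (dataset_a, dataset_b, similarity) for every unique pair.
pairs = [
    ("A", "B", 0.90), ("A", "C", 0.80), ("B", "C", 0.85),
    ("D", "E", 0.90), ("D", "F", 0.80), ("E", "F", 0.95),
    ("A", "D", 0.05), ("A", "E", 0.10), ("A", "F", 0.05),
    ("B", "D", 0.10), ("B", "E", 0.05), ("B", "F", 0.10),
    ("C", "D", 0.05), ("C", "E", 0.10), ("C", "F", 0.05),
]


def cluster_by_similarity(pairs, min_similarity=0.0, resolution=1.0, seed=0):
    """Build a weighted graph from pairwise scores and run Louvain on it.

    min_similarity is an assumed, optional cutoff: pairs scoring at or
    below it get no edge, which keeps the graph sparse.
    """
    graph = nx.Graph()
    for a, b, similarity in pairs:
        # Add both nodes explicitly so datasets with no strong edge still appear.
        graph.add_nodes_from((a, b))
        if similarity > min_similarity:
            graph.add_edge(a, b, weight=similarity)
    # Returns a list of sets, one set of node names per community.
    return louvain_communities(
        graph, weight="weight", resolution=resolution, seed=seed
    )


communities = cluster_by_similarity(pairs, min_similarity=0.5)
for index, members in enumerate(communities):
    print(index, sorted(members))
```

Raising `resolution` tends to give more, smaller communities, and fixing `seed` makes runs repeatable. Whether a cutoff like `min_similarity` helps depends on how the scores are distributed, so treat it as something to experiment with.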


Mandias

