
I've got a big fat similarity matrix. The rows and columns represent people, and the values represent some positive measure of their closeness (0 meaning no connection at all). The n-th row and n-th column correspond to the same person, so the matrix is square.

I'm looking for an algorithm to find some permutation of rows/columns (the same permutation applied to both) such that the resulting matrix has as much "mass" aligned towards the diagonal as possible. The goal is to find a matrix in which the average closeness of neighbors (neighboring rows/columns) is maximized.
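
One way to make the objective concrete is sketched below. It reads "average closeness of neighbors" as the mean of the entries just off the diagonal after permuting, which is an assumption; other readings (for example, weighting every entry by its distance from the diagonal) are equally plausible. The function name `neighbor_score` and the toy matrix are made up for illustration.

```python
def neighbor_score(sim, order):
    """Mean similarity between consecutive people in `order`.

    `sim` is a square matrix (list of lists) and `order` is a permutation
    of range(len(sim)). This is only one possible reading of the objective.
    """
    if len(order) < 2:
        return 0.0
    total = sum(sim[a][b] for a, b in zip(order, order[1:]))
    return total / (len(order) - 1)


# Hypothetical 4-person example: persons 0 and 1 are close, as are 2 and 3.
sim = [
    [0, 5, 0, 1],
    [5, 0, 1, 0],
    [0, 1, 0, 4],
    [1, 0, 4, 0],
]

good = neighbor_score(sim, [0, 1, 2, 3])  # keeps each pair adjacent
bad = neighbor_score(sim, [0, 2, 1, 3])   # interleaves the two pairs
print(good, bad)
```

Any search procedure (exact or heuristic) could then be compared by the score it reaches on the same matrix.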

The ultimate goal is to use this as a sort of clustering algorithm.

  • Look on Google for "Extremal_optimization". Your problem can be optimized by this algorithm, but extremal optimization is not deterministic, so it doesn't always lead to the best possible solution. – Frank Sep 17 '14 at 21:38
  • I'm working on a stochastic algorithm for your problem, since it interested me. Are you interested in approximate but fast methods, or only in an exact solver? – Frank Sep 17 '14 at 21:45
  • Nice question! Presumably the matrix is symmetric as well? More specifically, the entry in position $i,j$ is the same as that in $j,i$? – Stephen Sep 17 '14 at 22:23
  • Also, is there any sort of "triangle inequality" relating the numbers appearing in positions like $(1,2)$ and $(2,3)$ with that appearing in position $(1,3)$? In real world applications there probably will be such relations. – Stephen Sep 17 '14 at 22:25
  • @FrancescoAlem. An exact solution is better, but if this ends up being an NP-hard problem, then I'll settle for an algorithm that gets me close. – John Berryman Sep 18 '14 at 02:13
  • @Stephen - so in other words, "can the similarity be used as a metric, so that the space of 'people' is a metric space?" I've been wondering that myself. The matrix is symmetric for now, though I would like to loosen that for the case where A follows B but B does not follow A. – John Berryman Sep 18 '14 at 02:15
  • There is a caveat that shows my question is not quite ideally posed: what if cliques A, B, and C are all equally "close" to one another? In that case you can still find a matrix that is optimal according to the specification above, but it wouldn't be useful for identifying groups, because I suspect two of these groups would be interleaved so that the optimization criterion is met. I guess it depends on which criterion you use. In any case, the above definition assumes that group members can basically be placed on a line. – John Berryman Sep 18 '14 at 02:22
  • @JohnBerryman, got it :) I managed to formalize it on paper, and now I'm making a prototype in C++ to see how it performs in a live scenario! If it turns out well, I'll write an answer, hoping you like it. – Frank Sep 18 '14 at 12:06
  • Ok! Thanks, @FrancescoAlem. – John Berryman Sep 19 '14 at 19:40
  • Unfortunately it breaks down once it reaches 1500x1500 dense matrices, and you probably need to use much, much bigger ones... Also, I tried to implement Extremal Optimization, and it doesn't work very well, so I've settled for a greedy algorithm, but I'm not sure of the quality of the permutation that it gives... So if you want, I can give you a slow and 'not sure to actually work' algorithm... – Frank Sep 20 '14 at 13:55
  • Ha. Thanks for trying, @FrancescoAlem. I've been thinking about it, and the problem might generalize to a Minimal Spanning Tree problem. As stated here, rearranging rows in a matrix implies building a minimal graph in which each node has one or two neighbors. An MST imposes a hierarchy on the members, which is more fitting since I'm hoping to group them. – John Berryman Sep 20 '14 at 19:39
  • That's good news, there are lots of fast algorithms to tackle MSTs! If something comes up, please let me know! This problem is fascinating! – Frank Sep 20 '14 at 23:06
  • What do you know; I'm working on a similar question and have stumbled upon this. Have you made any progress since 2014? I started with swapping random rows/columns to increase the "neighbor similarity score" but this ends up stuck when several strong local groups form. Also, there is no particular benefit to 1-dimensional structure here, maybe we can try 2-dimensional neighbor structures to make the result more flexible. Looks a bit like Self-Organizing Maps to me! – PA6OTA Dec 27 '18 at 01:56
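
The spanning-tree idea raised in the comments can be sketched roughly as follows. Because the matrix holds similarities rather than distances, this builds a *maximum* spanning tree with Prim's algorithm (equivalent to a minimum spanning tree on negated weights), then cuts the weakest tree edges to get groups, which amounts to single-linkage clustering. It is not the method anyone in the thread actually implemented; the function names, the toy matrix, and the choice of `k` are all assumptions for illustration, and the matrix is assumed symmetric.

```python
def max_spanning_tree(sim):
    """Prim's algorithm on a dense symmetric similarity matrix, O(n^2).

    Returns a list of (u, v, similarity) tree edges.
    """
    n = len(sim)
    in_tree = [False] * n
    best = [float("-inf")] * n  # strongest known link from the tree to each node
    parent = [-1] * n
    best[0] = 0.0               # start growing the tree from node 0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree with the strongest link into it.
        u = max((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, sim[parent[u]][u]))
        for v in range(n):
            if not in_tree[v] and sim[u][v] > best[v]:
                best[v] = sim[u][v]
                parent[v] = u
    return edges


def clusters_from_tree(n, edges, k):
    """Drop the k-1 weakest tree edges and return the connected groups."""
    keep = sorted(edges, key=lambda e: e[2], reverse=True)[: max(n - k, 0)]
    label = list(range(n))      # union-find parent pointers

    def find(x):
        while label[x] != x:
            label[x] = label[label[x]]  # path halving
            x = label[x]
        return x

    for a, b, _ in keep:
        label[find(a)] = find(b)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical 4-person example: persons 0 and 1 are close, as are 2 and 3.
sim = [
    [0, 5, 0, 1],
    [5, 0, 1, 0],
    [0, 1, 0, 4],
    [1, 0, 4, 0],
]
tree = max_spanning_tree(sim)
groups = clusters_from_tree(len(sim), tree, k=2)
print(tree, groups)
```

Unlike a row/column ordering, the tree does not force people onto a line, so three mutually close cliques need not be interleaved. The dense O(n^2) loop is only meant for small matrices; a sparse graph library would be the natural choice at scale.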

0 Answers