Problem

I am trying to use weighted Jaccard to compare two weighted sets, S and T, with weights that range from -1 to +1. For example:

            Object A    Object B    Object C
Set S       0.1         1.0         0.5
Set T       -0.5        -0.2        1.0

I am using Eq. 1 from this paper (which I found from this post):

$$\text{Weighted Jaccard Score} = \frac{\sum_k \min(S_k, T_k)}{\sum_k \max(S_k, T_k)}$$

This method requires non-negative weights.
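For reference, the formula can be sketched in plain Python as below. The function name `weighted_jaccard`, the decision to raise on negative weights, and the convention of returning 1.0 when both inputs are all zeros are assumptions made for illustration, not details taken from the paper.

```python
def weighted_jaccard(s, t):
    """Weighted Jaccard score of two equal-length sequences of non-negative weights."""
    if len(s) != len(t):
        raise ValueError("s and t must have the same length")
    if any(w < 0 for w in s) or any(w < 0 for w in t):
        # With negative weights the ratio can leave [0, 1], so refuse them here.
        raise ValueError("weights must be non-negative")
    numerator = sum(min(a, b) for a, b in zip(s, t))
    denominator = sum(max(a, b) for a, b in zip(s, t))
    # Assumed convention: two all-zero inputs count as identical.
    return numerator / denominator if denominator else 1.0


print(weighted_jaccard([0.1, 0.2], [0.2, 0.1]))
```

Calling it on the example sets S and T raises `ValueError`. Without the check, the numerator would be -0.5 - 0.2 + 0.5 = -0.2 and the score would be negative, which is why the signed weights need some pre-processing.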

Question

Can anyone point me towards weighted-Jaccard-style approaches that handle both negative and positive weights? The weights are correlations, so for the purpose of defining similarity I would treat two similar negative weights the same way as two similar positive weights. I have considered a couple of options, but they may be flawed (I am not a statistician):

1) Centring the data around 1:

$$\text{Weighted Jaccard Score} = \frac{\sum_k \min(1 - S_k, 1 - T_k)}{\sum_k \max(1 - S_k, 1 - T_k)}$$

I am not massively keen on this, as I think it will make differences seem smaller. E.g. if we put the array [[0.1, 0.2], [0.2, 0.1]] into this, the weighted Jaccard is 0.2/0.4 = 0.5, but the centred weighted Jaccard is 1.6/1.8 = 0.89. I guess I could divide this by 2 though. It will also make negative weights larger than positive weights after the transformation, but I don't think this is problematic, as the difference seems to be the most important thing.
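A minimal sketch of option 1, to make the compression effect easy to reproduce. The name `shifted_weighted_jaccard` and the zero-denominator convention are assumptions for illustration only.

```python
def shifted_weighted_jaccard(s, t):
    """Weighted Jaccard applied to 1 - w, which maps weights in [-1, 1] to [0, 2]."""
    s_shifted = [1 - w for w in s]
    t_shifted = [1 - w for w in t]
    numerator = sum(min(a, b) for a, b in zip(s_shifted, t_shifted))
    denominator = sum(max(a, b) for a, b in zip(s_shifted, t_shifted))
    # Assumed convention: if every weight is exactly 1, treat the sets as identical.
    return numerator / denominator if denominator else 1.0


# The toy arrays from the text: the transformation compresses the difference.
print(round(shifted_weighted_jaccard([0.1, 0.2], [0.2, 0.1]), 2))  # → 0.89
# The example sets S and T from the problem statement.
print(shifted_weighted_jaccard([0.1, 1.0, 0.5], [-0.5, -0.2, 1.0]))
```

Because min(1 - a, 1 - b) = 1 - max(a, b), the transformed score for n objects equals (n - Σ max) / (n - Σ min) of the original weights. For the toy arrays that is (2 - 0.4) / (2 - 0.2) = 1.6/1.8, which shows where the pull towards 1 comes from.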

2) Processing the arrays before using them in weighted Jaccard:

Process the input arrays to get non-negative arrays, as described in the Python pseudocode below, then feed these arrays into the standard weighted Jaccard. This is what I'm going with right now, as it requires minimal changes to the established method and my weights are fairly sign-independent as long as both weights have the same sign.

def norm_arrays(array1, array2):
    """Map two signed weight arrays to non-negative arrays for weighted Jaccard."""
    norm1 = []
    norm2 = []
    for i1, i2 in zip(array1, array2):
        if i1 > 0 and i2 > 0:
            # both weights positive - keep them as they are
            norm1.append(i1)
            norm2.append(i2)
        elif i1 < 0 and i2 < 0:
            # both weights negative - take absolute values
            norm1.append(abs(i1))
            norm2.append(abs(i2))
        else:
            # opposite signs (or a zero weight) - min is 0, max is the absolute difference
            norm1.append(0)
            norm2.append(abs(i1 - i2))
    return norm1, norm2
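
Option 2 can also be written as a single pass that accumulates the numerator and denominator directly, which should give the same score as `norm_arrays` followed by the standard weighted Jaccard. This is only a sketch: the name `sign_aware_weighted_jaccard` and the zero-denominator convention are assumptions, not part of any established method.

```python
def sign_aware_weighted_jaccard(s, t):
    """One-pass equivalent of norm_arrays followed by standard weighted Jaccard."""
    numerator = 0.0
    denominator = 0.0
    for a, b in zip(s, t):
        if (a > 0 and b > 0) or (a < 0 and b < 0):
            # Same sign: compare magnitudes as usual.
            numerator += min(abs(a), abs(b))
            denominator += max(abs(a), abs(b))
        else:
            # Opposite signs or a zero weight: no overlap, full gap in the denominator.
            denominator += abs(a - b)
    # Assumed convention: all-zero inputs count as identical.
    return numerator / denominator if denominator else 1.0


S = [0.1, 1.0, 0.5]
T = [-0.5, -0.2, 1.0]
print(sign_aware_weighted_jaccard(S, T))
# Two similar negative weights score the same as their positive counterparts.
print(sign_aware_weighted_jaccard([-0.1, -0.2], [-0.2, -0.1]))
```

For the example sets only Object C shares a sign, so the score is 0.5 / (0.6 + 1.2 + 1.0) = 0.5/2.8 ≈ 0.18. Note that an opposite-sign pair can add up to 2 to the denominator, whereas a same-sign pair adds at most 1, so sign disagreements are penalised more heavily than magnitude differences.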