My labels are binary vectors of length 5, e.g., [0, 0, 1, 1, 1].
My label set is heavily imbalanced, on the order of 50-to-1: the all-zero case [0, 0, 0, 0, 0] is very common, while all other combinations are rare. I'd like to up-weight the uncommon combinations using the class_weight argument of model.fit() in Keras, but I get an error message saying that class_weight must contain all classes in the data, and that the classes {0, 1, 2, 3, 4} exist in the data but not in class_weight.
Based on this, I suspect Keras expects each label to be one of [1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 1, 0, 0], etc., i.e., a single 1 per vector, and that I'm supposed to supply the weights like this:
weight_classes = {
    0: 50.0,  # for [1, 0, 0, 0, 0]
    1: 1.0,   # for [0, 1, 0, 0, 0]
    # etc.
}
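If that is the expected format, I assume the dict would be passed along these lines (model, X_train, y_train, and the epoch count are placeholders):

model.fit(X_train, y_train, epochs=10, class_weight=weight_classes)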
Is there instead a way to tell Keras to weight whole label combinations, in this fashion:
weight_classes = {
    (0, 0, 0, 0, 0): 1.0,
    (0, 0, 1, 1, 1): 50.0,
    (0, 0, 0, 0, 1): 50.0,
    # etc.
}
Or is there some other way to just say that the all-zero case is less important?
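One workaround I can imagine is computing per-sample weights from the labels and passing them through the sample_weight argument of model.fit(); the sketch below is just my own guess at how that would look (the function name, the weight values, and X_train/y_train/model are all placeholders):

import numpy as np

def make_sample_weights(y, zero_weight=1.0, other_weight=50.0):
    # Down-weight rows whose label vector is all zeros.
    is_all_zero = np.all(y == 0, axis=1)
    return np.where(is_all_zero, zero_weight, other_weight).astype("float32")

# sample_weights = make_sample_weights(y_train)
# model.fit(X_train, y_train, epochs=10, sample_weight=sample_weights)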
I've considered just transforming the label set into a one-hot-encoded representation, where each distinct combination becomes its own class. But I'd rather not: the label space would become very large, and there are interdependencies between the labels that I think are better represented by the current encoding. The labels are interdependent somewhat like words in a sentence, but the individual predictions are still valuable on their own.
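Just to make that alternative concrete, this is roughly the transformation I mean (assuming length-5 labels, so 2**5 = 32 combination classes; to_categorical is the standard Keras utility):

import numpy as np
from keras.utils import to_categorical

powers = 2 ** np.arange(5)

def to_combination_class(y):
    # Map each 5-bit label vector to one of 32 combination indices,
    # then one-hot encode it, e.g. [0, 0, 1, 1, 1] -> index 28.
    indices = y @ powers
    return to_categorical(indices, num_classes=32)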
I have also considered just sticking with sub-/supersampling and skipping class weighting altogether.
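For completeness, the undersampling variant I have in mind is something like the following (X_train and y_train are placeholders, and matching the counts exactly is an arbitrary choice):

import numpy as np

rng = np.random.default_rng(0)
all_zero = np.all(y_train == 0, axis=1)
zero_idx = np.flatnonzero(all_zero)
other_idx = np.flatnonzero(~all_zero)

# Keep all rare rows plus an equal number of randomly chosen all-zero rows.
kept_zero = rng.choice(zero_idx, size=len(other_idx), replace=False)
keep = np.concatenate([other_idx, kept_zero])
X_balanced, y_balanced = X_train[keep], y_train[keep]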