I have a dataset whose target variable takes 4 distinct values. I applied ordinal encoding to it, which worked for me, but my question is: can I also solve this problem if I apply one-hot encoding to the target instead? That would generate 4 new columns from the single target variable.

| classes | classes_a | classes_b | classes_c | classes_d |
|---------|-----------|-----------|-----------|-----------|
| a       | 1         | 0         | 0         | 0         |
| b       | 0         | 1         | 0         | 0         |
| c       | 0         | 0         | 1         | 0         |
| d       | 0         | 0         | 0         | 1         |

Now I have these 4 columns: classes_a, classes_b, classes_c, and classes_d. How do I handle them as the target when training the model?

Shayan Shafiq
Adnan Khan

1 Answer

As pointed out in the comments, the actual question is:

Would it still be possible to train the KNN model if you one-hot encoded the response data?

The answer is yes:

In case you have one target (one column) with four classes, you have a multiclass setting.

In case you have four targets (four columns) with binary class (1, 0), you have a multilabel setting.
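To make the distinction concrete for KNN, here is a minimal sketch using scikit-learn. The toy data (four well-separated clusters labelled a to d), the choice of `n_neighbors=3`, and all variable names are assumptions made for illustration, not part of the original question. It fits `KNeighborsClassifier` once on the single label column (multiclass) and once on the four indicator columns (multilabel).

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelBinarizer

# Toy data (made up for illustration): 2 features, 4 classes "a".."d",
# 10 points scattered around each of 4 well-separated centers.
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [5, 0], [0, 5], [5, 5]], dtype=float)
labels = np.array(["a", "b", "c", "d"])
X = np.vstack([c + 0.3 * rng.standard_normal((10, 2)) for c in centers])
y = np.repeat(labels, 10)

# Multiclass setting: one target column holding four classes.
knn_multiclass = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# Multilabel setting: four binary indicator columns (one-hot target).
binarizer = LabelBinarizer()
Y_onehot = binarizer.fit_transform(y)  # shape (40, 4)
knn_multilabel = KNeighborsClassifier(n_neighbors=3).fit(X, Y_onehot)

# Predict for one new point near each center.
X_new = centers + 0.1
pred_multiclass = knn_multiclass.predict(X_new)    # 1-D array of labels
pred_onehot = knn_multilabel.predict(X_new)        # 2-D indicator array
pred_decoded = binarizer.inverse_transform(pred_onehot)
```

One caveat with the multilabel variant: each indicator column is predicted independently, so on less cleanly separated data a predicted row is not guaranteed to contain exactly one 1.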

See sklearn's overview of different approaches.

With Keras, you can use the functional API to model a multi-label (multi-output) case with a neural network. The model would look like this:

    # Model
    ...

    # Outputs
    out1 = Dense(1)(x)
    out2 = Dense(1)(x)

    # Compile/fit the model
    model = Model(inputs=Input_1, outputs=[out1, out2])
    model.compile(optimizer=..., loss=...)

    # Add actual data here in the fit statement
    model.fit(train_data, [train_targets, train_targets2], epochs=..., batch_size=..., validation_split=0.2)

Here is a regression example of the functional API, which can be easily changed to classification.

However, the intuitive way to solve a problem like yours is simply to do multiclass classification. I don't see a benefit in rearranging the target as one-hot columns.
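If the target has already been one-hot encoded, it can be collapsed back into a single multiclass column. The following is a small sketch with NumPy; the array contents mirror the table in the question, and the names `class_names`, `onehot`, and `y` are hypothetical.

```python
import numpy as np

# Column order of the one-hot target: classes_a, classes_b, classes_c, classes_d.
class_names = np.array(["a", "b", "c", "d"])

# One-hot encoded target, one row per sample (same rows as the table above).
onehot = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
])

# Sanity check: a valid one-hot row has exactly one active column.
assert (onehot.sum(axis=1) == 1).all()

# The index of the active column selects the class label for each row.
y = class_names[onehot.argmax(axis=1)]
```

The resulting `y` is a single label column that can be passed to any multiclass classifier.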

Peter