
The scikit-learn docs say that the value returned by decision_function is the signed distance of that sample to the hyperplane.

I've taken the sum of the products of the feature values and their corresponding weights and added the intercept to that sum, but this does not return the value given by the decision_function method.

When doing classification tasks, how is the signed distance different from the linear combination of the weights, feature values, and intercept?

For example, if I have the following feature values, weights, and intercept, I would calculate the distance from the point to the hyperplane as...

$x_1 = 3, x_2 = 2$

$w_1 = -0.28, w_2 = -0.20$

$b = -0.68$

$$\theta^T \cdot \textbf{X} + b$$
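
which for these numbers works out to

$$(-0.28)(3) + (-0.20)(2) + (-0.68) = -0.84 - 0.40 - 0.68 = -1.92$$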

When sklearn calculates the decision_function it returns

return safe_sparse_dot(X, self.coef_.T, dense_output=True) + self.intercept_

Where self is our fitted classifier and X is the example for which we are calculating the decision function.

How can I replicate what decision_function is doing?

berrypy

1 Answer


Both approaches seem to do the same thing:

safe_sparse_dot(X, self.coef_.T, dense_output=True) + self.intercept_ 

computes the dot product of the coefficients (you called these $w_1, w_2$ and combined them into $\theta$) and the point ($x_1, x_2$, combined into $\mathbf{X}$), and then adds the intercept $b$. So the two formulas do exactly the same thing, once in mathematical and once in Python notation: they compute the (signed) distance to the separating hyperplane, often called the margin.
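
For instance, here is a minimal sketch (using LogisticRegression on made-up data, not your actual model) that checks the manual linear formula against decision_function:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # made-up two-feature data, only for illustration
    X = np.array([[3.0, 2.0], [1.0, 4.0], [5.0, 1.0], [0.5, 0.5]])
    y = np.array([1, 0, 1, 0])

    clf = LogisticRegression().fit(X, y)

    # manual computation: theta^T . X + b for every sample
    manual = X @ clf.coef_.T + clf.intercept_   # shape (n_samples, 1)

    # sklearn's decision_function does the same dot product internally
    print(np.allclose(manual.ravel(), clf.decision_function(X)))  # True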

You also wanted to know

how it works in the case of SGDClassifier

As SGDClassifier also fits a linear classifier, the computation is the same (even though the fitted hyperplane will have different parameters), as the sketch below shows.
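
A minimal sketch of the same check with SGDClassifier (again on made-up data):

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    # made-up data again, only for illustration
    X = np.array([[3.0, 2.0], [1.0, 4.0], [5.0, 1.0], [0.5, 0.5]])
    y = np.array([1, 0, 1, 0])

    sgd = SGDClassifier(random_state=0).fit(X, y)

    # same linear formula, only the fitted coef_ / intercept_ differ
    manual = X @ sgd.coef_.T + sgd.intercept_
    print(np.allclose(manual.ravel(), sgd.decision_function(X)))  # True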

serv-inc