
I'm studying the perceptron algorithm. I know that we can use the weights w as the coefficients of the hyperplane that separates the vectors to be classified.

On every web page I've read for a detailed explanation, it is generally said that the perceptron algorithm will find the optimal weights w. But optimal in what sense?

Green Falcon
Poiera

2 Answers


Media's explanation is true for regression problems. These are problems where you predict a continuous target variable.

Your image shows a classification problem. Here, the target variable takes only two values (typically -1 and 1 in the Perceptron algorithm). In that case, an optimal solution $w^*$ is a vector of weights that perfectly separates the two classes. If such a solution exists, the Perceptron algorithm is guaranteed to find one. But: if there is one optimal solution, there are usually infinitely many others. You can easily see this in your image: you can shift the line a little to the left or to the right, or rotate it slightly, and it still perfectly separates the classes.

So while the Perceptron algorithm will find an optimal solution if there is one, you cannot know in advance which one it will find. That depends on the random starting parameters and on the order in which the training points are presented.
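Here is a minimal sketch (my own, not from the original question) of the classic perceptron update rule on toy 2D data. The data, variable names, and number of epochs are illustrative assumptions; the point is that different random initialisations all reach zero training error but typically end on different weight vectors, i.e. different separating hyperplanes.

```python
import numpy as np

def perceptron(X, y, w_init, max_epochs=100):
    """X: (n, d) inputs with a bias column appended, y: labels in {-1, +1}."""
    w = w_init.astype(float).copy()
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ w) <= 0:   # misclassified (or on the boundary)
                w += yi * xi         # perceptron update
                mistakes += 1
        if mistakes == 0:            # converged: every point is separated
            break
    return w

rng = np.random.default_rng(0)
# Two linearly separable blobs, with a constant bias feature appended.
X = np.vstack([rng.normal([2, 2], 0.5, (20, 2)),
               rng.normal([-2, -2], 0.5, (20, 2))])
X = np.hstack([X, np.ones((40, 1))])
y = np.array([1] * 20 + [-1] * 20)

for seed in range(3):
    w = perceptron(X, y, np.random.default_rng(seed).normal(size=3))
    acc = np.mean(np.sign(X @ w) == y)
    print(f"init seed {seed}: w = {np.round(w, 2)}, training accuracy = {acc}")
# All runs reach accuracy 1.0, but the weight vectors (hence hyperplanes) differ.
```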

This is different for, e.g., support vector machines: there, either no separating solution exists at all, or there is exactly one optimal (maximum-margin) solution.
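As a hedged illustration of that contrast (scikit-learn is my assumption here, not something the answer mentions), a linear SVM with a very large C approximates the hard-margin problem, and its fitted hyperplane does not depend on the order in which the training points are presented:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([2, 2], 0.5, (20, 2)),
               rng.normal([-2, -2], 0.5, (20, 2))])
y = np.array([1] * 20 + [-1] * 20)

for seed in range(3):
    order = np.random.default_rng(seed).permutation(len(y))
    svm = SVC(kernel="linear", C=1e6).fit(X[order], y[order])  # large C ~ hard margin
    print(f"order seed {seed}: w = {np.round(svm.coef_[0], 3)}, "
          f"b = {np.round(svm.intercept_[0], 3)}")
# The coefficients should agree (up to numerical tolerance) across runs:
# the maximum-margin hyperplane is unique.
```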

Elias Schoof

With a perceptron, you specify a cost function: Mean Squared Error for regression tasks, or perhaps Cross Entropy for classification tasks. The input data are constants, and the weights are the parameters of your learning problem. Whenever the model makes errors, the cost is non-zero, and you use an algorithm like gradient descent to decrease it. This is an optimization problem in which you try to reduce the value of the error.

When we say the Perceptron finds the optimal point, the reason is that the cost function, e.g. MSE, is convex: there is just one optimal point, where the gradient is zero and the cost takes its least possible value. If you use neural networks, the cost with respect to the parameters (the weights) is not convex, and you usually cannot find the optimal point.
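A minimal sketch (my own, not from the answer) of what "convex cost, one optimum" means in practice: gradient descent on the MSE of a linear model lands on the same minimiser regardless of where it starts. The data, step size, and iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.hstack([rng.normal(size=(50, 2)), np.ones((50, 1))])  # bias column
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=50)

def mse_grad(w):
    # Gradient of the mean squared error of the linear model X @ w.
    return 2.0 / len(y) * X.T @ (X @ w - y)

for seed in range(3):
    w = np.random.default_rng(seed).normal(size=3)  # different starting points
    for _ in range(2000):
        w -= 0.1 * mse_grad(w)                      # plain gradient descent step
    print(f"start seed {seed}: w = {np.round(w, 3)}")
# All runs converge to (essentially) the same weight vector: the unique minimiser
# of a convex cost. A non-convex cost (e.g. a multi-layer network) gives no such guarantee.
```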

I suggest looking here and here to understand more about the optimality of neural nets.

Green Falcon