Questions tagged [machine-learning]

How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?

From The Discipline of Machine Learning by Tom Mitchell:

The field of Machine Learning seeks to answer the question "How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?" This question covers a broad range of learning tasks, such as how to design autonomous mobile robots that learn to navigate from their own experience, how to data mine historical medical records to learn which future patients will respond best to which treatments, and how to build search engines that automatically customize to their user's interests. To be more precise, we say that a machine learns with respect to a particular task T, performance metric P, and type of experience E, if the system reliably improves its performance P at task T, following experience E. Depending on how we specify T, P, and E, the learning task might also be called by names such as data mining, autonomous discovery, database updating, programming by example, etc.

3382 questions
199
votes
1 answer

Derivative of Softmax loss function

I am trying to wrap my head around back-propagation in a neural network with a Softmax classifier, which uses the Softmax function: \begin{equation} p_j = \frac{e^{o_j}}{\sum_k e^{o_k}} \end{equation} This is used in a loss function of the…
Moos Hueting
  • 2,187
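The identity being asked about, $\partial p_i/\partial o_j = p_i(\delta_{ij} - p_j)$, is easy to sanity-check numerically. A minimal Python sketch (the inputs are made up for illustration):

```python
import math

def softmax(o):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(o)
    exps = [math.exp(v - m) for v in o]
    s = sum(exps)
    return [e / s for e in exps]

def softmax_jacobian(o):
    """Analytic derivative: dp_i/do_j = p_i * (delta_ij - p_j)."""
    p = softmax(o)
    n = len(o)
    return [[p[i] * ((1.0 if i == j else 0.0) - p[j])
             for j in range(n)] for i in range(n)]

# Check the analytic Jacobian against a central finite difference.
o = [1.0, 2.0, 3.0]
J = softmax_jacobian(o)
eps = 1e-6
for j in range(3):
    plus = softmax([v + (eps if k == j else 0.0) for k, v in enumerate(o)])
    minus = softmax([v - (eps if k == j else 0.0) for k, v in enumerate(o)])
    for i in range(3):
        numeric = (plus[i] - minus[i]) / (2 * eps)
        assert abs(numeric - J[i][j]) < 1e-6
```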
140
votes
11 answers

What is the difference between regression and classification?

What is the difference between regression and classification, when we try to generate output for a training data set $x$?
Bober02
  • 2,604
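One concrete way to see the distinction: regression predicts a continuous value, classification a discrete label, even from the same inputs. A toy Python sketch (all data invented for illustration):

```python
# Toy 1-D data: x = hours studied, scores = exam score, passed = pass/fail.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 58.0, 61.0, 70.0, 74.0]   # continuous target -> regression
passed = [0, 0, 1, 1, 1]                  # discrete label -> classification

# Least-squares line for the continuous target.
n = len(xs)
mx, my = sum(xs) / n, sum(scores) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, scores))
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx

def predict_score(x):
    """Regression: output is a real number."""
    return intercept + slope * x

def predict_pass(x):
    """Classification: output is one of a finite set of labels."""
    return 1 if x >= 2.5 else 0
```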
127
votes
8 answers

Derivative of the cost function for logistic regression

I am going over the lectures on Machine Learning at Coursera. I am struggling with the following. How can the partial derivative of $$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{i}\log(h_\theta(x^{i}))+(1-y^{i})\log(1-h_\theta(x^{i}))\right]$$ where…
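The derivative asked about works out to $\frac{\partial J}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{i}) - y^{i})\,x_j^{i}$, which can be verified against a finite difference. A minimal Python sketch with invented data:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(theta, X, y):
    """J(theta) = -(1/m) * sum[ y*log(h) + (1-y)*log(1-h) ]."""
    m = len(y)
    total = 0.0
    for xi, yi in zip(X, y):
        h = sigmoid(sum(t * x for t, x in zip(theta, xi)))
        total += yi * math.log(h) + (1 - yi) * math.log(1 - h)
    return -total / m

def gradient(theta, X, y):
    """Analytic result: dJ/dtheta_j = (1/m) * sum (h - y) * x_j."""
    m = len(y)
    g = [0.0] * len(theta)
    for xi, yi in zip(X, y):
        h = sigmoid(sum(t * x for t, x in zip(theta, xi)))
        for j, xj in enumerate(xi):
            g[j] += (h - yi) * xj / m
    return g

# Finite-difference check on made-up data.
X = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0]]
y = [0, 1, 1]
theta = [0.1, -0.3]
eps = 1e-6
g = gradient(theta, X, y)
for j in range(2):
    tp = theta[:]; tp[j] += eps
    tm = theta[:]; tm[j] -= eps
    numeric = (cost(tp, X, y) - cost(tm, X, y)) / (2 * eps)
    assert abs(numeric - g[j]) < 1e-6
```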
68
votes
4 answers

Why do we consider the log likelihood instead of the likelihood for a Gaussian distribution?

I am reading about the Gaussian distribution in a machine learning book. It states that we shall determine values for the unknown parameters $\mu$ and $\sigma^2$ in the Gaussian by maximizing the likelihood function. In practice, it is more convenient…
Kaidul Islam
  • 783
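The practical half of the answer is numerical: a product of many small density values underflows in floating point, while the sum of their logs does not, and since $\log$ is monotonic both have the same maximiser. A small Python illustration:

```python
import math

# 1000 i.i.d. samples, each with a modest density value (invented numbers).
densities = [0.05] * 1000

# Direct product of likelihoods underflows to exactly 0.0 in float64,
# since 0.05**1000 is about 1e-1301, far below the smallest subnormal.
product = 1.0
for d in densities:
    product *= d

# Summing logs keeps the same maximiser and stays representable.
log_likelihood = sum(math.log(d) for d in densities)

assert product == 0.0                      # information lost
assert -3000 < log_likelihood < -2990      # about -2995.7, perfectly usable
```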
60
votes
5 answers

Why divide by $2m$

I'm taking a machine learning course. The professor has a model for linear regression, where $h_\theta$ is the hypothesis (the proposed model: linear regression, in this case), $J(\theta_1)$ is the cost function, and $m$ is the number of elements in the…
Daniel
  • 1,355
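The short answer is that the $\frac{1}{2}$ is cosmetic: when the squared term is differentiated, the power rule's factor of 2 cancels it, leaving a clean $\frac{1}{m}$ in the gradient. A minimal Python check with invented data:

```python
def cost(theta, xs, ys):
    """Squared-error cost with the 1/(2m) convention, for h(x) = theta * x."""
    m = len(xs)
    return sum((theta * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

def grad(theta, xs, ys):
    """d/dtheta of (1/(2m)) * sum (theta*x - y)^2: the 2 from the power
    rule cancels the 1/2, leaving (1/m) * sum (theta*x - y) * x."""
    m = len(xs)
    return sum((theta * x - y) * x for x, y in zip(xs, ys)) / m

# Finite-difference check that the analytic gradient is right.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.1, 5.9]
eps = 1e-6
numeric = (cost(1.5 + eps, xs, ys) - cost(1.5 - eps, xs, ys)) / (2 * eps)
assert abs(numeric - grad(1.5, xs, ys)) < 1e-6
```

Dividing by $2m$ instead of $m$ scales the cost by a constant, which changes neither the minimiser nor gradient descent (the scaling folds into the learning rate).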
45
votes
2 answers

How are logistic loss and cross-entropy related?

I found that Kullback-Leibler loss, log-loss, and cross-entropy are all the same loss function. Is the logistic-loss function used in logistic regression equivalent to the cross-entropy function? If yes, can anybody explain how they are related? Thanks
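They are the same function up to label convention: with labels $t \in \{0,1\}$ the cross-entropy of $p=\sigma(z)$ equals the logistic loss $\log(1+e^{-yz})$ with $y \in \{-1,+1\}$. A quick numerical confirmation in Python:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy(t, z):
    """Binary cross-entropy with t in {0, 1} and p = sigmoid(z)."""
    p = sigmoid(z)
    return -(t * math.log(p) + (1 - t) * math.log(1 - p))

def logistic_loss(y, z):
    """Logistic loss with labels y in {-1, +1}."""
    return math.log(1.0 + math.exp(-y * z))

# The two losses agree once the label conventions are aligned:
# t = 1 corresponds to y = +1, t = 0 to y = -1.
for z in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    assert abs(cross_entropy(1, z) - logistic_loss(+1, z)) < 1e-9
    assert abs(cross_entropy(0, z) - logistic_loss(-1, z)) < 1e-9
```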
36
votes
3 answers

Mathematical preparation for postgraduate studies in Linguistics

I am an undergraduate student in Mathematics and I would like to continue my postgraduate studies in the harder, more mathematical aspects of Linguistics. Exactly what that would include is unknown even to me, but possible areas of interest would…
35
votes
3 answers

Mathematical introduction to machine learning

At first glance, this is once again a reference request for "How to start machine learning". However, my mathematical background is relatively strong and I am looking for an introduction to machine learning using mathematics and actually proving…
32
votes
2 answers

Invert the softmax function

Is it possible to invert the softmax function in order to obtain the original values $x_i$? $$S_i=\frac{e^{x_i}}{\sum_j e^{x_j}} $$ In case of 3 input variables this problem boils down to finding $a$, $b$, $c$ given $x$, $y$ and…
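Softmax is not invertible in the strict sense (it is shift-invariant: adding a constant to every $x_i$ leaves the output unchanged), but taking logs recovers the inputs up to that one additive constant. A short Python sketch:

```python
import math

def softmax(x):
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    s = sum(exps)
    return [e / s for e in exps]

def log_inverse(p):
    """log(p_i) = x_i - log(sum_j e^{x_j}), so taking logs recovers the
    inputs up to one shared additive constant."""
    return [math.log(pi) for pi in p]

x = [0.5, 1.5, 3.0]
p = softmax(x)
recovered = log_inverse(p)

# Pairwise differences x_i - x_0 are recovered; absolute values are not.
for i in range(3):
    assert abs((recovered[i] - recovered[0]) - (x[i] - x[0])) < 1e-9
```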
31
votes
7 answers

What are the best books to study Neural Networks from a purely mathematical perspective?

I am looking for a book that goes through the mathematical aspects of neural networks, from simple forward passage of multilayer perceptron in matrix form or differentiation of activation functions, to back propagation in CNN or RNN (to mention some…
26
votes
1 answer

Derivative of the log of the softmax function

Could someone explain how that derivative was arrived at? According to me, the derivative of $\log(\text{softmax})$ is $$ \nabla\log(\text{softmax}) = \begin{cases} 1-\text{softmax}, & \text{if } i=j \\ -\text{softmax}, & \text{if } i \neq j \end{cases} $$
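The correct statement carries the indices: $\partial \log p_i / \partial o_j = \delta_{ij} - p_j$ (note it is $p_j$, not $p_i$, in the off-diagonal case). A Python finite-difference check:

```python
import math

def log_softmax(o):
    """log p_i = o_i - logsumexp(o), computed stably."""
    m = max(o)
    lse = m + math.log(sum(math.exp(v - m) for v in o))
    return [v - lse for v in o]

def log_softmax_jacobian(o):
    """d log p_i / d o_j = delta_ij - p_j (p_j, not p_i)."""
    m = max(o)
    exps = [math.exp(v - m) for v in o]
    s = sum(exps)
    p = [e / s for e in exps]
    n = len(o)
    return [[(1.0 if i == j else 0.0) - p[j] for j in range(n)]
            for i in range(n)]

# Central finite-difference check on made-up logits.
o = [0.2, -1.0, 0.7]
J = log_softmax_jacobian(o)
eps = 1e-6
for j in range(3):
    op = o[:]; op[j] += eps
    om = o[:]; om[j] -= eps
    lp, lm = log_softmax(op), log_softmax(om)
    for i in range(3):
        numeric = (lp[i] - lm[i]) / (2 * eps)
        assert abs(numeric - J[i][j]) < 1e-5
```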
21
votes
3 answers

Logistic regression - Prove That the Cost Function Is Convex

I'm reading about Hole House (HoleHouse) - Stanford Machine Learning Notes - Logistic Regression. You can do a find on "convex" to see the part that relates to my question. Background: $h_\theta(X) = \text{sigmoid}(\theta^T X)$ --- hypothesis/prediction…
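One route to the proof: the per-example loss, viewed as a function of $z=\theta^T x$, has second derivative $\sigma(z)(1-\sigma(z)) \ge 0$, and composing a convex function with a linear map preserves convexity. A numerical illustration in Python:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(z, y):
    """Per-example logistic cost as a function of z = theta^T x."""
    p = sigmoid(z)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# f''(z) = sigmoid(z) * (1 - sigmoid(z)) >= 0 for every z, so the loss is
# convex in z; the sum over examples of convex functions stays convex.
eps = 1e-4
for y in (0, 1):
    for z in [-4.0, -1.0, 0.0, 2.0, 5.0]:
        second = (loss(z + eps, y) - 2 * loss(z, y) + loss(z - eps, y)) / eps**2
        analytic = sigmoid(z) * (1 - sigmoid(z))
        assert second >= 0.0                    # never concave anywhere
        assert abs(second - analytic) < 1e-4    # matches the closed form
```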
21
votes
4 answers

Deriving the normal distance from the origin to the decision surface

While studying discriminant functions for linear classification, I encountered the following: .. if $\textbf{x}$ is a point on the decision surface, then $y(\textbf{x}) = 0$, and so the normal distance from the origin to the decision surface is…
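The standard result: for a surface $y(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + w_0 = 0$, the normal distance from the origin is $-w_0/\|\mathbf{w}\|$. A small Python check with an invented $\mathbf{w}$:

```python
import math

# Decision surface y(x) = w . x + w0 = 0 (hypothetical 2-D example).
w = [3.0, 4.0]
w0 = -10.0

norm_w = math.sqrt(sum(wi * wi for wi in w))
distance = -w0 / norm_w        # signed normal distance from the origin

# Verify: x* = -w0 * w / ||w||^2 is the foot of the perpendicular.
# It lies on the surface, and its length equals |distance|.
x_star = [-w0 * wi / norm_w**2 for wi in w]
y_at_x_star = sum(wi * xi for wi, xi in zip(w, x_star)) + w0
assert abs(y_at_x_star) < 1e-9

length = math.sqrt(sum(xi * xi for xi in x_star))
assert abs(length - abs(distance)) < 1e-9
```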
20
votes
2 answers

What is divergence in image processing?

What is the difference between gradient and divergence? I understood that gradient points in the direction of steepest ascent and divergence measures source strength. I couldn't relate this to the concept of divergence in image processing. What is…
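In image processing, divergence typically appears applied to the gradient field of an image, where $\operatorname{div}(\nabla I) = \Delta I$ is the Laplacian used in diffusion and denoising. A pure-Python sketch on a toy grid (edge handling is clamped for simplicity):

```python
def gradient(I):
    """Central-difference gradient (Fx, Fy) of a 2-D scalar field I."""
    h, w = len(I), len(I[0])
    Fx = [[(I[y][min(x + 1, w - 1)] - I[y][max(x - 1, 0)]) / 2.0
           for x in range(w)] for y in range(h)]
    Fy = [[(I[min(y + 1, h - 1)][x] - I[max(y - 1, 0)][x]) / 2.0
           for x in range(w)] for y in range(h)]
    return Fx, Fy

def divergence(Fx, Fy):
    """div F = dFx/dx + dFy/dy, again by central differences."""
    h, w = len(Fx), len(Fx[0])
    return [[(Fx[y][min(x + 1, w - 1)] - Fx[y][max(x - 1, 0)]) / 2.0 +
             (Fy[min(y + 1, h - 1)][x] - Fy[max(y - 1, 0)][x]) / 2.0
             for x in range(w)] for y in range(h)]

# I(x, y) = x^2 has Laplacian 2 everywhere, so div(grad I) should be 2.0
# in the grid interior (away from the clamped edges).
I = [[float(x * x) for x in range(7)] for _ in range(7)]
Fx, Fy = gradient(I)
lap = divergence(Fx, Fy)
assert abs(lap[3][3] - 2.0) < 1e-12
```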
19
votes
4 answers

Deriving the cost function using MLE: why use the log function?

I am learning machine learning from Andrew Ng's open-class notes and coursera.org. I am trying to understand how the cost function for the logistic regression is derived. I will start with the cost function for linear regression and then get to my…
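The usual derivation: each label is Bernoulli with parameter $h_\theta(x)$, the likelihood of the data is a product of such terms, and taking the log turns that product into the familiar sum; the log is used because it is monotonic (same maximiser) and numerically better behaved. A minimal Python sketch with invented predictions:

```python
import math

def likelihood(h, y):
    """Bernoulli likelihood of one example: p(y | x) = h^y * (1-h)^(1-y)."""
    return (h ** y) * ((1 - h) ** (1 - y))

# Hypothetical model outputs h = h_theta(x) and true labels y.
hs = [0.9, 0.2, 0.7, 0.4]
ys = [1, 0, 1, 0]

# Maximising the product of per-example likelihoods...
L = 1.0
for h, y in zip(hs, ys):
    L *= likelihood(h, y)

# ...is equivalent to maximising the sum of their logs; negating and
# averaging that sum gives the logistic-regression cost J(theta).
log_L = sum(math.log(likelihood(h, y)) for h, y in zip(hs, ys))
J = -log_L / len(ys)

assert abs(math.log(L) - log_L) < 1e-9   # same quantity, two routes
```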