Over the holidays (two weeks), I need to learn probability as applied to machine learning.
While reading a paper, I came across this: "The training objective is thus to minimize the pixel-wise multi-class crossentropy loss ...". I have no idea what that is, which made me realize that I need to learn probability.
NOTE: I know what a loss function is and how to minimize one. My problem is with cross-entropy. I've been reading about it, and it is related to information theory and probability.
Do you know of any good crash course in probability?