I am teaching myself artificial intelligence from scratch, without libraries, and I have a decent handle on most of it.
UPDATE/EDIT:
However, I am lost on the next step mathematically after taking the derivative of the softmax activation function.
As an example to hopefully clarify: let sm_i be the i-th output of the softmax and z_k its k-th input.
The derivative of sm_i with respect to z_k is sm_i * (1 - sm_i) when i == k, and -sm_i * sm_k when i != k.
So with three inputs, the Jacobian matrix (row i, column k) looks like

    sm_1 * (1 - sm_1)    -sm_1 * sm_2         -sm_1 * sm_3
    -sm_2 * sm_1         sm_2 * (1 - sm_2)    -sm_2 * sm_3
    -sm_3 * sm_1         -sm_3 * sm_2         sm_3 * (1 - sm_3)
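To make the matrix concrete, here is a minimal plain-Python sketch (no libraries) that builds the softmax Jacobian entry by entry, using J[i][k] = sm_i * (delta_ik - sm_k), where sm_i is the i-th softmax output and delta_ik is the Kronecker delta. The input values in `z` are arbitrary, and the function names `softmax` and `softmax_jacobian` are illustrative only.

```python
import math


def softmax(z):
    """Numerically stable softmax of a list of floats."""
    m = max(z)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_jacobian(sm):
    """J[i][k] = d sm_i / d z_k = sm_i * (delta_ik - sm_k).

    Diagonal (i == k): sm_i * (1 - sm_i); off-diagonal (i != k): -sm_i * sm_k.
    """
    n = len(sm)
    return [[sm[i] * ((1.0 if i == k else 0.0) - sm[k]) for k in range(n)]
            for i in range(n)]


z = [1.0, 2.0, 0.5]  # arbitrary example inputs
sm = softmax(z)
J = softmax_jacobian(sm)
for row in J:
    print([round(v, 4) for v in row])
```

Every row of J sums to 0 (because the softmax outputs sum to 1) and J is symmetric, which are quick sanity checks for a from-scratch implementation.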
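But I don't know what to do from there. How do I get from this matrix to the chain-rule equation
dL/dW = dL/dsm * dsm/dz * dz/dW (sm is the softmax output, z its input, W the weights)
that is, the derivative of the (summed) loss with respect to the activation,
multiplied by
the derivative of the activation with respect to its input,
multiplied by
the derivative of the input with respect to the weight?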
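For reference, a minimal plain-Python sketch of that chain rule, assuming a cross-entropy loss L = -sum_i y_i * log(sm_i) with a one-hot target y and a single dense layer z = W x + b. Neither the loss nor the layer shape is stated in the post, and the helper `backward` is hypothetical. The key step is dL/dz_k = sum over i of dL/dsm_i * J[i][k]: the upstream gradient weights the Jacobian and the sum over the outputs i leaves one number per softmax input.

```python
import math


def softmax(z):
    """Numerically stable softmax of a list of floats."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_jacobian(sm):
    """J[i][k] = d sm_i / d z_k = sm_i * (delta_ik - sm_k)."""
    n = len(sm)
    return [[sm[i] * ((1.0 if i == k else 0.0) - sm[k]) for k in range(n)]
            for i in range(n)]


def backward(W, b, x, y):
    """Hypothetical helper: forward + backward pass for z = W x + b,
    sm = softmax(z), L = -sum_i y_i * log(sm_i) with one-hot y (assumed).
    W is a list of rows, one row per softmax input z_k."""
    # forward pass: z_k = sum_j W[k][j] * x_j + b_k
    z = [sum(w * xj for w, xj in zip(row, x)) + bk for row, bk in zip(W, b)]
    sm = softmax(z)
    # dL/dsm_i for the assumed cross-entropy loss
    dL_dsm = [-yi / si for yi, si in zip(y, sm)]
    # chain rule through the Jacobian: dL/dz_k = sum_i dL/dsm_i * dsm_i/dz_k
    J = softmax_jacobian(sm)
    n = len(sm)
    dL_dz = [sum(dL_dsm[i] * J[i][k] for i in range(n)) for k in range(n)]
    # dz_k/dW[k][j] = x_j and dz_k/db_k = 1
    dL_dW = [[g * xj for xj in x] for g in dL_dz]
    dL_db = list(dL_dz)
    return sm, dL_dz, dL_dW, dL_db


# Example: 2 inputs, 3 softmax outputs, true class at index 1 (made-up values)
W = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]]
b = [0.0, 0.1, -0.1]
x = [1.0, 2.0]
y = [0.0, 1.0, 0.0]
sm, dL_dz, dL_dW, dL_db = backward(W, b, x, y)
print("dL/dz: ", [round(g, 6) for g in dL_dz])
print("sm - y:", [round(s - t, 6) for s, t in zip(sm, y)])
```

Under these assumptions the two printed lists should agree up to floating-point rounding, because with a one-hot y the sum simplifies to dL/dz_k = sm_k - y_k.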
Each row of the Jacobian matrix has 3 values, but all I need is 1.
Can someone please help? Thanks. Everything I can find only gets me as far as I have already gotten.
Where: if i == k, the entry is sm_i * (1 - sm_i), and if i != k, it is -sm_i * sm_k.
There are versions that use the Kronecker delta, sm_i * (delta_ik - sm_k), but they all amount to the same thing: the matrix is indexed by pairs (i, k), and the diagonal is where i == k.
The indices go from 1 to 3.
– The Thinkrium Sep 11 '23 at 17:12
I'm still learning, so the current example only has 3 inputs to the softmax activation.
– The Thinkrium Sep 11 '23 at 17:19