I'll write down here the thoughts I ended up with; maybe someone will find them useful. Thanks to everyone who answered!
So, as I see it (which is kind of mind-boggling): the gradient is not a value of the function, nor an argument of the function; it is a relation between the two. But the gradient gets treated like a point, in the sense that it is a vector living in the same space as the function's arguments. This is what confused me for a while, because I was trying to connect it intuitively with both the change in the function and the change in the argument.
And for a while I thought of the derivative as the speed of change of a function (Feynman's lectures), which has its full intuitive meaning in physical equations, because there the change in time is only positive: we cannot go back in time (i.e. if a function depends on time, we cannot decrease the time parameter to get an increase in the function), so thinking in this fashion does not raise the questions that appear in the more general setting.
My current intuition: given a single parameter of a multivariable function, we take the partial derivative of the function with respect to that parameter and get the rate of change of the function with respect to a small change in the parameter. If $f(x + \Delta x) - f(x)$ is positive, the function increases given a positive change in its parameter; if it is negative, the function decreases given a positive change in its parameter. The same goes for the sign of the ratio $(f(x + \Delta x) - f(x)) / \Delta x$.
So if the ratio is positive, we can say that an increase in $x$ gives an increase in the function value; if the ratio is negative, an increase in $x$ gives a decrease in the function value. Thus $\operatorname{sign}((f(x + \Delta x) - f(x)) / \Delta x)$ can be used as the direction in which our parameter $x$ needs to be adjusted to get an increase in the function. We basically have only two directions in which we can adjust the value of $x$, so it either gives an increase or a decrease in the function (or, in some special cases, neither, e.g. the constant $f(x) = 1 + x - x$), and this holds for every parameter of a multivariable function.
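A minimal sketch of this single-parameter picture (the function and step sizes below are my own toy choices): the sign of the difference quotient tells us which way to nudge $x$ to make $f$ grow.

```python
# Toy example: use sign((f(x + dx) - f(x)) / dx) as the adjustment direction.

def f(x):
    return -(x - 3.0) ** 2  # a function with its maximum at x = 3

def ascent_direction(f, x, dx=1e-6):
    """+1.0 means 'increase x', -1.0 means 'decrease x', 0.0 means neither helps."""
    ratio = (f(x + dx) - f(x)) / dx
    return 1.0 if ratio > 0 else (-1.0 if ratio < 0 else 0.0)

x = 0.0
for _ in range(100):
    x += 0.1 * ascent_direction(f, x)  # small step in the increase direction

print(x)  # has climbed to (and then hovers around) the maximum at 3
```

With only the sign, the iterate cannot settle exactly: near the top it oscillates within one step size of the maximum, which is one hint that the magnitude of the derivative matters too.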
So for $f(x_1,x_2,\dots,x_n)$ we get a bunch of directions, $\frac{\partial f(x_1,x_2,\dots,x_n)}{\partial x_1}, \frac{\partial f(x_1,x_2,\dots,x_n)}{\partial x_2}, \dots, \frac{\partial f(x_1,x_2,\dots,x_n)}{\partial x_n}$, and each of them gives the direction of change in one parameter that increases the function. Combined into a vector, they tell us a direction in parameter space in which we should move to get an increase in the function. And it is the steepest direction, because each single-parameter derivative points toward an increase (the second and only other direction for a single parameter gives a decrease in the function value), so by how the partial derivative is defined we get the direction of steepest ascent.
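The same idea, sketched numerically for two parameters (again a toy function of my own choosing): estimate each partial derivative with a difference quotient, stack them into a vector, and check that a small step along that vector does increase the function.

```python
# Toy example: build the gradient from per-parameter difference quotients.

def f(x1, x2):
    return -(x1 - 1.0) ** 2 - (x2 + 2.0) ** 2  # maximum at (1, -2)

def numerical_gradient(f, x, dx=1e-6):
    """One difference quotient per parameter, collected into a vector (list)."""
    grad = []
    for i in range(len(x)):
        bumped = list(x)
        bumped[i] += dx  # nudge only the i-th parameter
        grad.append((f(*bumped) - f(*x)) / dx)
    return grad

x = [0.0, 0.0]
g = numerical_gradient(f, x)  # approximately [2.0, -4.0] at this point
step = [xi + 0.1 * gi for xi, gi in zip(x, g)]

print(f(*step) > f(*x))  # moving along the gradient increased f
```

Note that unlike the sign-only version above, this vector keeps the magnitudes of the partial derivatives, so it moves faster along the parameters that matter more.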
Though I do not yet see how the dot product comes in handy.
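(For reference, the standard route to "steepest" is exactly through the dot product: the rate of change of $f$ along a unit direction $u$ is the directional derivative

$$D_u f = \nabla f \cdot u = \lVert \nabla f \rVert \, \lVert u \rVert \cos\theta = \lVert \nabla f \rVert \cos\theta,$$

where $\theta$ is the angle between $u$ and $\nabla f$. This is maximized when $\cos\theta = 1$, i.e. when $u$ points along the gradient. The per-coordinate sign argument above gives *an* increase direction, but it is this dot-product identity that shows the gradient direction is the *steepest* one.)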
It all sounds and looks quite simple now, though it was hard for me to grasp; I hope someone might find it helpful.