An intuitive way of thinking about it is that standard deviation is a measure of spread. So you need some way of saying, for every value in the data set, how far away that value is from the mean on average.
So you take the differences $d_i = x_i - \bar{x}$. What do you do with the differences? If you simply add them up, you'll get zero. You want to give equal weight to a positive difference $d_i=+k$ and a negative difference $d_i=-k$, so the first thing you think of is to take the absolute value: spread = $\frac{1}{n}\sum|d_i|$. This is called the average absolute deviation, and it is, along with related measures, an accepted way to measure spread.
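A quick numerical sketch of this (in Python, with a small made-up data set) shows the raw differences cancelling while the absolute deviations don't:

```python
# A small made-up data set to illustrate the point.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)
mean = sum(data) / n

# Signed differences from the mean: these always sum to (numerically) zero.
diffs = [x - mean for x in data]
print(sum(diffs))                      # ~0.0, so no use as a spread measure

# Average absolute deviation: the mean of |d_i|, a perfectly valid spread measure.
aad = sum(abs(d) for d in diffs) / n
print(aad)                             # 1.5 for this data set
```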
Unfortunately, taking the absolute value doesn't play well with calculus, and you want to use calculus to differentiate the spread so you can minimize it. Minimizing it matters when, for example, you want to fit a line to a set of data points. So what function gives equal weight to $\pm d_i$ and is easy to differentiate? The simplest is to square each difference. The average of the squared differences, $\frac{1}{n}\sum d_i^2$, is the variance; it is easy to differentiate, and we can scale back to the units of the original data by taking the square root of that average, which gives the standard deviation.
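Continuing the same sketch (using the population form that divides by $n$ rather than $n-1$, purely as an illustrative choice):

```python
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)
mean = sum(data) / n

# Variance: the average of the squared differences from the mean.
variance = sum((x - mean) ** 2 for x in data) / n

# Standard deviation: square root of the variance, back in the original units.
std_dev = variance ** 0.5
print(variance, std_dev)               # 4.0 and 2.0 for this data set
```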
So, at last, why not take the cubed differences? Because cubing a negative difference gives a different result from cubing a positive difference of the same magnitude: if $d_i=-k$ and $d_j=+k$ then $d_i^3 = -d_j^3 \neq d_j^3$. In fact, with a symmetric distribution, the average of the cubed differences is zero, which clearly isn't a measure of spread. What the average cubed difference does tell you is how skewed the distribution is. (If you go further and look at fourth powers, you get into something called kurtosis, which measures how fat the tails of the distribution are.)
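As a rough illustration (again in Python, with one symmetric and one skewed made-up sample): the average cubed difference vanishes for the symmetric sample but not for the skewed one; dividing by the cube of the standard deviation gives the usual skewness statistic, and the same idea with fourth powers gives kurtosis.

```python
def moments(data):
    n = len(data)
    mean = sum(data) / n
    diffs = [x - mean for x in data]
    variance = sum(d ** 2 for d in diffs) / n
    std = variance ** 0.5
    mean_cubed = sum(d ** 3 for d in diffs) / n              # ~0 if symmetric
    skewness = mean_cubed / std ** 3                         # standardized 3rd moment
    kurtosis = (sum(d ** 4 for d in diffs) / n) / std ** 4   # standardized 4th moment
    return mean_cubed, skewness, kurtosis

symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]    # symmetric about its mean of 3
skewed = [1.0, 1.0, 1.0, 2.0, 10.0]      # long right tail

print(moments(symmetric))   # average cubed difference is 0: useless as spread
print(moments(skewed))      # positive: the distribution leans to the right
```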