One defines $\bar y = (y_1+\cdots+y_n)/n$ and similarly $\bar x = (x_1+\cdots+x_n)/n$; the bar means the average of the $n$ observations.
Notice that you don't have $n$ different observed values of $\alpha$ or $\beta$ --- in fact you don't have any observed values of those at all. The idea is that $\{(x_i,y_i):i=1,\ldots,n\}$ is a sample of size $n$ taken from a large population, and $\alpha$ and $\beta$ are in effect properties of that population, whereas $\hat\alpha$ and $\hat\beta$ are estimates of $\alpha$ and $\beta$ based on the observed sample of $n$ data points. If you toss the $n$ observed individuals back into the population and stir it up and take another random sample of size $n$, then the values of $\hat\alpha$ and $\hat\beta$ change, but the (unobservable) values of $\alpha$ and $\beta$ remain the same. Another use of the hat notation is when one writes
$$
\hat y_i = \hat\alpha+\hat\beta x_i.
$$
Notice that $\hat y_i$ has a "hat" and $x_i$ does not. The number $\hat y_i$ is the $i$th "fitted value". It is an estimate of the average $y$-value among members of the population for which the $x$-value is $x_i$. The difference $\hat\varepsilon_i=y_i-\hat y_i$ is the $i$th residual, also equal to $y_i-(\hat\alpha+\hat\beta x_i)$, not to be confused with the $i$th error $\varepsilon_i=y_i-(\alpha+\beta x_i)$. The residuals are observable; the errors are not. The residuals must satisfy the two constraints $\sum_{i=1}^n \hat\varepsilon_i=0$ and $\sum_{i=1}^n \hat\varepsilon_i x_i=0$; the errors are subject to no such constraints. That there are two such linear constraints is why one says there are $n-2$ degrees of freedom for error.
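For concreteness, here is a small numerical sketch (assuming NumPy and some made-up data, with `np.polyfit` used only to get the least-squares coefficients) checking those two constraints on the residuals:
```python
import numpy as np

# Small made-up dataset, purely for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

# np.polyfit with degree 1 returns (slope, intercept) = (beta_hat, alpha_hat).
beta_hat, alpha_hat = np.polyfit(x, y, 1)

y_fitted = alpha_hat + beta_hat * x   # fitted values  \hat y_i
residuals = y - y_fitted              # residuals      \hat\varepsilon_i

# The two linear constraints, satisfied up to rounding error:
print(residuals.sum())                # sum of residuals            ~ 0
print((residuals * x).sum())          # sum of residuals times x_i  ~ 0
```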
I don't use derivatives to find $\hat\beta$ and $\hat\alpha$; instead I talk about matrices and orthogonal projections. But let's see if I can do it using derivatives. Writing $Q=\sum_{i=1}^n (y_i-\alpha-\beta x_i)^2$ for the sum of squares to be minimized (the minimizing values of $\alpha$ and $\beta$ are what we then call $\hat\alpha$ and $\hat\beta$), set the partial derivatives equal to zero:
$$
\begin{align}
\frac{\partial Q}{\partial\alpha} & = -2\sum_{i=1}^{n} (y_i - \alpha - \beta x_i) =0 \tag1 \\[10pt]
\frac{\partial Q}{\partial\beta} & = -2\sum_{i=1}^{n} (y_i - \alpha - \beta x_i)x_i = 0 \tag2
\end{align}
$$
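(As an aside, and only as a pointer to the matrix/projection language mentioned above: if $X$ is the $n\times 2$ matrix whose $i$th row is $(1,\ x_i)$ and $y=(y_1,\ldots,y_n)^\top$, then $(1)$ and $(2)$ together say
$$
X^\top\left(y - X\begin{pmatrix}\alpha\\ \beta\end{pmatrix}\right) = 0,
$$
i.e. the vector of residuals is orthogonal to both columns of $X$. These are the "normal equations".)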
From $(1)$, dividing through by $-2n$, we get $\bar y - \alpha - \beta\bar x = 0$, i.e.
$$
\alpha = \bar y - \beta \bar x. \tag 3
$$
Thus the least-squares line must pass through the "point of averages" $(\bar x,\bar y)$, which is just the average of the observed data points $(x_i,y_i),\ i=1,\ldots,n$.
Now substitute $(3)$ for $\alpha$ in $(2)$:
$$
\sum_{i=1}^n (y_i - (\bar y - \beta \bar x) -\beta x_i)x_i = 0,
$$
or:
$$
\sum_{i=1}^n ((y_i-\bar y) -\beta(x_i-\bar x))x_i=0
$$
so that
$$
\beta = \frac{\sum_{i=1}^n (y_i-\bar y)x_i }{\sum_{i=1}^n (x_i-\bar x)x_i}.
$$
That this is the same as
$$
\beta = \frac{\sum_{i=1}^n (y_i-\bar y)(x_i-\bar x) }{\sum_{i=1}^n (x_i-\bar x)(x_i-\bar x)}
$$
I leave as an exercise in algebra for the moment, but I'll post more on it if you have questions about that.
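In the meantime, here is a quick numerical sanity check (a sketch, assuming NumPy and some made-up noisy data) that the two forms give the same slope, and that together with $(3)$ they reproduce what `np.polyfit` returns:
```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 1.5 + 2.0 * x + rng.normal(size=50)   # made-up data with noise

xbar, ybar = x.mean(), y.mean()

# Slope from the first form: sum (y_i - ybar) x_i / sum (x_i - xbar) x_i
beta1 = ((y - ybar) * x).sum() / ((x - xbar) * x).sum()

# Slope from the second (fully centered) form.
beta2 = ((y - ybar) * (x - xbar)).sum() / ((x - xbar) ** 2).sum()

alpha = ybar - beta1 * xbar               # intercept from equation (3)

print(np.isclose(beta1, beta2))                          # True
print(np.allclose([beta1, alpha], np.polyfit(x, y, 1)))  # True
```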