Consider the linear regression model in the case where the number of training samples $n$ is smaller than the number of parameters $d$ (overparametrized linear regression). The model is
$$y_i = \langle\beta,\mathbf{x}_i\rangle +\epsilon_i, \quad \beta\in\mathbb{R}^d,\ \epsilon_i\in\mathbb{R},\ i=1,\dots,n$$
where $\epsilon_i$ is zero-mean noise. Let $\mathbf{X}\in\mathbb{R}^{n\times d}$ be the data matrix:
\begin{align} \mathbf{X} &= \begin{bmatrix} \mathbf{x}_1\\ \mathbf{x}_2\\ \vdots \\ \mathbf{x}_n \end{bmatrix} \end{align}
Now, since $n<d$, $\mathbf{X}^T\mathbf{X}\in\mathbb{R}^{d\times d}$ is not invertible, because $$\text{rank}(\mathbf{X})=\text{rank}(\mathbf{X}^T) \leq \min \{n,d\}=n\implies \text{rank}(\mathbf{X}^T\mathbf{X})\leq n<d$$
Assuming $\mathbf{X}$ has full row rank, there are infinitely many $\beta$ such that
$$\mathbf{y}=\mathbf{X}\beta$$
I'm told that the minimum norm solution is
$$\beta^*=\mathbf{X}^{\dagger}\mathbf{y}$$
where, in this specific case $n < d$ (with $\mathbf{X}$ of full row rank), $\mathbf{X}^{\dagger}=\mathbf{X}^T(\mathbf{X} \mathbf{X}^T)^{-1}$. Can you help me prove this in a constructive manner? I tried to premultiply by $\mathbf{X}^T\in\mathbb{R}^{d\times n}$ (which is legitimate since $\mathbf{y}\in\mathbb{R}^n$), but I get
$$\mathbf{X}^T\mathbf{y}=\mathbf{X}^T\mathbf{X}\beta$$
and now I'm stuck because, as noted above, $\mathbf{X}^T\mathbf{X}$ is not invertible.
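Not part of the proof, but a quick numerical illustration of where the attempt breaks down: with $n<d$ and generic random data, $\mathbf{X}^T\mathbf{X}$ is rank-deficient while $\mathbf{X}\mathbf{X}^T$ has full rank (the sizes `n, d` below are arbitrary choices for the sketch).

```python
import numpy as np

# Illustration only: with n < d and generic random data,
# X^T X (d x d) is singular while X X^T (n x n) is invertible.
rng = np.random.default_rng(0)
n, d = 5, 10
X = rng.standard_normal((n, d))

rank_XtX = np.linalg.matrix_rank(X.T @ X)  # at most n, so < d: singular
rank_XXt = np.linalg.matrix_rank(X @ X.T)  # equals n almost surely: invertible

print(rank_XtX, rank_XXt)
```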
EDIT: the solution
$$\beta^*=\mathbf{X}^{\dagger}\mathbf{y}$$
is valid iff $\mathbf{X}\mathbf{X}^T$ is invertible. Since $\mathbf{X}\mathbf{X}^T\in\mathbb{R}^{n\times n}$ and $\text{rank}(\mathbf{X}\mathbf{X}^T)=\text{rank}(\mathbf{X})\leq n$, invertibility amounts to $\mathbf{X}$ having full row rank. Because $\mathbf{X}$ is a random matrix, meaning that its rows are random vectors sampled from an (unknown) probability distribution, $\mathbf{X}\mathbf{X}^T$ will be full rank for most nondegenerate distributions.
EDIT2: a question was mentioned in the comments, but it isn't related to my question. I explicitly asked for a constructive proof of the fact that, when $n<d$, the minimum norm solution is
$$\beta^*=\mathbf{X}^T(\mathbf{X} \mathbf{X}^T)^{-1}\mathbf{y}$$
instead of the usual one, valid when $\mathbf{X}$ has full column rank:
$$\beta^*=(\mathbf{X}^T \mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$
Thus, answers that don't constructively derive this expression are not answers to this question.