Problem like these lie in the realm of differential topology, and therefore are best viewed with a differential topological viewpoint rather than an analytical viewpoint. In fact the result holds for general $C^k$ and $C^\infty$ manifolds, the proofs more or less being the same. I'll try to break down the answer given here.
If $f$ is a diffeomorphism, we are done. If we can find a submanifold $K \subseteq U$ on which $f$ is a diffeomorphism, we are also done. It's always good to keep in mind the canonical submersion and immersion, namely
$$f: (x_1, \dots, x_n, \dots x_{n + l}) \mapsto (x_1, \dots, x_n), \qquad h: (x_1, \dots, x_n) \mapsto (x_1, \dots, x_n, 0, \dots, 0)$$
Obviously if $g \circ f : (x_1, \dots, x_n, \dots, x_{n + k}) \mapsto g(x_1, \dots, x_n)$ is $C^k$, then so is $g: (x_1, \dots, x_n) \mapsto g(x_1, \dots, x_n)$. To be pedantic, but ultimately illustrative, it's because we can write $g = g \circ f \circ h$, where $h$ is clearly smooth.
We can adopt this perspective by the submersion theorem, which is essentially a variant of the inverse and implicit function theorems. This says that every submersion can be viewed in a local coordinate system as the canonical submersion. If you aren't familiar with coordinates, think of them as functions on your space which captures the Euclidean ($\mathbb R^n)$ structure, e.g. $(x, y) : \mathbb C \to \mathbb R^2$ given by
$$x(z) = \Re (z), \qquad y(z) = \Im (z)$$
or polar coordinates $(r, \theta) : \mathbb C \setminus [0, \infty) \to \mathbb R^2$ given by
$$r(z) = |z|, \qquad \theta(z) = \operatorname{arg} z.$$
But I digress. Since $f$ is a $C^k$ submersion, $f^{-1} (p)$ is a $C^k$ manifold of dimension $l$ (another corollary of the implicit function theorem). In fact, given the right local coordinate system $x$ on $M$, we can view $f^{-1} (p)$ as a level set
$$f^{-1} (p) \cap V = \{ x_{1} = \dots = x_{n} = 0 \}.$$
Notice the connection to canonical immersion $h$. To finish off, since $g \circ f$ is $C^k$ on $U$, it is also $C^k$ on the submanifold $C = \{x_{n + 1} = \dots = x_{n + l} = 0\}$ of dimension $n$, and moreover $f$ is a submersion on $C$ (check this! The point is that $C$ is transverse to $f^{-1} (p)$). Arguing by rank nullity, $f: C \to \mathbb R^n$ is a $C^k$ diffeomorphism, so it maps onto an open neighborhood $V$ of $p$ and admits a $C^k$ inverse $h$, so we write $g = g \circ f \circ h$ locally on $V$. Composition of $C^k$ functions is $C^k$, so we are done.
The gist is that $g \circ f$ is a map on $\mathbb R^{n + l}$ but we can "throw away" $l$-many coordinate directions. Think of $f$ is a submersion as saying the domain is "too big", so by removing $l$-coordinate directions and restricting ourselves to a dimension $n$ subset of $\mathbb R^{n + l}$, we can view $f$ on this submanifold as a diffeomorphism.