The formulation of the rank theorem in Rudin is indeed quite confusing. However, it is not so bad once the interpretation of the theorem is fleshed out.
Rank Theorem (Rudin): Suppose $m, n, r$ are nonnegative integers, $m \geq r$, $n \geq r$, $\mathbf{F}$ is a $C^1$-mapping of an open set $E\subseteq \mathbb{R}^n$ into $\mathbb{R}^m$, and $\mathbf{F}'(\mathbf{x})$ has rank $r$ for every $\mathbf{x}\in E$. Fix $\mathbf{a} \in E$, put $A = \mathbf{F}'(\mathbf{a})$, let $Y_1$ be the range of $A$, and let $P$ be a projection in $\mathbb{R}^m$ whose range is $Y_1$. Let $Y_2$ be the null space of $P$. Then there are open sets $U$ and $V$ in $\mathbb{R}^n$, with $\mathbf{a} \in U$, $U \subseteq E$, and there is a
$1$-$1$ $C^1$-mapping $\mathbf{H}$ of $V$ onto $U$ (whose inverse is also of class $C^1$) such that
$$\mathbf{F}(\mathbf{H}(\mathbf{x})) = A\mathbf{x} + \varphi(A\mathbf{x})\qquad (\mathbf{x}\in V)$$
where $\varphi$ is a $C^1$-mapping of the open set $A(V)\subseteq Y_1$ into $Y_2$.
Let's contrast this with another standard textbook formulation of the rank theorem, for example the one in Lee's Introduction to Smooth Manifolds.
Rank Theorem (Lee): Suppose $M$ and $N$ are smooth manifolds of dimensions $m$ and $n$, respectively, and $F:N\rightarrow M$ is a smooth map with constant
rank $r$. For each $p \in N$ there exist smooth charts $(U,\phi^{-1})$ for $N$ centered at $p$ and
$(V,\psi^{-1})$ for $M$ centered at $F(p)$ such that $F(U)\subseteq V$, in which $F$ has a coordinate representation of the form
$$\hat{F}(x^1,\ldots,x^r,x^{r+1},\ldots,x^n) = (x^1,\ldots,x^r,0,\ldots,0),$$
where $\hat{F} = \psi^{-1}\circ F \circ \phi$ maps an open subset of $\mathbb{R}^n$ into $\mathbb{R}^m$. Here $\phi$ and $\psi$ denote the inverses of the chart maps, that is, local parametrizations of $N$ and $M$ by open subsets of $\mathbb{R}^n$ and $\mathbb{R}^m$.
To make the comparison more transparent, let me disregard the $C^1$ condition in Rudin's version of the theorem and just assume everything is smooth if necessary. We also don't need the full manifold formulation of Lee's version, so we will just take $M = \mathbb{R}^m$ and $N = \mathbb{R}^n$.
The key content of the rank theorem is that the local image of a rank $r$ map is an $r$-dimensional manifold. The difference between the two versions is that Lee's rank theorem fully flattens out the resulting manifold by mapping it into the linear subspace of points of the form $(x^1,\ldots,x^r,0,\ldots,0)$. He accomplishes this by changing coordinates on both the domain and the codomain.
On the other hand, Rudin never touches the codomain. He instead chooses coordinates on the domain so that the image submanifold is written as a sum of two pieces: a component in the tangent plane $Y_1$ determined by $A = \mathbf{F}'(\mathbf{a})$ (the $A\mathbf{x}$ term), plus a deviation away from this tangent plane, lying in $Y_2$ (the $\varphi(A\mathbf{x})$ term). The decomposition is orthogonal if $P$ is chosen to be an orthogonal projection, and oblique otherwise. You can therefore view Rudin's version as intermediate: the image submanifold of $\mathbf{F}$ is straightened out partially along the tangent plane, but not all the way.
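To make the "tangent plane plus deviation" picture concrete, here is a minimal numerical sketch. The map `F` (a paraboloid graph), the base point $\mathbf{a} = \mathbf{0}$, and the choice of an orthogonal `P` are illustrative assumptions of mine, not taken from Rudin or Lee; for this particular `F` the change of variables `H` can simply be the identity.

```python
import numpy as np

# Hypothetical example: F(x, y) = (x, y, x^2 + y^2), a rank-2 map R^2 -> R^3.
def F(x):
    return np.array([x[0], x[1], x[0] ** 2 + x[1] ** 2])

# A = F'(0), the derivative at the base point a = 0.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])

# Orthogonal projection onto Y_1 = range(A), the (u, v)-plane in R^3.
P = np.diag([1.0, 1.0, 0.0])

# phi maps Y_1 into Y_2 = null(P), the w-axis: it records the height of
# the surface above the tangent plane.
def phi(y):
    return np.array([0.0, 0.0, y[0] ** 2 + y[1] ** 2])

# For this F the diffeomorphism H of Rudin's statement can be the identity.
def H(x):
    return x

rng = np.random.default_rng(0)
for x in rng.normal(size=(5, 2)):
    tangent = A @ x           # the A x term, lying in Y_1
    deviation = phi(A @ x)    # the phi(A x) term, lying in Y_2
    assert np.allclose(F(H(x)), tangent + deviation)
    assert np.allclose(P @ tangent, tangent)     # P fixes Y_1
    assert np.allclose(P @ deviation, 0.0)       # P kills Y_2
    assert np.isclose(tangent @ deviation, 0.0)  # orthogonal, since P is
```

With an oblique `P` the first three checks would still be expected to hold, but the last (orthogonality) generally would not.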
It follows that to go from Rudin's version to Lee's version, all we need to do is fully "straighten out" this manifold. The map that does this is $f(\mathbf{x}) = \mathbf{x} - \varphi(P\mathbf{x})$, defined wherever $P\mathbf{x}\in A(V)$. Intuitively, this map removes the deviations away from the tangent plane. Then we have
$$f(\mathbf{F}(\mathbf{H}(\mathbf{x}))) = f(A\mathbf{x} + \varphi(A\mathbf{x})) = A\mathbf{x} + \varphi(A\mathbf{x}) - \varphi(A\mathbf{x}) = A\mathbf{x},$$
where we use the fact that $PA\mathbf{x}=A\mathbf{x}$ and $P\varphi(A\mathbf{x}) = \mathbf{0}$. Note that $f$ is in fact a smooth diffeomorphism, with inverse $f^{-1}(\mathbf{x}) = \mathbf{x} + \varphi(P\mathbf{x})$; this works because $Pf(\mathbf{x}) = P\mathbf{x}$. Since $A$ is of rank $r$, there exist invertible matrices $Q_1$ (of size $m\times m$) and $Q_2$ (of size $n\times n$) such that $Q_1AQ_2 = I_r$, where $I_r$ is the $m\times n$ matrix which is zero everywhere except for the first $r$ diagonal entries, which are $1$. Let $Q_2\mathbf{y}=\mathbf{x}$. Then
$$Q_1f(\mathbf{F}(\mathbf{H}(Q_2\mathbf{y}))) = Q_1AQ_2\mathbf{y} = I_r\mathbf{y}.$$
Therefore we can take the coordinate charts $\psi$ and $\phi$ in Lee's formulation to be $\phi = \mathbf{H}\circ Q_2$ and $\psi^{-1} = Q_1\circ f$.
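The construction can be checked numerically. The sketch below is only an illustration under assumptions of my own: the map `F`, the base point $\mathbf{a}=\mathbf{0}$, the orthogonal projection `P`, and the use of a singular value decomposition to produce `Q1` and `Q2` are not from Rudin or Lee. Any pair of invertible matrices with $Q_1AQ_2 = I_r$ would do, and the SVD is just one convenient way to get one.

```python
import numpy as np

# Hypothetical constant-rank map R^2 -> R^3 (rank 2 everywhere).
def F(x):
    s = x[0] + x[1]
    return np.array([s, x[0] - x[1], s ** 2])

# A = F'(0); its range Y_1 is the (u, v)-plane in R^3.
A = np.array([[1.0, 1.0],
              [1.0, -1.0],
              [0.0, 0.0]])
m, n = A.shape
r = np.linalg.matrix_rank(A)

P = np.diag([1.0, 1.0, 0.0])   # orthogonal projection onto Y_1

def phi(y):                    # maps Y_1 into Y_2 = null(P)
    return np.array([0.0, 0.0, y[0] ** 2])

def H(x):                      # identity suffices for this F
    return x

def f(w):                      # straightening map on the codomain
    return w - phi(P @ w)

def f_inv(w):                  # its claimed inverse
    return w + phi(P @ w)

# One way to build Q1, Q2 with Q1 A Q2 = I_r: from A = U S V^T take
# Q2 = V and Q1 = D^{-1} U^T, where D rescales the nonzero singular values.
U, s, Vt = np.linalg.svd(A)    # full_matrices=True by default
D = np.eye(m)
D[:r, :r] = np.diag(s[:r])
Q1 = np.linalg.inv(D) @ U.T
Q2 = Vt.T

# I_r: the m x n matrix with ones in the first r diagonal entries.
I_r = np.zeros((m, n))
I_r[:r, :r] = np.eye(r)
assert np.allclose(Q1 @ A @ Q2, I_r)

rng = np.random.default_rng(1)
for y in rng.normal(size=(5, n)):
    w = rng.normal(size=m)
    assert np.allclose(f_inv(f(w)), w)                  # f is invertible
    assert np.allclose(Q1 @ f(F(H(Q2 @ y))), I_r @ y)   # fully flattened
```

Here `lambda y: H(Q2 @ y)` plays the role of $\phi$ and `lambda w: Q1 @ f(w)` the role of $\psi^{-1}$.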
Since we have the relations between the coordinate charts in Lee's version and the various maps in Rudin's version, we can easily go between the two formulations. These relations also make precise the missing steps needed to fully "straighten out" the image manifold.