I like to prove things like this as algorithmically as possible,
using matrices and row and column operations that are as elementary
as possible.
I construct an invertible matrix $A$ such that $A(a_1,\ldots,a_n)^T = (1,0,\ldots,0)^T$. Let $B$ be the linear map corresponding to $A$ relative to the basis $\{y_1,\ldots,y_n\}$.
Then the desired basis is $\{B^{-1}y_1,\ldots,B^{-1}y_n\}$; its first element is $x$, because the first column of $A^{-1}$ is $(a_1,\ldots,a_n)^T$, so $B^{-1}y_1 = \sum_i a_i y_i = x$.
If $n = 1$, then $a_1$ is a unit, so take $A = (a_1^{-1})$.
If $n = 2$, let $c = \gcd(a_1,a_2) = 1$ and choose $s_1$ and $s_2$ such that
$s_1 a_1 + s_2 a_2 = c$. Take
$$A = \begin{bmatrix} s_1 & s_2 \\ -a_2/c & a_1/c \end{bmatrix}.$$
Then $A(a_1,a_2)^T = (c,0)^T$. In this step,
$c = 1$, but we write it as $c$ for use in the induction step.
$A$ is invertible because its determinant is $s_1(a_1/c) + s_2(a_2/c) = c/c = 1$.
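For concreteness, here is a minimal sketch of this $n = 2$ step over $R = \mathbb{Z}$ (the helper names `ext_gcd` and `two_by_two_A` are mine, purely illustrative):

```python
def ext_gcd(a, b):
    # Extended Euclidean algorithm: returns (c, s1, s2) with
    # s1*a + s2*b == c == gcd(a, b).
    if b == 0:
        return (a, 1, 0)
    c, s, t = ext_gcd(b, a % b)
    return (c, t, s - (a // b) * t)

def two_by_two_A(a1, a2):
    # The 2x2 matrix above: it sends (a1, a2)^T to (c, 0)^T and has determinant 1.
    c, s1, s2 = ext_gcd(a1, a2)
    return [[s1, s2], [-a2 // c, a1 // c]]

A = two_by_two_A(6, 35)              # gcd(6, 35) == 1, so c == 1 here
assert A[0][0]*6 + A[0][1]*35 == 1   # the first entry becomes the gcd
assert A[1][0]*6 + A[1][1]*35 == 0   # the second entry becomes 0
```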
If $n > 2$, use induction on the number of nonzero entries in $(a_1,\ldots,a_n)^T$.
If there is only $1$ nonzero entry, then it is a $\gcd$ of all the entries, and
in fact equals $1$ by the choice made in the previous step. Apply the most
elementary of row operations, a row interchange, to move the nonzero entry
into the first position. A row interchange expressed in matrix
form corresponds to multiplying, on the left, the matrix $A$ constructed
in previous stages of the induction; the product remains invertible.
If there are $2$ or more nonzero entries, first interchange rows so that
$a_1$ and $a_2$ are nonzero (in fact, this makes the row interchange
of the previous paragraph never occur). Apply the $n = 2$ step to the first
$2$ coordinates (more precisely, to the first $2$ rows), augmenting the
$2 \times 2$ matrix with an $(n-2) \times (n-2)$ identity block to make it $n \times n$.
Now $c = \gcd(a_1,a_2)$ is not necessarily a unit, but $A$ is
invertible because we divided its second row by $c$ to make its determinant
$1$. Most importantly, reducing
$(a_1,a_2,a_3,\ldots)^T$ to $(c,0,a_3,\ldots)^T$
leaves the $\gcd$ of the entries unchanged while decreasing the number of
nonzero entries by one, so the induction works.
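The whole induction fits in a few lines; here is a hedged sketch over $\mathbb{Z}$, reusing `ext_gcd` from above (the name `reduce_column` is mine):

```python
def reduce_column(a):
    # Returns an invertible integer matrix A with
    # A (a_1, ..., a_n)^T == (g, 0, ..., 0)^T, g = gcd of the entries,
    # following the induction described above.
    n = len(a)
    a = list(a)
    A = [[int(i == j) for j in range(n)] for i in range(n)]  # start from the identity
    for k in range(1, n):
        if a[k] == 0:
            continue                        # nothing to clear at position k
        if a[0] == 0:
            a[0], a[k] = a[k], a[0]         # row interchange moves the nonzero entry up
            A[0], A[k] = A[k], A[0]
            continue
        # the n = 2 step applied to rows 0 and k, identity elsewhere
        c, s1, s2 = ext_gcd(a[0], a[k])
        A[0], A[k] = ([s1*A[0][j] + s2*A[k][j] for j in range(n)],
                      [(-a[k]//c)*A[0][j] + (a[0]//c)*A[k][j] for j in range(n)])
        a[0], a[k] = c, 0                   # gcd preserved, one fewer nonzero entry
    return A
```

For instance, `reduce_column([6, 10, 15])` returns an $A$ with $A(6,10,15)^T = (1,0,0)^T$, since $\gcd(6,10,15) = 1$.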
All the matrices constructed correspond to row operations, though not always
elementary ones, when they multiply column vectors on the left.
Some are non-elementary because they replace a row by a general linear
combination of rows, whereas a pure elementary row operation permits only the
special combination $u \cdot \mathrm{row}_1 + r \cdot \mathrm{row}_2$ with $u$ a unit.
If $R$ is Euclidean, then the Euclidean algorithm essentially gives
a construction of the $2 \times 2$ $A$ as a product of matrices for
elementary row operations, so the final $A$ is also such a product.
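To make that concrete over $\mathbb{Z}$: each Euclidean step $a_1 = q a_2 + r$ is the elementary operation $\mathrm{row}_1 \mapsto \mathrm{row}_1 - q \cdot \mathrm{row}_2$ followed by a row swap. A sketch (the name `elementary_factors` is mine, and it assumes nonnegative inputs):

```python
def elementary_factors(a1, a2):
    # Decomposes a reduction of (a1, a2)^T to (gcd, 0)^T into elementary
    # 2x2 matrices; multiplying them (later factors on the left) gives one
    # valid A, though not necessarily the same one as two_by_two_A above.
    factors = []
    while a2 != 0:
        q = a1 // a2
        factors.append([[1, -q], [0, 1]])  # row1 -> row1 - q*row2 (one Euclidean step)
        factors.append([[0, 1], [1, 0]])   # swap rows so Euclid continues on (a2, r)
        a1, a2 = a2, a1 - q * a2
    return factors
```

Each factor has determinant $\pm 1$, so the product is invertible.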
I don't know of any practical algorithms for constructing the
$2 \times 2$ $A$ for PIDs that are not Euclidean.
This is the dual of a special case of the algorithm for reduction to
Hermite normal form. In the general case, we start with an
arbitrary matrix instead of a column vector, and do reversible (left)
row operations on it to get a triangular matrix (with some complications
when the matrix is not square). Hermite normal form is what you
naturally get by doing reversible (right) column operations
instead. There is also Smith normal form, a diagonal matrix: it is what you
naturally get by doing both reversible row operations and reversible column
operations. All of the elementary theory of
modules over PIDs can be read off from these matrices.
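If you just want to see these normal forms computed over $\mathbb{Z}$ rather than build them by hand, SymPy exposes both (a sketch assuming a reasonably recent SymPy, where the functions live in `sympy.matrices.normalforms`; note that conventions for Hermite form, row-style versus column-style, vary between references and libraries):

```python
from sympy import Matrix
from sympy.matrices.normalforms import hermite_normal_form, smith_normal_form

M = Matrix([[2, 4, 4], [-6, 6, 12], [10, 4, 16]])
print(hermite_normal_form(M))  # triangular: one-sided reversible operations
print(smith_normal_form(M))    # diagonal, each entry dividing the next
```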