Here's one simple way with a few parameters to vary. Don't know how well they mesh with the hashes (you need to be more specific, if special constraints come from there).
Split that $128$-bit integer $x$ into six chunks. Five chunks of $24$ bits each and a single chunk with only $8$ bits. So write
$$
x=x_6\cdot 2^{120}+x_5\cdot2^{96}+x_4\cdot2^{72}+x_3\cdot2^{48}+x_2\cdot 2^{24}+x_1,
$$
where $x_1,x_2,x_3,x_4,x_5$ have $24$ bits each, treated as integers in the range $0\le x_i\le 2^{24}-1$, and $0\le x_6\le 255$ has $8$ bits.
Map $x$ into a $4\times 4$ matrix $M(x)$ like follows:
$$
M(x)=\left(\begin{array}{cccc}
1&2^{16}x_6&x_5&x_3\\
0&1&x_4&x_2\\
0&0&1&x_1\\
0&0&0&1\end{array}\right).
$$
We view all the entries as integers modulo $2^{24}$. That explains the extra factor $2^{16}$ of $x_6$. Without it $x_6$ would be determined only modulo $2^8$, so we tag $16$ zeros to the LSB end.
The operation, call it $\star$, is then the usual product of $4\times4$ matrices. In other words, we define $x\star y$ to be the $128$-bit chunk determined by
$$
M(x\star y)=M(x)M(y),
$$
where on the right hand side we perform the usual matrix multiplication with all arithmetic on the entries done modulo $2^{24}$. It is easy to check that the product $M(x)M(y)$ is of the same form for all pairs $x,y\in[0,2^{128}-1]$.
You get "more non-commutativity" by using larger matrices. The method is to split those
128 bits into equal size chunks and fill the upper triangular part with them. One chunk will necessarily be smaller than the others, and you should put that into the $(1,2)$ position, padded with an appropriate number of zeros. Other variations with more complicated zero padding schemes are also possible.
The real question may be whether this is non-abelian enough? Using large chunks leaves a lot of linearity that an attacker may be able to exploit. For example, in the above example if $x$ and $y$ both have the highest $56$ bits all zeros, then
$$
\left(\begin{array}{cccc}
1&0&0&x_3\\
0&1&0&x_2\\
0&0&1&x_1\\
0&0&0&1\end{array}\right)\left(\begin{array}{cccc}
1&0&0&y_3\\
0&1&0&y_2\\
0&0&1&y_1\\
0&0&0&1\end{array}\right)
=\left(\begin{array}{cccc}
1&0&0&x_3+y_3\\
0&1&0&x_2+y_2\\
0&0&1&x_1+y_1\\
0&0&0&1\end{array}\right).
$$
In other words, we only have three additions modulo $2^{24}$. May be such problems, if at all relevant, should go into a follow up question?
I'm not sure, but the comments underneath leave nagging fears. Here's what the operation comes to
$$
\left(
\begin{array}{cccc}
1 & 2^{16} (x_6+y_6) & x_5+2^{16} x_6 y_4+y_5 &
x_3+x_5 y_1+2^{16} x_6y_2+y_3 \\
0 & 1 & x_4+y_4 & x_2+x_4 y_1+y_2 \\
0 & 0 & 1 & x_1+y_1 \\
0 & 0 & 0 & 1 \\
\end{array}
\right).
$$
So if $x\star y=z$, then
$$
\begin{aligned}
z_6&\equiv x_6+y_6\pmod{2^8},\\
z_5&\equiv x_5+y_5+2^{16}x_6y_4\pmod{2^{24}},\\
z_4&\equiv x_4+y_4\pmod{2^{24}},\\
z_3&\equiv x_3+y_3+x_5y_1+2^{16}x_6y_2\pmod{2^{24}},\\
z_2&\equiv x_2+y_2+x_4y_1\pmod{2^{24}},\\
z_1&\equiv x_1+y_1\pmod{2^{24}}.
\end{aligned}
$$
You see that the various parts $z_j$ are only affected by a limited number of parts of the factors $x$ and $y$.