
It's a well-known result of Gauss that, for $n \geq 3$, $(\mathbb{Z}/2^n \mathbb{Z})^\times = \langle -1 \rangle \times \langle 3 \rangle \cong C_2 \times C_{2^{n-2}}$.

Consider the faithful action $\mathrm{mul}: (\mathbb{Z}/2^n \mathbb{Z})^\times \to \mathrm{Sym}(\mathbb{F}_2^n)$ obtained by representing each element of $\mathbb{Z}/2^n \mathbb{Z}$ by the vector of its binary digits in $\mathbb{F}_2^n$. One can verify that, among the maps in the image of this action, only $\mathrm{mul}_3^0$ and $\mathrm{mul}_3^{2^{n - 3}}$ (that is, multiplication by $1$ and by $1 + 2^{n-1}$, for $n \geq 4$) are $\mathbb{F}_2$-linear.
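
The linearity claim can be checked by brute force for small $n$. The following is a minimal sketch, not part of the original argument: the function name `linear_multipliers` is made up for illustration, and integers are used as bit vectors with bit $i$ standing for the $i$-th binary digit. A map fixing $0$ is linear exactly when its value at $x$ is the XOR of its values on the basis vectors $2^i$ selected by the bits of $x$.

```python
def linear_multipliers(n):
    """Return the odd a for which x -> a*x mod 2^n is F_2-linear on n-bit words."""
    mask = (1 << n) - 1
    result = []
    for a in range(1, 1 << n, 2):
        # Images of the basis vectors 2^i under multiplication by a.
        images = [(a << i) & mask for i in range(n)]
        linear = True
        for x in range(1 << n):
            expected = 0
            for i in range(n):
                if (x >> i) & 1:
                    expected ^= images[i]
            if (a * x) & mask != expected:
                linear = False
                break
        if linear:
            result.append(a)
    return result


if __name__ == "__main__":
    for n in range(4, 9):
        top = 1 + (1 << (n - 1))
        # 3^(2^(n-3)) mod 2^n should be the only nontrivial linear multiplier.
        print(n, linear_multipliers(n), pow(3, 1 << (n - 3), 1 << n) == top)
```

The range starts at $n = 4$ because for $n = 3$ the nontrivial linear multiplier is $5$, which is not a power of $3$ modulo $8$.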

On the other hand, let $n$ be a power of $2$, and consider the linear shift operator $L \in \mathrm{End}_{\mathbb{F}_2}(\mathbb{F}_2^n)$, $(x_1, x_2, \ldots, x_n) \mapsto (0, x_1, x_2, \ldots, x_{n-1})$. It generates a commutative algebra $\mathbb{F}_2[L] \cong \mathbb{F}_2[t]/\langle t^n \rangle \cong \mathbb{F}_2[C_n]$ (related to the CPU instructions for carryless multiplication), with the group of units that's very easy to understand (see https://mathoverflow.net/q/138025/44739, which references Algebraic K-Theory by Hyman Bass, although the proof of the structure proposition in it only uses normal techniques of abstract algebra).

Proposition (Bass): For any prime $p$, we have a group isomorphism $(\mathbb{F}_p[t] / \langle t^{p^n} \rangle)^\times \cong \mathbb{F}_p^\times \times \prod_{j \in J} \langle 1 + t^j \rangle$, where $J = \{0 < j < p^n \mid (j, p) = 1 \}$, and $\langle 1 + t^j \rangle \cong C_{p^{e(j)}}$, where $e(j)$ is defined by $p^{n - e(j)} \leq j < p^{n - e(j) + 1}$.

In computer science terms, $(\mathbb{F}_2[L])^\times$ is generated by the XOR-shifts $x \mapsto x \oplus (x \mathbin{\ll} j)$ for all odd $j < n$ (when $n$ is a power of $2$).
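
As a small numerical illustration (a sketch with made-up helper names, not taken from the references), the order of each XOR-shift can be computed by iterating it on the word $1$. This works because the map is multiplication by the unit $1 + t^j$, so its $k$-th iterate sends $1$ to $(1 + t^j)^k$.

```python
def xorshift(x, j, n):
    """Apply x -> x ^ (x << j), truncated to n bits."""
    return (x ^ (x << j)) & ((1 << n) - 1)


def xorshift_order(j, n):
    """Order of the XOR-shift by j on n-bit words, i.e. of 1 + t^j in F_2[t]/(t^n)."""
    x, k = xorshift(1, j, n), 1
    while x != 1:
        x = xorshift(x, j, n)
        k += 1
    return k


if __name__ == "__main__":
    n = 8
    orders = {j: xorshift_order(j, n) for j in range(1, n, 2)}
    print(orders)  # orders of 1 + t^j for odd j < 8
    product = 1
    for k in orders.values():
        product *= k
    print(product == 1 << (n - 1))  # compare with the order of the unit group
```

For $n = 8$ the orders multiply to $2^7$, the order of the unit group, which is consistent with the proposition but does not by itself prove that these elements generate.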

Many non-cryptographic hash functions are of the form $m_1 \circ c_1 \circ \ldots \circ m_k \circ c_k$, where $m_i \in \mathrm{mul}((\mathbb{Z}/2^n \mathbb{Z})^\times)$ and $c_i \in (\mathbb{F}_2[L])^\times$. As far as I know, very little is understood about their algebraic properties. There is a hobbyist project (https://github.com/skeeto/hash-prospector) to find hash functions of this form that are numerically close to satisfying the Strict Avalanche Criterion (see https://en.wikipedia.org/wiki/Avalanche_effect), but it proceeds by heuristic search, and while it has been successful for $n = 32$, the search space for $n = 64$ is just too vast.
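
To make the shape $m_1 \circ c_1 \circ m_2 \circ c_2$ concrete, here is a toy sketch for $n = 16$. The multipliers and shift amounts are arbitrary values chosen for illustration only (they are not taken from hash-prospector or any real hash), and left shifts are used to match the operator $L$ above.

```python
N = 16
MASK = (1 << N) - 1


def mul(a):
    """Multiplication by an odd constant modulo 2^N (an element of mul((Z/2^N)^x))."""
    if a % 2 == 0:
        raise ValueError("multiplier must be odd to be invertible")
    return lambda x: (a * x) & MASK


def xorshift(j):
    """The unit 1 + L^j of F_2[L], acting as x -> x ^ (x << j)."""
    return lambda x: (x ^ (x << j)) & MASK


def compose(*fs):
    """Return f_1 o f_2 o ... o f_k, so the rightmost map is applied first."""
    def h(x):
        for f in reversed(fs):
            x = f(x)
        return x
    return h


# Hypothetical constants, for illustration only.
toy_hash = compose(mul(0x2F35), xorshift(7), mul(0x9E3B), xorshift(3))

if __name__ == "__main__":
    images = {toy_hash(x) for x in range(1 << N)}
    print(len(images) == 1 << N)  # a composition of permutations is a permutation
```

Every such composition fixes $0$ and is a bijection of the $N$-bit words, so the search is over elements of the subgroup discussed next.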

I'd like to see if there is a way to reduce the search space, which boils down to understanding the structure of the subgroup $\langle \mathrm{mul}((\mathbb{Z} / 2^n \mathbb{Z})^\times) \cup (\mathbb{F}_2[L])^\times \rangle$ of $\mathrm{Sym}(\mathbb{F}_2^n)$, but I don't see any obvious way forward (other than quotienting out the common subgroup generated by $\mathrm{mul}_3^{2^{n-3}}$). I'll be happy even just understanding smaller subgroups such as $\langle \mathrm{mul}_3, 1 \oplus L^j \rangle$ for some odd $j$. My main obstacle is just that the algebraic normal form of $\mathrm{mul}_3$ is difficult to work with.

Can you suggest techniques that are worth exploring, please? Book and paper suggestions are also very welcome!

UPD: I incorrectly identified $\langle \mathrm{mul}((\mathbb{Z} / 2^n \mathbb{Z})^\times) \cup (\mathbb{F}_2[L])^\times \rangle$ with $(\mathbb{Z} / 2^n \mathbb{Z})^\times \coprod_{\mathrm{Sym}(\mathbb{F}_2^n)} (\mathbb{F}_2[L])^\times$ in the earlier version of the question, sorry about that.

  • @tkf Thanks for taking a look! I meant the case where $L$ represents multiplication by $2$. I'm not sure that I follow your argument. The units of $\mathbb{F}_2[L]$ can be identified with $1 + \langle L \rangle$, for example $1 \oplus L \oplus L^2$ is such a unit. The units of $\mathbb{Z}/ 2^n \mathbb{Z}$ and $\mathbb{F}_2[L]$ cannot be the same, because although these groups have the same order, they have different classification. Also, $\mathbb{F}_2[L]^\times \leq \mathrm{GL}_n(\mathbb{F}_2)$, but only two elements of $(\mathbb{Z} / 2^n \mathbb{Z})^\times$ are $\mathbb{F}_2$-linear. – Aleksei Averchenko May 12 '24 at 15:24
  • $1+\langle L\rangle$ is just the odd numbers in $\mathbb{Z}/2^n\mathbb{Z}$, which are indeed the units. – tkf May 12 '24 at 15:27
  • Sorry - I see - the issue is carrying. – tkf May 12 '24 at 15:32
  • @tkf Yep, the carrying ruins my day big time :) So far the only framework I know of that fits all of $L$, $a + b \mod 2^n$, and $a \oplus b$ is that of T-Functions introduced by Klimov and Shamir (https://link.springer.com/content/pdf/10.1007/3-540-36400-5_34.pdf), with a related result by Warren (https://dl.acm.org/doi/pdf/10.1145/359605.359632). I'm trying to pursue it, but I'm not positive that it will yield significant progress just yet. – Aleksei Averchenko May 12 '24 at 15:37

1 Answer

You have obviously thought deeply about this problem. I hope my answer here provides some lines of thought that you have not already come across. Apologies if not.

We may place the standard $2$-adic metric on $\mathbb{F}_2^n$. That is, if two sequences of $0$'s and $1$'s agree in the first $k$ places, but not any further, then the distance between them is $\frac1{2^k}$. Let $G_n$ denote the group of isometries of this metric space.

The actions of both $(\mathbb{Z}/2^n\mathbb{Z})^\times$ and $(\mathbb{F}_2[t]/\langle t^n\rangle)^\times$ on $\mathbb{F}_2^n$ are isometries. To see this, note that when we make the natural identification of either ring with $\mathbb{F}_2^n$, the distance between two elements is determined by the greatest power of $2$ (in $\mathbb{Z}/2^n\mathbb{Z}$) or $t$ (in $\mathbb{F}_2[t]/\langle t^n\rangle$) dividing the difference. In both cases this power is not altered by multiplication by a unit.

Let $H_n=\langle (\mathbb{Z}/2^n\mathbb{Z})^\times\cup (\mathbb{F}_2[t]/\langle t^n\rangle)^\times\rangle\subset {\rm Sym} (\mathbb{F}_2^n)$ be the group we are seeking to understand. We have shown that $$H_n\subseteq G_n.$$
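
The isometry property is easy to test numerically for small $n$. The sketch below uses hypothetical helper names and reads digits least significant bit first; it checks every odd multiplier and every unit of $\mathbb{F}_2[t]/\langle t^n\rangle$ (acting by carry-less multiplication), and also shows a right XOR-shift failing the test.

```python
def agree(x, y, n):
    """Number of leading places (least significant bit first) where x and y agree."""
    d = x ^ y
    return n if d == 0 else (d & -d).bit_length() - 1


def clmul(u, x, n):
    """Carry-less product of u and x, truncated to n bits (multiplication mod t^n)."""
    r = 0
    for i in range(n):
        if (u >> i) & 1:
            r ^= x << i
    return r & ((1 << n) - 1)


def is_isometry(f, n):
    """Check that f preserves the number of agreeing leading places for all pairs."""
    pts = range(1 << n)
    return all(agree(f(x), f(y), n) == agree(x, y, n) for x in pts for y in pts)


def generators_are_isometries(n):
    """Check all odd multipliers and all units of F_2[t]/(t^n)."""
    mask = (1 << n) - 1
    for a in range(1, 1 << n, 2):
        if not is_isometry(lambda x: (a * x) & mask, n):
            return False
        if not is_isometry(lambda x: clmul(a, x, n), n):
            return False
    return True


if __name__ == "__main__":
    print(generators_are_isometries(5))
    # A right XOR-shift mixes high digits into low ones, so it is not an isometry.
    print(is_isometry(lambda x: x ^ (x >> 1), 5))
```

The distance $2^{-k}$ is encoded by the integer $k$ returned by `agree`, so preserving `agree` is the same as preserving the metric.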

The groups $G_n$ can be understood in a number of ways. For example inductively: $$G_{n+1}=(G_n\times G_n)\rtimes C_2,$$ where the conjugation action of $C_2$ is swapping the two factors of $G_n$. To see this note that we have a homomorphism $G_{n+1}\to C_2$ sending an isometry to its action on the first digit. The kernel of this map consists of ordered pairs: an isometry on the set of sequences beginning with $0$, and an isometry on the set of sequences beginning with $1$.

Thus we know that $|G_n|=2^{2^n-1}$. Another approach is to consider the homomorphism $G_{n+1}\to G_n$ given by 'forgetting' the action on the last digit. The kernel of this homomorphism consists of all isometries that fix the first $n$ digits. For each $n$-digit sequence, we can either swap its two $(n+1)$-digit extensions or not, so this kernel is $C_2^{2^n}$. Thus we have the short exact sequence: $$1\to C_2^{2^n}\to G_{n+1}\to G_n\to 1.$$

Note as a sanity check we have: $|C_2^{2^n}|\cdot|G_n|=2^{2^n}\cdot 2^{2^n-1}=2^{2^{n+1}-1}=|G_{n+1}|$.
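
For very small $n$ the order of $G_n$ can also be confirmed by exhaustive enumeration. This is a brute-force sketch (the function name is made up, and it is only feasible up to $n = 3$, where there are $8! = 40320$ permutations to test).

```python
from itertools import permutations


def count_isometries(n):
    """Count permutations of the n-bit words that preserve the 2-adic distance."""
    size = 1 << n

    def agree(x, y):
        # Number of leading places (least significant bit first) where x, y agree.
        d = x ^ y
        return n if d == 0 else (d & -d).bit_length() - 1

    pairs = [(x, y) for x in range(size) for y in range(x + 1, size)]
    count = 0
    for p in permutations(range(size)):
        if all(agree(p[x], p[y]) == agree(x, y) for x, y in pairs):
            count += 1
    return count


if __name__ == "__main__":
    for n in range(1, 4):
        # Compare the brute-force count with the formula 2^(2^n - 1).
        print(n, count_isometries(n), 2 ** (2 ** n - 1))
```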

Now let us return to the main question - what is $H_n$? We know $H_n$ is contained in an index $4$ subgroup of $G_n$: namely the index $4$ subgroup which either fixes the first $2$ digits, or acts as multiplication by $3\in (\mathbb{Z}/2^n\mathbb{Z})^\times $ (or equivalently $1+t\in \mathbb{F}_2[t]/\langle t^n\rangle$) on them.

Thus, in particular we know that $|H_n|=2^m$, with $m\leq 2^n-3$.

Further, we know that $H_n$ acts separately on strings starting with a $0$ and strings starting with a $1$. Thus, forgetting the action on strings starting with a $0$, we get a homomorphism: $$f\colon H_n\to G_{n-1}.$$

Question 1 Is the homomorphism $f$ surjective?

Question 2 Is the homomorphism $f$ injective?

If the answer to both questions is yes, then we have identified $H_n$ as precisely $G_{n-1}$. For Question 2, the answer being yes corresponds to the action of an element of $H_n$ being completely determined by its action on strings beginning with a $1$. The only evidence we can offer for this is that it holds for elements of both $(\mathbb{Z}/2^n\mathbb{Z})^\times$ and $(\mathbb{F}_2[t]/\langle t^n\rangle)^\times$. Of course this does not imply that it holds for compositions of such elements.

For Question 1 we can say a little more. We can verify that the answer is yes up to $n=3$, and make an attempt at an inductive proof that the answer is yes. Note that the answer to Question 1 being yes would imply that $H_n\subset G_n$ and $H_n$ has $G_{n-1}$ as a quotient, effectively trapping it between two known groups.

$H_1$ is the trivial group. $H_2$ is generated by the swapping of the two sequences beginning with $1$ as this coincides with both $(\mathbb{Z}/2^2\mathbb{Z})^\times$ and $(\mathbb{F}_2[t]/\langle t^2\rangle)^\times$ acting on $\mathbb{F}_2^2$. Thus the map $f$ is surjective for $n=2$.

For $n=3$ the inverse of the action of $(1+t)$ composed with the action of $3$ fixes all sequences of length $3$, except for $101$ and $100$, which it swaps. Conjugating this swap by the action of $3$ (or $1+t$) we can swap only the other pair of sequences which begin with a $1$.
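
The $n=3$ computation can be reproduced with explicit permutation tables. In this sketch, strings are written least significant digit first, and the composition is read as "apply the inverse of $1+t$ first, then multiply by $3$"; that reading order is an assumption about the convention intended above.

```python
N = 3
SIZE = 1 << N
MASK = SIZE - 1


def digits(x):
    """Binary digits of x, least significant first, as a string."""
    return "".join(str((x >> i) & 1) for i in range(N))


mul3 = [(3 * x) & MASK for x in range(SIZE)]          # action of 3
shift = [(x ^ (x << 1)) & MASK for x in range(SIZE)]  # action of 1 + t
shift_inv = [shift.index(x) for x in range(SIZE)]
mul3_inv = [mul3.index(x) for x in range(SIZE)]

# g = (action of 3) after (inverse action of 1 + t)
g = [mul3[shift_inv[x]] for x in range(SIZE)]
moved = {digits(x): digits(g[x]) for x in range(SIZE) if g[x] != x}

# Conjugating g by the action of 3 moves the swap to the other pair.
h = [mul3[g[mul3_inv[x]]] for x in range(SIZE)]
moved_h = {digits(x): digits(h[x]) for x in range(SIZE) if h[x] != x}

if __name__ == "__main__":
    print(moved)    # the single swapped pair
    print(moved_h)  # the other pair of strings beginning with 1
```

With the opposite order of composition (multiply by $3$ first), the swapped pair is $110$ and $111$ instead, so either convention yields both swaps after conjugation.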

As we already know that $f$ is surjective for $n=2$, we know that elements of $H_3$ can induce any element of $G_2$ up to action on the final digit (that is, swapping pairs of sequences which begin with a $1$ and differ only in the final digit). However we have now shown that $H_3$ also contains these actions on the final digit. Thus $f$ is surjective for $n=3$.

Finally we attempt to generalise this to an inductive proof that $f$ is always surjective. Suppose that $f$ is surjective for strings of length $n-1$. Then we know that $H_n$ can induce any element of $G_{n-1}$ on strings beginning with a $1$, up to swapping some final digits. Thus to show that $f$ is surjective on strings of length $n$, we must show that $H_n$ contains an element that fixes the first $n-1$ digits of strings beginning with a $1$, and flips the final digit on any desired subset of pairs of strings that start with a $1$ and agree on the first $n-1$ digits.

First consider the actions of $x=(1+t^{n-2})$ and $y=(1+2^{n-2})$. On strings beginning with a $1$, these actions differ only in the final digit, and only when the second to last digit is a $1$. Thus $xy^{-1}\in H_n$ fixes the first $n-1$ digits of all sequences beginning with a $1$, and flips the final digit of all such sequences which have a $0$ as second to last digit.
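
The behaviour of $xy^{-1}$ on strings beginning with a $1$ can be checked directly. The sketch below (hypothetical function name; requires Python 3.8+ for the modular inverse via `pow`) applies $y^{-1}$ and then $x$ to every odd $n$-bit word and compares the result with the predicted flip of the final digit.

```python
def xy_inverse_flips_top_digit(n):
    """Check the claimed action of x*y^{-1} on odd n-bit words, for n >= 4."""
    mask = (1 << n) - 1
    y = 1 + (1 << (n - 2))
    y_inv = pow(y, -1, 1 << n)  # inverse of 1 + 2^(n-2) modulo 2^n
    for s in range(1, 1 << n, 2):
        t = (y_inv * s) & mask               # apply y^{-1}
        image = (t ^ (t << (n - 2))) & mask  # then apply x = 1 + t^(n-2)
        second_to_last = (s >> (n - 2)) & 1
        # Predicted: flip the final digit exactly when the second to last is 0.
        expected = s if second_to_last else s ^ (1 << (n - 1))
        if image != expected:
            return False
    return True


if __name__ == "__main__":
    print(all(xy_inverse_flips_top_digit(n) for n in range(4, 11)))
```

The check starts at $n = 4$, where the first two digits and the last two digits are distinct positions.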

By the inductive hypothesis, we have an element $z\in H_n$ which flips the second to last digit of the four sequences beginning $1$ followed by any specified sequence $\alpha$ of $n-3$ digits, and fixes all other strings.

Then $z(xy^{-1})z^{-1}(xy^{-1})\in H_n$ will flip the final digit on the two pairs of strings beginning $1$ followed by $\alpha$ (as all the other flips are common to both $z(xy^{-1})z^{-1}$ and $(xy^{-1})$, hence cancel). Unfortunately this is as far as I can get. If we could flip the final digits of any single pair of sequences beginning with $1$ and agreeing on the first $n-1$ digits, then we would have completed the inductive step. However I have only been able to show that this can be done $2$ at a time - for a pair of such pairs, differing in the second to last digit.


Proof that $f$ is injective:

Let $L$ denote the shift operator on sequences in $\mathbb{F}_2^n$, corresponding to multiplication by $2$ and multiplication by $t$. As $L$ commutes with all elements of $(\mathbb{Z}/2^n\mathbb{Z})^\times$ and all elements of $(\mathbb{F}_2[t]/\langle t^n\rangle)^\times$, we know that $L$ commutes with all elements of $H_n$.

Suppose $f(g)=f(g')$. Every element of $H_n$ fixes the zero string, so take a nonzero $x\in \mathbb{F}_2^n$ and write $x=L^ru$, where the first digit of $u$ is $1$. Then $$gx=gL^ru=L^rgu=L^rg'u=g'L^ru=g'x,$$ so $g=g'$.


From comments below by @AlekseiAverchenko we know $f$ is not surjective in general. In particular, not for $n=4$:

We have $|H_1|=2^0, |H_2|=2^1, |H_3|=2^3, |H_4|=2^6$.

For $n=1,2,3$ we have $H_n\cong G_{n-1}$. However $|G_3|=2^7$, so $H_4\not\cong G_3$. In general, $f$ embeds $H_n$ as a subgroup of $G_{n-1}$.
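
These orders, and the injectivity of $f$, can be reproduced for $n \leq 4$ by closing the generators under composition. This is a brute-force sketch with made-up names; it uses every odd multiplier and every unit of $\mathbb{F}_2[t]/\langle t^n\rangle$ as generators, which is more than necessary but avoids choosing a generating set.

```python
def h_order_and_injectivity(n):
    """Return (|H_n|, whether restriction to strings beginning with 1 is injective)."""
    size = 1 << n
    mask = size - 1

    def clmul(u, x):
        # Carry-less product truncated to n bits (multiplication mod t^n).
        r = 0
        for i in range(n):
            if (u >> i) & 1:
                r ^= x << i
        return r & mask

    gens = set()
    for a in range(1, size, 2):
        gens.add(tuple((a * x) & mask for x in range(size)))
        gens.add(tuple(clmul(a, x) for x in range(size)))

    identity = tuple(range(size))
    seen = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = tuple(s[g[x]] for x in range(size))  # s applied after g
            if h not in seen:
                seen.add(h)
                frontier.append(h)

    # f: restrict each element to odd words (strings beginning with 1).
    restrictions = {tuple(g[x] for x in range(1, size, 2)) for g in seen}
    return len(seen), len(restrictions) == len(seen)


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, h_order_and_injectivity(n))
```

The same closure becomes expensive quickly as $n$ grows, since the number of elements stored can approach $|G_{n-1}| = 2^{2^{n-1}-1}$.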

tkf
  • Thanks a ton! I haven't thought about 2-adics at all, this looks very promising indeed! – Aleksei Averchenko May 16 '24 at 18:11
  • Thanks for the bonus. There is still a bit of work to do here and I will keep thinking about it and try to tidy up the latter part of the answer. – tkf May 16 '24 at 19:16
  • So far I reproduced your preliminary results about the general 2-adic isometries. I also just realized that 2-adic isometries are precisely the invertible T-functions $\mathbb{F}_2^n \to \mathbb{F}_2^n$ (by the counting argument: we start with $1$, then add some $g: \mathbb{F}_2^{k-1} \to \mathrm{Sym}(\mathbb{F}_2)$ at each iteration), which is really cool. But I also realized that after studying $H_n$, I'll also need to study its version with $(\mathbb{F}_2[R])^\times$ instead of $(\mathbb{F}_2[L])^\times$ ($R = L^T$) :( – Aleksei Averchenko May 23 '24 at 03:38
  • Alright, I think I'm at a point where I understand what $f: H_n \to G_{n-1}$ is. I'll try to examine the case $n = 8$ on the weekend (it's the smallest $n$ such that $-1$ is not a power of $3$ $\mod 2^n$, so if the small-$n$ intuition fails, my guess is it's very likely to fail there). – Aleksei Averchenko May 23 '24 at 22:50
  • Do you mean $n=3$? There $f$ is surjective but I am not sure if it is injective. – tkf May 24 '24 at 01:13
  • I checked and the order of $f(H_4)$ is 64, it's generated by $a = f(\mathrm{mul}_3) = (1,3,9,11) (15,13,7,5)$, $b = f(\mathrm{mul}_{-1}) = (1,15) (3,13) (5,11) (7,9)$, $c = f(1 \oplus L) = (1,3,5,15) (7,9,11,13)$. I studied $\langle a, c\rangle$ by hand, its relations are $a^4 = c^4 = 1$, $ca = a^3 c^3$, $c^2 a = a c^2$, $c^3 a = a^3 c$. Its center is $\langle a^2, c^2\rangle \cong C_2^2$, and $\mathrm{Inn}(\langle a, c\rangle) \cong C_2^2$, but the extension isn't split, I haven't worked it out yet. I double checked the order of 16 in GAP, and also computed the order of $f(H_4)$ there. – Aleksei Averchenko Jun 02 '24 at 06:28
  • Since $|G_3| = 128$, $f$ isn't surjective. I haven't worked out if it's injective just yet. I believe that there should be a nicer ambient group: $G_{n-2} \times G_{n-2} \times G_{n-1} < G_n$ has index 4, but I suspect that in fact $H_n < G_1 \times \cdots \times G_{n-1}$ (it's certainly the case for $H_4$). Note that since $G_0 = 1$, $G_1 \cong C_2$, $G_2 \cong D_4$, it makes sense that $f: H_3 \to G_2 \cong D_4$ has to be surjective (we have to fit $C_4$ and $C_2$ not commuting with it inside $C_2 \times C_2 \times D_4$), but in hindsight it's clear that it's a low order fluke. – Aleksei Averchenko Jun 02 '24 at 07:09
  • On reflection, it is clear that $f$ is always injective. I added a proof at the end. Good job realising that surjectivity was just a low order fluke. – tkf Jun 04 '24 at 05:52
  • The elements $c, bc^{-1}bac^{-1}$ generate a copy of $D_4$, and the elements $cbcb,c^2ba,a^2$ generate a copy of $C_2^3$. $H_4$ is the semidirect product of these subgroups: $H_4=C_2^3\rtimes D_4$. This generalises, in that for any $n$, $H_n$ is an extension of $H_{n-1}$ by $C_2^k$ for some integer $k$. I am not sure if it always splits like it does for $n=4$ though, or what $k$ is in general. – tkf Jun 05 '24 at 16:31
  • Thanks, I double checked it. Out of curiosity, how did you come up with these generators? – Aleksei Averchenko Jun 14 '24 at 08:18
  • I am forgetting already, but I think it was a combination of your analysis of $D_4$ in terms of cycles in your comment, combined with my analysis of the kernel $C_2^3$ in the main answer above (where I tried and failed to prove surjectivity). – tkf Jun 14 '24 at 18:07