
Let $\mathcal{C}$ be a finite set, let $\mathcal{F}(\mathcal{C})$ be the free (non-abelian) monoid over $\mathcal{C}$, and let $n\in\mathbb{N}$ be an integer. For every $k\in \mathbb{N}$, write $S_k$ for the full symmetric group of degree $k$. Formally my question is whether the following statement holds:

There exists an integer $N\in \mathbb{N}$ such that for every word $\alpha\in \mathcal{F}(\mathcal{C})$ of length at least $N$ there exist an integer $k\ge 1$, a word $\gamma\in \mathcal{F}(\mathcal{C})$ of length $k$, and permutations $\sigma_1,\dots, \sigma_n\in S_k$ such that $$\alpha =\beta\cdot \sigma_1(\gamma)\sigma_2(\gamma)\cdots \sigma_n(\gamma)\cdot \delta,$$ for certain $\beta,\delta\in\mathcal{F}(\mathcal{C})$.

What I mean to ask is: if a string is sufficiently long, must it contain $n$ consecutive blocks that all consist of the same symbols, placed in any order? For example, the string $$ babcbaca$$ can be split as $$b|abc|bac|a,$$ where $abc$ and $bac$ are permutations of one another. If $\mathcal{C}=\{a,b,c\}$, one can show that any string of length 10 or more contains at least 2 consecutive permutations of the same string, but checking this becomes exponentially more tedious as the size of the alphabet and $n$ increase.
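For small cases the claim can be checked by brute force. Below is a minimal sketch, assuming words are plain Python strings; the function names `find_abelian_power` and `all_words_contain_power` are made up for this illustration. It searches a word for $n$ consecutive blocks of equal length that are permutations of one another, and then tests every word of a given length.

```python
from collections import Counter
from itertools import product


def find_abelian_power(word, n):
    """Return (start, k) of the first run of n consecutive blocks of
    length k >= 1 that are permutations of one another, or None."""
    length = len(word)
    for start in range(length):
        # Largest block length that still fits n blocks after `start`.
        for k in range(1, (length - start) // n + 1):
            blocks = [word[start + q * k: start + (q + 1) * k] for q in range(n)]
            first = Counter(blocks[0])
            if all(Counter(b) == first for b in blocks[1:]):
                return start, k
    return None


def all_words_contain_power(alphabet, length, n):
    """Brute-force check over every word of the given length."""
    return all(
        find_abelian_power("".join(w), n) is not None
        for w in product(alphabet, repeat=length)
    )


print(find_abelian_power("babcbaca", 2))       # the split b|abc|bac|a
print(all_words_contain_power("abc", 10, 2))   # the length-10 claim for {a,b,c}
```

The search is exponential in the word length, so it is only useful for sanity checks on small alphabets and small $n$.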

I was wondering if anyone knows a proof (or counter-example) for this, or has any suggestions on where to search?

  • Your question is ambiguous: is $k$ fixed in advance? Your statement doesn't seem to imply that it is, but your phrase "a repetition of a given number (presumably $k$) of symbols" does. Or did you mean "a given number ($n$) of permutations of a set of symbols"? – TonyK Jun 26 '22 at 16:51
  • @TonyK My apologies for the confusion, I'll rewrite it in hopes to remove said ambiguity. $k$ is not fixed in advance, and differs with $\alpha$. – Some Math Student Jun 26 '22 at 17:01
  • The question can be rephrased as follows: given a walk on $\mathbb Z_+^m$ with steps increasing one coordinate by $1$, does it have to visit an arithmetic progression $\{a+kb:k=0,\dots,n\}$ with $a,b\in \mathbb Z_+^m$, $b\ne 0$, of length $n+1$? I believe a counterexample can be constructed for large $m,n$ but some details still escape me. – fedja Jun 26 '22 at 23:05

2 Answers


We shall construct a counterexample for an alphabet of $2$ letters (say, $a$ and $b$) and sufficiently large $n$ (one can compute an explicit value from the various inequality constraints we will impose, but it is surely far from optimal; in fact I suspect that $n=5$ is enough, though $n=2$ probably is not).

Let $Q$ be a large odd multiple of $3$ to be chosen later. We shall construct by induction some non-empty sets $S_k$ of strings of length $Q^k$ with the property that for every $k\ge 1$,

(a) Each $s\in S_{k+1}$ is a concatenation of $Q$ strings from $S_{k}$;

(b) If $s\in S_{k+1}$, there is no $T\in\{1,\dots,Q^{k}-1\}$ such that some finite arithmetic progression of integers $t+ qT:\ q=0,\dots,n$ with difference $T$ hits every one of the substrings in property (a) and has the same number of $a$'s in $(t+qT,t+(q+1)T]$ for every $q=0,\dots, n-1$;

(c) If we view $S_k$ as a probability space (with all elements having the same probability $\frac 1{|S_k|}$) and the number $\xi=\xi(s)$ of $a$'s in the string $s\in S_k$ as a random variable, then for every integer $m$, one has $\mathcal P(\xi=m)\le 2^{-k}$.
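For two letters, the condition in (b) is exactly the requested pattern in disguise. A short sketch, where the counting function $A$ and the constant $c$ are notation introduced here only for illustration: let $A(i)$ be the number of $a$'s among the first $i$ letters of $s$. Two blocks of the same length $T$ over $\{a,b\}$ are permutations of each other if and only if they contain the same number of $a$'s, so the blocks $(t+qT,t+(q+1)T]$, $q=0,\dots,n-1$, are pairwise permutations of one another if and only if $$A\bigl(t+(q+1)T\bigr)-A\bigl(t+qT\bigr)=c\quad (q=0,\dots,n-1),\qquad\text{i.e.}\qquad A(t+qT)=A(t)+qc\quad (q=0,\dots,n),$$ for some integer $c\in\{0,\dots,T\}$. In other words, the counts $A$ sampled along the progression $t+qT$ themselves form an arithmetic progression.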

The main claim is that if $s\in S_k$, then it cannot contain the requested pattern for $n\ge 2Q^2$. Indeed, if it does, then let $t$ be the position of the last element of $\beta$ and let $T$ be the length of $\gamma$ (in the OP notation). Choose an integer $k$ so that $Q^{k-1}\le T<Q^k$. Clearly, $k\ge 1$ and the arithmetic progression $t+qT$ cannot jump over any $Q$-adic block of length $Q^k$. Moreover, since $T\ge Q^{k-1}$, no matter where we started, in at most $Q^2$ steps we'll arrive at the beginning of some $Q$-adic block of length $Q^{k+1}$ and then in $Q^2$ more steps we'll go over all of it, thus creating an arithmetic progression that was ruled out by property (b) (observe that by property (a), every $Q$-adic block of $s$ of length $Q^{k+1}$ is in $S_{k+1}$).

Note that we haven't used property (c) in the main claim at all and it is, indeed, not needed for the conclusion. However, it is crucial for making the induction step work.

For $k=1$, we shall just take $S_1$ to be the set of all strings of length $Q$. (a) and (b) are vacuous here and (c) immediately follows from the fact that $\mathcal P(\xi=m)=\mathcal P(\xi=Q-m)$ and $m\ne Q-m$ since $Q$ is odd (you can say much more, of course, but I am trying to keep it low-tech whenever possible).

Suppose that we have already built $S_k$. Consider first the set $\widetilde S_{k+1}$ of all concatenations of $Q$ strings in $S_k$. We need the following technical

Lemma. There exists $Q_0$ such that for every $Q\ge Q_0$ and every probability distribution $p$ on integers with $\|p\|_\infty\le\frac 12$, one has $\|p*p*\dots *p\|_\infty\le \frac 14 \|p\|_\infty$ where $Q$ is the number of $p$'s in the convolution.

The lemma is well-known and not terribly hard but the proof I have is a bit more high-tech than I would like it to be, so I'll wait a few days before posting it in the hope that someone will come up with a completely elementary proof.
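Until a proof is posted, the lemma can at least be probed numerically. The following is a minimal sketch, not a proof: it computes the $Q$-fold convolution exactly with rational arithmetic for one hypothetical test distribution (a fair coin on $\{0,1\}$, so that $\|p\|_\infty=\frac12$) and compares its sup-norm with $\frac14\|p\|_\infty$. The helper names `convolve` and `convolution_power` are made up for this illustration, and a single test distribution says nothing about the value of $Q_0$ needed for all $p$.

```python
from fractions import Fraction


def convolve(p, q):
    """Convolution of two finitely supported distributions given as lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def convolution_power(p, Q):
    """Q-fold convolution p * p * ... * p (Q copies of p)."""
    result = [Fraction(1)]  # point mass at 0, the identity for convolution
    for _ in range(Q):
        result = convolve(result, p)
    return result


# Assumed test case: a fair coin, so the sup-norm of p is 1/2.
p = [Fraction(1, 2), Fraction(1, 2)]
for Q in (9, 45):
    sup = max(convolution_power(p, Q))
    print(Q, float(sup), sup <= Fraction(1, 4) * max(p))
```

For this particular $p$ the convolution is the binomial distribution, whose largest value decays like $Q^{-1/2}$, which is why the inequality fails for small $Q$ and holds once $Q$ is large enough.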

Assuming the lemma, we conclude that the probability that the number of $a$'s in a random string from $\widetilde S_{k+1}$ is any given number does not exceed $\frac 14 2^{-k}$. We now just want to remove the bad strings $s$ from $\widetilde S_{k+1}$ to ensure (b). Note that we have $Q^k$ choices of the beginning of the bad progression (in the first $Q$-adic block of $s$ of length $Q^k$), fewer than $Q^k$ choices of $T$ and, for each $T$, only $T+1\le Q^k$ choices of the number of $a$'s in each interval $(t+qT,t+(q+1)T]$, thus giving us $\le Q^{3k}$ possible bad situations. Now in each particular bad situation, we can choose the intervals $(a_1, b_1],\dots, (a_{Q/3},b_{Q/3}]$, with $a_j$ and $b_j$ being some elements of our arithmetic progression in the $(3j-2)$-nd and $3j$-th $Q$-adic blocks of length $Q^k$ of our string $s\in \widetilde S_{k+1}$, and we know the exact number of $a$'s in each interval $(a_j,b_j]$. However, since we can insert an arbitrary string from $S_k$ as the $(3j-1)$-st $Q$-adic block of $s$, the probability that this number is just right in one particular $(a_j,b_j]$ is at most $2^{-k}$. Also, since the intervals $(a_j,b_j]$ are controlled by disjoint triples of $Q$-adic blocks of length $Q^{k}$, these events are independent, so the probability that a fixed bad situation happens is at most $2^{-k\frac Q3}$. Thus the probability that some bad situation happens is at most $$ 2^{-k\frac Q3}Q^{3k}=\left[2^{-\frac Q3}Q^{3}\right]^k\le\frac 12 $$ if $Q$ is so large that $2^{-\frac Q3}Q^{3}\le\frac 12$.
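To get a feeling for how large $Q$ must be for this last inequality alone, here is a small sketch that checks $2^{-\frac Q3}Q^{3}\le\frac 12$ in exact integer arithmetic over odd multiples of $3$. The function names and the search limit are assumptions of this illustration; the constraint $Q\ge Q_0$ from the lemma is separate and is not taken into account.

```python
def satisfies_bound(Q):
    """Check 2^(-Q/3) * Q^3 <= 1/2 exactly, for Q a multiple of 3."""
    assert Q % 3 == 0
    # Equivalent integer form of the inequality: 2 * Q^3 <= 2^(Q/3).
    return 2 * Q**3 <= 2 ** (Q // 3)


def smallest_admissible_Q(limit=1000):
    """Smallest odd multiple of 3 below `limit` that satisfies the bound."""
    for Q in range(3, limit, 6):  # 3, 9, 15, ... are the odd multiples of 3
        if satisfies_bound(Q):
            return Q
    return None


print(smallest_admissible_Q())  # 57
```

Since $3\ln Q-\frac Q3\ln 2$ is decreasing for $Q>9/\ln 2\approx 13$, every larger $Q$ satisfies the inequality as well.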

Thus we need to throw out at most one half of the full probability space $\widetilde S_{k+1}$, which raises the probability of any event by a factor of at most two. Since having any fixed number of $a$'s in $\widetilde S_{k+1}$ had probability $\le \frac 14 2^{-k}$, we conclude that in $S_{k+1}$ it can be at most twice as large, i.e., $\le 2^{-(k+1)}$, establishing (c).
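Spelled out, this is a conditional-probability estimate. Write $B\subset\widetilde S_{k+1}$ for the set of removed strings and $E_m=\{s\in\widetilde S_{k+1}:\xi(s)=m\}$ (both names are introduced here only for illustration), so that $S_{k+1}=\widetilde S_{k+1}\setminus B$ and $|B|\le\frac12|\widetilde S_{k+1}|$. Then $$\mathcal P_{S_{k+1}}(\xi=m)=\frac{|E_m\setminus B|}{|\widetilde S_{k+1}\setminus B|}\le\frac{|E_m|}{\frac12|\widetilde S_{k+1}|}=2\,\mathcal P_{\widetilde S_{k+1}}(\xi=m)\le 2\cdot\frac14\,2^{-k}=2^{-(k+1)}.$$ The same bound $|B|\le\frac12|\widetilde S_{k+1}|$ also shows that $S_{k+1}$ is non-empty.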

That's it (modulo the technical lemma, which, as I said, I'll post later).

mathworker21
fedja
  • Thank you for your answer. I don't fully understand the use/formulation of the lemma. What are the distribution $p$ and the number $Q$ in our scenario? And how will $k$ factor into this? – Some Math Student Jun 28 '22 at 15:30
  • @SomeMathStudent The distribution is that of the number of $a$'s in the string if we consider the set of strings we are looking at as a probability space with uniform measure (counting measure divided by the number of elements). $Q$ is just a large number, as declared (in particular, it is $\ge Q_0$, so the lemma applies). – fedja Jun 28 '22 at 16:13
  • @fedja Nice answer! I read through your answer (and fixed some typos) and still need to see how it compares to other constructions; see my answer below. – mathworker21 Aug 21 '22 at 20:09

This question has been studied. The key word is "Abelian power".

Definition: We say that a word $\alpha$ contains an Abelian $n^{\text{th}}$-power if there is a word $\gamma$ (of positive length) such that a permutation of it occurs $n$ consecutive times in $\alpha$; in other words, as you say, if there are words $\beta,\delta$ and permutations $\sigma_1,\dots,\sigma_n \in S_{|\gamma|}$ such that $\alpha = \beta \circ \sigma_1(\gamma)\circ\dots\circ\sigma_n(\gamma)\circ \delta$.

Because the quantifiers in your question are a bit ambiguous, I phrase the question below.

Question: Fix a finite alphabet and $n \ge 1$. Is it true that any sufficiently large word over the alphabet contains an Abelian $n^{\text{th}}$ power?

The answer is no. Fedja showed the answer is negative for the alphabet on two symbols for some large $n$. In fact, for the alphabet on two symbols, this was (first, I think) established in 1972 by Justin for $n=5$, see [1]. Dekking [2] in 1979 then showed that the best $n$ one can take is $n=4$; small examples rule out $n=3$, and he gave a (decently explicit) construction for $n=4$.


[1] J. Justin, Characterization of the repetitive commutative semigroups, Journal of Algebra, Volume 21, Issue 1, April 1972, Pages 87-90.

[2] F. M. Dekking, Strongly non-repetitive sequences and progression-free sets, Journal of Combinatorial Theory, Series A, Volume 27, Issue 2, September 1979, Pages 181-185.

mathworker21