Varying v. Unchanged String Length
If, as you initially indicated in response to my comment, the length of the transformed string can differ from the length of the original, then this problem becomes vastly more difficult because the set of distinct editing operations (operations that might potentially yield a distinct result) includes all 18 of the following:
- length +3 = 3 insertions
- length +2 = 2 insertions and 0 or 1 substitutions
- length +1 = 1 insertion and 0, 1, or 2 substitutions
- length unchanged = 0, 1, 2, or 3 substitutions; 1 deletion, 1 insertion, and 0 or 1 substitutions
- length -1 = 1 deletion and 0, 1, or 2 substitutions
- length -2 = 2 deletions and 0 or 1 substitutions
- length -3 = 3 deletions
Whenever multiple insertions or multiple deletions are performed, moreover, counting becomes inordinately difficult. If, on the other hand, we require that the length remain unchanged, we have only 6 editing combinations to consider and the problem becomes more tractable because none of those 6 combinations involves multiple insertions or multiple deletions. Indeed, the counting for each of the six cases becomes relatively straightforward; the trickiest bit is discounting to avoid double-counting instances when two different editing operations will produce the same string--a problem solved in an answer to another question.
The Six Cases and the Danger of Overcounting
To get our bearings initially, we can generalize this logic:
- The string must maintain $n$ symbols.
- The expected number of groups (runs) of identical symbols in a uniformly random string is $1+\frac{(n-1)(\sigma-1)}{\sigma}$, which reduces to $\frac{n+1}{2}$ when $\sigma=2$
- The expected number of adjacent, identical symbol pairs is $\frac{n-1}{\sigma}$
- The number of ends is 2.
A fine-grained consideration of the five possible types of single edits thus yields:
- The number of possible substitutions is $n(\sigma-1)$
- The expected number of shrinkages of a group of identical symbols is $1+\frac{(n-1)(\sigma-1)}{\sigma}$
- The expected number of expansions of a group of identical symbols with the same symbol is $1+\frac{(n-1)(\sigma-1)}{\sigma}$
- The expected number of insertions into a group of identical symbols with the same symbol is $\frac{n-1}{\sigma}$
- The number of possible insertions of a different character at the beginning or end is $2(\sigma-1)$
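For small cases, the expected number of groups and of adjacent identical pairs can be checked by exhaustive enumeration. The following is a minimal sketch, assuming the string is drawn uniformly at random from all $\sigma^n$ strings (the text does not state the distribution explicitly); the function name `expected_runs_and_pairs` is hypothetical.

```python
from fractions import Fraction
from itertools import groupby, product


def expected_runs_and_pairs(n, sigma):
    """Average number of runs and of adjacent identical pairs over all sigma**n strings."""
    total_runs = 0
    total_pairs = 0
    for s in product(range(sigma), repeat=n):
        # groupby splits the string into maximal runs of one symbol.
        total_runs += sum(1 for _ in groupby(s))
        total_pairs += sum(1 for a, b in zip(s, s[1:]) if a == b)
    count = sigma ** n
    return Fraction(total_runs, count), Fraction(total_pairs, count)


runs, pairs = expected_runs_and_pairs(4, 2)
print(runs, pairs)  # 5/2 3/2
```

Exact fractions are used so that the output can be compared directly with the symbolic expressions; the enumeration is exponential in $n$ and is only meant for small checks.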
We can now apply that basic logic to each of our six cases:
no edits
Performing no edits whatsoever yields only the original string, so 1 result for this case.
one substitution
There are $n$ positions, and the symbol at each can be replaced by any of the $\sigma-1$ other symbols, so $n(\sigma-1)$ results.
two substitutions
There are $\binom{n}{2}$ different pairs of positions and $(\sigma-1)^2$ ways to modify each: $\binom{n}{2}(\sigma-1)^2$ results.
three substitutions
There are $\binom{n}{3}$ different trios of positions and $(\sigma-1)^3$ ways to modify each: $\binom{n}{3}(\sigma-1)^3$ results.
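The three substitution counts can be cross-checked against a brute-force tally of strings at each Hamming distance from a fixed string. This is a minimal sketch; the helper names are hypothetical, and the all-zeros base string is an arbitrary choice, since the count does not depend on which string is used.

```python
from itertools import product
from math import comb


def substitution_count(n, sigma, k):
    """Closed form: choose k positions, then one of sigma-1 new symbols for each."""
    return comb(n, k) * (sigma - 1) ** k


def brute_force_count(n, sigma, k):
    """Count strings of length n differing from the all-zeros string in exactly k places."""
    base = (0,) * n
    return sum(
        1
        for t in product(range(sigma), repeat=n)
        if sum(a != b for a, b in zip(base, t)) == k
    )


n, sigma = 5, 3
for k in (1, 2, 3):
    # The two columns should agree for every k.
    print(k, substitution_count(n, sigma, k), brute_force_count(n, sigma, k))
```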
one deletion, one insertion, no substitutions
For this case, we can generalize this solution for $\sigma=2$ to any $\sigma$, using the same logic to avoid double-counting those instances where two substitutions would yield the same result as one deletion and one insertion.
Let's count the cases where the insertion is to the left of the deletion and then multiply by 2. The combined effect of the insertion and the deletion is to shift all $k$ bits between them to the right while replacing the first one and removing the last one. This result can also be achieved by at most $k$ substitutions, so we need $k>2$. Inserting $x$ within a run of $x$s has the same effect as inserting $x$ at the end of the run. Thus we can count all insertions with different effects once by always inserting the bit complementary to the one to the right of the insertion. Similarly, a deletion within a run has the same effect as a deletion at the start of the run, so we should only count deletions that follow a change between 0 and 1. That gives us an initial count of:
$2\cdot\frac12\sum_{k=3}^n(n+1-k)=\sum_{k=1}^{n-2}k=\frac{(n-1)(n-2)}2\;$
Because the tricky logic to prevent double-counting carries directly over, the only modification required is to substitute a variable $\sigma$ for the fixed $\sigma=2$:
$2\cdot\frac{1}{\sigma}\sum_{k=3}^n(n+1-k)=2\cdot\frac{1}{\sigma}\sum_{k=1}^{n-2}k=\frac{(n-1)(n-2)}{\sigma}\;$
The overcount of results that have already been tallied as two substitutions can be calculated as follows when $\sigma=2$:
If there are no further changes in the shifted bits other than the
one preceding the deletion, then only the bits next to the insertion
and deletion change, and we can achieve that with 2 substitutions, so
we have to subtract
$\sum_{k=3}^n\left(\frac12\right)^{k-2}(n+1-k)=\sum_{k=1}^{n-2}\left(\frac12\right)^{n-k-1}k=n-3+2^{-(n-2)}\;$
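Again, we substitute $\sigma$ for 2 in the sum; the closed form follows from the geometric series and reduces to $n-3+2^{-(n-2)}$ when $\sigma=2$:
$\sum_{k=3}^n\left(\frac1{\sigma}\right)^{k-2}(n+1-k)=\sum_{k=1}^{n-2}\left(\frac1{\sigma}\right)^{n-k-1}k=\frac{n-2}{\sigma-1}-\frac{1-\sigma^{-(n-2)}}{(\sigma-1)^2}\;$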
Also, if the entire range of shifted bits consists of alternating
zeros and ones, then swapping the insertion and the deletion yields
the same effect, so in this case we were double-counting and need to
subtract
$\sum_{k=3}^n\left(\frac12\right)^{k-1}(n+1-k)\;$
Swapping in $\sigma$ a final time yields:
$\sum_{k=3}^n\left(\frac1{\sigma}\right)^{k-1}(n+1-k)\;$
These two overcounts (which, alas, cannot be combined as cleanly as when the symbols are binary) are then subtracted from the initial count of deletion/insertion operations to yield the overall results produced by this case, but not by case 3 above:
$\frac{(n-1)(n-2)}{\sigma}-\sum_{k=3}^n\left(\frac1{\sigma}\right)^{k-2}(n+1-k)-\sum_{k=3}^n\left(\frac1{\sigma}\right)^{k-1}(n+1-k)\;$
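To evaluate this expression for concrete values, the two overcount sums can be computed term by term with exact rational arithmetic. The sketch below is only a direct transcription of the three terms (the function names are hypothetical); it uses the explicit sum over $k$ for the first overcount, and at $\sigma=2$ that sum reproduces the binary closed form $n-3+2^{-(n-2)}$.

```python
from fractions import Fraction


def overcount_sum(n, sigma, shift):
    """Sum over k = 3..n of sigma**-(k - shift) * (n + 1 - k)."""
    return sum(Fraction(n + 1 - k, sigma ** (k - shift)) for k in range(3, n + 1))


def case5_count(n, sigma):
    """Initial insertion/deletion count minus the two overcount sums."""
    initial = Fraction((n - 1) * (n - 2), sigma)
    two_substitutions = overcount_sum(n, sigma, 2)   # already tallied as two substitutions
    swapped_duplicates = overcount_sum(n, sigma, 1)  # insertion/deletion swap duplicates
    return initial - two_substitutions - swapped_duplicates


print(case5_count(3, 2))  # 1/4
```

For $n=3$ and $\sigma=2$ the value $\frac14$ has a direct interpretation: only the two alternating strings 010 and 101 gain a new result from one deletion plus one insertion, and they make up 2 of the 8 binary strings.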
one deletion, one insertion, one substitution
That same calculation carries over to the final case. Here, however, each combination of one deletion and one insertion--likewise discounted to avoid double-counting the triple substitutions already tallied in case 4 above--is accompanied by a third edit: a substitution involving one of the $n-1$ original symbols remaining after the deletion. Since each of these $(n-1)$ symbols admits $(\sigma-1)$ novel substitutions, the total count for the sixth and final case becomes:
$\left(\frac{(n-1)(n-2)}{\sigma}-\sum_{k=3}^n\left(\frac1{\sigma}\right)^{k-2}(n+1-k)-\sum_{k=3}^n\left(\frac1{\sigma}\right)^{k-1}(n+1-k)\right)(n-1)(\sigma-1)\;$
Summing the (previously uncounted) results produced by each of these six cases should yield the expected count when the length of the string remains unchanged. It's ugly (perhaps unnecessarily), but I hope correct.
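Because the derivation is intricate, it helps to have an independent reference value for small $n$. The sketch below (hypothetical helper names, and assuming a uniformly random original string) enumerates every same-length string within Levenshtein distance 3 of each original and averages the count. It is exponential in $n$, so it is only meant for checking small cases against the sum of the six cases.

```python
from fractions import Fraction
from itertools import product


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


def expected_neighbors(n, sigma, max_edits=3):
    """Average number of length-n strings within max_edits of a random length-n string."""
    strings = list(product(range(sigma), repeat=n))
    total = sum(
        1 for s in strings for t in strings if levenshtein(s, t) <= max_edits
    )
    return Fraction(total, len(strings))


print(expected_neighbors(3, 2))  # 8
```

For $n\le 3$ every string is reachable by substitutions alone, so the result is simply $\sigma^n$; the interesting comparisons start at $n=4$.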