Proving optimality of a dynamic programming algorithm

Question

We have a string $s$ of $n \leq 100$ bits. A move consists of erasing some substring $x$ of $s$, but only if $x$ is immediately preceded by $x^R$, where $x^R$ denotes the string $x$ reversed. In other words, we choose some even-length palindrome and erase its right half.

The goal is to minimize the length of the word that remains when no further moves can be made. For example, from the bit word $100110$ we can erase $01$ from the middle and be left with $1010$, in which no moves are possible. However, by starting differently we can reduce the word to just $10$, which is optimal. I want to compute this minimal length for a given string; for $100110$ it is $2$.
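To experiment with small cases, or to check a faster algorithm, the game can be solved by exhaustive search over all move sequences. Below is a minimal Python sketch; the function names are hypothetical, and the search is exponential, so it is only practical for short strings, nowhere near $n = 100$.

```python
from functools import lru_cache


def successors(w: str):
    """Yield every word reachable from w in a single move.

    A move picks a split point m and a length L >= 1 such that the L bits
    just before m, read backwards, equal the L bits starting at m (i.e. the
    substring x = w[m:m+L] is preceded by x reversed); it erases w[m:m+L].
    """
    n = len(w)
    for m in range(1, n):
        for L in range(1, min(m, n - m) + 1):
            left = w[m - L:m]
            right = w[m:m + L]
            if right == left[::-1]:
                yield w[:m] + w[m + L:]


@lru_cache(maxsize=None)
def min_final_length(w: str) -> int:
    """Minimum length of a word with no legal move, reachable from w."""
    # If w has no move, w itself is final. Otherwise every successor is
    # strictly shorter, so the minimum below always comes from a successor.
    best = len(w)
    for nxt in successors(w):
        best = min(best, min_final_length(nxt))
    return best
```

For instance, `min_final_length("100110")` returns `2`, matching the example above.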

Now, I have a dynamic programming algorithm which I know produces correct answers (it has been tested a lot), but I don't know how to prove it. It computes the best result for every substring of $s$ in order of increasing length. For a substring of length $k$, the answer is clearly at most $k$. Also, the only strings in which no moves can be made are those of the form $1010\ldots$ or $0101\ldots$ (any two equal adjacent bits form an even palindrome of length $2$). For any word we're playing on, its first bit can never be erased, so we know the exact form of the word that will be left from it; the only thing we lack is its length.

I assumed that for a word $t$ there exist words $x$, $y$ with $t = xy$ such that, if we do the optimal operations just on $x$, producing a word $x'$, then just on $y$, producing $y'$, and finally on the word $x'y'$, we obtain the optimal result for $t$. The last step is very easy, because we know what $x'$ and $y'$ look like from their lengths alone, so there are only a couple of cases to consider. The complexity of this process is $O(n^3)$, as it spends linear time on each substring of $s$.
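The "couple of cases" in the last step can be spelled out; what follows is my own sketch, not part of the original post. Let $x'$ and $y'$ be alternating words of lengths $a, b \geq 1$, let $\ell$ be the last bit of $x'$ (the first bit of $x'$ if $a$ is odd, its complement if $a$ is even), and let $f$ be the first bit of $y'$. Then the best result obtainable by playing on $x'y'$ is

$$ \textrm{opt}(x'y') = \left\{ \begin{array}{ll} a + b & \textrm{if $\ell \neq f$}\\ a & \textrm{if $\ell = f$ and $a \geq 2$}\\ b & \textrm{if $\ell = f$ and $a = 1$} \end{array} \right. $$

An even palindrome needs two equal adjacent bits at its center, and in $x'y'$ the only possible such pair is at the boundary. If $\ell \neq f$ there is none, so $x'y'$ is already final. If $\ell = f$ and $a \geq 2$, the last two bits of $x'$ are $\bar\ell\ell$ and (when $b \geq 2$) $y'$ begins with $\ell\bar\ell$, so erasing those two bits of $y'$ is a legal move that leaves a shorter alternating $y'$ still starting with $\ell$; repeating this, and finishing with a one-bit move if a single $\ell$ remains, erases all of $y'$. No bit of $x'$ can ever be erased, since that would need an equal adjacent pair inside the alternating prefix $x'$, so $a$ is optimal. If $a = 1$, the longest palindrome centered at the boundary has half-length $1$, so the only move deletes the first bit of $y'$, leaving $y'$ itself, of length $b$.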

If $DP[i][j]$ is the answer for the substring $s[i..j]$ (with $DP[i][i] = 1$, since a single bit admits no move), the complete recurrence for $i < j$ is as follows:

$$DP[i][j] = \min \{DP[i][k] + F(i,k,j) \cdot DP[k+1][j] : i \leq k < j \}.$$

where

$$ F(i,k,j) = \left\{ \begin{array}{ll} 1 & \textrm{for $s[i] = s[k+1]$ and $DP[i][k] = 1$}\\ 1 - DP[i][k] \mod 2 & \textrm{for $s[i] = s[k+1]$ and $DP[i][k] > 1$}\\ DP[i][k] \mod 2 & \textrm{otherwise} \end{array} \right. $$

This corresponds to the fact that in some cases $y'$ can be erased using $x'$ (possibly in multiple moves); in those cases $F(i,k,j) = 0$, and otherwise it is $1$. Also note that if $s[i] = s[i+1]$, then when computing $DP[i][j]$ for $j > i$ we first initialize $DP[i][j] = DP[i+1][j]$.
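A direct Python transcription of the recurrence above, as a sketch for experimenting. The function name is my own, it assumes $s$ is given as a string of `'0'`/`'1'` characters, and it is not the asker's actual code. It follows the description literally, including the initialization for $s[i] = s[i+1]$, and runs in $O(n^3)$ time.

```python
def min_final_length_dp(s: str) -> int:
    """O(n^3) DP following the recurrence with F(i, k, j) from the question.

    dp[i][j] holds the value computed for the substring s[i..j] (inclusive).
    """
    n = len(s)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for i in range(n):
        dp[i][i] = 1  # a single bit admits no move
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            best = length  # trivial upper bound: make no moves at all
            if s[i] == s[i + 1]:
                # Erasing s[i+1] (preceded by the equal bit s[i]) leaves
                # s[i] + s[i+2..j], which is the same string as s[i+1..j].
                best = dp[i + 1][j]
            for k in range(i, j):
                a, b = dp[i][k], dp[k + 1][j]
                # F from the question: 0 when y' can be erased using x'.
                if s[i] == s[k + 1]:
                    f = 1 if a == 1 else 1 - a % 2
                else:
                    f = a % 2
                best = min(best, a + f * b)
            dp[i][j] = best
    return dp[0][n - 1]
```

`min_final_length_dp("100110")` returns `2`. One observation when reading it: if $DP[i][k] = 1$ and $s[i] = s[k+1]$, the word $x'y'$ can be reduced to length $DP[k+1][j]$ by erasing the first bit of $y'$, whereas $F = 1$ charges $1 + DP[k+1][j]$. Since $DP[i][k] = 1$ only when $s[i..k]$ is a run of equal bits, $s[i] = s[i+1]$ holds in that situation, and the initialization $DP[i][j] = DP[i+1][j]$ appears to be what covers it.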

For me, the assumption that $t$ can be split into $x$ and $y$ in such a way that the procedure above is optimal is non-trivial. I'm asking for a proof that this assumption holds, or for a different solution to this problem; either would be nice.

score 2 · Answer 1 · edited Apr 13 '17 at 12:48

Since you're asking how to construct a proof of correctness, I'll give you some tips to get you started. If you do all of these, I think you'll be able to make a lot more progress.

As Raphael suggests, make sure that you can write a recurrence relation for the solution. You don't yet have a recurrence. A recurrence is an equation for $DP[i][j]$ that is written solely in terms of $i,j,s$ and other "earlier" values of $DP[\cdot][\cdot]$. Your equation involves a variable $k$, but what is $k$? Where did it come from? It comes out of the blue. How do we choose $k$? You haven't yet given us a formula that computes $DP[i][j]$. Until you do that, you can't possibly hope to prove your algorithm correct. In fact, until you do that, you don't even have an algorithm!

Hint: I think you'll find you want to write something like

$$DP[i][j] = \min \{...\text{something}... : k=i-1,i,i+1,\ldots,j-1\}.$$

You fill in the $\text{something}$ part. In short, I recommend that you formulate your recurrence more accurately.

Hint: try writing out the pseudocode for your dynamic programming algorithm, in standard form. This will force you to write a recurrence for $DP[i][j]$.

Once you have a recurrence, your next step is to prove that it is correct. Why does it give the correct answer for your problem? Justify why the recurrence is correct. Here you might use proof by induction.

See Raphael's answer, which gives an excellent overview of how to prove a dynamic programming algorithm correct.

I recommend that you review the proof of correctness for a few other dynamic programming algorithms. (Look in a few standard algorithms textbooks; with any luck, they should show you several examples.) You'll see that they have a similar structure, and this should help you structure your proof.

One more tip that will be very helpful. You write "I assume (...)". You might try proving that the "(...)" statement is true.
