
I was reading about the Klee-Minty cubes. First of all, I am not sure if there is a typo, namely, that the constraints should be $$2^Dx_1 +\dots + 2^1x_D + 2^0x_D \le 5^D$$ instead of $$2^Dx_1 + \dots + 2^2x_D + 2^0x_D \le 5^D.$$ More importantly, I wanted to understand the logic behind the cubes. I know it is an example where the simplex algorithm (with the most negative reduced cost pivot rule) visits all the vertices before arriving at the optimal one, even though it could get to the optimal vertex in just one step. However, I wanted to know how one would come up with the Klee-Minty cubes. In other words, if someone wanted to create a simple set where the simplex algorithm visits all the vertices before arriving at the optimal one, how would they arrive at this construction?

edamondo
  • Your $2^1X_D$ should be $2^1X_{D-1}$, shouldn't it? – MJD Mar 14 '25 at 17:03
  • Also according to https://glossary.cs.uwlax.edu/wiki/Klee-Minty_polytope the $2^2$ is not an error. I found this by glancing at the Wikipedia bibliography. – MJD Mar 14 '25 at 17:06
  • And the original paper of Klee and Minty is available from the Internet Archive: https://archive.org/details/inequalities0000oved/page/158/mode/2up – MJD Mar 14 '25 at 17:23
  • See the answer here. – Jean Marie Mar 14 '25 at 18:50
  • A nice source that explains them very well is this: https://link.springer.com/content/pdf/10.1007%2F978-0-387-74759-0_333.pdf (starting at page 1891). Saying a bit more about the structure of the cube: if we label the vertices with binary numbers (like the vertices of $[0,1]^{k}$, except that the cube may be perturbed), then the labels that the simplex algorithm encounters form a Gray code (https://en.wikipedia.org/wiki/Gray_code). – Matthew Spam Mar 24 '25 at 14:15

1 Answer


Here is a four-part story of how to construct a Klee–Minty-like cube.

Part I: A sequence of vertices

Let's first begin with the most normal cube in $\mathbb R^d$: the cube $[0,1]^d$. This cube has $2^d$ vertices. If we wanted the simplex method to visit all of them, how could it even do it?

In other words, we need to find a Hamiltonian path in the $d$-dimensional hypercube graph.

There are many such paths, but one of the nicest mathematically is given by the Gray code. Here, the vertices $\{0,1\}^d$ are put in an order that can be defined in two equivalent ways:

  1. It is the path that starts at $(0,0,\dots,0)$ and, at each step, changes the leftmost coordinate that would take us to an as-yet-unvisited vertex.
  2. When $d=1$, it is the path from $(0)$ to $(1)$. Once the $d$-dimensional path is defined (starting at $(0,0,\dots,0)$ and ending at $(0,0,\dots,1)$) we obtain the $(d+1)$-dimensional path by following the $d$-dimensional path on $\{0,1\}^d \times \{0\}$, then going from $(0,0,\dots,1,0)$ to $(0,0,\dots,1,1)$, then following the $d$-dimensional path in reverse on $\{0,1\}^d \times \{1\}$.

For example, the $3$-dimensional Gray code path (writing $(x,y,z)$ as $xyz$ to save space) is $$000 \to 100 \to 110 \to 010 \to 011 \to 111 \to 101 \to 001.$$
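As a quick sanity check that the two characterizations agree, here is a minimal Python sketch. The function names `gray_path` and `greedy_path` are made up for this illustration; the first follows characterization 2 and the second follows characterization 1.

```python
def gray_path(d):
    """Characterization 2: build the path recursively on the last coordinate."""
    if d == 1:
        return [(0,), (1,)]
    prev = gray_path(d - 1)
    # Follow the (d-1)-dimensional path with last coordinate 0,
    # then follow it in reverse with last coordinate 1.
    return [v + (0,) for v in prev] + [v + (1,) for v in reversed(prev)]


def greedy_path(d):
    """Characterization 1: flip the leftmost coordinate that leads to an unvisited vertex."""
    v = (0,) * d
    path, seen = [v], {v}
    while len(path) < 2 ** d:
        for i in range(d):
            w = v[:i] + (1 - v[i],) + v[i + 1:]
            if w not in seen:
                break
        else:
            raise RuntimeError("stuck before visiting every vertex")
        v = w
        path.append(v)
        seen.add(v)
    return path


path = gray_path(3)
print(" -> ".join("".join(map(str, v)) for v in path))
# prints: 000 -> 100 -> 110 -> 010 -> 011 -> 111 -> 101 -> 001
print(all(gray_path(d) == greedy_path(d) for d in range(1, 8)))
```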

We still haven't figured out a reason why the simplex method would follow this path, but it's at least possible for it to do so, in the sense that these are all steps that change exactly one variable.

Part II: An objective function

To get the simplex method to follow the Gray code path, we need to give it an objective function that will increase along it. This is impossible for the cube $[0,1]^d$, actually! There's no linear function of the coordinates that will prefer $(1,0)$ to $(0,0)$ but prefer $(0,1)$ to $(1,1)$. So we will need to distort the cube slightly to make an objective function possible.
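To spell out the obstruction in the smallest case (a short check, writing $f(x_1,x_2) = c_1x_1 + c_2x_2$ for a hypothetical linear objective on the undistorted square): the Gray code path $00 \to 10 \to 11 \to 01$ would need $$f(1,0) - f(0,0) = c_1 > 0 \quad\text{and}\quad f(0,1) - f(1,1) = -c_1 > 0,$$ and these two conditions cannot both hold.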

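For our purposes, a distorted cube is something that, instead of being defined by the $2d$ inequalities $0 \le x_1 \le 1$ through $0 \le x_d \le 1$, is defined by $2d$ inequalities whose bounding hyperplanes intersect in the same combinatorial pattern: there are still $d$ pairs of opposite sides, and each vertex is obtained by making one inequality from each pair tight.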

We will distort the cube recursively, keeping in mind characterization 2 of the Gray code (see Part I). Suppose we have a "distorted version" of $[0,1]^d$, where $f(x_1, \dots, x_d) = c_1 x_1 + \dots + c_d x_d$ increases along the Gray code path; we may as well assume that $0 \le f(x_1, \dots, x_d) \le 1$ for all vertices of the distorted $d$-cube. We would like the objective function to keep the same ordering on the distorted version of $\{0,1\}^d \times \{0\}$, but to reverse the ordering on the distorted version of $\{0,1\}^d \times \{1\}$, so we have to bring in $x_{d+1}$ somehow. What we can do is:

  • Replace the inequality $x_{d+1} \ge 0$ by $x_{d+1} \ge 0.1 f(x_1, \dots, x_d)$, and the objective function by simply $x_{d+1}$. When this inequality is tight, following the Gray code path in the distorted $d$-cube increases $f$, so it increases $x_{d+1}$.
  • Replace the inequality $x_{d+1} \le 1$ by $x_{d+1} \le 1 - 0.1 f(x_1, \dots, x_d)$. When this inequality is tight, following the Gray code path in the distorted $d$-cube in reverse decreases $f$, so once again it increases $x_{d+1}$.

Also, the middle step where we go from the lower bound on $x_{d+1}$ to the upper bound increases the objective function $x_{d+1}$: it goes from being at most $0.1$ to being at least $1 - 0.1$. And the new objective function is also bounded between $0$ and $1$.

I gave a recursive description of this cube, but we can simply write out the $2d$ inequalities we used: \begin{align} 0 &\le x_1 \le 1 \\ 0.1x_1 &\le x_2 \le 1 - 0.1x_1 \\ 0.1x_2 &\le x_3 \le 1 - 0.1x_2 \\ &\dots \\ 0.1x_{d-1} &\le x_d \le 1 - 0.1x_{d-1} \end{align} This mildly distorted cube still has $2d$ sides (that is, $(d-1)$-dimensional faces) arranged combinatorially in the shape of a $d$-cube, and still has $2^d$ vertices determined by whether we pick the lower bound or upper bound for each of $x_1, \dots, x_d$. Following the Gray code path in this distorted cube will monotonically increase $x_d$.
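The monotonicity claim can be checked numerically. The following is a small sketch, not part of the original argument: `vertex` is a hypothetical helper that turns a 0/1 label into the corresponding vertex of the distorted cube (0 picks the lower bound, 1 the upper bound), and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def gray_path(d):
    """Gray code path on {0,1}^d, built recursively on the last coordinate."""
    if d == 1:
        return [(0,), (1,)]
    prev = gray_path(d - 1)
    return [v + (0,) for v in prev] + [v + (1,) for v in reversed(prev)]


def vertex(bits, eps=Fraction(1, 10)):
    """Vertex of the distorted cube: bit 0 makes x_i = eps*x_{i-1}, bit 1 makes x_i = 1 - eps*x_{i-1}."""
    x, prev = [], Fraction(0)  # treating x_0 as 0 gives the bounds 0 <= x_1 <= 1
    for b in bits:
        prev = 1 - eps * prev if b else eps * prev
        x.append(prev)
    return x


values = [vertex(v)[-1] for v in gray_path(3)]
print([str(t) for t in values])
# prints: ['0', '1/100', '9/100', '1/10', '9/10', '91/100', '99/100', '1']
print(all(a < b for a, b in zip(values, values[1:])))
```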

Part III: Cleaning up the cube

The way we constructed the cube is not ideal for the simplex method, because we messed with the nonnegativity constraints. When the constraints were $0 \le x_i \le 1$ for $i=1,\dots,d$, that was really only $d$ constraints, because the other $d$ are just nonnegativity. Now, we have $2d$ constraints.

To fix this, first subtract the LHS of each pair of inequalities from all three terms, getting: \begin{align} 0 &\le x_1 \le 1 \\ 0 &\le x_2 - 0.1x_1 \le 1 - 0.2x_1 \\ 0 &\le x_3 - 0.1x_2 \le 1 - 0.2x_2 \\ &\dots \\ 0 &\le x_d - 0.1 x_{d-1} \le 1 - 0.2x_{d-1} \end{align} Now make the substitution $u_1 = x_1$ and $u_i = x_i - 0.1 x_{i-1}$ for $i=2,\dots,d$. After this linear transformation, $d$ of our inequalities are now the nonnegativity constraints $u_1 \ge 0, \dots, u_d \ge 0$. The other $d$ inequalities could have been written as $x_i + 0.1 x_{i-1} \le 1$ for $i=2, \dots, d$ (with $x_1 \le 1$ as the first). In terms of the $u$'s, we have $$x_i = u_i + 0.1 u_{i-1} + 0.01 u_{i-2} + \dots + 0.1^{i-1} u_1,$$ so that $x_i + 0.1 x_{i-1} = u_i + 0.2 u_{i-1} + 0.02 u_{i-2} + \dots + 2 \cdot 0.1^{i-1} u_1$, and our constraints have turned into: \begin{align} u_1 &\le 1 \\ 0.2 u_1 + u_2 &\le 1 \\ 0.02 u_1 + 0.2 u_2 + u_3 &\le 1 \\ \dots & \\ 2 \cdot 0.1^{d-1} u_1 + \dots + 2 \cdot 0.1^{d-i} u_i + \dots + 0.2 u_{d-1} + u_d &\le 1 \end{align} with the objective function being to maximize $$ x_d = 0.1^{d-1} u_1 + \dots + 0.01 u_{d-2} + 0.1 u_{d-1} + u_d. $$
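To double-check the coefficients, one can let a few lines of Python carry out the substitution mechanically. This is only a sketch: `u_system` is a hypothetical helper that expands $x_i = u_i + 0.1 x_{i-1}$ and then forms the upper bounds $x_i + 0.1 x_{i-1} \le 1$ and the objective $x_d$ as coefficient vectors in $u_1, \dots, u_d$.

```python
from fractions import Fraction


def u_system(d, eps=Fraction(1, 10)):
    """Return (rows, objective): coefficients in u_1..u_d of the upper bounds and of x_d."""
    # X[i] holds the coefficients of x_{i+1}, using x_i = u_i + eps * x_{i-1}.
    X = []
    for i in range(d):
        row = [Fraction(0)] * d
        row[i] = Fraction(1)
        if i > 0:
            row = [r + eps * p for r, p in zip(row, X[i - 1])]
        X.append(row)
    # Upper bounds: x_1 <= 1, then x_i + eps * x_{i-1} <= 1 for i >= 2.
    rows = [X[0]] + [[a + eps * b for a, b in zip(X[i], X[i - 1])] for i in range(1, d)]
    return rows, X[-1]


rows, objective = u_system(3)
for r in rows:
    print([str(t) for t in r], "<= 1")
print("maximize", [str(t) for t in objective])
```

For $d = 3$ the rows come out as $(1, 0, 0)$, $(1/5, 1, 0)$, $(1/50, 1/5, 1)$ and the objective as $(1/100, 1/10, 1)$, with the diagonal coefficient equal to $1$ in each constraint.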

Part IV: Fooling a pivoting rule

Right now, we have a distorted $d$-cube where following the Gray code path is a thing the simplex method is allowed to do: it will increase the objective function at each step. But how do we convince the simplex method that it's a good idea?

Let $w_1, \dots, w_d$ be the slack variables for our $d$ inequalities. At each step in the simplex method, we will have $d$ basic variables and $d$ nonbasic variables; from each pair $\{u_i, w_i\}$, exactly one will be basic. From characterization 1 of the Gray code (see Part I), we would like the simplex method to follow the following rule: as entering variable, choose the first nonbasic variable from the list $u_1, w_1, u_2, w_2, \dots, u_d, w_d$ that would not return us to a previously-seen vertex.

Because the objective function increases along the Gray code path, we know that previously-seen vertices are exactly the ones with a smaller objective value. So we'd like the simplex method to follow the rule known as Bland's rule: choose the first entering variable in the list $u_1, w_1, \dots, u_d, w_d$ whose reduced cost indicates that the objective function will increase. (I'll use the convention that these are the positive reduced costs.)

The Klee–Minty cube was actually designed to fool a different rule: Dantzig's highest-cost pivoting rule. This pivoting rule takes the nonbasic variable with the highest reduced cost to be its entering variable. For randomly encountered problems, this is often reasonable, but it's easily exploitable by unit conversion. In other words:

  • if $x$ is one of our variables, and we make the substitution $y = Mx$ in the linear program, then $y$ will have $\frac1M$ of the reduced costs of $x$ at each stage, making it $M$ times less appealing to pivot on.
  • if $s$ is the slack variable for a constraint, and we multiply the constraint by $M$, then $s$ will have $\frac1M$ of the reduced costs it used to at each stage, making it $M$ times less appealing to pivot on.

The idea in both cases is that a reduced cost is a rate of change in the objective value as the entering variable increases from $0$ to its new value. When we make a substitution, the objective values at each vertex don't actually change, because we made the substitution in the objective function as well. However, the substitution $y = Mx$ means that when $y$ enters the basis, its new value is $M$ times what $x$ would have been, so the rate of change is $M$ times less.
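As a small worked instance of the two bullet points (a sketch with illustrative symbols $c$, $a$, $b$, $\bar c_s$ that are not part of the construction): if the objective contains the term $cx$, and some constraint reads $ax + s = b$ with slack $s$, then $$cx = \frac{c}{M}\,y \quad (y = Mx), \qquad ax + s = b \iff (Ma)x + s' = Mb \quad (s' = Ms).$$ So $y$ starts with reduced cost $c/M$ rather than $c$, and whenever $s$ would appear in the objective row as $\bar c_s\, s$, the rescaled slack appears as $(\bar c_s/M)\, s'$.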

In our case, we want to make $u_i$ and $w_i$ not just $M$ but $M^{i-1}$ times less appealing, so we substitute $y_i = M^{i-1} u_i$ and multiply the $i^{\text{th}}$ equation by $M^{i-1}$. This gives us the following:

\begin{align} y_1 &\le 1 \\ 0.2 M y_1 + y_2 &\le M \\ 0.02 M^2 y_1 + 0.2 M y_2 + y_3 &\le M^2 \\ \dots & \\ 2 (0.1M)^{d-1} y_1 + \dots + 2 (0.1M)^{d-i} y_i + \dots + 0.2 M y_{d-1} + y_d &\le M^{d-1} \end{align} with the objective function, after also multiplying it by the positive constant $M^{d-1}$ (which rescales every reduced cost equally and so does not affect any pivoting choice), being to maximize $(0.1M)^{d-1} y_1 + \dots + (0.1M)^2 y_{d-2} + (0.1M) y_{d-1} + y_d$.

The value of $M$ should be chosen to be large enough to outweigh any naturally-occurring differences between the reduced costs. I will leave it as $M$ rather than assign it a value, but the constraints defining the canonical Klee–Minty cube are not too different in style from where we've ended up, and they have specific numbers chosen to be large enough.
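To see the effect concretely, here is a minimal tableau simplex sketch with Dantzig's rule, run on the system obtained by applying $y_i = M^{i-1} u_i$ to the constraints $x_i + 0.1 x_{i-1} \le 1$ and scaling row $i$ by $M^{i-1}$. The choice $M = 100$ is my assumption, not a value fixed above, and the names `klee_minty_like` and `dantzig_pivots` are hypothetical. Exact fractions are used so that no pivot decision depends on rounding.

```python
from fractions import Fraction


def klee_minty_like(d, M):
    """Rows: 2*(0.1*M)^(i-j) y_j (j < i) + y_i <= M^(i-1); objective: sum of (0.1*M)^(d-i) y_i."""
    r = Fraction(M, 10)
    A = [[2 * r ** (i - j) if j < i else Fraction(int(j == i)) for j in range(d)]
         for i in range(d)]
    b = [Fraction(M) ** i for i in range(d)]
    c = [r ** (d - 1 - i) for i in range(d)]
    return A, b, c


def dantzig_pivots(A, b, c):
    """Maximize c.y subject to A y <= b, y >= 0 (b >= 0); return (pivot count, optimal value)."""
    m, n = len(A), len(c)
    # Each tableau row is [A | I | b]; z holds reduced costs, with minus the objective value last.
    T = [list(A[i]) + [Fraction(int(i == k)) for k in range(m)] + [b[i]] for i in range(m)]
    z = list(c) + [Fraction(0)] * (m + 1)
    pivots = 0
    while True:
        col = max(range(n + m), key=lambda j: z[j])  # Dantzig: largest reduced cost
        if z[col] <= 0:
            return pivots, -z[-1]
        # Ratio test over rows with a positive entry in the entering column.
        rows = [i for i in range(m) if T[i][col] > 0]
        if not rows:
            raise ValueError("problem is unbounded")
        row = min(rows, key=lambda i: T[i][-1] / T[i][col])
        piv = T[row][col]
        T[row] = [t / piv for t in T[row]]
        for i in range(m):
            if i != row and T[i][col] != 0:
                f = T[i][col]
                T[i] = [a - f * p for a, p in zip(T[i], T[row])]
        f = z[col]
        z = [a - f * p for a, p in zip(z, T[row])]
        pivots += 1


pivots, best = dantzig_pivots(*klee_minty_like(3, 100))
print(pivots, best)  # prints: 7 10000
```

With $d = 3$ and $M = 100$ the constraints are $y_1 \le 1$, $20y_1 + y_2 \le 100$, $200y_1 + 20y_2 + y_3 \le 10000$ with objective $100y_1 + 10y_2 + y_3$, and the run takes $2^3 - 1 = 7$ pivots even though the optimum $y = (0, 0, 10000)$ is adjacent to the starting vertex.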

Misha Lavrov