I am looking for pointers to and names of computational approaches to solve a binary matrix optimization problem of:
$$ minimize: ||\mathbf{X}\mathbf{Y} - \mathbf{T}||_{L1} $$ where $\mathbf{X}$ and $\mathbf{Y}$ are binary matrices and the definition of the matrix multiplication allows for integers (is not mod-2).
- $\mathbf{X}$ is $\{0,1\}^{H \times M}$
- $\mathbf{Y}$ is $\{0,1\}^{M \times W}$
- $\mathbf{T}$ is $[\![0,N]\!]^{H \times W}$
The only known is $\mathbf{T}$. The $M$ dimension of $\mathbf{X}$, $\mathbf{Y}$ is not set and can "grow" for a better solution. Eventually, there may be a desire to find an optimal smallest size of the M dimension of $\mathbf{X}$, $\mathbf{Y}$ with a given value of the objective function.
One approach I have considered is that the dimension $M$ is set by the largest value in $\mathbf{T}$ and the corresponding elements in $\mathbf{X}$, $\mathbf{Y}$ are initialized to $1$ and then an optimization algorithm begins. In this case, would I consider a vector input to an optimization algorithm (e.g. scipy.optimize.minimize) and reshape this $1 \times (H*M+W*M)$ vector into matrices when evaluating the objective function?
I have looked at the following computational packages but am not sure of the best functions within those packages:
- YALMIP: MATLAB package for optimization and linear programming.
- Python package cvxopt specifically ilp from cvxopt.glpk
- scipy.optimize.minimize with bounds
I have also considered solving for $\textbf{X}$ (or $\textbf{Y}$) as all positive reals (constrained by some maximum) and then converting $\textbf{X}$ to a binary matrix by growing the M dimension of $\textbf{X}$ based on a "resolution of the discretization". For example, if a matrix element was found to be 0.50 and the resolution is set to 0.1 the resulting binary A matrix would have a run of 5 1s. To use this approach I would solve for the $\textbf{X}$ after an in random guess for $\textbf{Y}$ using a conventional matrix multiplication solver from numpy such as
> Y = np.random.randint(2,size=(3,4))
> Y_pinv = np.linalg.pinv(Y)
> X = np.dot(T, Y_pinv)
Yet, since $\textbf{Y}$ is not optimized the solution is far from optimal.