We're looking to model the following problem as a standard optimization problem (or even a non-standard one). We can come close, but nothing seems to fit exactly. We have a working algorithm coded, but the performance is unacceptable.
Problem statement
The problem is set up in two parts. First, there are fixed data:
- a set of objects S.
- for each object o in S, a subset of the objects in S that are "compatible" with o. Compatibility is guaranteed to be symmetric. Typically, each object is compatible with 90% or more of the objects in S.
Then there are query-specific data:
- a non-negative weight for each o in S.
- an integer bound m.
- an "available" flag for each o in S. It's guaranteed that at least m objects are available.
The problem is to find a subset A of S such that the sum of the weights of the objects in A is maximized, subject to the following constraints:
- A contains at most m objects.
- all objects in A are available.
- all objects in A are compatible with one another.
If more than one such subset exists, find any one.
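To pin down the constraints above, here is a minimal brute-force reference solver. It is exponential and only useful for validating a faster implementation on small instances; all identifiers (best_subset, weights, compatible, available) are illustrative, not from our code.

```python
from itertools import combinations

def best_subset(objects, weights, compatible, available, m):
    """Exhaustively find a maximum-weight subset of at most m available,
    pairwise-compatible objects. For validation on small instances only."""
    avail = [o for o in objects if available[o]]
    best, best_w = set(), 0
    for k in range(1, m + 1):
        for combo in combinations(avail, k):
            # every pair in the subset must be mutually compatible
            if all(b in compatible[a] for a, b in combinations(combo, 2)):
                w = sum(weights[o] for o in combo)
                if w > best_w:
                    best, best_w = set(combo), w
    return best, best_w
```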
Discussion
We think we can dispense with the availability flags by setting the weights of unavailable objects to a sufficiently large negative value. Removing an unavailable object from A will then always improve A.
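As a sketch of that weight trick (the sentinel value and helper name are assumptions, not from our code):

```python
# Fold the availability flag into the weights: give unavailable objects a
# negative weight larger in magnitude than any possible sum of real weights,
# so an optimal subset that is free to drop objects will never keep one.
BIG = 10**9  # assumed to exceed the sum of all genuine weights

def fold_availability(weights, available):
    return {o: (w if available[o] else -BIG) for o, w in weights.items()}
```

Note this only works because the bound is "at most m", so the solver is always free to exclude the penalized objects.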
We can preprocess the fixed data if that will help. (In fact, we do this for our current algorithm.) The query-specific data are not correlated from one query to the next, so no preprocessing is possible there.
Typically, the size of S is 800 or so, while m is on the order of 5-10. (The 800 might grow significantly in the future.) However, the number of available objects can be as small as m (although typically larger).
Current status
We have implemented a solution that sorts the available objects by weight and uses a greedy algorithm with backtracking to add one object at a time to A, removing incompatible objects from consideration at each step. A branch is pruned when it can no longer beat the current best A. The problem is that when there are many available objects, there is excessive backtracking (or rather, not much pruning). We're working on optimizing the current code, but we're also looking for a better algorithmic approach.
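For concreteness, here is a sketch of the sort-then-backtrack approach with a simple weight-sum pruning bound, roughly as described above. The identifiers and the exact bound are our guesses at the described scheme, not the actual implementation:

```python
def solve(avail, weights, compatible, m):
    # Consider candidates in descending weight order.
    order = sorted(avail, key=lambda o: weights[o], reverse=True)
    best, best_w = [], 0

    def search(i, chosen, w, candidates):
        nonlocal best, best_w
        if w > best_w:
            best, best_w = list(chosen), w
        if len(chosen) == m or i >= len(order):
            return
        # Prune: even adding the (m - len(chosen)) heaviest remaining
        # candidates cannot beat the best subset found so far.
        remaining = sorted((weights[o] for o in order[i:] if o in candidates),
                           reverse=True)
        if w + sum(remaining[: m - len(chosen)]) <= best_w:
            return
        for j in range(i, len(order)):
            o = order[j]
            if o in candidates:
                # Keep only objects still compatible with everything chosen.
                search(j + 1, chosen + [o], w + weights[o],
                       candidates & compatible[o])

    search(0, [], 0, set(avail))
    return best, best_w
```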
Possibilities
We've looked at several standard combinatorial problems, but we haven't found one onto which we can fully map our problem. We can model the compatibility data as a graph whose vertices are the objects and whose edges connect incompatible objects. Then our problem becomes very similar to the maximum-weight independent set problem, except that problem doesn't bound the size of the independent set. We also looked at the weighted version of the maximum coverage problem, but that doesn't provide for the compatibility data.
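The graph mapping just described can be sketched as follows; since each object is compatible with 90% or more of S, this incompatibility graph is quite sparse (the helper name is illustrative):

```python
def incompatibility_edges(objects, compatible):
    """Vertices are objects; edges join incompatible pairs, so feasible
    subsets A are exactly the independent sets of size at most m."""
    return [(a, b) for i, a in enumerate(objects)
            for b in objects[i + 1:] if b not in compatible[a]]
```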
Could this be formulated as a constrained linear programming problem? (I'm pretty sure the answer is "yes," but how?) Would that provide the best chance for improving our implementation?
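Our best guess at such a formulation, for reference, is a 0-1 integer program rather than a pure LP (membership in A is all-or-nothing, so the variables must be binary; relaxing them to [0, 1] gives an LP bound). With x_o as the indicator for o ∈ A:

```
maximize    sum over o in S of  w_o * x_o
subject to  sum over o in S of  x_o  <=  m
            x_o + x_p  <=  1      for every incompatible pair {o, p}
            x_o  =  0             for every unavailable o
            x_o  in  {0, 1}       for every o in S
```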
We would also be interested in approximation algorithms, provided the approximation guarantee is fairly tight. From what we've read about the standard problems, what we are trying to do is pretty hard (probably NP-hard) and not amenable to approximation. But we might as well ask the community....