We're looking to model the following problem as a standard optimization problem (or even a non-standard one). We can come close, but nothing seems to fit exactly. We have a working algorithm coded, but the performance is unacceptable.
Problem statement
The problem is set up in two parts. First, there are fixed data:
- a set of objects S.
- for each object o in S, a subset of the objects in S that are "compatible" with o. Compatibility is guaranteed to be symmetric. Typically, each object is compatible with 90% or more of the objects in S.
Then there are query-specific data:
- a non-negative weight for each o in S.
- an integer bound m.
- an "available" flag for each o in S. It's guaranteed that at least m objects are available.
The problem is to find a subset A of S such that the sum of the weights of the objects in A is maximized, subject to the following constraints:
- A contains at most m objects.
- all objects in A are available.
- all objects in A are compatible with one another.
If more than one such subset exists, find any one.
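To pin down the constraints above, here is a minimal brute-force reference solver. It is exponential and only useful for validating a faster implementation on small instances; all identifiers (best_subset, weights, compatible, available) are illustrative, not from our code.

```python
from itertools import combinations

def best_subset(objects, weights, compatible, available, m):
    """Exhaustively find a maximum-weight subset of at most m available,
    pairwise-compatible objects. For validation on small instances only."""
    avail = [o for o in objects if available[o]]
    best, best_w = set(), 0
    for k in range(1, m + 1):
        for combo in combinations(avail, k):
            # every pair in the subset must be mutually compatible
            if all(b in compatible[a] for a, b in combinations(combo, 2)):
                w = sum(weights[o] for o in combo)
                if w > best_w:
                    best, best_w = set(combo), w
    return best, best_w
```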
Discussion
We think we can dispense with the availability flags by setting the weights of unavailable objects to a sufficiently large negative value. Removing an unavailable object from A will then always improve A.
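As a sketch of that weight trick (the sentinel value and helper name are assumptions, not from our code):

```python
# Fold the availability flag into the weights: give unavailable objects a
# negative weight larger in magnitude than any possible sum of real weights,
# so an optimal subset that is free to drop objects will never keep one.
BIG = 10**9  # assumed to exceed the sum of all genuine weights

def fold_availability(weights, available):
    return {o: (w if available[o] else -BIG) for o, w in weights.items()}
```

Note this only works because the bound is "at most m", so the solver is always free to exclude the penalized objects.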
We can preprocess the fixed data if that will help. (In fact, we do this for our current algorithm.) The query-specific data are not correlated from one query to the next, so no preprocessing is possible there.
Typically, the size of S is 800 or so, while m is on the order of 5-10. (The 800 might grow significantly in the future.) However, the number of available objects can be as small as m (although typically larger).
Current status
We have implemented a solution that sorts the available objects by weight and uses a greedy algorithm with backtracking to add one object at a time to A, removing incompatible objects from consideration at each step. A branch is pruned when it can no longer beat the current best A. The problem is that when there are many available objects, there is excessive backtracking (or rather, not much pruning). We're working on optimizing the current code, but we're also looking for a better algorithmic approach.
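For concreteness, here is a sketch of the sort-then-backtrack approach with a simple weight-sum pruning bound, roughly as described above. The identifiers and the exact bound are our guesses at the described scheme, not the actual implementation:

```python
def solve(avail, weights, compatible, m):
    # Consider candidates in descending weight order.
    order = sorted(avail, key=lambda o: weights[o], reverse=True)
    best, best_w = [], 0

    def search(i, chosen, w, candidates):
        nonlocal best, best_w
        if w > best_w:
            best, best_w = list(chosen), w
        if len(chosen) == m or i >= len(order):
            return
        # Prune: even adding the (m - len(chosen)) heaviest remaining
        # candidates cannot beat the best subset found so far.
        remaining = sorted((weights[o] for o in order[i:] if o in candidates),
                           reverse=True)
        if w + sum(remaining[: m - len(chosen)]) <= best_w:
            return
        for j in range(i, len(order)):
            o = order[j]
            if o in candidates:
                # Keep only objects still compatible with everything chosen.
                search(j + 1, chosen + [o], w + weights[o],
                       candidates & compatible[o])

    search(0, [], 0, set(avail))
    return best, best_w
```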
Possibilities
We've looked at several standard combinatorial problems, but we haven't found one onto which we can fully map our problem. We can model the compatibility data as a graph whose vertices are the objects and whose edges connect incompatible objects. Then our problem becomes very similar to the maximum-weight independent set problem, except that problem doesn't bound the size of the independent set. We also looked at the weighted version of the maximum coverage problem, but that doesn't provide for the compatibility data.
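The graph mapping just described can be sketched as follows; since each object is compatible with 90% or more of S, this incompatibility graph is quite sparse (the helper name is illustrative):

```python
def incompatibility_edges(objects, compatible):
    """Vertices are objects; edges join incompatible pairs, so feasible
    subsets A are exactly the independent sets of size at most m."""
    return [(a, b) for i, a in enumerate(objects)
            for b in objects[i + 1:] if b not in compatible[a]]
```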
Could this be formulated as a constrained linear programming problem? (I'm pretty sure the answer is "yes," but how?) Would that provide the best chance for improving our implementation?
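Our best guess at such a formulation, for reference, is a 0-1 integer program rather than a pure LP (membership in A is all-or-nothing, so the variables must be binary; relaxing them to [0, 1] gives an LP bound). With x_o as the indicator for o ∈ A:

```
maximize    sum over o in S of  w_o * x_o
subject to  sum over o in S of  x_o  <=  m
            x_o + x_p  <=  1      for every incompatible pair {o, p}
            x_o  =  0             for every unavailable o
            x_o  in  {0, 1}       for every o in S
```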
We would also be interested in approximation algorithms, provided the approximation guarantee is fairly tight. From what we've read about the standard problems, what we are trying to do is pretty hard (probably NP-hard) and not amenable to approximation. But we might as well ask the community....