Algorithm to select sets of objects while maximizing number of objects covered

by Zohaib   Last Updated January 12, 2018 16:05 PM

If we have different objects,

[A1, A2, A3, B1, B2, B3, B4, B5]

Some calculations will be performed to find compatible objects. For example, lets assume following 3 sets were formed and every set contains compatible objects:

  1. {A1, B2}
  2. {A3, B2}
  3. {A1, A3, B4, B5}

Now we need to perform filtering. One node can participate only once. For example, if B2 has made a pair with A1, then B2 can not participate in any other set. Which means, if we select set 1, then set 2 will be deleted because B2 has already participated. And set 3 will be deleted because A1 has already participated.

Now we need to do filtering such that we maximize number of objects being utilized. If we select set 1, then we are only utilizing A1 and B2.
Thus, optimal way would be to select set 3 which is utilizing 4 objects.

Right now, i have a complex function which goes through list of sets recursively and keeps on adjusting until no changes are made to existing sets. This is not only inefficient, but it might not work in cases where changing sets can cause ripple effect.

I am not looking for coded solution, but just a guidance that is there any graph algorithm which I should study?



Answers 2


Your problem seems like a combination of maximum coverage and set packing problems. So my guess is that algorithms for them should be helpful. A greedy heuristic would be to choose the set that contains the largest number of uncovered elements over the conflicts with other sets.Another approach would be to use the integer linear program formulation of max coverage and add a constraint where you only accept solutions forming a set packing.

ichantz
ichantz
January 12, 2018 16:17 PM

I can't think of any algorithm that computes directly what you want without searching the solution space.

However, you want better performance than your complex recursive algorithm.

I think that can be had by using a non-recursive approach.

Data structures you already have:

  1. a Map, MatchesToMembers, taking us from Match # to Set of Member #s

Let's add to that a complementary structure for reverse lookup:

  1. a Map, MembersToMatches, taking us from Member # to Match #'s, initially empty

And a data structure of possible match grouping candidates, which will be

  1. a Map, Candidates, taking us from Candidate # to Boolean,
    • initially all true
    • sized of 2^(# Matches)

Let's populate this new data structure (#2):

for each Match# as Key in MatchesToMembers
    for each Member# in MatchesToMembers[Match#]
        Update MembersToMatches[Member#] to include Match#        
    next
next

We now have (#2) MembersToMatches as follows:

A1: S1, S3
A3: S2, S3
B2: S1, S2
B4: S3    
B5: S3

Let's populate the Candidate data structure (#3) from the other (#2):

for each MatchSet as Value in MembersToMatches
    if MatchSet has more than one member then
        Assemble a Candidate mask from the set of Match#'s
            by accumlating (1<<Match#) for each Match# in the MatchSet
        Reject all Candidates at that mask

And (#3) Candidates as follows

0 true  (the empty candidate)
1 true  (candidate having match#1 only)
2 true  (candidate having match#2 only)
3 false (candidate having match1&2)
4 true  (candidate having match#3 only)
5 false (candidate having match#1&3)
6 false (candidate having match#2&3)
7 false (candidate having all matches)

Hopefully you can see that we're using a bit values of the solution space for the Key in the Candiates map. E.g. Candidate #5 (it's index/Key) in binary is 101, meaning consider the solution of taking Set 3 and Set 1 (in your 1-based set #'ering).

Now we compute the score for each non-rejected candidate, and take the max.


Rejection from candidate mask as all the elements in the set are known to they interfere with each other:

for each index as Key in Candidate 
    if  index & CandidateIndexMask == CandidateIndexMask then
       Candidate[index] = false
    endif
next

Max computation

for index as Key in Candidate
    if Candidate[index] then
        compute score for Candidate
        if larger than before, then keep it
    endif
next

if your Candidate#'s are small (e.g. less than 32), then you can do this with a simple for (int index = 0; index < count; index++) { }, however, for a larger set you'd need a bit vector for the index variable (and the ability to "increment" a bit vector).


Also, these algorithms & data structures can be used with bit vectors instead of individual elements or sets or other. if we work on bit vectors we can work on chunks of 64-bits at at time reducing the iterations by a large factor.

Erik Eidt
Erik Eidt
January 13, 2018 21:02 PM

Related Questions




Heuristic Approach for Flexible DIFF Implementation

Updated March 20, 2017 19:05 PM