Sunday, May 6, 2018

Optimal solution to find out distinct pairs of integers in a SET that sum up to an integer

Suppose we have 2 Sets {1, 1, 2, 4,  4, 5}, {1, 12, 3, 4, 3, 3}  of  non-negative integers and a summation is 6.

Now we want to compute the distinct pair of elements from these Sets which summation is 6.

From 1st set, we found {1, 5}, {2, 4}. So the count is 2.

From 2nd set, we found {3, 3}, Count is 1.


Now the problem is what's the best solution for this computation where arbitrary number of Integer elements can be present in the Set. So we've to analysis data structure and algorithm for this solution.

1. The easiest and simplest way is Brute-Force algorithm. Outer Loop  and Inner Loop to check one number with every other number  find out the pairs. In this case, Run-time would be O(n^2). Therefore, If set size is n=200, then run-time goes to approximately 40000!!! So this is not acceptable algorithm.

2. If we sort the elements in the Set using quick sort and use binary search, then run-time would be O(nlogn). The details of algorithm can be like this.

  • Sort the array using quick sort.
  • Loop through the array while data<=sum
    • Use binary search to find out the difference i.e. sum-data in array.
    • if sum-data found then  addPairList (data, sum-data).
  • End Loop 
  • Return the count of addPairList.

This exposition of the problem having run-time of approximately O(nlogn) can be acceptable. However the solution can be even more optimized.

3. If we can find out the pairs using just ONE loop (i.e. linear search) and avoid sorting, then run-time would be O(n). And this would be most optimized solution. To achieve this, we may use Hashing data structure. The algorithm details are given below.


  • Set countPairs = 0 and initialize empty HashMap.
  • Loop through the array i.e. for each data in Array.
    • Check HashMap contains the key of difference for current data i.e. sum-data and it's value is 0. Because if value is greater than 0, we can assume that this data already has been used in a pair.
      • If the condition is TRUE then 
        • Set countPairs = countPairs + 1.
        • Increment 1 into HashMap value with key sum-data.
      •  Else If HashMap is NOT contains key with current data then
        • Put 0 into HashMap value with key data.
      • End If
  • End Loop
  • Return countPairs.

Using this approach, we can achieve run-time O(n). And this is far better than previous two solutions. The implementation of this technique is illustrated below with Java.

       

            public static int countPairs(int[] dataSetArray, int sum) {

  Map mapNumbers = new HashMap();
  int countPairs = 0;
  for (int data : dataSetArray) {
   if (mapNumbers.containsKey(sum - data) && mapNumbers.get(sum - data) == 0) {
    countPairs++;
    mapNumbers.put((sum - data), mapNumbers.get(sum - data) + 1);
   } else if (!mapNumbers.containsKey(data)) {
    mapNumbers.put(data, 0);
   }
  }

  return countPairs;
 }