Wednesday, 10 August 2016

Anirudh

There may be a bug hidden in your Binary Searches and Merge Sorts!

Your implementation of the Binary Search algorithm might be having a concealed bug in it! This Binary Search bug applies equally to Merge Sort and many other Divide and Conquer algorithms. If you have any code that implements one of these algorithms, better fix it now before it messes something up.

Here is a generally seen Binary Search code (in Java):
public static int binarySearch(int[] a, int item) {
    int low = 0;
    int high = a.length - 1;

    while(low <= high) {
        int mid = (low + high) / 2;

        if(a[mid] == item)
            return mid;        // Key found
        else if (a[mid] < key)
            low = mid + 1
        else
            high = mid - 1;
    }

    return -(low + 1);         // Key not found
}

The bug is in this line:
int mid = (low + high) / 2;    // Problem!
What the above line does is that it sets mid to the average of low and high, truncated down to the nearest integer. At a glance, nothing seems to be wrong in that, but it fails for large values of the int variables low and high. Specifically, it fails if the sum of low and high is greater than the maximum positive int value (231 - 1). The sum overflows to a negative value and the value stays negative when divided by two, causing the array to go out of bounds. In Java, it throws ArrayIndexOutOfBoundsException while in C it causes unpredictable results. This bug can manifest itself for arrays whose length (in elements) is 230 or greater (roughly a billion elements).


So now we want to fix it, right? Here are some ideas to correct the line:
int mid = low + ((high - low) / 2);

Also note that right shifting a number by one bit is equivalent to dividing it by 2. We can utilize the unsigned right shift operator (>>>) available in Java. The >>> operator lets you treat int and long as 32-bit and 64-bit unsigned integral types (Java does not provide unsigned int and unsigned long data types).
int mid = (low + high) >>> 1;

In C and C++, where you don't have the >>> operator, you can do this:
int mid = ((unsigned int)low + (unsigned int)high)) >> 1;

Note:
  • You should always check array algorithms for possible overflow.
  • Elements that are meant to store array indexes should be taken as unsigned int.
(Credit to this article on Google Research Blog for the above concepts.)

Anirudh

About the author →

Anirudh Khanna is a Computer Science student and a geek who firmly believes in the awesomeness of technology! He is a programmer and web designer, also keenly interested in the research of new and innovative ideas in Computer Science.

Subscribe to Geek Factorial via email :