Saturday, September 21, 2013

Median and Order statistics

Finding kth median in linear time

The idea is to partition the array we do in quick sort.  Assuming a 0 based index the item that you are looking for is positioned at k-1  if the array is ordered. Here is how we can find that element. First partition the array using a pivot (the pivot partitions the array into two parts; the elements on the left are smaller than the pivot element and elements on the right are greater than the pivot element).  Lets say the pivot index is i. if i equals k-1 then it is your median. Otherwise check if k-1 is less than  i. If it is  then repeat the same process for elements from 0 to i - 1 else repeat the process for elements from i + 1 to the end of partition that you are working on. As we chose the pivot randomly this algorithm would take expected linear time.

Here is the code that implements the above algorithm.

int find_kth_median(int x[],int start, int end, int k) {

 int pivot =  start + ((float)rand() - 1)/INT_MAX * (end - start +1);
 int a = x[pivot];

 int up = end, down = start;
 while(down < up) {

  while(x[down] <= a && down <= end)
   down++;
  while(x[up] > a)
   up--;
  if(down < up) {
   std::swap(x[up], x[down]);
  }
 }

 x[start] = x[up];
 x[up] = a;

 if (up == k-1)
  return x[up];
 else if (up < k-1) 
  return find_kth_median(x,up+1, end, k);
 else
  return find_kth_median(x, start, up -1, k);
}

Here are some of the questions that are similar to the above question:

Given an algorithm that finds median in linear time. How would you find kth element ?

Use the technique above. The idea is simple: find the median of the whole array ( the array will be partitioned around the media). Say the median index is i. Then if k is greater than i repeat the process on the right side of the array else repeat on the left side. Continue until you find the median.

Given a set of elements divide the set into k equal parts

The idea is to find the median of the whole set and then find median of the set at the left and right of the median and then continue so on. This requires O (n log k).