Problem

Recall our discussion of linear vs. binary searching of sorted arrays for a particular value.
Suppose we a set of nbin ascending-sorted histogram bins edges = range(nbin) and a given number x whose histogram bin i we wish to find such that edges[i] <= x < edges[i + 1] assuming i < nbin - 1.
Write a function (e.g., in Python) with the following example Python interface that use linear searching method to find the correct histogram bin of x.

def getBinLinear(x, edges):
    """
    Return the index of the element of `edges` such that the condition `edges[i] <= x < edges[i + 1]` holds.
    If `x` is larger than all elements of `edges`, then the function returns `len(edges) - 1`.
    If `x` is smaller than all elements of `edges`, then the function returns `- 1`.
    """

Recall that for sorted arrays, we can also use the binary search method to compute the corresponding bin of x.
Write a new function with the following example Python interface to use the binary searching method to find the histogram bin of x.

def getBinBinary(x, edges):
    """
    Return the index of the element of `edges` such that the condition `edges[i] <= x < edges[i + 1]` holds.
    If `x` is larger than all elements of `edges`, then the function returns `len(edges) - 1`.
    If `x` is smaller than all elements of `edges`, then the function returns `- 1`.
    """

Which algorithm is faster for finding the bin of an arbitrary x value? When? Why?
Benchmark the two algorithms you have implemented for a range of incrementally-increasing edges array values and lengths.
Make a plot of the benchmark and justify your answer to the performance of the two algorithms based on your benchmark results.
Hint: To simplify the benchmark, you can fix the value of x in your benchmark to edges[-2] (the element before the last in edges).
Then, run the benchmarks for different values of nbin, for example, from 5 to 50,000 edges.
The specific suggested value of x roughly leads to the worst-case performance of the linear and binary searches.
This benchmark setting, therefore, gives you a rough idea of how fast the two methods are compared to each other.
Here is an example range of nbin values (in Python) with which you can test your algorithms,

for val in np.logspace(np.log2(5), np.log2(50000), 10, base = 2):
    nbin = np.int(val)
    edges = range(nbin)
    x = edges[-2]
    print("nbin = {}, x = {}".format(nbin, x))
nbin = 4, x = 2
nbin = 13, x = 11
nbin = 38, x = 36
nbin = 107, x = 105
nbin = 299, x = 297
nbin = 834, x = 832
nbin = 2320, x = 2318
nbin = 6457, x = 6455
nbin = 17969, x = 17967
nbin = 50000, x = 49998

Comments