Problem

Suppose that you wanted to generate points whose distribution follows the blue curve in the following curve, whose mathematical formulation is known (in the case here, the function is just the sum of two Gaussian distributions),

The equation describing the above curve is the following,

\[f(x|\mu_1,\mu_2,\sigma_1,\sigma_2) = \frac{1}{\sigma_1\sqrt{2\pi}}\exp\bigg( -\frac{(x-\mu_1)^2}{2\sigma_1^2} \bigg) + \frac{1}{\sigma_2\sqrt{2\pi}}\exp\bigg( -\frac{(x-\mu_2)^2}{2\sigma_2^2} \bigg) ~,\]

with the means of the two Gaussian distributions being $[\mu_1,\mu_2]=[-2, 2]$, and standard deviations being $[\sigma_1,\sigma_2]=[1, 2]$. Now, one way of doing this is to draw a box around this curve, such that the box encompasses the entire curve.

Then, we draw random points from this square and keep only those points that fall beneath this blue curve, like the red points in the following animation.

As you see, the base of this square has a width of 20, and its height is 0.5.

Now, if you plot the histogram of these points, you will see that the distribution of the red points follows closely the blue curve that we wanted to sample.

Now, given the above description, write a script or function that performs this Monte Carlo rejection sampling and computes the area under the two Gaussian distributions. What do you think the result of this integration should be asymptotically?

Solution

Python

Here is an example Python script that computes the integral.

def integrateStdNormPDF(nsim):
    import numpy as np
    from numpy.random import random as unifrnd
    from scipy.stats import norm
    rectangleWidth = 10.
    rectangleHeight = 0.5
    X = (unifrnd(nsim)-0.5) * 2 * rectangleWidth
    Y = rectangleHeight * unifrnd(nsim)
    return 2 * rectangleWidth * rectangleHeight * sum( np.less_equal( Y , norm.pdf(X,-2,1) + norm.pdf(X,2,2) ) ) / nsim
integrateStdNormPDF(100)
2.2
integrateStdNormPDF(1000)
2.05
integrateStdNormPDF(10000)
2.097
integrateStdNormPDF(100000)
2.009
integrateStdNormPDF(100000)
2.001

The number is approaching 2 as it should be, since the two Gaussian distributions are probability density functions each with an area of 1.

Comments