Problem

Consider this dataset, Drand.mat, which contains a set of random numbers. Let’s make a hypothesis about this dataset: we assume that it is well fit by a Gaussian (Normal) distribution. However, we do not know the values of the two parameters of this distribution, the mean and the standard deviation.

Write a script that constructs a mathematical objective function and then uses an optimization algorithm of your choice to find the most likely values of the mean and standard deviation of this Gaussian distribution. The figure accompanying this problem shows the best-fit Gaussian distribution, computed with the most likely parameters, overlaid on the histogram of this dataset.
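
One common choice of objective function, and the one the solution code on this page minimizes, is the negative log-likelihood of the data under the Gaussian hypothesis. As a sketch, writing the data points as $x_1,\ldots,x_n$ (the symbols $x_i$, $n$, and $\mathcal{L}$ are notation assumed here, not taken from the problem statement) and assuming the points are independent:

$$
-\log\mathcal{L}(\mu,\sigma)
= -\sum_{i=1}^{n}\log\left[\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right)\right]
= n\log\sigma + \frac{n}{2}\log(2\pi) + \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2
$$

The most likely values of $[\mu,\sigma]$ are those that minimize this quantity, which is why a minimizer such as fminsearch() or fmin() can be applied to it directly.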


MATLAB

Name your main script findBestFitParameters.m. When you run your script, it should call fminsearch() and then output the best-fit parameters, as in the following example:

>> findBestFitParameters
mu: -0.082001 , sigma: 1.0043

Start your parameter search via fminsearch() with the following values: $[\mu,\sigma] = [1,10]$.

Python

Name your main script findBestFitParameters.py. Here is an example of the expected output of such a script:

findBestFitParameters
Optimization terminated successfully.
     Current function value: 142.326191
     Iterations: 50
     Function evaluations: 98
mean = -0.08200050180312446, standard-deviation = 1.0043358235352169

Start your parameter search via fmin() with the following values: $[\mu,\sigma] = [1,10]$.

Solution

MATLAB

Here is an implementation of the main script, findBestFitParameters.m:

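clear all;
close all;
load('Drand.mat');   % loads the variable Drand into the workspace
global data          % shared with the objective function getLogProbNorm()
data = Drand;
% minimize the negative log-likelihood, starting from [mu,sigma] = [1,10]
Parameters = fminsearch(@getLogProbNorm,[1,10]);
disp( ['mu: ', num2str(Parameters(1)), ' , sigma: ', num2str(Parameters(2))] );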

and here is an implementation of the objective function, getLogProbNorm.m:

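function logProbNorm = getLogProbNorm(Param)
    % Returns the NEGATIVE log-likelihood of the data given Param = [mu,sigma],
    % so that minimizing this function maximizes the likelihood.
    global data
    mu = Param(1);
    sigma = Param(2);
    logProbNorm = - sum( log( normpdf(data,mu,sigma) ) );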
end

Python

Here is an implementation of the main script, findBestFitParameters.py:

#!/usr/bin/env python
from scipy.io import loadmat
import numpy as np
from scipy.stats import norm
from scipy.optimize import fmin


# load MATLAB data file
Drand = loadmat("Drand.mat")
Data  = Drand["Drand"]


# plot a histogram of the data
import matplotlib.pyplot as plt
fig = plt.figure( figsize=(9, 8)
                , dpi=300
                , facecolor='w'
                , edgecolor='w'
                ) # create figure object
ax = fig.add_subplot(1,1,1) # get the axes instance

ax.hist(Data.ravel()) # flatten first: loadmat returns a 2-D array

plt.show()

# find the parameters of the Gaussian distribution

def getNegLogProbNorm(Param):
    """Return the negative log-likelihood of Data given Param = [mean, standard deviation]."""
    avg = Param[0]
    std = Param[1]
    return -np.sum( np.log( norm.pdf(x = Data, loc = avg, scale = std) ) )

# minimize the negative log-likelihood, starting from [mu,sigma] = [1,10]
Parameters = fmin( func = getNegLogProbNorm
                 , x0 = np.array([1,10])
                 )
print( "mean = {}, standard-deviation = {}".format(Parameters[0],Parameters[1]) )

Running this script prints the following output:

Optimization terminated successfully.
     Current function value: 142.326191
     Iterations: 50
     Function evaluations: 98
mean = -0.08200050180312446, standard-deviation = 1.0043358235352169
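
As a sanity check on the result above: for a Gaussian distribution, the maximum-likelihood estimates are known in closed form, namely the sample mean and the sample standard deviation computed with a divisor of $n$ (not $n-1$). The following minimal sketch compares those values with what fmin() finds. It uses synthetic data as a stand-in for Drand.mat, and the names `sample` and `neg_log_likelihood` are hypothetical, chosen only for this illustration. It also swaps in `norm.logpdf` and rejects non-positive standard deviations, a small variation on the solution above rather than a reproduction of it.

```python
import numpy as np
from scipy.optimize import fmin
from scipy.stats import norm

# Synthetic stand-in for the contents of Drand.mat (an assumption for this sketch).
rng = np.random.default_rng(seed=0)
sample = rng.normal(loc=0.0, scale=1.0, size=100)

def neg_log_likelihood(param, data):
    """Negative log-likelihood of data under Normal(mu, sigma)."""
    mu, sigma = param
    if sigma <= 0:
        return np.inf  # a standard deviation must be positive
    return -np.sum(norm.logpdf(data, loc=mu, scale=sigma))

# Numerical estimate, starting from [mu, sigma] = [1, 10] as in the problem statement.
best = fmin(neg_log_likelihood, x0=np.array([1.0, 10.0]), args=(sample,), disp=False)

# Closed-form maximum-likelihood estimates.
mu_closed = sample.mean()
sigma_closed = sample.std()  # ddof=0, i.e. divisor n

print("fmin:        mean = {}, standard-deviation = {}".format(best[0], best[1]))
print("closed form: mean = {}, standard-deviation = {}".format(mu_closed, sigma_closed))
```

The two lines of output should agree to within the tolerance of the optimizer. Applying the same comparison to the real dataset is a quick way to confirm that the objective function was coded correctly.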
