Suppose we have observed a dataset of $15027$ events, each with one attribute variable,
stored in this file: dataFull.csv. Plotting these points would yield a histogram like the following plot.
Now our goal is to form a hypothesis about this dataset, that is, a hypothesis about the distribution of the events in the above plot.
To help you get started, we can first take the logarithm of the data, to better understand the distribution of the attribute, and plot the transformed data.
Just by looking at the observed (red) distribution, we can form a relatively good hypothesis about the distribution of the data: this dataset is likely very well fit by a lognormal distribution, that is, the log-transform of the data is very well fit by a Normal distribution.
Now, use the maximum likelihood method to infer the two unknown parameters of the corresponding Normal distribution (its average and its standard deviation) that best fits the log-transformed data.
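For reference, the quantity to be maximized can be written out explicitly. The symbols here are notation assumed for this sketch, not part of the original statement: $x_i$ is the logarithm of the $i$-th observation, $N = 15027$ is the number of events, and $\mu$ and $\sigma$ are the average and standard deviation of the Normal distribution.

$$
\log\mathcal{L}(\mu,\sigma)
= \sum_{i=1}^{N} \log\left[ \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{(x_i-\mu)^2}{2\sigma^2} \right) \right]
= -\frac{N}{2}\log(2\pi) - N\log\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{N}(x_i-\mu)^2
$$

Setting the derivatives with respect to $\mu$ and $\sigma$ to zero gives the closed-form maximum likelihood estimates, which can serve as a check on the numerical optimization:

$$
\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} x_i ,
\qquad
\hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \hat{\mu})^2
$$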
Hint:
 First read the data using the Pandas library, then log-transform the data to make it look like a Normal distribution.
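The reading step might look like the minimal sketch below. The layout of dataFull.csv is not described here, so the sketch assumes a single numeric column with no header row; `load_log_data` and its `header` argument are hypothetical names introduced for illustration only.

```python
import numpy as np
import pandas as pd


def load_log_data(path, header=None):
    """Read a one-column CSV file and return the natural log of its values."""
    # Assumption: the file has no header row (header=None treats the first
    # line as data). Pass header=0 if the first line is a column name.
    frame = pd.read_csv(path, header=header)
    # Assumption: the attribute of interest is the first (only) column.
    values = frame.iloc[:, 0].to_numpy(dtype=float)
    # The logarithm is only defined for strictly positive values.
    return np.log(values)
```

With the file in the working directory, `logdata = load_log_data("dataFull.csv")` would then give the array to pass on to the fitting class.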

Write a class that takes the log-data as input and has two methods, `getLogProb(data,avg,std)` and `getLogLike(param)`. The former computes the log-probability of observing the input dataset `data` given the parameters of the model (the Normal average `avg` and the Normal standard deviation `std`). The latter method takes a set of parameters as a vector containing the average of the Normal distribution (`avg`) and the natural logarithm of the standard deviation of the Normal distribution, `log(std)`. Given these two parameters, `getLogLike(param)` sums over the log-probabilities returned by `getLogProb(data,avg,std)` to compute the log-likelihood and returns it as the output. You can use `scipy.optimize.fmin` to maximize the log-likelihood and obtain the best-fit parameters. Because `fmin` is a minimizer, pass it the negative of the log-likelihood. Once done with the minimization (of the negative log-likelihood), report the best-fit parameters on the display.
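One possible shape for such a class is sketched below, as an illustration rather than a reference solution. The class name `NormalFit`, the helper method `fit`, and the starting point are hypothetical choices; only `getLogProb`, `getLogLike`, and `scipy.optimize.fmin` come from the description above. The sketch is exercised on synthetic Normal data, not on dataFull.csv.

```python
import numpy as np
from scipy.optimize import fmin


class NormalFit:
    """Maximum likelihood fit of a Normal distribution to log-transformed data."""

    def __init__(self, logdata):
        self.logdata = np.asarray(logdata, dtype=float)

    def getLogProb(self, data, avg, std):
        """Log-probability density of each point in `data` under Normal(avg, std)."""
        z = (data - avg) / std
        return -0.5 * z**2 - np.log(std) - 0.5 * np.log(2.0 * np.pi)

    def getLogLike(self, param):
        """Log-likelihood for param = [avg, log(std)]."""
        avg, logstd = param
        # Working with log(std) keeps std = exp(log(std)) positive for any
        # value the optimizer tries.
        return np.sum(self.getLogProb(self.logdata, avg, np.exp(logstd)))

    def fit(self, start=(1.0, 0.5)):
        """Hypothetical helper: minimize the negative log-likelihood with fmin."""
        x0 = np.asarray(start, dtype=float)
        best = fmin(lambda p: -self.getLogLike(p), x0, disp=False)
        return best[0], np.exp(best[1])


# Usage on synthetic data (an assumption made so the sketch runs on its own).
rng = np.random.default_rng(0)
synthetic = rng.normal(loc=1.5, scale=0.7, size=5000)
fitter = NormalFit(synthetic)
avg_hat, std_hat = fitter.fit()
print("best-fit average:", avg_hat)
print("best-fit standard deviation:", std_hat)
```

The fitted values should agree closely with the sample mean and the (population) sample standard deviation of the input, which is a quick sanity check before running the fit on the real log-data.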