Consider this dataset, usaTemperatureHistory.txt, which contains the global land temperature of the earth from 1743 to 2013 at every monthly resolution. As stated in the file, temperatures are in Celsius and reported as anomalies relative to the average global land temperature of the Earth between Jan 1951 to Dec 1980.

First, plot the data contained in the third column of this file (the Monthly Anomaly column starting with $-1.131$) against the year and month of the corresponding row in the file. Note that the year and month of the temperature anomaly are given separately in the first two columns. Therefore, to include all of the months of a given year correctly and consistently on a single plot, you will need to define a new value for the x-axis of the plot, where $x$ represents the year plus the month of the record times $1/13$,

$x = \rm{year} + \rm{month-1} / 12 ~~,$

Note that you will have to write your code in such a way to remove the first 70 lines of text in the file to reach to the data section. The resulting plot should look something like the following,

Clearly, one can see that the global land temperature of the Earth was relatively constant up until around 1880. Afterwards, the global temperature starts to increase consistently over the entire 20th century and beyond (actually, if you look closely, you will notice a bump in the global temperature at around 1950, which can be attributed to the slow-down of the world’s economy during the second world-war). The around 1970, it seems like the pace of global temperature-increase significantly accelerated compared to the past.

Knowing this, we want to use the data after 1970, up until the most recent recorded temperature in 2013 in this file, to predict what the temperature anomaly of the Earth in the year 2050 will be. By looking at the above plot, it seems plausible that a simple linear regression will give us a good first approximation of the temperature anomaly in 2050.

(a) Using the maximum likelihood approach, write a script that fits a line to the temperature data used above from the year 1970 to 2013 and find the parameters of this fit. Then use this fitted line to predict the temperature anomaly of the Earth in 2050. The result of the fitting should be something like the following,

where the red line represents the linear-fit to the temperature anomaly data from 1970-2013, and then extrapolates to the temperature anomaly to the year 2050. Interestingly, the predicted temperature-anomaly at year 2050 ($1.9158$) is very close to what the climate scientists have been talking about, for a long time.

(b) Next, we want to know the uncertainty in the our predicted value of the temperature anomaly in 2050. Use Markov Chain Monte Carlo techniques to sample the constructed maximum-likelihood objective function and compute the $1\sigma$ uncertainty range for the predicted anomaly in the year 2050.

Hint:
You can use the ParaMonte library in Python or in MATLAB to perform the Monte Carlo simulation.

Here is an implementation of this code: findBestFitParameters.m,

clear all
close all

Temp = importdata('usaTemperatureHistory.txt', ' ',70);

figure(1)
plot( Temp.data(:,1) + ( Temp.data(:,2) - 1 ) / 12.0 ...
, Temp.data(:,3) ...
, 'linewidth', 0.75 ...
);

xlabel('Year');
ylabel('Temperature (Celsius)');
title('Global Land Temperature Anomaly vs. 1951-1980 Average');
saveas(figure(1),'GlobalLandTemperatureAnomaly.png')

startRow = 2700; % corresponding to year 1970
ndata = length(Temp.data(startRow:end,1));
DataSet = zeros(ndata,2);
DataSet(:,1) = Temp.data(startRow:end,1) + Temp.data(startRow:end,2) / 13.0;
DataSet(:,2) = Temp.data(startRow:end,3);

linearFit = fitlm(DataSet(:,1),DataSet(:,2));

hold on;
XValues = [1700:0.01:2050];
YValues = linearFit.Coefficients.Estimate(1) + linearFit.Coefficients.Estimate(2) * XValues;
plot( XValues, YValues ...
, 'color','red' ...
,'linewidth',2 ...
)
hold off;


(a) Here is an implementation of this code in Python:

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
sns.set()

import urllib.request
import shutil
with urllib.request.urlopen(url) as response, open(filePath, 'wb') as out_file:
shutil.copyfileobj(response, out_file)
return None

filePath = "./usaTemperatureHistory.txt"
, filePath = filePath
)

# parse the file contents

with open(filePath,"r") as file:
dates = []
anomalies = []
for line in fileLines[70:]:
year = np.double(line[0:6])
month = np.double(line[7:12])
tempDiff = np.double(line[13:22])
dates.append( year + month/13 )
anomalies.append(tempDiff)

# do MCMC simulation to estimate parameters based on the maximum likelihood approach

startRow = 2700; # corresponding to year 1970
class Anomaly():
param = None # intercept, slope, sigma
def __init__(self,dates,anomalies):
self.dates = np.double(dates)
self.anomalies = np.double(anomalies)
def predict(self,dates):
return self.param[0] + self.param[1] * dates
def getLogProb(self,dates,anomalies):
return  - 0.9189385332046727 - self.param[2] \
- .5 * ( (anomalies - self.predict(dates)) / np.exp(self.param[2]) )**2
def getLogLike(self,param):
self.param = param
return np.sum(self.getLogProb(self.dates,self.anomalies))
def getNegLogLike(self,param):
return -self.getLogLike(param)

#########################

import paramonte as pm
#pm.verify()

#########################

anomaly = Anomaly( dates = dates[startRow:]
, anomalies = anomalies[startRow:]
)
pmpd.spec.chainSize = 10000
pmpd.spec.scaleFactor = "0.1*gelman"
#pmpd.spec.outputFileName = "mcmcMaxLikelihood"
pmpd.spec.variableNameList = ["intercept", "slope", "logSigma"]
pmpd.runSampler( ndim = 3
, getLogFunc = anomaly.getLogLike
)

#########################

#mpd.markovChainList[0].plot.hist.columns = 'slope'
#pmpd.markovChainList[0].plot.hist()
#pmpd.markovChainList[0].plot.line()

#########################

# plot slope distribution and traceplot

pmpd.sampleList[0].plot.hist.columns = "slope"
pmpd.sampleList[0].plot.hist.outputFile = "slopeDistribution.png"
pmpd.sampleList[0].plot.hist()

pmpd.sampleList[0].plot.line.ycolumns = "slope"
pmpd.sampleList[0].plot.line.outputFile = "slopeTracePlot.png"
pmpd.sampleList[0].plot.line()

# plot intercept distribution and traceplot

pmpd.sampleList[0].plot.hist.columns = "intercept"
pmpd.sampleList[0].plot.hist.outputFile = "interceptDistribution.png"
pmpd.sampleList[0].plot.hist()

pmpd.sampleList[0].plot.line.ycolumns = "intercept"
pmpd.sampleList[0].plot.line.outputFile = "interceptTracePlot.png"
pmpd.sampleList[0].plot.line()

# plot logSigma distribution and traceplot

pmpd.sampleList[0].plot.hist.columns = "logSigma"
pmpd.sampleList[0].plot.hist()

pmpd.sampleList[0].plot.line.ycolumns = "logSigma"
pmpd.sampleList[0].plot.line.outputFile = "logSigmaTracePlot.png"
pmpd.sampleList[0].plot.line()

# plot the joint distribution of the variables

pmpd.sampleList[0].plot.grid.kdecorner = None
pmpd.sampleList[0].plot.grid.outputFile = "gridPlot.png"
pmpd.sampleList[0].plot.grid()


The above code will perform a Monte Carlo simulation to estimate the parameters of the linear regression model. The maximum likelihood parameters are reported in the ParaDRAM MCMC simulation output file that is suffixed with *_report.txt. The distribution of the parameters of the model is given in the simulation output file named _sample.txt. For example, here is the histogram of the slope parameter of the model sampled by the ParaDRAM MCMC sampler,

and here is the trace plot of the sampled slopes,

Note that there are also two other parameters in the model, the intercept of the linear regression and standard deviation $\sigma$ of the Gaussian scatter of the data around the regression line. Note that we sampled the natural logarithm of this standard deviation via the MCMC sampler. So when using it, keep in mind to revert it to the natural standard deviation by exponentiating it.

and here is the joint distribution of the variables against each other,

As you can see, the two parameters slope and intercept are highly negatively correlated with each other.

(b) Now, in order to estimate the uncertainty in the temperature of the earth in the year 2050, we need to not only use the most likely set of parameters to estimate the temperature anomaly, but also the full range of all possible parameter values sampled by the ParaMonte-ParaDRAM Markov chain Monte Carlo sampler. Therefore, we will use the contents of the simulation output sample file and will pass each set of sampled parameters to our linear regression model to predict the temperature anomaly in 2050 according to those set of parameters. In the end, we will have a distribution of the predicted temperature anomalies for the year 2050, which we can use to infer the $1\sigma$ uncertainty in it.

Here is an implementation of this code in Python:

import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

# read again the contents of the output sample file

pmpd.readSample() # this will generate the new component pmpd.sampleList[0].df

# make plot of all predicted lines

dates_range = np.linspace(1750,2051,1000) # range of the linear regression line

fig = plt.figure()
ax = fig.gca()
plt.plot(dates,anomalies) # plot the observational data

# loop over all parameters

predictedAnomalies = np.zeros(pmpd.sampleList[0].count)
for index, row in pmpd.sampleList[0].df.iterrows():

# assign the sampled parameters to the model
anomaly.param = np.double(row[1:])

# add the line to the plot
plt.plot(dates_range,anomaly.predict(dates_range),"r", alpha=0.2, linewidth = 0.05)

# predict the anomaly corresponding to the current parameter
predictedAnomalies[index] = anomaly.predict(2050)

# add the rest of the properties to the plot
ax.set_xlabel("Year")
ax.set_ylabel("Temperature Anomaly (Celsius)")
plt.title("Global Land Temperature Anomaly vs. 1951-1980 Average")
plt.savefig("lineRainbow.png")
plt.show()

# plot the distribution of the predicted anomalies

fig = plt.figure()
ax = fig.gca()
plt.hist(predictedAnomalies)
ax.set_xlabel("Predicted Anomaly in 2050 (Celsius)")
ax.set_ylabel("Count")
plt.savefig("anomalyHist.png")
plt.show()

print( "The 1-sigma uncertainty in the temperature Anomaly of 2050 is " + str(np.std(predictedAnomalies)) )

ParaDRAM - WARNING: delimiter is neither given as input nor set as a ParaDRAM object property.
ParaDRAM - WARNING: This information is essential for successful reading of the requested sample file(s).
ParaDRAM - WARNING: Proceeding with the default assumption of comma-delimited sample file contents...

ParaDRAM - NOTE: parsing file contents... done in 0.000998 seconds.
ParaDRAM - NOTE: computing sample correlation matrix... done in 0.005035 seconds.
ParaDRAM - NOTE: computing sample covariance matrix... done in 0.003987 seconds.
ParaDRAM - NOTE: computing autocorrelations... done in 0.029939 seconds.

ParaDRAM - NOTE: The processed sample file(s) are now stored as a Python list in
ParaDRAM - NOTE: For example, to access the contents of the first (or the only) sample file, try:
ParaDRAM - NOTE: To access plots, try:
ParaDRAM - NOTE:     pmpd.sampleList[0].plot.<PRESS TAB TO SEE THE LIST OF PLOTS>