Consider this dataset, usaTemperatureHistory.txt, which contains the global land temperature of the earth from 1743 to 2013 at every monthly resolution. As stated in the file, temperatures are in Celsius and reported as anomalies relative to the average global land temperature of the Earth between Jan 1951 to Dec 1980.

First, plot the data contained in the third column of this file (the Monthly Anomaly column starting with $-1.131$) against the year and month of the corresponding row in the file. Note that the year and month of the temperature anomaly are given separately in the first two columns. Therefore, to include all of the months of a given year correctly and consistently on a single plot, you will need to define a new value for the x-axis of the plot, where $x$ represents the year plus the month of the record times $1/13$,

$x = \rm{year} + \rm{month} / 13 ~~,$

Note that you will have to write your code in such a way to remove the first 70 lines of text in the file to reach to the data section. The resulting plot should look something like the following,

Clearly, one can see that the global land temperature of the Earth was relatively constant up until around 1880. Afterwards, the global temperature starts to increase consistently over the entire 20th century and beyond (actually, if you look closely, you will notice a bump in the global temperature at around 1950, which can be attributed to the slow-down of the world’s economy during the second world-war). The around 1970, it seems like the pace of global temperature-increase significantly accelerated compared to the past.

Knowing this, we want to use the data after 1970, up until the most recent recorded temperature in 2013 in this file, to predict what the temperature anomaly of the Earth in the year 2050 will be. By looking at the above plot, it seems plausible that a simple linear regression will give us a good first approximation of the temperature anomaly in 2050. Write a script that fits a line to the temperature data used above from the year 1970 to 2013 and find the parameters of this fit. Then use this fitted line to predict the temperature anomaly of the Earth in 2050. The result of the fitting should be something like the following,

where the red line represents the linear-fit to the temperature anomaly data from 1970-2013, and then extrapolates to the temperature anomaly to the year 2050. Interestingly, the predicted temperature-anomaly at year 2050 ($1.9158$) is very close to what the climate scientists have been talking about, for a long time.

Here is an implementation of this code: findBestFitParameters.m,

clear all
close all

Temp = importdata('usaTemperatureHistory.txt', ' ',70);

figure(1)
plot( Temp.data(:,1) + Temp.data(:,2) / 13.0 ...
, Temp.data(:,3) ...
, 'linewidth', 0.75 ...
);

xlabel('Year');
ylabel('Temperature (Celsius)');
title('Global Land Temperature Anomaly vs. 1951-1980 Average');
saveas(figure(1),'GlobalLandTemperatureAnomaly.png')

startRow = 2700; % corresponding to year 1970
ndata = length(Temp.data(startRow:end,1));
DataSet = zeros(ndata,2);
DataSet(:,1) = Temp.data(startRow:end,1) + Temp.data(startRow:end,2) / 13.0;
DataSet(:,2) = Temp.data(startRow:end,3);

linearFit = fitlm(DataSet(:,1),DataSet(:,2));

hold on;
XValues = [1700:0.01:2050];
YValues = linearFit.Coefficients.Estimate(1) + linearFit.Coefficients.Estimate(2) * XValues;
plot( XValues, YValues ...
, 'color','red' ...
,'linewidth',2 ...
)
hold off;


Here is an implementation of this code in Python: findBestFitParameters.py,

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
sns.set()

# download the file from web

def download(url,filePath):
import urllib.request
import shutil
with urllib.request.urlopen(url) as response, open(filePath, 'wb') as out_file:
shutil.copyfileobj(response, out_file)
return None

filePath = "./usaTemperatureHistory.txt"
download( url = "https://www.cdslab.org/recipes/programming/regression-predicting-future-global-land-temperature/usaTemperatureHistory.txt"
, filePath = filePath
)

# parse the file contents

with open(filePath,"r") as file:
fileLines = file.readlines()
dates = []
anomalies = []
for line in fileLines[70:]:
year = np.double(line[0:6])
month = np.double(line[7:12])
tempDiff = np.double(line[13:22])
dates.append( year + month/13 )
anomalies.append(tempDiff)

# generate the plot

fig = plt.figure()
ax = fig.gca()
plt.plot(dates,anomalies)
ax.set_xlabel("Year")
ax.set_ylabel("Temperature (Celsius)")
plt.title("Global Land Temperature Anomaly vs. 1951-1980 Average")
plt.savefig("GlobalLandTemperatureAnomalyPython.png")
plt.show()

# do linear fitting

class Anomaly():
coef = None
def predict(self,dates):
return self.coef[0]*dates + self.coef[1]

anomaly = Anomaly()

startRow = 2700; # corresponding to year 1970
anomaly.coef = np.polyfit ( x = dates[startRow:]
, y = anomalies[startRow:]
, deg = 1
)

dates_range = np.linspace(1750,2051,1000)

# make figure

fig = plt.figure()
ax = fig.gca()
plt.plot(dates,anomalies)
plt.plot(dates_range,anomaly.predict(dates_range),"r")
ax.set_xlabel("Year")
ax.set_ylabel("Temperature (Celsius)")
plt.title("Global Land Temperature Anomaly vs. 1951-1980 Average")
plt.savefig("GlobalLandTemperatureAnomaly2050Python.png")
plt.show()