Consider this dataset, 1880_2020.csv, which contains the global land and ocean temperature anomalies of the earth from January 1880 to June 2020 at every month. As stated in the file, temperatures are in Degrees Celsius and reported as anomalies relative to the average global land temperature of the Earth between in the year 1950.

First, plot the temperature anomalies reported in this dataset to obtain the following figure,

Clearly, one can see that the global land and sea temperature of the Earth starts to increase consistently over the entire 20th century and beyond (actually, if you look closely, you will notice a bump in the global temperature at around 1950, which can be attributed to the slow-down of the world’s economy during the second world-war). The around 1970, it seems like the pace of global temperature-increase significantly accelerated compared to the past.

Knowing this, we want to use the data after 1980, up until the most recent recorded temperature in 2020 in this file, to predict what the temperature anomaly of the Earth in the year 2050 will be. By looking at the above plot, it seems plausible that a simple linear regression will give us a good first approximation of the temperature anomaly in 2050. Write a script that fits a line to the temperature data used above from about the year 1980 (data point 1200 in the file) to 2020 (the last data point in the file) and find the parameters of this fit. Then use this fitted line to predict the temperature anomaly of the Earth in 2050. The result of the fitting should be something like the following,

where the red line represents the linear-fit to the temperature anomaly data from 1970-2013, and then extrapolates to the temperature anomaly to the year 2050. Interestingly, the predicted temperature-anomaly at year 2050 ($1.9158$) is very close to what the climate scientists have been talking about, for a long time.

Next, we want to perform an exponential fit to the dataset. Taking the year 1950 as the reference zeroth point, we can perform the exponential fit using the least-squares method to obtain the following fit,

Now write a script that does this fitting and compare the predicted values for the temperature anomaly of the earth as predicted by the linear and exponential fits. Which one do you believe more?

Here is an implementation of this code in Python:,

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.optimize import fmin
sns.set()

##############################
# import the file as dataframe
##############################

date = np.zeros(len(df.Year))
for i in range(0,len(df.Year)):
df.loc[i,"Year"] = (float(str(df.Year[i])[0:4])+(float(str(df.Year[i])[4:6])-1)/12)

##############################
# generate the plot
##############################

# generate the plot
fig = plt.figure()
ax = fig.gca()
plt.plot( df.Year
, df.Value
, c = "red"
)
ax.set_xlabel("Year")
ax.set_ylabel("Temperature Anomaly (Celcius)")
plt.title("Global Land Temperature Anomaly vs 1901-2000 Average")
plt.savefig("GlobalLandSeaTemperatureAnomaly.png")
plt.show()

##############################
# define the fitting class
##############################

class Anomaly():

def __init__(self, dates, anomalies, fitType = "linear", initparams = None):
self.dates = np.double(dates)
self.anomalies = np.double(anomalies)
self.fitType = fitType
self.initparams = initparams
if self.fitType=="linear" and self.initparams is None: self.initparams = [0,0]
if self.fitType=="exp" and self.initparams is None: self.initparams = [0,1,0]

def predictLinear(self,params,dates):
# first element of params is intercept
# second element of params is slope
return params + params * dates

def predictExp(self,params,dates):
return params + np.exp(params * dates - params)

def getSumDistSq(self,params):
if self.fitType=="linear": return np.sum( (self.anomalies - self.predictLinear(params,self.dates) )**2 )
if self.fitType=="exp": return np.sum( (self.anomalies - self.predictExp(params,self.dates) )**2 )

startPoint = 1200
anomaly = Anomaly( dates = df.Year[startPoint:]
, anomalies = df.Value[startPoint:]
)

##############################
# do linear fitting
##############################

bestFitParams = fmin( func = anomaly.getSumDistSq
, x0 = anomaly.initparams
)

# generate the plot

fig = plt.figure()
ax = fig.gca()
plt.plot( df.Year
, df.Value
, c = "red"
)
ax.set_xlabel("Year")
ax.set_ylabel("Temperature Anomaly (Celcius)")
plt.title("Global Land Temperature Anomaly vs 1901-2000 Average")
datesRange = np.linspace(1880,2050,1000)
plt.plot( datesRange
, anomaly.predictLinear(params = bestFitParams, dates = datesRange)
, c= "black"
)
plt.show()
plt.savefig("GlobalLandSeaTemperatureAnomalyLinear.png")

##############################
# do exponential fitting
##############################

startPoint = 840
anomaly = Anomaly( dates = df.Year[startPoint:]-1950
, anomalies = df.Value[startPoint:]
, fitType = "exp"
)
bestFitParams = fmin( func = anomaly.getSumDistSq
, x0 = [-1,.01,0]
)
bestFitParams

# generate the plot

fig = plt.figure()
ax = fig.gca()
plt.plot( df.Year
, df.Value
, c = "red"
)
ax.set_xlabel("Year")
ax.set_ylabel("Temperature Anomaly (Celcius)")
plt.title("Global Land Temperature Anomaly vs 1901-2000 Average")
datesRange = np.linspace(1880-1950,2050-1950,1000)
plt.plot( datesRange + 1950
, anomaly.predictExp(params = bestFitParams, dates = datesRange)
, c= "black"
)
plt.savefig("GlobalLandSeaTemperatureAnomalyExp.png")
plt.show()