Problem

Consider this dataset, usaTemperatureHistory.txt, which contains the global land temperature of the Earth from 1743 to 2013 at monthly resolution. As stated in the file, temperatures are in Celsius and reported as anomalies relative to the average global land temperature of the Earth between Jan 1951 and Dec 1980.

First, plot the data contained in the third column of this file (the Monthly Anomaly column starting with $-1.131$) against the year and month of the corresponding row in the file. Note that the year and month of each temperature anomaly are given separately in the first two columns. Therefore, to place all of the months of a given year correctly and consistently on a single plot, you will need to define a new value for the x-axis of the plot, where $x$ is the year plus $(\mathrm{month}-1)/12$,

\[ x = \mathrm{year} + \frac{\mathrm{month} - 1}{12} \;, \]

Note that you will have to write your code in such a way as to skip the first 70 lines of header text in the file and reach the data section. The resulting plot should look something like the following,
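
A minimal plotting sketch in Python might look like the following. It assumes the data rows are whitespace-delimited, with the year, month, and monthly anomaly in the first three columns; adjust the parsing to the actual file layout as needed.

```python
import numpy as np
import matplotlib.pyplot as plt

# Skip the 70 lines of header text mentioned above to reach the data section.
data = np.loadtxt("usaTemperatureHistory.txt", skiprows=70, usecols=(0, 1, 2))
year, month, anomaly = data[:, 0], data[:, 1], data[:, 2]

# Fractional year: x = year + (month - 1) / 12
x = year + (month - 1) / 12

plt.plot(x, anomaly, ".", markersize=2)
plt.xlabel("Year")
plt.ylabel("Temperature anomaly (Celsius)")
plt.show()
```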

Clearly, one can see that the global land temperature of the Earth was relatively constant up until around 1880. Afterwards, the global temperature starts to increase consistently over the entire 20th century and beyond (in fact, if you look closely, you will notice a bump in the global temperature around 1950, which can be attributed to the slowdown of the world's economy during the Second World War). Then, around 1970, the pace of the global temperature increase appears to have accelerated significantly compared to the past.

Knowing this, we want to use the data after 1970, up to the most recent recorded temperature in this file (2013), to predict the temperature anomaly of the Earth in the year 2050. Looking at the above plot, it seems plausible that a simple linear regression will give us a good first approximation of the temperature anomaly in 2050.

(a) Using the maximum likelihood approach, write a script that fits a line to the temperature data above from the year 1970 to 2013 and finds the parameters of this fit. Then use this fitted line to predict the temperature anomaly of the Earth in 2050. The result of the fitting should look something like the following,

where the red line represents the linear fit to the temperature anomaly data from 1970 to 2013, extrapolated to the year 2050. Interestingly, the predicted temperature anomaly for the year 2050 ($1.9158$) is very close to what climate scientists have been projecting for a long time.
Hint:
To get started, think of how to express your hypothesis mathematically as an equation.

  1. What is the hypothesis here?
  2. How would you write the hypothesis mathematically (it’s simple, don’t overthink it)?
  3. What are the parameters of your model (hypothesis)?
  4. Now, obviously, if you have proposed a realistic model, it should not fit the data perfectly.
    Rather, you would expect to see some data scatter around your best-fit model.
    Therefore, we also need to propose a hypothesis for the form of the scatter around this physical model.
  5. What do you think would be a good model to describe the data scatter around your deterministic model?
    If you have no idea, take the simplest case, a Gaussian.
    What would be the parameters of this noise model then?
    Which one of them can be computed from your originally-proposed deterministic model?
    Which one of the parameters is unknown and has to be inferred through the fitting process?
  6. So, overall, how many parameters do you need to constrain for your combined model (deterministic + noise)?
  7. Now, write the likelihood function for this dataset under your proposed combined hypothesis. The likelihood function is simply the product of the probabilities of observing the individual events given your deterministic and noise models (use our lecture notes for help in defining it).
  8. Now, we need to maximize this function, but doing so directly is not computationally feasible. Why? What do you suggest doing to the likelihood function to make it computable within the limited numerical precision of the computer?
  9. Apply the necessary transformation to get an objective function that can be readily maximized by a computer.
  10. Now, use the optimizer algorithm of your language of choice to find the parameter set that maximizes the likelihood of observing your data given the model (in Python, you can use SciPy’s minimize() function; see the sketch after this list).
  11. Now, repeat the same process for a different hypothesis: the temperature anomaly is flat from the year 1970 to 2013.
    That is surely a bad hypothesis, but regardless, let’s form the mathematical model corresponding to it and see how it fares.
  12. Now, use the BIC model selection criterion (from our lecture notes) to compare the performance of the two hypotheses for our data: one posits a temperature increase over time, the other a constant temperature (no global warming). A sketch of the BIC comparison appears after this list.
    Which one fits the data better?
  13. What does each model predict for the temperature of Earth in 2050?
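
A minimal sketch of steps 7–12 in Python, assuming the x and anomaly arrays from the plotting sketch above and the Gaussian noise model suggested in the hints (the starting guesses and variable names here are illustrative, not prescribed):

```python
import numpy as np
from scipy.optimize import minimize

# Keep only the 1970-2013 data, dropping any missing values.
mask = (x >= 1970) & np.isfinite(anomaly)
t = x[mask] - 1970.0          # years since 1970, to keep the optimizer well-behaved
y = anomaly[mask]
n = len(t)

def negloglike_line(params):
    """Negative log-likelihood of a line a + b*t with Gaussian scatter sigma."""
    a, b, log_sigma = params  # fit log(sigma) so sigma stays positive
    sigma = np.exp(log_sigma)
    resid = y - (a + b * t)
    return 0.5 * np.sum(resid**2) / sigma**2 + n * np.log(sigma) + 0.5 * n * np.log(2 * np.pi)

def negloglike_flat(params):
    """Negative log-likelihood of a constant anomaly c with Gaussian scatter sigma."""
    c, log_sigma = params
    sigma = np.exp(log_sigma)
    resid = y - c
    return 0.5 * np.sum(resid**2) / sigma**2 + n * np.log(sigma) + 0.5 * n * np.log(2 * np.pi)

fit_line = minimize(negloglike_line, x0=[0.0, 0.02, np.log(0.3)], method="Nelder-Mead")
fit_flat = minimize(negloglike_flat, x0=[0.0, np.log(0.3)], method="Nelder-Mead")

a, b = fit_line.x[0], fit_line.x[1]
print("Predicted 2050 anomaly (line):", a + b * (2050 - 1970))
print("Predicted 2050 anomaly (flat):", fit_flat.x[0])

# BIC = k * ln(n) - 2 * ln(L_max); the model with the lower BIC is preferred.
bic_line = 3 * np.log(n) + 2 * fit_line.fun
bic_flat = 2 * np.log(n) + 2 * fit_flat.fun
print("BIC (line):", bic_line, "  BIC (flat):", bic_flat)
```

Fitting $\log\sigma$ rather than $\sigma$ is one simple way to keep the scatter parameter positive without imposing bound constraints on the optimizer, and minimizing the negative log-likelihood is the standard transformation that makes the maximization numerically tractable (step 8).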

(b) Extra Credit - So far, we were able to find the best-fit model for our data, constrain its parameters, and predict the temperature of the Earth in 2050 based on those models.
But we can do more than make a single point prediction. We can quantify the uncertainty in our prediction for the temperature of the Earth in 2050!
To do so, we need to learn the shape of the likelihood function in addition to the location of its maximum.
One of the best tools for exploring objective functions is the Markov Chain Monte Carlo (MCMC) technique, which enables us to sample the constructed maximum-likelihood objective function and compute the $1\sigma$ uncertainty range for the predicted anomaly in the year 2050.

Hint:
You can reach out to the instructor for more help on this part of the problem. But in general, you can use an MCMC tool such as the ParaMonte library in Python, MATLAB, C, or Fortran to perform the Monte Carlo simulation (exploration of the likelihood) for this task.
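
If ParaMonte is not at hand, a minimal hand-rolled Metropolis-Hastings sampler illustrates the idea. This sketch reuses negloglike_line and fit_line from the part (a) sketch above; the proposal widths and chain length are illustrative guesses that would need tuning in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_like(params):
    return -negloglike_line(params)   # log-likelihood from part (a)

nsteps = 50_000
step = np.array([0.05, 0.005, 0.05])  # proposal widths for (a, b, log sigma)
chain = np.empty((nsteps, 3))
current = np.array(fit_line.x)        # start the chain at the ML estimate
current_ll = log_like(current)

for i in range(nsteps):
    proposal = current + step * rng.standard_normal(3)
    proposal_ll = log_like(proposal)
    # Metropolis rule: accept with probability min(1, L_proposal / L_current).
    if np.log(rng.uniform()) < proposal_ll - current_ll:
        current, current_ll = proposal, proposal_ll
    chain[i] = current

# Discard the first half as burn-in, then propagate each sample to 2050.
samples = chain[nsteps // 2:]
pred_2050 = samples[:, 0] + samples[:, 1] * (2050 - 1970)
lo, mid, hi = np.percentile(pred_2050, [16, 50, 84])
print(f"2050 anomaly: {mid:.3f} (+{hi - mid:.3f} / -{mid - lo:.3f}) at 1 sigma")
```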
