Problem

Recall the globalLandTempHist.txt dataset, which consists of the global land temperature of Earth over the past 300 years. Also recall that a covariance matrix is a symmetric positive semi-definite square matrix of the form,

\[\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}\]

where each element is computed via the following equation,

\[\sigma_{ij} = \frac{1}{n-1} \sum_{k=1}^{n} (x_{ki}-\overline{x}_i)(x_{kj}-\overline{x}_j)\]

where $n$ is the number of data points (observations), $x_{ki}$ is the value of the $i$th attribute in the $k$th observation, and $\overline{x}_i$ is the mean of the $i$th attribute. Use the covariance matrix definition and the above equation to compute the covariance matrix of the two attributes, year and temperature anomaly, in the data referenced above. To do so, first write a generic function genCovMat(Data) that takes an input matrix (table) of data and returns its covariance matrix.
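To see the element-wise formula at work, here is a minimal sketch that evaluates each $\sigma_{ij}$ term by term on a tiny dataset of three observations and two attributes. The values in X are hypothetical, made up for illustration and not taken from globalLandTempHist.txt, and the sketch assumes the sample normalization 1/(n - 1), which is also what the genCovMat solution below uses.

```python
import numpy as np

# Hypothetical toy data (not the real dataset):
# 3 observations (rows) of 2 attributes (columns).
X = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 7.0]])

n = X.shape[0]          # number of observations
xbar = X.mean(axis=0)   # mean of each attribute (column)

def sigma(i, j):
    """Covariance of attributes i and j, summed over observations k, normalized by n - 1."""
    return sum((X[k, i] - xbar[i]) * (X[k, j] - xbar[j]) for k in range(n)) / (n - 1)

# Assemble the 2-by-2 covariance matrix element by element.
Sigma = np.array([[sigma(i, j) for j in range(2)] for i in range(2)])
print(Sigma)
```

Working the sums by hand gives $\sigma_{11} = 1$, $\sigma_{12} = \sigma_{21} = 2.5$ and $\sigma_{22} = 19/3 \approx 6.33$, so the matrix is symmetric, as the definition requires.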

Solution

MATLAB

To be added…

Python
def genCovMat(Data, Mean = None):
    """
    Generate and return the covariance matrix of the input data.
    The columns of Data must be individual attributes.
    The rows of Data must be individual observations.
    Pass a clean matrix of all real values (no NA, no NaN).
    
    Parameters
    ----------
        Data
            The input 2-D NumPy array of data, of all numeric values.
    
        Mean
            The mean of the input data along the columns (attributes)
            (**optional**, default = numpy.mean(Data, axis = 0))

    Returns
    -------
        CovMat
            The square covariance matrix (one row and one column per attribute),
            normalized by (number of observations - 1).
    """
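    import numpy as np
    Data = np.asarray(Data, dtype = float)
    if Mean is None: Mean = np.mean(Data, axis = 0)
    npnt, ndim = Data.shape         # number of observations (rows), number of attributes (columns)
    normFac = 1 / (npnt - 1)        # sample (unbiased) normalization factor 1/(n-1)
    CovMat = np.zeros((ndim,ndim))
    # Fill the lower triangle (icol <= irow) and mirror it, since CovMat is symmetric.
    for irow in range(ndim):
        for icol in range(irow+1):
            CovMat[irow,icol] = normFac * np.dot( Data[:,irow] - Mean[irow] , Data[:,icol] - Mean[icol] )
            CovMat[icol,irow] = CovMat[irow,icol]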
    return CovMat

#### Read the temperature anomaly data and compute the covariance matrix.

import pandas as pd
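# sep must be passed by keyword in recent pandas; a multi-character separator needs the Python parsing engine.
df = pd.read_csv('http://www.cdslab.org/recipes/programming/stat-covmat/globalLandTempHist.txt', sep = ', ', engine = 'python')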
df = df.dropna()
df = df.reset_index(drop=True)

CovMat = genCovMat(df.values)
print("CovMat = \n{}".format(CovMat))
CovMat = 
[[5.81382831e+03 2.30956353e+01]
 [2.30956353e+01 8.75891017e-01]]
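The double loop in genCovMat fills the matrix one dot product at a time. A minimal sketch of an equivalent loop-free version follows, assuming Data is a 2-D NumPy array with observations in rows and attributes in columns. The function name genCovMatVec and the synthetic year/anomaly data are hypothetical, introduced only for illustration; they are not the real dataset. The result is cross-checked against numpy.cov, which with rowvar=False treats columns as variables and, by default, also normalizes by n - 1.

```python
import numpy as np

def genCovMatVec(Data):
    """Hypothetical loop-free variant of genCovMat (illustration only).

    Rows of Data are observations, columns are attributes.
    """
    Data = np.asarray(Data, dtype=float)
    npnt = Data.shape[0]
    Centered = Data - Data.mean(axis=0)   # subtract each column's mean
    # Entry (i, j) of Centered.T @ Centered is the dot product of the
    # centered columns i and j, i.e. the same sum genCovMat computes in its loop.
    return Centered.T @ Centered / (npnt - 1)

# Synthetic stand-in for a (year, anomaly) table; NOT the real globalLandTempHist.txt data.
rng = np.random.default_rng(seed=0)
year = np.arange(1750, 2016, dtype=float)
anomaly = 0.005 * (year - 1750) + rng.normal(0.0, 0.5, size=year.size)
Data = np.column_stack((year, anomaly))

CovVec = genCovMatVec(Data)
CovRef = np.cov(Data, rowvar=False)   # NumPy's reference result
print(np.allclose(CovVec, CovRef))    # prints True
```

The matrix product does the same arithmetic as the double loop but hands it to NumPy in one call, which is usually faster for many attributes; the loop version, on the other hand, mirrors the element-wise formula more directly.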
