Problem

Recall the globalLandTempHist.txt dataset, which contains the global land temperature of Earth over the past 300 years. Also recall that the covariance matrix of two attributes is a symmetric, positive semi-definite square matrix of the form,

\[\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}\]

where each element is computed via the following equation,

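\[\sigma_{ij} = \frac{1}{n-1} \sum_{k=1}^{n} ~ (x_{ki}-\overline{x}_i)(x_{kj}-\overline{x}_j)\]

where $n$ represents the number of data points, $x_{ki}$ is the value of the $i$th attribute in the $k$th data point, and $\overline{x}_i$ is the mean of the $i$th attribute over all data points. The factor $1/(n-1)$ yields the unbiased sample covariance. Also, recall that the correlation matrix corresponding to this covariance matrix is defined as follows,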

\[\text{Cor} = \begin{pmatrix} \rho_{11} = 1 & \rho_{12} = \frac{\sigma_{12}}{\sqrt{\sigma_{11}\sigma_{22}}} \\ \rho_{21} = \frac{\sigma_{21}}{\sqrt{\sigma_{11}\sigma_{22}}} & \rho_{22} = 1 \end{pmatrix}\]
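
To make the two definitions concrete, here is a minimal sketch that applies them to a small made-up table of four observations and two attributes. The toy values and the variable names (data, cov, cor) are illustrative assumptions, not part of the dataset, and the sample normalization $1/(n-1)$ is assumed for the covariance. The correlation itself does not depend on that factor, since it cancels in the ratio.

```python
import numpy as np

# Hypothetical toy data: 4 observations (rows) of 2 attributes (columns).
data = np.array([[1.0, 2.0],
                 [2.0, 4.0],
                 [3.0, 5.0],
                 [4.0, 9.0]])

n = data.shape[0]

# Subtract the mean of each attribute (column) from that column.
centered = data - data.mean(axis=0)

# sigma_ij = sum_k (x_ki - mean_i) * (x_kj - mean_j) / (n - 1)
cov = centered.T @ centered / (n - 1)

# rho_ij = sigma_ij / sqrt(sigma_ii * sigma_jj)
std = np.sqrt(np.diag(cov))
cor = cov / np.outer(std, std)

print("cov =\n", cov)
print("cor =\n", cor)
```

For this toy table, $\sigma_{11} = 5/3$, $\sigma_{22} = 26/3$, and $\sigma_{12} = 11/3$, so $\rho_{12} = 11/\sqrt{130}$.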

Use the covariance matrix definition and the above equations to compute the correlation matrix of the year and temperature anomaly attributes in the data referenced above. To do so, first write a generic function genCorMat(Data) that takes an input matrix (table) and calls another function genCovMat(Data) to compute the covariance matrix of the data, then uses this covariance matrix to calculate the corresponding elements of the correlation matrix.

Solution

MATLAB

To be added…

Python
def genCovMat(Data, Mean = None):
    """
    Generate and return the covariance matrix of the input data.
    The columns of data must be individual attributes.
    The rows of data must be individual observations.
    Please pass clean matrix of all real values (no NA, no NaN).
    
    Parameters
    ----------
        Data
            The input Numpy matrix of data of all numeric values.
    
        Mean
            The mean of the input data along the columns (attributes)
            (**optional**, default = numpy.mean(Data, axis = 0))
    """
    import numpy as np
    if Mean is None: Mean = np.mean(Data, axis = 0)
    ndim = len(Data[0,:])
    npnt = len(Data[:,0])
    normFac = 1 / (npnt - 1)
    CovMat = np.zeros((ndim,ndim))
    for irow in range(ndim):
        for icol in range(irow+1):
            CovMat[irow,icol] = normFac * np.dot( Data[:,irow] - Mean[irow] , Data[:,icol] - Mean[icol] )
            CovMat[icol,irow] = CovMat[irow,icol] 
    return CovMat

def genCorMat(Data, Mean = None):
    """
    Generate and return the correlation matrix of the input data.
    The columns of data must be individual attributes.
    The rows of data must be individual observations.
    Please pass clean matrix of all real values (no NA, no NaN).
    
    Parameters
    ----------
        Data
            The input Numpy matrix of data of all numeric values.
    
        Mean
            The mean of the input data along the columns (attributes)
            (**optional**, default = numpy.mean(Data, axis = 0))
    """
    import numpy as np
    CovMat = genCovMat(Data,Mean)
    ndim = len(CovMat[0,:])
    CorMat = np.ones((ndim,ndim))
    for irow in range(ndim):
        for icol in range(irow+1):
            if icol != irow:
                CorMat[irow,icol] = CovMat[irow,icol] / np.sqrt(CovMat[irow,irow] * CovMat[icol,icol])
                CorMat[icol,irow] = CorMat[irow,icol]
    return CorMat
    
# Read the global land temperature history data
import pandas as pd
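# The separator must be passed by keyword in recent pandas versions;
# a multi-character separator requires the Python parsing engine.
df = pd.read_csv('http://www.cdslab.org/recipes/programming/stat-covmat/globalLandTempHist.txt', sep = ', ', engine = 'python')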
df = df.dropna()
df = df.reset_index(drop=True)

# Compute and print the correlation matrix of the data
CorMat = genCorMat(df.values)
print("CorMat = \n{}".format(CorMat))
CorMat = 
[[1.         0.32364863]
 [0.32364863 1.        ]]
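
As a sanity check on this kind of calculation, one possible approach is to compare a hand-written correlation matrix against NumPy's built-in numpy.corrcoef. The sketch below is self-contained and uses synthetic data generated on the spot, not the globalLandTempHist.txt file. The helper name cor_mat_vectorized and the synthetic year/anomaly columns are assumptions made only for illustration.

```python
import numpy as np

def cor_mat_vectorized(data):
    """Hypothetical vectorized helper: correlation matrix of the columns of data."""
    data = np.asarray(data, dtype=float)
    # Center each attribute (column) on its mean.
    centered = data - data.mean(axis=0)
    # Sample covariance matrix with the 1 / (n - 1) normalization.
    cov = centered.T @ centered / (data.shape[0] - 1)
    # Divide each element by the product of the two standard deviations.
    std = np.sqrt(np.diag(cov))
    return cov / np.outer(std, std)

# Synthetic stand-in data (NOT the real dataset): a weak upward trend plus noise.
rng = np.random.default_rng(0)
year = np.arange(1750, 2016, dtype=float)
anomaly = 0.005 * (year - 1750) + rng.normal(scale=0.5, size=year.size)
synthetic = np.column_stack((year, anomaly))

cor = cor_mat_vectorized(synthetic)

# rowvar=False tells NumPy that columns are attributes and rows are observations.
reference = np.corrcoef(synthetic, rowvar=False)

print("cor =\n", cor)
print("matches numpy.corrcoef:", np.allclose(cor, reference))
```

The same comparison should apply to the output of genCorMat(df.values) above, since numpy.corrcoef with rowvar=False follows the same column-as-attribute convention.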
