Computing the covariance matrix from the correlation matrix and standard deviations

Problem Recall the definition of the correlation matrix as the normalized covariance matrix. Write a function genCovMatFromCorMat(CorMat, StdVec = None) that computes the covariance matrix from an input correlation matrix and, optionally, an input vector of standard deviations.
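
One possible shape for such a function is sketched below. It relies on the relation Cov[i][j] = Cor[i][j] * std_i * std_j and works on plain nested lists. The fallback to unit standard deviations when StdVec is omitted is an assumption of this sketch; the problem statement does not say what the default should be.

```python
def genCovMatFromCorMat(CorMat, StdVec=None):
    """Return the covariance matrix implied by a correlation matrix.

    CorMat is a square nested sequence; StdVec holds one standard
    deviation per attribute.
    """
    n = len(CorMat)
    if any(len(row) != n for row in CorMat):
        raise ValueError("CorMat must be a square matrix")
    # Assumption of this sketch: unit standard deviations when none are given,
    # in which case the covariance matrix equals the correlation matrix.
    if StdVec is None:
        StdVec = [1.0] * n
    if len(StdVec) != n:
        raise ValueError("StdVec must have one entry per row of CorMat")
    # Cov[i][j] = Cor[i][j] * std_i * std_j
    return [
        [CorMat[i][j] * StdVec[i] * StdVec[j] for j in range(n)]
        for i in range(n)
    ]


# Usage: two attributes with correlation 0.5 and standard deviations 2 and 3.
CovMat = genCovMatFromCorMat([[1.0, 0.5], [0.5, 1.0]], [2.0, 3.0])
print(CovMat)
```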

Computing the correlation matrix of a dataset

Problem Recall the globalLandTempHist.txt dataset that consisted of the global land temperature of Earth over the past 300 years. Also recall that a covariance matrix is a symmetric positive-definite square matrix of the form, \[\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12}...

Prove that the diagonal elements of a correlation matrix of a dataset must be one

Problem Recall that a covariance matrix is a symmetric positive-definite square matrix of the form, \[\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}\] where each element is computed via the following equation, \[\sigma_{ij} = \sum_1^{n} ~ (x_i-\overline{x})(y_j-\overline{y})\] where...
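
A short sketch of the argument, assuming the usual normalization $\rho_{ij} = \sigma_{ij} / \sqrt{\sigma_{ii} \, \sigma_{jj}}$ (the excerpt above is truncated before it states the definition of the correlation matrix): setting $j = i$ gives \[\rho_{ii} = \frac{\sigma_{ii}}{\sqrt{\sigma_{ii} \, \sigma_{ii}}} = \frac{\sigma_{ii}}{\sigma_{ii}} = 1 ~,\] which holds whenever the variance $\sigma_{ii}$ is strictly positive.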

Computing the Spearman rank correlation coefficient of a dataset

Problem Recall the globalLandTempHist.txt dataset that consisted of the global land temperature of Earth over the past 300 years. Also recall that the Spearman rank correlation coefficient is merely Pearson’s correlation coefficient of the ranks of two attributes in a...
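
The "Pearson of the ranks" idea can be sketched in a few lines of plain Python. The helper names average_ranks, pearson, and spearman are hypothetical, and giving tied values the average of their ranks is an assumption of this sketch (it is a common convention, but the excerpt does not specify one).

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        stop = start
        # Extend the run while the next sorted value ties with the current one.
        while stop + 1 < len(order) and values[order[stop + 1]] == values[order[start]]:
            stop += 1
        mean_rank = (start + stop) / 2 + 1
        for k in range(start, stop + 1):
            ranks[order[k]] = mean_rank
        start = stop + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    xbar = sum(x) / len(x)
    ybar = sum(y) / len(y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman coefficient: Pearson coefficient applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# A monotonic but nonlinear relation has a Spearman coefficient of one.
print(spearman([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))
```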

Computing the Pearson correlation coefficient of a dataset

Problem Recall the globalLandTempHist.txt dataset that consisted of the global land temperature of Earth over the past 300 years. Also recall the equation for Pearson’s correlation coefficient between two attributes of a dataset, \[r_{xy} = \frac{ \sum_1^{n} ~ (x_i-\overline{x})(y_i-\overline{y})...
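
A direct transcription of this formula might look like the sketch below. The excerpt cuts off before the denominator, so the code assumes the standard one, the square root of the product of the two sums of squared deviations. The function name pearson_r is hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    xbar = sum(x) / n
    ybar = sum(y) / n
    # Numerator: sum of products of deviations from the two means.
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    # Assumed denominator: sqrt of the product of the sums of squared deviations.
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [1, 3, 2, 4]))
```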

The most sensitive correlation coefficient to outliers

Problem Among the Pearson, Spearman, and Kendall correlation coefficients, which is the most sensitive to outliers? Why?

Computing Kendall's rank correlation coefficient of a dataset

Problem Recall the globalLandTempHist.txt dataset that consisted of the global land temperature of Earth over the past 300 years. Also recall the equation for Kendall’s rank correlation coefficient between two attributes of a dataset, \[\tau_{xy} = \frac{ \text{#concordant pairs}...
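
Counting concordant and discordant pairs can be sketched with a simple double loop. The excerpt truncates the formula, so the code assumes the common "tau-a" form, (concordant - discordant) divided by the number of pairs n(n-1)/2, with tied pairs counted as neither. The function name kendall_tau is hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall rank correlation (tau-a variant, assumed here) in O(n^2) time."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = 0
    discordant = 0
    for i, j in combinations(range(n), 2):
        # The pair is concordant when x and y are ordered the same way.
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0 is a tie and contributes to neither count.
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))
```

The quadratic loop is fine for a few hundred observations; larger datasets would call for a sort-based algorithm.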

Visualizing the average precipitation among the US states

Problem Consider the following dataset containing the average annual precipitation in the US states between 1971 and 2000. Make a choropleth visualization of this precipitation data (either using the SI or the US-British units). Do not forget to add a color-bar to...

Computing the first four moments of a sample

Problem Consider this dataset comprised of $1000$ observations (tuples). Compute the first four moments of this sample: the mean, the standard deviation, the skewness, and the kurtosis.
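
These four quantities can be sketched from the central moments of the sample. The code below assumes the biased (divide by $n$) estimators and the non-excess kurtosis; other conventions exist and the problem does not say which one is intended. The function name first_four_moments is hypothetical.

```python
import math


def first_four_moments(sample):
    """Return (mean, standard deviation, skewness, kurtosis) of a sample.

    Assumptions of this sketch: central moments are divided by n (biased
    estimators) and the kurtosis is not reduced by 3 (non-excess).
    """
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((v - mean) ** 2 for v in sample) / n
    m3 = sum((v - mean) ** 3 for v in sample) / n
    m4 = sum((v - mean) ** 4 for v in sample) / n
    std = math.sqrt(m2)
    skewness = m3 / std**3
    kurtosis = m4 / std**4
    return mean, std, skewness, kurtosis


mean, std, skewness, kurtosis = first_four_moments([1, 2, 3, 4, 5])
print(mean, std, skewness, kurtosis)
```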

An experimental proof of Chebyshev's inequality

Problem Chebyshev's inequality states that no more than $1/k^2$ of the values of an attribute in a given sample can be $k$ or more standard deviations away from the attribute mean. Provide an experimental proof of this theorem by generating a...
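
An experiment of this kind can be sketched as follows. The excerpt is truncated before it names the sample to generate, so the standard normal sample, the sample size, the seed, and the values of $k$ below are all assumptions of this sketch. The function name fraction_beyond is hypothetical.

```python
import random
import statistics


def fraction_beyond(sample, k):
    """Fraction of values at least k standard deviations away from the mean."""
    mean = statistics.fmean(sample)
    std = statistics.pstdev(sample)  # population form (divide by n)
    return sum(abs(v - mean) >= k * std for v in sample) / len(sample)


# Assumed setup: 10000 standard normal draws from a seeded generator.
rng = random.Random(12345)
sample = [rng.gauss(0.0, 1.0) for _ in range(10000)]

results = {k: fraction_beyond(sample, k) for k in (1.5, 2, 3, 4)}
for k, fraction in results.items():
    print(f"k = {k}: observed fraction = {fraction:.4f}, bound 1/k^2 = {1 / k**2:.4f}")
```

Repeating the run with samples from other distributions (uniform, exponential, heavy-tailed) would show that the bound does not depend on the shape of the distribution.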

Computing the mean of weighted data

Problem Consider this weighted dataset comprised of $500$ observations (tuples) each of which is described by $5$ attributes. Note that the last column of data is the weight of each tuple. Compute the weighted mean of this sample.
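
The weighted mean of each attribute is the sum of weight times value divided by the sum of weights. A minimal sketch, assuming the data has already been read into a list of rows whose last entry is the weight (the function name weighted_mean and the toy rows are hypothetical):

```python
def weighted_mean(rows):
    """Weighted mean of every attribute column; the last column is the weight."""
    total_weight = sum(row[-1] for row in rows)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    n_attributes = len(rows[0]) - 1
    return [
        sum(row[j] * row[-1] for row in rows) / total_weight
        for j in range(n_attributes)
    ]


# Toy data: two attributes followed by the weight of each tuple.
rows = [
    [1.0, 10.0, 1.0],
    [3.0, 30.0, 3.0],
]
print(weighted_mean(rows))
```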

Monte Carlo approximation of the number Pi

Problem Compute the following 10-dimensional integral via the Monte Carlo rejection sampling method, \[I = \int_{x_1 = 0}^{x_1 = 1} dx_1 \cdots \int_{x_{10} = 0}^{x_{10} = 1} dx_{10} \bigg(\sum_{i=1}^{i=10} ~ x_i ~ \bigg) ~,\] Ensure the accuracy of your integration result...
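
A hit-or-miss (rejection) estimate of this integral can be sketched as below. The integrand never exceeds $10$ on the unit hypercube, so each trial draws a point in the hypercube plus a uniform height in $[0, 10)$ and accepts it if the height falls under the integrand. The estimate is the acceptance fraction times the bounding volume ($10 \times 1$). The function name, sample size, and seed are assumptions of this sketch.

```python
import random


def integrate_by_rejection(n_samples, n_dim=10, seed=0):
    """Hit-or-miss Monte Carlo estimate of the integral of sum(x_i) over [0, 1]^n_dim."""
    rng = random.Random(seed)
    upper = float(n_dim)  # the integrand is at most n_dim on the unit hypercube
    accepted = 0
    for _ in range(n_samples):
        integrand = sum(rng.random() for _ in range(n_dim))
        # Accept the trial when the random height lies under the integrand.
        if rng.random() * upper < integrand:
            accepted += 1
    # Bounding volume = hypercube volume (1) times the height bound (upper).
    return upper * accepted / n_samples


estimate = integrate_by_rejection(100_000)
print(estimate)
```

By symmetry the exact value is $10 \times 1/2 = 5$, which gives a reference for judging the accuracy as the number of samples grows.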

Monte Carlo approximation of the number Pi using a full circle

Problem Suppose we did not know the value of $\pi$ and we wanted to estimate its value using Monte Carlo methods. One practical approach is to draw a square with sides equal to $a = 2$, with its diagonally opposite corners...
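
The idea can be sketched by sampling points uniformly in the square and counting those that land inside the inscribed circle. The excerpt is truncated before it gives the corner coordinates, so the code assumes a square centered at the origin, spanning $[-1, 1]$ in both directions, with a unit circle inside it. The function name, sample size, and seed are also assumptions.

```python
import random


def estimate_pi(n_samples, seed=0):
    """Estimate pi from the fraction of uniform points falling inside a unit circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        # Assumed geometry: square [-1, 1] x [-1, 1], circle of radius 1 at the origin.
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    # Area ratio circle/square = pi / 4, so pi is about 4 * (inside / total).
    return 4.0 * inside / n_samples


pi_estimate = estimate_pi(200_000)
print(pi_estimate)
```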

Monte Carlo approximation of the area of a heart

Problem A popular mathematical equation for a 2D heart is the following, \[f(x,y) = (x^2 + y^2 - 1)^3 - x^2 y^3 = 0\] Any $(x,y)$ values that result in $f(x,y) < 0$ represent the coordinates of a point that falls...
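
Using the $f(x,y) < 0$ test as the acceptance criterion, the area can be sketched as the accepted fraction of uniform points times the area of a bounding box. The box $[-1.5, 1.5] \times [-1.5, 1.5]$ is an assumption of this sketch, chosen to enclose the whole curve with some margin. The function name, sample size, and seed are also assumptions.

```python
import random


def heart_area(n_samples, half_width=1.5, seed=0):
    """Monte Carlo estimate of the area enclosed by (x^2 + y^2 - 1)^3 - x^2 y^3 = 0."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x = rng.uniform(-half_width, half_width)
        y = rng.uniform(-half_width, half_width)
        # Points with f(x, y) < 0 lie inside the heart.
        if (x * x + y * y - 1.0) ** 3 - x * x * y**3 < 0.0:
            inside += 1
    box_area = (2.0 * half_width) ** 2
    return box_area * inside / n_samples


area = heart_area(200_000)
print(area)
```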

Parsing data from the World Wide Web

Consider the following web-page address: https://cdslaborg.github.io/DataRepos_SwiftBat/index.html. This page is an HTML data table containing data from the NASA Swift satellite. Each row in this table represents information about a Gamma-Ray Burst (GRB) detection that Swift has made in the...

Data transfer: Converting formatted input to Comma-Separated-Values (CSV) output

Problem Consider this formatted data file: data.in. Write a simple script named formatted2csv that takes two input arguments representing the input and output file names. Then, the script writes the same input float data to the output file data.out in...
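
A script of this kind might be sketched as follows. The layout of data.in is not shown in the excerpt, so the code assumes whitespace-separated floats on each line; the helper names to_csv_line and formatted2csv are hypothetical.

```python
import sys


def to_csv_line(line):
    """Convert one line of whitespace-separated floats to a comma-separated line."""
    # Assumption: every token on the line parses as a float.
    return ",".join(str(float(token)) for token in line.split())


def formatted2csv(input_path, output_path):
    """Rewrite the float data in input_path as CSV in output_path."""
    with open(input_path) as infile, open(output_path, "w") as outfile:
        for line in infile:
            if line.strip():  # skip blank lines
                outfile.write(to_csv_line(line) + "\n")


# Runs only when called as: python formatted2csv.py <input> <output>
if __name__ == "__main__" and len(sys.argv) == 3:
    formatted2csv(sys.argv[1], sys.argv[2])
```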

Command line input option-value pairs

Problem (Python) Suppose we want to write a program that takes three input parameters: the initial height ($y_0$) initHeight, the initial velocity ($v_0$) initVelocity, and the time after which we want to know how much a projectile has moved in...
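
Option-value pairs of this kind are commonly handled with the standard argparse module, as in the sketch below. The option names --initHeight and --initVelocity follow the names in the problem; the --time option, the value of $g$, and the formula $y = y_0 + v_0 t - g t^2 / 2$ for the vertical position are assumptions, since the excerpt is truncated before it gives them.

```python
import argparse


def height_at(init_height, init_velocity, time, g=9.81):
    """Vertical position y0 + v0*t - g*t^2/2 (assumed quantity of interest)."""
    return init_height + init_velocity * time - 0.5 * g * time**2


def build_parser():
    """Parser accepting the option-value pairs in any order."""
    parser = argparse.ArgumentParser(description="Projectile height after a given time")
    parser.add_argument("--initHeight", type=float, required=True, help="initial height y0")
    parser.add_argument("--initVelocity", type=float, required=True, help="initial velocity v0")
    # Hypothetical option name; the excerpt does not give the name of the time parameter.
    parser.add_argument("--time", type=float, required=True, help="elapsed time t")
    return parser


def main(argv=None):
    """Parse argv (defaults to the command line) and return the computed height."""
    args = build_parser().parse_args(argv)
    return height_at(args.initHeight, args.initVelocity, args.time)


# Equivalent to: python projectile.py --initVelocity 5 --initHeight 10 --time 2
print(main(["--initVelocity", "5", "--initHeight", "10", "--time", "2"]))
```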

Python modules and packaging

Problem Consider the following two code files that compute the Fibonacci sequence using two different methods: fib_recursive.py and fib_loop.py. Put these two functions in a folder named fib so that they can be imported as a Python package into your Python environment....

Visualization: The world population

Problem The following plot shows the worldwide population by countries and states. What kind of visualization is this plot?

Visualization: The world population (refined)

Problem The following plot is a refined map of the worldwide population. What kind of visualization and map is this plot? Select all that apply. Robinson map, Interrupted Goode Homolosine map, Cartesian Longitude and Latitude map, Cartogram Heat map, Winkel...