Visualization color scales

Problem To which classes of color scales do the following color mappings belong? a) b) c) d)

Regression: Model selection for bivariate data using Excel

Problem Suppose we have observed a dataset of events with two attributes, $x$ and $y$, as in this file: data.xlsx. Plot this data in Microsoft Excel. Form a hypothesis about the relationship between $x$ and $y$. Use Excel’s Trendline...

Cognitive Biases

Problem Suppose I have discovered a positive relationship between properties of some celestial objects, like the one formed by the black dots in the following figure. But in making such a discovery, I repeatedly and subconsciously throw away any data...

Visualizing and comparing the temperatures of Honolulu and Duluth

Problem Consider the following CSV dataset containing the temperatures of cities around the world from 1995 to 2020. Each row in the file corresponds to the average temperature (in Fahrenheit) of a city on a given day of the year...

Visualizing and comparing the temperatures of Honolulu and Duluth via Excel

Problem Consider the following Excel dataset containing the temperatures of two US cities, Honolulu, HI, and Duluth, MN, from 1995 to 2020. The Excel file has two sheets: Duluth and Honolulu. Each row in the file corresponds to...

Visualizing the average precipitation of the US states vs. sunshine

Problem Consider the following dataset containing the average annual precipitation in the US states from 1971 to 2000, and this dataset. Combine the two datasets in Excel and generate a plot of precipitation vs. sunshine for the US states like the one in the following figure. Note...

Computing the covariance matrix of a dataset

Problem Recall the globalLandTempHist.txt dataset that consisted of the global land temperature of Earth over the past 300 years. Also recall that a covariance matrix is a symmetric positive semi-definite square matrix of the form, \[\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}\] ...
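The covariance matrix can be computed directly from its element-wise definition. The following is a minimal Python/NumPy sketch, not a reference solution: since the column layout of globalLandTempHist.txt is not shown here, it assumes the data have already been loaded into a 2-D array with one row per observation and one column per attribute, uses synthetic data as a stand-in, and adopts the sample normalization $1/(n-1)$ (the default of `numpy.cov`). The helper name `covariance_matrix` is hypothetical.

```python
import numpy as np


def covariance_matrix(data):
    """Sample covariance matrix of a 2-D array (rows = observations, columns = attributes)."""
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    centered = data - data.mean(axis=0)  # subtract each attribute's mean
    # sigma_ij = sum_k (x_ki - mean_i) * (x_kj - mean_j) / (n - 1)
    return centered.T @ centered / (n - 1)


# Synthetic stand-in for two attributes of the real dataset (assumption).
rng = np.random.default_rng(42)
x = rng.normal(size=500)
y = 2.0 * x + rng.normal(scale=0.5, size=500)
data = np.column_stack([x, y])

Sigma = covariance_matrix(data)
print(Sigma)
# Cross-check against NumPy's built-in estimator (columns are the variables).
print(np.allclose(Sigma, np.cov(data, rowvar=False)))
```

For the real file, the array could be read with `np.loadtxt`, using whatever `skiprows` or `usecols` options its format requires.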

Computing the covariance matrix from the correlation matrix and standard deviations

Problem Recall that a correlation matrix is a normalized covariance matrix. Write a function genCovMatFromCorMat(CorMat, StdVec = None) that computes the covariance matrix from an input correlation matrix and, optionally (if available), an input vector of standard deviations.
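Since $\rho_{ij} = \sigma_{ij} / (s_i s_j)$, where $s_i$ is the standard deviation of attribute $i$, the covariance matrix is recovered element-wise as $\sigma_{ij} = \rho_{ij}\, s_i s_j$, i.e. $\Sigma = D R D$ with $D = \mathrm{diag}(s_1, \ldots, s_m)$. Below is a minimal NumPy sketch of one possible implementation. The behavior when `StdVec` is omitted is an assumption: it treats every attribute as having unit standard deviation, so the correlation matrix is returned unchanged.

```python
import numpy as np


def genCovMatFromCorMat(CorMat, StdVec=None):
    """Covariance matrix from a correlation matrix and (optionally) standard deviations."""
    CorMat = np.asarray(CorMat, dtype=float)
    if StdVec is None:
        # Assumption: with no standard deviations given, use unit standard deviations,
        # in which case the covariance matrix equals the correlation matrix.
        return CorMat.copy()
    StdVec = np.asarray(StdVec, dtype=float)
    # sigma_ij = rho_ij * s_i * s_j, equivalent to D @ CorMat @ D with D = diag(StdVec)
    return CorMat * np.outer(StdVec, StdVec)


CorMat = np.array([[1.0, 0.5],
                   [0.5, 1.0]])
print(genCovMatFromCorMat(CorMat, [2.0, 3.0]))
# [[4. 3.]
#  [3. 9.]]
```

Using `np.outer` avoids building the diagonal matrix $D$ explicitly; the result is the same as `np.diag(StdVec) @ CorMat @ np.diag(StdVec)`.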

Computing the correlation matrix of a dataset

Problem Recall the globalLandTempHist.txt dataset that consisted of the global land temperature of Earth over the past 300 years. Also recall that a covariance matrix is a symmetric positive semi-definite square matrix of the form, \[\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}\] ...
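The correlation matrix is the covariance matrix normalized by the standard deviations, $\rho_{ij} = \sigma_{ij} / \sqrt{\sigma_{ii}\,\sigma_{jj}}$. The sketch below assumes the data sit in a 2-D NumPy array with one row per observation and one column per attribute, and uses synthetic data in place of the real file; `correlation_matrix` is a hypothetical helper name, and `numpy.corrcoef` is used only as a cross-check.

```python
import numpy as np


def correlation_matrix(data):
    """Correlation matrix of a 2-D array (rows = observations, columns = attributes)."""
    data = np.asarray(data, dtype=float)
    Sigma = np.cov(data, rowvar=False)   # sample covariance matrix
    std = np.sqrt(np.diag(Sigma))        # standard deviations lie on the diagonal
    return Sigma / np.outer(std, std)    # rho_ij = sigma_ij / (s_i * s_j)


# Synthetic stand-in: two strongly anti-correlated attributes (assumption).
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
data = np.column_stack([x, -x + rng.normal(scale=0.3, size=1000)])

R = correlation_matrix(data)
print(R)
print(np.allclose(R, np.corrcoef(data, rowvar=False)))  # cross-check
```

The normalization factor of the covariance ($1/n$ or $1/(n-1)$) cancels in the ratio, so the correlation matrix does not depend on that choice.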

Prove that the diagonal elements of a correlation matrix of a dataset must be one

Problem Recall that a covariance matrix is a symmetric positive semi-definite square matrix of the form, \[\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}\] where each element is computed via the following equation, \[\sigma_{ij} = \frac{1}{n-1} \sum_{k=1}^{n} (x_{ki}-\overline{x}_i)(x_{kj}-\overline{x}_j)\] where...

Computing the Spearman rank correlation coefficient of a dataset

Problem Recall the globalLandTempHist.txt dataset that consisted of the global land temperature of Earth over the past 300 years. Also recall that the Spearman rank correlation coefficient is simply the Pearson correlation coefficient of the ranks of two attributes in a...
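Following that definition, a sketch only needs a ranking step and a Pearson correlation. The NumPy version below assigns tied values the average of the ranks they span (a common convention, assumed here); the names `rank` and `spearman` are hypothetical, and the data are synthetic.

```python
import numpy as np


def rank(values):
    """1-based ranks; tied values receive the average of the ranks they span."""
    values = np.asarray(values, dtype=float)
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    first_rank = np.cumsum(counts) - counts + 1     # rank of the first member of each tie group
    average_rank = first_rank + (counts - 1) / 2.0  # mean rank within each tie group
    return average_rank[inverse]


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks of x and y."""
    return np.corrcoef(rank(x), rank(y))[0, 1]


print(rank([10, 20, 20, 30]))
x = np.arange(1.0, 11.0)
print(spearman(x, np.exp(x)))  # close to 1: the relationship is monotonic, though nonlinear
```

Because only the ordering matters, any strictly increasing transformation of either attribute leaves the Spearman coefficient unchanged.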

Computing the Pearson correlation coefficient of a dataset

Problem Recall the globalLandTempHist.txt dataset that consisted of the global land temperature of Earth over the past 300 years. Also recall the equation for Pearson’s correlation coefficient between two attributes of a dataset, \[r_{xy} = \frac{ \sum_{i=1}^{n} (x_i-\overline{x})(y_i-\overline{y}) }{ \sqrt{\sum_{i=1}^{n} (x_i-\overline{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i-\overline{y})^2} }\] ...
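The formula translates term by term into NumPy: deviations from the means, the sum of their products, and the square roots of the sums of squared deviations. The sketch below uses synthetic data as a stand-in for the dataset; `pearson` is a hypothetical helper name, and `numpy.corrcoef` is used only to confirm the result.

```python
import numpy as np


def pearson(x, y):
    """Pearson's r computed directly from the defining formula."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))


# Synthetic stand-in data (assumption).
rng = np.random.default_rng(7)
a = rng.normal(size=300)
b = 0.5 * a + rng.normal(size=300)
print(pearson(a, b), np.corrcoef(a, b)[0, 1])  # the two values should agree
```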

The correlation coefficient most sensitive to outliers

Problem Among the Pearson, Spearman, and Kendall correlation coefficients, which one is the most sensitive to outliers? Why?

Computing Kendall's rank correlation coefficient of a dataset

Problem Recall the globalLandTempHist.txt dataset that consisted of the global land temperature of Earth over the past 300 years. Also recall the equation for Kendall’s rank correlation coefficient between two attributes of a dataset. \[\tau_{xy} = \frac{ \text{\#concordant pairs}...
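Counting concordant and discordant pairs can be sketched with one loop over observations and a vectorized comparison against all later observations. Because the denominator of the formula is cut off here, this sketch assumes the tau-a form, dividing by the total number of pairs $n(n-1)/2$, and counts pairs tied in either attribute as neither concordant nor discordant. The name `kendall_tau_a` is hypothetical.

```python
import numpy as np


def kendall_tau_a(x, y):
    """Kendall's tau with the n(n-1)/2 denominator (tau-a); tied pairs count as neither kind."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    concordant = discordant = 0
    for i in range(n - 1):
        # Product of the signs of the differences for every pair (i, j) with j > i:
        # +1 means concordant, -1 discordant, 0 tied in x or y.
        s = np.sign(x[i + 1:] - x[i]) * np.sign(y[i + 1:] - y[i])
        concordant += int(np.count_nonzero(s > 0))
        discordant += int(np.count_nonzero(s < 0))
    n_pairs = n * (n - 1) / 2
    return (concordant - discordant) / n_pairs


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 5 concordant, 1 discordant: (5 - 1) / 6
```

The loop performs $O(n^2)$ comparisons, which is acceptable for a few thousand observations. When there are no ties, the result matches `scipy.stats.kendalltau`, whose default variant is tau-b.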

Visualizing the average precipitation among the US states

Problem Consider the following dataset containing the average annual precipitation in the US states from 1971 to 2000. Make a choropleth visualization of this precipitation data (using either SI or US customary units). Do not forget to add a color bar to...

Computing the first four moments of a sample

Problem Consider this dataset comprising $1000$ observations (tuples). Compute the first four moments of this sample, reported as the mean, standard deviation, skewness (the third standardized moment), and kurtosis (the fourth standardized moment).
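All four statistics follow from the mean and the standardized values $z_k = (x_k - \overline{x})/s$: skewness is the mean of $z^3$ and kurtosis the mean of $z^4$. This NumPy sketch assumes the $1/n$ normalization for $s$ and reports the non-excess kurtosis (3 for a normal distribution); other conventions exist. The helper name `first_four_moments` is hypothetical, it works column by column if the data have several attributes, and a synthetic sample stands in for the real file.

```python
import numpy as np


def first_four_moments(sample):
    """Mean, standard deviation, skewness, and kurtosis of each column (1/n normalization)."""
    x = np.asarray(sample, dtype=float)
    mean = x.mean(axis=0)
    std = x.std(axis=0)                   # ddof=0, i.e. the 1/n normalization
    z = (x - mean) / std                  # standardized values
    skewness = np.mean(z**3, axis=0)      # third standardized moment
    kurtosis = np.mean(z**4, axis=0)      # fourth standardized moment; subtract 3 for excess kurtosis
    return mean, std, skewness, kurtosis


# Synthetic stand-in for the 1000 observations (assumption).
rng = np.random.default_rng(3)
sample = rng.normal(loc=10.0, scale=2.0, size=1000)
mean, std, skewness, kurtosis = first_four_moments(sample)
print(mean, std, skewness, kurtosis)  # roughly 10, 2, 0, and 3 for this normal sample
```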

An experimental proof of Chebyshev's inequality

Problem The Chebyshev inequality states that no more than a fraction $1/k^2$ of the values of an attribute in a given sample can lie $k$ or more standard deviations away from the attribute mean. Provide an experimental proof of this theorem by generating a...
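One way to set up the experiment is to draw a large sample, measure the fraction of values at least $k$ standard deviations from the mean for several $k$, and compare each fraction with $1/k^2$. The sketch assumes a skewed exponential sample (to show the bound does not rely on normality) and uses the sample's own mean and $1/n$ standard deviation; the distribution, sample size, $k$ values, and the helper name `chebyshev_check` are illustrative choices, not part of the original problem.

```python
import numpy as np


def chebyshev_check(sample, ks):
    """For each k: (k, observed fraction at >= k std from the mean, Chebyshev bound 1/k^2)."""
    sample = np.asarray(sample, dtype=float)
    mean, std = sample.mean(), sample.std()  # ddof=0: moments of the sample itself
    results = []
    for k in ks:
        fraction = np.mean(np.abs(sample - mean) >= k * std)
        results.append((k, fraction, 1.0 / k**2))
    return results


# Assumption: a deliberately skewed (exponential) sample.
rng = np.random.default_rng(2024)
sample = rng.exponential(scale=1.0, size=100_000)
results = chebyshev_check(sample, [1.5, 2.0, 3.0, 4.0, 5.0])
for k, fraction, bound in results:
    print(f"k = {k}: observed fraction = {fraction:.5f}, Chebyshev bound = {bound:.5f}")
```

With the $1/n$ standard deviation, the inequality holds exactly for the sample's own empirical distribution, so every observed fraction stays at or below its bound, typically by a wide margin.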

Computing the mean of weighted data

Problem Consider this weighted dataset comprising $500$ observations (tuples), each described by $5$ attributes. Note that the last column of the data holds the weight of each tuple. Compute the weighted mean of this sample.
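The weighted mean of each attribute is $\sum_k w_k x_k / \sum_k w_k$. The NumPy sketch below splits off the last column as the weights and passes it to `numpy.average`; the helper name `weighted_mean` is hypothetical, and a tiny toy array replaces the real file so the arithmetic is easy to verify by hand.

```python
import numpy as np


def weighted_mean(data):
    """Weighted mean of every attribute, taking the last column as the tuple weights."""
    data = np.asarray(data, dtype=float)
    values, weights = data[:, :-1], data[:, -1]
    # sum_k w_k * x_k / sum_k w_k, evaluated separately for each attribute column
    return np.average(values, axis=0, weights=weights)


toy = np.array([[1.0, 2.0, 1.0],
                [3.0, 4.0, 3.0]])  # two tuples, two attributes, weights 1 and 3
print(weighted_mean(toy))  # [2.5 3.5]
```

For the first attribute, $(1 \cdot 1 + 3 \cdot 3)/(1 + 3) = 2.5$.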

Monte Carlo approximation of a 10-dimensional integral

Problem Compute the following 10-dimensional integral via the Monte Carlo rejection sampling method, \[I = \int_{x_1 = 0}^{x_1 = 1} dx_1 \cdots \int_{x_{10} = 0}^{x_{10} = 1} dx_{10} \bigg(\sum_{i=1}^{i=10} ~ x_i ~ \bigg) ~,\] Ensure the accuracy of your integration result...
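Because the integrand $f(\mathbf{x}) = \sum_i x_i$ lies between 0 and 10 on the unit hypercube, rejection (hit-or-miss) sampling can draw points uniformly in the box $[0,1]^{10} \times [0,10]$ and count the fraction that fall below the integrand; the integral is that fraction times the box volume, 10. The exact value, $10 \times \tfrac{1}{2} = 5$, makes a convenient check. The sample size, the binomial standard-error estimate, and the name `mc_rejection_integral` are illustrative assumptions, not the accuracy criterion the original problem specifies.

```python
import numpy as np


def mc_rejection_integral(n_samples, dim=10, seed=0):
    """Estimate the integral of sum(x) over [0, 1]^dim by rejection (hit-or-miss) sampling."""
    rng = np.random.default_rng(seed)
    f_max = float(dim)                    # the integrand never exceeds dim on the unit hypercube
    x = rng.random((n_samples, dim))      # uniform points in [0, 1]^dim
    u = rng.random(n_samples) * f_max     # uniform heights in [0, f_max)
    hits = u < x.sum(axis=1)              # accept points lying under the integrand
    p = hits.mean()
    box_volume = 1.0 * f_max              # hypercube volume (1) times the height range
    estimate = box_volume * p
    std_error = box_volume * np.sqrt(p * (1.0 - p) / n_samples)  # binomial standard error
    return estimate, std_error


estimate, err = mc_rejection_integral(200_000)
print(f"I = {estimate:.4f} +/- {err:.4f} (exact value: 5)")
```

The standard error shrinks as $1/\sqrt{N}$, so each extra decimal digit of accuracy costs roughly 100 times more samples.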

Monte Carlo approximation of the number Pi using a full circle

Problem Suppose we did not know the value of $\pi$ and we wanted to estimate it using Monte Carlo methods. One practical approach is to draw a square with sides of length $a = 2$, with its diagonally opposite corners...
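Because the corner coordinates are cut off here, this sketch assumes the square spans $[-1, 1]^2$ (corners at $(-1,-1)$ and $(1,1)$) with the full circle of radius $a/2 = 1$ inscribed in it. Uniform random points then land inside the circle with probability $\pi \cdot 1^2 / 2^2 = \pi/4$, so four times the observed fraction estimates $\pi$. The name `estimate_pi` and the sample sizes are illustrative.

```python
import numpy as np


def estimate_pi(n_points, seed=0):
    """Estimate pi from the fraction of random points in [-1, 1]^2 that fall inside the unit circle."""
    rng = np.random.default_rng(seed)
    points = rng.uniform(-1.0, 1.0, size=(n_points, 2))  # square of side a = 2 centered at the origin
    inside = np.sum(points**2, axis=1) <= 1.0            # inside the inscribed circle of radius 1
    # area(circle) / area(square) = pi * 1**2 / 2**2 = pi / 4
    return 4.0 * inside.mean()


for n in (1_000, 100_000, 1_000_000):
    print(n, estimate_pi(n))
```

As with any Monte Carlo estimate, the error decreases only as $1/\sqrt{N}$, which the three sample sizes make visible.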