Visualization color scales
Problem To which classes of color scales do the following color mappings belong? a) b) c) d)
Problem Suppose we have observed a dataset consisting of events with two attributes $x$ and $y$, as in this file: data.xlsx. Plot this data in Microsoft Excel. Form a hypothesis about the relationship between $x$ and $y$. Use Excel’s Trendline...
Problem Suppose I have discovered a positive relationship between properties of some celestial objects, like the one formed by the black dots in the following figure. But in making such a discovery, I repeatedly and subconsciously throw away any data...
Problem Consider the following CSV dataset containing the temperature of cities around the world from 1995 to 2020. Each row in the file corresponds to the average temperature (in Fahrenheit) of a city on a given day of the year....
Problem Consider the following Excel dataset containing the temperature of two US cities, Honolulu, HI, and Duluth, MN, from 1995 to 2020. There are two sheets in the Excel file: Duluth and Honolulu. Each row in the file corresponds to...
Problem Consider the following dataset containing the average annual precipitation in US states from 1971 to 2000, and this dataset. Combine these two datasets in Excel and generate a plot of precipitation vs. sunshine for the US states, like the following figure. Note...
Problem Recall the globalLandTempHist.txt dataset that consisted of the global land temperature of Earth over the past 300 years. Also recall that a covariance matrix is a symmetric positive semi-definite square matrix of the form, \[\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12}...
Problem Recall that the correlation matrix is the normalized covariance matrix. Write a function genCovMatFromCorMat(CorMat, StdVec = None) that computes the covariance matrix from an input correlation matrix and, optionally (if available), an input vector of standard deviations.
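Since each correlation is a covariance divided by the two standard deviations, the conversion is $\sigma_{ij} = \rho_{ij}\,s_i\,s_j$. The sketch below assumes plain nested Python lists rather than NumPy arrays, and assumes that when `StdVec` is omitted the standard deviations are all 1, so the correlation matrix itself is returned.

```python
def genCovMatFromCorMat(CorMat, StdVec=None):
    """Return the covariance matrix Sigma_ij = CorMat_ij * StdVec_i * StdVec_j.

    Assumption: if StdVec is None, unit standard deviations are used, so the
    result is simply a copy of the correlation matrix.
    """
    n = len(CorMat)
    if StdVec is None:
        StdVec = [1.0] * n
    if len(StdVec) != n or any(len(row) != n for row in CorMat):
        raise ValueError("CorMat must be square and match the length of StdVec")
    # Scale row i by s_i and column j by s_j.
    return [[CorMat[i][j] * StdVec[i] * StdVec[j] for j in range(n)] for i in range(n)]


cor = [[1.0, 0.5], [0.5, 1.0]]
print(genCovMatFromCorMat(cor, [2.0, 3.0]))  # [[4.0, 3.0], [3.0, 9.0]]
```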
Problem Recall that a covariance matrix is a symmetric positive semi-definite square matrix of the form, \[\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}\] where each element is the covariance of a pair of attributes $x$ and $y$ (with $y = x$ on the diagonal), computed via the following equation, \[\sigma_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} ~ (x_i-\overline{x})(y_i-\overline{y})\] where...
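A minimal sketch of the $2\times 2$ covariance matrix for two attributes, assuming the usual $1/(n-1)$ sample normalization; the name `cov_matrix_2d` is hypothetical and not part of the problem statement. The diagonal entries are the variances, and the off-diagonal entries are equal, which is why the matrix is symmetric.

```python
def cov_matrix_2d(x, y):
    """2x2 sample covariance matrix of attributes x and y (1/(n-1) normalization)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mx, my = sum(x) / n, sum(y) / n

    def cov(a, ma, b, mb):
        # Sum of products of deviations from the respective means.
        return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)

    sxy = cov(x, mx, y, my)
    return [[cov(x, mx, x, mx), sxy],
            [sxy, cov(y, my, y, my)]]


print(cov_matrix_2d([1, 2, 3, 4], [2, 4, 6, 8]))
```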
Problem Recall the globalLandTempHist.txt dataset that consisted of the global land temperature of Earth over the past 300 years. Also recall that Spearman’s rank correlation coefficient is merely Pearson’s correlation coefficient of the ranks of two attributes in a...
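Following the definition above, Spearman's coefficient can be sketched as "rank both attributes, then apply Pearson's formula to the ranks." The helper names are hypothetical, and the handling of ties (tied values share the average of their ranks) is a common convention assumed here, not something the problem specifies.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    return pearson(ranks(x), ranks(y))


print(spearman([1, 2, 3, 4, 5], [1, 4, 9, 16, 100]))  # monotonic, so close to 1
```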
Problem Recall the globalLandTempHist.txt dataset that consisted of the global land temperature of Earth over the past 300 years. Also recall the equation for Pearson’s correlation coefficient between two attributes of a dataset. \[r_{xy} = \frac{ \sum_{i=1}^{n} ~ (x_i-\overline{x})(y_i-\overline{y})...
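Assuming the truncated equation is the standard form, with the sum of products of deviations divided by the product of the square roots of the two sums of squared deviations, Pearson's $r$ can be computed directly in a few lines. The function name `pearson_r` is illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson's r from the sum-of-products formula."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    den = (math.sqrt(sum((xi - x_bar) ** 2 for xi in x))
           * math.sqrt(sum((yi - y_bar) ** 2 for yi in y)))
    return num / den


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly linear, so r is (numerically) 1
```

The $1/(n-1)$ factors of the covariance and the two standard deviations cancel, which is why the formula needs only raw sums.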
Problem Among the Pearson, Spearman, and Kendall correlation coefficients, which is the most sensitive to outliers? Why?
Problem Recall the globalLandTempHist.txt dataset that consisted of the global land temperature of Earth over the past 300 years. Also recall the equation for Kendall’s rank correlation coefficient between two attributes of a dataset. \[\tau_{xy} = \frac{ \text{\#concordant pairs}...
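A brute-force sketch of Kendall's coefficient, assuming the truncated equation is the tau-a form $(\#\text{concordant} - \#\text{discordant}) / \binom{n}{2}$, in which tied pairs count as neither. Variants such as tau-b adjust the denominator for ties, so this is one reading of the formula, not necessarily the one intended. The loop is $O(n^2)$, which is fine for a few thousand rows.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (#concordant - #discordant) / (n(n-1)/2)."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Same sign of both differences -> concordant; opposite -> discordant.
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3], [1, 3, 2]))  # 2 concordant, 1 discordant pair
```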
Problem Consider the following dataset containing the average annual precipitation in US states from 1971 to 2000. Make a choropleth visualization of this precipitation data (in either SI or US customary units). Do not forget to add a color bar to...
Problem Consider this dataset consisting of $1000$ observations (tuples). Compute the mean, standard deviation, skewness, and kurtosis of this sample (the latter two being the third and fourth standardized moments).
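A minimal sketch of the four statistics for a single attribute, assuming population (biased, $1/n$) moments and the non-excess kurtosis convention in which a normal distribution has kurtosis 3. Sample-corrected estimators differ slightly for small samples. The function name `moments` is illustrative.

```python
import math


def moments(data):
    """Return (mean, std, skewness, kurtosis) using 1/n central moments."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    m4 = sum((v - mean) ** 4 for v in data) / n
    std = math.sqrt(m2)
    skewness = m3 / m2 ** 1.5   # third standardized moment
    kurtosis = m4 / m2 ** 2     # fourth standardized moment (not excess)
    return mean, std, skewness, kurtosis


print(moments([1, 2, 3, 4, 5]))
```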
Problem The Chebyshev Inequality states that no more than a fraction $1/k^2$ of the values of an attribute in a given sample can lie $k$ or more standard deviations away from the attribute’s mean. Provide an experimental demonstration of this theorem by generating a...
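One way to check the inequality numerically is to draw a large sample, measure the fraction of values at least $k\sigma$ from the mean, and compare it with $1/k^2$. The exponential distribution and the values of $k$ below are assumptions chosen for illustration; the problem's own choice of distribution is truncated above. Because the mean and the population standard deviation are computed from the sample itself, the bound holds exactly for the sample, not just on average.

```python
import random
import statistics


def chebyshev_check(sample, ks=(1.5, 2, 3, 4)):
    """Map each k to (observed fraction with |x - mu| >= k*sigma, bound 1/k^2)."""
    mu = statistics.fmean(sample)
    sigma = statistics.pstdev(sample)
    results = {}
    for k in ks:
        frac = sum(abs(v - mu) >= k * sigma for v in sample) / len(sample)
        results[k] = (frac, 1 / k**2)
    return results


random.seed(0)
sample = [random.expovariate(1.0) for _ in range(100_000)]  # assumed distribution
for k, (frac, bound) in chebyshev_check(sample).items():
    print(f"k={k}: observed fraction {frac:.4f} <= bound {bound:.4f}")
```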
Problem Consider this weighted dataset consisting of $500$ observations (tuples), each of which is described by $5$ attributes. Note that the last column of the data contains the weight of each tuple. Compute the weighted mean of this sample.
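The weighted mean of each attribute is $\bar{x}_w = \sum_i w_i x_i / \sum_i w_i$. The sketch below assumes the data has already been loaded as a list of rows whose last entry is the weight; the file format is not given here, so reading the file is left out, and the small `rows` example is made up for illustration.

```python
def weighted_mean(rows):
    """Per-attribute weighted mean; each row is [attr_1, ..., attr_m, weight]."""
    total_w = sum(row[-1] for row in rows)
    n_attr = len(rows[0]) - 1
    return [sum(row[i] * row[-1] for row in rows) / total_w for i in range(n_attr)]


rows = [[1.0, 10.0, 1.0],
        [3.0, 20.0, 3.0]]
print(weighted_mean(rows))  # [2.5, 17.5]
```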
Problem Compute the following 10-dimensional integral via the Monte Carlo rejection sampling method, \[I = \int_{x_1 = 0}^{x_1 = 1} dx_1 \cdots \int_{x_{10} = 0}^{x_{10} = 1} dx_{10} \bigg(\sum_{i=1}^{i=10} ~ x_i ~ \bigg) ~,\] Ensure the accuracy of your integration result...
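A sketch of rejection sampling for this integral: on the unit hypercube the integrand lies in $[0, 10]$, so points are drawn uniformly in the box $[0,1]^{10}\times[0,10]$, and the integral is the box volume (here 10) times the fraction of points that fall below the integrand. Since each $x_i$ integrates to $1/2$, the exact answer is $I = 10 \times 1/2 = 5$, which gives a convenient accuracy check. The function name and sample size are illustrative choices.

```python
import random


def mc_rejection_integral(n_samples, dim=10, seed=None):
    """Estimate the integral of sum(x) over [0,1]^dim by rejection sampling."""
    rng = random.Random(seed)
    f_max = float(dim)  # sum(x_i) <= dim on the unit hypercube
    accepted = 0
    for _ in range(n_samples):
        x = [rng.random() for _ in range(dim)]
        u = rng.uniform(0.0, f_max)
        if u < sum(x):  # point lies under the integrand
            accepted += 1
    # Box volume = (hypercube volume 1) * f_max.
    return f_max * accepted / n_samples


estimate = mc_rejection_integral(100_000, seed=42)
print(estimate)  # close to the exact value 5
```

The statistical error shrinks like $1/\sqrt{N}$, so the sample size can be increased until the estimate is within the required tolerance of 5.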
Problem Suppose we did not know the value of $\pi$ and wanted to estimate it using Monte Carlo methods. One practical approach is to draw a square of side length $a = 2$, with its diagonal opposite corners...
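A minimal sketch, assuming the square is centered at the origin (corners at $(-1,-1)$ and $(1,1)$) with the unit circle inscribed in it; the exact placement is truncated in the problem statement. The circle-to-square area ratio is $\pi/4$, so four times the fraction of uniform points landing inside the circle estimates $\pi$.

```python
import math
import random


def estimate_pi(n_samples, seed=None):
    """Estimate pi from uniform points in the square [-1, 1] x [-1, 1]."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:  # inside the inscribed unit circle
            inside += 1
    return 4.0 * inside / n_samples


pi_hat = estimate_pi(200_000, seed=7)
print(pi_hat, abs(pi_hat - math.pi))
```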