Puzzle: How many living creatures are in the pond?

Problem How many living creatures can you identify in this figure? (Hint: There are two).

Regression: Predicting the global land temperature of Earth in 2050 from past data: Choosing the best model

Problem Consider this dataset, 1880_2020.csv, which contains the monthly global land and ocean temperature anomalies of the Earth from January 1880 to June 2020. As stated in the file, temperatures are in degrees Celsius and reported as anomalies...

Regression: Estimating the parameters of a linear model for a Normally-distributed sample

Problem Suppose we have observed a dataset comprised of events with one attribute as in this file: z.csv. Plotting these points would yield a histogram like the following plot. Now our goal is to form a hypothesis about this dataset,...

Regression: Estimating the parameters of a Normally-distributed sample

Problem Suppose we have observed a dataset comprised of $15027$ events with one attribute variable in this file: dataFull.csv. Plotting these points would yield a histogram like the following plot. Now our goal is to form a hypothesis about this...

Computing the cross-correlation of sin() and cos()

Problem Generate two arrays corresponding to the values of the functions $\sin(x)$ and $\cos(x+\pi/2)$ in the range $[0, 10\pi]$. Make a plot of the resulting arrays like the following illustration. Now use an FFT package in the language of your choice...
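The FFT-based cross-correlation mentioned above can be sketched in Python with NumPy. This is only an illustrative sketch, not the intended solution: the function name `cross_correlate_fft`, the grid size of 1000 points, and the use of zero-padding to obtain a linear (non-circular) correlation are all assumptions made here.

```python
import numpy as np

def cross_correlate_fft(a, b):
    """Linear cross-correlation of two real arrays via the FFT.

    Returns values for lags -(len(b)-1) ... len(a)-1, the same ordering
    as np.correlate(a, b, mode="full").
    """
    n_full = len(a) + len(b) - 1
    # Zero-pad to a power of two >= n_full so circular wrap-around cannot occur.
    nfft = 1 << (n_full - 1).bit_length()
    fa = np.fft.rfft(a, nfft)
    fb = np.fft.rfft(b, nfft)
    # Correlation theorem: corr(a, b) <-> FFT(a) * conj(FFT(b)).
    cc = np.fft.irfft(fa * np.conj(fb), nfft)
    # Negative lags are stored at the end of the circular result.
    negative_lags = cc[nfft - (len(b) - 1):]
    non_negative_lags = cc[:len(a)]
    return np.concatenate((negative_lags, non_negative_lags))

# Hypothetical grid: 1000 points over [0, 10*pi].
x = np.linspace(0.0, 10.0 * np.pi, 1000)
sin_x = np.sin(x)
cos_shifted = np.cos(x + np.pi / 2.0)  # identical to -sin(x)

ccf = cross_correlate_fft(sin_x, cos_shifted)
zero_lag = ccf[len(cos_shifted) - 1]   # index of lag 0 in "full" ordering
```

Because $\cos(x+\pi/2) = -\sin(x)$, the two arrays are perfectly anti-correlated, so the zero-lag value is negative.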

Computing the cross-correlation of two data attributes

Problem Consider this dataset of carbon emissions history per country. Make a visualization of the global carbon emission data in the CSV file above by summing over the contributions of all countries per year to obtain an illustration...

Computing the autocorrelation of a dataset

Problem Recall the globalLandTempHist.txt dataset that consisted of the global land temperature of Earth over the past 300 years. Also recall that the autocorrelation of a time-series is defined as the correlation of a univariate dataset with itself, with some...
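The definition of autocorrelation recalled above (the correlation of a series with a lagged copy of itself) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `autocorrelation` is hypothetical, and since the layout of globalLandTempHist.txt is not given here, the example uses a synthetic AR(1) series instead of the real data.

```python
import numpy as np

def autocorrelation(series, max_lag):
    """Normalized sample autocorrelation for lags 0..max_lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                 # remove the mean before correlating
    denom = np.dot(x, x)             # lag-0 sum of squares, used for normalization
    return np.array([
        np.dot(x[:len(x) - lag], x[lag:]) / denom
        for lag in range(max_lag + 1)
    ])

# Synthetic stand-in data (an assumption, not the temperature dataset):
# an AR(1) process x[t] = phi * x[t-1] + noise, whose lag-1 autocorrelation is near phi.
rng = np.random.default_rng(0)
phi = 0.8
noise = rng.normal(size=5000)
ar1 = np.empty_like(noise)
ar1[0] = noise[0]
for t in range(1, len(noise)):
    ar1[t] = phi * ar1[t - 1] + noise[t]

acf = autocorrelation(ar1, max_lag=20)
```

The value at lag 0 is 1 by construction, and for a positively correlated series the values decay toward zero as the lag grows.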

Computing and removing the autocorrelation of a dataset

Problem Consider the following Banana function. def getLogFuncBanana(point): import numpy as np from scipy.stats import multivariate_normal as mvn from scipy.special import logsumexp NPAR = 2 # sum(Banana,gaussian) normalization factor normfac = 0.3 # sum(Banana,gaussian) normalization factor lognormfac = np.log(normfac) #...

Ugly visualization

Problem What is ugly in the following graph?

The population growths of the US states

Problem Which color scale has been used in the following visualization?

The cities with the most and least moderate temperature

Problem Consider the following plot displaying the temperatures of a number of US cities. Which city’s temperature varies the least throughout the year? Which city’s temperature varies the most throughout the year? Which city is the hottest in the...

Wrong visualization

Problem What is wrong in the following visualization?

Excel Bar plot

Problem Consider the following salary data.

Data Scientist | Physicist | Bioinformatician
---------------|-----------|-----------------
$110,000 | $122,000 | $58,000

Make a graph of this data in Microsoft Excel similar to the following visualization.

Visualization color scales

Problem To which classes of color scales do the following color mappings belong? a) b) c) d)

Regression: Model selection for bivariate data using Excel

Problem Suppose we have observed a dataset comprised of events with two attributes $x$ and $y$ as in this file: data.xlsx. Plot this data in Microsoft Excel. Form a hypothesis about the relationship between $x$ and $y$. Use Excel’s Trendline...

Cognitive Biases

Problem Suppose I have discovered a positive relationship between properties of some celestial objects, like the one formed by the black dots in the following figure. But in making such a discovery, I repeatedly and subconsciously throw away any data...

Visualizing and comparing the temperatures of Honolulu and Duluth

Problem Consider the following CSV dataset containing the temperature of cities around the world from 1995 to 2020. Each row in the file corresponds to the average temperature (in Fahrenheit) of a city on a given day of the year....

Visualizing and comparing the temperatures of Honolulu and Duluth via Excel

Problem Consider the following Excel dataset containing the temperature of two US cities, Honolulu, HI and Duluth, MN, from 1995 to 2020. There are two pages in the Excel file: Duluth, and Honolulu. Each row in the file corresponds to...

Visualizing the average precipitation of the US states vs. sunshine

Problem Consider the following dataset containing the average annual precipitation in the US states for 1971-2000 and this dataset. Combine these two datasets in Excel and generate a plot of US state precipitation vs. sunshine like the following figure. Note...

Computing the covariance matrix of a dataset

Problem Recall the globalLandTempHist.txt dataset that consisted of the global land temperature of Earth over the past 300 years. Also recall that a covariance matrix is a symmetric positive semi-definite square matrix of the form, \[\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12}...
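As a rough illustration of the covariance matrix recalled above, the sketch below computes $\Sigma$ directly from its definition and compares it with NumPy's `np.cov`. The function name `covariance_matrix` is hypothetical, and the two-column data are synthetic stand-ins, since the column layout of globalLandTempHist.txt is not given here.

```python
import numpy as np

def covariance_matrix(data):
    """Sample covariance matrix of an (n_samples, n_attributes) array.

    Element (i, j) is the covariance between attributes i and j,
    using the unbiased normalization 1 / (n_samples - 1).
    """
    x = np.asarray(data, dtype=float)
    centered = x - x.mean(axis=0)          # subtract each attribute's mean
    return centered.T @ centered / (x.shape[0] - 1)

# Synthetic stand-in data (an assumption): two linearly related attributes.
rng = np.random.default_rng(1)
first = rng.normal(size=500)
second = 0.5 * first + rng.normal(scale=0.2, size=500)
sample = np.column_stack((first, second))

sigma = covariance_matrix(sample)
eigenvalues = np.linalg.eigvalsh(sigma)    # real eigenvalues of a symmetric matrix
```

The result is symmetric, its diagonal holds the variances of the individual attributes, and its eigenvalues are non-negative.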