CDSLab Recipes - A repository for all sorts of problems with solutions Jekyll 2022-11-14T01:55:09-06:00 https://www.cdslab.org/recipes/ Amir Shahmoradi https://www.cdslab.org/recipes/ shahmoradi@utexas.edu <![CDATA[Logic NAND and NOR]]> https://www.cdslab.org/recipes/programming/logic-nand-nor/logic/logic-nand-nor 2022-11-01T00:00:00-05:00 2020-11-01T00:00:00-00:00 Amir Shahmoradi https://www.cdslab.org/recipes shahmoradi@utexas.edu <p>Recall the definitions of NAND and NOR from our lecture notes. Show that,</p> $\overline{(A \uparrow A) \downarrow (B \uparrow B)} \equiv (\overline{A} \downarrow \overline{A}) \uparrow (\overline{B} \downarrow \overline{B}) ~.$ <p><a href="https://www.cdslab.org/recipes/programming/logic-nand-nor/logic/logic-nand-nor">Logic NAND and NOR</a> was originally published by Amir Shahmoradi at <a href="https://www.cdslab.org/recipes">CDSLab Recipes - A repository for all sorts of problems with solutions</a> on November 01, 2022.</p> <![CDATA[Logical implication in terms of logic functions]]> https://www.cdslab.org/recipes/programming/logic-functions-implication/logic-functions-implication 2022-10-21T00:00:00-05:00 2020-10-21T00:00:00-00:00 Amir Shahmoradi https://www.cdslab.org/recipes shahmoradi@utexas.edu <p>Consider the following logic functions,</p> <figure><img src="basisTruthTable.png" width="350" /></figure> <p>Show that,</p> $f_1(A, B) + f_3(A, B) + f_4(A, B)$ <p>is equivalent to logical implication $A \Rightarrow B$.</p> <p><a href="https://www.cdslab.org/recipes/programming/logic-functions-implication/logic-functions-implication">Logical implication in terms of logic functions</a> was originally published by Amir Shahmoradi at <a href="https://www.cdslab.org/recipes">CDSLab Recipes - A repository for all sorts of problems with solutions</a> on October 21, 2022.</p> <![CDATA[Version-control: Setting up Git Software and GitHub Account]]> https://www.cdslab.org/recipes/programming/vcs-git-github-setup/vcs-git-github-setup 
2022-09-13T00:00:00-05:00 2022-09-13T00:00:00-00:00 Amir Shahmoradi https://www.cdslab.org/recipes shahmoradi@utexas.edu <div style="text-align:center;margin-top:4rem;margin-bottom:1rem;"> <figure> <a href="#git" id="git"> <img src="https://www.cdslab.org/recipes/images/Git.png" width="75px" /> </a> <figcaption> <a href="https://en.wikipedia.org/wiki/Git" target="_blank"> Git </a> </figcaption> </figure> </div> <p>This exercise guides you through the steps needed to properly install and minimally use the Git software and the Git Bash terminal on your system.<br /> By the end of this exercise, you will be able to initialize an empty Git project anywhere on your computer, or initialize a project on GitHub and clone it to your system.<br /> <strong>Guidelines.</strong> Use the following reference for operations in a (Git) Bash terminal.<br /> - <a href="https://www.cdslab.org/python/notes/preliminary-foundations/version-control-system/linuxRef.pdf" target="_blank">Linux Bash command reference</a> <br /> <br /></p> <ol> <li><strong>Only Windows users</strong>. Before installing the Git software, I highly recommend that you download and install the most recent version of the <a href="https://notepad-plus-plus.org/downloads/" target="_blank">Notepad++ text editor</a> on your system, if you do not have it already.</li> <li><strong>Git installation</strong>. Visit the <a href="https://git-scm.com/downloads" target="_blank">Git downloads website</a>, then download and install the most recent version of the Git software for your system. <ul> <li><strong>Only Windows users</strong>. During the installation, the Git installer may ask you to choose a default editor for Git. If Notepad++ is offered as an option, choose it.</li> </ul> </li> <li><strong>Interacting with Git</strong>.
<ul> <li>Depending on your operating system, <ul> <li><strong>On Windows systems</strong>, <ol> <li>press the Windows key + <code>E</code> to open a Windows File Explorer window.<br /> <img src="explorer-key-shortcut.jpg" alt="explorer-key-shortcut.jpg" /></li> <li>Then, navigate to the directory <code>C:\Users\account</code>, where you have to replace <code>account</code> with your Windows account name.</li> <li>Now right-click on an empty region of the File Explorer window. You should see a menu like the following.<br /> <img src="git.bash.here.png" alt="git.bash.here.png" /><br /> Click on “Git Bash Here” to open a Git Bash session. You should see a Bash session like the following screenshot,<br /> <img src="git.bash.png" alt="git.bash.png" /><br /> What is <strong>Git Bash</strong>? Git Bash is simply a <a href="https://en.wikipedia.org/wiki/Bash_(Unix_shell)" target="_blank">Bash terminal</a> that is tailored for Git usage on Windows.<br /> The Linux and macOS operating systems have Bash-compatible terminals (shells) that allow interaction with the operating system.<br /> The Git software was originally developed as a Linux application that natively relied on Linux terminals for interaction.<br /> However, Windows does not natively provide a Linux-compatible terminal.<br /> Therefore, the Git developers decided to ship the Git software with a dedicated Bash terminal for Windows, so that Windows users get the same experience as Linux and macOS users when working with Git.</li> </ol> </li> <li><strong>On Linux / macOS systems</strong>. <ol> <li>Simply search for <strong>terminal</strong> in the search box of your operating system and open a terminal.</li> <li>Then type <code>cd ~</code> and press enter.
This will take you to the <strong>home directory</strong> of your system.</li> <li>Then, on macOS, type <code>open .</code> in your terminal and press enter to open a <strong>Finder</strong> window in your home folder (on most Linux desktops, <code>xdg-open .</code> opens the default file manager instead).</li> </ol> </li> </ul> </li> <li>Now, within your terminal (whether Git Bash or a Linux/macOS terminal), type <code>pwd</code> and press enter.<br /> <strong>Why?</strong> This command, short for <strong>Print Working Directory</strong>, displays the current working directory.<br /> It should print the path to the home directory of your system, because you are already in the home directory.</li> <li>Now type <code>ls -a</code> and press enter.<br /> <strong>Why?</strong> This command displays a list of all files and folders in the home directory of your computer.<br /> The <code>-a</code> flag tells <code>ls</code> to show all entries (<strong>including hidden files</strong>).<br /> Note that any file or folder whose name begins with a <code>.</code> is hidden by default.</li> </ul> </li> <li><strong>Creating your first Git project.</strong><br /> There are two ways to create Git projects: <ol> <li>Creating a Git project on your local system. <ul> <li>Now type <code>mkdir git</code> and press enter.<br /> <strong>Why?</strong> This command creates a new folder named <code>git</code> within the current directory of your Bash session (which is already your home directory).<br /> Although you can use any location on your computer to store your Git projects, it is good practice and much easier to keep them all in one place (the <code>git</code> folder you just created). <blockquote> <p>Some of you may have already created a <code>git</code> folder in your home directory.
<br /> In such a case, the command <code>mkdir git</code> will fail with an error because the folder already exists.<br /> Do not panic; ignore the error message and move on to the next step below.</p> </blockquote> </li> <li>Now type <code>ls -l</code> and press enter.<br /> <strong>Why?</strong> If you have successfully created the new folder <code>git</code> in your current directory, this command will show the new folder in its listing.<br /> If you cannot find it, reach out to me so that we can identify the root of the problem.</li> <li>Now type <code>cd git</code> and press enter.<br /> <strong>Why?</strong> The <code>cd</code> Bash command stands for <strong>Change Directory</strong>. Therefore, <code>cd git</code> will change your current working directory from your home directory to the subdirectory <code>git</code>.</li> <li>Now, we want to create a new Git project in this folder. Let’s say the name of the project is <code>test</code>.</li> <li>First, we have to create a <code>test</code> folder where we will store all files and materials related to the <code>test</code> Git project. Type the following in the terminal and press enter. <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">mkdir test</span> <span class="o">&amp;&amp;</span> <span class="nb">cd test</span> </code></pre></div> </div> <p>This command will create a <code>test</code> subfolder in your <code>git</code> folder and then change the current directory to the <code>test</code> subfolder.<br /> The <code>&amp;&amp;</code> operator means <strong>and then</strong>: make directory <code>test</code> <strong>and</strong>, only if that succeeds, change directory to <code>test</code>.</p> </li> <li>This is where we want to host our Git test project.
To initialize an empty Git project here, type the following Git command in the terminal, <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git init </code></pre></div> </div> </li> <li><strong>Done.</strong> You have successfully created an empty test Git project on your computer.</li> <li>Every Git project is associated with a <code>.git</code> subfolder in the same location where you store your project on your system.<br /> To verify that you have successfully initialized a Git project, type <code>ls -a</code> in your terminal and press enter.<br /> The displayed list should include <code>.git</code>.<br /> If you do not see this hidden folder in the list, your <code>test</code> folder is <strong>not</strong> a Git project yet!</li> <li>Of course, your Git project is still empty. But you can now put anything you like in it and make it part of your project.<br /> For example, we can add an empty text file to it by typing, <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">touch </span>README.md </code></pre></div> </div> <p>in the Bash terminal and pressing enter.</p> </li> </ul> </li> <li>Creating a Git project on a server and cloning the project to your local system.<br /> This is the easier way of initializing projects and sharing them with other team members.<br /> In the follow-up exercises, we will learn how to initialize Git projects using <a href="https://github.com/" target="_blank">GitHub</a> as the Git server.</li> </ol> </li> </ol> <p><a href="https://www.cdslab.org/recipes/programming/vcs-git-github-setup/vcs-git-github-setup">Version-control: Setting up Git Software and GitHub Account</a> was originally published by Amir Shahmoradi at <a href="https://www.cdslab.org/recipes">CDSLab Recipes - A repository for all sorts of problems with solutions</a> on September 13, 2022.</p> <![CDATA[A naive
implementation of Kmedoids clustering]]> https://www.cdslab.org/recipes/programming/clustering-naive-kmedioid-implementation/clustering-naive-kmedioid-implementation 2021-12-01T00:00:00-06:00 2021-12-01T00:00:00-00:00 Amir Shahmoradi https://www.cdslab.org/recipes shahmoradi@utexas.edu <div style="text-align:center;margin-top:3rem;margin-bottom:2rem;"> <a href="#problem" style="display:inline-block;"> <h2 id="problem" style="color:red;"> Problem </h2> </a> </div> <p>K-medoids is a classical partitioning clustering technique that splits a dataset of $n$ objects into $k$ clusters, where the number of clusters $k$ is assumed to be known a priori. Unlike the K-means algorithm, however, <strong>the K-medoids algorithm chooses points within the dataset as the centers of the clusters</strong>.</p> <p>A naive implementation of the K-medoids algorithm is similar to the K-means algorithm and requires the following steps,</p> <ol> <li>Select the initial medoids randomly (that is, randomly select $k$ points from the dataset as the cluster centers) and assign each point to the cluster of its closest medoid.</li> <li>Iterate while the cost (the sum of the distances of all points from their cluster medoids) decreases: <ol> <li>In each cluster, select as the new medoid the point that minimizes the sum of distances to all other points in the cluster.</li> <li>Reassign each point to the cluster of its closest medoid, as determined in the previous step.</li> </ol> </li> </ol> <p>Write a function in Python, MATLAB, or your preferred language to cluster an input dataset using the naive K-medoids method.<br /> Test the functionality of your algorithm with <a href="https://www.cdslab.org/recipes/programming/clustering-kmeans-implementation/points.txt" target="_blank">this example dataset</a> for 6 clusters.</p> <p><a href="https://www.cdslab.org/recipes/programming/clustering-naive-kmedioid-implementation/clustering-naive-kmedioid-implementation">A naive implementation of Kmedoids clustering</a> was originally published by Amir Shahmoradi at <a href="https://www.cdslab.org/recipes">CDSLab Recipes - A
repository for all sorts of problems with solutions</a> on December 01, 2021.</p> <![CDATA[Online comparison of the Kmeans clustering algorithm with DBSCAN]]> https://www.cdslab.org/recipes/programming/clustering-kmeans-vs-dbscan-online/clustering-kmeans-vs-dbscan-online 2021-12-01T00:00:00-06:00 2021-12-01T00:00:00-00:00 Amir Shahmoradi https://www.cdslab.org/recipes shahmoradi@utexas.edu <div style="text-align:center;margin-top:3rem;margin-bottom:2rem;"> <a href="#problem" style="display:inline-block;"> <h2 id="problem" style="color:red;"> Problem </h2> </a> </div> <p>On <a href="https://www.naftaliharris.com/blog/visualizing-k-means-clustering/" target="_blank">this website</a>, you will find an online simulator of the Kmeans clustering technique.</p> <ol> <li>Visit this page and choose the first option, <strong>I’ll Choose</strong>. You will be taken to a new page.</li> <li>On this new page, choose <strong>Smiley Face</strong>. You will then be taken to another page, where you see a set of points that form a smiley face.</li> <li>You will notice that you can add (specify) as many cluster centers as you like. Using mouse clicks, add four cluster centers at your best guess of where the cluster centers are.</li> <li>Then, press <strong>Go!</strong> and keep updating the <strong>Centroids</strong> (cluster centers) until the clusters no longer change visibly (that is, until convergence to a set of clusters has occurred).</li> <li>Is the final set of clusters that you get satisfactory?</li> <li>Take a screenshot of the clusters that you get and submit it with your homework.</li> <li>Repeat this procedure with a new random set of cluster centers.</li> <li>Do the resulting clusters look the same as the ones you got before?
Why?</li> <li>Take a screenshot of the clusters that you get and submit it with your homework.</li> </ol> <p>Now, visit <a href="https://www.naftaliharris.com/blog/visualizing-dbscan-clustering/" target="_blank">this website</a> and choose the same <strong>Smiley Face</strong> option as before to perform DBSCAN clustering on the same data. Adjust the parameters such that the two eyes, the mouth, and the face circle each become a separate cluster. Why is DBSCAN so much more successful than Kmeans on this dataset? Take a screenshot of the clustering result to submit with your homework.</p> <p><a href="https://www.cdslab.org/recipes/programming/clustering-kmeans-vs-dbscan-online/clustering-kmeans-vs-dbscan-online">Online comparison of the Kmeans clustering algorithm with DBSCAN</a> was originally published by Amir Shahmoradi at <a href="https://www.cdslab.org/recipes">CDSLab Recipes - A repository for all sorts of problems with solutions</a> on December 01, 2021.</p> <![CDATA[Online experimentation with DBSCAN clustering technique]]> https://www.cdslab.org/recipes/programming/clustering-dbscan-online/clustering-dbscan-online 2021-12-01T00:00:00-06:00 2021-12-01T00:00:00-00:00 Amir Shahmoradi https://www.cdslab.org/recipes shahmoradi@utexas.edu <div style="text-align:center;margin-top:3rem;margin-bottom:2rem;"> <a href="#problem" style="display:inline-block;"> <h2 id="problem" style="color:red;"> Problem </h2> </a> </div> <p>On <a href="https://www.naftaliharris.com/blog/visualizing-dbscan-clustering/" target="_blank">this website</a>, you will find an online simulator of the DBSCAN clustering technique. Visit this page and choose the first dataset option named <strong>Uniform</strong>.
Recall from our lecture notes that the DBSCAN method has two free adjustable parameters that you need to set prior to clustering.</p> <ol> <li>What are the two free parameters of the DBSCAN clustering technique?</li> <li>Choose a set of DBSCAN parameters on this page for the Uniform dataset such that all points are partitioned into a single cluster. Is this the only set of parameters that achieves this clustering result? If not, provide another example set of parameters.</li> <li>Now, choose another set of parameters such that at least one outlier (noise) point, which does not belong to any cluster, remains at the end of clustering.</li> </ol> <p><a href="https://www.cdslab.org/recipes/programming/clustering-dbscan-online/clustering-dbscan-online">Online experimentation with DBSCAN clustering technique</a> was originally published by Amir Shahmoradi at <a href="https://www.cdslab.org/recipes">CDSLab Recipes - A repository for all sorts of problems with solutions</a> on December 01, 2021.</p> <![CDATA[Kmeans clustering - an implementation]]> https://www.cdslab.org/recipes/programming/clustering-kmeans-implementation/clustering-kmeans-implementation 2021-11-29T00:00:00-06:00 2019-11-21T00:00:00-00:00 Amir Shahmoradi https://www.cdslab.org/recipes shahmoradi@utexas.edu <div style="text-align:center;margin-top:3rem;margin-bottom:2rem;"> <a href="#problem" style="display:inline-block;"> <h2 id="problem" style="color:red;"> Problem </h2> </a> </div> <p>Consider this dataset, <a href="points.txt" target="_blank">points.txt</a>. Write a script that reads this dataset and plots its second column versus its first column, as illustrated below,</p> <figure> <img src="points.png" /> </figure> <p>Now write another script that applies the Kmeans clustering technique to this dataset and plots the resulting clusters for a range of input cluster counts.
Here is an example plot for a cluster count of 6.</p> <figure> <img src="clusters6.png" /> </figure> <p>Make an Elbow plot of the inertia (the sum of squared distances of the points from their nearest cluster centers) of the clusterings you have performed versus the number of clusters.</p> <p>Now, write your own implementation of the Kmeans method.</p> <ol> <li>The function that you write must take two-dimensional data and the number of clusters to find as input.</li> <li>Then the function randomly initializes the centers of the clusters.</li> <li>Then it computes the distance of each point from each cluster center.</li> <li>Then it assigns each point to its nearest cluster center.</li> <li>Based on the members identified for each cluster, the function computes the new cluster centers as the averages of their member points.</li> <li>Then it compares the new centers with the old centers. If no center has changed by more than a certain threshold, it returns the cluster memberships and the cluster centers as the clustering result. Otherwise, if at least one center has changed by more than the threshold that you have set (or that the user passes to your function), it repeats steps 3 to 6 until convergence occurs.</li> </ol> <p>Verify your implementation by comparing its results with those of the external package that you originally used to perform Kmeans clustering.</p> <p><a href="https://www.cdslab.org/recipes/programming/clustering-kmeans-implementation/clustering-kmeans-implementation">Kmeans clustering - an implementation</a> was originally published by Amir Shahmoradi at <a href="https://www.cdslab.org/recipes">CDSLab Recipes - A repository for all sorts of problems with solutions</a> on November 29, 2021.</p> <![CDATA[Kmeans clustering: Determining the cluster number using the Elbow method]]> https://www.cdslab.org/recipes/programming/clustering-kmeans-customers/clustering-kmeans-customers 2021-11-21T00:00:00-06:00 2019-11-21T00:00:00-00:00 Amir Shahmoradi https://www.cdslab.org/recipes shahmoradi@utexas.edu <div
style="text-align:center;margin-top:3rem;margin-bottom:2rem;"> <a href="#problem" style="display:inline-block;"> <h2 id="problem" style="color:red;"> Problem </h2> </a> </div> <p>Consider this dataset, <a href="customers.csv" target="_blank">customers.csv</a>, which contains the details of a mall’s customers. Our aim is to cluster the customers based on the relevant features “annual income” and “spending score”. Write a script that reads this dataset and plots the relevant attributes of the dataset against each other, as illustrated below,</p> <figure> <img src="customers.png" /> </figure> <p>Then, extend the script to perform K-means clustering on the two selected attributes for a range of cluster counts, and use the Elbow method to find the optimal number of clusters for the customers in this dataset.</p> <p><a href="https://www.cdslab.org/recipes/programming/clustering-kmeans-customers/clustering-kmeans-customers">Kmeans clustering: Determining the cluster number using the Elbow method</a> was originally published by Amir Shahmoradi at <a href="https://www.cdslab.org/recipes">CDSLab Recipes - A repository for all sorts of problems with solutions</a> on November 21, 2021.</p> <![CDATA[Regression: Predicting the distribution of a dataset subjected to a smooth censorship (sample incompleteness)]]> https://www.cdslab.org/recipes/programming/regression-erf-censored-gaussian-data/regression-erf-censored-gaussian-data 2021-11-19T00:00:00-06:00 2021-11-19T00:00:00-00:00 Amir Shahmoradi https://www.cdslab.org/recipes shahmoradi@utexas.edu <div style="text-align:center;margin-top:3rem;margin-bottom:2rem;"> <a href="#problem" style="display:inline-block;"> <h2 id="problem" style="color:red;"> Problem </h2> </a> </div> <p>Suppose we have observed a dataset of events with one attribute <code>variable</code> in this file: <a href="data.csv" target="_blank">data.csv</a>.
Plotting these points would yield a blue-colored histogram like the following plot,</p> <figure> <img src="data.png" width="900" /> </figure> <p><br /></p> <p>Unlike the previous problems, where the censorship was due to a sharp cutoff on a Gaussian dataset, the smooth cutoff in this problem is modeled by a Gaussian distribution multiplied by an inverted (complementary) Gaussian CDF,</p> $\pi( x | \mu_G, \sigma_G, \mu_C, \sigma_C) \propto \mathcal{N}(x | \mu_G, \sigma_G) \times \frac{1}{2} \Big[ 1 + \text{erf}\Big(\frac{\mu_C-x}{\sigma_C\sqrt{2}}\Big) \Big] ~,$ <p>where $\mu_G, \sigma_G$ are the mean and standard deviation of the Gaussian distribution and $\mu_C, \sigma_C$ are the unknown location and scale parameters of the smooth Gaussian-CDF cutoff on this dataset.</p> <p>Now our goal is to constrain the four unknown parameters of the above model using the maximum likelihood method. You can use the <a href="https://www.cdslab.org/paramonte/" target="_blank">ParaMonte library</a> in Python or in MATLAB to explore the resulting log-likelihood function. In such a case, make sure you start your MCMC exploration from a good set of initial parameter values, so that the MCMC sampler can correctly explore the parameter space without getting lost.
You can get help from <a href="https://www.cdslab.org/recipes/programming/regression-censored-gaussian-data/regression-censored-gaussian-data" target="_blank">another relevant problem here</a>.</p> <p><a href="https://www.cdslab.org/recipes/programming/regression-erf-censored-gaussian-data/regression-erf-censored-gaussian-data">Regression: Predicting the distribution of a dataset subjected to a smooth censorship (sample incompleteness)</a> was originally published by Amir Shahmoradi at <a href="https://www.cdslab.org/recipes">CDSLab Recipes - A repository for all sorts of problems with solutions</a> on November 19, 2021.</p> <![CDATA[Puzzle: Matchstick Wrong Equation]]> https://www.cdslab.org/recipes/programming/puzzle-matchstick-equation/puzzle-matchstick-equation 2021-11-15T00:00:00-06:00 2021-10-01T00:00:00-00:00 Amir Shahmoradi https://www.cdslab.org/recipes shahmoradi@utexas.edu <div style="text-align:center;margin-top:3rem;margin-bottom:2rem;"> <a href="#problem" style="display:inline-block;"> <h2 id="problem" style="color:red;"> Problem </h2> </a> </div> <p>Move just one matchstick in the following equation to make it hold.</p> <figure> <img src="matchstick_equation.png" /> </figure> <p><a href="https://www.cdslab.org/recipes/programming/puzzle-matchstick-equation/puzzle-matchstick-equation">Puzzle: Matchstick Wrong Equation</a> was originally published by Amir Shahmoradi at <a href="https://www.cdslab.org/recipes">CDSLab Recipes - A repository for all sorts of problems with solutions</a> on November 15, 2021.</p> <![CDATA[Puzzle: How many living creatures are in the pond]]> https://www.cdslab.org/recipes/programming/puzzle-how-many-animals-in-pond/puzzle-how-many-animals-in-pond 2021-11-15T00:00:00-06:00 2021-10-01T00:00:00-00:00 Amir Shahmoradi https://www.cdslab.org/recipes shahmoradi@utexas.edu <div style="text-align:center;margin-top:3rem;margin-bottom:2rem;"> <a href="#problem" style="display:inline-block;"> <h2 id="problem"
style="color:red;"> Problem </h2> </a> </div> <p>How many living creatures can you identify in this figure? (<strong>Hint:</strong> There are two).</p> <figure> <img src="animals-in-pond.png" /> </figure> <p><a href="https://www.cdslab.org/recipes/programming/puzzle-how-many-animals-in-pond/puzzle-how-many-animals-in-pond">Puzzle: How many living creatures are in the pond</a> was originally published by Amir Shahmoradi at <a href="https://www.cdslab.org/recipes">CDSLab Recipes - A repository for all sorts of problems with solutions</a> on November 15, 2021.</p> <![CDATA[Regression: Predicting the global land temperature of Earth in 2050 from the past data: Choosing the best model]]> https://www.cdslab.org/recipes/programming/regression-predicting-future-global-land-temperature-excel/regression-predicting-future-global-land-temperature-excel 2021-11-11T00:00:00-06:00 2021-11-11T00:00:00-06:00 Amir Shahmoradi https://www.cdslab.org/recipes shahmoradi@utexas.edu <div style="text-align:center;margin-top:3rem;margin-bottom:2rem;"> <a href="#problem" style="display:inline-block;"> <h2 id="problem" style="color:red;"> Problem </h2> </a> </div> <p>Consider this dataset, <a href="1880_2020.csv" target="_blank">1880_2020.csv</a>, which contains the monthly global land and ocean temperature anomalies of the Earth from January 1880 to June 2020. As stated in the file, temperatures are in degrees Celsius and reported as anomalies relative to the average global land temperature of the Earth in the year 1950.</p> <ol> <li>First, use Microsoft Excel to plot the temperature anomalies reported in this dataset. You can divide the values in the <code>Year</code> column by $100$ to obtain real-valued years.</li> <li>Fit a linear regression to the temperature anomalies in Excel, as in the following illustration, and obtain the linear regression equation. Then use the equation to predict the temperature anomaly of the Earth in 2050.
<figure> <img src="linear.png" /> </figure> </li> <li>Fit a quadratic (polynomial of degree two) regression to the temperature anomalies in Excel, as in the following illustration, and obtain the quadratic regression equation. Then use the equation to predict the temperature anomaly of the Earth in 2050. <figure> <img src="quadratic.png" /> </figure> </li> <li>Which one of the mathematical models that you have fit to the data is a better representation of reality? Which one predicts a larger temperature increase in the near future?</li> </ol> <p><a href="https://www.cdslab.org/recipes/programming/regression-predicting-future-global-land-temperature-excel/regression-predicting-future-global-land-temperature-excel">Regression: Predicting the global land temperature of Earth in 2050 from the past data: Choosing the best model</a> was originally published by Amir Shahmoradi at <a href="https://www.cdslab.org/recipes">CDSLab Recipes - A repository for all sorts of problems with solutions</a> on November 11, 2021.</p> <![CDATA[Regression: Estimating the parameters of a linear model for a Normally-distributed sample]]> https://www.cdslab.org/recipes/programming/regression-linear-gaussian/regression-linear-gaussian 2021-11-05T00:00:00-05:00 2021-11-05T00:00:00-05:00 Amir Shahmoradi https://www.cdslab.org/recipes shahmoradi@utexas.edu <div style="text-align:center;margin-top:3rem;margin-bottom:2rem;"> <a href="#problem" style="display:inline-block;"> <h2 id="problem" style="color:red;"> Problem </h2> </a> </div> <p>Suppose we have observed a dataset of events with one attribute, stored in this file: <a href="z.csv" target="_blank">z.csv</a>. Plotting these points would yield a histogram like the following plot,</p> <figure> <img src="z.png" width="900" /> </figure> <p><br /></p> <p>Now our goal is to form a hypothesis about this dataset, that is, a hypothesis about the distribution of the events in the above plot.
Just by looking at the observed distribution, we can form a relatively good hypothesis about the distribution of the data: this dataset is likely well fit by a Normal distribution.</p> <p>Now, use the maximum likelihood method to infer the two unknown parameters of the Normal distribution that best fits the data.</p> <p><strong>Hint:</strong></p> <ol> <li>First read the data using the Pandas library.</li> <li>Write a class that takes the data as input and has two methods, <code>getLogProb(data,avg,std)</code> and <code>getLogLike(param)</code>. The former computes the log-probabilities of observing the individual points of the input dataset <code>data</code> given the parameters of the model (the Normal average <code>avg</code> and the Normal standard deviation <code>std</code>). The latter method takes a set of parameters as a vector containing the average of the Normal distribution (<code>avg</code>) and the natural logarithm of the standard deviation of the Normal distribution (<code>log(std)</code>). Given these two parameters, <code>getLogLike(param)</code> sums over the log-probabilities returned by <code>getLogProb(data,avg,std)</code> to compute the log-likelihood and returns it as the output.</li> <li>Because <code>scipy.optimize.fmin</code> performs minimization, you can use it to maximize the log-likelihood by minimizing the negative log-likelihood. Once the minimization is done, report the best-fit parameters on the display.</li> <li>Now, consider the following more complicated problem with this dataset: <a href="xy.csv" target="_blank">xy.csv</a>. Visualizing this dataset gives us the following plot. <figure> <img src="xy.png" width="900" /> </figure> <p><br /> As you may have guessed, the only difference between this dataset and the previous one is that the random variable in this case depends on another deterministic variable $x$, most likely in the following manner, $y \sim \mathcal{N}(\mu = b \times x + a, \sigma)$.
In other words, the data ($y$) still come from a Normal distribution, but their mean depends on the corresponding value of $x$. Therefore, in this problem, we have three unknown parameters to optimize for: $(\sigma, a, b)$. Now, implement the maximum likelihood method for this problem by revising your answer to the previous part, and make a plot of the best-fit line to the data, like the following.</p> <figure> <img src="xy-fit-linear.png" width="900" /> </figure> <p><br /></p> </li> </ol> <p><a href="https://www.cdslab.org/recipes/programming/regression-linear-gaussian/regression-linear-gaussian">Regression: Estimating the parameters of a linear model for a Normally-distributed sample</a> was originally published by Amir Shahmoradi at <a href="https://www.cdslab.org/recipes">CDSLab Recipes - A repository for all sorts of problems with solutions</a> on November 05, 2021.</p> <![CDATA[Regression: Estimating the parameters of a Normally-distributed sample]]> https://www.cdslab.org/recipes/programming/regression-gaussian-data/regression-gaussian-data 2021-10-28T00:00:00-05:00 2021-10-28T00:00:00-05:00 Amir Shahmoradi https://www.cdslab.org/recipes shahmoradi@utexas.edu <div style="text-align:center;margin-top:3rem;margin-bottom:2rem;"> <a href="#problem" style="display:inline-block;"> <h2 id="problem" style="color:red;"> Problem </h2> </a> </div> <p>Suppose we have observed a dataset of $15027$ events with one attribute <code>variable</code> in this file: <a href="dataFull.csv" target="_blank">dataFull.csv</a>.
A histogram of these values looks like the following plot,</p> <figure> <img src="data.png" width="900" /> </figure> <p><br /></p> <p>Our goal now is to form a hypothesis about this dataset, that is, a hypothesis about the distribution of the events in the above plot.</p> <p>To get started, we can take the logarithm of this dataset to better understand the distribution of its attribute, and plot the transformed data,</p> <figure> <img src="logdata.png" width="900" /> </figure> <p><br /></p> <p>Just by looking at the observed (red) distribution, we can form a relatively good hypothesis about the data: this dataset is likely well fit by a log-normal distribution, that is, the log-transformed data are well fit by a Normal distribution.</p> <p>Now, use the maximum likelihood method to infer the two unknown parameters of the Normal distribution that best fits the log-transformed data.</p> <p><strong>Hint:</strong></p> <ol> <li>First, read the data using the Pandas library, then log-transform the data so that they approximately follow a Normal distribution.</li> <li> <p>Write a class that takes the log-data as input and has two methods, <code>getLogProb(data,avg,std)</code> and <code>getLogLike(param)</code>. The former computes the log-probabilities of observing the input dataset <code>data</code> given the parameters of the model (the Normal mean <code>avg</code> and the Normal standard deviation <code>std</code>). The latter takes the parameters as a vector containing the mean of the Normal distribution (<code>avg</code>) and the natural logarithm of its standard deviation, <code>log(std)</code>.
Given these two parameters, <code>getLogLike(param)</code> sums the log-probabilities returned by <code>getLogProb(data,avg,std)</code> and returns the resulting log-likelihood.</p> </li> <li>You can use <code>scipy.optimize.fmin</code> to obtain the best-fit parameters. Since <code>fmin</code> is a minimizer, maximize the log-likelihood by minimizing the negative log-likelihood. Once the minimization is done, print the best-fit parameters.</li> </ol> <p><a href="https://www.cdslab.org/recipes/programming/regression-gaussian-data/regression-gaussian-data">Regression: Estimating the parameters of a Normally-distributed sample</a> was originally published by Amir Shahmoradi at <a href="https://www.cdslab.org/recipes">CDSLab Recipes - A repository for all sorts of problems with solutions</a> on October 28, 2021.</p> <![CDATA[Computing the cross-correlation of sin() and cos()]]> https://www.cdslab.org/recipes/programming/stat-crosscorr-sin-cos/stat-crosscorr-sin-cos 2021-10-20T00:00:00-05:00 2019-07-04T00:00:00-00:00 Amir Shahmoradi https://www.cdslab.org/recipes shahmoradi@utexas.edu <div style="text-align:center;margin-top:3rem;margin-bottom:2rem;"> <a href="#problem" style="display:inline-block;"> <h2 id="problem" style="color:red;"> Problem </h2> </a> </div> <p>Generate two arrays containing the values of the $\sin(x)$ and $\cos(x+\pi/2)$ functions over the range $[0, 10\pi]$. Make a plot of the resulting arrays like the following illustration.</p> <p><img src="sin-cos.png" alt="sin-cos" /></p> <p>Now use an FFT package in the language of your choice to compute the cross-correlation between the two arrays obtained from $\sin()$ and $\cos()$. Plot the resulting cross-correlation to obtain an illustration like the following.</p> <p><img src="sin-cos-ccf.png" alt="sin-cos-ccf" /></p> <p>Explain the reason for the periodic behavior of the cross-correlation.
Why does the periodic signal decay toward the tails?</p> <p>Now compute the autocorrelation of each array separately and plot the resulting autocorrelations to compare them with the CCF computed above. The autocorrelations should look like the following illustrations,</p> <p><img src="sin-sin-acf.png" alt="sin-sin-acf" /></p> <p><img src="cos-cos-acf.png" alt="cos-cos-acf" /></p> <p>Explain why the autocorrelations of <code>sin()</code> and <code>cos()</code> are similar to each other while they look different from the cross-correlation above. What can you do to make the cross-correlation of <code>sin-cos</code> look like the autocorrelations of <code>sin-sin</code> and <code>cos-cos</code>?</p> <p><a href="https://www.cdslab.org/recipes/programming/stat-crosscorr-sin-cos/stat-crosscorr-sin-cos">Computing the cross-correlation of sin() and cos()</a> was originally published by Amir Shahmoradi at <a href="https://www.cdslab.org/recipes">CDSLab Recipes - A repository for all sorts of problems with solutions</a> on October 20, 2021.</p> <![CDATA[Computing the cross-correlation of two data attributes]]> https://www.cdslab.org/recipes/programming/stat-crosscorr/stat-crosscorr 2021-10-11T00:00:00-05:00 2019-07-04T00:00:00-00:00 Amir Shahmoradi https://www.cdslab.org/recipes shahmoradi@utexas.edu <div style="text-align:center;margin-top:3rem;margin-bottom:2rem;"> <a href="#problem" style="display:inline-block;"> <h2 id="problem" style="color:red;"> Problem </h2> </a> </div> <p>Consider <a href="annual-co2-emissions-per-country.csv" target="blank">this dataset of carbon emissions history per country</a>.</p> <ol> <li>Make a visualization of the <strong>global</strong> carbon emission data in the above CSV file by summing the contributions of all countries per year to obtain an illustration like the following,<br /> <img src="globalEmissionsCO2.png" alt="Global CO2 Emissions" /></li> <li> <p>Now consider the
contributions of individual regions in this <a href="annual-co-emissions-by-region.csv"><em>zero-filled</em> CSV dataset</a> and make a stacked plot of the regions over the years, like the following,<br /> <img src="regionEmissionsCO2.png" alt="Region CO2 Emissions" /><br /> To do so, you will have to extract the data for the regions shown in the figure above from the CSV file and use <code>matplotlib</code> <code>stackplot</code> in Python, or a similar package or function in your language of choice.</p> </li> <li>Recall the <a href="globalLandTempHist.txt" target="blank">globalLandTempHist.txt</a> dataset, which contains the global land temperature of Earth over the past 300 years. Parse the contents of this file to generate the average annual temperature anomaly data. Then, extract the subset of the CSV data from Step 2 above corresponding to the regional keyword <code>"World"</code> in the <code>Entity</code> column. Next, match the temperature anomaly data with the global CO2 emission data by year to generate a unified dataset. Then, write a function that computes the cross-correlation between the temperature anomaly and the global CO2 emissions, using the definition of the correlation matrix that we have seen before.<br /> Now, use an external library in the language of your choice to compute the cross-correlation using the Fast Fourier Transform (FFT). Within Python, you can use the <code>correlate</code> function of the SciPy package (<code>from scipy.signal import correlate</code>). To do so, you will first have to center both input series (the anomaly and emission data) by subtracting their means.
Then, pass the data to <code>correlate</code> as in the following snippet, <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np
from scipy.signal import correlate

# center both series by subtracting their means
# (assumes both series have the same length after matching the years)
anomalies = anomalies - np.mean(anomalies)
emissions = emissions - np.mean(emissions)
nlag = len(anomalies) - 1
# the "full" output has 2*nlag + 1 elements; index nlag corresponds to lag 0,
# and ccf[k] pairs anomalies[t + k] with emissions[t]
ccf = correlate(anomalies, emissions, mode="full")[nlag:]
# normalize by the norms of both series, so that ccf[0] is the Pearson correlation
ccf = ccf / np.sqrt(np.sum(anomalies**2) * np.sum(emissions**2))
</code></pre></div> </div> <p>Make a plot of this cross-correlation function (ccf) and compare it with what you obtained from the slow version you implemented.</p> </li> </ol> <p><a href="https://www.cdslab.org/recipes/programming/stat-crosscorr/stat-crosscorr">Computing the cross-correlation of two data attributes</a> was originally published by Amir Shahmoradi at <a href="https://www.cdslab.org/recipes">CDSLab Recipes - A repository for all sorts of problems with solutions</a> on October 11, 2021.</p> <![CDATA[Computing the autocorrelation of a dataset]]> https://www.cdslab.org/recipes/programming/stat-autocorr/stat-autocorr 2021-10-11T00:00:00-05:00 2019-07-04T00:00:00-00:00 Amir Shahmoradi https://www.cdslab.org/recipes shahmoradi@utexas.edu <div style="text-align:center;margin-top:3rem;margin-bottom:2rem;"> <a href="#problem" style="display:inline-block;"> <h2 id="problem" style="color:red;"> Problem </h2> </a> </div> <p>Recall the <a href="globalLandTempHist.txt" target="blank">globalLandTempHist.txt</a> dataset, which contains the global land temperature of Earth over the past 300 years. Also recall that the autocorrelation of a time series is defined as the correlation of a univariate dataset with itself at some positive lag $\tau$.</p> <p>Use the definition of the correlation matrix that we have seen before to compute the autocorrelation of the temperature anomaly of Earth, from the first non-NaN value to the end, for all possible lags. Make a plot of the autocorrelation vs. lag.</p> <p>Now, use an external library in the language of your choice to compute the autocorrelation using the Fast Fourier Transform (FFT). Within Python, you can use the <code>correlate</code> function of the SciPy package (<code>from scipy.signal import correlate</code>). To do so, you will first have to center the input data (the anomaly data) by subtracting its mean.
Then, pass the data to <code>correlate</code> as in the following snippet,</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np
from scipy.signal import correlate

# center the data by subtracting its mean
anomalies = anomalies - np.mean(anomalies)
nlag = len(anomalies) - 1
# the "full" output has 2*nlag + 1 elements; index nlag corresponds to lag 0
acf = correlate(anomalies, anomalies, mode="full")[nlag:]
# normalize so that the autocorrelation at lag 0 equals 1
acf = acf / acf[0]
</code></pre></div></div> <p>Make a plot of this autocorrelation function (acf) and compare it with what you obtained from the slow version you implemented.
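</p>
<p>As a minimal sketch of how the slow, definition-based autocorrelation can be checked against the FFT-based one, the Python snippet below compares the two on a synthetic AR(1) series, whose lag-$k$ autocorrelation is approximately $\phi^k$. The series, its parameter values, and the helper names <code>acf_direct</code> and <code>acf_fft</code> are illustrative assumptions, not part of the original problem; for the exercise, replace <code>series</code> with the anomaly data.</p>

```python
import numpy as np
from scipy.signal import correlate


def acf_direct(x, maxlag):
    """Slow autocorrelation: Pearson correlation of x[t] with x[t + lag], lag by lag."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    acf = np.empty(maxlag + 1)
    acf[0] = 1.0
    for lag in range(1, maxlag + 1):
        # np.corrcoef returns a 2x2 correlation matrix; [0, 1] is the off-diagonal entry
        acf[lag] = np.corrcoef(x[: n - lag], x[lag:])[0, 1]
    return acf


def acf_fft(x):
    """FFT-based autocorrelation for lags 0, 1, ..., n - 1."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()  # center the data by subtracting its mean
    n = len(x)
    full = correlate(x, x, mode="full", method="fft")  # length 2n - 1, lag 0 at index n - 1
    acf = full[n - 1 :]  # keep the non-negative lags only
    return acf / acf[0]


# Hypothetical test series: an AR(1) process x[t] = phi * x[t - 1] + noise.
rng = np.random.default_rng(seed=1)
phi, n = 0.8, 2000
series = np.zeros(n)
for t in range(1, n):
    series[t] = phi * series[t - 1] + rng.normal()

slow = acf_direct(series, maxlag=10)
fast = acf_fft(series)
print("definition-based:", np.round(slow[:4], 3))
print("FFT-based:       ", np.round(fast[:4], 3))
```

<p>Because <code>np.corrcoef</code> centers and scales each pair of lagged segments separately, while the FFT version uses the global mean and variance, small differences between the two estimates are expected, especially at large lags.</p>
<p>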
Here is an illustration of the average anomaly data per year and its autocorrelation function,<br /> <img src="globalTempAnomalies.png" alt="globalTempAnomalies.png" /> <img src="globalTempAnomaliesACF.png" alt="globalTempAnomaliesACF.png" /></p> <p><a href="https://www.cdslab.org/recipes/programming/stat-autocorr/stat-autocorr">Computing the autocorrelation of a dataset</a> was originally published by Amir Shahmoradi at <a href="https://www.cdslab.org/recipes">CDSLab Recipes - A repository for all sorts of problems with solutions</a> on October 11, 2021.</p> <![CDATA[Computing and removing the autocorrelation of a dataset]]> https://www.cdslab.org/recipes/programming/stat-autocorr-removal/stat-autocorr-removal 2021-10-11T00:00:00-05:00 2019-07-04T00:00:00-00:00 Amir Shahmoradi https://www.cdslab.org/recipes shahmoradi@utexas.edu <div style="text-align:center;margin-top:3rem;margin-bottom:2rem;"> <a href="#problem" style="display:inline-block;"> <h2 id="problem" style="color:red;"> Problem </h2> </a> </div> <p>Consider the following Banana function.</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">getLogFuncBanana</span><span class="p">(</span><span class="n">point</span><span class="p">):</span> <span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span> <span class="kn">from</span> <span class="nn">scipy.stats</span> <span class="kn">import</span> <span class="n">multivariate_normal</span> <span class="k">as</span> <span class="n">mvn</span> <span class="kn">from</span> <span class="nn">scipy.special</span> <span class="kn">import</span> <span class="n">logsumexp</span> <span class="n">NPAR</span> <span class="o">=</span> <span class="mi">2</span> <span class="c1"># sum(Banana,gaussian) normalization factor </span> <span class="n">normfac</span> <span class="o">=</span> <span class="mf">0.3</span> <span 
class="c1"># sum(Banana,gaussian) normalization factor </span> <span class="n">lognormfac</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">log</span><span class="p">(</span><span class="n">normfac</span><span class="p">)</span> <span class="c1"># sum(Banana,gaussian) normalization factor </span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span> <span class="o">=</span> <span class="mf">0.7</span><span class="p">,</span> <span class="mf">1.5</span> <span class="c1"># parameters of the Banana function </span> <span class="n">MeanB</span> <span class="o">=</span> <span class="p">[</span> <span class="o">-</span><span class="mf">5.0</span> <span class="p">,</span> <span class="mf">0.</span> <span class="p">]</span> <span class="c1"># mean vector of Banana function </span> <span class="n">MeanG</span> <span class="o">=</span> <span class="p">[</span> <span class="mf">3.5</span> <span class="p">,</span> <span class="mf">0.</span> <span class="p">]</span> <span class="c1"># mean vector of Gaussian function </span> <span class="n">CovMatB</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">reshape</span><span class="p">([</span><span class="mf">0.25</span><span class="p">,</span><span class="mf">0.</span><span class="p">,</span><span class="mf">0.</span><span class="p">,</span><span class="mf">0.81</span><span class="p">],</span> <span class="n">newshape</span> <span class="o">=</span> <span class="p">(</span><span class="n">NPAR</span><span class="p">,</span><span class="n">NPAR</span><span class="p">))</span> <span class="c1"># Covariance matrix of Banana function </span> <span class="n">CovMatG</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">reshape</span><span class="p">([</span><span class="mf">0.15</span><span class="p">,</span><span class="mf">0.</span><span class="p">,</span><span 
class="mf">0.</span><span class="p">,</span><span class="mf">0.15</span><span class="p">],</span> <span class="n">newshape</span> <span class="o">=</span> <span class="p">(</span><span class="n">NPAR</span><span class="p">,</span><span class="n">NPAR</span><span class="p">))</span> <span class="c1"># Covariance matrix of Gaussian function </span> <span class="n">LogProb</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span> <span class="c1"># transformed parameters that transform the Gaussian to the Banana function </span> <span class="n">pointSkewed</span> <span class="o">=</span> <span class="p">[</span> <span class="o">-</span><span class="n">point</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="o">+</span><span class="n">point</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="p">]</span> <span class="c1"># Gaussian function </span> <span class="n">LogProb</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">lognormfac</span> <span class="o">+</span> <span class="n">mvn</span><span class="p">.</span><span class="n">logpdf</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="n">pointSkewed</span><span class="p">,</span> <span class="n">mean</span> <span class="o">=</span> <span class="n">MeanG</span><span class="p">,</span> <span class="n">cov</span> <span class="o">=</span> <span class="n">CovMatG</span><span class="p">)</span> <span class="c1"># logProbBanana </span> <span class="c1"># Do variable transformations for the Skewed-Gaussian (banana) function. 
</span> <span class="n">pointSkewed</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">pointSkewed</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">*</span> <span class="n">a</span> <span class="n">pointSkewed</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">pointSkewed</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">/</span> <span class="n">a</span> <span class="o">-</span> <span class="p">(</span><span class="n">pointSkewed</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">**</span><span class="mi">2</span> <span class="o">+</span> <span class="n">a</span><span class="o">**</span><span class="mi">2</span><span class="p">)</span> <span class="o">*</span> <span class="n">b</span> <span class="c1"># Banana function </span> <span class="n">LogProb</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">mvn</span><span class="p">.</span><span class="n">logpdf</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="n">pointSkewed</span><span class="p">,</span> <span class="n">mean</span> <span class="o">=</span> <span class="n">MeanB</span><span class="p">,</span> <span class="n">cov</span> <span class="o">=</span> <span class="n">CovMatB</span><span class="p">)</span> <span class="c1"># logProbBanana </span> <span class="k">return</span> <span class="n">logsumexp</span><span class="p">(</span><span class="n">LogProb</span><span class="p">)</span> </code></pre></div></div> <p>We wish to generate a random sample from the distribution whose log-density is returned by this Python function.
We do so via the ParaMonte library’s ParaDRAM MCMC sampler,</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="err">!</span><span class="n">pip</span> <span class="n">install</span> <span class="o">--</span><span class="n">upgrade</span> <span class="o">--</span><span class="n">user</span> <span class="n">paramonte</span> <span class="kn">import</span> <span class="nn">paramonte</span> <span class="k">as</span> <span class="n">pm</span> <span class="n">sim</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">paradram</span><span class="p">()</span> <span class="n">sim</span><span class="p">.</span><span class="n">spec</span><span class="p">.</span><span class="n">chainSize</span> <span class="o">=</span> <span class="mi">30000</span> <span class="n">sim</span><span class="p">.</span><span class="n">runSampler</span><span class="p">(</span> <span class="n">ndim</span> <span class="o">=</span> <span class="mi">2</span> <span class="p">,</span> <span class="n">getLogFunc</span> <span class="o">=</span> <span class="n">getLogFuncBanana</span> <span class="p">)</span> </code></pre></div></div> <p>This sampler outputs an MCMC chain that we can subsequently visualize,</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">%</span><span class="n">matplotlib</span> <span class="n">notebook</span> <span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span> <span class="n">chain</span> <span class="o">=</span> <span class="n">sim</span><span class="p">.</span><span class="n">readChain</span><span class="p">(</span><span class="n">renabled</span> <span class="o">=</span> <span class="bp">True</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span> <span class="n">chain</span><span class="p">.</span><span class="n">df</span><span 
class="p">[</span><span class="s">"Banana Function Value"</span><span class="p">]</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">exp</span><span class="p">(</span><span class="n">chain</span><span class="p">.</span><span class="n">df</span><span class="p">.</span><span class="n">SampleLogFunc</span><span class="p">.</span><span class="n">values</span><span class="p">)</span> <span class="n">chain</span><span class="p">.</span><span class="n">plot</span><span class="p">.</span><span class="n">contour3</span><span class="p">()</span> <span class="n">chain</span><span class="p">.</span><span class="n">plot</span><span class="p">.</span><span class="n">contour3</span><span class="p">.</span><span class="n">savefig</span><span class="p">(</span><span class="n">fname</span> <span class="o">=</span> <span class="s">"bananaFuncContour3.png"</span><span class="p">)</span> <span class="n">chain</span><span class="p">.</span><span class="n">plot</span><span class="p">.</span><span class="n">scatter3</span><span class="p">.</span><span class="n">scatter</span><span class="p">.</span><span class="n">kws</span><span class="p">.</span><span class="n">s</span> <span class="o">=</span> <span class="mf">0.03</span> <span class="n">chain</span><span class="p">.</span><span class="n">plot</span><span class="p">.</span><span class="n">scatter3</span><span class="p">.</span><span class="n">scatter</span><span class="p">.</span><span class="n">kws</span><span class="p">.</span><span class="n">cmap</span> <span class="o">=</span> <span class="s">"winter"</span> <span class="n">chain</span><span class="p">.</span><span class="n">plot</span><span class="p">.</span><span class="n">scatter3</span><span class="p">(</span><span class="n">zcolumns</span> <span class="o">=</span> <span class="s">"Banana Function Value"</span><span class="p">,</span> <span class="n">ccolumns</span> <span class="o">=</span> <span class="s">"Banana Function 
Value"</span><span class="p">)</span> <span class="n">chain</span><span class="p">.</span><span class="n">plot</span><span class="p">.</span><span class="n">scatter3</span><span class="p">.</span><span class="n">savefig</span><span class="p">(</span><span class="n">fname</span> <span class="o">=</span> <span class="s">"bananaFuncScatter3.png"</span><span class="p">)</span> </code></pre></div></div> <p><img src="bananaFuncContour3.png" alt="bananaFuncContour3.png" /></p> <p><img src="bananaFuncScatter3.png" alt="bananaFuncScatter3.png" /></p> <p>This MCMC chain is time-series data, which means we can compute its autocorrelation (for each data attribute). The ParaMonte library computes this automatically, and we can visualize the result via,</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">chain</span><span class="p">.</span><span class="n">stats</span><span class="p">.</span><span class="n">autocorr</span><span class="p">.</span><span class="n">plot</span><span class="p">.</span><span class="n">line</span><span class="p">()</span> <span class="n">chain</span><span class="p">.</span><span class="n">stats</span><span class="p">.</span><span class="n">autocorr</span><span class="p">.</span><span class="n">plot</span><span class="p">.</span><span class="n">line</span><span class="p">.</span><span class="n">savefig</span><span class="p">(</span><span class="n">fname</span> <span class="o">=</span> <span class="s">"bananaCompactChainACF.png"</span><span class="p">)</span> </code></pre></div></div> <p><img src="bananaCompactChainACF.png" alt="bananaCompactChainACF.png" /></p> <p>Clearly, all three attributes of this chain are autocorrelated. However, we can remove most traces of autocorrelation by keeping only every $k$-th sample of the chain, for an appropriate step size $k$, and skipping the rest. This process is known as <em>thinning</em> (or <em>reducing</em>, <em>decorrelating</em>, or <em>refining</em>) the chain.
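</p>
<p>To illustrate the thinning idea without running ParaMonte, the sketch below applies it to a synthetic, strongly autocorrelated AR(1) series that stands in for one column of <code>chain.df</code>. The helper names <code>autocorr</code> and <code>thin</code>, and the rule of taking the step size as the first lag at which the autocorrelation decays to a threshold of 0.05, are illustrative assumptions rather than the ParaMonte library's own procedure.</p>

```python
import numpy as np
from scipy.signal import correlate


def autocorr(x):
    """FFT-based autocorrelation of a 1-D series for lags 0, 1, ..., n - 1."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()  # center the data by subtracting its mean
    n = len(x)
    acf = correlate(x, x, mode="full", method="fft")[n - 1 :]  # keep non-negative lags
    return acf / acf[0]


def thin(x, threshold=0.05):
    """Keep every k-th sample, with k the first lag at which the ACF has decayed to `threshold`."""
    acf = autocorr(x)
    # np.argmax returns the index of the first True value (or 0 if there is none)
    step = int(np.argmax(np.less_equal(acf, threshold)))
    step = max(step, 1)  # fall back to no thinning if the ACF never decays enough
    return np.asarray(x)[::step], step


# Hypothetical stand-in for one MCMC chain attribute: an AR(1) series with phi = 0.95.
rng = np.random.default_rng(seed=7)
phi, n = 0.95, 20000
series = np.zeros(n)
for t in range(1, n):
    series[t] = phi * series[t - 1] + rng.normal()

thinned, step = thin(series)
print("step size:", step)
print("lag-1 ACF before thinning:", round(autocorr(series)[1], 3))
print("lag-1 ACF after thinning:", round(autocorr(thinned)[1], 3))
```

<p>Thinning trades sample size for independence: the thinned series is roughly <code>step</code> times shorter, so as a rule of thumb the step should be no larger than needed to make the autocorrelation negligible.</p>
<p>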
Choose an appropriate step size, thin the data in <code>chain.df</code> accordingly, and then compute the autocorrelation of the thinned data via the <code>scipy.signal.correlate</code> function. Then, visualize it as in the ParaMonte illustration above to verify that thinning has effectively removed the autocorrelation from your data. For each attribute (column) of the thinned data, the autocorrelation can be computed as follows,</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np
from scipy.signal import correlate

# attribute holds the values of one column of the thinned chain data
attribute = attribute - np.mean(attribute)
nlag = len(attribute) - 1
# the "full" output has 2*nlag + 1 elements; index nlag corresponds to lag 0
acf = correlate(attribute, attribute, mode="full")[nlag:]
# normalize so that the autocorrelation at lag 0 equals 1
acf = acf / acf[0]
</code></pre></div></div> <p><a href="https://www.cdslab.org/recipes/programming/stat-autocorr-removal/stat-autocorr-removal">Computing and removing the autocorrelation of a dataset</a> was originally published by Amir Shahmoradi at <a href="https://www.cdslab.org/recipes">CDSLab Recipes - A repository for all sorts of problems with solutions</a> on October 11, 2021.</p> <![CDATA[Ugly visualization]]> https://www.cdslab.org/recipes/programming/vis-ugly-graph/vis-ugly-graph 2021-10-08T00:00:00-05:00 2021-10-01T00:00:00-00:00 Amir Shahmoradi https://www.cdslab.org/recipes shahmoradi@utexas.edu <div style="text-align:center;margin-top:3rem;margin-bottom:2rem;"> <a href="#problem" style="display:inline-block;"> <h2 id="problem" style="color:red;"> Problem </h2> </a> </div> <p>What is ugly in the following graph?</p> <figure> <img src="ugly.png" /> </figure> <p><a href="https://www.cdslab.org/recipes/programming/vis-ugly-graph/vis-ugly-graph">Ugly visualization</a> was originally published by Amir Shahmoradi at <a href="https://www.cdslab.org/recipes">CDSLab Recipes - A repository for all sorts of problems with solutions</a> on October 08, 2021.</p> <![CDATA[The population growths of the US states]]> https://www.cdslab.org/recipes/programming/vis-population-growth-tx-la/vis-population-growth-tx-la 2021-10-08T00:00:00-05:00 2021-10-01T00:00:00-00:00 Amir Shahmoradi https://www.cdslab.org/recipes shahmoradi@utexas.edu <div style="text-align:center;margin-top:3rem;margin-bottom:2rem;"> <a href="#problem" style="display:inline-block;"> <h2 id="problem" style="color:red;"> Problem </h2> </a> </div> <p>Which color scale has been used in the following visualization?</p> <figure> <img src="statePopulationGrowth.png" /> </figure> <p><a href="https://www.cdslab.org/recipes/programming/vis-population-growth-tx-la/vis-population-growth-tx-la">The population growths of the US states</a> was
originally published by Amir Shahmoradi at <a href="https://www.cdslab.org/recipes">CDSLab Recipes - A repository for all sorts of problems with solutions</a> on October 08, 2021.</p>