Problem Part A

Consider the following web-page address https://cdslaborg.github.io/DataRepos_SwiftBat/index.html. This is a data table (in HTML format) containing data from the NASA Swift satellite. Each row in this table represents information about a Gamma-Ray Burst (GRB) detection that Swift has made over the past years. Corresponding to each event ID, there may exist a file containing some attributes of that event, which we wish to plot and study against each other.

For example, for the first event in this table, GRB170406x (00745966), there is a data file hidden in a directory on https://cdslaborg.github.io/DataRepos_SwiftBat/ep_flu/GRB00745966_ep_flu.txt. Notice how the event ID (inside parentheses) is inserted into the web address. Now, for each event in the GRB event table, there might exist one such text file hidden on the web, whose address begins with https://cdslaborg.github.io/DataRepos_SwiftBat/ep_flu/ followed by the GRB's ID and ending in .txt.

Our goal here is to fetch all these files from the website and save them locally on our computer. Then we will read their contents one by one and plot the two columns of data from all of these files together on a single plot.

Write a script named fetchDataFromWeb, in the language of your own choice, that uses this web address: https://raw.githubusercontent.com/cdslaborg/DataRepos_SwiftBat/master/triggers.txt to read the list of all GRB events and then writes the entire table in triggers.txt to a local file with the same name on your device.
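For instance, a minimal Python sketch of fetchDataFromWeb using only the standard library; the URL comes from the problem statement, while the function name `fetch` is mine (the actual download call is left commented out so the script is safe to import without network access):

```python
# fetchDataFromWeb: download triggers.txt and save it locally under the same name.
import os
import urllib.request

TRIGGERS_URL = "https://raw.githubusercontent.com/cdslaborg/DataRepos_SwiftBat/master/triggers.txt"

def fetch(url, path):
    """Download `url` and write its raw contents to the local file `path`."""
    with urllib.request.urlopen(url) as response:
        data = response.read()
    with open(path, "wb") as f:
        f.write(data)
    return path

# Uncomment to actually download (requires network access); the local name
# is taken from the last component of the web address:
# fetch(TRIGGERS_URL, os.path.basename(TRIGGERS_URL))
```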

Problem Part B

Now, add to your script another set of commands that use the event IDs stored in this file to generate the corresponding web addresses, like

https://cdslaborg.github.io/DataRepos_SwiftBat/ep_flu/GRB00745966_ep_flu.txt.

Then it uses each generated web address to read the content of the page and store it in a local file on your device, with the same name the file carries on the web (for example, for the web address above, the filename would be GRB00745966_ep_flu.txt).
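A possible Python continuation is sketched below. The URL pattern is the one given in the problem statement; note that `extract_ids` merely guesses that the event IDs appear in triggers.txt as bare 8-digit numbers, so the parsing may need adjusting to the real file layout:

```python
import re
import urllib.error
import urllib.request

BASE = "https://cdslaborg.github.io/DataRepos_SwiftBat/ep_flu/"

def make_url(trigger_id):
    """Build the ep_flu data-file address for one event ID, e.g. '00745966'."""
    return BASE + "GRB" + trigger_id + "_ep_flu.txt"

def extract_ids(text):
    """Pull 8-digit trigger IDs out of the triggers.txt contents.
    ASSUMPTION: the IDs appear as standalone 8-digit numbers."""
    return re.findall(r"\b(\d{8})\b", text)

def download_all(ids):
    """Try to fetch the data file for every ID; skip events that have no file."""
    for tid in ids:
        url = make_url(tid)
        fname = url.rsplit("/", 1)[-1]          # same name as on the web
        try:
            with urllib.request.urlopen(url) as r, open(fname, "wb") as f:
                f.write(r.read())
        except urllib.error.HTTPError:
            pass                                # no file exists for this event
```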


Problem Part C

Now write another script named plotDatafromFile that reads the downloaded files one by one and plots the contents of all of them together on a single scatter plot like the following,


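One way to sketch plotDatafromFile with numpy and matplotlib is shown below; the glob pattern assumes the files from Part B were saved in the current working directory, and the axis labels are placeholders (the real attributes are the two columns of the data files):

```python
# plotDatafromFile: scatter-plot the two columns of every downloaded file.
import glob
import numpy as np
import matplotlib
matplotlib.use("Agg")          # non-interactive backend; safe for scripts
import matplotlib.pyplot as plt

def load_files(pattern="GRB*_ep_flu.txt"):
    """Read every matching two-column text file; return a list of (n, 2) arrays."""
    return [np.loadtxt(f, ndmin=2) for f in sorted(glob.glob(pattern))]

def scatter_all(datasets, path="ep_flu_scatter.png"):
    """Overlay the two columns of every dataset on one scatter plot."""
    fig, ax = plt.subplots()
    for data in datasets:
        ax.scatter(data[:, 0], data[:, 1], s=1)
    ax.set_xlabel("column 1")   # placeholder axis labels
    ax.set_ylabel("column 2")
    fig.savefig(path)
    return path

if __name__ == "__main__":
    scatter_all(load_files())
```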
Problem Part D

Write a script that performs a linear fit to the data that you have plotted, and report the best-fit slope and intercept that you obtain.
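A sketch of such a fit using numpy's least-squares polyfit on the pooled data (the helper name `linear_fit` is mine; it expects the list of per-file arrays from the previous part):

```python
import numpy as np

def linear_fit(datasets):
    """Pool all (n, 2) arrays and least-squares fit y = slope * x + intercept."""
    xy = np.vstack(datasets)                    # stack every file's rows together
    slope, intercept = np.polyfit(xy[:, 0], xy[:, 1], deg=1)
    return slope, intercept
```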

Problem Part E

Use ParaMonte or any other Monte Carlo sampling package of your choice to also compute the uncertainties associated with the slope and intercept of the linear fit to the data.
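ParaMonte has its own API; as a package-agnostic sketch, here is a minimal random-walk Metropolis sampler in plain numpy that estimates the posterior spread of the slope and intercept. It assumes a Gaussian likelihood with a fixed noise scale `sigma` and flat priors, which is a simplification of the real problem:

```python
import numpy as np

def log_post(theta, x, y, sigma):
    """Log-posterior of (slope, intercept): Gaussian likelihood, flat priors."""
    slope, intercept = theta
    resid = y - (slope * x + intercept)
    return -0.5 * np.sum((resid / sigma) ** 2)

def metropolis(x, y, sigma=1.0, nstep=20000, step=0.05, seed=0):
    """Random-walk Metropolis over (slope, intercept); returns the post-burn-in chain."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)
    lp = log_post(theta, x, y, sigma)
    chain = np.empty((nstep, 2))
    for i in range(nstep):
        cand = theta + rng.normal(0.0, step, size=2)
        lp_cand = log_post(cand, x, y, sigma)
        if np.log(rng.random()) < lp_cand - lp:   # Metropolis accept/reject
            theta, lp = cand, lp_cand
        chain[i] = theta
    return chain[nstep // 2:]                     # discard first half as burn-in
```

The standard deviations of the retained chain, `chain.std(axis=0)`, then serve as the uncertainty estimates for the slope and intercept.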

Problem Part F

By now, you may have realized that each file you downloaded and visualized above contains information for a single observed event, and that the rows in each file correspond to different possible realizations of the same observation. In other words, the rows together represent the uncertainty in the observation corresponding to that file. The scatter plot you made above illustrates all such possible realizations of all observations in a single visualization. While the result looks beautiful here, sometimes it does not, and you may need to reduce the data for a better illustration. For example, one common method is to represent each observation by the mean value of its data and add $1\sigma$ standard-deviation error bars to the mean values in the plots.

Now, using the above-suggested data-reduction method, make an error-bar plot of the data. For reference, here is an example error-bar plot of the reduced data, showing only the means and standard deviations.


Compare this graph with the previous one that you generated.
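The reduction step and the error-bar plot can be sketched as follows (function names are mine; the Agg backend keeps the script runnable without a display, and `datasets` is the list of per-file arrays from Part C):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

def reduce_data(datasets):
    """Reduce each (n, 2) set of realizations to its mean and 1-sigma std."""
    avg = np.array([d.mean(axis=0) for d in datasets])
    std = np.array([d.std(axis=0, ddof=1) for d in datasets])
    return avg, std

def errorbar_plot(avg, std, path="ep_flu_errorbar.png"):
    """One point per observation, with 1-sigma bars on both attributes."""
    fig, ax = plt.subplots()
    ax.errorbar(avg[:, 0], avg[:, 1], xerr=std[:, 0], yerr=std[:, 1],
                fmt=".", capsize=2)
    fig.savefig(path)
    return path
```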

Problem Part G

Although the above graph looks simpler than the original plot containing all thousands of points, the error bars do not correctly represent the uncertainties. Why? Because the y-error is correlated with the x-error in most observations. To see this, compare the original plot of all data points with the reduced-data plot. How can we fix this problem, while keeping the plot simple, using only reduced data? The remedy is to use 2D ellipses to represent the $1\sigma$ standard deviations of the data in both attributes. Unlike simple error bars, ellipses can capture potential correlations between the two attributes.

Now, instead of representing the two attributes' error bars independently and individually in the plot, use a single ellipse to represent the $1\sigma$ uncertainty of each data point. For this, you will have to form the bivariate covariance matrix of the uncertainty for each observation. Then pass this covariance matrix, along with the mean, to this function to get a set of representative points on the boundary of the ellipse corresponding to that mean and covariance matrix. Then plot these representative points as a closed line on the current plot. Repeat this process for all observations to obtain the full illustration depicted below.


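The problem statement links to its own helper function for generating the boundary points; a Cholesky-based stand-in, which maps the unit circle through the covariance matrix, might look like this:

```python
import numpy as np

def ellipse_points(mean, cov, npoint=100):
    """Return (npoint, 2) points on the 1-sigma ellipse of a bivariate
    Gaussian: the unit circle mapped through the Cholesky factor of cov."""
    mean = np.asarray(mean, dtype=float)
    t = np.linspace(0.0, 2.0 * np.pi, npoint)
    circle = np.stack([np.cos(t), np.sin(t)])        # (2, npoint) unit circle
    lower = np.linalg.cholesky(np.asarray(cov, dtype=float))
    return (mean[:, None] + lower @ circle).T

# plotting: ax.plot(pts[:, 0], pts[:, 1]) draws the closed ellipse outline
```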
Unlike the previous plots, this ellipsoidal illustration reveals something we had not seen before: the bivariate uncertainties of the observations at high epeak values are positively correlated, whereas those in the low-epeak part of the plot are negatively correlated.

To get the color map seen in the plot, you can use this helper script to define the color for each ellipsoid drawn in the figure.

#### set up color mapping via correlations

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import matplotlib.cm as cmx
from types import SimpleNamespace as Struct  # minimal stand-in for the Struct type

# `stat` is assumed to be defined earlier (see its description below).
cmap = Struct()
# collect the off-diagonal correlation coefficient of each observation
cmap.values = []
for i in range(stat.ndata): cmap.values.append(stat.cor[i][0,1])
cmap.values = np.array(cmap.values)
cmap.name = "coolwarm"

cmapObject = plt.get_cmap(cmap.name)
# normalize over the full [-1, +1] range of possible correlation values
#cNorm = colors.Normalize( vmin = np.min(cmap.values), vmax = np.max(cmap.values), clip = True )
cNorm = colors.Normalize( vmin = -1., vmax = +1., clip = True )
mappable = cmx.ScalarMappable( norm = cNorm, cmap = cmapObject )
# convert each correlation value to an RGBA color
cmap._rgba = np.zeros( ( len(cmap.values), 4 ) )
for i, rho in enumerate(cmap.values): cmap._rgba[i,:] = mappable.to_rgba(rho)

Here in this code, stat is a data structure containing summary statistics of data, including mean (stat.avg), covariance (stat.cov) and correlation (stat.cor) matrices for each observation. Describe what each line of this helper script does in your answer.
