Suppose we have observed a dataset comprising events with two attributes, $x$ and $y$, as in this file: data.xlsx.

  1. Plot this data in Microsoft Excel.
  2. Form a hypothesis about the relationship between $x$ and $y$.
  3. Use Excel’s Trendline toolbox to fit your hypothesized model to this data.
  4. Is it a good fit to the data?
  5. Try at least one other hypothesis for this dataset and fit the corresponding model to the observed trend in the data.
  6. Which hypothesis fits your data better: the original or your alternative?
  7. Use the Excel Trendline again to obtain the equation for the model that seems to fit the data better.
  8. Using this equation, compute the model-predicted $y$ value for each $x$ value in the dataset.
  9. Subtract the model-predicted $y$ values from the actual $y$ values in the dataset. We call these differences the fitting residuals.
  10. Make a histogram of these fitting residuals in Excel. Does the histogram of residuals look noticeably asymmetric?
    (Hint: If you have chosen a good model for your data, then this histogram should look fairly symmetric.)
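
If you would like to cross-check your Excel results, the steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the exercise: the helper names (`fit_line`, `fit_exponential`, `residuals`, `r_squared`, `histogram`) are hypothetical, the linear and exponential models are just two example hypotheses, and the numbers in the demo are invented placeholders, not the contents of data.xlsx.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by fitting a line to (x, ln y); needs y > 0."""
    b, ln_a = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(ln_a), b


def residuals(xs, ys, predict):
    """Actual y minus model-predicted y, one value per data point."""
    return [y - predict(x) for x, y in zip(xs, ys)]


def r_squared(ys, res):
    """Coefficient of determination computed from the residuals."""
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum(r ** 2 for r in res)
    return 1.0 - ss_res / ss_tot


def histogram(values, n_bins):
    """Count values in n_bins equal-width bins spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width if all values match
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # max value goes in last bin
        counts[idx] += 1
    return counts


if __name__ == "__main__":
    # Placeholder values for illustration only (NOT the data in data.xlsx).
    xs = [1, 2, 3, 4, 5, 6, 7, 8]
    ys = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9, 15.2, 16.8]

    # Hypothesis 1: linear relationship.
    slope, intercept = fit_line(xs, ys)
    res_lin = residuals(xs, ys, lambda x: slope * x + intercept)

    # Hypothesis 2: exponential relationship.
    a, b = fit_exponential(xs, ys)
    res_exp = residuals(xs, ys, lambda x: a * math.exp(b * x))

    print("linear R^2:     ", r_squared(ys, res_lin))
    print("exponential R^2:", r_squared(ys, res_exp))
    print("linear residual histogram:     ", histogram(res_lin, 4))
    print("exponential residual histogram:", histogram(res_exp, 4))
```

To use it with the real dataset, replace the placeholder `xs` and `ys` with the two columns from the spreadsheet (for example, after exporting them to CSV). The model with the higher $R^2$ and the more symmetric residual histogram is the better candidate, which mirrors the comparison you make in steps 6 and 10.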