Consider the web page at https://cdslaborg.github.io/DataRepos_SwiftBat/index.html. It is an HTML data table containing data from the NASA Swift satellite. Each row of the table holds information about a Gamma-Ray Burst (GRB) that Swift has detected in past years.

Each event is labeled by an ID that appears in the first column of the table, named GRB (Trig#). Write a code that downloads and reads this HTML page from the web, then extracts the IDs of the GRB events from the first column (the IDs are the numbers that appear in parentheses in the first column). The code should then write the extracted IDs to an output file on your system named nasa.swift.grb.ids.txt.

Note that the IDs must be written as strings to preserve their leading zeros. Each ID must appear on a separate line in the output file.
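
To see why the IDs must stay strings, consider the following minimal sketch. The ID values in it are made up for illustration and are not taken from the actual table.

```python
# Hypothetical trigger ID with leading zeros (not taken from the actual table).
sample_id = "00012345"

# Converting the ID to an integer and back discards the leading zeros.
as_number = int(sample_id)
round_trip = str(as_number)
print(round_trip)

# Keeping the IDs as strings and ending each with a newline
# yields exactly one ID per line in the output file.
ids = [sample_id, "00067890"]
file_content = "\n".join(ids) + "\n"
print(file_content, end="")
```

The round trip through int() turns the eight-character ID into a five-character one, which is why the solution never converts the extracted text to a number.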

Solution

Python

Here is an example implementation of the script in Python:

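import urllib.request as ur

myurl = "https://cdslaborg.github.io/DataRepos_SwiftBat/index.html"

# Both the output file and the web page are closed automatically on exit.
with open("nasa.swift.grb.ids.txt", "w") as outfile:
    with ur.urlopen(myurl) as webfile:
        for line in webfile:
            # The response yields bytes, so decode each line to text.
            line = line.decode("utf-8")
            # Only lines that start with the <TR> tag are table rows.
            if line[0:4] == "<TR>":
                # Take the text before the first ")" and after the "(" preceding it.
                grb_id = line.split(")")[0].split("(")[-1]
                # Skip rows (such as the header) whose parentheses hold no number.
                if grb_id.isdigit():
                    # Write the ID as a string so its leading zeros are kept.
                    outfile.write(grb_id + "\n")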
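
The extraction step can also be checked without a network connection. The sketch below is one possible variant, not the solution above: it uses a regular expression instead of split(), and the helper name extract_ids as well as the sample rows are hypothetical, written only to resemble lines that begin with <TR>. The real page layout may differ.

```python
import re

# Matches a run of digits enclosed in parentheses, e.g. "(00012345)".
ID_PATTERN = re.compile(r"\((\d+)\)")


def extract_ids(lines):
    """Return the first parenthesized digit string of each line starting with <TR>."""
    ids = []
    for line in lines:
        # Ignore anything that is not a table row.
        if not line.startswith("<TR>"):
            continue
        match = ID_PATTERN.search(line)
        if match:
            # group(1) is the digit string itself, kept as text.
            ids.append(match.group(1))
    return ids


# Hypothetical sample lines; the IDs and GRB names are invented.
sample_lines = [
    "<TABLE>",
    "<TR><TH>GRB (Trig#)</TH></TR>",
    "<TR><TD>GRB 000000A (00012345)</TD><TD>...</TD></TR>",
    "<TR><TD>GRB 000000B (00067890)</TD><TD>...</TD></TR>",
]
print(extract_ids(sample_lines))
```

Unlike the split()-based solution, which inspects only the text before the first closing parenthesis, this variant picks up the first parenthesized number anywhere in the row, so it assumes that number belongs to the first column.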
