This homework aims to give you some experience with Python I/O, error handling in your code, and testing your code for accuracy and robustness.





1.  Write a simple program named sum.py that takes an arbitrary-size list of input floats from the command line and prints their sum on the terminal with the following message,

$ python sum.py 1 2 1 23
The sum of 1 2 1 23 is 27.0


Note that you will need to use Python’s built-in function sum().
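
As a rough starting point, the convert-and-sum step might look like the sketch below. The helper name `format_sum` is hypothetical and not part of the assignment; in sum.py the list passed to it would be `sys.argv[1:]` rather than the hard-coded example used here.

```python
def format_sum(args):
    """Return the output message for a list of number strings (hypothetical helper)."""
    # Command-line arguments arrive as strings, so convert before summing.
    numbers = [float(a) for a in args]
    return "The sum of {} is {}".format(" ".join(args), sum(numbers))


# In sum.py this list would be sys.argv[1:].
print(format_sum(["1", "2", "1", "23"]))  # The sum of 1 2 1 23 is 27.0
```

Because every argument is converted with float(), the result is printed as 27.0 rather than 27.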





2.  Similar to the previous problem, write a simple program named sum_via_eval.py that takes an arbitrary-size list of input numbers from the command line and prints their sum on the terminal, this time using Python’s eval function. The program output should look like the following,

$ python sum_via_eval.py 1 2 1 23
The sum of 1 2 1 23 is 27
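
One possible way to use eval here, shown only as a sketch, is to join the arguments into a single arithmetic expression and let eval compute it. The helper name `format_sum_via_eval` is hypothetical; in sum_via_eval.py the list would come from `sys.argv[1:]`.

```python
def format_sum_via_eval(args):
    """Return the output message, letting eval() do the arithmetic (hypothetical helper)."""
    expression = "+".join(args)  # e.g. "1+2+1+23"
    # eval() executes arbitrary Python code, so only pass it input you trust.
    return "The sum of {} is {}".format(" ".join(args), eval(expression))


# In sum_via_eval.py this list would be sys.argv[1:].
print(format_sum_via_eval(["1", "2", "1", "23"]))  # The sum of 1 2 1 23 is 27
```

Since eval keeps integer inputs as integers, the result prints as 27 instead of 27.0.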





3.  Consider this data file. It contains information about the amino acids in a protein called 1A2T. Each amino acid in the protein is labeled by a single letter. There are 20 amino acid molecules in nature, and each has a total surface area (in units of Angstroms squared) that is given by the following table,

'A': 129.0
'R': 274.0
'N': 195.0
'D': 193.0
'C': 167.0
'Q': 225.0
'E': 223.0
'G': 104.0
'H': 224.0
'I': 197.0
'L': 201.0
'K': 236.0
'M': 224.0
'F': 240.0
'P': 159.0
'S': 155.0
'T': 172.0
'W': 285.0
'Y': 263.0
'V': 174.0

However, when these amino acids sit next to each other to form a protein chain, they cover parts of each other, so that only part of each surface is exposed, while the rest is hidden from the outside world by neighboring amino acids. Therefore, one would expect an amino acid at the core of a spherical protein to have almost zero exposed surface area.

Now, given the above information, write a Python program that takes two command-line arguments: the path to the above input file 1A2T_A.dssp, which contains the partially exposed surface area of each amino acid in protein 1A2T, and the path to the file that will hold the output of the code (e.g., it could be ./readDSSP.out). Then,

  1. the code reads the content of this file, and

  2. extracts the names of the amino acids in this protein from the data column with the header AA (see line number 25 of the input data file; the column below AA contains the one-letter names of the amino acids in this protein), and

  3. also extracts the partially exposed surface area information for each of these amino acids which appear in the column with header ACC, and

  4. then uses the above table of maximum surface area values to calculate the fractional exposed surface area of each amino acid in this protein (i.e., for each amino acid, fraction_of_exposed_surface = ACC / maximum_surface_area_from_table), and

  5. finally for each amino acid in this protein, it prints the one-letter name of the amino acid, its corresponding partially exposed surface area (ACC from the input file), and its corresponding fractional exposed surface area (name it RSA) to the output file given by the user on the command line.

  6. On the first column of the output file, the code should also write the name of the protein (which is basically the name of the input file 1A2T_A) on each line of the output file. Note that your code should extract the protein name from the input filename (by removing the file extension and other unnecessary information from the input command line string). Here is an example output of the code.

  7. Your code should also be able to handle an error resulting from fewer or more than 2 input command-line arguments. That is, if the number of input arguments is 3 or 1, then it should print the following message on screen and stop.
$ ./readDSSP.py ./1A2T_A.dssp


Usage:
      ./readDSSP.py <input dssp file> <output summary file>

Program aborted.


or,

$ ./readDSSP.py ./1A2T_A.dssp ./readDSSP.out amir


Usage:
      ./readDSSP.py <input dssp file> <output summary file>

Program aborted.
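
The argument-count check in item 7 could be sketched as follows. The names `USAGE` and `check_arguments` are hypothetical, and the sketch assumes that passing a string to `sys.exit` (which prints it to standard error and stops the program) is an acceptable way to show the message.

```python
import sys

USAGE = """
Usage:
      ./readDSSP.py <input dssp file> <output summary file>

Program aborted.
"""


def check_arguments(argv):
    """Return (input_path, output_path), or stop with the usage message.

    argv is expected to look like sys.argv: the script name followed by
    the command-line arguments.
    """
    if len(argv) != 3:   # script name + exactly two arguments
        sys.exit(USAGE)  # prints the message to stderr and exits
    return argv[1], argv[2]
```

In the script itself this would be called as `check_arguments(sys.argv)`.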


To achieve the above goal, you will have to create a dictionary from the above table, with amino acid names as the keys, and the maximum surface areas as the corresponding values. Name your code readDSSP.py and submit it to your repository.
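
A minimal sketch of the dictionary, the RSA calculation, and the protein-name extraction is given below. The names `MAX_AREA`, `relative_surface_area`, and `protein_name` are hypothetical; the values are copied from the table above.

```python
import os

# Maximum surface areas in Angstroms squared, copied from the table above.
MAX_AREA = {
    'A': 129.0, 'R': 274.0, 'N': 195.0, 'D': 193.0, 'C': 167.0,
    'Q': 225.0, 'E': 223.0, 'G': 104.0, 'H': 224.0, 'I': 197.0,
    'L': 201.0, 'K': 236.0, 'M': 224.0, 'F': 240.0, 'P': 159.0,
    'S': 155.0, 'T': 172.0, 'W': 285.0, 'Y': 263.0, 'V': 174.0,
}


def relative_surface_area(aa, acc):
    """Fraction of the maximum surface area that is exposed (RSA)."""
    return acc / MAX_AREA[aa]


def protein_name(path):
    """Strip the directory and the file extension from the input path."""
    return os.path.splitext(os.path.basename(path))[0]


print(protein_name("./1A2T_A.dssp"))      # 1A2T_A
print(relative_surface_area('G', 52.0))   # 0.5
```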

Write your code in such a way that it checks for the existence of the output file. If the file already exists, the code does not remove its content; instead, it appends the new data to the existing file. Otherwise, if the file does not exist, the code creates a new output file as requested by the user. To do so, you will need to use the os.path.isfile function from the module os.
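
The existence check might be sketched like this, with `open_output` as a hypothetical helper name.

```python
import os


def open_output(path):
    """Open path for appending if it already exists, otherwise create it."""
    if os.path.isfile(path):
        return open(path, "a")  # keep the existing content, add to the end
    return open(path, "w")      # file does not exist yet: create it
```

Note that mode "a" alone would also create a missing file; the explicit branch is kept here only because the assignment asks for the os.path.isfile check.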

ATTENTION: Note that in some rows instead of a one-letter amino acid name, there is !. In such cases, your code should be able to detect the abnormality and skip that row, because that row does not contain amino acid information.
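
Skipping such rows could look like the sketch below. The column positions are assumptions about the fixed-width layout of the data rows and should be checked against the header line of the actual input file; the two sample rows are made up to match that assumed layout and are not real data.

```python
AA_COLUMN = 13               # assumed 0-based position of the AA letter
ACC_COLUMNS = slice(34, 38)  # assumed 0-based span holding the ACC value


def parse_row(line):
    """Return (amino_acid, acc) for a data row, or None for a '!' row."""
    aa = line[AA_COLUMN]
    if aa == '!':
        return None  # not an amino acid: skip this row
    return aa, float(line[ACC_COLUMNS])


# Two made-up rows laid out to match the assumed columns.
rows = [
    " " * 13 + "G" + " " * 20 + "  52",
    " " * 13 + "!" + " " * 20 + "   0",
]
parsed = [row for row in map(parse_row, rows) if row is not None]
print(parsed)  # [('G', 52.0)]
```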





4.  Consider the simplest program for evaluating the formula $y(t) = v_0t-\frac{1}{2}gt^2$,

v0 = 3; g = 9.81; t = 0.6
y = v0*t - 0.5*g*t**2
print(y)


(A) Write a program that takes in the above necessary input data ($t$,$v_0$) as command line arguments.

(B) Extend your program from part (A) with exception handling such that missing command-line arguments are detected. For example, if the user has not entered enough input arguments, then the code should raise an IndexError exception. In the except IndexError block, the code should use the input function to ask the user for the missing input data.

(C) Add another exception handling block that tests whether the $t$ value read from the command line lies between $0$ and $2v_0/g$. If it does not, the code raises a ValueError exception inside the if block that checks for legal values of $t$, and notifies the user about the legal interval for $t$ in the exception message.

Here are some example runs of the code,

$ ./projectile.py
Both v0 and t must be supplied on the command line
v0 = ?
5
t = ?
4
Traceback (most recent call last):
  File "./projectile.py", line 17, in <module>
    'must be between 0 and 2v0/g = {}'.format(t,2.0*v0/g))
ValueError: t = 4.0 is a non-physical value.
must be between 0 and 2v0/g = 1.019367991845056


$ ./projectile.py
Both v0 and t must be supplied on the command line
v0 = ?
5
t = ?
0.5
y = 1.27375


$ ./projectile.py 5 0.4
y = 1.2151999999999998


$ ./projectile.py 5 0.4 3
y = 1.2151999999999998
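
The two kinds of exception handling in parts (B) and (C) might be organized as in the sketch below. The function names `read_v0_t` and `height` and the `ask` parameter (which defaults to the built-in `input` so it can be replaced when testing) are hypothetical and not part of the assignment.

```python
g = 9.81


def read_v0_t(argv, ask=input):
    """Return (v0, t) from argv, asking the user if either one is missing."""
    try:
        v0 = float(argv[1])
        t = float(argv[2])
    except IndexError:
        # Indexing past the end of argv means an argument is missing.
        print('Both v0 and t must be supplied on the command line')
        v0 = float(ask('v0 = ?\n'))
        t = float(ask('t = ?\n'))
    return v0, t


def height(v0, t):
    """Evaluate y(t) after checking that t lies in the physical interval."""
    t_max = 2.0 * v0 / g
    if not 0 <= t <= t_max:
        raise ValueError('t = {} is a non-physical value.\n'
                         'must be between 0 and 2v0/g = {}'.format(t, t_max))
    return v0 * t - 0.5 * g * t**2
```

In projectile.py these would be combined as `v0, t = read_v0_t(sys.argv)` followed by `print('y = {}'.format(height(v0, t)))`. Extra arguments, as in the last example run, are simply ignored because only `argv[1]` and `argv[2]` are read.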






5.  Consider the function Newton that we discussed in lecture 8,

def Newton(f, dfdx, x, eps=1E-7, maxit=100):
    if not callable(f): raise TypeError( 'f is %s, should be function or class with __call__' % type(f) )
    if not callable(dfdx): raise TypeError( 'dfdx is %s, should be function or class with __call__' % type(dfdx) )
    if not isinstance(maxit, int): raise TypeError( 'maxit is %s, must be int' % type(maxit) )
    if maxit <= 0: raise ValueError( 'maxit=%d <= 0, must be > 0' % maxit )
    n = 0 # iteration counter
    while abs(f(x)) > eps and n < maxit:
        try:
            x = x - f(x)/float(dfdx(x))
        except ZeroDivisionError:
            raise ZeroDivisionError( 'dfdx(%g)=%g - cannot divide by zero' % (x, dfdx(x)) )
        n += 1
    return x, f(x), n


This function is supposed to be able to handle exceptions such as divergent iterations (which we discussed in the lecture) and division by zero. The latter error happens when dfdx(x)=0 in the above code. Write a test code that ensures the above code correctly identifies a division by zero and raises a ZeroDivisionError exception; the test itself should fail with an AssertionError if it does not.
(Hint: To do so, you need to consider a test mathematical function as input to Newton. One example could be $f(x)=\cos(x)$ with a starting search value $x=0$. This would result in derivative value $f’(x=0)=-\sin(x=0)=0$, which should lead to a ZeroDivisionError exception. Now, write a test function test_Newton_div_by_zero that can explicitly handle this exception by introducing a boolean variable success that is True if the exception is raised and otherwise False.)
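
Following the hint, such a test might look like the sketch below. To keep the block runnable on its own, it repeats a condensed copy of Newton with the argument checks left out; in your own test you would import the full function from the lecture instead.

```python
from math import cos, sin


def Newton(f, dfdx, x, eps=1E-7, maxit=100):
    # Condensed copy of the lecture function (argument checks omitted).
    n = 0
    while abs(f(x)) > eps and n < maxit:
        try:
            x = x - f(x)/float(dfdx(x))
        except ZeroDivisionError:
            raise ZeroDivisionError(
                'dfdx(%g)=%g - cannot divide by zero' % (x, dfdx(x)))
        n += 1
    return x, f(x), n


def test_Newton_div_by_zero():
    """Newton must raise ZeroDivisionError for f = cos starting at x = 0."""
    def dfdx(x):
        return -sin(x)

    success = False
    try:
        Newton(cos, dfdx, x=0)
    except ZeroDivisionError:
        success = True  # the expected exception was raised
    assert success, 'Newton did not raise ZeroDivisionError'


test_Newton_div_by_zero()
```

If Newton ever stopped raising the exception, `success` would stay False and the assert would fail with an AssertionError.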


