paramonte._paradram

Module Contents

Classes

ParaDRAM()

This is the ParaDRAM class to generate instances of serial and parallel

paramonte._paradram.newline
class paramonte._paradram.ParaDRAM[source]

Bases: _ParaMonteSampler.ParaMonteSampler

This is the ParaDRAM class for generating instances of the serial and parallel Delayed-Rejection Adaptive Metropolis-Hastings Markov Chain Monte Carlo sampler of the ParaMonte library. The ParaDRAM class is a child of the ParaMonteSampler class.

All ParaDRAM class attributes are optional and all attributes can be set after a ParaDRAM instance is returned by the constructor.

Once you set the optional attributes to your desired values, call the ParaDRAM sampler via the object’s method runSampler().

Example serial usage

Copy and paste the following code enclosed between the two comment lines in your python/ipython/jupyter session (ensure the indentations of the pasted lines comply with Python rules):

##################################
import numpy as np
import paramonte as pm
def getLogFunc(point):
    # return the log of a multivariate Normal
    # density function with ndim dimensions
    return -0.5 * np.dot(point, point)
pmpd = pm.ParaDRAM()
pmpd.runSampler ( ndim = 4 # assume 4-dimensional objective function
                , getLogFunc = getLogFunc   # the objective function
                )
##################################

where,

ndim

represents the number of dimensions of the domain of the user’s objective function getLogFunc(point) and,

getLogFunc(point)

represents the user’s objective function to be sampled, which must take a single input argument point of type numpy-float64 array of length ndim and must return the natural logarithm of the objective function.
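As a minimal sketch of an objective function that satisfies this contract, the following defines the log-density of a multivariate Normal distribution with a non-diagonal covariance matrix. The names NDIM, MEAN, COV, INV_COV, and LOG_NORM, as well as the numerical values, are illustrative assumptions and not part of the ParaMonte library; only NumPy is used.

```python
import numpy as np

# Illustrative (assumed) parameters of a 2-dimensional Normal distribution.
NDIM = 2
MEAN = np.array([1.0, -1.0])
COV = np.array([[1.0, 0.5],
                [0.5, 2.0]])

# Precompute quantities that do not depend on the input point.
INV_COV = np.linalg.inv(COV)
LOG_NORM = -0.5 * (NDIM * np.log(2.0 * np.pi) + np.log(np.linalg.det(COV)))

def getLogFunc(point):
    # point: array of length NDIM; returns the natural log of the density.
    diff = np.asarray(point, dtype=np.float64) - MEAN
    return float(LOG_NORM - 0.5 * diff @ INV_COV @ diff)

print(getLogFunc(np.zeros(NDIM)))
```

Returning the logarithm directly, rather than computing the density and then taking its log, avoids numerical underflow far from the mode.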

Example parallel usage

Copy and paste the following code enclosed between the two comment lines in your python/ipython/jupyter session (ensure the indentations of the pasted lines comply with Python rules):

##################################
with open("main_mpi.py", "w") as file:
    file.write  ('''
import numpy as np
import paramonte as pm
def getLogFunc(point):
    # return the log of the standard multivariate
    # Normal density function with ndim dimensions
    return -0.5 * np.dot(point, point)
pmpd = pm.ParaDRAM()
pmpd.mpiEnabled = True
pmpd.runSampler ( ndim = 4 # assume 4-dimensional objective function
                , getLogFunc = getLogFunc   # the objective function
                )
''')
##################################

where,

ndim

represents the number of dimensions of the domain of the user’s objective function getLogFunc(point) and,

getLogFunc(point)

represents the user’s objective function that is to be sampled. This function must take a single input argument point of type numpy-float64 array of length ndim and must return the natural logarithm of the objective function.

mpiEnabled

is a logical (boolean) indicator that, if True, will cause the ParaDRAM simulation to run in parallel on the requested number of processors. The default value is False.

The above will generate a Parallel-ParaDRAM-simulation Python script in the current working directory of Python. Note the only difference between the serial and parallel simulation scripts: the extra line pmpd.mpiEnabled = True which forces the ParaMonte library to invoke the parallel sampler to run the simulation.

Assuming that you already have an MPI runtime library installed on your system (see the Tips on parallel usage below), you can now execute this Python script file main_mpi.py in parallel in two ways:

  1. from inside ipython or jupyter, type the following,

    !mpiexec -n 3 python main_mpi.py
    
  2. outside of Python environment, from within a Bash shell (on Linux or Mac) or, from within an Anaconda command prompt on Windows, type the following,

    mpiexec -n 3 python main_mpi.py
    

Note:

On Windows platform, if you are using the Intel MPI library, we recommend that you also specify the extra flag -localonly,

mpiexec -localonly -n 3 python main_mpi.py

This will cause the simulations to run in parallel only on a single node, but more importantly, it will also prevent the use of the Hydra service and the requirement for its registration. If you are not on a Windows cluster (e.g., you are using your personal device), we highly recommend specifying this flag.

In all of the above cases, the script main_mpi.py will run on 3 processors. Feel free to change the number of processors to any number desired, but do not request more than the number of physical cores available on your system.
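The advice on the number of processors can be sketched as a small helper that caps the requested process count before building the launch command. The function build_mpi_command is hypothetical and not part of ParaMonte. Note also that os.cpu_count() reports logical cores, which may exceed the number of physical cores on systems with hyper-threading, so treat the cap as an upper bound only.

```python
import os

def build_mpi_command(script, requested, launcher="mpiexec"):
    # Hypothetical helper: cap the process count at the number of
    # (logical) cores reported by the operating system.
    available = os.cpu_count() or 1
    nproc = max(1, min(requested, available))
    return [launcher, "-n", str(nproc), "python", script]

command = build_mpi_command("main_mpi.py", requested=3)
print(" ".join(command))
```

The resulting list can be joined into a single string for a shell, or passed as-is to subprocess.run from a non-MPI Python session.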

Tips on parallel usage

For up-to-date detailed instructions on how to run simulations in parallel visit:

You can also use the following commands on the Python command-line,

##################################
import paramonte as pm
pm.verify() # verify the existence of parallel simulation prerequisites
##################################

to obtain specific information on how to run a parallel simulation, in particular, in relation to your current installation of ParaMonte. In general, for parallel simulations:

  1. Ensure you need and will get a speedup by running the ParaDRAM sampler in parallel. Typically, if a single evaluation of the objective function takes much longer than a few milliseconds, your simulation may then benefit from the parallel run.

  2. Ensure you have an MPI library installed, preferably, the Intel MPI runtime libraries. An MPI library should be automatically installed on your system with ParaMonte. If needed, you can download the Intel MPI library from their website and install it.

  3. Ensure the ParaDRAM object property mpiEnabled is True (the default is False).

  4. Before running the parallel simulation, in particular, on Windows systems, you may need to define the necessary MPI environmental variables on your system. To get information on how to define the variables, use the paramonte module’s function, verify(), as described in the above.

  5. Call your main Python code from a Python-aware mpiexec-aware command-line via,

    mpi_launcher -n num_process python name_of_your_python_code.py
    

    where,

    1. “mpi_launcher” is the name of the MPI launcher of the MPI runtime library that you have installed. For example, the Intel MPI library’s launcher is named mpiexec, also recognized by Microsoft, MPICH, and OpenMPI. Note that on supercomputers, the MPI launcher is usually something other than mpiexec, for example: ibrun, mpirun, …

    2. “num_process” represents the number of cores on which you want to run the program. Replace this with an integer, such as 3 (meaning 3 cores).

      Do not assign more processes than the available number of physical cores on your device/cluster. Assigning more cores than physically available on your system will only slow down your simulation.

Once the above script is saved in the file main_mpi.py, open a Python-aware and MPI-aware command prompt to run the simulation in parallel via the MPI launcher,

mpiexec -n 3 python main_mpi.py

This will execute the Python script main_mpi.py on three processes (images). Keep in mind that on Windows systems you may need to define MPI environmental variables before a parallel simulation, as described in the above.
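Whether a parallel run is likely to pay off (the first tip above) can be gauged by timing a single evaluation of the objective function. The following is a rough sketch using only the standard library and NumPy; the helper mean_eval_time and the repeat count are assumptions chosen for illustration.

```python
import time
import numpy as np

def getLogFunc(point):
    # Same standard multivariate Normal log-density as in the examples above.
    return -0.5 * np.dot(point, point)

def mean_eval_time(func, ndim, repeat=1000):
    # Average wall-clock time (in seconds) of one call to func.
    point = np.zeros(ndim, dtype=np.float64)
    start = time.perf_counter()
    for _ in range(repeat):
        func(point)
    return (time.perf_counter() - start) / repeat

seconds = mean_eval_time(getLogFunc, ndim=4)
print(f"mean time per call: {seconds * 1e3:.4f} ms")
```

If the reported time is well below a few milliseconds, as it typically is for this toy function, a serial run is probably the better choice.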

ParaDRAM Class Attributes

See also:

All input specifications (attributes) of a ParaDRAM simulation are optional. However, it is recommended that you provide as much information as possible about the specific ParaDRAM simulation and the objective function to be sampled via ParaDRAM simulation specifications.

The ParaDRAM simulation specifications have lengthy comprehensive descriptions that appear in full in the output report file of every ParaDRAM simulation.

The best way to learn about individual ParaDRAM simulation attributes is to run a minimal serial simulation with the following Python script,

##################################
from paramonte import ParaDRAM
pmpd = ParaDRAM()
pmpd.spec.outputFileName = "./test"
def getLogFunc(point): return -sum(point**2)
pmpd.runSampler( ndim = 1, getLogFunc = getLogFunc )
##################################

Running this code will generate a set of simulation output files (in the current working directory of Python) that begin with the prefix test_process_1. Among these, the file test_process_1_report.txt contains the full description of all input specifications of the ParaDRAM simulation as well as other information about the simulation results and statistics.

Parameters

None. The simulation specifications can be set once an object is instantiated. All simulation specification descriptions are collectively available at:

Note that this is the new interface. The previous ParaDRAM class interface optionally took all simulation specifications as input. However, over time, this approach became more of a liability than a benefit. All simulation specifications must now be set solely after a ParaDRAM object is instantiated, instead of via the ParaDRAM class constructor.

Attributes

buildMode

Optional string argument with the default value “release”. Possible choices are:

“debug”

to be used for identifying sources of bugs and causes of code crashes.

“release”

to be used in all other normal scenarios for maximum runtime efficiency.

mpiEnabled

optional logical (boolean) indicator which is False by default. If it is set to True, it will cause the ParaDRAM simulation to run in parallel on the requested number of processors. See the class documentation guidelines in the above for information on how to run a simulation in parallel.

reportEnabled

optional logical (boolean) indicator which is True by default. If it is set to True, it will cause extensive guidelines to be printed on the standard output as the simulation or post-processing continues, with hints on the next possible steps that could be taken in the process. If you do not need such help and information, set this variable to False to silence all output messages.

inputFile

optional string input representing the path to an external input namelist of simulation specifications.

WARNING

Use this optional argument only if you know the consequences. Specifying an input file will cause the ParaDRAM sampler to ignore all other simulation specifications set by the user via the sampler instance’s spec-component attributes.

spec

A frozen class containing all simulation specifications. All simulation attributes are by default set to appropriate values at runtime. To override the default simulation specifications, set the spec attributes to some desired values of your choice. For possible values, see:

If you need help on any of the simulation specifications, try the supplied helpme() function in this component, like,

##################################
import paramonte as pm
pmpd = pm.ParaDRAM()          # instantiate a ParaDRAM sampler class
pmpd.spec.helpme()            # get help on all simulation specification
pmpd.spec.helpme("chainSize") # get help on "chainSize" specifically
##################################

Methods

See below for information on the methods.

Returns

Object of class ParaDRAM sampler.

runSampler(self, ndim: int, getLogFunc: tp.Callable[[tp.List[float]], float], inputFile: tp.Optional[str] = None)[source]

Run ParaDRAM sampler and return nothing.

Parameters

ndim

An integer representing the number of dimensions of the domain of the user’s objective function getLogFunc(point). It must be a positive integer.

getLogFunc(point)

represents the user’s objective function to be sampled, which must take a single input argument point of type numpy-float64 array of length ndim and must return the natural logarithm of the objective function.

inputFile (optional)

A string input representing the path to an external input namelist of simulation specifications.

WARNING

Use this optional argument with caution and only if you know what you are doing. Specifying this option will cause the sampler to ignore all other simulation specifications set by the user via the spec component of the sampler instance.

Returns

None

readMarkovChain(self, file: tp.Optional[str] = None, delimiter: tp.Optional[str] = None, parseContents: tp.Optional[bool] = True, renabled: tp.Optional[bool] = False)[source]

Return a list of the unweighted verbose (Markov-chain) contents of a set of ParaDRAM output chain files whose names begin with the user-provided input variable file. This method is to be used only for the postprocessing of the output chain file(s) of an already finished ParaDRAM simulation. It is not meant to be called by all processes in parallel mode, although it is possible.

Parameters

file (optional)

A string representing the path to the chain file with the default value of None. The path only needs to uniquely identify the simulation to which the chain file belongs. For example, specifying "./mydir/mysim" as input will lead to a search for a file that begins with "mysim" and ends with "_chain.txt" inside the directory "./mydir/". If there are multiple files with such a name, then all of them will be read and returned as a list. If this input argument is not provided by the user, the value of the object attribute outputFileName will be used instead. At least one of the two mentioned routes must provide the path to the chain file; otherwise, this method will break by calling sys.exit().

delimiter (optional)

An input string representing the delimiter used in the output chain file. If it is not provided as an input argument, the value of the corresponding object attribute outputDelimiter will be used instead. If neither is available, the default comma delimiter "," will be assumed and used.

parseContents (optional)

If set to True, the contents of the file will be parsed and stored in a component of the object named contents. The default value is True.

renabled (optional)

If set to False, the contents of the file(s) will be stored as a list in a (new) component of the ParaDRAM object named markovChainList and None will be the return value of the method. If set to True, the reverse will be done. The default value is False.
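The file-matching rule described for the file argument (names that begin with the given prefix and end with "_chain.txt") can be approximated with the standard glob module. This is only a sketch of the described behavior, not the library's actual implementation; the helper find_chain_files and the temporary file names are hypothetical.

```python
import glob
import os
import tempfile

def find_chain_files(file_prefix):
    # Hypothetical helper: match every file whose path begins with
    # file_prefix and whose name ends with "_chain.txt".
    pattern = glob.escape(file_prefix) + "*_chain.txt"
    return sorted(os.path.abspath(path) for path in glob.glob(pattern))

with tempfile.TemporaryDirectory() as tmpdir:
    # Create a few empty files that mimic simulation output names.
    for name in ("mysim_process_1_chain.txt",
                 "mysim_process_2_chain.txt",
                 "mysim_process_1_report.txt",
                 "other_process_1_chain.txt"):
        with open(os.path.join(tmpdir, name), "w") as handle:
            handle.write("")
    matches = find_chain_files(os.path.join(tmpdir, "mysim"))
    names = [os.path.basename(path) for path in matches]
    print(names)
```

Only the two chain files of the "mysim" simulation are matched; the report file and the chain file of the other simulation are ignored.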

Returns

A list of objects, each of which has the following properties:

file

The full absolute path to the chain file.

delimiter

The delimiter used in the chain file.

ndim

The number of dimensions of the domain of the objective function from which the chain has been drawn.

count

The number of unique (weighted) points in the chain file. This is essentially the number of rows in the chain file minus one (representing the header line).

plot

A structure containing the graphics tools for the visualization of the contents of the file.

df

The unweighted (Markovian) contents of the chain file in the form of a pandas-library DataFrame (hence called df).

contents

Corresponding to each column in the chain file, a property with the same name as the column header is also created for the object, which contains the data stored in that column of the chain file. These properties are all stored in the attribute contents.

If renabled = True, the list of objects will be returned as the return value of the method. Otherwise, the list will be stored in a component of the ParaDRAM object named markovChainList.
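The difference between the weighted points counted by count and the unweighted (Markovian) contents held in df can be illustrated with NumPy alone. The sketch below assumes that each unique point in the chain carries an integer weight giving the number of consecutive times the chain stayed at that point; the arrays weights and states are made-up illustrative data, not actual ParaDRAM output.

```python
import numpy as np

# Assumed compact (weighted) representation: one row per unique point.
weights = np.array([2, 1, 3])
states = np.array([[0.1, 0.2],
                   [0.4, -0.3],
                   [0.0, 0.5]])

# Unweighted (Markovian) representation: repeat each row by its weight.
markov_chain = np.repeat(states, weights, axis=0)

print(len(states), "weighted points ->", len(markov_chain), "Markov-chain points")
```

Here the number of rows in the expanded chain equals the sum of the weights, while the number of rows in the compact form corresponds to what the description of count calls the unique (weighted) points.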