paramonte._paradram
Module Contents
Classes
This is the ParaDRAM class to generate instances of serial and parallel
paramonte._paradram.newline
class paramonte._paradram.ParaDRAM
Bases: _ParaMonteSampler.ParaMonteSampler
This is the ParaDRAM class to generate instances of the serial and parallel Delayed-Rejection Adaptive Metropolis-Hastings Markov Chain Monte Carlo sampler class of the ParaMonte library. The ParaDRAM class is a child of the ParaMonteSampler class.
All ParaDRAM class attributes are optional and all attributes can be set after a ParaDRAM instance is returned by the constructor.
Once you set the optional attributes to your desired values, call the ParaDRAM sampler via the object’s method runSampler().
Example serial usage
Copy and paste the following code enclosed between the two comment lines in your python/ipython/jupyter session (ensure the indentations of the pasted lines comply with Python rules):
##################################
import numpy as np
import paramonte as pm
def getLogFunc(point):
    # return the log of a multivariate Normal
    # density function with ndim dimensions
    return -0.5 * np.dot(point, point)
pmpd = pm.ParaDRAM()
pmpd.runSampler ( ndim = 4 # assume 4-dimensional objective function
                , getLogFunc = getLogFunc # the objective function
                )
##################################
where,
ndim represents the number of dimensions of the domain of the user’s objective function getLogFunc(point), and
getLogFunc(point) represents the user’s objective function to be sampled, which must take a single input argument point of type numpy-float64 array of length ndim and must return the natural logarithm of the objective function.
Example parallel usage
Copy and paste the following code enclosed between the two comment lines in your python/ipython/jupyter session (ensure the indentations of the pasted lines comply with Python rules):
##################################
with open("main_mpi.py", "w") as file:
    file.write ('''
import numpy as np
import paramonte as pm
def getLogFunc(point):
    # return the log of the standard multivariate
    # Normal density function with ndim dimensions
    return -0.5 * np.dot(point, point)
pmpd = pm.ParaDRAM()
pmpd.mpiEnabled = True
pmpd.runSampler ( ndim = 4 # assume 4-dimensional objective function
                , getLogFunc = getLogFunc # the objective function
                )
''')
##################################
where,
ndim represents the number of dimensions of the domain of the user’s objective function getLogFunc(point), and
getLogFunc(point) represents the user’s objective function that is to be sampled. This function must take a single input argument point of type numpy-float64 array of length ndim and must return the natural logarithm of the objective function.
mpiEnabled is a logical (boolean) indicator that, if True, will cause the ParaDRAM simulation to run in parallel on the requested number of processors. The default value is False.
The above will generate a Parallel-ParaDRAM-simulation Python script in the current working directory of Python. Note the only difference between the serial and parallel simulation scripts: the extra line
pmpd.mpiEnabled = True
which forces the ParaMonte library to invoke the parallel sampler to run the simulation.
Assuming that you already have an MPI runtime library installed on your system (see the Tips on parallel usage below), you can now execute this Python script file main_mpi.py in parallel in two ways:
from inside ipython or jupyter, type the following,
!mpiexec -n 3 python main_mpi.py
outside of the Python environment, from within a Bash shell (on Linux or Mac) or from within an Anaconda command prompt on Windows, type the following,
mpiexec -n 3 python main_mpi.py
Note:
On the Windows platform, if you are using the Intel MPI library, we recommend that you also specify the extra flag -localonly,
mpiexec -localonly -n 3 python main_mpi.py
This will cause the simulations to run in parallel only on a single node, but more importantly, it will also prevent the use of the Hydra service and the requirement for its registration. If you are not on a Windows cluster (e.g., you are using your personal device), then we highly recommend specifying this flag.
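The launch commands above can also be assembled from Python, which makes the process count and the optional -localonly flag explicit. The following is only a sketch: build_mpi_command and run_in_parallel are hypothetical helper names that are not part of ParaMonte, and the sketch assumes the launcher is called mpiexec and is reachable on the PATH.

```python
import shutil
import subprocess


def build_mpi_command(script, nproc=3, launcher="mpiexec", localonly=False):
    """Assemble the launcher command as an argument list (no shell involved)."""
    if nproc < 1:
        raise ValueError("nproc must be a positive integer")
    command = [launcher]
    if localonly:
        # the Intel MPI flag on Windows described in the note above
        command.append("-localonly")
    command += ["-n", str(nproc), "python", script]
    return command


def run_in_parallel(script, nproc=3, launcher="mpiexec", localonly=False):
    """Run the script through the launcher, failing early if it is missing."""
    if shutil.which(launcher) is None:
        raise FileNotFoundError(f"MPI launcher {launcher!r} not found on PATH")
    command = build_mpi_command(script, nproc, launcher, localonly)
    return subprocess.run(command, check=True)


# Only build and display the commands here; nothing is executed.
plain_command = build_mpi_command("main_mpi.py", nproc=3)
windows_command = build_mpi_command("main_mpi.py", nproc=3, localonly=True)
print(" ".join(plain_command))
print(" ".join(windows_command))
```

Calling run_in_parallel("main_mpi.py") would then start the simulation, provided an MPI runtime is installed.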
In all cases above, the script main_mpi.py will run on 3 processors. Feel free to change the number of processors to any number desired, but do not request more than the available number of physical cores on your system.
Tips on parallel usage
For up-to-date detailed instructions on how to run simulations in parallel visit:
You can also use the following commands on the Python command-line,
##################################
import paramonte as pm
pm.verify() # verify the existence of parallel simulation prerequisites
##################################
to obtain specific information on how to run a parallel simulation, in particular, in relation to your current installation of ParaMonte. In general, for parallel simulations:
Ensure you need and will get a speedup by running the ParaDRAM sampler in parallel. Typically, if a single evaluation of the objective function takes much longer than a few milliseconds, your simulation may then benefit from the parallel run.
Ensure you have an MPI library installed, preferably, the Intel MPI runtime libraries. An MPI library should be automatically installed on your system with ParaMonte. If needed, you can download the Intel MPI library from their website and install it.
Ensure the ParaDRAM object property mpiEnabled is True (the default is False).
Before running the parallel simulation, in particular, on Windows systems, you may need to define the necessary MPI environmental variables on your system. To get information on how to define the variables, use the paramonte module’s function verify(), as described above.
Call your main Python code from a Python-aware, mpiexec-aware command-line via,
mpi_launcher -n num_process python name_of_your_python_code.py
where,
“mpi_launcher” is the name of the MPI launcher of the MPI runtime library that you have installed. For example, the Intel MPI library’s launcher is named mpiexec, also recognized by Microsoft, MPICH, and OpenMPI. Note that on supercomputers, the MPI launcher is usually something other than mpiexec, for example: ibrun, mpirun, …
“num_process” represents the number of cores on which you want to run the program. Replace this with an integer number, like 3 (meaning 3 cores).
Do not assign more processes than the available number of physical cores on your device/cluster. Assigning more cores than physically available on your system will only slow down your simulation.
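The first tip above ties the benefit of a parallel run to the cost of a single evaluation of the objective function. One way to estimate that cost before deciding is sketched below. mean_eval_time is a hypothetical helper, not a ParaMonte function, and the objective here is a pure-Python stand-in for the example function used earlier (in a real run, point would be a numpy float64 array).

```python
import time


def getLogFunc(point):
    # log of the standard multivariate Normal density, up to an additive constant
    return -0.5 * sum(x * x for x in point)


def mean_eval_time(logfunc, ndim, nrepeat=1000):
    """Return the mean wall-clock time, in seconds, of one call to logfunc."""
    point = [0.0] * ndim
    start = time.perf_counter()
    for _ in range(nrepeat):
        logfunc(point)
    return (time.perf_counter() - start) / nrepeat


seconds = mean_eval_time(getLogFunc, ndim=4)
print(f"mean evaluation time: {seconds * 1e3:.6f} ms")
```

A function this cheap evaluates in far less than a millisecond, so by the rule of thumb above it is unlikely to gain from a parallel run. When choosing the process count, keep in mind that os.cpu_count() reports logical CPUs, which can exceed the number of physical cores.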
Once the above script is saved in the file main_mpi.py, open a Python-aware and MPI-aware command prompt to run the simulation in parallel via the MPI launcher,
mpiexec -n 3 python main_mpi.py
This will execute the Python script main_mpi.py on three processes (images). Keep in mind that on Windows systems you may need to define MPI environmental variables before a parallel simulation, as described above.
ParaDRAM Class Attributes
See also:
All input specifications (attributes) of a ParaDRAM simulation are optional. However, it is recommended that you provide as much information as possible about the specific ParaDRAM simulation and the objective function to be sampled via ParaDRAM simulation specifications.
The ParaDRAM simulation specifications have lengthy comprehensive descriptions that appear in full in the output report file of every ParaDRAM simulation.
The best way to learn about individual ParaDRAM simulation attributes is to run a minimal serial simulation with the following Python script,
##################################
from paramonte import ParaDRAM
pmpd = ParaDRAM()
pmpd.spec.outputFileName = "./test"
def getLogFunc(point): return -sum(point**2)
pmpd.runSampler( ndim = 1, getLogFunc = getLogFunc )
##################################
Running this code will generate a set of simulation output files (in the current working directory of Python) that begin with the prefix test_process_1. Among these, the file test_process_1_report.txt contains the full description of all input specifications of the ParaDRAM simulation as well as other information about the simulation results and statistics.
Parameters
None. The simulation specifications can be set once an object is instantiated. All simulation specification descriptions are collectively available at:
Note that this is the new interface. The previous ParaDRAM class interface used to optionally take all simulation specifications as input. However, over time, this approach became more of a liability than a benefit. All simulation specifications must now be set solely after a ParaDRAM object is instantiated, instead of via the ParaDRAM class constructor.
Attributes
buildMode
optional string argument with the default value “release”. Possible choices are:
“debug”
to be used for identifying sources of bugs and causes of code crashes.
“release”
to be used in all other normal scenarios for maximum runtime efficiency.
mpiEnabled
optional logical (boolean) indicator which is False by default. If it is set to True, it will cause the ParaDRAM simulation to run in parallel on the requested number of processors. See the class documentation guidelines above for information on how to run a simulation in parallel.
reportEnabled
optional logical (boolean) indicator which is True by default. If it is set to True, it will cause extensive guidelines to be printed on the standard output as the simulation or post-processing continues, with hints on the next possible steps that could be taken in the process. If you do not need such help and information, set this variable to False to silence all output messages.
inputFile
optional string input representing the path to an external input namelist of simulation specifications.
WARNING
Use this optional argument only if you know the consequences. Specifying an input file will cause the ParaDRAM sampler to ignore all other simulation specifications set by the user via the sampler instance’s spec-component attributes.
spec
A frozen class containing all simulation specifications. All simulation attributes are by default set to appropriate values at runtime. To override the default simulation specifications, set the spec attributes to some desired values of your choice. For possible values, see:
If you need help on any of the simulation specifications, try the supplied helpme() function in this component, like,
##################################
import paramonte as pm
pmpd = pm.ParaDRAM() # instantiate a ParaDRAM sampler class
pmpd.spec.helpme() # get help on all simulation specification
pmpd.spec.helpme("chainSize") # get help on "chainSize" specifically
##################################
Methods
See below for information on the methods.
Returns
Object of class ParaDRAM sampler.
runSampler(self, ndim: int, getLogFunc: tp.Callable[[tp.List[float]], float], inputFile: tp.Optional[str] = None)
Run the ParaDRAM sampler and return nothing.
Parameters
ndim
An integer representing the number of dimensions of the domain of the user’s objective function getLogFunc(point). It must be a positive integer.
getLogFunc(point)
represents the user’s objective function to be sampled, which must take a single input argument point of type numpy-float64 array of length ndim and must return the natural logarithm of the objective function.
inputFile (optional)
A string input representing the path to an external input namelist of simulation specifications.
WARNING
Use this optional argument with caution and only if you know what you are doing. Specifying this option will cause the sampler to ignore all other simulation specifications set by the user via the spec component of the sampler instance.
Returns
None
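The contract on getLogFunc described above (a float64 array of length ndim in, a natural logarithm out) can be checked before a run is started. The sketch below uses only NumPy and does not call ParaMonte. check_log_func and normalized_log_normal are hypothetical names introduced for illustration. The sketch also shows that the example objective -0.5 * np.dot(point, point) differs from the fully normalized standard Normal log-density only by the constant (ndim / 2) * log(2 * pi). A Metropolis-Hastings sampler depends only on ratios of the density, so such a constant does not change which points are accepted.

```python
import math

import numpy as np


def getLogFunc(point):
    # unnormalized log-density of the standard multivariate Normal
    return -0.5 * np.dot(point, point)


def normalized_log_normal(point):
    # the same log-density including its normalization constant
    ndim = len(point)
    return -0.5 * np.dot(point, point) - 0.5 * ndim * math.log(2.0 * math.pi)


def check_log_func(logfunc, ndim):
    """Hypothetical helper: evaluate logfunc once and validate the result."""
    if not isinstance(ndim, int) or ndim < 1:
        raise ValueError("ndim must be a positive integer")
    point = np.zeros(ndim, dtype=np.float64)
    value = float(logfunc(point))  # raises TypeError if the result is not a scalar
    if math.isnan(value):
        raise ValueError("the log-function returned NaN")
    return value


value_at_origin = check_log_func(getLogFunc, ndim=4)

# The two functions differ by the same constant at every point.
test_point = np.array([1.0, -2.0, 0.5, 3.0])
offset = getLogFunc(test_point) - normalized_log_normal(test_point)
print(value_at_origin, offset)
```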
readMarkovChain(self, file: tp.Optional[str] = None, delimiter: tp.Optional[str] = None, parseContents: tp.Optional[bool] = True, renabled: tp.Optional[bool] = False)
Return a list of the unweighted verbose (Markov-chain) contents of a set of ParaDRAM output chain files whose names begin with the user-provided input variable file. This method is to be used only for the postprocessing of the output chain file(s) of an already finished ParaDRAM simulation. It is not meant to be called by all processes in parallel mode, although it is possible.
Parameters
file (optional)
A string representing the path to the chain file with the default value of None. The path only needs to uniquely identify the simulation to which the chain file belongs. For example, specifying "./mydir/mysim" as input will lead to a search for a file that begins with "mysim" and ends with "_chain.txt" inside the directory "./mydir/". If there are multiple files with such a name, then all of them will be read and returned as a list. If this input argument is not provided by the user, the value of the object attribute outputFileName will be used instead. At least one of the two mentioned routes must provide the path to the chain file; otherwise, this method will break by calling sys.exit().
delimiter (optional)
An input string representing the delimiter used in the output chain file. If it is not provided as an input argument, the value of the corresponding object attribute outputDelimiter will be used instead. If neither of the two is available, the default comma delimiter "," will be assumed and used.
parseContents (optional)
If set to True, the contents of the file will be parsed and stored in a component of the object named contents. The default value is True.
renabled (optional)
If set to False, the contents of the file(s) will be stored as a list in a (new) component of the ParaDRAM object named markovChainList and None will be the return value of the method. If set to True, the reverse will be done. The default value is False.
Returns
A list of objects, each of which has the following properties:
file
The full absolute path to the chain file.
delimiter
The delimiter used in the chain file.
ndim
The number of dimensions of the domain of the objective function from which the chain has been drawn.
count
The number of unique (weighted) points in the chain file. This is essentially the number of rows in the chain file minus one (representing the header line).
plot
A structure containing the graphics tools for the visualization of the contents of the file.
df
The unweighted (Markovian) contents of the chain file in the form of a pandas-library DataFrame (hence called df).
contents
Corresponding to each column in the chain file, a property with the same name as the column header is also created for the object, which contains the data stored in that column of the chain file. These properties are all stored in the attribute contents.
If renabled = True, the list of objects will be returned as the return value of the method. Otherwise, the list will be stored in a component of the ParaDRAM object named markovChainList.
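To illustrate the file search and the difference between count (unique, weighted points) and the unweighted Markov chain described above, here is a standard-library sketch. It is not the ParaMonte implementation. find_chain_files and expand_weighted_chain are hypothetical helpers, the column names SampleWeight, SampleLogFunc, and SampleVariable1 are assumptions about the chain-file header, and the data are made up. The sketch also assumes that the verbose chain is obtained by repeating each unique point as many times as its weight.

```python
import csv
import glob
import io
import os
import tempfile


def find_chain_files(prefix):
    """Return the sorted paths that begin with prefix and end with "_chain.txt"."""
    return sorted(glob.glob(glob.escape(prefix) + "*_chain.txt"))


def expand_weighted_chain(text, delimiter=",", weight_column="SampleWeight"):
    """Repeat every row by its integer weight to build the verbose chain."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    verbose = []
    for row in reader:
        weight = int(row[weight_column])
        verbose.extend(dict(row) for _ in range(weight))
    return verbose


# Made-up compact chain: three unique points with weights 2, 1, and 3.
compact_text = (
    "SampleWeight,SampleLogFunc,SampleVariable1\n"
    "2,-0.5,1.0\n"
    "1,-2.0,2.0\n"
    "3,-0.125,0.5\n"
)

with tempfile.TemporaryDirectory() as workdir:
    chain_path = os.path.join(workdir, "mysim_process_1_chain.txt")
    with open(chain_path, "w") as chain_file:
        chain_file.write(compact_text)
    # a report file sharing the prefix must not be picked up by the search
    with open(os.path.join(workdir, "mysim_process_1_report.txt"), "w") as report:
        report.write("report\n")

    found = find_chain_files(os.path.join(workdir, "mysim"))
    with open(found[0]) as chain_file:
        verbose_chain = expand_weighted_chain(chain_file.read())

count = len(compact_text.splitlines()) - 1  # rows minus the header line
print(len(found), count, len(verbose_chain))
```

In this toy file, the three unique points carry weights that sum to six, so the expanded chain has six rows while count stays at three.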