The ParaDRAM simulation parallelization tips | ParaMonte: Parallel Monte Carlo Library

General parallelization tips

Regardless of the programming language, always code your simulations for single-core serial runs first.
Once you test the functionality of your program and the accuracy of the results in serial mode, inspect the runtime of your simulations for potential speedup benefits via parallel computing.
The runtime of a single objective function call in your simulation problem must be significantly more than the average inter-process communication time to see any benefits from going parallel.
Tip: With contemporary hardware, inter-process communication typically takes $1$ to $1000\,\mu s$. This estimate depends heavily on the CPU, the network architecture, and the number of cores involved in the communication. Therefore, a single objective function call must cost significantly more than $1$ to $1000\,\mu s$ for the simulation to benefit from parallelization.
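The comparison above can be checked with a quick timing test before committing to a parallel run. The following is a minimal sketch, not part of the ParaMonte library: the function names, the toy objective function `get_log_func`, the assumed communication cost of $1000\,\mu s$ (the pessimistic end of the range quoted above), and the safety margin of 10 are all illustrative assumptions.

```python
import time


def mean_call_time(func, point, repeats=1000):
    """Return the average wall-clock time of one func(point) call, in seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        func(point)
    return (time.perf_counter() - start) / repeats


def worth_parallelizing(call_seconds, comm_seconds=1e-3, margin=10.0):
    """Return True if one call costs at least `margin` times the communication time.

    comm_seconds and margin are assumed values; tune them for your platform.
    """
    return call_seconds >= margin * comm_seconds


def get_log_func(point):
    # Hypothetical objective: log-density (up to a constant) of a standard normal.
    return -0.5 * sum(x * x for x in point)


if __name__ == "__main__":
    t = mean_call_time(get_log_func, [0.0] * 4)
    print(f"mean call time: {t * 1e6:.2f} microseconds")
    print("parallelization likely to help:", worth_parallelizing(t))
```

A cheap objective function like the toy one here will normally fall far below the threshold, which is exactly the case where a serial run is the better choice.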

ParaDRAM parallelization tips

By default, the ParaDRAM sampler utilizes a Fork-Join parallelism paradigm where multiple proposals are generated and inspected for acceptance at each MCMC iteration. This is in addition to and different from the perfect-parallelism paradigm whereby multiple MCMC chains are simulated simultaneously in parallel independently of each other. The choice of parallelism paradigm can be changed from the default Fork-Join via the input simulation specification parallelizationModel.

Running in parallel to check for MCMC convergence

Use the perfect parallelism paradigm (corresponding to the input specification parallelizationModel = "multi chain") if you suspect that the objective function might be multi-modal. Upon concluding the simulation, the sampler automatically performs a series of Kolmogorov-Smirnov (KS) tests for lack of convergence of the multiple chains to the same global mode (peak) of the objective function, and reports the results. Evidence for multi-modality shows up as extremely small KS-test probabilities. Typically, KS-test probabilities larger than $\sim0.01$ indicate no evidence for a lack of convergence of the multiple parallel chains to the same global peak in the domain of the objective function. The smaller the probabilities, the stronger the evidence for a lack of convergence.
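To make the interpretation of these probabilities concrete, here is a minimal sketch of a two-sample KS test applied to one sampled variable from two chains. It is a generic textbook version that uses the asymptotic Kolmogorov distribution, and it is not claimed to match the exact procedure the sampler runs internally. The function name `ks_two_sample` and the synthetic chains are assumptions made for illustration only.

```python
import math
import random


def ks_two_sample(sample_a, sample_b):
    """Return (D, p) for a two-sample Kolmogorov-Smirnov test.

    D is the largest gap between the two empirical CDFs. The probability p
    comes from the asymptotic Kolmogorov distribution, so it is approximate.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    na, nb = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < na and j < nb:
        x = min(a[i], b[j])
        # Advance both empirical CDFs past the current value x.
        while i < na and a[i] <= x:
            i += 1
        while j < nb and b[j] <= x:
            j += 1
        d = max(d, abs(i / na - j / nb))
    en = math.sqrt(na * nb / (na + nb))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.3:
        return d, 1.0  # the probability is indistinguishable from 1 here
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


if __name__ == "__main__":
    rng = random.Random(12345)
    chain_1 = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    chain_2 = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # same mode as chain_1
    chain_3 = [rng.gauss(5.0, 1.0) for _ in range(2000)]  # stuck in another mode
    for name, other in (("chain_2", chain_2), ("chain_3", chain_3)):
        d, p = ks_two_sample(chain_1, other)
        verdict = "no evidence of non-convergence" if p > 0.01 else "possible multi-modality"
        print(f"chain_1 vs {name}: D = {d:.3f}, p = {p:.3g} -> {verdict}")
```

Note that real MCMC samples are autocorrelated, so a test like this is best applied to refined (decorrelated) samples rather than to the raw chains.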

Running in parallel for simulation speedup

Use the fork-join parallelism paradigm (corresponding to the input specification parallelizationModel = "single chain") to reduce the simulation runtime. For this to work, the runtime of a single objective function call must be significantly longer than the inter-process communication time. If you are unsure of the runtime cost of an objective function call, run a short simulation with the ParaDRAM sampler to obtain the information needed to plan the production run. Upon concluding each simulation, the ParaDRAM sampler writes a comprehensive analysis of the parallel performance of the simulation to the output report file.

The output simulation performance analysis contains the predicted optimal number of processors for the parallel simulation under the given simulation setup and objective function. Simply run a short trial simulation with your best guess for the optimal number of processors. Then use the predictions in the simulation output to adjust the number of processors for the optimal efficiency in the production run.

Factors affecting the parallel runtime efficiency

The efficiency of a parallel ParaDRAM simulation depends heavily on two factors:

The time cost of the objective function call (as compared to the inter-process communication cost).
The efficiency of the MCMC sampler (i.e., the average acceptance rate). As a rule of thumb for maximum efficiency, the number of processors should be roughly equal to or less than the inverse of the mean efficiency (i.e., the mean acceptance rate) of the MCMC sampler. This rule holds only in the ideal scenario where the cost of a single objective function call is significantly more than the inter-process communication cost.
Tip: Although the optimal average MCMC acceptance rate is generally believed to be around $0.234$, lowering the efficiency to increase the number of parallel cores (and therefore the parallelization efficiency) is often a reasonably good strategy. The average MCMC acceptance rate of a given ParaDRAM simulation can be controlled via the input simulation specification targetAcceptanceRate or, more appropriately, adjusted by varying the value of the scaleFactor property of the simulation.
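The rule of thumb above can be illustrated with a simplified model of a fork-join iteration. The model is an assumption made for this sketch, not the performance analysis that the sampler reports: every proposal is accepted independently with the same probability, the proposals of one iteration are inspected in order, all proposals after the first accepted one are discarded, and communication costs are ignored. All function names are hypothetical.

```python
def expected_used_proposals(acceptance_rate, n_proc):
    """Mean number of proposals per iteration that advance the chain.

    Under the assumed model this is the sum of (1 - a)**(k - 1) for
    k = 1..n_proc, which equals (1 - (1 - a)**n_proc) / a.
    """
    a = acceptance_rate
    return (1.0 - (1.0 - a) ** n_proc) / a


def parallel_efficiency(acceptance_rate, n_proc):
    """Fraction of the objective function calls that are not wasted."""
    return expected_used_proposals(acceptance_rate, n_proc) / n_proc


def rule_of_thumb_processors(acceptance_rate):
    """Largest processor count not exceeding the inverse acceptance rate."""
    return max(1, int(1.0 / acceptance_rate))


if __name__ == "__main__":
    rate = 0.234
    print("rule-of-thumb processor count:", rule_of_thumb_processors(rate))
    for n_proc in (1, 2, 4, 8, 16, 32):
        speedup = expected_used_proposals(rate, n_proc)
        efficiency = parallel_efficiency(rate, n_proc)
        print(f"{n_proc:3d} processors: ideal speedup {speedup:5.2f}, "
              f"efficiency {efficiency:5.2f}")
```

In this model the ideal speedup can never exceed the inverse of the acceptance rate, no matter how many processors are added. This is why a lower acceptance rate raises the ceiling on useful parallelism, and why adding processors far beyond the inverse acceptance rate mostly wastes objective function calls.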

If you have any questions about the topics discussed on this page, feel free to ask in the comment section below, or raise an issue on the GitHub page of the library, or reach out to the ParaMonte library authors.