General parallelization tips

1. Regardless of the programming language, always code your simulations for single-core serial runs first.
2. Once you have verified the functionality of your program and the accuracy of its results in serial mode, inspect the runtime of your simulations for potential speedup benefits via parallel computing.
3. The runtime of a single objective function call in your simulation problem must be significantly longer than the average inter-process communication time for you to see any benefit from going parallel.
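
As a rough way to check the third point before committing to a parallel run, the average cost of one objective function call can be timed directly. The sketch below is only an illustration: the names mean_call_time, worth_parallelizing, and log_density are hypothetical, and both the communication time of 1e-6 seconds and the safety factor of 10 are assumed placeholders that should be replaced with values measured on your own system.

```python
import time

def mean_call_time(func, arg, repeats=100):
    """Return the average wall-clock time (in seconds) of one func(arg) call."""
    start = time.perf_counter()
    for _ in range(repeats):
        func(arg)
    return (time.perf_counter() - start) / repeats

def worth_parallelizing(call_time, comm_time, factor=10.0):
    """Return True if one call costs at least `factor` times the communication time.

    The default factor of 10 is an arbitrary assumption, not a library setting.
    """
    return call_time >= factor * comm_time

def log_density(point):
    # Hypothetical objective function: log-kernel of a standard normal density.
    return -0.5 * sum(x * x for x in point)

call_time = mean_call_time(log_density, [0.1, -0.3, 0.7])
# comm_time=1e-6 is an assumed placeholder, not a measured value.
print(call_time, worth_parallelizing(call_time, comm_time=1e-6))
```

A cheap function such as the one above will typically fail this check, which is exactly the case where a serial run is the better choice.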

By default, the ParaDRAM sampler uses a fork-join parallelism paradigm in which multiple proposals are generated and inspected for acceptance at each MCMC iteration. This is distinct from, and available in addition to, the perfect-parallelism paradigm, in which multiple MCMC chains are simulated simultaneously and independently of each other. The parallelism paradigm can be changed from the default fork-join via the input simulation specification parallelizationModel.

Running in parallel to check for MCMC convergence

Use the perfect parallelism paradigm (corresponding to the input specification parallelizationModel = "multi chain") if you suspect that the objective function might be multi-modal. Upon concluding the simulation, the sampler automatically performs a series of Kolmogorov-Smirnov (KS) tests for lack of convergence of the multiple chains to the same global mode (peak) of the objective function and reports the results. Evidence for multi-modality appears as extremely small KS-test probabilities. Typically, KS-test probabilities larger than $\sim0.01$ indicate no evidence for a lack of convergence of the multiple parallel chains to the same global peak in the domain of the objective function. The smaller the probabilities, the stronger the evidence for a lack of convergence.
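
To make the meaning of these probabilities concrete, the following is a minimal sketch of a two-sample KS test applied to one variable from two chains. It is not the sampler's own implementation: the function names are hypothetical, the samples are assumed to be independent and one-dimensional, and the probability uses the standard asymptotic series approximation of the Kolmogorov distribution.

```python
import math
import random

def ks_statistic(a, b):
    """Maximum distance between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past x so that ties are handled together.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def ks_probability(d, n, m, terms=100):
    """Asymptotic probability of a KS distance >= d under the same-distribution hypothesis."""
    ne = n * m / (n + m)  # effective sample size
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        return 1.0  # the series converges poorly here; the probability is ~1
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))

random.seed(0)
chain_a = [random.gauss(0.0, 1.0) for _ in range(500)]  # chain settled near 0
chain_b = [random.gauss(5.0, 1.0) for _ in range(500)]  # chain settled near 5
d = ks_statistic(chain_a, chain_b)
print(d, ks_probability(d, len(chain_a), len(chain_b)))
```

Two chains that settle on well-separated peaks, as in this example, yield a KS distance close to one and a probability far below 0.01. Real MCMC samples are autocorrelated, so a careful comparison would use decorrelated samples.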

Running in parallel for simulation speedup

Use the fork-join parallelism paradigm (corresponding to the input specification parallelizationModel = "single chain") to reduce the simulation runtime. For this to happen, the runtime of a single objective function call must be significantly longer than the inter-process communication time. If you are unsure of the runtime cost of an objective function call, you can run a short simulation with the ParaDRAM sampler to obtain the information needed to plan the production run. Upon concluding each simulation, the ParaDRAM sampler writes a comprehensive analysis of the parallel performance of the simulation to the output report file.

The performance analysis in the simulation output contains the predicted optimal number of processors for the parallel simulation under the given simulation setup and objective function. Simply run a short trial simulation with your best guess for the optimal number of processors. Then use the predictions in the simulation output to adjust the number of processors for optimal efficiency in the production run.

Factors affecting the parallel runtime efficiency

The efficiency of a parallel ParaDRAM simulation depends heavily on two factors:

1. The time cost of the objective function call (as compared to the inter-process communication cost).
2. The efficiency of the MCMC sampler (i.e., the average acceptance rate). As a rule of thumb for maximum efficiency, the number of processors should be roughly equal to or less than the inverse of the mean efficiency (i.e., the mean acceptance rate) of the MCMC sampler. This rule holds only in the ideal scenario where the cost of a single objective function call is significantly more than the inter-process communication cost.
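
The rule of thumb in the second point reduces to simple arithmetic, sketched below. The function name max_useful_processors is hypothetical and not part of the library, and the result is only an upper bound under the ideal-scenario assumption stated above.

```python
import math

def max_useful_processors(mean_acceptance_rate):
    """Upper bound on the processor count: the inverse of the mean acceptance rate.

    Assumes the objective function cost dominates the communication cost.
    """
    if not 0.0 < mean_acceptance_rate <= 1.0:
        raise ValueError("mean_acceptance_rate must be in the interval (0, 1]")
    return max(1, math.floor(1.0 / mean_acceptance_rate))

# An acceptance rate of 0.25 suggests using at most 4 processors.
print(max_useful_processors(0.25))
```

With an acceptance rate close to one, the bound drops to a single processor, since nearly every proposal is accepted and additional proposals per iteration would mostly be wasted.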

If you have any questions about the topics discussed on this page, feel free to ask in the comment section below, or raise an issue on the GitHub page of the library, or reach out to the ParaMonte library authors.