Use of Matlab's Parallel Computing Toolbox with COCO
Thanks for writing, Andy. Preliminary development of parallelized atlas algorithms for multidimensional solution manifolds by my research group has made use of the Matlab Parallel Computing Toolbox. For regular one-dimensional continuation of periodic orbits of a large system, the code is already optimized through vectorization of the evaluation of the collocation conditions (provided that your vector field is vectorized), without any use of for loops. I expect that Matlab running on a multicore machine would distribute the load of such a vectorized operation across cores without further effort from you. The same applies to continuation that includes variational conditions, since these are fully coupled to the governing dynamics. For problems involving multisegment trajectories, the code relies on a for loop over the segments, but since the number of segments is typically small, parallelizing that loop would not achieve any substantial saving. If you tell me more about your problem, I might be able to offer other advice, e.g., implementing your vector field as a compiled mex file.
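For reference, "vectorized" here means that the vector field accepts the states and parameters as matrices with one column per evaluation point and returns the right-hand side in the same layout. A minimal sketch (the function name and the particular oscillator are only for illustration):

    function y = lin_osc(x, p)
    % Hypothetical vectorized vector field (forced linear oscillator in
    % autonomous form): x is n-by-N, p is q-by-N, and y is returned n-by-N,
    % so all N collocation points are evaluated in a single call.
    x1 = x(1,:);
    x2 = x(2,:);
    x3 = x(3,:);
    k  = p(1,:);

    y = zeros(size(x));
    y(1,:) = x2;
    y(2,:) = -x2 - k.*x1 + cos(x3);
    y(3,:) = 1;
    end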
Best,
Harry
I used parfor in the past and don't expect much gain from using it, say, in efunc.F/DFDX/DFDP, unless one is talking about hundreds or thousands of non-trivial embedded toolbox instances (i.e., collocation segments). The main computation time is typically not spent in the parts where COCO uses explicit loops.
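For what it's worth, a quick experiment along these lines (a stand-in computation, not part of COCO, assuming the Parallel Computing Toolbox is installed) illustrates the point: with only a handful of iterations, the parfor overhead (pool start-up, data transfer) tends to dominate.

    % Stand-in for a loop over a small number of segments; not part of COCO.
    nsegs = 4;
    work  = @(n) max(abs(eig(randn(n))));   % some non-trivial per-segment work
    r1 = zeros(1, nsegs);
    r2 = zeros(1, nsegs);

    tic;
    for k = 1:nsegs
      r1(k) = work(400);
    end
    t_for = toc;

    tic;                        % the first parfor also pays for starting the pool
    parfor k = 1:nsegs
      r2(k) = work(400);
    end
    t_parfor = toc;

    fprintf('for: %.3f s, parfor: %.3f s\n', t_for, t_parfor);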
On the other hand, it should be possible to exploit Matlab's multi-threading features, which seem to have improved a lot with Matlab 2016. Depending on the machine Matlab runs on and on certain settings, a number of functions and operations are multi-threaded, for example mldivide for solving systems of linear equations; see https://se.mathworks.com/help/matlab/math/systems-of-linear-equations.html?searchHighlight=multithreading&s_tid=doc_srchtitle#brs0ft8-1 .
I'm not sure whether the routine we use for sparse systems (umfpack) is multi-threaded, so a multi-threaded solver could even be slower than the default one if the number of cores is small. It might be worth some experiments, e.g., along the lines of the sketch below.
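A minimal sketch of such an experiment (assuming maxNumCompThreads can be used to control the thread count on your installation; the matrix is just a random stand-in for a continuation Jacobian):

    % Time a sparse backslash solve for different numbers of computational threads.
    n = 2e4;
    A = speye(n)*10 + sprandn(n, n, 5e-4);   % random sparse test matrix with boosted diagonal
    b = randn(n, 1);

    for nthreads = [1 2 4]
      old = maxNumCompThreads(nthreads);     % returns the previous setting
      tic;
      x = A \ b;                             % sparse direct solve
      fprintf('%d thread(s): %.3f s\n', nthreads, toc);
      maxNumCompThreads(old);                % restore
    end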
I should mention that, while working on another toolbox, I discovered a number of pathologies in Matlab that our implementation of COCO might be suffering from too. Matlab tends to spend a lot of time on memory management, and its implementation of copy-on-write is not very efficient. It is therefore quite possible that substantial gains will come not from parallelisation, but rather from problem-specific implementations of some core functions.
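As a small illustration of the copy-on-write issue (the function names are made up; only the pattern matters): a single write to an input argument forces Matlab to duplicate the whole array before the function can continue.

    function cow_demo
    % Illustration of copy-on-write: one write to an input argument makes
    % Matlab duplicate the entire array.
    n = 2e7;
    u = randn(n, 1);

    tic; s1 = read_only(u);   t1 = toc;   % no write, the data is shared
    tic; s2 = touch_first(u); t2 = toc;   % one write triggers a full copy of u
    fprintf('read-only: %.3f s, copy-on-write: %.3f s (%g, %g)\n', t1, t2, s1, s2);
    end

    function s = read_only(v)
    s = sum(v);
    end

    function s = touch_first(v)
    v(1) = 0;                             % this single assignment copies all of v
    s = sum(v);
    end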
A useful tool for tracking down memory usage is the profiler's memory option, see http://undocumentedmatlab.com/blog/profiling-matlab-memory-usage . With this setting, time spent in the Matlab core is more accurately attributed to the functions that cause the overhead, which gives you a chance to reduce it. Note that profiling will be substantially slower with memory profiling enabled.
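Roughly, the usage described in that post is something like the following (the -memory flag is undocumented, so the exact behaviour may differ between Matlab versions):

    profile('-memory', 'on');
    % ... run the computation of interest here, e.g. a coco(...) run ...
    profile('viewer');   % the report then includes allocation and peak-memory columns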