Execution model
When Bramble is compiled with the CUDA module, GPU acceleration can be used to speed up execution. This is especially beneficial when performing a similarity analysis. Bramble supports multi-GPU setups, so several GPUs can be used when more than one is available.
Warning
Bramble requires a GPU with at least 8 GB of memory. Prior to execution,
Bramble checks whether the GPU supports the calculation and throws an error
if it does not. You can also check the memory available on your GPU by
running bramblecuda.
When performing the similarity analysis, an inventory of all the jobs is made.
N+1 OpenMP threads are then spawned, where N equals the number of GPUs.
Each GPU is assigned a CPU thread, and jobs are relayed to the GPU via that CPU
thread. The remaining OpenMP thread employs so-called nested parallelism and
opens another OpenMP parallel region which uses all CPUs.
This implies that the N CPU threads which are involved in managing the GPUs
are also used for other parts of the calculation. Since the computational load
of managing the GPUs is relatively minimal, this does not come at a huge cost
to performance. In fact, leaving these CPUs idle is worse than also using them
part of the time to manage the GPUs.
When no GPUs are available, Bramble does not use nested parallelism. It simply executes all jobs concurrently, with each job using OpenMP parallelism on a per-job basis.
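The thread layout for the GPU case can be illustrated with a small Python sketch. It is not Bramble's implementation: Bramble uses OpenMP threads, whereas the sketch uses Python's `threading` module, and it assumes that all workers pull jobs from one shared queue. The names `run_jobs` and `worker` are hypothetical, and the actual work is replaced by a placeholder that only records which worker handled each job.

```python
import queue
import threading


def run_jobs(jobs, num_gpus):
    """Distribute jobs over num_gpus GPU manager threads plus one CPU thread.

    Returns a dict mapping each job to the label of the worker that ran it.
    """
    # The "inventory": every job is placed in a shared queue up front.
    pending = queue.Queue()
    for job in jobs:
        pending.put(job)

    handled = {}
    lock = threading.Lock()

    def worker(label):
        # Keep pulling jobs until the shared inventory is empty.
        while True:
            try:
                job = pending.get_nowait()
            except queue.Empty:
                return
            # Placeholder for the real work. A GPU manager thread would relay
            # the job to its device; the CPU thread would open a nested
            # parallel region that uses all CPUs.
            with lock:
                handled[job] = label

    # N threads manage the GPUs; one extra thread drives the CPU-only path.
    labels = [f"gpu{i}" for i in range(num_gpus)] + ["cpu"]
    threads = [threading.Thread(target=worker, args=(label,)) for label in labels]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handled
```

Calling `run_jobs(range(20), num_gpus=2)` starts three workers (`gpu0`, `gpu1`, and `cpu`). Which worker picks up which job depends on thread scheduling, but every job is handled exactly once.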
Memory load
Performing calculations is quite memory intensive. As a rule of thumb, one needs roughly 8 GB of memory per execution thread. For example, with two GPUs there are three execution threads (one per GPU plus the remaining CPU thread), so one needs roughly 24 GB of memory. If memory is limited, one option is to use swap space, although this comes at a great cost to performance. Nevertheless, it might still be beneficial.
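The rule of thumb can be turned into a quick estimate. The sketch below assumes that the number of execution threads equals the N+1 OpenMP threads described under the execution model; the function name `estimated_memory_gb` is hypothetical and not part of Bramble.

```python
GB_PER_THREAD = 8  # rule-of-thumb figure from the text, not a measured value


def estimated_memory_gb(num_gpus):
    """Rough host memory estimate in GB for a run using num_gpus GPUs."""
    # One execution thread per GPU plus one thread for the CPU-only path.
    execution_threads = num_gpus + 1
    return execution_threads * GB_PER_THREAD
```

For two GPUs, `estimated_memory_gb(2)` returns 24, matching the example above.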
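Assuming the user has root privileges, one can use the following instructions to create and enable a swap file (8 GB in this example) and so increase the amount of swap memory:
sudo fallocate -l 8G /swapfile   # allocate the file that will back the swap space
sudo chmod 600 /swapfile         # restrict access to root before formatting
sudo mkswap /swapfile            # format the file as swap
sudo swapon /swapfile            # enable the swap file
sudo swapon --show               # list the active swap areas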
Typical output looks like:
NAME      TYPE      SIZE USED   PRIO
/dev/sdb3 partition 976M 976M   -2
/swapfile file      8G   196.3M -3