Large-scale simulations of many of today’s interesting problems in plasma physics, including plasma-based acceleration and magnetic reconnection with pair production due to quantum-electrodynamic effects, suffer from severe load imbalance. At the same time, contemporary particle-in-cell codes must be designed to make efficient use of a new fleet of exascale computing architectures. We discuss the implementation of dynamic load balancing, in which the simulation space is divided into many small, self-contained regions or “tiles”, together with shared-memory (OpenMP) parallelism both across many tiles and within single tiles. The load balancing algorithm can be used with three different topologies, two of which are space-filling curves. We show low overhead and improved scaling with the number of OpenMP threads, both for simulations with uniform load and for those with severe load imbalance. We demonstrate speedups on the order of a factor of 4 with only 2,000 processing elements, with greater speedups expected for larger simulations.
Author: Roman Lee
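
As an illustration of the tile-based load balancing summarized above, the following is a minimal sketch rather than the implementation discussed in this work. It assumes a two-dimensional grid of tiles, a per-tile cost estimate such as the particle count, and a Morton (Z-order) curve as the space-filling curve; the abstract does not name the curves used, so that choice is purely illustrative. The names `morton_index` and `partition_tiles` are hypothetical.

```python
def morton_index(ix, iy, bits=16):
    """Interleave the bits of (ix, iy) to obtain a Z-order (Morton) index.

    Assumption: Morton ordering stands in for whichever space-filling
    curve the actual code uses.
    """
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (2 * b)
        code |= ((iy >> b) & 1) << (2 * b + 1)
    return code


def partition_tiles(loads, n_ranks):
    """Assign tiles to ranks as contiguous runs along the curve.

    loads: dict mapping tile coordinates (ix, iy) to an estimated cost
           (hypothetical cost model, e.g. number of particles in the tile).
    Returns a dict mapping (ix, iy) to the owning rank.
    """
    # Order tiles along the curve so that each rank owns a compact region.
    order = sorted(loads, key=lambda tile: morton_index(*tile))
    total = sum(loads[tile] for tile in order)
    owner = {}
    running = 0.0
    for tile in order:
        if total > 0:
            # The midpoint of the tile's interval in the cumulative load
            # decides which equal-load slice (rank) the tile falls into.
            mid = running + 0.5 * loads[tile]
            rank = min(int(mid * n_ranks / total), n_ranks - 1)
        else:
            rank = 0
        owner[tile] = rank
        running += loads[tile]
    return owner
```

In a dynamic scheme, a partition of this kind would be recomputed periodically as particles move between tiles, and tiles whose owner changes would be migrated between ranks. The cost model and the rebalancing interval are left open here.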