Latency-Optimized Parallelization of the FMM Near-Field Computations

We present a new parallelization scheme for the FMM near-field. The parallelization is based on the Global Arrays Toolkit and uses one-sided communication with overlapping. It employs a purely static load-balancing approach to minimize the number of communication steps and benefits from a maximum utilization of data locality.