Summary: | Resampling is a well-known statistical algorithm that is commonly applied in the context of Particle Filters (PFs) in order to perform state estimation for non-linear non-Gaussian dynamic models. As the models become more complex and accurate, the run-time of PF applications becomes increasingly slow. Parallel computing can help to address this. However, resampling (and, hence, PFs as well) necessarily involves a bottleneck, the redistribution step, which is notoriously challenging to parallelize if using textbook parallel computing techniques. A state-of-the-art redistribution takes <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>O</mi><mo>(</mo><msup><mrow><mo>(</mo><msub><mo form="prefix">log</mo><mn>2</mn></msub><mi>N</mi><mo>)</mo></mrow><mn>2</mn></msup><mo>)</mo></mrow></semantics></math></inline-formula> computations on Distributed Memory (DM) architectures, which most supercomputers adopt, whereas redistribution can be performed in <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>O</mi><mo>(</mo><msub><mo form="prefix">log</mo><mn>2</mn></msub><mi>N</mi><mo>)</mo></mrow></semantics></math></inline-formula> on Shared Memory (SM) architectures, such as GPU or mainstream CPUs. In this paper, we propose a novel parallel redistribution for DM that achieves an <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>O</mi><mo>(</mo><msub><mo form="prefix">log</mo><mn>2</mn></msub><mi>N</mi><mo>)</mo></mrow></semantics></math></inline-formula> time complexity. We also present empirical results that indicate that our novel approach outperforms the <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>O</mi><mo>(</mo><msup><mrow><mo>(</mo><msub><mo form="prefix">log</mo><mn>2</mn></msub><mi>N</mi><mo>)</mo></mrow><mn>2</mn></msup><mo>)</mo></mrow></semantics></math></inline-formula> approach.
|