Running resilient MPI applications on a Dynamic Group of Recommended Processes

Abstract: High-performance computing systems run applications that can take several hours to execute and have to deal with the occurrence of a potentially large number of faults. Most of the existing fault-tolerant strategies for these systems assume crash faults that are permanent events are easily...


Bibliographic Details
Main Authors: Edson Tavares de Camargo, Elias P. Duarte
Format: Article
Language: English
Published: Sociedade Brasileira de Computação, 2018-03-01
Series: Journal of the Brazilian Computer Society
Online Access: http://link.springer.com/article/10.1186/s13173-018-0069-z