Research and development of inefficiency patterns in MPI, UPC applications

Bibliographic details
Main authors: M. S. Akopyan, N. E. Andreev
Format: Article
Language: English
Published: Ivannikov Institute for System Programming of the Russian Academy of Sciences 2018-10-01
Series: Труды Института системного программирования РАН
Online access: https://ispranproceedings.elpub.ru/jour/article/view/950
Description
Summary: Most tools developed for analyzing parallel programming libraries (MPI, OpenMP) and languages take a low-level approach to analyzing the performance of parallel applications. Many profiling tools and trace visualizers produce tables and graphs with various statistics about the executed program. In most cases the developer has to search these statistics and graphs manually for bottlenecks and opportunities for performance improvement. The amount of information the developer has to handle manually increases dramatically with the number of cores, the number of processes, and the problem size of the application. New methods of performance analysis that process this output fully or partially automatically are therefore more beneficial. To apply the same analysis tool to different parallel paradigms (MPI applications, UPC programs), paradigm-specific inefficiency patterns have been developed. This paper discusses code patterns that result in performance penalties. It considers patterns in parallel MPI applications for distributed-memory parallel computing systems and in parallel UPC programs for partitioned global address space (PGAS) systems. A method for automatic detection of inefficiency patterns in parallel MPI applications and UPC programs is proposed. It reduces the tuning time of parallel applications.
ISSN: 2079-8156
2220-6426