Virtual Grid Engine: a simulated grid engine environment for large-scale supercomputers
Abstract. Background: Supercomputers have become indispensable infrastructure in science and industry. In particular, most state-of-the-art scientific results rely on massively parallel supercomputers ranked in the TOP500. However, their use is still limited in the bioinformatics field due to the funda...
Main Authors: | Satoshi Ito, Masaaki Yadome, Tatsuo Nishiki, Shigeru Ishiduki, Hikaru Inoue, Rui Yamaguchi, Satoru Miyano |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2019-12-01 |
Series: | BMC Bioinformatics |
Subjects: | High performance computing, Grid engine, TOP500, MPI, Python |
Online Access: | https://doi.org/10.1186/s12859-019-3085-x |
_version_ | 1818928848394256384 |
---|---|
author | Satoshi Ito, Masaaki Yadome, Tatsuo Nishiki, Shigeru Ishiduki, Hikaru Inoue, Rui Yamaguchi, Satoru Miyano |
author_sort | Satoshi Ito |
collection | DOAJ |
description | Abstract. Background: Supercomputers have become indispensable infrastructure in science and industry. In particular, most state-of-the-art scientific results rely on massively parallel supercomputers ranked in the TOP500. However, their use in bioinformatics remains limited because they do not provide the asynchronous parallel processing service of Grid Engine. To encourage the use of massively parallel supercomputers in bioinformatics, we developed middleware called Virtual Grid Engine (VGE), which enables software pipelines to automatically perform their tasks as MPI programs. Results: We conducted basic tests to measure the time VGE requires to assign jobs to workers. The results showed that the overhead of the employed algorithm was 246 microseconds and that our software can manage thousands of jobs smoothly on the K computer. We also ran a practical bioinformatics test consisting of two tasks: splitting the input FASTQ data and aligning it with BWA. The calculation used 25,055 nodes (200,440 cores) and completed in three hours. Conclusion: We identified four important requirements for this kind of software: a non-privileged server program, multiple job handling, dependency control, and usability. We carefully designed the software around these requirements and verified each of them. The software fulfilled all of them and achieved good performance in a large-scale analysis. |
first_indexed | 2024-12-20T03:35:26Z |
format | Article |
id | doaj.art-f7b921edb3e44669a8dc94b0fc3d96f1 |
institution | Directory of Open Access Journals |
issn | 1471-2105 |
language | English |
last_indexed | 2024-12-20T03:35:26Z |
publishDate | 2019-12-01 |
publisher | BMC |
record_format | Article |
series | BMC Bioinformatics |
spelling | doaj.art-f7b921edb3e44669a8dc94b0fc3d96f1; 2022-12-21T19:54:54Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2019-12-01; 20S16110; 10.1186/s12859-019-3085-x; Virtual Grid Engine: a simulated grid engine environment for large-scale supercomputers; Satoshi Ito (The Institute of Medical Science, The University of Tokyo); Masaaki Yadome (The Institute of Medical Science, The University of Tokyo); Tatsuo Nishiki (Frontier Computing Center, Fujitsu Limited); Shigeru Ishiduki (Frontier Computing Center, Fujitsu Limited); Hikaru Inoue (Frontier Computing Center, Fujitsu Limited); Rui Yamaguchi (The Institute of Medical Science, The University of Tokyo); Satoru Miyano (The Institute of Medical Science, The University of Tokyo); https://doi.org/10.1186/s12859-019-3085-x; High performance computing; Grid engine; TOP500; MPI; Python |
title | Virtual Grid Engine: a simulated grid engine environment for large-scale supercomputers |
topic | High performance computing, Grid engine, TOP500, MPI, Python |
url | https://doi.org/10.1186/s12859-019-3085-x |