Virtual Grid Engine: a simulated grid engine environment for large-scale supercomputers

Abstract. Background: Supercomputers have become indispensable infrastructure in science and industry. In particular, most state-of-the-art scientific results rely on massively parallel supercomputers ranked in the TOP500. However, their use is still limited in the bioinformatics field due to the funda...

Bibliographic Details
Main Authors: Satoshi Ito, Masaaki Yadome, Tatsuo Nishiki, Shigeru Ishiduki, Hikaru Inoue, Rui Yamaguchi, Satoru Miyano
Format: Article
Language: English
Published: BMC 2019-12-01
Series: BMC Bioinformatics
Subjects: High performance computing, Grid engine, TOP500, MPI, Python
Online Access: https://doi.org/10.1186/s12859-019-3085-x
author Satoshi Ito
Masaaki Yadome
Tatsuo Nishiki
Shigeru Ishiduki
Hikaru Inoue
Rui Yamaguchi
Satoru Miyano
collection DOAJ
description Abstract. Background: Supercomputers have become indispensable infrastructure in science and industry. In particular, most state-of-the-art scientific results rely on massively parallel supercomputers ranked in the TOP500. However, their use in bioinformatics remains limited because they do not provide the asynchronous parallel processing service of Grid Engine. To encourage the use of massively parallel supercomputers in bioinformatics, we developed middleware called Virtual Grid Engine (VGE), which enables software pipelines to automatically perform their tasks as MPI programs. Results: We conducted basic tests to measure the time VGE requires to assign jobs to workers. The results showed that the overhead of the employed algorithm was 246 microseconds and that the software can manage thousands of jobs smoothly on the K computer. We also ran a practical bioinformatics test comprising two tasks: splitting the input FASTQ data and aligning it with BWA. This calculation used 25,055 nodes (2,000,440 cores) and completed in three hours. Conclusion: We identified four important requirements for this kind of software: a non-privileged server program, multiple job handling, dependency control, and usability. We carefully designed the software against these requirements and verified each of them. The software fulfilled all four and achieved good performance in a large-scale analysis.
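The job assignment and dependency control mentioned in the abstract can be illustrated with a minimal sketch. It is not VGE's code or API: it uses a thread pool from the Python standard library as a stand-in for MPI worker ranks, and the names `run_pipeline`, `jobs`, and `deps` are hypothetical, chosen only for this illustration. A central loop hands each job to a free worker once every job it depends on has finished.

```python
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait


def run_pipeline(jobs, deps, n_workers=4):
    """Run jobs on a worker pool, honouring dependencies (illustrative only).

    jobs: dict mapping a job name to a zero-argument callable.
    deps: dict mapping a job name to the set of job names it must wait for.
    Returns the job names in the order in which they completed.
    """
    remaining = {name: set(deps.get(name, ())) for name in jobs}
    unknown = {d for ds in remaining.values() for d in ds} - set(jobs)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")

    finished = []
    running = {}  # future -> job name
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        while remaining or running:
            # Dispatch every job whose dependencies have all completed.
            done_names = set(finished)
            ready = [n for n, d in remaining.items() if d <= done_names]
            for name in ready:
                del remaining[name]
                running[pool.submit(jobs[name])] = name
            if not running:
                # Jobs are left, none is ready, and nothing is in flight.
                raise RuntimeError("dependency cycle detected")
            # Block until at least one worker reports back.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                fut.result()  # re-raise any exception from the job
                finished.append(running.pop(fut))
    return finished


if __name__ == "__main__":
    # Hypothetical two-stage pipeline: one split job, then per-chunk alignment.
    jobs = {
        "split": lambda: None,
        "align_0": lambda: None,
        "align_1": lambda: None,
    }
    deps = {"align_0": {"split"}, "align_1": {"split"}}
    print(run_pipeline(jobs, deps, n_workers=2))
```

In this sketch the alignment jobs are never submitted before the split job completes, which mirrors the split-then-align structure of the practical test described in the abstract. A real MPI-based scheduler would replace the thread pool with message passing between a master rank and worker ranks.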
first_indexed 2024-12-20T03:35:26Z
format Article
id doaj.art-f7b921edb3e44669a8dc94b0fc3d96f1
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-12-20T03:35:26Z
publishDate 2019-12-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-f7b921edb3e44669a8dc94b0fc3d96f1
2022-12-21T19:54:54Z
eng
BMC
BMC Bioinformatics
1471-2105
2019-12-01
Volume 20, issue S16, pages 1-10
10.1186/s12859-019-3085-x
Virtual Grid Engine: a simulated grid engine environment for large-scale supercomputers
Satoshi Ito (The Institute of Medical Science, The University of Tokyo)
Masaaki Yadome (The Institute of Medical Science, The University of Tokyo)
Tatsuo Nishiki (Frontier Computing Center, Fujitsu Limited)
Shigeru Ishiduki (Frontier Computing Center, Fujitsu Limited)
Hikaru Inoue (Frontier Computing Center, Fujitsu Limited)
Rui Yamaguchi (The Institute of Medical Science, The University of Tokyo)
Satoru Miyano (The Institute of Medical Science, The University of Tokyo)
https://doi.org/10.1186/s12859-019-3085-x
High performance computing
Grid engine
TOP500
MPI
Python
title Virtual Grid Engine: a simulated grid engine environment for large-scale supercomputers
topic High performance computing
Grid engine
TOP500
MPI
Python
url https://doi.org/10.1186/s12859-019-3085-x