Efficiently Mastering the Game of NoGo with Deep Reinforcement Learning Supported by Domain Knowledge

Computer games have been regarded as an important field of artificial intelligence (AI) for a long time. The AlphaZero structure has been successful in the game of Go, beating the top professional human players and becoming the baseline method in computer games. However, the AlphaZero training proce...


Bibliographic Details
Main Authors: Yifan Gao, Lezhou Wu
Format: Article
Language: English
Published: MDPI AG 2021-06-01
Series: Electronics
Subjects: artificial intelligence; deep learning; AlphaZero; NoGo games; reinforcement learning
Online Access: https://www.mdpi.com/2079-9292/10/13/1533
author Yifan Gao
Lezhou Wu
collection DOAJ
description Computer games have been regarded as an important field of artificial intelligence (AI) for a long time. The AlphaZero structure has been successful in the game of Go, beating the top professional human players and becoming the baseline method in computer games. However, the AlphaZero training process requires tremendous computing resources, imposing additional difficulties for the AlphaZero-based AI. In this paper, we propose NoGoZero+ to improve the AlphaZero process and apply it to a game similar to Go, NoGo. NoGoZero+ employs several innovative features to improve training speed and performance, and most improvement strategies can be transferred to other nonspecific areas. This paper compares it with the original AlphaZero process, and results show that NoGoZero+ increases the training speed to about six times that of the original AlphaZero process. Moreover, in the experiment, our agent beat the original AlphaZero agent with a score of 81:19 after only being trained by 20,000 self-play games’ data (small in quantity compared with 120,000 self-play games’ data consumed by the original AlphaZero). The NoGo game program based on NoGoZero+ was the runner-up in the 2020 China Computer Game Championship (CCGC) with limited resources, defeating many AlphaZero-based programs. Our code, pretrained models, and self-play datasets are publicly available. The ultimate goal of this paper is to provide exploratory insights and mature auxiliary tools to enable AI researchers and computer-game communities to study, test, and improve these promising state-of-the-art methods at a much lower cost of computing resources.
first_indexed 2024-03-10T10:05:36Z
format Article
id doaj.art-7376380152294e7db3bcea8413ecddca
institution Directory Open Access Journal
issn 2079-9292
language English
last_indexed 2024-03-10T10:05:36Z
publishDate 2021-06-01
publisher MDPI AG
record_format Article
series Electronics
spelling doaj.art-7376380152294e7db3bcea8413ecddca; 2023-11-22T01:35:46Z; eng; MDPI AG; Electronics; ISSN 2079-9292; 2021-06-01; vol. 10, no. 13, art. 1533; doi:10.3390/electronics10131533
Yifan Gao (College of Medicine and Biological Information Engineering, Northeastern University, Liaoning 110819, China)
Lezhou Wu (College of Information Science and Engineering, Northeastern University, Liaoning 110819, China)
title Efficiently Mastering the Game of NoGo with Deep Reinforcement Learning Supported by Domain Knowledge
topic artificial intelligence
deep learning
AlphaZero
NoGo games
reinforcement learning
url https://www.mdpi.com/2079-9292/10/13/1533