Implementing Dots and Boxes on the AlphaZero General and MuZero General Frameworks
Date
2023
Abstract
Dots and Boxes is a two-player, impartial, zero-sum, perfect-information game whose complexity is high even on small boards. This thesis takes Dots and Boxes on a 3×3 board as its subject and implements it on the AlphaGo Zero and MuZero frameworks. It also proposes an Exact-win strategy adapted to the consecutive-move rule, implements it for Dots and Boxes, and applies it to both training and play within the AlphaGo Zero framework.
The implementation uses two open-source codebases, AlphaZero General and MuZero General, which are based on the AlphaGo Zero and MuZero papers respectively. Both are easy-to-understand Python projects whose simple structure makes it straightforward to implement and train a game on the AlphaGo Zero and MuZero architectures. This saves the effort of building either framework from scratch and leaves more attention for the related research questions.
In the experiments, the AlphaZero General, Exact-win, and MuZero General agents achieved win rates of 98%, 100%, and 32%, respectively, in games against a solver program. The results also show that applying the Exact-win strategy during training improves training speed and effectiveness, as well as the stability of the agent's playing strength in the later stages of training. Tests on selected board positions confirm that the agents can find and play the best move in those positions.
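The consecutive-move rule mentioned in the abstract (a player who completes a box moves again) can be illustrated with a minimal sketch. This is not the thesis code and does not follow the AlphaZero General or MuZero General interfaces; the class name `DotsAndBoxes`, its methods, and the edge encoding are hypothetical, and the sketch assumes that "3×3" counts boxes, which gives 24 edges.

```python
class DotsAndBoxes:
    """Hypothetical minimal Dots and Boxes state (illustration only)."""

    def __init__(self, n=3):
        self.n = n  # assumption: n counts boxes per side, not dots
        # h[r][c]: horizontal edge above box row r (r = n is the bottom border)
        self.h = [[False] * n for _ in range(n + 1)]
        # v[r][c]: vertical edge left of box column c (c = n is the right border)
        self.v = [[False] * (n + 1) for _ in range(n)]
        self.score = [0, 0]
        self.player = 0  # index of the player to move

    def box_complete(self, r, c):
        return (self.h[r][c] and self.h[r + 1][c]
                and self.v[r][c] and self.v[r][c + 1])

    def legal_moves(self):
        moves = [("h", r, c) for r in range(self.n + 1)
                 for c in range(self.n) if not self.h[r][c]]
        moves += [("v", r, c) for r in range(self.n)
                  for c in range(self.n + 1) if not self.v[r][c]]
        return moves

    def play(self, kind, r, c):
        """Draw one edge and return the number of boxes it completed."""
        grid = self.h if kind == "h" else self.v
        if grid[r][c]:
            raise ValueError("edge already drawn")
        grid[r][c] = True
        # Only the (at most two) boxes touching this edge can be completed.
        if kind == "h":
            adjacent = [(r - 1, c), (r, c)]
        else:
            adjacent = [(r, c - 1), (r, c)]
        gained = sum(
            self.box_complete(br, bc)
            for br, bc in adjacent
            if 0 <= br < self.n and 0 <= bc < self.n
        )
        self.score[self.player] += gained
        # Consecutive-move rule: the turn passes only if no box was completed.
        if gained == 0:
            self.player = 1 - self.player
        return gained

    def is_over(self):
        return sum(self.score) == self.n * self.n
```

Because the side to move does not always alternate, any search or training loop built on such a state would need to read `player` after each move rather than assume strict alternation.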
Keywords
Dots and Boxes (點格棋), AlphaGo Zero, AlphaZero, MuZero