A Comparison of Reinforcement Learning and Expectiminimax Implementations for EinStein würfelt nicht!
Date
2019
Authors
Abstract
EinStein würfelt nicht! was invented in 2004 by Ingo Althöfer, a German mathematics professor. It is a probabilistic game driven by dice rolls: the roll largely determines which moves are available. A central question is therefore how to go from passively having one's moves dictated by the die to actively shaping the distribution of one's pieces so as to control the probabilities the die imposes.
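To illustrate how the piece distribution controls the die's influence, the following is a minimal Python sketch (not taken from the thesis) of the usual EWN rule: the rolled piece must be moved if it is still on the board; otherwise the player may choose between the nearest lower- and nearest higher-numbered surviving pieces. With fewer pieces left, each remaining piece becomes movable on more rolls.

def movable_pieces(remaining: set[int], roll: int) -> set[int]:
    """Pieces the player may move after rolling `roll`.

    If the rolled piece is still on the board it must be moved;
    otherwise the player chooses between the nearest lower- and
    nearest higher-numbered pieces that remain.
    """
    if roll in remaining:
        return {roll}
    lower = [p for p in remaining if p < roll]
    higher = [p for p in remaining if p > roll]
    choices = set()
    if lower:
        choices.add(max(lower))
    if higher:
        choices.add(min(higher))
    return choices


def move_probability(remaining: set[int]) -> dict[int, float]:
    """Probability that each remaining piece is movable on the next roll."""
    counts = {p: 0 for p in remaining}
    for roll in range(1, 7):
        for p in movable_pieces(remaining, roll):
            counts[p] += 1
    return {p: c / 6 for p, c in counts.items()}

For example, with pieces {1, 4, 6} remaining, piece 4 can be chosen on four of the six possible rolls, which is exactly the kind of distribution a player may try to engineer.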
This study compares two different architectures for playing the game. The first is a reinforcement learning framework that generates game records through self-play and uses them to train a neural network, improving the accuracy of Monte Carlo tree node expansion and of board evaluation. The second is the classical expectiminimax algorithm, combined with various pruning techniques to reduce search time and with leaf nodes scored by a human-designed heuristic. The two implementations are matched against each other on an automated game-playing platform. In addition, the processor clock frequency is raised to increase the amount of computation both programs can perform per unit time, which also shortens the time the neural network needs to generate game records.
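To make the expectiminimax side concrete, here is a minimal, hypothetical Python sketch of a depth-limited expectiminimax over chance nodes with a heuristic leaf evaluation. The actual program described in this thesis is written in Crystal and additionally applies alpha-beta-style and chance-node pruning, which are omitted here; the state interface (is_terminal, evaluate, roll_outcomes, legal_moves, apply) is assumed purely for illustration.

import math


def expectiminimax(state, depth, maximizing):
    """Depth-limited expectiminimax for a dice game such as EWN.

    `state` is a hypothetical interface (is_terminal, evaluate,
    roll_outcomes, legal_moves, apply); the thesis implementation is
    written in Crystal and also prunes, which this sketch omits.
    """
    if depth == 0 or state.is_terminal():
        return state.evaluate()  # heuristic board score at the leaf

    expected = 0.0
    # Chance node: the die is rolled before the player moves, so the
    # state value is the probability-weighted average over all rolls.
    for roll, prob in state.roll_outcomes():  # e.g. rolls 1..6, each 1/6
        best = -math.inf if maximizing else math.inf
        # EWN always offers at least one legal move per roll.
        for move in state.legal_moves(roll):
            value = expectiminimax(state.apply(move), depth - 1, not maximizing)
            best = max(best, value) if maximizing else min(best, value)
        expected += prob * best
    return expected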
The reinforcement learning framework is adapted from an open-source project written in the interpreted Python language, while the expectiminimax program was developed from scratch in the Crystal language, which compiles to native x86 machine code. The comparison can thus be seen as a contest between new and old techniques: does a classical algorithm, implemented with a modern tool, still stand a fighting chance? We hope that future hardware upgrades will leave room for both approaches to improve.
EinStein würfelt nicht! was invented in 2004 by Ingo Althöfer, a German professor. In the game, a die is rolled to decide which pieces may be moved, so strategy revolves around probability: rather than being constrained by the die, a player can focus on choosing moves that shape the odds in their favour. This research compares two implementations of EinStein würfelt nicht!. One is a reinforcement learning framework that uses self-play to generate training data for improving its neural networks. The other is an expectiminimax implementation that scores positions with a heuristic and performs alpha-beta pruning together with further pruning at chance nodes. We introduce a platform that plays games automatically and records the results, and we improve the performance of both programs by overclocking the processor. The reinforcement learning framework is based on an open-source project in Python; the expectiminimax implementation is developed in the Crystal language and compiled to native machine code for execution. By comparing the two implementations, we can assess the pros and cons of each.
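For readers unfamiliar with the reinforcement learning side, the following hypothetical Python sketch shows the general shape of a self-play data-generation loop: MCTS visit counts provide the policy targets and the final game result provides the value target for every recorded position. The `mcts` and `game` objects and their methods are placeholders for illustration, not the actual API of the open-source project used in the thesis.

import random


def self_play_game(mcts, game, temperature=1.0):
    """Play one self-play game and return (state, policy, outcome) samples.

    `mcts` and `game` are hypothetical stand-ins for the framework's
    search and game objects; the policy comes from MCTS visit counts
    and the value target is the result from each player's point of view.
    """
    samples = []  # (board state, move probabilities, player to move)
    state = game.initial_state()
    while not game.is_terminal(state):
        probs = mcts.search_probabilities(state, temperature)  # visit-count policy
        samples.append((state, probs, game.player_to_move(state)))
        move = random.choices(list(probs.keys()), weights=list(probs.values()))[0]
        state = game.apply(state, move)

    winner = game.winner(state)
    # Label every recorded position with the outcome from its player's perspective.
    return [(s, p, 1.0 if player == winner else -1.0) for s, p, player in samples]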
Description
Keywords
EinStein würfelt nicht!, computer games, reinforcement learning, overclocking