MDP and Reinforcement Learning for puzzles

This was a final-year project, carried out in a group of two during my Bachelor's degree in Computer Science at the Chinese University of Hong Kong, researching several artificial intelligence algorithms.

In our project, we aimed to find a game strategy for a game called “1010!”. In the first part, we applied a Markov Decision Process (MDP) to a smaller-scale version of the game to find an optimal strategy. In the second part, we applied reinforcement learning. After studying and comparing different reinforcement learning approaches, such as Monte Carlo (MC), Temporal-Difference (TD) and neural networks, we decided to apply the Least-Squares Policy Iteration (LSPI) algorithm to 1010! so that we could scale up to the original game size.
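To give a rough idea of how LSPI works, here is a minimal sketch on a toy problem. It is not the project's code: the five-state chain, the `step` dynamics, the one-hot features in `phi` and the small `ridge` term are all assumptions made for illustration, and the real 1010! board would need hand-designed features instead. The sketch alternates between fitting a linear Q-function to the current greedy policy (the LSTDQ step) and acting greedily on the new weights, until the weights stop changing.

```python
import numpy as np

# Toy chain used only for illustration (an assumption, not the 1010! board).
N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9


def step(s, a):
    """Hypothetical dynamics: action 0 moves left, action 1 moves right."""
    s_next = min(max(s + (1 if a == 1 else -1), 0), N_STATES - 1)
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    return s_next, reward


def phi(s, a):
    """One-hot feature vector for the state-action pair (s, a)."""
    f = np.zeros(N_STATES * N_ACTIONS)
    f[s * N_ACTIONS + a] = 1.0
    return f


def greedy(w, s):
    """Action with the highest estimated Q-value phi(s, a) . w."""
    return int(np.argmax([phi(s, a) @ w for a in range(N_ACTIONS)]))


def lstdq(samples, w, ridge=1e-6):
    """Least-squares fit of Q for the policy that is greedy w.r.t. w."""
    k = N_STATES * N_ACTIONS
    A = ridge * np.eye(k)  # small ridge term keeps A well conditioned
    b = np.zeros(k)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, greedy(w, s_next))
        A += np.outer(f, f - GAMMA * f_next)
        b += r * f
    return np.linalg.solve(A, b)


def lspi(samples, max_iter=20, tol=1e-6):
    """Repeat LSTDQ until the weight vector stops changing."""
    w = np.zeros(N_STATES * N_ACTIONS)
    for _ in range(max_iter):
        w_new = lstdq(samples, w)
        if np.linalg.norm(w_new - w) < tol:
            return w_new
        w = w_new
    return w


# One sample (s, a, r, s') for every state-action pair of the toy chain.
samples = []
for s in range(N_STATES):
    for a in range(N_ACTIONS):
        s_next, r = step(s, a)
        samples.append((s, a, r, s_next))

w = lspi(samples)
policy = [greedy(w, s) for s in range(N_STATES)]
print(policy)
```

On this toy chain the learned policy should pick "right" (action 1) in every state, since the only reward is at the right end. The appeal of LSPI is that the same set of samples is reused in every iteration, and the cost depends on the number of features rather than the number of states.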

If you would like to find out why we chose LSPI, how it works, its performance, advantages and limitations, or more about the project in general, feel free to check out the document below.

markov-decision-processes-and-reinforcement-learning-for-puzzles

The LSPI algorithm was implemented in Python, and the game AI was demonstrated in an Objective-C app built with Xcode. Here is a demo of the solution we obtained for playing 1010!:

MDP and Reinforcement Learning for puzzles
