Quoridor is a deceptively simple board game: on each turn you either move your pawn toward the far side or drop a wall to slow your opponent down. The branching factor is small enough to feel approachable and large enough that hand-written heuristics fall apart fast. That made it a good excuse to build a proper environment for training agents instead of just one-off scripts.

Train, watch, compare

The studio is really three things stitched together: a training backend in Python/PyTorch, a fast game core, and a React interface for running matches and watching them play out move by move. The point was to shorten the loop between "try an idea" and "see whether it's any better than the last one."

MCTS plus a policy network

The agents pair Monte Carlo Tree Search (MCTS) with a CNN policy network: MCTS does the lookahead, and the network biases the search toward moves worth exploring. Walls are what make this interesting. A single wall can reroute both players' shortest paths across the board, so the network has to learn positional ideas, not just "run toward the goal."
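One common way to let a policy network bias tree search is the PUCT selection rule used in AlphaZero-style agents. The sketch below illustrates that general idea and is not necessarily what the studio does: the `Node` class, the `c_puct` constant, and the move names are all hypothetical, and values are assumed to be stored from the point of view of the player choosing the move.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One edge/state in the search tree (hypothetical structure)."""
    prior: float                # probability the policy network gave this move
    visits: int = 0             # how many simulations passed through here
    value_sum: float = 0.0      # sum of backed-up results, from the selecting player's view
    children: dict = field(default_factory=dict)  # move label -> Node

    def mean_value(self) -> float:
        # Unvisited nodes default to 0.0 so the prior term decides early on.
        return self.value_sum / self.visits if self.visits else 0.0


def puct_score(parent: Node, child: Node, c_puct: float = 1.5) -> float:
    # Exploration bonus: large for high-prior, rarely visited moves,
    # shrinking as the child accumulates visits.
    exploration = c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return child.mean_value() + exploration


def select_child(parent: Node, c_puct: float = 1.5):
    """Return the (move, node) pair with the highest PUCT score."""
    return max(
        parent.children.items(),
        key=lambda item: puct_score(parent, item[1], c_puct),
    )


# Usage: two candidate moves, one favoured by the (assumed) network prior.
root = Node(prior=1.0, visits=4)
root.children["advance"] = Node(prior=0.7)
root.children["wall"] = Node(prior=0.3)
first_move, _ = select_child(root)
```

With no visits yet, the higher-prior move is searched first. Once its observed results turn out poor, the lower-prior move overtakes it, which is the sense in which the network only biases the search and does not replace it.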

A/B testing as a first-class feature

The thing I leaned on most was built-in A/B testing — pit two checkpoints against each other over a batch of games and get back a win rate instead of a vibe. It's easy to convince yourself a change helped when you're watching a handful of matches; making the comparison automatic kept me honest about what was actually an improvement.
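To make "a win rate instead of a vibe" concrete, here is a minimal sketch of that kind of comparison. It is not the studio's actual code: `play_game` is a hypothetical callable standing in for a full match between two checkpoints, every game is assumed to end with a winner, and the Wilson score interval is my choice for showing how much a win rate from a finite batch can be trusted.

```python
import math
from typing import Callable, Tuple


def wilson_interval(wins: int, games: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a win rate (z = 1.96 gives roughly 95%)."""
    if games == 0:
        return (0.0, 1.0)
    p = wins / games
    denom = 1 + z * z / games
    centre = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return (centre - half, centre + half)


def ab_test(play_game: Callable[[bool], bool], games: int = 200):
    """Run a batch of games and summarise checkpoint A's results.

    play_game(a_moves_first) is assumed to return True when A wins.
    Sides alternate every game so neither checkpoint always moves first.
    """
    wins = sum(1 for i in range(games) if play_game(i % 2 == 0))
    low, high = wilson_interval(wins, games)
    return wins / games, low, high


# Usage with a stub in which whoever moves first always wins:
rate, low, high = ab_test(lambda a_moves_first: a_moves_first, games=200)
```

If the interval still straddles 0.5, the batch has not shown that either checkpoint is better, however convincing a handful of watched matches looked.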

This is a quick note rather than a full writeup — I'll do a proper one when I revisit the project.