radhapawar/Connect4-AI-Engine

 ██████╗ ██████╗ ███╗  ██╗███╗  ██╗███████╗ ██████╗████████╗    ██╗  ██╗     █████╗ ██╗
██╔════╝██╔═══██╗████╗ ██║████╗ ██║██╔════╝██╔════╝╚══██╔══╝    ██║  ██║    ██╔══██╗██║
██║     ██║   ██║██╔██╗██║██╔██╗██║█████╗  ██║        ██║       ███████║    ███████║██║
██║     ██║   ██║██║╚████║██║╚████║██╔══╝  ██║        ██║       ╚════██║    ██╔══██║██║
╚██████╗╚██████╔╝██║ ╚███║██║ ╚███║███████╗╚██████╗   ██║            ██║    ██║  ██║██║
 ╚═════╝ ╚═════╝ ╚═╝  ╚══╝╚═╝  ╚══╝╚══════╝ ╚═════╝   ╚═╝           ╚═╝    ╚═╝  ╚═╝╚═╝

Deep Learning Game Agent · MCTS Self-Play · CNN vs Transformer · Production on AWS




Two neural networks trained on 400,000+ board positions learn to think like grandmasters — one scans local patterns, one reads the whole board. Both deployed live on AWS, serving real-time predictions in under 50ms.


🎮 Play the Bot  ·  📓 Training Notebook  ·  🔧 Backend  ·  📊 Results


🧠 What Is This?

This project implements the MCTS self-play → supervised distillation → neural network inference pipeline — the same paradigm behind DeepMind's AlphaZero — applied to Connect 4 end-to-end.

Instead of searching the game tree at inference time (slow), we train a neural network to instantly replicate the decisions of a strong Monte Carlo Tree Search player. The result: a deep learning agent that plays a strong game with a single forward pass (< 50ms).

Two architectures are built, trained, and rigorously compared:

| | CNN | Vision Transformer |
|---|---|---|
| Approach | Scans local 3×3 regions for patterns | Reads all 42 cells simultaneously via attention |
| Analogy | Detects threats the way a player spots lines | Sees the board holistically like a strategist |
| Strength | Fast convergence, precise local tactics | Superior multi-step planning against tactical play |

Both are live — pick your opponent at attentive-klutzy-jacket.anvil.app.


🎮 The Game

      col:  0   1   2   3   4   5   6
           ┌───┬───┬───┬───┬───┬───┬───┐
    row 0  │   │   │   │   │   │   │   │
           ├───┼───┼───┼───┼───┼───┼───┤
    row 1  │   │   │   │   │   │   │   │
           ├───┼───┼───┼───┼───┼───┼───┤
    row 2  │   │   │   │ 🔴│   │   │   │  ← Transformer: "I see a diagonal threat
           ├───┼───┼───┼───┼───┼───┼───┤              building via global attention"
    row 3  │   │   │ 🔴│ 🔴│   │   │   │
           ├───┼───┼───┼───┼───┼───┼───┤
    row 4  │   │ 🔴│   │ 🔴│   │   │   │  ← CNN: "I see three 3×3 threat patterns
           ├───┼───┼───┼───┼───┼───┼───┤         and recommend blocking col 1"
    row 5  │   │ 🔴│ 🔴│ 🔴│   │   │   │
           └───┴───┴───┴───┴───┴───┴───┘

    Encoded as float32 tensor of shape (6, 7, 2):
    Channel 0 → player (+1) positions    Channel 1 → opponent (-1) positions

📊 Results

Verified Performance Metrics

| Metric | 🔵 CNN | 🟣 Transformer |
|---|---|---|
| Validation Accuracy | 63.0% | 60.3% |
| Parameters | 553,353 | 553,479 |
| Win Rate vs Random Bot | 97.0% | 97.8% |
| Win Rate vs Tactical Bot | 53.2% | 62.0% |
| Training Epochs | 13 (early stopped) | 60 |
| Model Size | 4.4 MB | 824 KB |

All win rates from 500-game evaluations per opponent type, alternating starting player.
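The 500-game protocol can be sketched as a minimal self-contained harness. Everything below (`win_rate`, `random_bot`, the board representation) is an illustration of the protocol, not the repository's actual evaluation code; the neural and tactical agents simply plug in as functions with the same `(board, mark) -> column` signature.

```python
import random

def legal_moves(board):
    """board: list of 7 columns, each a bottom-up list of +1/-1 pieces."""
    return [c for c in range(7) if len(board[c]) < 6]

def drop(board, col, player):
    board[col].append(player)

def winner(board):
    """Return +1 or -1 if that player has four in a row, else 0."""
    grid = [[board[c][r] if r < len(board[c]) else 0 for c in range(7)]
            for r in range(6)]  # row 0 = bottom
    for r in range(6):
        for c in range(7):
            p = grid[r][c]
            if p == 0:
                continue
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                if all(0 <= r + i * dr < 6 and 0 <= c + i * dc < 7
                       and grid[r + i * dr][c + i * dc] == p for i in range(4)):
                    return p
    return 0

def play_game(agent_a, agent_b, a_starts=True):
    """agent_a always plays +1, agent_b always -1; only the turn order changes."""
    board = [[] for _ in range(7)]
    players = [(agent_a, +1), (agent_b, -1)] if a_starts else [(agent_b, -1), (agent_a, +1)]
    for turn in range(42):
        agent, mark = players[turn % 2]
        drop(board, agent(board, mark), mark)
        w = winner(board)
        if w:
            return w
    return 0  # draw

def win_rate(agent, opponent, games=500):
    """Alternate the starting player each game, as in the evaluation above."""
    wins = sum(play_game(agent, opponent, a_starts=(g % 2 == 0)) == +1
               for g in range(games))
    return wins / games

random_bot = lambda board, mark: random.choice(legal_moves(board))
```

A model-backed agent would encode `board` to the (6, 7, 2) tensor, run a forward pass, and return the argmax column restricted to `legal_moves(board)`.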

Win Rate Comparison (vs Tactical Opponent)

                      0%       25%       50%       75%      100%
                      ├─────────┼─────────┼─────────┼─────────┤
  🔵 CNN    53.2%     ██████████████████████░░░░░░░░░░░░░░░░░░░
  🟣 Transf 62.0%     █████████████████████████░░░░░░░░░░░░░░░░
                                              ↑
                                    Transformer wins here
                              (+8.8pp better at strategic play)

Win Rate vs Random Opponent

                      0%       25%       50%       75%      100%
                      ├─────────┼─────────┼─────────┼─────────┤
  🔵 CNN    97.0%     ████████████████████████████████████████░
  🟣 Transf 97.8%     ████████████████████████████████████████░

Head-to-Head Analysis

Each model has its pros and cons. The CNN came out ahead on supervised accuracy, largely because the board is small and our dataset is compact compared to what a Transformer typically needs. Its 3×3 filters excel at recognizing small sections of the board, so it naturally captures local adjacency, pattern continuity, and geometric structure (its "small snapshot" behavior). The Transformer, by contrast, must attend to the board as a whole, which occasionally leads to sub-optimal moves in purely local situations.

That said, the CNN is not without flaws, and the Transformer has real advantages. The CNN, due to its locality approach, struggles more with complex positions, fork detection, and multi-step tactical reasoning. The Transformer, since it sees the whole board, can capture these strategic trends more naturally — it does not always win, but it sees the bigger picture.

This plays out clearly in the gameplay numbers: while the CNN has higher validation accuracy (63% vs 60.3%), the Transformer wins more against the tactical opponent — 62% vs 53.2%. That 8.8 percentage point gap is the Transformer's global attention at work, catching the kinds of multi-step threats that a local 3×3 filter structurally cannot see.

🔑 Key Insight

Despite lower supervised accuracy (60.3% vs 63%), the Transformer beats the CNN against tactical play by +8.8 percentage points (62% vs 53.2%).

This reveals a fundamental limitation of validation accuracy as a proxy for gameplay strength. The CNN's inductive spatial bias helps it converge faster and score higher on the test set — but the Transformer's global self-attention learns to see multi-step threats and fork patterns that local 3×3 convolutions structurally cannot model.


🏗️ System Architecture

┌──────────────────────────────────────────────────────────────────────────┐
│                           FULL SYSTEM                                    │
│                                                                          │
│  ┌─────────────────────────────────────────────────────────────────────┐ │
│  │  🌐  Anvil Web App  (Python full-stack, browser-based)              │ │
│  │                                                                     │ │
│  │   ┌────────────┐   ┌──────────────────┐   ┌─────────────────────┐  │ │
│  │   │ 🔐 Login   │   │ 🎮 Game Board    │   │ ⚙️ Settings         │  │ │
│  │   │  Auth Gate │   │   6×7 Grid UI    │   │  CNN / Transformer  │  │ │
│  │   │            │   │   🔴 🟡 pieces   │   │  Easy/Medium/Hard   │  │ │
│  │   └────────────┘   └──────────────────┘   └─────────────────────┘  │ │
│  │                                                                     │ │
│  │   User clicks column  →  board encoded as (6,7,2) float32 tensor   │ │
│  │   anvil.server.call('get_move', board_tensor, model_key)  ──────────┼─┼──┐
│  └─────────────────────────────────────────────────────────────────────┘ │  │
│                                                                          │  │ Encrypted
│                                                              Anvil Uplink│  │ Tunnel
│  ┌─────────────────────────────────────────────────────────────────────┐ │  │
│  │  ☁️  AWS Lightsail VM                                               │ │  │
│  │                                                                     │ │  │
│  │   ┌─────────────────────────────────────────────────────────────┐  │ │  │
│  │   │  🐳 Docker Container                                        │  │ │  │
│  │   │                                                             │◄─┼─┼──┘
│  │   │   backend.py                                                │  │ │
│  │   │   ├── anvil.server.connect(uplink_key)                      │  │ │
│  │   │   ├── Load CNN SavedModel      ──► cnn_infer()              │  │ │
│  │   │   ├── Load Transformer SavedModel ► tr_infer()              │  │ │
│  │   │   ├── _ensure_1_6_7_2(board)   ← shape normalization        │  │ │
│  │   │   ├── forward pass → argmax(probs[0])                       │  │ │
│  │   │   └── anvil.server.wait_forever()                           │  │ │
│  │   │                                                             │  │ │
│  │   │   ┌─────────────────┐    ┌─────────────────────────────┐   │  │ │
│  │   │   │  cnn_savedmodel │    │  transformer_savedmodel     │   │  │ │
│  │   │   │    4.4 MB       │    │       824 KB                │   │  │ │
│  │   │   │  serving_default│    │    serving_default sig.     │   │  │ │
│  │   │   └─────────────────┘    └─────────────────────────────┘   │  │ │
│  │   └─────────────────────────────────────────────────────────────┘  │ │
│  └─────────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────┘
                                    │
                    Returns integer move (0–6) to UI

📦 Data Pipeline — MCTS Self-Play

We built our training data by having MCTS play against itself over thousands of games, saving each board position along with the move MCTS recommended. Several steps kept the dataset diverse and clean:

- **Varied play:** the first few opening moves were randomized, an occasional random move was thrown in mid-game, and MCTS search strength varied between 800 and 1,500 iterations per move.
- **Parallel generation:** the whole process ran across 21 CPU cores in parallel, with checkpoints saving progress along the way in case anything went wrong.
- **Perspective flip:** since the neural network only needs to learn from one player's perspective, we flipped the board whenever it was the other player's turn so everything looks the same to the model.
- **Majority vote:** when the same board showed up more than once with different move recommendations, we kept whichever move came up most often.
- **Mirroring:** every board was mirrored left-to-right, nearly doubling our data for free.

All of that gave us around 400,000 unique positions to train on.

  ┌─────────────────────────────────────────────────────────────────────┐
  │                  DATA GENERATION PIPELINE                           │
  │                                                                     │
  │   21 CPU cores running in parallel                                  │
  │   ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐  ...  ┌──────┐             │
  │   │MCTS  │ │MCTS  │ │MCTS  │ │MCTS  │        │MCTS  │             │
  │   │800–  │ │1200  │ │1500  │ │900   │        │1100  │  ← varied   │
  │   │1500  │ │iters │ │iters │ │iters │        │iters │    strength │
  │   │iters │ │      │ │      │ │      │        │      │             │
  │   └──┬───┘ └──┬───┘ └──┬───┘ └──┬───┘        └──┬───┘             │
  │      └────────┴────────┴────────┴───────────────┘                  │
  │                              │                                      │
  │                              ▼                                      │
  │          Raw game records  (board, MCTS recommended move)           │
  │                              │                                      │
  │          ┌───────────────────┼───────────────────────┐             │
  │          ▼                   ▼                        ▼             │
  │   ┌─────────────┐   ┌──────────────────┐   ┌──────────────────┐   │
  │   │ Perspective  │   │ Duplicate boards │   │ Left-right board │   │
  │   │ flip: -1     │   │ → keep majority  │   │ mirroring (free  │   │
  │   │ player boards│   │   vote move      │   │  2× augmentation)│   │
  │   └─────────────┘   └──────────────────┘   └──────────────────┘   │
  │                              │                                      │
  │                              ▼                                      │
  │              ~400,000 unique (board, move) pairs                    │
  │              encoded as float32 tensor (6, 7, 2)                    │
  │                              │                                      │
  │                    ┌─────────┴────────┐                            │
  │                    ▼                  ▼                             │
  │             80% Training         20% Validation                     │
  │            (39,483 samples)      (9,871 samples)                    │
  └─────────────────────────────────────────────────────────────────────┘
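The de-duplication and mirroring steps can be sketched as follows. This is a simplified illustration (function names are ours, not the generator script's); note that mirroring a board must also reflect the recommended move column.

```python
import numpy as np
from collections import Counter

def mirror(board, move):
    """Left-right mirror of a (6, 7, ...) board array.
    The recommended column reflects with it: col -> 6 - col."""
    return np.fliplr(board), 6 - move

def dedupe_majority(samples):
    """samples: iterable of (board, move) pairs.
    When the same board appears with different recommended moves,
    keep the move MCTS recommended most often (majority vote)."""
    votes = {}
    for board, move in samples:
        key = board.tobytes()  # hashable fingerprint of the position
        votes.setdefault(key, (board, []))[1].append(move)
    return [(board, Counter(moves).most_common(1)[0][0])
            for board, moves in votes.values()]
```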

Board Encoding

  Example board state → encoded as shape (6, 7, 2):

  Raw board (6×7):          Channel 0 — 🔴 (Player +1):  Channel 1 — 🟡 (Player -1):
  ┌──┬──┬──┬──┬──┬──┬──┐   ┌──┬──┬──┬──┬──┬──┬──┐      ┌──┬──┬──┬──┬──┬──┬──┐
  │  │  │  │  │  │  │  │   │0 │0 │0 │0 │0 │0 │0 │      │0 │0 │0 │0 │0 │0 │0 │
  │  │  │  │🔴│  │  │  │   │0 │0 │0 │1 │0 │0 │0 │      │0 │0 │0 │0 │0 │0 │0 │
  │  │  │🟡│🔴│  │  │  │   │0 │0 │0 │1 │0 │0 │0 │      │0 │0 │1 │0 │0 │0 │0 │
  │  │🟡│🔴│🔴│🟡│  │  │   │0 │0 │1 │1 │0 │0 │0 │      │0 │1 │0 │0 │1 │0 │0 │
  │🔴│🔴│🟡│🔴│🔴│  │  │   │1 │1 │0 │1 │1 │0 │0 │      │0 │0 │1 │0 │0 │0 │0 │
  └──┴──┴──┴──┴──┴──┴──┘   └──┴──┴──┴──┴──┴──┴──┘      └──┴──┴──┴──┴──┴──┴──┘
                             "Where am I?"                "Where is the opponent?"

🔵 CNN Architecture

For the model preparation, we proceeded with two approaches: a Convolutional Neural Network (CNN) and a Transformer-based model. Both were trained to predict the best column for the current player given a board encoded as a 6×7×2 tensor. The CNN scans small regions of the board and learns to recognize useful patterns, while the Transformer takes a different approach — it scans the whole board and learns relationships between all positions using attention (the same mechanism introduced in the paper "Attention Is All You Need"). Each approach has distinct pros and cons that we explore in depth below.

The architecture of the CNN model was built using stacked convolutional layers followed by a dense classification head. Our structure consisted of Conv2D layers, batch normalization, ReLU activations, convolutional blocks with 128 and 256 filters, GlobalAveragePooling (which helps reduce overfitting compared to Flatten), a Dense layer of 128 units with ReLU, a Dropout rate of 30%, and finally a Dense layer of 7 units with softmax activation. With this setup, we got a validation accuracy of 63%.

In plain terms — in the first layer the CNN looks at small 3×3 patterns and starts detecting simple relationships like two adjacent pieces, vertical alignment, and empty spaces. In the second layer it uses those patterns to detect more complex shapes: three-in-a-row, diagonal structures, near-winning setups. In the third layer it steps it up further — detecting double threats, fork setups (double attacks), and blocking patterns. That is what we mean by stacked convolutional layers. The dense classification head is the final step: after all the stacked layers we compress detected features into a summarized vector, feed it into a fully connected Dense layer, and output probabilities for the 7 possible columns. The model then picks the highest probability column.

  Input (6, 7, 2)
       │
       ▼  ┌──────────────────────────────────────────────────────────────┐
          │ Block 1 — Pattern Detection (early features)                 │
          │ Conv2D(64 filters, 3×3, padding='same')                      │
          │     ↳ detects: 2-in-a-row, edge pieces, isolated cells      │
          │ BatchNormalization → Activation('relu')                       │
          └──────────────────────────────────────────────────────────────┘
       │
       ▼  ┌──────────────────────────────────────────────────────────────┐
          │ Block 2 — Threat Recognition                                 │
          │ Conv2D(128 filters, 3×3, padding='same')                     │
          │     ↳ detects: 3-in-a-row, diagonal lines, near-wins        │
          │ BatchNormalization → Activation('relu')                       │
          └──────────────────────────────────────────────────────────────┘
       │
       ▼  ┌──────────────────────────────────────────────────────────────┐
          │ Block 3 — Tactical Pattern Assembly                          │
          │ Conv2D(128 filters, 3×3, padding='same')                     │
          │     ↳ detects: double threats, blocked lines, open-fours    │
          │ BatchNormalization → Activation('relu')                       │
          └──────────────────────────────────────────────────────────────┘
       │
       ▼  ┌──────────────────────────────────────────────────────────────┐
          │ Block 4 — High-Level Strategy                                │
          │ Conv2D(256 filters, 3×3, padding='same')                     │
          │     ↳ combines all lower features into strategic signals     │
          │ BatchNormalization → Activation('relu')                       │
          └──────────────────────────────────────────────────────────────┘
       │
       ▼
  GlobalAveragePooling2D    (replaces Flatten → reduces overfitting)
       │
       ▼
  Dense(128) → ReLU → Dropout(0.30)
       │
       ▼
  Dense(7) → Softmax
       │
       ▼
  P(col_0), P(col_1), ..., P(col_6)    ← probability over 7 columns
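In Keras, the diagram above corresponds to roughly the following. This is a sketch reconstructed from the description; exact layer options (and therefore the exact parameter count) may differ from the trained model.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_cnn():
    inputs = layers.Input(shape=(6, 7, 2))
    x = inputs
    for filters in (64, 128, 128, 256):  # Blocks 1-4: 64 -> 128 -> 128 -> 256 filters
        x = layers.Conv2D(filters, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
    x = layers.GlobalAveragePooling2D()(x)        # replaces Flatten to reduce overfitting
    x = layers.Dense(128, activation="relu")(x)
    x = layers.Dropout(0.30)(x)
    outputs = layers.Dense(7, activation="softmax")(x)  # P(col 0..6)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(3e-4),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```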

Hyperparameters (all verified from training logs)

| Parameter | Value |
|---|---|
| Optimizer | Adam |
| Initial learning rate | 3e-4 |
| LR schedule | ReduceLROnPlateau (factor=0.5, patience=2, min=1e-6) |
| Batch size | 64 |
| Max epochs | 50 |
| Early stopping | patience=5 on val_loss, restore best weights |
| Conv filters | 64 → 128 → 128 → 256 |
| Kernel size | 3×3 throughout |
| Regularization | BatchNorm + Dropout(0.30) + GlobalAvgPool |
| Total parameters | 553,353 |
| Trainable | 552,199 |

Training Log (extracted from notebook outputs)

  Epoch  Train Acc   Val Acc   Val Loss   LR
  ─────  ─────────   ───────   ────────   ──────────
    1      28.6%      44.5%     1.4578    3.0e-04
    2      45.8%      46.1%     1.3656    3.0e-04
    3      49.3%      48.4%     1.3219    3.0e-04
    5      55.8%      48.2%     1.4208    ↓ 1.5e-04  ← LR reduced (plateau)
    6      60.1%      51.7%     1.2707    1.5e-04
    8      65.1%     52.8% ★   1.2563    1.5e-04    ← best val loss
   10      69.4%      49.8%     1.3846    ↓ 7.5e-05  ← LR reduced again
   12      75.3%      53.3%     1.3517    ↓ 3.75e-05
   13      78.2%      53.3%     1.3714    3.75e-05
  ─────────────────────────────────────────────────
  Early stopped at epoch 13.  Best weights restored from epoch 8.
  Val Accuracy (best weights): 52.76%  |  Full dataset run: 63%

🟣 Vision Transformer Architecture

Our Transformer architecture was based on a Vision Transformer (ViT) style adapted for Connect 4. The structure consists of reshaping the board into 42 tokens, projecting each to a 128-dimensional embedding, adding a CLS token and trainable positional embeddings, passing through 4 Transformer encoder blocks (multi-head attention, residual connections, and MLP blocks), extracting the CLS token, and finally a dense head with 7-class softmax. With this setup, we got a validation accuracy of 60.32%.

Originally, Transformers were built for text. Then researchers adapted them for images, producing the Vision Transformer (ViT). We adapted that same idea for Connect 4. As mentioned, the Transformer sees the whole board at once — it breaks it into 42 small tokens (one per cell), converts each cell into a vector (the embedding step), and adds positional information so the model knows where each cell is. It then uses "attention" to allow every cell to interact with every other cell. The model can decide which other cells matter when analyzing a specific position — for example, a piece in column 3 might "pay attention" to pieces in column 2 and 4 to identify a possible diagonal threat. The CLS token acts as a summary notebook that gets to attend to all cells during this process. Once we extract it, it contains a global summary of the entire board state, and from that the model decides which column to play.

  Input (6, 7, 2)
       │
       ▼
  Reshape → 42 tokens of shape (2,)
  "Each of the 42 cells becomes one token. The network has no pre-baked idea
   of which cells are adjacent — it must learn spatial relationships from data."
       │
       ▼
  Dense(128) → 42-token sequence, each 128-dim    (token projection)
       │
       ▼
  Prepend [CLS] token → sequence length = 43
  "This learnable token acts as a 'global summary notebook',
   collecting information from every cell via attention."
       │
       ▼
  Add trainable positional embeddings (43 × 128)
       │
       ▼
  ┌────────────────────────────────────────────────┐
  │  Transformer Encoder Block  ×4                 │
  │                                                │
  │  ┌──────────────────────────────────────────┐  │
  │  │  Multi-Head Self-Attention               │  │
  │  │                                          │  │
  │  │  Every token queries every other token:  │  │
  │  │  "Does col 3 matter when I'm at col 4?"  │  │
  │  │  → learns diagonal threats, fork setups  │  │
  │  └──────────────────────────────────────────┘  │
  │               ↓ Residual + LayerNorm            │
  │  ┌──────────────────────────────────────────┐  │
  │  │  Feed-Forward MLP (expand → contract)    │  │
  │  └──────────────────────────────────────────┘  │
  │               ↓ Residual + LayerNorm            │
  └────────────────────────────────────────────────┘  × 4 blocks
       │
       ▼
  Extract [CLS] token   (shape: 128-dim vector)
       │
       ▼
  Dense(7) → Softmax
       │
       ▼
  P(col_0), ..., P(col_6)
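A Keras sketch of the architecture above, reconstructed from the description. The number of attention heads and the MLP expansion ratio are not stated in this README, so `num_heads=4` and `mlp_ratio=2` are assumptions; exact details may differ from the trained model.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_vit(embed_dim=128, num_blocks=4, num_heads=4, mlp_ratio=2):
    inputs = layers.Input(shape=(6, 7, 2))
    x = layers.Reshape((42, 2))(inputs)   # 42 cell tokens of shape (2,)
    x = layers.Dense(embed_dim)(x)        # token projection -> (42, 128)

    class AddClsAndPos(layers.Layer):
        """Prepend a learnable [CLS] token, then add trainable
        positional embeddings (43 x 128)."""
        def build(self, input_shape):
            self.cls = self.add_weight(name="cls", shape=(1, 1, embed_dim),
                                       initializer="zeros")
            self.pos = self.add_weight(name="pos", shape=(1, 43, embed_dim),
                                       initializer="random_normal")
        def call(self, tokens):
            cls = tf.repeat(self.cls, tf.shape(tokens)[0], axis=0)
            return tf.concat([cls, tokens], axis=1) + self.pos

    x = AddClsAndPos()(x)
    for _ in range(num_blocks):  # 4 encoder blocks, post-norm as diagrammed
        attn = layers.MultiHeadAttention(num_heads, embed_dim // num_heads)(x, x)
        x = layers.LayerNormalization()(x + attn)      # residual + LayerNorm
        mlp = layers.Dense(embed_dim * mlp_ratio, activation="relu")(x)
        mlp = layers.Dense(embed_dim)(mlp)
        x = layers.LayerNormalization()(x + mlp)       # residual + LayerNorm
    cls_summary = x[:, 0]                 # extract the [CLS] token
    outputs = layers.Dense(7, activation="softmax")(cls_summary)
    return tf.keras.Model(inputs, outputs)
```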

Hyperparameters (all verified from training logs)

| Parameter | Value |
|---|---|
| Optimizer | Adam (lr=3e-4) |
| Sequence length | 42 tokens (6×7 cells) |
| [CLS] token | Trainable, prepended to sequence |
| Token embedding dim | 128 |
| Positional embeddings | Trainable (43 × 128) |
| Encoder blocks | 4 |
| Max epochs | 60 |
| Data augmentation | Horizontal board flip (2× dataset) |
| Total parameters | 553,479 (all trainable) |

Training Log (60 full epochs — verified from notebook)

  Epoch  Train Acc   Val Acc   Val Loss
  ─────  ─────────   ───────   ────────
    1      28.3%      35.8%    1.5812
   10      44.7%      47.0%    1.3927
   20      49.7%      50.8%    1.3033
   30      52.5%      54.2%    1.2393
   40      55.5%      57.1%    1.1848
   50      57.9%      58.6%    1.1493
   59      59.8%      60.1%    1.1241
   60      59.9%      60.3% ✓  1.1237   ← final
  ─────────────────────────────────────
  Steady convergence — no early stop needed.
  Training accuracy and validation accuracy stay close → no overfitting.

🚀 Deployment

Infrastructure

  ┌───────────────────────────────────────────────┐
  │  AWS Lightsail                                │
  │                                               │
  │   Instance type : Linux/Unix, 1 GB RAM        │
  │   Purpose       : Inference only (no training)│
  │   Cost          : Free tier / minimal         │
  │                                               │
  │   ┌───────────────────────────────────────┐   │
  │   │  Docker Container                     │   │
  │   │  ├─ Python 3.10                       │   │
  │   │  ├─ tensorflow==2.12.1                │   │
  │   │  ├─ numpy                             │   │
  │   │  ├─ anvil-uplink==0.4.2               │   │
  │   │  ├─ cnn_savedmodel/    (4.4 MB)       │   │
  │   │  ├─ transformer_savedmodel/ (824 KB)  │   │
  │   │  └─ backend.py                        │   │
  │   │                                       │   │
  │   │  Startup sequence:                    │   │
  │   │  1. Connect to Anvil via Uplink key   │   │
  │   │  2. Load both models into memory once │   │
  │   │  3. wait_forever() — serves requests  │   │
  │   └───────────────────────────────────────┘   │
  └───────────────────────────────────────────────┘

Inference Request Lifecycle

  User clicks column 3
         │
         ▼
  Anvil encodes board as (6,7,2) float32 tensor
         │
         ▼
  anvil.server.call('get_move', board, 'cnn')
         │
         ▼  [encrypted Uplink tunnel]
         │
         ▼
  backend.py receives board
  _ensure_1_6_7_2(board)  →  shape (1, 6, 7, 2)
         │
         ├─── CNN selected ──► cnn_infer(board_tensor)
         │                      ↳ output: (1, 7) probability vector
         │                      ↳ argmax → column 4  (0–6)
         │
  return 4
         │
         ▼  [< 50ms round trip]
         │
  Anvil drops 🟡 in column 4, updates board

📁 Project Structure

connect4-ai/
│
├── 📄 README.md                              ← You are here
├── 📄 LICENSE                                ← MIT
├── 📄 .gitignore
│
├── 📂 data/
│   └── 📂 generator/
│       └── 🐍 mcts_self_play.py             ← MCTS self-play data generation
│                                               Parallelized across 21 CPU cores
│                                               Output: ~400K (board, move) pairs
│
├── 📂 training/
│   └── 📓 Connect4_AI_Training.ipynb        ← End-to-end training notebook
│                                               CNN + Transformer + full eval
│                                               Plots: accuracy, loss, win rates
│
├── 📂 backend/
│   ├── 🐍 backend.py                        ← Anvil Uplink inference server
│   ├── 🐳 Dockerfile                        ← Container definition
│   ├── 🐳 docker-compose.yml               ← Compose config
│   └── 📄 requirements.txt                 ← Pinned: tensorflow, numpy, anvil-uplink
│
├── 📂 models/
│   ├── 📂 cnn_savedmodel/                   ← CNN in TF SavedModel format (4.4 MB)
│   │   ├── saved_model.pb
│   │   └── variables/
│   ├── 📂 transformer_savedmodel/           ← Transformer in TF SavedModel (824 KB)
│   │   ├── saved_model.pb
│   │   └── variables/
│   └── 📦 connect4_transformer_v2_portable.h5  ← Transformer in Keras .h5
│
└── 📂 app/
    └── 📄 Connect4AIGrp26.yaml             ← Anvil frontend export (clone-able)

Note: Training dataset (connect4_400k_2channel.pkl, ~265 MB) excluded via .gitignore. Regenerate using data/generator/mcts_self_play.py or request access.


⚡ Getting Started

1 — Clone

git clone https://github.com/YOUR_USERNAME/connect4-ai.git
cd connect4-ai

2 — Run inference on a board

import numpy as np
import tensorflow as tf

# Load the CNN
model = tf.saved_model.load("models/cnn_savedmodel")
infer = model.signatures["serving_default"]
input_key = list(infer.structured_input_signature[1].keys())[0]

# Build any board: shape (6, 7, 2)
# Channel 0 = your pieces (+1), Channel 1 = opponent pieces (-1)
board = np.zeros((6, 7, 2), dtype=np.float32)
board[5, 3, 0] = 1.0   # your piece at bottom-center
board[5, 4, 1] = 1.0   # opponent piece next to it

x = board[np.newaxis, ...]  # add batch dim → (1, 6, 7, 2)
output = infer(**{input_key: tf.constant(x)})
probs  = list(output.values())[0].numpy()[0]

print(f"Recommended column : {np.argmax(probs)}")
print(f"Column probabilities: {np.round(probs, 3)}")

3 — Run the backend server locally

pip install -r backend/requirements.txt
# Set your Anvil Uplink key (replace the placeholder in backend.py)
python backend/backend.py

Expected output:

START: Backend Starting, about to connect to Anvil
Anvil Uplink connection established successfully
Loading models...
 CNN model loaded.
 Transformer model loaded.
Backend fully operational

4 — Docker

docker-compose -f backend/docker-compose.yml up --build
docker logs -f <container_id>

5 — Retrain from scratch

# 1. Generate data (CPU-intensive, uses multiprocessing)
python data/generator/mcts_self_play.py

# 2. Open and run training notebook
jupyter notebook training/Connect4_AI_Training.ipynb
# All models saved automatically to models/

🛠️ Tech Stack

  ┌────────────────┬──────────────────────────────────────────────────────┐
  │ Layer          │ Technology                                           │
  ├────────────────┼──────────────────────────────────────────────────────┤
  │ ML Framework   │ TensorFlow 2.12 / Keras                              │
  │ Data Engine    │ MCTS (Monte Carlo Tree Search) · NumPy · multiprocess│
  │ Architectures  │ CNN (Conv2D) · Vision Transformer (ViT-style)        │
  │ Model Format   │ TF SavedModel · Keras .h5                            │
  │ Backend        │ Python 3.10 · Anvil Uplink                           │
  │ Container      │ Docker · Docker Compose                              │
  │ Cloud          │ AWS Lightsail                                        │
  │ Frontend       │ Anvil (Python full-stack web framework)              │
  │ Training Env   │ Google Colab (GPU) · Local CPU (data generation)     │
  └────────────────┴──────────────────────────────────────────────────────┘

📜 License

MIT — see LICENSE.


Built with 🔴🟡 and a lot of MCTS iterations

If you liked this project, consider leaving a ⭐
