radhapawar/Connect4-AI-Engine

 ██████╗ ██████╗ ███╗  ██╗███╗  ██╗███████╗ ██████╗████████╗    ██╗  ██╗     █████╗ ██╗
██╔════╝██╔═══██╗████╗ ██║████╗ ██║██╔════╝██╔════╝╚══██╔══╝    ██║  ██║    ██╔══██╗██║
██║     ██║   ██║██╔██╗██║██╔██╗██║█████╗  ██║        ██║       ███████║    ███████║██║
██║     ██║   ██║██║╚████║██║╚████║██╔══╝  ██║        ██║       ╚════██║    ██╔══██║██║
╚██████╗╚██████╔╝██║ ╚███║██║ ╚███║███████╗╚██████╗   ██║            ██║    ██║  ██║██║
 ╚═════╝ ╚═════╝ ╚═╝  ╚══╝╚═╝  ╚══╝╚══════╝ ╚═════╝   ╚═╝           ╚═╝    ╚═╝  ╚═╝╚═╝

Deep Learning Game Agent · MCTS Self-Play · CNN vs Transformer · Production on AWS




Two neural networks trained on 400,000+ board positions learn to think like grandmasters — one scans local patterns, one reads the whole board. Both deployed live on AWS, serving real-time predictions in under 50ms.


🎮 Play the Bot  ·  📓 Training Notebook  ·  🔧 Backend  ·  📊 Results


🧠 What Is This?

This project implements the MCTS self-play → supervised distillation → neural network inference pipeline — the same paradigm behind DeepMind's AlphaZero — applied to Connect 4 end-to-end.

Instead of searching the game tree at inference time (slow), we train a neural network to instantly replicate the decisions of a strong Monte Carlo Tree Search player. The result: a deep learning agent that plays a strong game with a single forward pass (< 50ms).

Two architectures are built, trained, and rigorously compared:

| | CNN | Vision Transformer |
|---|---|---|
| Approach | Scans local 3×3 regions for patterns | Reads all 42 cells simultaneously via attention |
| Analogy | Detects threats the way a player spots lines | Sees the board holistically like a strategist |
| Strength | Fast convergence, precise local tactics | Superior multi-step planning against tactical play |

Both are live — pick your opponent at attentive-klutzy-jacket.anvil.app.


🎮 The Game

      col:  0   1   2   3   4   5   6
           ┌───┬───┬───┬───┬───┬───┬───┐
    row 0  │   │   │   │   │   │   │   │
           ├───┼───┼───┼───┼───┼───┼───┤
    row 1  │   │   │   │   │   │   │   │
           ├───┼───┼───┼───┼───┼───┼───┤
    row 2  │   │   │   │ 🔴│   │   │   │  ← Transformer: "I see a diagonal threat
           ├───┼───┼───┼───┼───┼───┼───┤              building via global attention"
    row 3  │   │   │ 🔴│ 🔴│   │   │   │
           ├───┼───┼───┼───┼───┼───┼───┤
    row 4  │   │ 🔴│   │ 🔴│   │   │   │  ← CNN: "I see three 3×3 threat patterns
           ├───┼───┼───┼───┼───┼───┼───┤         and recommend blocking col 1"
    row 5  │   │ 🔴│ 🔴│ 🔴│   │   │   │
           └───┴───┴───┴───┴───┴───┴───┘

    Encoded as float32 tensor of shape (6, 7, 2):
    Channel 0 → player (+1) positions    Channel 1 → opponent (-1) positions

📊 Results

Verified Performance Metrics

| Metric | 🔵 CNN | 🟣 Transformer |
|---|---|---|
| Validation Accuracy | 63.0% | 60.3% |
| Parameters | 553,353 | 553,479 |
| Win Rate vs Random Bot | 97.0% | 97.8% |
| Win Rate vs Tactical Bot | 53.2% | 62.0% |
| Training Epochs | 13 (early stopped) | 60 |
| Model Size | 4.4 MB | 824 KB |

All win rates from 500-game evaluations per opponent type, alternating starting player.
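The 500-game protocol can be sketched as a minimal self-contained harness. Everything below (`win_rate`, `random_bot`, the board representation) is an illustration of the protocol, not the repository's actual evaluation code; the neural and tactical agents simply plug in as functions with the same `(board, mark) -> column` signature.

```python
import random

def legal_moves(board):
    """board: list of 7 columns, each a bottom-up list of +1/-1 pieces."""
    return [c for c in range(7) if len(board[c]) < 6]

def drop(board, col, player):
    board[col].append(player)

def winner(board):
    """Return +1 or -1 if that player has four in a row, else 0."""
    grid = [[board[c][r] if r < len(board[c]) else 0 for c in range(7)]
            for r in range(6)]  # row 0 = bottom
    for r in range(6):
        for c in range(7):
            p = grid[r][c]
            if p == 0:
                continue
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                if all(0 <= r + i * dr < 6 and 0 <= c + i * dc < 7
                       and grid[r + i * dr][c + i * dc] == p for i in range(4)):
                    return p
    return 0

def play_game(agent_a, agent_b, a_starts=True):
    """agent_a always plays +1, agent_b always -1; only the turn order changes."""
    board = [[] for _ in range(7)]
    players = [(agent_a, +1), (agent_b, -1)] if a_starts else [(agent_b, -1), (agent_a, +1)]
    for turn in range(42):
        agent, mark = players[turn % 2]
        drop(board, agent(board, mark), mark)
        w = winner(board)
        if w:
            return w
    return 0  # draw

def win_rate(agent, opponent, games=500):
    """Alternate the starting player each game, as in the evaluation above."""
    wins = sum(play_game(agent, opponent, a_starts=(g % 2 == 0)) == +1
               for g in range(games))
    return wins / games

random_bot = lambda board, mark: random.choice(legal_moves(board))
```

A model-backed agent would encode `board` to the (6, 7, 2) tensor, run a forward pass, and return the argmax column restricted to `legal_moves(board)`.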

Win Rate Comparison (vs Tactical Opponent)

                      0%       25%       50%       75%      100%
                      ├─────────┼─────────┼─────────┼─────────┤
  🔵 CNN    53.2%     ██████████████████████░░░░░░░░░░░░░░░░░░░
  🟣 Transf 62.0%     █████████████████████████░░░░░░░░░░░░░░░░
                                              ↑
                                    Transformer wins here
                              (+8.8pp better at strategic play)

Win Rate vs Random Opponent

                      0%       25%       50%       75%      100%
                      ├─────────┼─────────┼─────────┼─────────┤
  🔵 CNN    97.0%     ████████████████████████████████████████░
  🟣 Transf 97.8%     ████████████████████████████████████████░

Head-to-Head Analysis

Each model has its pros and cons. The CNN came out ahead on supervised accuracy, largely because the board is small and our dataset is compact compared to what a Transformer typically needs. Its 3×3 filters excel at recognizing small sections of the board, so it naturally captures local adjacency, pattern continuity, and geometric structure (its "small snapshot" behavior). The Transformer, by contrast, must attend to the board as a whole, which occasionally leads to sub-optimal moves in purely local situations.

That said, the CNN is not without flaws, and the Transformer has real advantages. The CNN, due to its locality approach, struggles more with complex positions, fork detection, and multi-step tactical reasoning. The Transformer, since it sees the whole board, can capture these strategic trends more naturally — it does not always win, but it sees the bigger picture.

This plays out clearly in the gameplay numbers: while the CNN has higher validation accuracy (63% vs 60.3%), the Transformer wins more against the tactical opponent — 62% vs 53.2%. That 8.8 percentage point gap is the Transformer's global attention at work, catching the kinds of multi-step threats that a local 3×3 filter structurally cannot see.

🔑 Key Insight

Despite lower supervised accuracy (60.3% vs 63%), the Transformer beats the CNN against tactical play by +8.8 percentage points (62% vs 53.2%).

This reveals a fundamental limitation of validation accuracy as a proxy for gameplay strength. The CNN's inductive spatial bias helps it converge faster and score higher on the test set — but the Transformer's global self-attention learns to see multi-step threats and fork patterns that local 3×3 convolutions structurally cannot model.


🏗️ System Architecture

┌──────────────────────────────────────────────────────────────────────────┐
│                           FULL SYSTEM                                    │
│                                                                          │
│  ┌─────────────────────────────────────────────────────────────────────┐ │
│  │  🌐  Anvil Web App  (Python full-stack, browser-based)              │ │
│  │                                                                     │ │
│  │   ┌────────────┐   ┌──────────────────┐   ┌─────────────────────┐  │ │
│  │   │ 🔐 Login   │   │ 🎮 Game Board    │   │ ⚙️ Settings         │  │ │
│  │   │  Auth Gate │   │   6×7 Grid UI    │   │  CNN / Transformer  │  │ │
│  │   │            │   │   🔴 🟡 pieces   │   │  Easy/Medium/Hard   │  │ │
│  │   └────────────┘   └──────────────────┘   └─────────────────────┘  │ │
│  │                                                                     │ │
│  │   User clicks column  →  board encoded as (6,7,2) float32 tensor   │ │
│  │   anvil.server.call('get_move', board_tensor, model_key)  ──────────┼─┼──┐
│  └─────────────────────────────────────────────────────────────────────┘ │  │
│                                                                          │  │ Encrypted
│                                                              Anvil Uplink│  │ Tunnel
│  ┌─────────────────────────────────────────────────────────────────────┐ │  │
│  │  ☁️  AWS Lightsail VM                                               │ │  │
│  │                                                                     │ │  │
│  │   ┌─────────────────────────────────────────────────────────────┐  │ │  │
│  │   │  🐳 Docker Container                                        │  │ │  │
│  │   │                                                             │◄─┼─┼──┘
│  │   │   backend.py                                                │  │ │
│  │   │   ├── anvil.server.connect(uplink_key)                      │  │ │
│  │   │   ├── Load CNN SavedModel      ──► cnn_infer()              │  │ │
│  │   │   ├── Load Transformer SavedModel ► tr_infer()              │  │ │
│  │   │   ├── _ensure_1_6_7_2(board)   ← shape normalization        │  │ │
│  │   │   ├── forward pass → argmax(probs[0])                       │  │ │
│  │   │   └── anvil.server.wait_forever()                           │  │ │
│  │   │                                                             │  │ │
│  │   │   ┌─────────────────┐    ┌─────────────────────────────┐   │  │ │
│  │   │   │  cnn_savedmodel │    │  transformer_savedmodel     │   │  │ │
│  │   │   │    4.4 MB       │    │       824 KB                │   │  │ │
│  │   │   │  serving_default│    │    serving_default sig.     │   │  │ │
│  │   │   └─────────────────┘    └─────────────────────────────┘   │  │ │
│  │   └─────────────────────────────────────────────────────────────┘  │ │
│  └─────────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────┘
                                    │
                    Returns integer move (0–6) to UI

📦 Data Pipeline — MCTS Self-Play

We built our training data by having MCTS play against itself over thousands of games, saving each board position along with the move MCTS recommended. Several steps kept the dataset diverse and clean:

- **Varied play:** the first few opening moves were randomized, an occasional random move was thrown in mid-game, and MCTS search strength varied between 800 and 1,500 iterations per move.
- **Parallel generation:** the whole process ran across 21 CPU cores in parallel, with checkpoints saving progress along the way in case anything went wrong.
- **Perspective flip:** since the neural network only needs to learn from one player's perspective, we flipped the board whenever it was the other player's turn so everything looks the same to the model.
- **Majority vote:** when the same board showed up more than once with different move recommendations, we kept whichever move came up most often.
- **Mirroring:** every board was mirrored left-to-right, nearly doubling our data for free.

All of that gave us around 400,000 unique positions to train on.

  ┌─────────────────────────────────────────────────────────────────────┐
  │                  DATA GENERATION PIPELINE                           │
  │                                                                     │
  │   21 CPU cores running in parallel                                  │
  │   ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐  ...  ┌──────┐             │
  │   │MCTS  │ │MCTS  │ │MCTS  │ │MCTS  │        │MCTS  │             │
  │   │800–  │ │1200  │ │1500  │ │900   │        │1100  │  ← varied   │
  │   │1500  │ │iters │ │iters │ │iters │        │iters │    strength │
  │   │iters │ │      │ │      │ │      │        │      │             │
  │   └──┬───┘ └──┬───┘ └──┬───┘ └──┬───┘        └──┬───┘             │
  │      └────────┴────────┴────────┴───────────────┘                  │
  │                              │                                      │
  │                              ▼                                      │
  │          Raw game records  (board, MCTS recommended move)           │
  │                              │                                      │
  │          ┌───────────────────┼───────────────────────┐             │
  │          ▼                   ▼                        ▼             │
  │   ┌─────────────┐   ┌──────────────────┐   ┌──────────────────┐   │
  │   │ Perspective  │   │ Duplicate boards │   │ Left-right board │   │
  │   │ flip: -1     │   │ → keep majority  │   │ mirroring (free  │   │
  │   │ player boards│   │   vote move      │   │  2× augmentation)│   │
  │   └─────────────┘   └──────────────────┘   └──────────────────┘   │
  │                              │                                      │
  │                              ▼                                      │
  │              ~400,000 unique (board, move) pairs                    │
  │              encoded as float32 tensor (6, 7, 2)                    │
  │                              │                                      │
  │                    ┌─────────┴────────┐                            │
  │                    ▼                  ▼                             │
  │             80% Training         20% Validation                     │
  │            (39,483 samples)      (9,871 samples)                    │
  └─────────────────────────────────────────────────────────────────────┘
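The de-duplication and mirroring steps can be sketched as follows. This is a simplified illustration (function names are ours, not the generator script's); note that mirroring a board must also reflect the recommended move column.

```python
import numpy as np
from collections import Counter

def mirror(board, move):
    """Left-right mirror of a (6, 7, ...) board array.
    The recommended column reflects with it: col -> 6 - col."""
    return np.fliplr(board), 6 - move

def dedupe_majority(samples):
    """samples: iterable of (board, move) pairs.
    When the same board appears with different recommended moves,
    keep the move MCTS recommended most often (majority vote)."""
    votes = {}
    for board, move in samples:
        key = board.tobytes()  # hashable fingerprint of the position
        votes.setdefault(key, (board, []))[1].append(move)
    return [(board, Counter(moves).most_common(1)[0][0])
            for board, moves in votes.values()]
```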

Board Encoding

  Example board state → encoded as shape (6, 7, 2):

  Raw board (6×7):          Channel 0 — 🔴 (Player +1):  Channel 1 — 🟡 (Player -1):
  ┌──┬──┬──┬──┬──┬──┬──┐   ┌──┬──┬──┬──┬──┬──┬──┐      ┌──┬──┬──┬──┬──┬──┬──┐
  │  │  │  │  │  │  │  │   │0 │0 │0 │0 │0 │0 │0 │      │0 │0 │0 │0 │0 │0 │0 │
  │  │  │  │🔴│  │  │  │   │0 │0 │0 │1 │0 │0 │0 │      │0 │0 │0 │0 │0 │0 │0 │
  │  │  │🟡│🔴│  │  │  │   │0 │0 │0 │1 │0 │0 │0 │      │0 │0 │1 │0 │0 │0 │0 │
  │  │🟡│🔴│🔴│🟡│  │  │   │0 │0 │1 │1 │0 │0 │0 │      │0 │1 │0 │0 │1 │0 │0 │
  │🔴│🔴│🟡│🔴│🔴│  │  │   │1 │1 │0 │1 │1 │0 │0 │      │0 │0 │1 │0 │0 │0 │0 │
  └──┴──┴──┴──┴──┴──┴──┘   └──┴──┴──┴──┴──┴──┴──┘      └──┴──┴──┴──┴──┴──┴──┘
                             "Where am I?"                "Where is the opponent?"

🔵 CNN Architecture

For the model preparation, we proceeded with two approaches: a Convolutional Neural Network (CNN) and a Transformer-based model. Both were trained to predict the best column for the current player given a board encoded as a 6×7×2 tensor. The CNN scans small regions of the board and learns to recognize useful patterns, while the Transformer takes a different approach — it scans the whole board and learns relationships between all positions using attention (the same mechanism introduced in the paper "Attention Is All You Need"). Each approach has distinct pros and cons that we explore in depth below.

The architecture of the CNN model was built using stacked convolutional layers followed by a dense classification head. Our structure consisted of Conv2D layers, batch normalization, ReLU activations, convolutional blocks with 128 and 256 filters, GlobalAveragePooling (which helps reduce overfitting compared to Flatten), a Dense layer of 128 units with ReLU, a Dropout rate of 30%, and finally a Dense layer of 7 units with softmax activation. With this setup, we got a validation accuracy of 63%.

In plain terms — in the first layer the CNN looks at small 3×3 patterns and starts detecting simple relationships like two adjacent pieces, vertical alignment, and empty spaces. In the second layer it uses those patterns to detect more complex shapes: three-in-a-row, diagonal structures, near-winning setups. In the third layer it steps it up further — detecting double threats, fork setups (double attacks), and blocking patterns. That is what we mean by stacked convolutional layers. The dense classification head is the final step: after all the stacked layers we compress detected features into a summarized vector, feed it into a fully connected Dense layer, and output probabilities for the 7 possible columns. The model then picks the highest probability column.

  Input (6, 7, 2)
       │
       ▼  ┌──────────────────────────────────────────────────────────────┐
          │ Block 1 — Pattern Detection (early features)                 │
          │ Conv2D(64 filters, 3×3, padding='same')                      │
          │     ↳ detects: 2-in-a-row, edge pieces, isolated cells      │
          │ BatchNormalization → Activation('relu')                       │
          └──────────────────────────────────────────────────────────────┘
       │
       ▼  ┌──────────────────────────────────────────────────────────────┐
          │ Block 2 — Threat Recognition                                 │
          │ Conv2D(128 filters, 3×3, padding='same')                     │
          │     ↳ detects: 3-in-a-row, diagonal lines, near-wins        │
          │ BatchNormalization → Activation('relu')                       │
          └──────────────────────────────────────────────────────────────┘
       │
       ▼  ┌──────────────────────────────────────────────────────────────┐
          │ Block 3 — Tactical Pattern Assembly                          │
          │ Conv2D(128 filters, 3×3, padding='same')                     │
          │     ↳ detects: double threats, blocked lines, open-fours    │
          │ BatchNormalization → Activation('relu')                       │
          └──────────────────────────────────────────────────────────────┘
       │
       ▼  ┌──────────────────────────────────────────────────────────────┐
          │ Block 4 — High-Level Strategy                                │
          │ Conv2D(256 filters, 3×3, padding='same')                     │
          │     ↳ combines all lower features into strategic signals     │
          │ BatchNormalization → Activation('relu')                       │
          └──────────────────────────────────────────────────────────────┘
       │
       ▼
  GlobalAveragePooling2D    (replaces Flatten → reduces overfitting)
       │
       ▼
  Dense(128) → ReLU → Dropout(0.30)
       │
       ▼
  Dense(7) → Softmax
       │
       ▼
  P(col_0), P(col_1), ..., P(col_6)    ← probability over 7 columns
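In Keras, the diagram above corresponds to roughly the following. This is a sketch reconstructed from the description; exact layer options (and therefore the exact parameter count) may differ from the trained model.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_cnn():
    inputs = layers.Input(shape=(6, 7, 2))
    x = inputs
    for filters in (64, 128, 128, 256):  # Blocks 1-4: 64 -> 128 -> 128 -> 256 filters
        x = layers.Conv2D(filters, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
    x = layers.GlobalAveragePooling2D()(x)        # replaces Flatten to reduce overfitting
    x = layers.Dense(128, activation="relu")(x)
    x = layers.Dropout(0.30)(x)
    outputs = layers.Dense(7, activation="softmax")(x)  # P(col 0..6)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(3e-4),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```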

Hyperparameters (all verified from training logs)

| Parameter | Value |
|---|---|
| Optimizer | Adam |
| Initial learning rate | 3e-4 |
| LR schedule | ReduceLROnPlateau (factor=0.5, patience=2, min=1e-6) |
| Batch size | 64 |
| Max epochs | 50 |
| Early stopping | patience=5 on val_loss, restore best weights |
| Conv filters | 64 → 128 → 128 → 256 |
| Kernel size | 3×3 throughout |
| Regularization | BatchNorm + Dropout(0.30) + GlobalAvgPool |
| Total parameters | 553,353 |
| Trainable | 552,199 |

Training Log (extracted from notebook outputs)

  Epoch  Train Acc   Val Acc   Val Loss   LR
  ─────  ─────────   ───────   ────────   ──────────
    1      28.6%      44.5%     1.4578    3.0e-04
    2      45.8%      46.1%     1.3656    3.0e-04
    3      49.3%      48.4%     1.3219    3.0e-04
    5      55.8%      48.2%     1.4208    ↓ 1.5e-04  ← LR reduced (plateau)
    6      60.1%      51.7%     1.2707    1.5e-04
    8      65.1%     52.8% ★   1.2563    1.5e-04    ← best val loss
   10      69.4%      49.8%     1.3846    ↓ 7.5e-05  ← LR reduced again
   12      75.3%      53.3%     1.3517    ↓ 3.75e-05
   13      78.2%      53.3%     1.3714    3.75e-05
  ─────────────────────────────────────────────────
  Early stopped at epoch 13.  Best weights restored from epoch 8.
  Val Accuracy (best weights): 52.76%  |  Full dataset run: 63%

🟣 Vision Transformer Architecture

Our Transformer architecture was based on a Vision Transformer (ViT) style adapted for Connect 4. The structure consists of reshaping the board into 42 tokens, projecting each to a 128-dimensional embedding, adding a CLS token and trainable positional embeddings, passing through 4 Transformer encoder blocks (multi-head attention, residual connections, and MLP blocks), extracting the CLS token, and finally a dense head with 7-class softmax. With this setup, we got a validation accuracy of 60.32%.

Originally, Transformers were built for text. Then researchers adapted them for images, producing the Vision Transformer (ViT). We adapted that same idea for Connect 4. As mentioned, the Transformer sees the whole board at once — it breaks it into 42 small tokens (one per cell), converts each cell into a vector (the embedding step), and adds positional information so the model knows where each cell is. It then uses "attention" to allow every cell to interact with every other cell. The model can decide which other cells matter when analyzing a specific position — for example, a piece in column 3 might "pay attention" to pieces in column 2 and 4 to identify a possible diagonal threat. The CLS token acts as a summary notebook that gets to attend to all cells during this process. Once we extract it, it contains a global summary of the entire board state, and from that the model decides which column to play.

  Input (6, 7, 2)
       │
       ▼
  Reshape → 42 tokens of shape (2,)
  "Each of the 42 cells becomes one token. The network has no pre-baked idea
   of which cells are adjacent — it must learn spatial relationships from data."
       │
       ▼
  Dense(128) → 42-token sequence, each 128-dim    (token projection)
       │
       ▼
  Prepend [CLS] token → sequence length = 43
  "This learnable token acts as a 'global summary notebook',
   collecting information from every cell via attention."
       │
       ▼
  Add trainable positional embeddings (43 × 128)
       │
       ▼
  ┌────────────────────────────────────────────────┐
  │  Transformer Encoder Block  ×4                 │
  │                                                │
  │  ┌──────────────────────────────────────────┐  │
  │  │  Multi-Head Self-Attention               │  │
  │  │                                          │  │
  │  │  Every token queries every other token:  │  │
  │  │  "Does col 3 matter when I'm at col 4?"  │  │
  │  │  → learns diagonal threats, fork setups  │  │
  │  └──────────────────────────────────────────┘  │
  │               ↓ Residual + LayerNorm            │
  │  ┌──────────────────────────────────────────┐  │
  │  │  Feed-Forward MLP (expand → contract)    │  │
  │  └──────────────────────────────────────────┘  │
  │               ↓ Residual + LayerNorm            │
  └────────────────────────────────────────────────┘  × 4 blocks
       │
       ▼
  Extract [CLS] token   (shape: 128-dim vector)
       │
       ▼
  Dense(7) → Softmax
       │
       ▼
  P(col_0), ..., P(col_6)
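A Keras sketch of the architecture above, reconstructed from the description. The number of attention heads and the MLP expansion ratio are not stated in this README, so `num_heads=4` and `mlp_ratio=2` are assumptions; exact details may differ from the trained model.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_vit(embed_dim=128, num_blocks=4, num_heads=4, mlp_ratio=2):
    inputs = layers.Input(shape=(6, 7, 2))
    x = layers.Reshape((42, 2))(inputs)   # 42 cell tokens of shape (2,)
    x = layers.Dense(embed_dim)(x)        # token projection -> (42, 128)

    class AddClsAndPos(layers.Layer):
        """Prepend a learnable [CLS] token, then add trainable
        positional embeddings (43 x 128)."""
        def build(self, input_shape):
            self.cls = self.add_weight(name="cls", shape=(1, 1, embed_dim),
                                       initializer="zeros")
            self.pos = self.add_weight(name="pos", shape=(1, 43, embed_dim),
                                       initializer="random_normal")
        def call(self, tokens):
            cls = tf.repeat(self.cls, tf.shape(tokens)[0], axis=0)
            return tf.concat([cls, tokens], axis=1) + self.pos

    x = AddClsAndPos()(x)
    for _ in range(num_blocks):  # 4 encoder blocks, post-norm as diagrammed
        attn = layers.MultiHeadAttention(num_heads, embed_dim // num_heads)(x, x)
        x = layers.LayerNormalization()(x + attn)      # residual + LayerNorm
        mlp = layers.Dense(embed_dim * mlp_ratio, activation="relu")(x)
        mlp = layers.Dense(embed_dim)(mlp)
        x = layers.LayerNormalization()(x + mlp)       # residual + LayerNorm
    cls_summary = x[:, 0]                 # extract the [CLS] token
    outputs = layers.Dense(7, activation="softmax")(cls_summary)
    return tf.keras.Model(inputs, outputs)
```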

Hyperparameters (all verified from training logs)

| Parameter | Value |
|---|---|
| Optimizer | Adam (lr=3e-4) |
| Sequence length | 42 tokens (6×7 cells) |
| [CLS] token | Trainable, prepended to sequence |
| Token embedding dim | 128 |
| Positional embeddings | Trainable (43 × 128) |
| Encoder blocks | 4 |
| Max epochs | 60 |
| Data augmentation | Horizontal board flip (2× dataset) |
| Total parameters | 553,479 (all trainable) |

Training Log (60 full epochs — verified from notebook)

  Epoch  Train Acc   Val Acc   Val Loss
  ─────  ─────────   ───────   ────────
    1      28.3%      35.8%    1.5812
   10      44.7%      47.0%    1.3927
   20      49.7%      50.8%    1.3033
   30      52.5%      54.2%    1.2393
   40      55.5%      57.1%    1.1848
   50      57.9%      58.6%    1.1493
   59      59.8%      60.1%    1.1241
   60      59.9%      60.3% ✓  1.1237   ← final
  ─────────────────────────────────────
  Steady convergence — no early stop needed.
  Training accuracy and validation accuracy stay close → no overfitting.

🚀 Deployment

Infrastructure

  ┌───────────────────────────────────────────────┐
  │  AWS Lightsail                                │
  │                                               │
  │   Instance type : Linux/Unix, 1 GB RAM        │
  │   Purpose       : Inference only (no training)│
  │   Cost          : Free tier / minimal         │
  │                                               │
  │   ┌───────────────────────────────────────┐   │
  │   │  Docker Container                     │   │
  │   │  ├─ Python 3.10                       │   │
  │   │  ├─ tensorflow==2.12.1                │   │
  │   │  ├─ numpy                             │   │
  │   │  ├─ anvil-uplink==0.4.2               │   │
  │   │  ├─ cnn_savedmodel/    (4.4 MB)       │   │
  │   │  ├─ transformer_savedmodel/ (824 KB)  │   │
  │   │  └─ backend.py                        │   │
  │   │                                       │   │
  │   │  Startup sequence:                    │   │
  │   │  1. Connect to Anvil via Uplink key   │   │
  │   │  2. Load both models into memory once │   │
  │   │  3. wait_forever() — serves requests  │   │
  │   └───────────────────────────────────────┘   │
  └───────────────────────────────────────────────┘

Inference Request Lifecycle

  User clicks column 3
         │
         ▼
  Anvil encodes board as (6,7,2) float32 tensor
         │
         ▼
  anvil.server.call('get_move', board, 'cnn')
         │
         ▼  [encrypted Uplink tunnel]
         │
         ▼
  backend.py receives board
  _ensure_1_6_7_2(board)  →  shape (1, 6, 7, 2)
         │
         ├─── CNN selected ──► cnn_infer(board_tensor)
         │                      ↳ output: (1, 7) probability vector
         │                      ↳ argmax → column 4  (0–6)
         │
  return 4
         │
         ▼  [< 50ms round trip]
         │
  Anvil drops 🟡 in column 4, updates board

📁 Project Structure

connect4-ai/
│
├── 📄 README.md                              ← You are here
├── 📄 LICENSE                                ← MIT
├── 📄 .gitignore
│
├── 📂 data/
│   └── 📂 generator/
│       └── 🐍 mcts_self_play.py             ← MCTS self-play data generation
│                                               Parallelized across 21 CPU cores
│                                               Output: ~400K (board, move) pairs
│
├── 📂 training/
│   └── 📓 Connect4_AI_Training.ipynb        ← End-to-end training notebook
│                                               CNN + Transformer + full eval
│                                               Plots: accuracy, loss, win rates
│
├── 📂 backend/
│   ├── 🐍 backend.py                        ← Anvil Uplink inference server
│   ├── 🐳 Dockerfile                        ← Container definition
│   ├── 🐳 docker-compose.yml               ← Compose config
│   └── 📄 requirements.txt                 ← Pinned: tensorflow, numpy, anvil-uplink
│
├── 📂 models/
│   ├── 📂 cnn_savedmodel/                   ← CNN in TF SavedModel format (4.4 MB)
│   │   ├── saved_model.pb
│   │   └── variables/
│   ├── 📂 transformer_savedmodel/           ← Transformer in TF SavedModel (824 KB)
│   │   ├── saved_model.pb
│   │   └── variables/
│   └── 📦 connect4_transformer_v2_portable.h5  ← Transformer in Keras .h5
│
└── 📂 app/
    └── 📄 Connect4AIGrp26.yaml             ← Anvil frontend export (clone-able)

Note: Training dataset (connect4_400k_2channel.pkl, ~265 MB) excluded via .gitignore. Regenerate using data/generator/mcts_self_play.py or request access.


⚡ Getting Started

1 — Clone

git clone https://github.com/YOUR_USERNAME/connect4-ai.git
cd connect4-ai

2 — Run inference on a board

import numpy as np
import tensorflow as tf

# Load the CNN
model = tf.saved_model.load("models/cnn_savedmodel")
infer = model.signatures["serving_default"]
input_key = list(infer.structured_input_signature[1].keys())[0]

# Build any board: shape (6, 7, 2)
# Channel 0 = your pieces (+1), Channel 1 = opponent pieces (-1)
board = np.zeros((6, 7, 2), dtype=np.float32)
board[5, 3, 0] = 1.0   # your piece at bottom-center
board[5, 4, 1] = 1.0   # opponent piece next to it

x = board[np.newaxis, ...]  # add batch dim → (1, 6, 7, 2)
output = infer(**{input_key: tf.constant(x)})
probs  = list(output.values())[0].numpy()[0]

print(f"Recommended column : {np.argmax(probs)}")
print(f"Column probabilities: {np.round(probs, 3)}")

3 — Run the backend server locally

pip install -r backend/requirements.txt
# Set your Anvil Uplink key (replace the placeholder in backend.py)
python backend/backend.py

Expected output:

START: Backend Starting, about to connect to Anvil
Anvil Uplink connection established successfully
Loading models...
 CNN model loaded.
 Transformer model loaded.
Backend fully operational

4 — Docker

docker-compose -f backend/docker-compose.yml up --build
docker logs -f <container_id>

5 — Retrain from scratch

# 1. Generate data (CPU-intensive, uses multiprocessing)
python data/generator/mcts_self_play.py

# 2. Open and run training notebook
jupyter notebook training/Connect4_AI_Training.ipynb
# All models saved automatically to models/

🛠️ Tech Stack

  ┌────────────────┬──────────────────────────────────────────────────────┐
  │ Layer          │ Technology                                           │
  ├────────────────┼──────────────────────────────────────────────────────┤
  │ ML Framework   │ TensorFlow 2.12 / Keras                              │
  │ Data Engine    │ MCTS (Monte Carlo Tree Search) · NumPy · multiprocess│
  │ Architectures  │ CNN (Conv2D) · Vision Transformer (ViT-style)        │
  │ Model Format   │ TF SavedModel · Keras .h5                            │
  │ Backend        │ Python 3.10 · Anvil Uplink                           │
  │ Container      │ Docker · Docker Compose                              │
  │ Cloud          │ AWS Lightsail                                        │
  │ Frontend       │ Anvil (Python full-stack web framework)              │
  │ Training Env   │ Google Colab (GPU) · Local CPU (data generation)     │
  └────────────────┴──────────────────────────────────────────────────────┘

📜 License

MIT — see LICENSE.


Built with 🔴🟡 and a lot of MCTS iterations

If you liked this project, consider leaving a ⭐
