Mini Google Drive

A fully distributed file storage system built for the Distributed Systems course at SVNIT Surat. Implements leader election, round-robin load balancing, data replication, shared metadata, and automatic failover.

Tech stack: Java, Spring Boot, Docker, MySQL, Nginx


Architecture

Client (Browser / curl)
         |
         ▼
   ┌─────────────┐
   │    Nginx    │  ← Reverse proxy. Single entry point. Auto-failover.
   │   :80       │
   └──────┬──────┘
          │ routes to active master
   ┌──────┴──────────────────────┐
   │         Layer 1             │  LEADER ELECTION
   │  ┌──────────┐ ┌──────────┐  │
   │  │ Master-1 │ │ Master-2 │  │  Primary + Backup
   │  │  :8080   │ │  :8090   │  │  Promotes in 5s on failure
   │  └──────────┘ └──────────┘  │
   └──────────────┬──────────────┘
                  │ round-robin
   ┌──────────────┴──────────────┐
   │         Layer 2             │  ROUND ROBIN LOAD BALANCING
   │  ┌──────────┐ ┌──────────┐  │
   │  │ Server-1 │ │ Server-2 │  │  File logic + MySQL access
   │  │  :8081   │ │  :8082   │  │
   │  └──────────┘ └──────────┘  │
   └──────────────┬──────────────┘
                  │ replication factor = 2
   ┌──────────────┴──────────────┐
   │         Layer 3             │  FILE STORAGE
   │  ┌──────────┐ ┌──────────┐  │
   │  │Storage-1 │ │Storage-2 │  │  Actual file bytes
   │  │  :8091   │ │  :8092   │  │
   │  └──────────┘ └──────────┘  │
   └─────────────────────────────┘
          ↕
   ┌─────────────┐
   │    MySQL    │  ← Shared metadata. All server nodes read/write same DB.
   │   :3306     │
   └─────────────┘

Project Structure

mini-google-drive/
├── master-server/                          # Spring Boot: master nodes
│   ├── src/main/java/com/minicloud/master/
│   │   ├── controller/
│   │   │   └── MasterController.java       # REST endpoints, leader check, round-robin routing
│   │   ├── service/
│   │   │   ├── LeaderElectionService.java  # Primary/backup election logic
│   │   │   ├── ServerNodeRouter.java       # Round-robin across server nodes
│   │   │   ├── ConsistentHashService.java  # Hash ring for storage node selection
│   │   │   ├── MasterService.java          # Core master logic
│   │   │   └── NodeHealthService.java      # Storage node health checks
│   │   └── model/
│   │       ├── FileMetadata.java
│   │       └── StorageNode.java
│   ├── src/main/resources/
│   │   ├── static/
│   │   │   ├── dashboard.html              # Live monitoring dashboard
│   │   │   └── index.html                  # File upload/download web UI
│   │   └── application.properties
│   └── Dockerfile
│
├── server-node/                            # Spring Boot: NEW in Phase 2
│   ├── src/main/java/com/minidrive/servernode/
│   │   ├── FileController.java             # REST: /upload /download /files /health
│   │   ├── FileService.java                # Upload with replication, download with fallback
│   │   ├── FileMetadata.java               # JPA entity → file_metadata table
│   │   ├── FileMetadataRepository.java     # Spring Data JPA repository
│   │   ├── StorageNodeClient.java          # HTTP client for storage nodes
│   │   └── ServerNodeApplication.java      # Main class
│   ├── src/main/resources/
│   │   └── application.properties
│   └── Dockerfile
│
├── storage-node/                           # Spring Boot: file storage (Phase 1)
│   ├── src/main/java/com/minicloud/storagenode/
│   │   ├── controller/FileController.java  # /files/upload /files/download /health
│   │   └── service/FileStorageService.java
│   └── Dockerfile
│
├── docker-compose.yml                      # All 9 containers
├── nginx.conf                              # Reverse proxy config
├── init.sql                                # MySQL table creation
└── README.md

Quick Start

Prerequisites

  • Docker Desktop running
  • Git

Run Everything

git clone https://github.com/modi02/mini-google-drive
cd mini-google-drive
docker-compose build --no-cache
docker-compose up

Wait ~60 seconds for all 9 containers to be healthy.

Open in Browser

| URL | Description |
| --- | --- |
| http://localhost/ | File upload/download web UI |
| http://localhost/dashboard.html | Live monitoring dashboard |

API Reference

All requests go through Nginx at http://localhost (port 80).

Master Endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /master/health | Leader status, peer alive status |
| GET | /master/status | Full cluster: all nodes, alive lists, leader info |
| POST | /master/upload | Upload file (multipart/form-data, field: file) |
| GET | /master/download/{fileName} | Download file by name |
| GET | /master/files | List all files from MySQL |

Example curl Commands

# Upload
echo "hello world" > test.txt
curl -X POST http://localhost/master/upload -F file=@test.txt

# Download
curl -O http://localhost/master/download/test.txt

# List files
curl http://localhost/master/files

# Check cluster status
curl http://localhost/master/status

# Check individual masters
curl http://localhost:8080/master/health   # master-1
curl http://localhost:8090/master/health   # master-2

Docker Services

| Container | Image | Ports | IP |
| --- | --- | --- | --- |
| nginx | nginx:1.25-alpine | 80:80 | 172.20.0.8 |
| master-1 | build: ./master-server | 8080:8080 | 172.20.0.2 |
| master-2 | build: ./master-server | 8090:8080 | 172.20.0.9 |
| server-node-1 | build: ./server-node | 8081:8081 | 172.20.0.5 |
| server-node-2 | build: ./server-node | 8082:8082 | 172.20.0.6 |
| storage-node-1 | build: ./storage-node | 8091:8091 | 172.20.0.3 |
| storage-node-2 | build: ./storage-node | 8092:8092 | 172.20.0.4 |
| mysql | mysql:8.0 | 3306:3306 | 172.20.0.10 |

Fault Tolerance Demo

1. Leader Election: Kill Primary Master

# Check current leader
curl http://localhost:8080/master/health
# → {"isLeader":true, "status":"UP"}

# Kill primary
docker kill master-1

# Wait 6 seconds; the backup promotes itself
curl http://localhost:8090/master/health
# → {"isLeader":true, "status":"UP"}  ← backup is now leader!

# Bring primary back; it rejoins as backup
docker start master-1
curl http://localhost:8080/master/health
# → {"isLeader":false, "status":"UP"}  ← back as backup
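The promotion behaviour shown in this demo can be sketched as a timeout on peer heartbeats. This is a minimal illustration under stated assumptions, not the project's LeaderElectionService: the class name FailoverMonitor, its methods, and the explicit clock parameter are all hypothetical, and the 5-second timeout is taken from the README's "promotes in 5s" figure.

```java
import java.time.Duration;

/** Hypothetical sketch of primary/backup failover; not the project's actual code. */
final class FailoverMonitor {
    private final long timeoutMillis;
    private long lastPeerHeartbeatMillis;
    private boolean leader;

    FailoverMonitor(boolean startsAsLeader, Duration timeout, long nowMillis) {
        this.leader = startsAsLeader;
        this.timeoutMillis = timeout.toMillis();
        this.lastPeerHeartbeatMillis = nowMillis;
    }

    /** Call whenever a health ping to the peer master succeeds. */
    synchronized void onPeerHeartbeat(long nowMillis) {
        lastPeerHeartbeatMillis = nowMillis;
    }

    /**
     * A backup promotes itself once the peer has been silent for the full timeout.
     * A node that is already leader stays leader, so a restarted peer rejoins as backup.
     */
    synchronized boolean isLeader(long nowMillis) {
        if (!leader && nowMillis - lastPeerHeartbeatMillis >= timeoutMillis) {
            leader = true;
        }
        return leader;
    }
}
```

Passing the current time in as a parameter keeps the sketch deterministic; a real service would more likely read a clock inside a scheduled task. Because nothing here coordinates the two masters, both can briefly believe they are leader, which is the split-brain window a timeout-only scheme leaves open.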

2. Storage Node Fault Tolerance

# Upload a file
echo "test data" > test.txt
curl -X POST http://localhost/master/upload -F file=@test.txt

# Kill one storage node
docker stop storage-node-1

# Download still works; falls back to storage-node-2
curl -O http://localhost/master/download/test.txt
cat test.txt  # → test data
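The fallback in this demo amounts to trying each replica in turn until one answers. The sketch below assumes the replica list is the comma-separated value stored in the storage_nodes column; ReplicaReader and the fetch callback are hypothetical names standing in for whatever FileService and StorageNodeClient actually do.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.BiFunction;
import java.util.stream.Collectors;

/** Hypothetical sketch of download-with-fallback; not the project's FileService. */
final class ReplicaReader {

    /** Splits a storage_nodes value such as "http://a:8091,http://b:8092" into URLs. */
    static List<String> parseNodes(String storageNodesColumn) {
        return Arrays.stream(storageNodesColumn.split(","))
                .map(String::trim)
                .filter(url -> !url.isEmpty())
                .collect(Collectors.toList());
    }

    /**
     * Tries each replica in order and returns the first successful payload.
     * fetch(nodeUrl, fileName) is an assumed callback that throws when a node is down.
     */
    static Optional<byte[]> download(String storageNodesColumn, String fileName,
                                     BiFunction<String, String, byte[]> fetch) {
        for (String nodeUrl : parseNodes(storageNodesColumn)) {
            try {
                byte[] data = fetch.apply(nodeUrl, fileName);
                if (data != null) {
                    return Optional.of(data);
                }
            } catch (RuntimeException nodeUnavailable) {
                // This replica failed; fall through to the next one.
            }
        }
        return Optional.empty(); // every replica failed
    }
}
```

Injecting the fetch step as a function keeps the retry loop testable without a running storage node.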

3. Round Robin Load Balancing

# Upload multiple files and watch docker logs
# Requests alternate between server-node-1 and server-node-2
docker logs master-1 2>&1 | grep "Round robin selected"
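The alternation seen in these logs is what an AtomicInteger counter taken modulo the node count produces. A minimal sketch follows; the class name RoundRobinRouter and the node URLs are illustrative assumptions, and the project's ServerNodeRouter may additionally skip nodes that fail health checks.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of round-robin selection; not the project's ServerNodeRouter. */
final class RoundRobinRouter {
    private final List<String> nodes;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinRouter(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one server node is required");
        }
        this.nodes = List.copyOf(nodes);
    }

    /** Returns the next node URL; safe to call from concurrent request threads. */
    String next() {
        // floorMod keeps the index non-negative after the counter wraps past Integer.MAX_VALUE.
        int index = Math.floorMod(counter.getAndIncrement(), nodes.size());
        return nodes.get(index);
    }
}
```

Using getAndIncrement avoids a lock on the hot path, and floorMod guards against the negative indices a plain % would yield once the counter overflows.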

Distributed Computing Concepts

| Concept | Implementation |
| --- | --- |
| Consistent Hashing | Storage node selection; minimizes remapping when nodes change |
| Data Replication | Every file stored on all storage nodes (replication factor = 2) |
| Leader Election | Simplified Raft: primary/backup masters, promotes in 5s |
| Fault Tolerance | Download falls back to replica if storage node is down |
| Load Balancing | Round-robin across server nodes using AtomicInteger |
| Shared State | MySQL as distributed metadata store; all nodes in sync |
| Health Monitoring | Periodic pings every 5-10s, automatic alive-list maintenance |
| Reverse Proxy | Nginx: single entry point, transparent master failover |
| CAP Theorem | AP system: Available + Partition Tolerant |
| Eventual Consistency | MySQL metadata may briefly lag under high load |
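To make the consistent-hashing row concrete, here is a small hash ring sketch. It is an illustration under assumptions, not the project's ConsistentHashService: the HashRing name, the 100 virtual nodes per server, and the choice of SHA-256 as the ring hash are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical hash ring sketch; not the project's ConsistentHashService. */
final class HashRing {
    private static final int VIRTUAL_NODES = 100; // assumed value; smooths the key distribution
    private final TreeMap<Long, String> ring = new TreeMap<>();

    HashRing(List<String> nodes) {
        for (String node : nodes) {
            add(node);
        }
    }

    void add(String node) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    void remove(String node) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    /** Walks clockwise from the key's position to the first virtual node, wrapping at the end. */
    String nodeFor(String fileName) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no storage nodes registered");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(fileName));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    /** Uses the first 8 bytes of SHA-256 as the ring position. */
    private static long hash(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            long position = 0;
            for (int i = 0; i < 8; i++) {
                position = (position << 8) | (digest[i] & 0xFF);
            }
            return position;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is unavailable", e);
        }
    }
}
```

When a node is added, the only keys that change owner are those that now land on the new node's virtual points; every other key keeps its previous owner, which is the "minimizes remapping" property named in the table.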

Database Schema

-- File metadata (one row per uploaded file)
CREATE TABLE file_metadata (
    id           BIGINT AUTO_INCREMENT PRIMARY KEY,
    file_name    VARCHAR(255) NOT NULL,
    file_size    BIGINT NOT NULL,
    content_type VARCHAR(100),
    checksum     VARCHAR(64),
    storage_nodes VARCHAR(500),   -- "http://storage-node-1:8091,http://storage-node-2:8092"
    uploaded_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    status       VARCHAR(20) DEFAULT 'ACTIVE'
);

-- Server node registry
CREATE TABLE server_nodes (
    id             BIGINT AUTO_INCREMENT PRIMARY KEY,
    node_url       VARCHAR(255) NOT NULL UNIQUE,
    status         VARCHAR(20) DEFAULT 'UP',
    last_heartbeat TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    registered_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
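The checksum column is 64 characters wide, which matches the length of a hex-encoded SHA-256 digest. Assuming that is what it stores (the README does not say), a value for it could be computed as in this sketch; the Checksums class and sha256Hex method are hypothetical names.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper; assumes the checksum column holds a hex-encoded SHA-256 digest. */
final class Checksums {

    /** Returns the 64-character lowercase hex SHA-256 digest of the file content. */
    static String sha256Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is unavailable", e);
        }
    }
}
```

Comparing this value after a download against the stored one would be one way to detect a corrupted or partially written replica.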

Phase Comparison

| Feature | Phase 1 | Phase 2 |
| --- | --- | --- |
| Master nodes | 1 (SPOF) | 2 (leader election) |
| Metadata storage | In-memory HashMap | Shared MySQL |
| Middle layer | None | Server nodes x2 |
| Entry point | Direct :8080 | Nginx :80 |
| Master failover | Manual | Automatic (5s) |
| Containers | 3 | 9 |
| File persistence across restart | No | Yes |
| Multi-node consistency | No | Yes |

Known Limitations

  • MySQL SPOF: MySQL itself has no replication. Production fix: MySQL Galera Cluster or etcd
  • Split-brain window: 5-second window where both masters may think they are leader. Production fix: full Raft consensus
  • No partial write recovery: if a server node crashes mid-upload, the file may be partially stored
  • Synchronous replication: upload waits for all storage nodes. Slower but consistent

Course: Distributed Systems, B.Tech CSE, SVNIT Surat
Academic Year: 2025-26

About

A distributed file storage system using Java, Spring Boot & Docker, featuring consistent hashing, file replication across nodes, fault tolerance, and a live health dashboard. Inspired by Amazon S3, HDFS, and Dropbox.
