Scripts for orchestrating zero-downtime blue/green database migrations using AWS RDS and Route 53.
These scripts were originally developed for a production MySQL 5.7 to 8.0 major version upgrade, but the strategy and tooling apply to any blue/green database migration on RDS.
- Blog Post: How We Upgraded Our Core Database With Just 5 Minutes of Downtime - Detailed write-up of the migration strategy and execution
- Sequence Diagram - Visual representation of the exact order of steps
$ tree nonprod
.
├── 1.read-switchover
│ └── 1.point_read_dns_endpoints_to_green.sh
├── 2.read-rollback
│ ├── backup-dns-rollback
│ │ ├── 1.point_read_dns_endpoints_to_blue.sh
│ │ └── 2.kill_connections_on_green_readers.sh
│ └── readme.md
├── 3.prep-write-switchover
│ └── 1.lower_ttl.sh
├── 4.write-switchover
│ ├── 1.set_blue_to_read_only.sh
│ ├── 2.make_green_writable.sh
│ ├── 3.point_write_dns_endpoints_to_green.sh
│ └── 4.kill_connections_on_blue.sh
├── 5.write-rollback
│ ├── 1.set_green_to_read_only.sh
│ ├── 2.make_blueprime_writable.sh
│ ├── 3.point_all_4_dns_to_blueprime.sh
│ └── 4.kill_connections_on_green.sh
├── 6.post-upgrade
│ └── 1.raise_ttl.sh
└── 7.post-rollback
  └── 1.raise_ttl.sh
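The numeric prefixes on the directories and scripts above appear to encode execution order. As a minimal sketch of that convention (the helper names `numeric_prefix` and `ordered_steps` are hypothetical and not part of this repo), the scripts in one phase directory could be listed in step order like this:

```python
import re
from pathlib import Path
from typing import List

# Matches the leading step number in names such as "3.point_write_dns_endpoints_to_green.sh".
_PREFIX = re.compile(r"^(\d+)\.")


def numeric_prefix(path: Path) -> int:
    """Return the leading step number of a script or directory name."""
    match = _PREFIX.match(path.name)
    if match is None:
        raise ValueError(f"no numeric prefix in {path.name!r}")
    return int(match.group(1))


def ordered_steps(phase_dir: Path) -> List[Path]:
    """List the .sh scripts directly inside one phase directory, in step order."""
    scripts = [p for p in phase_dir.iterdir() if p.is_file() and p.suffix == ".sh"]
    # Sort on the integer prefix so that a step 10 would come after step 9,
    # which a plain string sort would not guarantee.
    return sorted(scripts, key=numeric_prefix)
```

Calling `ordered_steps(Path("nonprod/4.write-switchover"))` would return the four write-switchover scripts in the order they are numbered, which can be handy for printing a checklist before a switchover.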
Run the unit tests:
$ python -m unittest
Example output of update_record_ttl.py:
$ python update_record_ttl.py "qa.example.com" "mysql-core-batch-reader.qa.example.com." raise
Successfully updated TTL for mysql-core-batch-reader.qa.example.com. (Type: CNAME) to 60 seconds
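For readers curious how a TTL change like the one above can be made, here is a rough sketch using the boto3 Route 53 client. It is not the code in `update_record_ttl.py`: the function names are hypothetical, and it assumes the caller passes a hosted zone ID and a numeric TTL, whereas the real script takes a zone name and a `raise` argument.

```python
from typing import Any, Dict


def build_ttl_change_batch(record_set: Dict[str, Any], new_ttl: int) -> Dict[str, Any]:
    """Build a Route 53 ChangeBatch that UPSERTs the same record with a new TTL."""
    if "AliasTarget" in record_set:
        # Alias records do not carry their own TTL, so there is nothing to change.
        raise ValueError(f"{record_set['Name']} is an alias record and has no TTL")
    updated = dict(record_set, TTL=new_ttl)  # copy; the input dict is left untouched
    return {
        "Comment": f"Set TTL of {record_set['Name']} to {new_ttl}s",
        "Changes": [{"Action": "UPSERT", "ResourceRecordSet": updated}],
    }


def update_record_ttl(hosted_zone_id: str, record_name: str,
                      record_type: str, new_ttl: int) -> None:
    """Look up one record set and re-submit it with a different TTL."""
    import boto3  # imported lazily so the helper above works without AWS access

    client = boto3.client("route53")
    response = client.list_resource_record_sets(
        HostedZoneId=hosted_zone_id,
        StartRecordName=record_name,
        StartRecordType=record_type,
        MaxItems="1",
    )
    record_sets = response["ResourceRecordSets"]
    # The listing starts at the requested name, so confirm it is an exact match.
    if (not record_sets or record_sets[0]["Name"] != record_name
            or record_sets[0]["Type"] != record_type):
        raise LookupError(f"{record_name} ({record_type}) not found in {hosted_zone_id}")
    client.change_resource_record_sets(
        HostedZoneId=hosted_zone_id,
        ChangeBatch=build_ttl_change_batch(record_sets[0], new_ttl),
    )
    print(f"Successfully updated TTL for {record_name} "
          f"(Type: {record_type}) to {new_ttl} seconds")
```

The record name should be fully qualified with a trailing dot, as in the example above, because that is how Route 53 returns names. Copying the existing record set and changing only `TTL` keeps the record's value and any routing fields intact.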
Example output of kill_connections.py (dry run, then a real run):
$ python kill_connections.py --db-port 10012 --dry-run
Found 4 processes to kill
Dry run mode: No processes will be killed
Would kill process ID: 73821
Would kill process ID: 73818
Would kill process ID: 73815
Would kill process ID: 73816
$ python kill_connections.py --db-port 10012
Found 4 processes to kill
Killed processes: [73816]
Killed processes: [73821, 73818, 73815]
All kill commands have been sent
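The selection and dry-run logic behind output like the above might look roughly as follows. This is a sketch under stated assumptions, not the contents of `kill_connections.py`: the function names and the `PROTECTED_USERS` set are hypothetical, rows are assumed to come from `information_schema.processlist`, and the kill is assumed to go through the `mysql.rds_kill` stored procedure that Amazon RDS provides for ending a connection.

```python
from typing import Dict, Iterable, List

# Assumed accounts to leave alone (RDS internals and MySQL background threads);
# adjust this set for the real environment.
PROTECTED_USERS = frozenset({"rdsadmin", "system user", "event_scheduler"})


def select_process_ids(rows: Iterable[Dict[str, object]], own_id: int) -> List[int]:
    """Pick connection IDs to kill from processlist-style rows with ID and USER keys."""
    return [
        int(row["ID"])
        for row in rows
        # Skip protected accounts and the connection this tool itself is using.
        if row["USER"] not in PROTECTED_USERS and int(row["ID"]) != own_id
    ]


def kill_statements(process_ids: Iterable[int]) -> List[str]:
    """Build one CALL mysql.rds_kill(...) statement per connection ID."""
    return [f"CALL mysql.rds_kill({int(pid)})" for pid in process_ids]


def report(process_ids: List[int], dry_run: bool) -> List[str]:
    """Return the summary lines to print before any statement is executed."""
    lines = [f"Found {len(process_ids)} processes to kill"]
    if dry_run:
        lines.append("Dry run mode: No processes will be killed")
        lines.extend(f"Would kill process ID: {pid}" for pid in process_ids)
    return lines
```

Keeping selection and statement building separate from execution means a dry run and a real run share the same filtering, so the IDs shown in the dry run are chosen by the same rules as the ones later killed.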