
Commit d97e312

Code review
1 parent 91073d0 commit d97e312

4 files changed

Lines changed: 105 additions & 42 deletions


README.md

Lines changed: 7 additions & 2 deletions
@@ -4,7 +4,7 @@ Events over SQL.
 
 Simple, Reliable, Fast.
 
-Able to publish and consume thousands of events per second on a single PostgreSQL instance.
+Able to publish and consume thousands of events per second on a single Postgres instance.
 
 With sharding, it can easily support tens of thousands events per second for virtually endless scalability.
 
@@ -91,4 +91,9 @@ It's a rather acceptable tradeoff and easy to enforce at the library level.
 
 ## How to use it
 
-TODO: for now, check out benchmarks/app being an example app.
+TODO: for now, check out benchmarks/app being an example app.
+
+
+## How to get it
+
+TODO
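
The sharding mentioned in the README above (tens of thousands of events per second across several Postgres instances) relies on routing each event to exactly one shard. As a hedged illustration only — this diff does not show EventSQL's actual routing scheme, and the names below are made up — stable key-hash routing across three shards could look like:

```python
# Hypothetical sketch, NOT EventSQL's actual API: route each event key to one
# of three Postgres shards with a stable hash, so publish/consume load spreads
# across instances and a given key always lands on the same shard.
import zlib

SHARDS = ["events-db-0", "events-db-1", "events-db-2"]  # assumed shard aliases

def shard_for(key: str) -> str:
    # zlib.crc32 is deterministic across processes (unlike Python's builtin
    # hash(), which is randomized per run), so routing stays stable.
    return SHARDS[zlib.crc32(key.encode("utf-8")) % len(SHARDS)]

print(shard_for("order-123"))
```

The important property is determinism: every publisher and consumer must agree on which shard owns a key, otherwise ordering per key is lost.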

benchmarks/README.md

Lines changed: 93 additions & 35 deletions
@@ -4,60 +4,63 @@ If you care only about numbers, you can find them in the results dir.
 
 Some background and details:
 * all benchmarks were run on [DigitalOcean](https://www.digitalocean.com/) infrastructure
-* benchmarks were run with both single Postgres instance serving as events backend as well as multiple (sharding)
-* there are the following components:
-  * `app` - simple Spring Boot that uses EventSQL to consume events
-  * `runner` - script that uses EventSQL to publish events with set per second rate and amount and waits for consumers to finish consumption, gathering relevant stats (it's running benchmarks)
-  * `events-db` - Postgres serving as a backend for EventSQL; events are published and consumed from it.
-    Depending on the benchmark, we run it in one or a few (3) instances
-* most of the setup to run benchmarks is automated and described below, so it's fairly easy to reproduce
+* benchmarks were run with both single Postgres instance serving as the events backend as well as multiple (sharding)
+* we have the following components:
+  * `app` - simple Spring Boot that uses *EventSQL* to consume events
+  * `runner` - script that uses *EventSQL* to publish events with set per second rate and amount, waits for consumers to finish consumption and gathers relevant stats (it's running benchmarks)
+  * `events-db` - Postgres serving as a backend for *EventSQL* - events are published to and consumed from it;
+    depending on the benchmark, it's run in one or a few (3) instances
+* most of the setup to run benchmarks is automated and described below; it's fairly straightforward to reproduce
 
 ## Infrastructure
 
-Defined in `prepare_infra.py`; sometimes resources are limited by `docker run`, but essentially:
-* app (consumer) runs on 2 GB and with 2 CPUs (AMD) machine
-* each events-db runs on 8 GB and with 4 CPUs (AMD) machine
-* each benchmarks-runner runs alongside events-db, but is throttled to 2 GB memory and 2 CPUs
-* there is a basic firewall and virtual private network (vpc) setup, so that nobody is bothering us during tests
+Defined in the `prepare_infra.py` script; sometimes resources are limited by `docker run` command, but essentially:
+* *benchmarks-app (consumer)* runs on 2 GB and 2 CPUs (AMD) machine
+* each *events-db* runs on 8 GB and 4 CPUs (AMD) machine
+* each *benchmarks-runner* runs alongside *events-db*, but is throttled to 2 GB memory and 2 CPUs
+* there is a basic firewall and virtual private network (vpc) setup (`prepare_infra.py`), so that nobody is bothering us during benchmarks
 
 ## Requirements
 
-* DigitalOcean account - you might also use different infrastructure provider, but will need to adjust `prepare_infra.py` script accordingly or write your own
+* DigitalOcean account - you might also use a different infrastructure provider but will need to adjust `prepare_infra.py` script accordingly or write your own setup from scratch
 * Python 3 & Bash for scripts
 * Java 21 + compatible Maven version to build apps
-* Docker to dockerize them and run various command (scripts assume non-root, current user, access)
+* Docker to dockerize them and run various commands (scripts assume non-root, current user, access)
 
 ## Preparation
 
 ### Infra
 
-From scripts, dir, Python env setup:
+From scripts dir, Python env setup:
 ```
 bash init_python_env.bash
 source venv/bin/activate
 ```
 
-Prepare infra; this can take a while, since we are creating a few machines - one for the consumer app and three for multiple Postgres instances.
+The following might take a while, since we are creating a few machines - one for the consumer app and three for multiple Postgres instances:
 ```
 export DO_API_TOKEN=<your DigitalOcean API key>
-export SSH_KEY_FINGERPRINT=<fingerprint of your ssh key, uploaded to DigitalOcean, giving you ssh access to machines>
+export SSH_KEY_FINGERPRINT=<fingerprint of your ssh key, uploaded to DigitalOcean, giving you ssh access to created machines>
 
 python prepare_infra.py
 ```
 
-We right now have 4 machines connected with each other by the vpc.
+After it finishes, on the DigitalOcean UI we should see something like this:
+![droplets](droplets.png)
+
+We right now have four machines connected to each other by the vpc.
 To each we have access, using ssh public key authentication, as the `eventsql` user.
-Infrastructure is ready, let's prepare the apps.
+Infrastructure is now ready, let's prepare apps.
 
-### Build apps
+### Apps
 
 Let's build `events-db` (from scripts dir again):
 ```
 export APP=events-db
 bash build_and_package.bash
 ```
 
-Let's build `app`:
+Let's build `app` (consumer):
 ```
 export APP=app
 export DB0_HOST="<db0 private ip>"
@@ -66,37 +69,37 @@ export DB2_HOST="<db2 private ip>"
 bash build_and_package.bash
 ```
 
-Private ips can be taken from DigitalOcean UI - only they will work, public ips will not, since we have set up a firewall blocking traffic of this kind.
+Private ips can be taken from the DigitalOcean UI - only they will work, public ips will not, since we have set up a firewall blocking traffic of this kind.
 
 Finally, let's build `runner`:
 ```
 export APP=runner
 bash build_and_package.bash
 ```
 
-### Deploy apps
+### Deployment
 
-As all apps are now ready, let's deploy them!
+As all apps are now packaged and ready, let's deploy them!
 
-We deploy by copying gzipped Docker images + load and run scripts to the target machines.
+We deploy by copying gzipped Docker images along with load and run scripts to the target machines.
 
-Three events-dbs:
+Three `events-dbs`:
 ```
 export EVENTS_DB0_HOST=<ip of events-db-0 machine>
 export EVENTS_DB1_HOST=<ip of events-db-1 machine>
 export EVENTS_DB2_HOST=<ip of events-db-2 machine>
 bash deploy_events_dbs.bash
 ```
 
-App:
+`app`:
 ```
 export APP_HOST=<ip of consumer app machine>
 bash deploy_app.bash
 ```
 
 All dbs and app are running now.
-With runners it is slightly different - we will copy them to target machines, but not run them just yet.
-They will run on the same machines dbs are hosted; each db has a corresponding benchmarks runner:
+With `benchmark-runners` it is slightly different - we will copy them to target machines but not run just yet.
+They will run on the same machines dbs are hosted; each db has a corresponding benchmarks-runner:
 ```
 export EVENTS_DB0_HOST=<ip of events-db-0 machine>
 export EVENTS_DB1_HOST=<ip of events-db-1 machine>
@@ -112,16 +115,25 @@ Everything is now ready to run various benchmarks.
 
 Let's start with single db cases.
 
-First, copy and run `collect_docker_stats.bash` script to one of the events dbs machine and start collecting them:
+First, copy and run `collect_docker_stats.bash` script to one of the events dbs machine and start collecting stats:
 ```
 scp collect_docker_stats.bash eventsql@<events-db-ip>:/home/eventsql
 ssh eventsql@<events-db-ip>
 bash collect_docker_stats.bash
+
+Removing previous stats file, if exists...
+
+Collecting docker stats to /tmp/docker_stats.txt...
+Stats collected, sleeping for 10 s...
+...
+Collecting docker stats to /tmp/docker_stats.txt...
+Stats collected, sleeping for 10 s...
+...
 ```
 
-You might do the same for the consumer machine to have those stats as well.
+You might do the same for the consumer machine to have its stats as well.
 
-Finally, run various benchmarks:
+Finally, let's run various benchmarks:
 ```
 export RUNNER_HOST=<events-db-ip>
 export EVENTS_RATE=1000
@@ -140,9 +152,9 @@ bash run_single_db_benchmark.bash
 
 ### Multiple dbs
 
-It's almost the same, difference being that we need to repeat steps on the all machines, more or less simultaneously.
+It's almost the same, the difference being that we need to repeat steps on all machines, more or less simultaneously.
 
-To simplify it, I've prepared a script that does it.
+For simplicity, I've prepared a script that does it.
 So, all we have to do is:
 ```
 export RUNNER0_HOST=<events-db-0-ip>
@@ -164,4 +176,50 @@ We have 3 dbs (shards), so real rates are:
 3 * 5000 = 15 000 per second
 3 * 10000 = 30 000 per second
 ```
-...which is quite a lot!
+...which is quite a lot!
+
+As you can see in the results, we got pretty close to these rates:
+```
+...
+
+Publishing 300000 events with 5000 per second rate took: PT1M3.436S, which means 4729 per second rate
+3 runner instances were running in parallel, so the real rate was 14187 per second for 900000 events
+
+...
+
+Waiting for consumption....
+
+...
+
+Consumer of 2 partition is at the event 2588597, but latest event is 2590000; waiting for 1s...
+Consumer of 3 partition is at the event 2589971, but latest event is 2589996; waiting for 1s...
+Consumer of 3 partition is at the event 2589971, but latest event is 2589996; waiting for 1s...
+
+...
+
+Consuming 300000 events with 5000 per second rate took: PT1M6.875S, which means 4486 per second rate
+3 runner instances were running in parallel, so the real rate was 13458 per second for 900000 events
+
+...
+
+...
+
+Publishing 600000 events with 10000 per second rate took: PT1M11.099S, which means 8438 per second rate
+3 runner instances were running in parallel, so the real rate was 25314 per second for 1800000 events
+
+...
+
+Waiting for consumption....
+
+...
+
+Consumer of 0 partition is at the event 3187024, but latest event is 3189999; waiting for 1s...
+
+...
+
+Consuming 600000 events with 10000 per second rate took: PT1M12.561S, which means 8268 per second rate
+3 runner instances were running in parallel, so the real rate was 24804 per second for 1800000 events
+
+...
+
+```
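
The per-second rates in the log above can be sanity-checked with a few lines of arithmetic. Numbers are taken from the first publishing entry; the truncation to whole events per second is an assumption that happens to match the logged figures:

```python
# Recompute the first publishing result from the benchmark log:
# 300000 events per runner in PT1M3.436S (63.436 s), 3 runners in parallel.
events_per_runner = 300_000
duration_s = 63.436          # PT1M3.436S expressed in seconds
runners = 3

per_runner_rate = int(events_per_runner / duration_s)  # truncated to whole events/s
aggregate_rate = per_runner_rate * runners             # all shards combined
total_events = events_per_runner * runners

print(per_runner_rate, aggregate_rate, total_events)  # 4729 14187 900000
```

This matches the log exactly: 4729 events per second per runner, 14187 per second aggregate, for 900000 events total.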

benchmarks/droplets.png

111 KB

benchmarks/scripts/init_machine.bash

Lines changed: 5 additions & 5 deletions
@@ -1,7 +1,7 @@
 #!/bin/bash
 set -euo pipefail
 
-# Create user and setup passwordless sudo to simplify admin tasks
+# Create user and set up passwordless sudo to simplify admin tasks
 useradd --create-home --shell "/bin/bash" --groups sudo "_user_placeholder_"
 echo "_user_placeholder_ ALL=(ALL) NOPASSWD: ALL" | EDITOR='tee -a' visudo
 
@@ -21,24 +21,24 @@ sed --in-place 's/^KerberosAuthentication.*/KerberosAuthentication no/g' /etc/ss
 sed --in-place 's/^GSSAPIAuthentication.*/GSSAPIAuthentication no/g' /etc/ssh/sshd_config
 if sshd -t -q; then systemctl restart ssh; fi
 
-# Install docker & allow non sudo access
+# Install docker & allow non-sudo access
 apt-get update
 apt-get install ca-certificates curl
 install -m 0755 -d /etc/apt/keyrings
 curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
 chmod a+r /etc/apt/keyrings/docker.asc
 
-# Add the repository to Apt sources:
+# Add the repository to apt sources:
 echo \
 "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
 $(. /etc/os-release && echo "${UBUNTU_CODENAME:-$VERSION_CODENAME}") stable" | \
 sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
 apt-get update
 
-# Finally, install Docker:
+# Finally, install docker:
 apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin -y
 
-# Allow non root access to a docker
+# Allow non-root access to docker
 usermod -aG docker _user_placeholder_
 # limit docker logs size
 echo '{
