Commit 074e91a

Merge pull request #4 from BinaryIgor/benchmarks2

Benchmarks 2.0

2 parents 2fe4d8f + d97e312, commit 074e91a

43 files changed
Lines changed: 1708 additions & 635 deletions

.gitignore

Lines changed: 4 additions & 0 deletions
```diff
@@ -1,6 +1,10 @@
 # Maven/Java
 target/
 out/
+# Python/Scripts
+venv/
+.venv/
+dist/
 # Intellij
 .idea/
 *.iml
```

README.md

Lines changed: 17 additions & 0 deletions
```diff
@@ -2,6 +2,14 @@
 
 Events over SQL.
 
+Simple, Reliable, Fast.
+
+Able to publish and consume thousands of events per second on a single Postgres instance.
+
+With sharding, it can easily support tens of thousands of events per second for virtually endless scalability.
+
+For scalability details, see [benchmarks](/benchmarks/README.md).
+
 ## How it works
 
 We just need to have three tables:
@@ -80,3 +88,12 @@ WHERE topic = :topic AND name = :c_name AND partition = 0;
 The limitation being that if a consumer is partitioned, it must have the exact same number of partitions as in the topic
 definition.
 It's a rather acceptable tradeoff and easy to enforce at the library level.
+
+## How to use it
+
+TODO: for now, check out benchmarks/app, which serves as an example app.
+
+
+## How to get it
+
+TODO
```

TODO.md

Lines changed: 1 addition & 2 deletions
```diff
@@ -1,10 +1,9 @@
-* performance benchmarks on infra & scripts to reproduce them
 * usage examples
 * just pub/sub
 * giving access to event tables as a means of a simple export - since they are all there
 * expiring events/TTL?
 * compact topics - unique key
 * join, aka streams
-* more elaborate definitions change support
+* more elaborate definitions change support - especially around partitions growth & shrinkage
 * JavaDocs
 * Support schemas init in registry - why require schemas from users, if it is always the same?
```

benchmarks/README.md

Lines changed: 219 additions & 7 deletions
````diff
@@ -1,13 +1,225 @@
 # Benchmarks
 
-Various benchmarks to show performance of EventSQL.
+If you care only about numbers, you can find them in the results dir.
 
-## Queries
+Some background and details:
+* all benchmarks were run on [DigitalOcean](https://www.digitalocean.com/) infrastructure
+* benchmarks were run with both a single Postgres instance serving as the events backend and multiple instances (sharding)
+* we have the following components:
+  * `app` - a simple Spring Boot app that uses *EventSQL* to consume events
+  * `runner` - a script that uses *EventSQL* to publish events at a set per-second rate and amount, waits for consumers to finish consumption and gathers relevant stats (it runs the benchmarks)
+  * `events-db` - Postgres serving as a backend for *EventSQL* - events are published to and consumed from it;
+    depending on the benchmark, it's run in one or a few (3) instances
+* most of the setup to run benchmarks is automated and described below; it's fairly straightforward to reproduce
+
````
````diff
+## Infrastructure
+
+Defined in the `prepare_infra.py` script; sometimes resources are limited by the `docker run` command, but essentially:
+* *benchmarks-app (consumer)* runs on a 2 GB, 2 CPUs (AMD) machine
+* each *events-db* runs on an 8 GB, 4 CPUs (AMD) machine
+* each *benchmarks-runner* runs alongside *events-db*, but is throttled to 2 GB of memory and 2 CPUs
+* there is a basic firewall and virtual private network (vpc) setup (`prepare_infra.py`), so that nobody bothers us during benchmarks
+
````
````diff
+## Requirements
+
+* DigitalOcean account - you might also use a different infrastructure provider, but you will need to adjust the `prepare_infra.py` script accordingly or write your own setup from scratch
+* Python 3 & Bash for scripts
+* Java 21 + a compatible Maven version to build apps
+* Docker to dockerize them and run various commands (scripts assume non-root, current-user access)
+
````
````diff
+## Preparation
+
+### Infra
+
+From the scripts dir, Python env setup:
+```
+bash init_python_env.bash
+source venv/bin/activate
+```
+
+The following might take a while, since we are creating a few machines - one for the consumer app and three for multiple Postgres instances:
+```
+export DO_API_TOKEN=<your DigitalOcean API key>
+export SSH_KEY_FINGERPRINT=<fingerprint of your ssh key, uploaded to DigitalOcean, giving you ssh access to created machines>
+
+python prepare_infra.py
+```
+
+After it finishes, we should see something like this in the DigitalOcean UI:
+![droplets](droplets.png)
+
+We now have four machines connected to each other by the vpc.
+We have access to each of them, as the `eventsql` user, using ssh public key authentication.
+Infrastructure is now ready, let's prepare apps.
+
````
````diff
+### Apps
+
+Let's build `events-db` (from the scripts dir again):
+```
+export APP=events-db
+bash build_and_package.bash
+```
+
+Let's build `app` (consumer):
+```
+export APP=app
+export DB0_HOST="<db0 private ip>"
+export DB1_HOST="<db1 private ip>"
+export DB2_HOST="<db2 private ip>"
+bash build_and_package.bash
+```
+
+Private ips can be taken from the DigitalOcean UI - only they will work; public ips will not, since we have set up a firewall blocking traffic of this kind.
+
+Finally, let's build `runner`:
+```
+export APP=runner
+bash build_and_package.bash
+```
+
````
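All three builds above follow the same `APP=<name> bash build_and_package.bash` pattern; a hypothetical convenience sketch (not part of this commit) that prints the command for each app as a dry run:

```shell
# Dry-run sketch: print the build command for each of the three apps.
# Drop the echo (and export APP instead) to actually build them.
for APP in events-db app runner; do
  echo "APP=$APP bash build_and_package.bash"
done
```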
````diff
+### Deployment
+
+As all apps are now packaged and ready, let's deploy them!
+
+We deploy by copying gzipped Docker images, alongside load and run scripts, to the target machines.
+
+Three `events-dbs`:
+```
+export EVENTS_DB0_HOST=<ip of events-db-0 machine>
+export EVENTS_DB1_HOST=<ip of events-db-1 machine>
+export EVENTS_DB2_HOST=<ip of events-db-2 machine>
+bash deploy_events_dbs.bash
+```
+
+`app`:
+```
+export APP_HOST=<ip of consumer app machine>
+bash deploy_app.bash
+```
+
+All dbs and the app are running now.
+With `benchmark-runners` it is slightly different - we will copy them to the target machines, but not run them just yet.
+They will run on the same machines the dbs are hosted on; each db has a corresponding benchmarks-runner:
+```
+export EVENTS_DB0_HOST=<ip of events-db-0 machine>
+export EVENTS_DB1_HOST=<ip of events-db-1 machine>
+export EVENTS_DB2_HOST=<ip of events-db-2 machine>
+bash deploy_runners.bash
+```
+
+Everything is now ready to run various benchmarks.
+
````
````diff
+## Running benchmarks
+
+### Single db
+
+Let's start with the single db cases.
+
+First, copy the `collect_docker_stats.bash` script to one of the events-db machines and start collecting stats:
+```
+scp collect_docker_stats.bash eventsql@<events-db-ip>:/home/eventsql
+ssh eventsql@<events-db-ip>
+bash collect_docker_stats.bash
+
+Removing previous stats file, if exists...
+
+Collecting docker stats to /tmp/docker_stats.txt...
+Stats collected, sleeping for 10 s...
+...
+Collecting docker stats to /tmp/docker_stats.txt...
+Stats collected, sleeping for 10 s...
+...
+```
+
+You might do the same for the consumer machine, to collect its stats as well.
+
+Finally, let's run various benchmarks:
+```
+export RUNNER_HOST=<events-db-ip>
+export EVENTS_RATE=1000
+# EVENTS_RATE * 60 for benchmark to last approximately 1 minute
+export EVENTS_TO_PUBLISH=60000
+bash run_single_db_benchmark.bash
+
+export EVENTS_RATE=5000
+export EVENTS_TO_PUBLISH=300000
+bash run_single_db_benchmark.bash
+
+export EVENTS_RATE=10000
+export EVENTS_TO_PUBLISH=600000
+bash run_single_db_benchmark.bash
+```
+
````
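The `EVENTS_RATE * 60` comment above can also be expressed as shell arithmetic, so `EVENTS_TO_PUBLISH` always matches the intended ~1 minute run (a small convenience sketch, not part of the commit):

```shell
# Derive EVENTS_TO_PUBLISH from the target rate and duration
# instead of hard-coding it.
EVENTS_RATE=5000
BENCHMARK_SECONDS=60
export EVENTS_TO_PUBLISH=$((EVENTS_RATE * BENCHMARK_SECONDS))
echo "$EVENTS_TO_PUBLISH"   # 300000
```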
````diff
+### Multiple dbs
+
+It's almost the same, the difference being that we need to repeat the steps on all machines, more or less simultaneously.
+
+For simplicity, I've prepared a script that does it.
+So, all we have to do is:
+```
+export RUNNER0_HOST=<events-db-0-ip>
+export RUNNER1_HOST=<events-db-1-ip>
+export RUNNER2_HOST=<events-db-2-ip>
+
+export EVENTS_RATE=5000
+# EVENTS_RATE * 60 for benchmark to last approximately 1 minute
+export EVENTS_TO_PUBLISH=300000
+bash run_multiple_dbs_benchmark.bash
+
+export EVENTS_RATE=10000
+export EVENTS_TO_PUBLISH=600000
+bash run_multiple_dbs_benchmark.bash
+```
+
+We have 3 dbs (shards), so the real rates are:
+```
+3 * 5000 = 15 000 per second
+3 * 10000 = 30 000 per second
 ```
````
````diff
-select id, convert_from(value, 'UTF8')::json from account_created_event limit 10;
-create index account_created_event_email
-on account_created_event ((encode(value, 'escape')::json->>'email'));
+...which is quite a lot!
+
+As you can see in the results, we got pretty close to these rates:
 ```
+...
+
+Publishing 300000 events with 5000 per second rate took: PT1M3.436S, which means 4729 per second rate
+3 runner instances were running in parallel, so the real rate was 14187 per second for 900000 events
+
+...
+
+Waiting for consumption....
+
+...
+
+Consumer of 2 partition is at the event 2588597, but latest event is 2590000; waiting for 1s...
+Consumer of 3 partition is at the event 2589971, but latest event is 2589996; waiting for 1s...
+Consumer of 3 partition is at the event 2589971, but latest event is 2589996; waiting for 1s...
+
+...
+
+Consuming 300000 events with 5000 per second rate took: PT1M6.875S, which means 4486 per second rate
+3 runner instances were running in parallel, so the real rate was 13458 per second for 900000 events
+
+...
+
+...
+
+Publishing 600000 events with 10000 per second rate took: PT1M11.099S, which means 8438 per second rate
+3 runner instances were running in parallel, so the real rate was 25314 per second for 1800000 events
+
+...
+
+Waiting for consumption....
+
+...
+
+Consumer of 0 partition is at the event 3187024, but latest event is 3189999; waiting for 1s...
+
+...
+
+Consuming 600000 events with 10000 per second rate took: PT1M12.561S, which means 8268 per second rate
+3 runner instances were running in parallel, so the real rate was 24804 per second for 1800000 events
+
+...
 
-## TODO
-* sharding version tests -> endpoint to see when it's ready
+```
````
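The per-second rates in the logs above are simply total events divided by elapsed seconds; a quick sanity check of the first publishing result (PT1M3.436S is 63.436 s):

```shell
# Recompute the logged rates: 300000 events in 63.436 s,
# then times 3 for the parallel runners.
awk 'BEGIN {
  rate = int(300000 / 63.436)
  print rate          # per-runner rate
  print rate * 3      # real rate across 3 runners
}'
```

This reproduces the logged 4729 and 14187 per second.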
Lines changed: 56 additions & 0 deletions
```diff
@@ -0,0 +1,56 @@
+#!/bin/bash
+set -euo pipefail
+
+app="benchmarks-app"
+app_dir="app"
+tag="${TAG:-latest}"
+tagged_image="${app}:${tag}"
+
+echo "Creating package in dist directory for $tagged_image image..."
+echo "Preparing dist dir..."
+
+rm -r -f dist
+mkdir dist
+
+echo "Building jar..."
+
+mvn clean install
+
+echo "Building image..."
+
+docker build . -t "$tagged_image"
+
+gzipped_image_path="dist/$app.tar.gz"
+
+echo "Image built, exporting it to $gzipped_image_path, this can take a while..."
+
+docker save "$tagged_image" | gzip > ${gzipped_image_path}
+
+echo "Image exported, preparing scripts..."
+
+export DB0_HOST=${DB0_HOST:-localhost}
+export DB0_URL="jdbc:postgresql://$DB0_HOST:5432/events"
+export DB0_ENABLED=${DB0_ENABLED:-true}
+
+export DB1_HOST=${DB1_HOST:-localhost}
+export DB1_URL="jdbc:postgresql://$DB1_HOST:5432/events"
+export DB1_ENABLED=${DB1_ENABLED:-true}
+
+export DB2_HOST=${DB2_HOST:-localhost}
+export DB2_URL="jdbc:postgresql://$DB2_HOST:5432/events"
+export DB2_ENABLED=${DB2_ENABLED:-true}
+
+export app=$app
+export tag=$tag
+export run_cmd="docker run -d \\
+-e DB0_URL=\"$DB0_URL\" -e DB0_ENABLED=\"$DB0_ENABLED\" \\
+-e DB1_URL=\"$DB1_URL\" -e DB1_ENABLED=\"$DB1_ENABLED\" \\
+-e DB2_URL=\"$DB2_URL\" -e DB2_ENABLED=\"$DB2_ENABLED\" \\
+--network host --restart unless-stopped \\
+--name $app $tagged_image"
+
+cd ..
+envsubst '${app} ${tag}' < scripts/template_load_and_run_app.bash > $app_dir/dist/load_and_run_app.bash
+envsubst '${app} ${run_cmd}' < scripts/template_run_app.bash > $app_dir/dist/run_app.bash
+
+echo "Package prepared."
```
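The script leans on two bash idioms worth calling out: `${TAG:-latest}`-style defaults, and escaped quotes in `run_cmd`, so that variable values are baked in at package time while the quotes stay literal for later execution. In isolation (the jdbc URL below is a made-up example):

```shell
# 1) ${VAR:-default}: fall back when the variable is unset or empty.
unset TAG
echo "${TAG:-latest}"    # prints: latest
TAG=1.2.3
echo "${TAG:-latest}"    # prints: 1.2.3

# 2) \" inside the double-quoted string keeps the quote character literal,
#    while $DB0_URL is expanded immediately (hypothetical URL):
DB0_URL="jdbc:postgresql://10.114.0.2:5432/events"
run_cmd="docker run -e DB0_URL=\"$DB0_URL\""
echo "$run_cmd"
# prints: docker run -e DB0_URL="jdbc:postgresql://10.114.0.2:5432/events"
```

This is why the generated `run_app.bash` can be executed later, on a machine where `DB0_URL` is no longer set.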

benchmarks/app/build_and_run.bash

Lines changed: 0 additions & 12 deletions
This file was deleted.
