# Benchmarks

If you care only about numbers, you can find them in the results dir.

Some background and details:
* all benchmarks were run on [DigitalOcean](https://www.digitalocean.com/) infrastructure
* benchmarks were run both with a single Postgres instance serving as the events backend and with multiple instances (sharding)
* there are the following components:
  * `app` - a simple Spring Boot app that uses EventSQL to consume events
  * `runner` - a script that uses EventSQL to publish events at a set per-second rate and amount, then waits for consumers to finish consumption, gathering relevant stats (it runs the benchmarks)
  * `events-db` - Postgres serving as the backend for EventSQL; events are published to and consumed from it.
    Depending on the benchmark, we run it in one or a few (3) instances
* most of the setup needed to run the benchmarks is automated and described below, so it's fairly easy to reproduce

## Infrastructure

Defined in `prepare_infra.py`; in some cases resources are additionally limited by `docker run`, but essentially:
* the app (consumer) runs on a 2 GB, 2 CPU (AMD) machine
* each events-db runs on an 8 GB, 4 CPU (AMD) machine
* each benchmarks runner runs alongside an events-db, but is throttled to 2 GB of memory and 2 CPUs
* there is a basic firewall and virtual private cloud (VPC) setup, so that nobody bothers us during tests
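
The runner throttling mentioned above can be done with standard `docker run` resource flags; a sketch, assuming a hypothetical image name `eventsql/runner` (the actual names live in the run scripts):
```
docker run -d --name runner --memory 2g --cpus 2 eventsql/runner
```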

## Requirements

* DigitalOcean account - you might also use a different infrastructure provider, but then you will need to adjust the `prepare_infra.py` script accordingly or write your own
* Python 3 & Bash for the scripts
* Java 21 + a compatible Maven version to build the apps
* Docker to dockerize them and run various commands (scripts assume non-root, current-user access)

## Preparation

### Infra

From the scripts dir, set up the Python env:
```
bash init_python_env.bash
source venv/bin/activate
```
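
For context, a minimal sketch of what `init_python_env.bash` presumably does (the exact steps may differ):
```
python3 -m venv venv                  # create an isolated environment in ./venv
source venv/bin/activate              # activate it for the current shell
if [ -f requirements.txt ]; then
  pip install -r requirements.txt     # install the scripts' dependencies
fi
```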

Prepare the infra; this can take a while, since we are creating a few machines - one for the consumer app and three for multiple Postgres instances.
```
export DO_API_TOKEN=<your DigitalOcean API key>
export SSH_KEY_FINGERPRINT=<fingerprint of your ssh key, uploaded to DigitalOcean, giving you ssh access to machines>

python prepare_infra.py
```

We now have 4 machines connected to each other by the VPC.
We have access to each of them as the `eventsql` user, using ssh public key authentication.
The infrastructure is ready; let's prepare the apps.

### Build apps

Let's build `events-db` (from the scripts dir again):
```
export APP=events-db
bash build_and_package.bash
```

Let's build `app`:
```
export APP=app
export DB0_HOST="<db0 private ip>"
export DB1_HOST="<db1 private ip>"
export DB2_HOST="<db2 private ip>"
bash build_and_package.bash
```

Private IPs can be taken from the DigitalOcean UI. Only they will work; public IPs will not, since we have set up a firewall blocking this kind of traffic.

Finally, let's build `runner`:
```
export APP=runner
bash build_and_package.bash
```

### Deploy apps

As all apps are now ready, let's deploy them!

We deploy by copying gzipped Docker images, plus load and run scripts, to the target machines.
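
For reference, the deploy scripts follow roughly this pattern (a sketch with assumed file and image names, not the scripts' exact contents):
```
docker save eventsql/app | gzip > app.tar.gz
scp app.tar.gz run_app.bash eventsql@$APP_HOST:/home/eventsql
ssh eventsql@$APP_HOST "gunzip -c app.tar.gz | docker load && bash run_app.bash"
```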

Three events-dbs:
```
export EVENTS_DB0_HOST=<ip of events-db-0 machine>
export EVENTS_DB1_HOST=<ip of events-db-1 machine>
export EVENTS_DB2_HOST=<ip of events-db-2 machine>
bash deploy_events_dbs.bash
```

App:
```
export APP_HOST=<ip of consumer app machine>
bash deploy_app.bash
```

All dbs and the app are running now.
With the runners it is slightly different - we will copy them to the target machines, but not run them just yet.
They will run on the same machines the dbs are hosted on; each db has a corresponding benchmarks runner:
```
export EVENTS_DB0_HOST=<ip of events-db-0 machine>
export EVENTS_DB1_HOST=<ip of events-db-1 machine>
export EVENTS_DB2_HOST=<ip of events-db-2 machine>
bash deploy_runners.bash
```

Everything is now ready to run various benchmarks.

## Running benchmarks

### Single db

Let's start with the single db cases.

First, copy the `collect_docker_stats.bash` script to one of the events-db machines and run it to start collecting stats:
```
scp collect_docker_stats.bash eventsql@<events-db-ip>:/home/eventsql
ssh eventsql@<events-db-ip>
bash collect_docker_stats.bash
```
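
A stats-collection loop of this kind can be as simple as the following sketch (not necessarily what `collect_docker_stats.bash` actually does):
```
# append a snapshot of per-container CPU and memory usage every second
while true; do
  docker stats --no-stream --format "{{.Name}} {{.CPUPerc}} {{.MemUsage}}" >> docker_stats.log
  sleep 1
done
```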

You might do the same for the consumer machine to have those stats as well.

Finally, run various benchmarks:
```
export RUNNER_HOST=<events-db-ip>
export EVENTS_RATE=1000
# EVENTS_RATE * 60 for benchmark to last approximately 1 minute
export EVENTS_TO_PUBLISH=60000
bash run_single_db_benchmark.bash

export EVENTS_RATE=5000
export EVENTS_TO_PUBLISH=300000
bash run_single_db_benchmark.bash

export EVENTS_RATE=10000
export EVENTS_TO_PUBLISH=600000
bash run_single_db_benchmark.bash
```
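
The `EVENTS_RATE * 60` rule of thumb generalizes; for any target duration, the amount to publish can be derived as:
```
# derive EVENTS_TO_PUBLISH so the benchmark lasts roughly DURATION_SECONDS
EVENTS_RATE=10000
DURATION_SECONDS=60
EVENTS_TO_PUBLISH=$((EVENTS_RATE * DURATION_SECONDS))
echo "$EVENTS_TO_PUBLISH"  # 600000
```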

### Multiple dbs

It's almost the same, the difference being that we need to repeat the steps on all the machines, more or less simultaneously.

To simplify this, I've prepared a script that does it.
So, all we have to do is:
```
export RUNNER0_HOST=<events-db-0-ip>
export RUNNER1_HOST=<events-db-1-ip>
export RUNNER2_HOST=<events-db-2-ip>

export EVENTS_RATE=5000
# EVENTS_RATE * 60 for benchmark to last approximately 1 minute
export EVENTS_TO_PUBLISH=300000
bash run_multiple_dbs_benchmark.bash

export EVENTS_RATE=10000
export EVENTS_TO_PUBLISH=600000
bash run_multiple_dbs_benchmark.bash
```

We have 3 dbs (shards), so the real rates are:
```
3 * 5000 = 15 000 per second
3 * 10000 = 30 000 per second
```
...which is quite a lot!
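
The same math in shell, for an arbitrary shard count:
```
# effective cluster-wide publish rate: per-shard rate times number of shards
SHARDS=3
PER_SHARD_RATE=10000
TOTAL_RATE=$((SHARDS * PER_SHARD_RATE))
echo "$TOTAL_RATE events per second"  # 30000 events per second
```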