- Constant rate
- Throughput at a given latency
- Latency at a given throughput
- Pitfalls
- Coordinated Omission: CO_USER_GROUP, CO_PDF
- Why don't I get the throughput I benchmarked?
- Performance is not composable: if you have the performance of service A and of service B, you cannot derive the performance of the two combined
- Synthetic benchmarking has real limits - the best data is obtained from production load
- Always use JMH for Java micro-benchmarks, always!
- Don't aggregate percentiles
- Don't average an average (or any other derived statistic)
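The percentile pitfall above can be shown with a small self-contained sketch (all numbers are synthetic, chosen only for illustration): the average of per-node p99s can be wildly different from the p99 of the merged raw samples.

```java
import java.util.Arrays;

public class PercentileAggregation {
    // Nearest-rank percentile: sort and index. Fine for a sketch;
    // real tools (e.g. HdrHistogram) record percentiles properly.
    static double percentile(double[] samples, double q) {
        double[] s = samples.clone();
        Arrays.sort(s);
        int idx = (int) Math.ceil(q * s.length) - 1;
        return s[Math.max(idx, 0)];
    }

    public static void main(String[] args) {
        // Node A: 100 fast responses (1 ms).
        double[] a = new double[100];
        Arrays.fill(a, 1.0);
        // Node B: 90 fast responses, 10 very slow ones (100 ms).
        double[] b = new double[100];
        Arrays.fill(b, 1.0);
        Arrays.fill(b, 90, 100, 100.0);

        // WRONG: averaging per-node p99s.
        double avgOfP99s = (percentile(a, 0.99) + percentile(b, 0.99)) / 2;

        // RIGHT: compute the percentile over the merged raw samples.
        double[] merged = new double[200];
        System.arraycopy(a, 0, merged, 0, 100);
        System.arraycopy(b, 0, merged, 100, 100);
        double mergedP99 = percentile(merged, 0.99);

        System.out.println("average of per-node p99s: " + avgOfP99s); // 50.5
        System.out.println("p99 of merged samples:    " + mergedP99); // 100.0
    }
}
```

The averaged number (50.5 ms) describes no request anyone actually experienced; the merged p99 (100 ms) does.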
See: Performance Methodology I, Performance Methodology II
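The coordinated-omission pitfall listed above can also be illustrated with a tiny deterministic simulation (all service times and rates are made up for illustration): a closed-loop load generator waits for each response before sending the next request, so a server stall quietly pauses the generator and only one slow sample is recorded. Measuring from the *intended* send time at a fixed rate exposes the queueing delay the stall caused.

```java
import java.util.Arrays;

public class CoordinatedOmissionDemo {
    // Simulated service times in ms: steady 1 ms, one 100 ms stall.
    static double[] serviceTimes() {
        double[] s = new double[1000];
        Arrays.fill(s, 1.0);
        s[500] = 100.0;
        return s;
    }

    // Closed-loop (naive) view: the generator waits for each response,
    // so recorded latency equals service time and queueing is hidden.
    static long naiveCountOver(double thresholdMs) {
        return Arrays.stream(serviceTimes()).filter(t -> t > thresholdMs).count();
    }

    // Corrected view: latency is measured from the INTENDED send time
    // at a fixed rate, so requests delayed behind the stall count as slow.
    static long correctedCountOver(double intervalMs, double thresholdMs) {
        double[] svc = serviceTimes();
        double prevDone = 0;
        long count = 0;
        for (int i = 0; i < svc.length; i++) {
            double intended = i * intervalMs;           // fixed-rate schedule
            double start = Math.max(intended, prevDone); // queue behind stall
            double done = start + svc[i];
            if (done - intended > thresholdMs) count++;
            prevDone = done;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("samples over 10 ms (naive):     " + naiveCountOver(10.0));
        System.out.println("samples over 10 ms (corrected): " + correctedCountOver(10.0, 10.0));
    }
}
```

At a 10 ms send interval, the naive view records a single slow sample (the stall itself), while the corrected view records ten: the stalled request plus every request queued behind it. This is the correction HdrHistogram-style tools apply.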
A proper test environment:
- Relevant: reproduces the phenomena (production like, relevant data volume and veracity)
- Isolated: leaves out unwanted effects
- Measurable: provides metrics
- Reliable: produces consistent results (usage patterns)
Pre-testing:
- define perf goals/requirements
- have a baseline, or establish one first
- Install + configure app to same specs as production
- Setup monitoring
- Kill everything on the system that is not running in production, or better yet test on the production system itself (not necessarily while it serves production traffic)
- Spike test to ensure correctness
- Actual test (when working with Java, run for at least 30 minutes so the JVM becomes hot)
- Collect and validate data
- Repeat
- Start with testing just a single node in the cluster
Mainly Shipilev's (JMH: the lesser of two evils, The Art of Java Benchmarking) and Alexandrescu's (Writing Fast Code I, Writing Fast Code II) thoughts.
Benchmarks are experiments
- Computer Science: Functional requirements, often very abstract
- Software (Performance) Engineering: Exploring complex interactions between hardware, software and data. Based on empirical evidence (real/natural science)
Experiments require control, and they require a model from which we derive our tests based on a hypothesis.
E.g.: based on our current understanding of the system, we assume that X will help us achieve better latency. We control our environment and change only X.
We have to continuously understand, refine or reject our performance model.
Say you have the performance of A and the performance of B. If you put those two together, the performance is NOT A + B. We cannot derive what the combined performance will be (it can be almost anything).
Optimizations distort the performance model. The performance of a complex system is too complex to predict: there are too many unclear interdependencies (hardware, OS, compiler, runtime, outside influences, ...).
Benchmarking is the (endless) fight against optimizations (therefore a good benchmarking harness must manage optimizations).
Example: What's the performance of new Object()?
Answer: You cannot really tell; it could be allocated in a TLAB, it may need to go to the LOB, or it may be scalarized or the allocation completely eliminated by the JIT.
Every new optimization can/will break our performance model.
The minimum contains the least noise; it is the fastest YOUR code can run.
Without a baseline you don't have anything to compare against.
Low latency cannot be achieved at very high bandwidth: very high bandwidth means you need to sacrifice latency.
Most benchmarks require warmup: waiting for transient responses to settle down (JIT compilation, OS scheduling, ...)
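A minimal sketch of why warmup matters (the printed timings are machine-dependent, so treat them as illustrative only): run the same deterministic workload in successive batches and time each one. On a JVM the first batches are typically slower while the code runs interpreted and the JIT compiles the hot path; a harness like JMH automates exactly this warmup phase for you.

```java
public class WarmupDemo {
    // Deterministic workload so correctness can be checked
    // independently of how fast any particular batch runs.
    static double work(int n) {
        double acc = 0;
        for (int i = 1; i <= n; i++) acc += Math.sqrt(i);
        return acc;
    }

    public static void main(String[] args) {
        for (int batch = 0; batch < 5; batch++) {
            long t0 = System.nanoTime();
            double r = work(2_000_000);
            long t1 = System.nanoTime();
            // Early batches usually run slower (interpreter, then JIT tiers);
            // only the settled, later batches are worth reporting.
            System.out.printf("batch %d: %.1f ms (result %.0f)%n",
                              batch, (t1 - t0) / 1e6, r);
        }
    }
}
```

Hand-rolled loops like this only show the transient; they are not a substitute for a proper harness, which also handles dead-code elimination and statistical reporting.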