Postgres_GKE PR2-sysbench - Standalone PostgreSQL Sysbench benchmark on GKE by manojcns · Pull Request #6563 · GoogleCloudPlatform/PerfKitBenchmarker

manojcns · 2026-03-29T21:41:09Z

Summary

This PR introduces a standalone PostgreSQL Sysbench benchmark for GKE, enabling repeatable OLTP performance testing with Sysbench workloads against a single-instance PostgreSQL deployment on Kubernetes.

This is the second in a series of three PRs for GKE PostgreSQL benchmarking:

PR1 (merged): Core GKE infrastructure hardening (HugePages flag gating,
LoadBalancer endpoint refactor, additive Kubernetes cluster methods)
PR2 (this PR): Standalone PostgreSQL Sysbench benchmark on GKE
PR3 (upcoming): High-Availability PostgreSQL benchmark using CloudNativePG operator

All new files. Zero modifications to existing benchmark modules.

Files Changed

1. `perfkitbenchmarker/linux_benchmarks/postgres_sysbench_gke_benchmark.py` (New)

Deploys PostgreSQL as a Kubernetes StatefulSet using a Jinja2-rendered manifest
Client pod runs Sysbench from within the cluster (no external VM needed)
Configurable storage classes: hyperdisk-balanced, pd-ssd
Supports optimization profiles ('baseline', 'infra-tuned', 'kernel-tuned', 'postgres-tuned', 'infra+postgres', 'infra+postgres+hugepages') covering shared_buffers, HugePages, kernel params, and WAL tuning
HugePages provisioned via --system-config-from-file on the GKE nodepool — existing
benchmarks that do not set gke_node_system_config are completely unaffected
All flags namespaced under postgres_gke_* — no collisions with existing flags

2. `data/container/postgres_sysbench/` (New — 2 manifest templates)

postgres_all.yaml.j2: StatefulSet + Service + PVC for the Postgres server pod
client_pod.yaml.j2: Sysbench client pod spec

3. `docs/` (New — 2 documentation files)

GKE_PostgreSQL_Quickstart_generic.MD: Quickstart guide with example PKB commands
Technical_Architecture_PostgreSQL_PKB.md: Architecture overview

Backward Compatibility

No existing files modified. All changes are purely additive new files.
Flag namespace postgres_gke_* does not conflict with any existing PKB flags.
HugePages logic is gated: it activates only when an optimization profile with a
hugepages key is selected. Benchmarks that do not select such a profile are unaffected.
sysbench_benchmark.py (VM-based MySQL/Postgres) is not touched — the new
benchmark is GKE-only and independently implemented.

hankfreund · 2026-03-30T16:10:35Z

+```
+
+
+## HA (CloudNativePG) Tests


Again, please remove all HA references.

Acknowledged, will update

hankfreund · 2026-03-30T16:18:25Z

+```
+
+### Architecture & Logic
+1.  **Pod as VM Abstraction**: PKB's Kubernetes provider treats Kubernetes pods as Virtual Machines. When the benchmark runs, PKB provisions:


You mention "Pod as VM", but it looks like the code supports both that as well as k8s-native.

Is there a strong reason to support both? I'd prefer just the k8s-native way, since that's the way users would run a real workload.

Acknowledged, will update code

hankfreund · 2026-03-30T16:21:05Z

+*   **v1 (Infrastructure)**: Uses Container-Optimized OS (COS) for nodes and Ubuntu 24.04 for the client.
+*   **v2 (Startup)**: Uses Ubuntu node image and removes the init container for faster startup (at the cost of less robust permission handling).
+*   **v3 (Kernel)**: Applies sysctl tuning (`vm.swappiness=1`, `vm.dirty_ratio=10`, etc.) to the node.
+*   **v4 (HugePages)**: Enables HugePages (2MB) on the node and configures PostgreSQL (`huge_pages=on`) to use them. This reduces TLB misses and improves memory management efficiency.
+*   **v6 (Postgres Tuning)**: Applies aggressive PostgreSQL configuration tuning (e.g., `shared_buffers=35GB`, `effective_io_concurrency=200`, `max_worker_processes=32`).
+*   **v1+v6+v4 (All-in-One)**: Combines Infrastructure, Postgres Tuning, and HugePages for maximum performance.
+*   **v1+v6+v4+hostnetwork (HostNetwork Optimized)**: Extends the "All-in-One" profile by enabling Host Networking (`hostNetwork: true`) for the PostgreSQL pods. This bypasses the Kubernetes CNI/Overlay network stack, allowing the database to use the node's native network interface for maximum throughput and reduced latency.
+


Can the profiles be renamed using words instead of arbitrary numbers? It'd be easier to understand something like infra+hugepages+pg (or similar).

Hey Hank, this can be done. Would the names below work? Thanks

v1 -> infra-tuned (COS nodes + Ubuntu clients)
v2 -> fast-startup (No init container)
v3 -> kernel-tuned (Sysctl params)
v4 -> hugepages (HugePages on)
v6 -> postgres-tuned (Aggressive PG memory/worker tuning)
v1+v6 -> infra+postgres
v1+v6+v4 -> infra+postgres+hugepages
v1+v6+v4+hostnetwork -> infra+postgres+hugepages+hostnetwork

Looks great, thanks!
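
The rename agreed in this thread can be written down as a lookup table. This is a minimal sketch: `LEGACY_PROFILE_NAMES` and `rename_profile` are hypothetical names used for illustration and are not taken from the PR's code.

```python
# Hypothetical mapping from the numbered profile names to the descriptive
# names proposed in this thread. The dict and helper are illustrative only.
LEGACY_PROFILE_NAMES = {
    'v1': 'infra-tuned',
    'v2': 'fast-startup',
    'v3': 'kernel-tuned',
    'v4': 'hugepages',
    'v6': 'postgres-tuned',
    'v1+v6': 'infra+postgres',
    'v1+v6+v4': 'infra+postgres+hugepages',
    'v1+v6+v4+hostnetwork': 'infra+postgres+hugepages+hostnetwork',
}


def rename_profile(legacy_name: str) -> str:
  """Returns the descriptive name for a legacy numbered profile."""
  try:
    return LEGACY_PROFILE_NAMES[legacy_name]
  except KeyError:
    raise ValueError(f'Unknown legacy profile: {legacy_name!r}') from None
```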

hankfreund · 2026-03-30T16:31:00Z

+      requests:
+        cpu: "4"
+        memory: "10Gi"
+      limits:
+        cpu: "8"
+        memory: "20Gi"


I understand the original intent of this benchmark was very narrow in scope, but I think it'd be worthwhile to provide configurability so users can run the benchmark generically on other machine types, rather than being limited to the original ones tested.

Acknowledged, let me review
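
One way to make the pod sizing configurable is to build the `resources` stanza from parameters and pass the result into the Jinja2 template, rather than hardcoding it in the manifest. The sketch below assumes a hypothetical helper `build_resources`; how the values would reach it (flags, benchmark config) is left open.

```python
def build_resources(cpu_request, memory_request, cpu_limit=None,
                    memory_limit=None):
  """Builds a Kubernetes container `resources` dict.

  Hypothetical helper: limits fall back to the requests when not given.
  """
  return {
      'requests': {'cpu': str(cpu_request), 'memory': memory_request},
      'limits': {
          'cpu': str(cpu_limit if cpu_limit is not None else cpu_request),
          'memory': memory_limit or memory_request,
      },
  }
```

Calling `build_resources(4, '10Gi', 8, '20Gi')` reproduces the values currently hardcoded in the manifest above.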

hankfreund · 2026-03-30T16:34:20Z

+    - name: PGUSER
+      value: benchmark
+    - name: PGPASSWORD
+      value: {{ password }}


Can a k8s secret be used instead? The client is exposing the password here to anyone that can kubectl describe pod.

Acknowledged, let me review
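
A sketch of the Secret-based approach, for reference. It builds a `v1` Secret object and an env entry that points at it through `secretKeyRef`, so `kubectl describe pod` shows the reference rather than the value. The helper names and the secret name `postgres-credentials` are assumptions, not part of the PR. Note that the base64 step is only the encoding the Secret `data` field requires, not encryption.

```python
import base64
import json


def build_password_secret(name: str, password: str) -> dict:
  """Builds a Kubernetes Secret object holding the database password."""
  encoded = base64.b64encode(password.encode('utf-8')).decode('ascii')
  return {
      'apiVersion': 'v1',
      'kind': 'Secret',
      'metadata': {'name': name},
      'type': 'Opaque',
      'data': {'password': encoded},
  }


def password_env_from_secret(secret_name: str, key: str = 'password') -> dict:
  """Builds the PGPASSWORD env entry that reads from the Secret."""
  return {
      'name': 'PGPASSWORD',
      'valueFrom': {'secretKeyRef': {'name': secret_name, 'key': key}},
  }


if __name__ == '__main__':
  # kubectl accepts JSON manifests, so json.dumps is enough for a sketch.
  print(json.dumps(build_password_secret('postgres-credentials', 'example')))
  print(json.dumps(password_env_from_secret('postgres-credentials')))
```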

hankfreund · 2026-03-30T16:47:34Z

Is this used anywhere? Can it be removed? I see the node-config.yaml next, which is the preferred way of handling this setup.

hankfreund · 2026-03-30T16:57:20Z

+
+    # Wait a bit for resources to be created
+    logging.info('Waiting 30 seconds for resources to be created...')
+    time.sleep(30)


Better to poll for availability with timeout than to sleep arbitrarily.
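
For illustration, a generic poll-with-timeout helper could look like the sketch below. `poll_until` is a hypothetical name, and the predicate would wrap whatever availability check fits (for Kubernetes resources, `kubectl wait --for=condition=Ready --timeout=...` is another option that avoids a hand-written loop).

```python
import time


def poll_until(predicate, timeout_s, interval_s=2.0, sleep=time.sleep,
               clock=time.monotonic):
  """Calls `predicate` until it returns True or `timeout_s` elapses.

  Raises TimeoutError if the condition never becomes true. `sleep` and
  `clock` are injectable so the loop can be tested without real waiting.
  """
  deadline = clock() + timeout_s
  while True:
    if predicate():
      return
    if clock() >= deadline:
      raise TimeoutError(f'Condition not met within {timeout_s} seconds')
    sleep(interval_s)
```

Compared with a fixed `time.sleep(30)`, this returns as soon as the resources exist and fails loudly when they never appear.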

hankfreund · 2026-03-30T16:57:58Z

+
+        # Give it more time to stabilize (important for large shared_buffers)
+        logging.info('Waiting 60 seconds for PostgreSQL to fully stabilize...')
+        time.sleep(60)


Is there a way to poll here instead of waiting?
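
One candidate readiness check is `pg_isready`, which exits with status 0 only when the server is accepting connections. The sketch below runs it inside the server pod through `kubectl exec`; the function names, the `default` namespace, and the pod name used in the example are assumptions. A check like this could be called repeatedly with a deadline instead of sleeping for 60 seconds.

```python
import subprocess


def pg_isready_command(pod_name, user, namespace='default',
                       connect_timeout_s=3):
  """Builds a kubectl command that runs pg_isready inside the pod."""
  return [
      'kubectl', 'exec', pod_name, '--namespace', namespace, '--',
      'pg_isready', '--username', user, '--timeout', str(connect_timeout_s),
  ]


def postgres_is_ready(pod_name, user, run=subprocess.run):
  """Returns True when pg_isready reports the server accepts connections.

  `run` is injectable for testing; it defaults to subprocess.run.
  """
  result = run(
      pg_isready_command(pod_name, user), capture_output=True, check=False
  )
  return result.returncode == 0
```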

manojcns · 2026-04-10T01:39:11Z

@hankfreund - this PR is ready for re-review; please take a look when time permits

hankfreund

Thanks Manoj! I'll ping the team to take a look.

jellyfishcake · 2026-04-27T20:24:22Z

+
+# PostgreSQL configuration flags
+flags.DEFINE_string(
+    'postgres_gke_shared_buffers',


could you just have 2 sets of defaults for baseline and optimized? This will limit the choices for the pkb users. See SHARED_BUFFERS_CONF in perfkitbenchmarker/linux_packages/postgresql.py.
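
A rough sketch of what two fixed sets of defaults could look like, using only the values visible in this diff (`shared_buffers` 15GB for baseline and 35GB for optimized, `max_connections` 1000). The names `POSTGRES_DEFAULTS` and `get_postgres_settings` are hypothetical, and this does not reproduce the structure of `SHARED_BUFFERS_CONF` in `postgresql.py`.

```python
# Settings shared by both profiles (hardcoded rather than exposed as flags).
_COMMON_SETTINGS = {'max_connections': 1000}

# Hypothetical table: one entry per supported profile.
POSTGRES_DEFAULTS = {
    'baseline': {**_COMMON_SETTINGS, 'shared_buffers': '15GB'},
    'optimized': {**_COMMON_SETTINGS, 'shared_buffers': '35GB'},
}


def get_postgres_settings(profile: str) -> dict:
  """Returns a copy of the settings for `profile`, failing on unknown names."""
  if profile not in POSTGRES_DEFAULTS:
    raise ValueError(
        f'Unknown profile {profile!r}; expected one of '
        f'{sorted(POSTGRES_DEFAULTS)}'
    )
  return dict(POSTGRES_DEFAULTS[profile])
```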

jellyfishcake · 2026-04-27T20:24:46Z

+    'PostgreSQL shared_buffers size (baseline: 15GB, optimized: 35GB)',
+)
+flags.DEFINE_integer(
+    'postgres_gke_max_connections', 1000, 'PostgreSQL max_connections'


Just hard code into default if you do not plan to use other numbers as variables.

jellyfishcake · 2026-04-27T20:24:57Z

+)
+flags.DEFINE_string(
+    'postgres_gke_effective_cache_size',
+    '30GB',


see comment above

jellyfishcake · 2026-04-27T20:27:19Z

+"""
+
+# Machine type to disk type mapping
+MACHINE_DISK_MAPPING = {


Pls do not add this mapping here.
This file should be cloud agnostic.
We usually manage this complexity at scheduling time.

jellyfishcake · 2026-04-27T20:30:10Z

+
+# Optimization profiles
+# NOTE: These profile memory and CPU values are tuned for c4-standard-16 and n2-standard-16 only.
+OPTIMIZATION_PROFILES = {


since you have this, why have the flags above?

jellyfishcake · 2026-04-27T20:49:06Z

+    ]['GCP']['machine_type']
+
+    # Calculate dynamic HugePages needed mapped to the architecture
+    machine_family = server_machine.split('-')[0]


We don't put GCP-specific code in benchmarks.

jellyfishcake · 2026-04-27T20:51:18Z

+    if machine_family in ['c4a', 'n4', 'n4a', 'n4d']:
+      node_mem_gb = node_cpus * 4.0
+    elif machine_family == 'c4d':
+      node_mem_gb = node_cpus * 3.875


I am not familiar with GKE, but can you pull the accessible memory using gke version of 'free -h'?
This is not maintainable.
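
On Kubernetes, the closest equivalent is probably the node's allocatable memory, which the API reports as a quantity string such as `61643128Ki`. The sketch below shows a command that reads it and a parser for the common suffixes. Both function names are hypothetical, and the parser covers only a subset of the Kubernetes quantity grammar.

```python
_BINARY_SUFFIXES = {'Ki': 2**10, 'Mi': 2**20, 'Gi': 2**30, 'Ti': 2**40}
_DECIMAL_SUFFIXES = {'k': 10**3, 'M': 10**6, 'G': 10**9, 'T': 10**12}


def node_allocatable_memory_command(node_name: str) -> list:
  """Builds a kubectl command that prints the node's allocatable memory."""
  return [
      'kubectl', 'get', 'node', node_name,
      '-o', 'jsonpath={.status.allocatable.memory}',
  ]


def parse_memory_quantity(quantity: str) -> int:
  """Converts a quantity such as '16Gi' or '61643128Ki' to bytes."""
  quantity = quantity.strip()
  for suffix, factor in {**_BINARY_SUFFIXES, **_DECIMAL_SUFFIXES}.items():
    if quantity.endswith(suffix):
      return int(float(quantity[:-len(suffix)]) * factor)
  # No recognized suffix: the value is already in bytes.
  return int(quantity)
```

This removes the per-machine-family multipliers, since the value comes from the running node instead of a lookup table.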

jellyfishcake · 2026-04-27T20:52:10Z

+    hugepage_mb = int(pod_mem_gb * 0.45) * 1024
+    hugepage_size2m = int(hugepage_mb / 2)
+
+    import os


imports at top of file.
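
For reference, the HugePages arithmetic in this hunk works out as follows: 45% of the pod memory is truncated to whole GB, converted to MB, then divided by the 2 MB page size. A standalone sketch (the function name and the 20 GB input are illustrative):

```python
HUGEPAGE_SIZE_MB = 2  # 2 MB HugePages, as used by the profile.


def hugepages_2m_count(pod_mem_gb, fraction=0.45):
  """Returns the number of 2 MB pages covering `fraction` of pod memory."""
  hugepage_mb = int(pod_mem_gb * fraction) * 1024
  return hugepage_mb // HUGEPAGE_SIZE_MB


# 20 GB pod: int(20 * 0.45) = 9 GB, 9 * 1024 = 9216 MB, 9216 / 2 = 4608 pages.
print(hugepages_2m_count(20))
```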

jellyfishcake · 2026-04-27T20:52:45Z

+  if not machine_type:
+    machine_type = 'c4-standard-16'
+
+  parts = machine_type.split('-')


prefer you query from within the pod/node.

jellyfishcake · 2026-04-27T20:53:23Z

+      ].vm_spec['GCP']['machine_type']
+    except (KeyError, AttributeError):
+      # Default to c4-standard-16 if we can't find it
+      machine_type = 'c4-standard-16'


fail if you cannot find it
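
A sketch of the fail-fast variant, assuming the spec is a nested dict keyed by cloud as in the hunk above. `get_machine_type` is a hypothetical helper, and the exception type is a placeholder for whatever error class the benchmark would normally raise.

```python
def get_machine_type(vm_spec: dict, cloud: str) -> str:
  """Returns the configured machine type, raising if it is missing."""
  try:
    return vm_spec[cloud]['machine_type']
  except KeyError as e:
    # No silent fallback: a missing machine type is a configuration error.
    raise ValueError(
        f'No machine_type configured for cloud {cloud!r}'
    ) from e
```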

jellyfishcake · 2026-04-27T22:49:15Z

+  vm_util.IssueCommand(cmd)
+
+  # 2. Delete the Client Pod
+  cmd = [


Does the container cluster resource not clean this up automatically?

jellyfishcake · 2026-04-27T22:50:00Z

+  ]
+  vm_util.IssueCommand(cmd)
+
+  # 3. Explicitly delete all PVCs to ensure disks are released


Does deleting the cluster not delete the pvcs?

- Enforce profile-based tuning by removing granular config flags.
- Deduplicate base PostgreSQL configurations across profiles.
- Standardize storage using native PKB data_disk flags.
- Replace hardcoded GCP machine mappings with dynamic K8s API queries.
- Rely on PKB's native cluster lifecycle manager for teardown.

- Renamed 'postgres_sysbench_gke' to 'kubernetes_postgres_sysbench' to adhere to the official PKB naming convention for K8s benchmarks.
- Updated module filename to kubernetes_postgres_sysbench_benchmark.py.
- Updated BENCHMARK_NAME and BENCHMARK_CONFIG root keys internally.
- Refactored module docstrings and markdown documentation to accurately reflect cloud-agnostic Kubernetes capability rather than being exclusively GKE-focused.

hankfreund reviewed Mar 30, 2026

Comment thread docs/GKE_PostgreSQL_Quickstart_generic.MD Outdated

hankfreund reviewed Mar 30, 2026

Comment thread docs/GKE_PostgreSQL_Quickstart_generic.MD Outdated

hankfreund reviewed Mar 30, 2026

manojcns force-pushed the postgres-gke-pr2-sysbench branch from 908c8b2 to a5ba694 Compare April 9, 2026 22:23

hankfreund approved these changes Apr 13, 2026

manojcns marked this pull request as ready for review April 22, 2026 18:27

jellyfishcake reviewed Apr 27, 2026

manoj-sasankan-sada added 6 commits May 6, 2026 15:07

Introduce Postgres Sysbench Benchmark on GKE with Advanced Optimization Profiles

2081784

refactor: address PR reviewer feedback(dynamic memory, remove sleeps, scrub HA/VM docs)

7688ab5

fix: map hyperdisk-balanced to N4 instances

9e9a12c

style: apply pyink formatting and remove dynamic yaml files

ee936f4

manojcns force-pushed the postgres-gke-pr2-sysbench branch from 9d3ed13 to b8ac5f5 Compare May 6, 2026 19:09

Fix: removed duplicate import

3ea2b8f

manojcns requested a review from jellyfishcake May 7, 2026 00:54

4 participants
