diff --git a/docs/GKE_PostgreSQL_Quickstart_generic.MD b/docs/GKE_PostgreSQL_Quickstart_generic.MD new file mode 100644 index 0000000000..d53df2061c --- /dev/null +++ b/docs/GKE_PostgreSQL_Quickstart_generic.MD @@ -0,0 +1,170 @@ +# PostgreSQL on GKE - Benchmark Quickstart Guide + +## Overview + +This guide covers the PerfKitBenchmarker (PKB) benchmark module for automated PostgreSQL +performance testing on Google Kubernetes Engine (GKE): + +- **`kubernetes_postgres_sysbench`**: Standalone PostgreSQL benchmark. It deploys a + single PostgreSQL instance as a Kubernetes StatefulSet and runs Sysbench OLTP + workloads from a client pod within the same cluster. + +This benchmark: +- Creates and tears down GKE infrastructure automatically via PKB. +- Measures TPS (transactions per second), QPS (queries per second), and latency. +- Supports multiple optimization profiles for tuning PostgreSQL and GKE node configuration. + +## Architecture Overview + +1. **GKE Cluster**: Created with two dedicated node pools (in addition to the default pool): + * `postgres`: For the PostgreSQL server (StatefulSet). + * `clients`: For the Sysbench client (Pod). +2. **Private Networking**: + * PostgreSQL runs as a StatefulSet backed by a PersistentVolumeClaim. + * Sysbench connects via the **Private Pod IP** of the server. + * No public IPs are used for database traffic. +3. **Storage**: + * Uses `pd-ssd` (for N2 machine types) or `hyperdisk-balanced` (for C4 and N4 machine types). + +## New Developer Setup (First Time Only) + +If you're a new developer cloning this repository for the first time: + +```bash +# 1. Clone the repository +git clone +cd PerfKitBenchmarker + +# 2. Create Python virtual environment (first time only) +python3 -m venv venv_postgres + +# 3. Activate virtual environment +source venv_postgres/bin/activate + +# 4. Install Python dependencies +pip install "setuptools<70.0.0" +pip install pytz +pip install -r requirements.txt +# This may take 2-3 minutes + +# 5. Authenticate with GCP +gcloud auth login +gcloud auth application-default login + +# 6.
Set your GCP project +export PROJECT_ID="your-project-id" +gcloud config set project $PROJECT_ID +``` + +**Note**: The `venv_postgres/` directory is not tracked in git; each developer creates their own. + +## Prerequisites (For Each Session) + +```bash +# 1. Activate virtual environment +source venv_postgres/bin/activate + +# 2. Create temp directory (for logs and results) +mkdir -p pkb_temp + +# 3. Set GCP project variable +export PROJECT_ID="your-project-id" +``` + +**Note**: +- The `pkb_temp/` directory stores benchmark logs and results. It's excluded from git via `.gitignore`. + + +## Baseline Tests + +Runs the benchmark with standard PostgreSQL settings (no special tuning). + +### Baseline Run (C4 Standard) + +```bash +python3 pkb.py \ + --benchmarks=kubernetes_postgres_sysbench \ + --cloud=GCP \ + --vm_platform=Kubernetes \ + --zone=us-central1-a \ + --project=$PROJECT_ID \ + --postgres_kubernetes_server_machine_type=c4-standard-16 \ + --postgres_kubernetes_client_machine_type=c4-standard-16 \ + --postgres_gke_disk_type=hyperdisk-balanced \ + --postgres_gke_disk_size=500 \ + --postgres_kubernetes_optimization_profile=baseline \ + --sysbench_tables=10 \ + --sysbench_table_size=4000000 \ + --sysbench_run_threads=512 \ + --sysbench_run_seconds=300 \ + --sysbench_testname=oltp_read_write \ + --metadata=cloud:GCP \ + --metadata=geo:us-central1 \ + --metadata=scenario:postgres_baseline \ + --temp_dir=./pkb_temp \ + --run_stage_iterations=1 \ + --owner=$(whoami | tr '.' '-') \ + --log_level=error \ + --accept_licenses +``` + +## Optimized Tests (Example) + +Runs the benchmark with a specific optimization profile. + + + +### 1. Profile: Postgres Tuned +Aggressive PostgreSQL configuration tuning (`shared_buffers`, worker processes, WAL buffers, etc.).
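The tuned memory values are not fixed constants. `_GetPostgreSQLConfig` in `kubernetes_postgres_sysbench_benchmark.py` derives them from the server node's allocatable memory: the PostgreSQL pod gets about 85% of it, and `shared_buffers` / `effective_cache_size` are set to 40% / 75% of that pod memory for PostgreSQL-tuned profiles (25% / 50% otherwise). The following is a minimal Python sketch of that arithmetic; the function name `size_postgres_memory` and the 58 GiB input are hypothetical, chosen only for illustration.

```python
def size_postgres_memory(node_alloc_gb: float, tuned: bool) -> dict[str, str]:
  """Sketch of the benchmark's memory-sizing rule (hypothetical helper).

  Args:
    node_alloc_gb: Allocatable memory of the server node, in GiB.
    tuned: True for profiles that include PostgreSQL tuning.

  Returns:
    Pod memory limit plus shared_buffers / effective_cache_size values.
  """
  # The PostgreSQL pod is given ~85% of the node's allocatable memory,
  # truncated to a whole number of GiB.
  pod_mem_gb = int(node_alloc_gb * 0.85)
  # Tuned profiles use 40% / 75% of pod RAM; baseline uses 25% / 50%.
  sb_frac, ecs_frac = (0.40, 0.75) if tuned else (0.25, 0.50)
  return {
      'memory_limit': f'{pod_mem_gb}Gi',
      'shared_buffers': f'{int(pod_mem_gb * sb_frac)}GB',
      'effective_cache_size': f'{int(pod_mem_gb * ecs_frac)}GB',
  }


if __name__ == '__main__':
  print(size_postgres_memory(58, tuned=True))
  print(size_postgres_memory(58, tuned=False))
```

For a hypothetical node reporting 58 GiB allocatable, this gives a 49Gi pod limit with `shared_buffers=19GB` and `effective_cache_size=36GB` in the tuned case, versus 12GB / 24GB for baseline. Truncating with `int()` at each step, as the benchmark does, keeps every value at or below the exact fraction.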
+ +```bash +python3 pkb.py \ + --benchmarks=kubernetes_postgres_sysbench \ + --cloud=GCP \ + --vm_platform=Kubernetes \ + --zone=us-central1-a \ + --project=$PROJECT_ID \ + --postgres_kubernetes_server_machine_type=c4-standard-16 \ + --postgres_kubernetes_client_machine_type=c4-standard-16 \ + --postgres_gke_disk_type=hyperdisk-balanced \ + --postgres_gke_disk_size=500 \ + --postgres_kubernetes_optimization_profile=postgres-tuned \ + --sysbench_tables=10 \ + --sysbench_table_size=4000000 \ + --sysbench_run_threads=512 \ + --sysbench_run_seconds=300 \ + --sysbench_testname=oltp_read_write \ + --metadata=cloud:GCP \ + --metadata=geo:us-central1 \ + --metadata=scenario:postgres_optimized \ + --metadata=optimization_profile:postgres-tuned \ + --temp_dir=./pkb_temp \ + --run_stage_iterations=1 \ + --owner=$(whoami | tr '.' '-') \ + --log_level=error \ + --accept_licenses +``` + + + + + +## Understanding the Workload + +* **Workload**: Sysbench OLTP Read/Write (`oltp_read_write`). +* **Tables**: 10 tables. +* **Table Size**: 4,000,000 rows per table. +* **Threads**: 512 concurrent threads. +* **Duration**: 300 seconds (5 minutes) per run. + +## Results Location + +Results are saved to: +``` +./pkb_temp/runs//perfkitbenchmarker_results.json +``` + +View results: +```bash +jq . ./pkb_temp/runs//perfkitbenchmarker_results.json +``` diff --git a/docs/Technical_Architecture_PostgreSQL_PKB.md b/docs/Technical_Architecture_PostgreSQL_PKB.md new file mode 100644 index 0000000000..78231e719a --- /dev/null +++ b/docs/Technical_Architecture_PostgreSQL_PKB.md @@ -0,0 +1,110 @@ +# Technical Architecture: PostgreSQL Benchmarking on GKE with PerfKitBenchmarker + +This document provides a technical deep dive into the architecture and implementation of the PostgreSQL benchmarking suite used to evaluate performance on Google Kubernetes Engine (GKE).
It covers the implementation details of both the Baseline and Optimized benchmarks and explains how PerfKitBenchmarker (PKB) orchestrates Sysbench OLTP workloads against PostgreSQL. + +## Overview + +The benchmarking suite is designed to compare the performance of standard PostgreSQL deployments ("Baseline") against GKE-optimized PostgreSQL configurations ("Optimized"). The benchmarks use `sysbench` (OLTP Read/Write) as the load generator and are orchestrated by PKB. + +## Baseline Benchmark Implementation + +The baseline benchmark is executed using the `kubernetes_postgres_sysbench` benchmark configuration. This configuration represents a standard, unoptimized PostgreSQL deployment on Kubernetes. + +### Execution Command + + +```bash +python3 pkb.py \ + --benchmarks=kubernetes_postgres_sysbench \ + --postgres_kubernetes_optimization_profile=baseline \ + ... +``` + +### Architecture & Logic +1. **Kubernetes-Native Architecture**: PKB deploys both sides of the benchmark as Kubernetes workloads: + * **Server**: A single-replica StatefulSet (pod `postgres-standalone-0`) running PostgreSQL 16. + * **Client**: A separate Pod (`postgres-client`) running `sysbench`. +2. **StatefulSet & Storage**: The PostgreSQL server uses a StatefulSet to get a stable identity and persistent storage. Its PersistentVolumeClaim (PVC) is backed by `pd-ssd` (N2 machine types) or `hyperdisk-balanced` (C4 and N4 machine types). +3. **Private Connectivity**: The client pod connects to the server using the server pod's **Pod IP** (`.status.podIP`). Traffic stays inside the cluster network and never passes through a public load balancer, which keeps it private and avoids extra network hops. +4. **Secure Authentication**: The benchmark generates a password (or uses the `POSTGRES_PASSWORD` environment variable) and passes it to the server through a Kubernetes Secret and to the client through the `PGPASSWORD` environment variable.
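Items 3 and 4 above can be sketched as a small helper that takes the server Pod IP (for example, the output of `kubectl get pod postgres-standalone-0 -o jsonpath={.status.podIP}`) and returns the argument vector and environment for a Sysbench run. This is an illustrative sketch, not the benchmark's actual code: the helper name `build_sysbench_run` is hypothetical, the `benchmark` user and database names are taken from the client pod manifest, and it assumes the Sysbench PostgreSQL driver falls back to libpq's standard `PGPASSWORD` lookup when `--pgsql-password` is omitted.

```python
import os


def build_sysbench_run(
    pod_ip: str,
    password: str,
    *,
    tables: int = 10,
    table_size: int = 4_000_000,
    threads: int = 512,
    seconds: int = 300,
) -> tuple[list[str], dict[str, str]]:
  """Returns (argv, env) for an oltp_read_write run (hypothetical helper)."""
  argv = [
      'sysbench', 'oltp_read_write',
      '--db-driver=pgsql',
      f'--pgsql-host={pod_ip}',  # Private Pod IP; no load balancer involved.
      '--pgsql-port=5432',
      '--pgsql-user=benchmark',
      '--pgsql-db=benchmark',
      f'--tables={tables}',
      f'--table-size={table_size}',
      f'--threads={threads}',
      f'--time={seconds}',
      '--report-interval=10',
      'run',
  ]
  # The password travels only in the environment, never in argv, so it
  # does not show up in `ps` output or in logged command lines.
  env = dict(os.environ, PGPASSWORD=password)
  return argv, env
```

The result can be handed to `subprocess.run(argv, env=env, check=True)`. Keeping the password out of `argv` is the design choice that matters here: anything on the command line is visible to other processes and tends to end up in logs.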
+ +## Optimized Benchmark Implementation + +The optimized benchmark uses the same `kubernetes_postgres_sysbench` benchmark module but applies an optimization profile that tunes the infrastructure and/or the database configuration. + +### Execution Command + +```bash +python3 pkb.py \ + --benchmarks=kubernetes_postgres_sysbench \ + --postgres_kubernetes_optimization_profile=infra+postgres+hugepages \ + ... +``` + +### Optimization Profiles +Each run selects exactly one profile. The combined profiles (`infra+postgres`, etc.) are predefined values of the profile flag, not combinations assembled at run time: + +* **infra-tuned**: Uses Container-Optimized OS (COS) for the nodes and an Ubuntu 24.04 image for the client. +* **fast-startup**: Uses the Ubuntu node image and removes the init container for faster startup (at the cost of less robust permission handling). +* **kernel-tuned**: Applies sysctl tuning (`vm.swappiness=1`, `vm.dirty_ratio=10`, etc.) to the node. +* **hugepages**: Enables 2MB HugePages on the node and configures PostgreSQL (`huge_pages=on`) to use them. This reduces TLB misses and page-table overhead for large shared buffers. +* **postgres-tuned**: Applies aggressive PostgreSQL configuration tuning. +* **infra+postgres**: Combines the infra-tuned and postgres-tuned settings. +* **infra+postgres+hugepages**: Combines the infra-tuned, postgres-tuned, and hugepages settings. +* **infra+postgres+hugepages+hostnetwork**: Extends `infra+postgres+hugepages` by enabling host networking (`hostNetwork: true`) for the PostgreSQL pod. The pod then uses the node's network interface directly instead of its own pod network namespace, which can reduce network overhead and latency. + +## Control Parameters Comparison + +The following table summarizes the key control parameters used in both the Baseline and Optimized runs.
+ +### Sysbench Parameters (Load Generator) + +| Parameter | Baseline | Optimized | +| :--- | :--- | :--- | +| `tables` | 10 | 10 | +| `table_size` | 4,000,000 | 4,000,000 | +| `threads` | 512 | 512 | +| `testname` | oltp_read_write | oltp_read_write | +| `duration` | 300s | 300s | +| `report_interval` | 10s | 10s | + +### PostgreSQL Server Parameters + +Memory settings such as `shared_buffers` and `effective_cache_size` are computed at run time from the server node's allocatable memory as reported by Kubernetes. The PostgreSQL pod's memory request and limit are set to ~85% of that value ("Pod RAM" below), and the PostgreSQL memory settings are sized as fractions of Pod RAM to stay within the limit and avoid out-of-memory kills. The server machine type (`--postgres_kubernetes_server_machine_type`) is used separately to size the HugePages reservation. + +| Parameter | Baseline | Optimized (postgres-tuned / infra+postgres+hugepages) | +| :--- | :--- | :--- | +| **Shared Buffers** | 25% of Pod RAM | 40% of Pod RAM | +| **Effective Cache Size** | 50% of Pod RAM | 75% of Pod RAM | +| **Work Mem** | 64MB | 256MB | +| **Effective IO Concurrency** | 100 | 200 | +| **Huge Pages** | `try` (template default) | On (hugepages profiles only) | +| **WAL Buffers** | 64MB | 512MB | +| **Max Worker Processes** | 20 | 32 | +| **Host Network** | False | Optional (infra+postgres+hugepages+hostnetwork) | + +## Implementation Details + +### 1. Private IP Implementation +To enforce private networking: +* The benchmark explicitly retrieves the Pod IP: `kubectl get pod postgres-standalone-0 -o jsonpath={.status.podIP}`. +* This IP is passed to `sysbench` via the `--pgsql-host` flag. +* The Sysbench client runs as a native Kubernetes pod in the same namespace as the server, so database traffic stays inside the cluster network. + +### 2. Disk Type Selection +The benchmark automatically maps machine types to disk types (N2 machine types use `pd-ssd`): +* **C4 / C4A / C4D / N4 / N4A / N4D**: `hyperdisk-balanced` + +### 3. Sysbench Execution +* The benchmark installs `sysbench` in the client pod via `apt-get`.
+* It executes the `oltp_read_write.lua` script located at `/usr/share/sysbench/`. +* The execution command includes a timeout buffer (`duration + 120s`) so that the run is not cut off before Sysbench finishes. + +### 4. Password Handling & Security +* **Dynamic Password Generation**: The password is derived from the run URI (`postgresql.GetPsqlUserPassword(FLAGS.run_uri)`), so each benchmark run uses its own password. The plaintext password is never hardcoded or stored in source control. PostgreSQL handles password hashing internally on the server side. +* **Secret Management**: + * **Standalone**: The password is injected into the PostgreSQL pod from the `postgres-secret` Secret (loaded with `envFrom`) and into the Sysbench client's `PGPASSWORD` environment variable from the `sysbench-passwords` Secret, so it does not appear in process listings or command-line logs. + + + +* **Disk Automation**: Selects `hyperdisk-balanced` (C4) or `pd-ssd` (N2) automatically. diff --git a/perfkitbenchmarker/data/container/postgres_sysbench/client_pod.yaml.j2 b/perfkitbenchmarker/data/container/postgres_sysbench/client_pod.yaml.j2 new file mode 100644 index 0000000000..0d486d7c1d --- /dev/null +++ b/perfkitbenchmarker/data/container/postgres_sysbench/client_pod.yaml.j2 @@ -0,0 +1,44 @@ +--- +# Client pod for running Sysbench benchmarks +apiVersion: v1 +kind: Pod +metadata: + name: postgres-client + namespace: {{ namespace }} + labels: + app: postgres-client +spec: + # PKB will handle node selection through nodepool configuration + tolerations: + - key: "kubernetes.io/arch" + operator: "Equal" + value: "arm64" + effect: "NoSchedule" + containers: + - name: postgres-client + image: {{ client_image }} + imagePullPolicy: IfNotPresent + command: + - sleep + - infinity + resources: + requests: + cpu: "{{ client_cpu_request }}" + memory: "{{ client_memory_request }}" + limits: + cpu: "{{ client_cpu_limit }}" + memory: "{{ client_memory_limit }}" + env: + - name: PGHOST + value: postgres-standalone + - name: PGPORT + value: "5432" + - name: PGUSER + value: benchmark + - name: PGPASSWORD +
valueFrom: + secretKeyRef: + name: sysbench-passwords + key: benchmark-password + - name: PGDATABASE + value: benchmark \ No newline at end of file diff --git a/perfkitbenchmarker/data/container/postgres_sysbench/postgres_all.yaml.j2 b/perfkitbenchmarker/data/container/postgres_sysbench/postgres_all.yaml.j2 new file mode 100644 index 0000000000..303013f223 --- /dev/null +++ b/perfkitbenchmarker/data/container/postgres_sysbench/postgres_all.yaml.j2 @@ -0,0 +1,297 @@ +--- +# Namespace for PostgreSQL resources +apiVersion: v1 +kind: Namespace +metadata: + name: {{ namespace }} +--- +# Storage class for PostgreSQL data +apiVersion: storage.k8s.io/v1 +kind: StorageClass +metadata: + name: postgres-storage-class +provisioner: kubernetes.io/gce-pd +parameters: + type: {{ disk_type }} + replication-type: none # Use zonal PDs for better performance +reclaimPolicy: Delete +allowVolumeExpansion: true +volumeBindingMode: WaitForFirstConsumer +--- +# Secret for PostgreSQL credentials +apiVersion: v1 +kind: Secret +metadata: + name: postgres-secret + namespace: {{ namespace }} +type: Opaque +stringData: + POSTGRES_USER: {{ postgres_user }} + POSTGRES_PASSWORD: {{ postgres_password }} + POSTGRES_DB: {{ postgres_database }} +--- +# ConfigMap for PostgreSQL configuration +apiVersion: v1 +kind: ConfigMap +metadata: + name: postgres-config + namespace: {{ namespace }} +data: + postgresql.conf: | + # Connection settings + listen_addresses = '*' + max_connections = {{ max_connections }} + + # Memory settings + shared_buffers = {{ shared_buffers }} + effective_cache_size = {{ effective_cache_size }} + maintenance_work_mem = 1GB + work_mem = {{ work_mem }} + + # Parallel execution + max_worker_processes = {{ max_worker_processes }} + max_parallel_workers_per_gather = {{ max_parallel_workers_per_gather }} + max_parallel_workers = {{ max_parallel_workers }} + + # Write ahead log + wal_buffers = {{ wal_buffers }} + max_wal_size = {{ max_wal_size }} + min_wal_size = 80MB + checkpoint_timeout 
= {{ checkpoint_timeout }} + checkpoint_completion_target = {{ checkpoint_completion_target }} + + # Query tuning + effective_io_concurrency = {{ effective_io_concurrency }} + random_page_cost = {{ random_page_cost }} + huge_pages = {{ huge_pages | default('try') }} + wal_level = {{ wal_level | default('replica') }} + synchronous_commit = {{ synchronous_commit | default('on') }} + + # Autovacuum + autovacuum = on + autovacuum_max_workers = {{ autovacuum_max_workers }} + autovacuum_naptime = 10s + + # Logging + log_destination = 'stderr' + logging_collector = off + log_line_prefix = '{{ log_line_prefix | default('%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h ') }}' + log_checkpoints = {{ log_checkpoints | default('on') }} + log_connections = {{ log_connections | default('off') }} + log_disconnections = {{ log_disconnections | default('off') }} + log_lock_waits = {{ log_lock_waits | default('off') }} + log_temp_files = {{ log_temp_files | default('0') }} + log_autovacuum_min_duration = {{ log_autovacuum_min_duration | default('-1') }} + log_error_verbosity = {{ log_error_verbosity | default('default') }} + client_min_messages = {{ client_min_messages | default('notice') }} + log_min_messages = {{ log_min_messages | default('warning') }} + log_min_error_statement = {{ log_min_error_statement | default('error') }} + log_min_duration_statement = {{ log_min_duration_statement | default('-1') }} + + # Statistics + track_activities = on + track_counts = on + track_io_timing = on + + pg_hba.conf: | + # TYPE DATABASE USER ADDRESS METHOD + local all all trust + host all all 0.0.0.0/0 md5 + host all all ::/0 md5 +--- +# ConfigMap for init scripts +apiVersion: v1 +kind: ConfigMap +metadata: + name: postgres-init-scripts + namespace: {{ namespace }} +data: + init.sql: | + -- Create benchmark database if it doesn't exist + SELECT 'CREATE DATABASE {{ postgres_database }}' + WHERE NOT EXISTS (SELECT FROM pg_database WHERE datname = '{{ postgres_database }}'); + + -- Grant 
privileges + GRANT ALL PRIVILEGES ON DATABASE {{ postgres_database }} TO {{ postgres_user }}; + + -- Set default configuration + ALTER SYSTEM SET shared_buffers = '{{ shared_buffers }}'; + ALTER SYSTEM SET effective_cache_size = '{{ effective_cache_size }}'; + ALTER SYSTEM SET work_mem = '{{ work_mem }}'; + + init-permissions.sh: | + #!/bin/bash + set -e + + echo "Setting up PostgreSQL data directory permissions..." + # Ensure proper permissions on data directory + chown -R postgres:postgres /var/lib/postgresql/data || true + chmod 700 /var/lib/postgresql/data || true + echo "Permissions set successfully" +--- +# Service for PostgreSQL +apiVersion: v1 +kind: Service +metadata: + name: postgres-standalone + namespace: {{ namespace }} +spec: + type: ClusterIP + selector: + app: postgres-standalone + ports: + - port: 5432 + targetPort: 5432 + protocol: TCP + name: postgres +--- +# StatefulSet for PostgreSQL +apiVersion: apps/v1 +kind: StatefulSet +metadata: + name: postgres-standalone + namespace: {{ namespace }} +spec: + serviceName: postgres-standalone + replicas: 1 + selector: + matchLabels: + app: postgres-standalone + template: + metadata: + labels: + app: postgres-standalone + spec: + {% if host_network %} + hostNetwork: true + dnsPolicy: ClusterFirstWithHostNet + {% endif %} + securityContext: + fsGroup: 999 # postgres group ID + # PKB will handle node selection through nodepool configuration + tolerations: + - key: "kubernetes.io/arch" + operator: "Equal" + value: "arm64" + effect: "NoSchedule" + {% if use_init_container %} + initContainers: + - name: init-permissions + image: postgres:{{ postgres_version }} + command: + - "bash" + - "-c" + - "/scripts/init-permissions.sh" + securityContext: + runAsUser: 0 # Run as root for permissions setup + volumeMounts: + - name: init-scripts + mountPath: /scripts + - name: postgres-data + mountPath: /var/lib/postgresql/data + {% endif %} + containers: + - name: postgres + image: postgres:{{ postgres_version }} + ports: + 
- containerPort: 5432 + name: postgres + envFrom: + - secretRef: + name: postgres-secret + env: + - name: PGDATA + value: /var/lib/postgresql/data/pgdata + - name: POSTGRES_HOST_AUTH_METHOD + value: "md5" + volumeMounts: + - name: postgres-data + mountPath: /var/lib/postgresql/data + - name: postgres-config + mountPath: /etc/postgresql/postgresql.conf + subPath: postgresql.conf + - name: postgres-config + mountPath: /etc/postgresql/pg_hba.conf + subPath: pg_hba.conf + - name: init-scripts + mountPath: /docker-entrypoint-initdb.d/init.sql + subPath: init.sql + {% if hugepages %} + - name: hugepages + mountPath: /dev/hugepages + readOnly: false + {% endif %} + resources: + requests: + cpu: "{{ cpu_request | default('6') }}" + memory: "{{ memory_request | default('15Gi') }}" + {% if hugepages %} + {% if hugepages.hugepage_size2m %} + hugepages-2Mi: "{{ hugepages.hugepage_size2m * 2 }}Mi" + {% endif %} + {% if hugepages.hugepage_size1g %} + hugepages-1Gi: "{{ hugepages.hugepage_size1g }}Gi" + {% endif %} + {% endif %} + limits: + cpu: "{{ cpu_limit | default('10') }}" + memory: "{{ memory_limit | default('20Gi') }}" + {% if hugepages %} + {% if hugepages.hugepage_size2m %} + hugepages-2Mi: "{{ hugepages.hugepage_size2m * 2 }}Mi" + {% endif %} + {% if hugepages.hugepage_size1g %} + hugepages-1Gi: "{{ hugepages.hugepage_size1g }}Gi" + {% endif %} + {% endif %} + command: + - "docker-entrypoint.sh" + - "postgres" + - "-c" + - "config_file=/etc/postgresql/postgresql.conf" + - "-c" + - "hba_file=/etc/postgresql/pg_hba.conf" + readinessProbe: + exec: + command: + - pg_isready + - -U + - {{ postgres_user }} + - -d + - {{ postgres_database }} + initialDelaySeconds: 180 # Increased to 3 minutes + periodSeconds: 15 + timeoutSeconds: 10 + failureThreshold: 40 # Increased to 10 minutes total + livenessProbe: + exec: + command: + - pg_isready + - -U + - {{ postgres_user }} + initialDelaySeconds: 240 # Increased to 4 minutes + periodSeconds: 30 + timeoutSeconds: 10 + 
failureThreshold: 20 # Increased forgiveness + volumes: + - name: postgres-config + configMap: + name: postgres-config + - name: init-scripts + configMap: + name: postgres-init-scripts + defaultMode: 0755 + {% if hugepages %} + - name: hugepages + emptyDir: + medium: HugePages + {% endif %} + volumeClaimTemplates: + - metadata: + name: postgres-data + spec: + accessModes: ["ReadWriteOnce"] + storageClassName: postgres-storage-class + resources: + requests: + storage: {{ disk_size }} diff --git a/perfkitbenchmarker/linux_benchmarks/kubernetes_postgres_sysbench_benchmark.py b/perfkitbenchmarker/linux_benchmarks/kubernetes_postgres_sysbench_benchmark.py new file mode 100644 index 0000000000..f58c7e66f7 --- /dev/null +++ b/perfkitbenchmarker/linux_benchmarks/kubernetes_postgres_sysbench_benchmark.py @@ -0,0 +1,1091 @@ +# Copyright 2024 PerfKitBenchmarker Authors. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Run Sysbench against PostgreSQL on Kubernetes. + +This benchmark measures the performance of PostgreSQL deployed on Kubernetes +using Sysbench. It supports multiple machine architectures and optimization +profiles across different environments. + +This benchmark deploys PostgreSQL as a Kubernetes StatefulSet and uses native +client pods for Sysbench load generation.
+""" + +import functools +import logging +import os +import time +from typing import Any, Dict, List + +from absl import flags +from perfkitbenchmarker import background_tasks +from perfkitbenchmarker import benchmark_spec +from perfkitbenchmarker import configs +from perfkitbenchmarker import errors +from perfkitbenchmarker import data +from perfkitbenchmarker.resources.container_service import kubernetes_commands +from perfkitbenchmarker import sample +from perfkitbenchmarker import vm_util +from perfkitbenchmarker.linux_packages import sysbench +from perfkitbenchmarker.linux_packages import postgresql +from perfkitbenchmarker.linux_benchmarks import sysbench_benchmark + +FLAGS = flags.FLAGS + + + +# Infrastructure flags +flags.DEFINE_string( + 'postgres_kubernetes_server_machine_type', + None, + 'Machine type for PostgreSQL server nodes', +) +flags.DEFINE_string( + 'postgres_kubernetes_client_machine_type', None, 'Machine type for client nodes' +) +flags.DEFINE_enum( + 'postgres_kubernetes_optimization_profile', + 'baseline', + [ + 'baseline', + 'infra-tuned', + 'fast-startup', + 'kernel-tuned', + 'hugepages', + 'postgres-tuned', + 'infra+postgres', + 'infra+postgres+hugepages', + 'infra+postgres+hugepages+hostnetwork', + ], + 'Optimization profile to use', +) +flags.DEFINE_bool( + 'postgres_kubernetes_use_init_container', + True, + 'Whether to run an init container that fixes ownership and permissions of' + ' the PostgreSQL data directory before the server starts.', +) +flags.DEFINE_string( + 'postgres_kubernetes_client_cpu_request', + '4', + 'CPU request for Sysbench client pod', +) +flags.DEFINE_string( + 'postgres_kubernetes_client_memory_request', + '10Gi', + 'Memory request for Sysbench client pod', +) +flags.DEFINE_string( + 'postgres_kubernetes_client_cpu_limit', '8', 'CPU limit for Sysbench client pod' +) +flags.DEFINE_string( + 'postgres_kubernetes_client_memory_limit', + '20Gi', + 'Memory limit for Sysbench client pod', +) + +# Note: sysbench_load_threads is already defined in sysbench_benchmark.py +
+BENCHMARK_NAME = 'kubernetes_postgres_sysbench' +BENCHMARK_CONFIG = """ +kubernetes_postgres_sysbench: + description: > + Run Sysbench against PostgreSQL on Kubernetes. + Supports multiple machine types and optimization profiles. + container_cluster: + cloud: GCP + type: Kubernetes + vm_count: 1 + vm_spec: + GCP: + machine_type: c4-standard-16 + nodepools: + postgres: + vm_spec: + GCP: + machine_type: c4-standard-16 + zone: us-central1-a + boot_disk_size: 500 + boot_disk_type: hyperdisk-balanced + vm_count: 1 + clients: + vm_spec: + GCP: + machine_type: c4-standard-16 + zone: us-central1-a + boot_disk_size: 100 + boot_disk_type: hyperdisk-balanced + vm_count: 1 + flags: + # Sysbench defaults matching baseline + sysbench_tables: 10 + sysbench_table_size: 4000000 + sysbench_run_threads: 512 + sysbench_run_seconds: 300 + sysbench_report_interval: 10 + sysbench_testname: oltp_read_write +""" + + + +# Base PostgreSQL configuration shared across profiles +BASE_POSTGRES_CONFIG = { + 'max_connections': 1000, + 'random_page_cost': 1.1, + 'checkpoint_timeout': '15min', + 'checkpoint_completion_target': 0.9, + 'effective_io_concurrency': 200, + 'max_wal_size': '16GB', +} + +# Optimization profiles +# NOTE: These profile memory and CPU values are tuned for c4-standard-16 and n2-standard-16 only. 
+OPTIMIZATION_PROFILES = { + 'baseline': { + 'postgres': { + 'shared_buffers': '15GB', + 'effective_cache_size': '30GB', + 'work_mem': '64MB', + 'max_worker_processes': 20, + 'max_parallel_workers_per_gather': 8, + 'max_parallel_workers': 12, + 'wal_buffers': '64MB', + 'autovacuum_max_workers': 3, + 'effective_io_concurrency': 100, + 'checkpoint_timeout': '5min', + }, + 'use_init_container': True, + 'node_image': 'UBUNTU_CONTAINERD', + 'client_image': 'ubuntu:20.04', + }, + 'infra-tuned': { + 'postgres': { + 'shared_buffers': '15GB', + 'effective_cache_size': '30GB', + 'work_mem': '64MB', + 'max_worker_processes': 20, + 'max_parallel_workers_per_gather': 8, + 'max_parallel_workers': 12, + 'wal_buffers': '64MB', + 'autovacuum_max_workers': 3, + 'effective_io_concurrency': 100, + 'checkpoint_timeout': '5min', + }, + 'use_init_container': True, + 'node_image': 'COS_CONTAINERD', + 'client_image': 'ubuntu:24.04', + }, + 'fast-startup': { + 'postgres': { + 'shared_buffers': '15GB', + 'effective_cache_size': '30GB', + 'work_mem': '64MB', + 'max_worker_processes': 20, + 'max_parallel_workers_per_gather': 8, + 'max_parallel_workers': 12, + 'wal_buffers': '64MB', + 'autovacuum_max_workers': 3, + 'effective_io_concurrency': 100, + 'checkpoint_timeout': '5min', + }, + 'use_init_container': False, + 'node_image': 'UBUNTU_CONTAINERD', + 'client_image': 'ubuntu:20.04', + }, + 'kernel-tuned': { + 'postgres': { + 'shared_buffers': '15GB', + 'effective_cache_size': '30GB', + 'work_mem': '64MB', + 'max_worker_processes': 20, + 'max_parallel_workers_per_gather': 8, + 'max_parallel_workers': 12, + 'wal_buffers': '64MB', + 'autovacuum_max_workers': 3, + 'effective_io_concurrency': 100, + 'checkpoint_timeout': '5min', + }, + 'use_init_container': True, + 'node_image': 'UBUNTU_CONTAINERD', + 'client_image': 'ubuntu:20.04', + 'kernel_params': { + 'vm.swappiness': 1, + 'vm.dirty_ratio': 10, + 'vm.dirty_background_ratio': 5, + 'net.core.netdev_max_backlog': 4000, + }, + }, + 'hugepages': { + 
'postgres': { + 'shared_buffers': '15GB', + 'effective_cache_size': '30GB', + 'work_mem': '64MB', + 'max_worker_processes': 20, + 'max_parallel_workers_per_gather': 8, + 'max_parallel_workers': 12, + 'wal_buffers': '64MB', + 'autovacuum_max_workers': 3, + 'effective_io_concurrency': 100, + 'checkpoint_timeout': '5min', + 'huge_pages': 'on', + }, + 'use_init_container': True, + 'node_image': 'UBUNTU_CONTAINERD', + 'client_image': 'ubuntu:20.04', + }, + 'postgres-tuned': { + 'postgres': { + 'shared_buffers': '35GB', + 'effective_cache_size': '50GB', + 'work_mem': '256MB', + 'max_worker_processes': 32, + 'max_parallel_workers_per_gather': 12, + 'max_parallel_workers': 24, + 'wal_buffers': '512MB', + 'autovacuum_max_workers': 6, + 'wal_level': 'replica', + }, + 'use_init_container': True, + 'node_image': 'UBUNTU_CONTAINERD', + 'client_image': 'ubuntu:20.04', + }, + 'infra+postgres': { + 'postgres': { + 'shared_buffers': '35GB', + 'effective_cache_size': '50GB', + 'work_mem': '256MB', + 'max_worker_processes': 32, + 'max_parallel_workers_per_gather': 12, + 'max_parallel_workers': 24, + 'wal_buffers': '512MB', + 'autovacuum_max_workers': 6, + 'wal_level': 'replica', + }, + 'use_init_container': True, + 'node_image': 'COS_CONTAINERD', + 'client_image': 'ubuntu:24.04', + }, + 'infra+postgres+hugepages': { + 'postgres': { + 'shared_buffers': '35GB', + 'effective_cache_size': '50GB', + 'work_mem': '256MB', + 'max_worker_processes': 32, + 'max_parallel_workers_per_gather': 12, + 'max_parallel_workers': 24, + 'wal_buffers': '512MB', + 'autovacuum_max_workers': 6, + 'wal_level': 'replica', + 'huge_pages': 'on', + 'synchronous_commit': 'on', + 'log_line_prefix': '%t [%p]: [%l-1] user=%u,db=%d ', + 'log_checkpoints': 'on', + 'log_connections': 'on', + 'log_disconnections': 'on', + 'log_lock_waits': 'on', + 'log_temp_files': '0', + 'log_autovacuum_min_duration': '0', + 'log_error_verbosity': 'default', + 'client_min_messages': 'notice', + 'log_min_messages': 'warning', + 
'log_min_error_statement': 'error', + 'log_min_duration_statement': '1000', + }, + 'use_init_container': True, + 'node_image': 'COS_CONTAINERD', + 'client_image': 'ubuntu:24.04', + }, + 'infra+postgres+hugepages+hostnetwork': { + 'postgres': { + 'shared_buffers': '35GB', + 'effective_cache_size': '50GB', + 'work_mem': '256MB', + 'max_worker_processes': 32, + 'max_parallel_workers_per_gather': 12, + 'max_parallel_workers': 24, + 'wal_buffers': '512MB', + 'autovacuum_max_workers': 6, + 'wal_level': 'replica', + 'huge_pages': 'on', + 'synchronous_commit': 'on', + 'log_line_prefix': '%t [%p]: [%l-1] user=%u,db=%d ', + 'log_checkpoints': 'on', + 'log_connections': 'on', + 'log_disconnections': 'on', + 'log_lock_waits': 'on', + 'log_temp_files': '0', + 'log_autovacuum_min_duration': '0', + 'log_error_verbosity': 'default', + 'client_min_messages': 'notice', + 'log_min_messages': 'warning', + 'log_min_error_statement': 'error', + 'log_min_duration_statement': '1000', + }, + 'use_init_container': True, + 'node_image': 'COS_CONTAINERD', + 'client_image': 'ubuntu:24.04', + 'host_network': True, + }, +} + + +def GetConfig(user_config: Dict[str, Any]) -> Dict[str, Any]: + """Load and return benchmark config spec. + + Args: + user_config: User provided configuration overrides. + + Returns: + Merged benchmark configuration. 
+ """ + config = configs.LoadConfig(BENCHMARK_CONFIG, user_config, BENCHMARK_NAME) + + # Apply machine type overrides + if FLAGS.postgres_kubernetes_server_machine_type: + # Update postgres nodepool + vm_spec = config['container_cluster']['nodepools']['postgres']['vm_spec'] + for cloud in vm_spec: + vm_spec[cloud]['machine_type'] = FLAGS.postgres_kubernetes_server_machine_type + + # Update default root nodepool (if it exists) + if 'vm_spec' in config['container_cluster']: + root_vm_spec = config['container_cluster']['vm_spec'] + for cloud in root_vm_spec: + root_vm_spec[cloud][ + 'machine_type' + ] = FLAGS.postgres_kubernetes_server_machine_type + + if FLAGS.postgres_kubernetes_client_machine_type: + # Update nodepool + client_vm_spec = config['container_cluster']['nodepools']['clients'][ + 'vm_spec' + ] + for cloud in client_vm_spec: + client_vm_spec[cloud][ + 'machine_type' + ] = FLAGS.postgres_kubernetes_client_machine_type + + + + + + # Apply HugePages system config if needed (GCP Only) + if config.get('container_cluster', {}).get('cloud') == 'GCP' and ( + 'hugepages' in FLAGS.postgres_kubernetes_optimization_profile + or 'all-in-one' in FLAGS.postgres_kubernetes_optimization_profile + ): + logging.info('Enabling Dynamic HugePages via GKE System Config') + server_machine = config['container_cluster']['nodepools']['postgres'][ + 'vm_spec' + ]['GCP']['machine_type'] + + # Calculate dynamic HugePages needed + machine_family = server_machine.split('-')[0] + node_cpus = 16 + try: + node_cpus = int(server_machine.split('-')[2]) + except IndexError: + pass + + node_mem_gb = 60.0 + if machine_family in ['c4a', 'n4', 'n4a', 'n4d']: + node_mem_gb = node_cpus * 4.0 + elif machine_family == 'c4d': + node_mem_gb = node_cpus * 3.875 + elif machine_family == 'c4': + node_mem_gb = node_cpus * 3.75 + + pod_mem_gb = int(node_mem_gb * 0.85) + hugepage_mb = int(pod_mem_gb * 0.45) * 1024 + hugepage_size2m = int(hugepage_mb / 2) + + config_path = os.path.join(FLAGS.temp_dir, 
'hugepages-node-config.yaml') + with open(config_path, 'w') as f: + f.write( + 'linuxConfig:\n hugepageConfig:\n hugepage_size2m:' + f' {hugepage_size2m}\n' + ) + + FLAGS.gke_node_system_config = config_path + + # Upgrade the default nodepool to match the server machine type + if 'vm_spec' not in config['container_cluster']: + config['container_cluster']['vm_spec'] = {'GCP': {}} + elif 'GCP' not in config['container_cluster']['vm_spec']: + config['container_cluster']['vm_spec']['GCP'] = {} + + config['container_cluster']['vm_spec']['GCP']['machine_type'] = server_machine + logging.info('Upgraded default cluster nodepool to %s to satisfy HugePages allocation requirements.', server_machine) + + return config + + +def _GetPostgresPassword() -> str: + """Get PostgreSQL password from run_uri.""" + return postgresql.GetPsqlUserPassword(FLAGS.run_uri) + + +def _GetNodeResources() -> tuple[float, int]: + """Gets allocatable memory in GB and CPU count from K8s node.""" + mem_cmd = [ + FLAGS.kubectl, + '--kubeconfig', + FLAGS.kubeconfig, + 'get', + 'nodes', + '-o', + 'jsonpath={.items[0].status.allocatable.memory}', + ] + cpu_cmd = [ + FLAGS.kubectl, + '--kubeconfig', + FLAGS.kubeconfig, + 'get', + 'nodes', + '-o', + 'jsonpath={.items[0].status.allocatable.cpu}', + ] + mem_stdout, _, _ = vm_util.IssueCommand(mem_cmd) + cpu_stdout, _, _ = vm_util.IssueCommand(cpu_cmd) + + if not mem_stdout or not cpu_stdout: + raise ValueError("Failed to retrieve node capacity from Kubernetes") + + mem_str = mem_stdout.strip() + if mem_str.endswith('Ki'): + node_mem_gb = int(mem_str[:-2]) / (1024 * 1024) + elif mem_str.endswith('Mi'): + node_mem_gb = int(mem_str[:-2]) / 1024 + elif mem_str.endswith('Gi'): + node_mem_gb = float(mem_str[:-2]) + else: + node_mem_gb = int(mem_str) / (1024 * 1024 * 1024) + + cpu_str = cpu_stdout.strip() + if cpu_str.endswith('m'): + node_cpus = max(1, int(float(cpu_str[:-1]) / 1000)) + else: + node_cpus = int(cpu_str) + + return node_mem_gb, node_cpus + + +def 
_GetDynamicResources() -> Dict[str, Any]: + """Calculates K8s resource requests and limits from node allocatable capacity.""" + node_mem_gb, node_cpus = _GetNodeResources() + + return { + 'cpu_request': str(max(node_cpus - 2, 1)), + 'cpu_limit': str(max(node_cpus - 1, 1)), + 'memory_request': f'{int(node_mem_gb * 0.85)}Gi', + 'memory_limit': f'{int(node_mem_gb * 0.85)}Gi', + 'calculated_node_mem_gb': node_mem_gb, + } + + + def _GetPostgreSQLConfig() -> Dict[str, Any]: + """Get effective PostgreSQL configuration based on profile and flags. + + Returns: + Dictionary of PostgreSQL configuration parameters. + """ + # Start with baseline + profile = OPTIMIZATION_PROFILES[FLAGS.postgres_kubernetes_optimization_profile] + pg_config = BASE_POSTGRES_CONFIG.copy() + pg_config.update(OPTIMIZATION_PROFILES['baseline']['postgres']) + + dynamic_resources = _GetDynamicResources() + pod_mem_gb = int(dynamic_resources['calculated_node_mem_gb'] * 0.85) + + # Profiles without a 'postgres' section keep the baseline settings + profile_pg_config = profile.get('postgres', {}) + pg_config.update(profile_pg_config) + + # Scale memory settings to the pod; PostgreSQL-tuned profiles get larger shares + if ( + 'postgres-tuned' in FLAGS.postgres_kubernetes_optimization_profile + or 'all-in-one' in FLAGS.postgres_kubernetes_optimization_profile + or 'postgres' in FLAGS.postgres_kubernetes_optimization_profile + ): + pg_config['shared_buffers'] = f'{int(pod_mem_gb * 0.40)}GB' + pg_config['effective_cache_size'] = f'{int(pod_mem_gb * 0.75)}GB' + # Keep an explicit huge_pages setting from the profile + if 'huge_pages' in profile_pg_config: + pg_config['huge_pages'] = profile_pg_config['huge_pages'] + else: + # Baseline and infrastructure-only profiles use conservative defaults + pg_config['shared_buffers'] = f'{int(pod_mem_gb * 0.25)}GB' + pg_config['effective_cache_size'] = f'{int(pod_mem_gb * 0.50)}GB' + + return pg_config + + + def _PreparePostgreSQLCluster(bm_spec: benchmark_spec.BenchmarkSpec) -> None: + """Deploy PostgreSQL on the Kubernetes cluster. + + Args: + bm_spec: Benchmark specification.
+ """ + cluster = bm_spec.container_cluster + profile = OPTIMIZATION_PROFILES[FLAGS.postgres_kubernetes_optimization_profile] + + # Resolve the server machine type from the flag or the benchmark config + if FLAGS.postgres_kubernetes_server_machine_type: + machine_type = FLAGS.postgres_kubernetes_server_machine_type + else: + try: + # Try to get from benchmark config + machine_type = bm_spec.config.container_cluster.nodepools[ + 'postgres' + ].vm_spec['GCP']['machine_type'] + except (KeyError, AttributeError): + raise ValueError("Could not determine machine type from config. Please specify --postgres_kubernetes_server_machine_type") + + # Determine disk type for the storage class + disk_type = FLAGS.data_disk_type + # Get dynamic resource sizing + pg_config = _GetPostgreSQLConfig() + dynamic_resources = _GetDynamicResources() + pod_mem_gb = int(dynamic_resources['calculated_node_mem_gb'] * 0.85) + + hugepages = profile.get('hugepages') + + # If HugePages is enabled, size the 2 MiB page reservation from pod memory + if ( + 'hugepages' in FLAGS.postgres_kubernetes_optimization_profile + or 'all-in-one' in FLAGS.postgres_kubernetes_optimization_profile + or 'hugepages' in profile + ): + # 45% of pod memory: covers shared_buffers (at most 40%) with headroom + hugepage_mb = int(pod_mem_gb * 0.45) * 1024 + hugepages = {'hugepage_size2m': int(hugepage_mb / 2), 'hugepage_size1g': 0} + pg_config['huge_pages'] = 'on' + + # Lower the regular K8s memory request/limit to leave RAM for HugePages + dynamic_resources['memory_request'] = ( + f"{int(dynamic_resources['calculated_node_mem_gb'] * 0.25)}Gi" + ) + dynamic_resources['memory_limit'] = ( + f"{int(dynamic_resources['calculated_node_mem_gb'] * 0.25)}Gi" + ) + + template_params = { + 'namespace': 'default', + 'postgres_version': '16', + 'postgres_user': 'benchmark', + 'postgres_password': _GetPostgresPassword(), + 'postgres_database': 'benchmark', + 'disk_size': f'{FLAGS.data_disk_size}Gi', + 'disk_type': disk_type, + 'use_init_container': profile.get('use_init_container', True), + 'host_network':
profile.get('host_network', False), + 'client_image': profile.get('client_image', 'ubuntu:20.04'), + # Resource configuration from dynamic calculator + 'cpu_request': dynamic_resources['cpu_request'], + 'cpu_limit': dynamic_resources['cpu_limit'], + 'memory_request': dynamic_resources['memory_request'], + 'memory_limit': dynamic_resources['memory_limit'], + 'hugepages': hugepages, + **pg_config, # Include all PostgreSQL parameters + } + + # Apply manifests + kubernetes_commands.ApplyManifest( + 'container/postgres_sysbench/postgres_all.yaml.j2', **template_params + ) + + # Wait for PostgreSQL pod to be ready (not StatefulSet ready replicas) + try: + # First wait for pod to exist and be running + logging.info('Waiting for PostgreSQL pod to be ready (up to 30 minutes)...') + + @vm_util.Retry( + max_retries=3, + retryable_exceptions=( + errors.VmUtil.IssueCommandTimeoutError, + errors.VmUtil.IssueCommandError, + ), + ) + def _WaitForPodReady(): + cluster.WaitForResource( + 'pod/postgres-standalone-0', # resource_name + 'Ready', # condition_name + namespace='default', + timeout=1800, # 30 minutes for large deployments with HugePages + ) + + _WaitForPodReady() + logging.info('PostgreSQL pod is ready') + + # Verify PostgreSQL is actually accepting connections using active polling + logging.info('Polling for PostgreSQL connectivity...') + + @vm_util.Retry( + max_retries=12, + poll_interval=5, + retryable_exceptions=(errors.VmUtil.IssueCommandError,), + ) + def _WaitForPostgresReady(): + check_cmd = [ + FLAGS.kubectl, + '--kubeconfig', + FLAGS.kubeconfig, + 'exec', + '-n', + 'default', + 'postgres-standalone-0', + '--', + 'pg_isready', + '-U', + 'benchmark', + '-d', + 'benchmark', + ] + vm_util.IssueCommand(check_cmd) + + # Check if we can execute a query + query_cmd = [ + FLAGS.kubectl, + '--kubeconfig', + FLAGS.kubeconfig, + 'exec', + '-n', + 'default', + 'postgres-standalone-0', + '--', + 'bash', + '-c', + 'psql -U benchmark -d benchmark -c "SELECT 1"', + ] + 
+ vm_util.IssueCommand(query_cmd) + + _WaitForPostgresReady() + logging.info('PostgreSQL connectivity and query test successful') + + except Exception as e: + # If waiting fails, gather debug info + logging.error('PostgreSQL pod failed to become ready: %s', e) + + # Get pod details + describe_cmd = [ + FLAGS.kubectl, + '--kubeconfig', + FLAGS.kubeconfig, + 'describe', + 'pod', + '-n', + 'default', + '-l', + 'app=postgres-standalone', + ] + stdout, _, _ = vm_util.IssueCommand(describe_cmd, raise_on_failure=False) + logging.error('Pod description:\n%s', stdout) + + # Get pod logs + logs_cmd = [ + FLAGS.kubectl, + '--kubeconfig', + FLAGS.kubeconfig, + 'logs', + '-n', + 'default', + '-l', + 'app=postgres-standalone', + '--tail=100', + ] + stdout, _, _ = vm_util.IssueCommand(logs_cmd, raise_on_failure=False) + logging.error('Pod logs:\n%s', stdout) + + # Get events + events_cmd = [ + FLAGS.kubectl, + '--kubeconfig', + FLAGS.kubeconfig, + 'get', + 'events', + '-n', + 'default', + '--sort-by=.lastTimestamp', + ] + stdout, _, _ = vm_util.IssueCommand(events_cmd, raise_on_failure=False) + logging.error('Recent events:\n%s', stdout) + + raise + + # Use the pod IP so Sysbench connects over the private pod network + get_ip_cmd = [ + FLAGS.kubectl, + '--kubeconfig', + FLAGS.kubeconfig, + 'get', + 'pod', + 'postgres-standalone-0', + '-n', + 'default', + '-o', + 'jsonpath={.status.podIP}', + ] + stdout, _, _ = vm_util.IssueCommand(get_ip_cmd) + service_ip = stdout.strip() if stdout else 'postgres-standalone-0' + + bm_spec.postgres_service_ip = service_ip + logging.info('PostgreSQL service available at: %s', service_ip) + + + def _PrepareSysbenchClient(bm_spec: benchmark_spec.BenchmarkSpec) -> None: + """Prepare Sysbench on the client pod. + + Args: + bm_spec: Benchmark specification.
+ """ + # Deploy client pod and install sysbench + cluster = bm_spec.container_cluster + profile = OPTIMIZATION_PROFILES[FLAGS.postgres_kubernetes_optimization_profile] + + # Store the sysbench password in a K8s Secret rather than in the client manifest + logging.info('Creating sysbench-passwords secret...') + vm_util.IssueCommand([ + FLAGS.kubectl, + '--kubeconfig', + FLAGS.kubeconfig, + 'delete', + 'secret', + 'sysbench-passwords', + '-n', + 'default', + '--ignore-not-found', + ]) + vm_util.IssueCommand([ + FLAGS.kubectl, + '--kubeconfig', + FLAGS.kubeconfig, + 'create', + 'secret', + 'generic', + 'sysbench-passwords', + '--from-literal=benchmark-password=' + _GetPostgresPassword(), + '-n', + 'default', + ]) + + template_params = { + 'namespace': 'default', + 'client_image': profile.get('client_image', 'ubuntu:20.04'), + 'client_cpu_request': FLAGS.postgres_kubernetes_client_cpu_request, + 'client_cpu_limit': FLAGS.postgres_kubernetes_client_cpu_limit, + 'client_memory_request': FLAGS.postgres_kubernetes_client_memory_request, + 'client_memory_limit': FLAGS.postgres_kubernetes_client_memory_limit, + } + + kubernetes_commands.ApplyManifest( + 'container/postgres_sysbench/client_pod.yaml.j2', **template_params + ) + + # Wait for the client pod to become Ready + cluster.WaitForResource('pod/postgres-client', 'Ready', namespace='default') + + # Install sysbench and dependencies in pod + install_commands = [ + 'for i in {1..5}; do apt-get update && break || sleep 15; done', + ( + 'export DEBIAN_FRONTEND=noninteractive; for i in {1..3}; do apt-get' + ' install -y git build-essential automake libtool pkg-config && break' + ' || sleep 15; done' + ), + ( + 'export DEBIAN_FRONTEND=noninteractive; for i in {1..3}; do apt-get' + ' install -y libmysqlclient-dev libpq-dev && break || sleep 15; done' + ), + ( + 'export DEBIAN_FRONTEND=noninteractive; for i in {1..3}; do apt-get' + ' install -y sysbench postgresql-client && break || sleep 15; done' + ), + ] +
+ @vm_util.Retry( + max_retries=3, retryable_exceptions=(errors.VmUtil.IssueCommandError,) + ) + def _RunInstallCmd(install_cmd): + kubectl_cmd = [ + FLAGS.kubectl, + '--kubeconfig', + FLAGS.kubeconfig, + 'exec', + '-n', + 'default', + 'postgres-client', + '--', + 'bash', + '-c', + install_cmd, + ] + vm_util.IssueCommand(kubectl_cmd) + + for cmd in install_commands: + _RunInstallCmd(cmd) + + + def _LoadDatabase(bm_spec: benchmark_spec.BenchmarkSpec) -> None: + """Load initial data into PostgreSQL using Sysbench. + + Args: + bm_spec: Benchmark specification. + """ + postgres_ip = bm_spec.postgres_service_ip + + # Run in client pod. Build the prepare command by hand to avoid + # VM-specific paths and to keep the password off the command line + lua_script = '/usr/share/sysbench/oltp_read_write.lua' + + cmd = ( + f'sysbench {lua_script} ' + '--db-driver=pgsql ' + f'--tables={FLAGS.sysbench_tables} ' + f'--table_size={FLAGS.sysbench_table_size} ' + f'--threads={FLAGS.sysbench_load_threads} ' + '--pgsql-user=benchmark ' + '--pgsql-db=benchmark ' + f'--pgsql-host={postgres_ip} ' + '--pgsql-port=5432 ' + 'prepare' + ) + + kubectl_cmd = [ + FLAGS.kubectl, + '--kubeconfig', + FLAGS.kubeconfig, + 'exec', + '-n', + 'default', + 'postgres-client', + '--', + 'bash', + '-c', + cmd, + ] + vm_util.IssueCommand(kubectl_cmd) + + logging.info('Database loaded successfully') + + + def Prepare(bm_spec: benchmark_spec.BenchmarkSpec) -> None: + """Prepare PostgreSQL and Sysbench for benchmarking. + + Args: + bm_spec: Benchmark specification.
+ """ + prepare_fns = [ + functools.partial(_PreparePostgreSQLCluster, bm_spec), + functools.partial(_PrepareSysbenchClient, bm_spec), + ] + + background_tasks.RunThreaded(lambda f: f(), prepare_fns) + + # Load database after both PostgreSQL and client are ready + _LoadDatabase(bm_spec) + + + def Run(bm_spec: benchmark_spec.BenchmarkSpec) -> List[sample.Sample]: + """Run Sysbench against PostgreSQL. + + Args: + bm_spec: Benchmark specification. + + Returns: + List of performance samples. + """ + postgres_ip = bm_spec.postgres_service_ip + samples = [] + + # Comma-separated list of workload types to run + workload_types = FLAGS.sysbench_testname.split(',') + + for workload in workload_types: + # Sysbench parameters, used below for sample metadata + sysbench_params = sysbench.SysbenchInputParameters( + db_driver='pgsql', + tables=FLAGS.sysbench_tables, + table_size=FLAGS.sysbench_table_size, + threads=FLAGS.sysbench_run_threads, + report_interval=FLAGS.sysbench_report_interval, + db_user='benchmark', + db_password=_GetPostgresPassword(), + db_name='benchmark', + host_ip=postgres_ip, + port=5432, + built_in_test=True, + test=f'{sysbench.LUA_SCRIPT_PATH}{workload}.lua', + ) + + # Stabilize the run: refresh planner statistics and flush dirty buffers + # (same steps as the HA benchmark, for consistency) + logging.info('Running ANALYZE to update statistics for benchmark tables...') + for i in range(1, FLAGS.sysbench_tables + 1): + table_name = f'sbtest{i}' + analyze_cmd = ( + f'psql -h {postgres_ip} -U benchmark -d benchmark -c "ANALYZE' + f' {table_name};"' + ) + kubectl_cmd = [ + FLAGS.kubectl, + '--kubeconfig', + FLAGS.kubeconfig, + 'exec', + '-n', + 'default', + 'postgres-client', + '--', + 'bash', + '-c', + analyze_cmd, + ] + vm_util.IssueCommand(kubectl_cmd) + + logging.info('Executing 3 checkpoints to flush buffers...') + checkpoint_cmd = ( + f'psql -h {postgres_ip} -U benchmark -d benchmark -c "CHECKPOINT;"' + ) + kubectl_chk = [
+ FLAGS.kubectl, + '--kubeconfig', + FLAGS.kubeconfig, + 'exec', + '-n', + 'default', + 'postgres-client', + '--', + 'bash', + '-c', + checkpoint_cmd, + ] + + for i in range(3): + logging.info('Issuing checkpoint %d/3', i + 1) + vm_util.IssueCommand(kubectl_chk) + + # Build the run command by hand for the client pod + lua_script = f'/usr/share/sysbench/{workload}.lua' + + run_cmd = ( + f'sysbench {lua_script} ' + '--db-driver=pgsql ' + f'--tables={FLAGS.sysbench_tables} ' + f'--table_size={FLAGS.sysbench_table_size} ' + f'--threads={FLAGS.sysbench_run_threads} ' + f'--report-interval={FLAGS.sysbench_report_interval} ' + f'--time={FLAGS.sysbench_run_seconds} ' + '--pgsql-user=benchmark ' + '--pgsql-db=benchmark ' + f'--pgsql-host={postgres_ip} ' + '--pgsql-port=5432 ' + 'run' + ) + + kubectl_cmd = [ + FLAGS.kubectl, + '--kubeconfig', + FLAGS.kubeconfig, + 'exec', + '-n', + 'default', + 'postgres-client', + '--', + 'bash', + '-c', + run_cmd, + ] + stdout, _, _ = vm_util.IssueCommand( + kubectl_cmd, timeout=FLAGS.sysbench_run_seconds + 120 + ) + logging.info('Sysbench completed successfully on pod') + + # Log output for debugging + logging.debug( + 'Sysbench output (first 500 chars): %s', + stdout[:500] if stdout else 'No output', + ) + + # Build sample metadata + metadata = sysbench.GetMetadata(sysbench_params) + machine_type = FLAGS.postgres_kubernetes_server_machine_type or 'c4-standard-16' + pg_conf = _GetPostgreSQLConfig() + metadata.update({ + 'optimization_profile': FLAGS.postgres_kubernetes_optimization_profile, + 'postgres_shared_buffers': pg_conf['shared_buffers'], + 'postgres_effective_cache_size': pg_conf['effective_cache_size'], + 'machine_type': machine_type, + 'disk_type': FLAGS.data_disk_type or 'auto', + 'workload_type': workload, + }) + + # Parse sysbench output + try: + time_series_samples = sysbench.ParseSysbenchTimeSeries(stdout, metadata) + samples.extend(time_series_samples) + logging.info('Parsed %d time series samples', len(time_series_samples)) + except
Exception as e: + logging.warning('Failed to parse time series: %s', e) + + try: + latency_samples = sysbench.ParseSysbenchLatency([stdout], metadata) + samples.extend(latency_samples) + logging.info('Parsed %d latency samples', len(latency_samples)) + except Exception as e: + logging.warning('Failed to parse latency: %s', e) + + try: + transaction_samples = sysbench.ParseSysbenchTransactions(stdout, metadata) + samples.extend(transaction_samples) + logging.info('Parsed %d transaction samples', len(transaction_samples)) + except Exception as e: + logging.warning('Failed to parse transactions: %s', e) + + if not samples: + logging.error( + 'No samples parsed from sysbench output. Output was: %s', + stdout[:1000], + ) + + logging.info('Total samples collected: %d', len(samples)) + return samples + + +def Cleanup(bm_spec: benchmark_spec.BenchmarkSpec) -> None: + """Clean up PostgreSQL resources. + + Args: + bm_spec: Benchmark specification. + """ + # PKB container cluster lifecycle handles namespace and cluster deletion, + # which automatically garbage-collects Pods, StatefulSets, and PVCs. + pass
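The resource sizing in this module is spread across `GetConfig`, `_GetNodeResources`, `_GetDynamicResources`, `_GetPostgreSQLConfig`, and `_PreparePostgreSQLCluster`. As a minimal sketch for sanity-checking that arithmetic outside PKB, the standalone Python below restates it. The helper names (`parse_k8s_memory_gib`, `parse_k8s_cpu`, `size_postgres`, `PostgresSizing`) are hypothetical and not part of the module; the ratios are copied from the diff: the pod gets 85% of allocatable node memory, tuned profiles set `shared_buffers` to 40% and `effective_cache_size` to 75% of pod memory, and the HugePages reservation is 45% of pod memory expressed as 2 MiB pages.

```python
"""Standalone sketch of the PostgreSQL-on-GKE sizing arithmetic.

All names here are hypothetical; the ratios mirror the benchmark module.
"""
from dataclasses import dataclass


def parse_k8s_memory_gib(quantity: str) -> float:
  """Converts a Kubernetes memory quantity (Ki, Mi, Gi, or bytes) to GiB."""
  q = quantity.strip()
  if q.endswith('Ki'):
    return int(q[:-2]) / (1024 * 1024)
  if q.endswith('Mi'):
    return int(q[:-2]) / 1024
  if q.endswith('Gi'):
    return float(q[:-2])
  return int(q) / (1024**3)  # Plain byte count


def parse_k8s_cpu(quantity: str) -> int:
  """Converts a CPU quantity such as '15890m' or '16' to whole cores (min 1)."""
  q = quantity.strip()
  if q.endswith('m'):
    return max(1, int(float(q[:-1]) / 1000))
  return int(q)


@dataclass(frozen=True)
class PostgresSizing:
  pod_mem_gib: int
  shared_buffers_gb: int
  effective_cache_size_gb: int
  hugepage_size2m: int  # Number of 2 MiB pages to reserve
  cpu_request: int
  cpu_limit: int


def size_postgres(allocatable_mem: str, allocatable_cpu: str) -> PostgresSizing:
  """Applies the tuned-profile ratios to a node's allocatable resources."""
  node_mem_gib = parse_k8s_memory_gib(allocatable_mem)
  node_cpus = parse_k8s_cpu(allocatable_cpu)
  pod_mem_gib = int(node_mem_gib * 0.85)  # Pod gets 85% of node memory
  hugepage_mib = int(pod_mem_gib * 0.45) * 1024  # 45% of pod memory, in MiB
  return PostgresSizing(
      pod_mem_gib=pod_mem_gib,
      shared_buffers_gb=int(pod_mem_gib * 0.40),
      effective_cache_size_gb=int(pod_mem_gib * 0.75),
      hugepage_size2m=hugepage_mib // 2,
      cpu_request=max(node_cpus - 2, 1),
      cpu_limit=max(node_cpus - 1, 1),
  )


if __name__ == '__main__':
  print(size_postgres('62914560Ki', '15890m'))
```

With 60 GiB allocatable (`62914560Ki`) and `15890m` of CPU, this yields a 51 GiB pod, `shared_buffers` of 20 GB, `effective_cache_size` of 38 GB, and 11264 two-MiB pages (22 GiB), so the reservation covers `shared_buffers` with about 2 GiB to spare. Note that in the module, `GetConfig` sizes the node-level reservation from an estimated per-vCPU memory figure, while `_PreparePostgreSQLCluster` uses the node's reported allocatable memory, so the two page counts can differ.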