docs: update performance reports with Sprint 2 results

poyrazK · poyrazK · commit 356f0013c457 · 2026-04-10T23:04:53.000+03:00
- Updated README.md with 181M rows/s scan performance.
- Updated SQLite comparison docs with detailed analysis and future roadmap.
- Updated Phase 8 baseline.
diff --git a/README.md b/README.md
@@ -23,6 +23,21 @@ A lightweight, distributed SQL database engine. Designed for cloud environments
 - **Volcano & Vectorized Engine**: Flexible execution models supporting traditional row-based and high-performance columnar processing.
 - **PostgreSQL Wire Protocol**: Handshake and simple query protocol implementation for tool compatibility.
 
+## Performance
+
+In head-to-head in-memory benchmarks (10k rows; SQLite with `PRAGMA synchronous=OFF`), CloudSQL outperforms SQLite in raw execution speed:
+
+- **6.6M+ Point Inserts/s**: Optimized prepared statement caching and batch insert fast-paths make CloudSQL **58x faster** than SQLite.
+- **181M+ Rows Scanned/s**: A zero-allocation `TupleView` and lazy deserialization make CloudSQL **~9x faster** than SQLite for sequential scans.
+- **Non-Transactional Fast-Paths**: Batch inserts skip single-row undo logs and exclusive locks, and non-transactional scans replace full MVCC visibility checks with a single `xmax == 0` test.
+
+| Benchmark (10k rows) | cloudSQL | SQLite3 | Speedup |
+| :--- | :--- | :--- | :--- |
+| **Point Inserts** | 6.69M rows/s | 114.1k rows/s | **58x** |
+| **Sequential Scan** | 181.4M rows/s | 20.6M rows/s | **~9x** |
+
+For more details, see the [Performance Report](./docs/performance/SQLITE_COMPARISON.md).
+
 ## Project Structure
 
 - `include/`: Header files defining the core engine and distributed API.
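The zero-allocation scan claimed in the README can be illustrated with a minimal C++ sketch. It assumes a hypothetical tuple layout (two 64-bit MVCC header fields followed by fixed-width `int64_t` columns) and hypothetical helpers `sum_column` and `write_tuple`; cloudSQL's real `TupleView` and page format are not part of this diff. The view only borrows a pointer into the page, and `get_int64` decodes one column on demand, so no per-row container of `common::Value` is built.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Sketch only. Hypothetical tuple layout (not cloudSQL's real format):
//   [uint64_t xmin][uint64_t xmax][int64_t col0][int64_t col1]...
class TupleView {
public:
    static constexpr std::size_t kHeaderSize = 2 * sizeof(std::uint64_t);

    static constexpr std::size_t tuple_size(std::size_t num_cols) {
        return kHeaderSize + num_cols * sizeof(std::int64_t);
    }

    // Borrows a pointer into a pinned page; nothing is copied or allocated.
    TupleView(const unsigned char* data, std::size_t num_cols)
        : data_(data), num_cols_(num_cols) {}

    std::uint64_t xmax() const { return load<std::uint64_t>(sizeof(std::uint64_t)); }

    // Lazy deserialization: only the requested column is decoded.
    std::int64_t get_int64(std::size_t col) const {
        assert(col < num_cols_);
        return load<std::int64_t>(kHeaderSize + col * sizeof(std::int64_t));
    }

private:
    template <typename T>
    T load(std::size_t offset) const {
        T value;
        std::memcpy(&value, data_ + offset, sizeof(T));  // alignment-safe read
        return value;
    }

    const unsigned char* data_;
    std::size_t num_cols_;
};

// Hypothetical scan kernel: sums one column over contiguous tuples.
std::int64_t sum_column(const unsigned char* page, std::size_t num_tuples,
                        std::size_t num_cols, std::size_t col) {
    const std::size_t stride = TupleView::tuple_size(num_cols);
    std::int64_t total = 0;
    for (std::size_t i = 0; i < num_tuples; ++i) {
        TupleView view(page + i * stride, num_cols);  // lives on the stack
        total += view.get_int64(col);                  // other columns untouched
    }
    return total;
}

// Usage helper: serializes one tuple in the hypothetical layout above.
void write_tuple(unsigned char* dst, std::uint64_t xmin, std::uint64_t xmax,
                 const std::int64_t* cols, std::size_t num_cols) {
    std::memcpy(dst, &xmin, sizeof xmin);
    std::memcpy(dst + sizeof xmin, &xmax, sizeof xmax);
    std::memcpy(dst + TupleView::kHeaderSize, cols, num_cols * sizeof(std::int64_t));
}
```

Because the view holds only a pointer and a column count, the per-row cost in this sketch is a pointer bump plus one `memcpy` of the requested column, instead of constructing and destroying a container for every tuple.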
diff --git a/docs/performance/SQLITE_COMPARISON.md b/docs/performance/SQLITE_COMPARISON.md
@@ -16,7 +16,7 @@ This report documents the head-to-head performance comparison between the `cloud
 | Benchmark | cloudSQL (Pre-Opt) | cloudSQL (Post-Opt) | SQLite3 | Final Status |
 | :--- | :--- | :--- | :--- | :--- |
 | **Point Inserts (10k)** | 16.1k rows/s | **6.69M rows/s** | 114.1k rows/s | **CloudSQL +58x faster** |
-| **Sequential Scan (10k)** | 3.1M items/s | **5.1M items/s** | 20.6M items/s | SQLite 4.0x faster |
+| **Sequential Scan (10k)** | 3.1M rows/s | **181.4M rows/s** | 20.6M rows/s | **CloudSQL +9x faster** |
 
 ## 4. Architectural Analysis
 
@@ -27,9 +27,11 @@ Following our latest optimizations, `cloudSQL` completely bridged the insert gap
 3.  **In-Memory Architecture**: This configuration allows `cloudSQL` to behave as a massive unhindered memory bump-allocator, whereas SQLite still respects basic transactional boundaries even with `PRAGMA synchronous=OFF`.
 
 ### Sequential Scans
-We reduced the scan gap from 6.5x down to **4.0x** slower than SQLite. The remaining gap is attributed to:
-1.  **Volcano Model Overhead**: `cloudSQL` uses a tuple-at-a-time iterator model with virtual function calls for `next()`.
-2.  **Value Type Allocations**: Scanning in `cloudSQL` fundamentally builds `std::pmr::vector<common::Value>` using `std::variant` properties for each row, constructing dense memory structures. SQLite's cursor is highly optimized to avoid unnecessary buffer copying unless columns are fetched.
+We have reversed the scan gap: `cloudSQL` went from 4.0x slower to **~9x faster** than SQLite for raw sequential scans (181.4M vs. 20.6M rows/s). Four changes made this possible:
+1.  **Zero-Allocation `TupleView`**: Instead of materializing a `std::pmr::vector<common::Value>` per row, the scan uses a lightweight view that points directly into the pinned `BufferPool` page.
+2.  **Lazy Deserialization**: Values are decoded from the binary format only when explicitly accessed, so columns the query never reads cost nothing to decode.
+3.  **Fast-Path MVCC**: For non-transactional scans (the common case in bulk data processing), we bypass the full visibility logic and perform a single `xmax == 0` check per tuple.
+4.  **Iterator Caching**: The iterator caches the `PageHeader` once per page transition, eliminating repeated `memcpy` calls in the scan hot path.
 
 ## 5. Post-Optimization Enhancements
 We addressed the gaps via the following optimizations:
@@ -38,6 +40,7 @@ We addressed the gaps via the following optimizations:
 3.  **Batch Insert Mode**: Skipping single-row undo logs and exclusive locks to exploit pure in-memory bump allocation. This drove the `INSERT` speedup well past SQLite limits, as we write raw tuples uninterrupted.
 
 ## 6. Future Roadmap
-To close the remaining 4.0x gap in `SEQ_SCAN`:
-*   Use zero-copy `TupleView` classes directly mapping against the buffer page to avoid allocating `std::vector<common::Value>` per row.
-*   Switch to Arrow-based columnar execution architecture for vectorized OLAP.
+With the scan gap closed, our focus shifts to higher-level analytical throughput:
+*   **Stage 1: SIMD-Accelerated Filtering**: Use AVX-512/NEON instructions to evaluate filter predicates on several values per instruction.
+*   **Stage 2: Vectorized Execution**: Move from row-at-a-time `TupleView` to batch-at-a-time `VectorBatch` processing.
+*   **Stage 3: Columnar Storage**: Transition from row-oriented heap files to columnar persistence, so analytical scans read only the columns they need.
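The fast-path MVCC check and the per-page `PageHeader` caching described in Section 4 of the report can be sketched together as one scan iterator. Everything here is an assumption for illustration: the page layout, the hypothetical `SeqScan`, `Snapshot`, `Row`, and `make_page` names, and the simplified `visible_full` rule, which ignores commit status. A null snapshot stands in for a non-transactional scan, where a tuple qualifies if and only if `xmax == 0`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical page layout: [PageHeader][tuple 0][tuple 1]...
// Each tuple is [uint64_t xmin][uint64_t xmax][int64_t value].
struct PageHeader {
    std::uint32_t num_tuples;
    std::uint32_t tuple_size;
};
constexpr std::uint32_t kTupleSize = 24;

struct Snapshot {
    std::uint64_t xid;  // reader's transaction id (simplified)
};

// Simplified full visibility rule (ignores commit status and in-progress xids).
bool visible_full(std::uint64_t xmin, std::uint64_t xmax, const Snapshot& s) {
    return xmin <= s.xid && (xmax == 0 || xmax > s.xid);
}

class SeqScan {
public:
    // snapshot == nullptr models a non-transactional scan.
    SeqScan(std::vector<const unsigned char*> pages, const Snapshot* snapshot)
        : pages_(std::move(pages)), snapshot_(snapshot) {}

    // Writes the next visible value to `out`; returns false when exhausted.
    bool next(std::int64_t& out) {
        for (;;) {
            if (slot_ >= header_.num_tuples) {
                if (page_idx_ >= pages_.size()) return false;
                page_ = pages_[page_idx_++];
                std::memcpy(&header_, page_, sizeof header_);  // once per page, not per tuple
                slot_ = 0;
                continue;
            }
            const unsigned char* t = page_ + sizeof(PageHeader) +
                                     static_cast<std::size_t>(slot_++) * header_.tuple_size;
            std::uint64_t xmax;
            std::memcpy(&xmax, t + 8, sizeof xmax);
            bool visible;
            if (snapshot_ == nullptr) {
                visible = (xmax == 0);  // fast path: a single check
            } else {
                std::uint64_t xmin;
                std::memcpy(&xmin, t, sizeof xmin);
                visible = visible_full(xmin, xmax, *snapshot_);
            }
            if (visible) {
                std::memcpy(&out, t + 16, sizeof out);
                return true;
            }
        }
    }

private:
    std::vector<const unsigned char*> pages_;
    const Snapshot* snapshot_;
    const unsigned char* page_ = nullptr;
    PageHeader header_{0, 0};  // cached copy of the current page's header
    std::size_t page_idx_ = 0;
    std::uint32_t slot_ = 0;
};

// Usage helper: builds one page from (xmin, xmax, value) rows.
struct Row {
    std::uint64_t xmin, xmax;
    std::int64_t value;
};
std::vector<unsigned char> make_page(const std::vector<Row>& rows) {
    PageHeader h{static_cast<std::uint32_t>(rows.size()), kTupleSize};
    std::vector<unsigned char> page(sizeof h + rows.size() * kTupleSize);
    std::memcpy(page.data(), &h, sizeof h);
    for (std::size_t i = 0; i < rows.size(); ++i) {
        unsigned char* t = page.data() + sizeof h + i * kTupleSize;
        std::memcpy(t, &rows[i].xmin, 8);
        std::memcpy(t + 8, &rows[i].xmax, 8);
        std::memcpy(t + 16, &rows[i].value, 8);
    }
    return page;
}
```

In this sketch the non-transactional path never reads `xmin` at all, and the header is copied once per page transition rather than on every `next()` call.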
diff --git a/docs/phases/PHASE_8_ANALYTICS.md b/docs/phases/PHASE_8_ANALYTICS.md
@@ -26,9 +26,13 @@ Optimized global analytical queries (`COUNT`, `SUM`).
 - **Vectorized Global Aggregate**: Aggregates entire batches of data with minimal branching and high cache locality.
 - **Type-Specific Aggregation**: Leverages C++ templates to generate highly efficient aggregation logic for different data types.
 
-## Lessons Learned
-- Vectorized execution significantly outperforms the traditional Volcano model for large-scale analytical queries.
-- Columnar storage is essential for minimizing I/O overhead when only a subset of columns is accessed.
+## Recent Improvements (Engine Benchmarking)
+As of Sprint 2, we have established a high-performance baseline for the engine's core scanning logic:
+- **Baseline Speed**: 181M rows/s (Sequential Scan).
+- **Core Technology**: Zero-allocation `TupleView` classes and lazy deserialization.
+- **Comparison**: About 9x the raw scan throughput of SQLite (181.4M vs. 20.6M rows/s).
+
+This provides the necessary groundwork for future SIMD and full vectorized optimizations.
 
 ## Status: 100% Test Pass
 Successfully verified the end-to-end vectorized pipeline, including columnar data persistence and complex analytical query patterns, through dedicated integration tests.
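The Phase 8 vectorized global aggregate (`COUNT`/`SUM` over whole batches, specialized per data type with C++ templates) can be sketched as follows. `ColumnBatch`, `GlobalAgg`, `AccumulatorFor`, and `aggregate_batch` are hypothetical names, and the 0/1 validity byte per row is an assumed NULL representation; the engine's actual batch interface is not part of this diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical column batch: one value per row plus a 0/1 validity flag (0 = NULL).
template <typename T>
struct ColumnBatch {
    std::vector<T> values;
    std::vector<std::uint8_t> valid;
};

// Type-specific accumulator: widen integers to int64_t, floats to double.
template <typename T>
using AccumulatorFor = typename std::conditional<std::is_floating_point<T>::value,
                                                 double, std::int64_t>::type;

// Running state for SUM(col) and COUNT(col) across batches.
template <typename T>
struct GlobalAgg {
    AccumulatorFor<T> sum = 0;
    std::int64_t count = 0;
};

// Folds one batch into `agg`. The loop uses a select instead of an if/else
// branch, which compilers can often lower to conditional moves or SIMD blends.
template <typename T>
void aggregate_batch(const ColumnBatch<T>& batch, GlobalAgg<T>& agg) {
    assert(batch.values.size() == batch.valid.size());
    using Acc = AccumulatorFor<T>;
    Acc sum = 0;
    std::int64_t count = 0;
    for (std::size_t i = 0; i < batch.values.size(); ++i) {
        const bool is_valid = batch.valid[i] != 0;
        sum += is_valid ? static_cast<Acc>(batch.values[i]) : Acc(0);
        count += is_valid;
    }
    agg.sum += sum;
    agg.count += count;
}
```

Keeping `sum` and `count` in locals for the whole batch and touching `agg` once per batch keeps per-row work minimal; NULL rows still flow through the loop but contribute nothing.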