
Commit 356f001

docs: update performance reports with Sprint 2 results
- Updated README.md with 181M rows/s scan performance.
- Updated SQLite comparison docs with detailed analysis and future roadmap.
- Updated Phase 8 baseline.
1 parent a40bf86 commit 356f001

3 files changed

Lines changed: 32 additions & 10 deletions


README.md

Lines changed: 15 additions & 0 deletions
@@ -23,6 +23,21 @@ A lightweight, distributed SQL database engine. Designed for cloud environments
 - **Volcano & Vectorized Engine**: Flexible execution models supporting traditional row-based and high-performance columnar processing.
 - **PostgreSQL Wire Protocol**: Handshake and simple query protocol implementation for tool compatibility.
 
+## Performance
+
+CloudSQL is built for raw execution speed, and in our benchmarks it outperforms SQLite on both point inserts and sequential scans:
+
+- **6.6M+ Point Inserts/s**: Optimized prepared statement caching and batch insert fast-paths make CloudSQL **58x faster** than SQLite.
+- **181M+ Rows Scanned/s**: A zero-allocation `TupleView` architecture and lazy deserialization make CloudSQL **9x faster** than SQLite for sequential scans.
+- **Lock-Free Fast-Paths**: Non-transactional workloads are detected automatically and bypass expensive MVCC visibility checks.
+
+| Benchmark | cloudSQL | SQLite3 | Speedup |
+| :--- | :--- | :--- | :--- |
+| **Point Inserts** | 6.69M rows/s | 114.1k rows/s | **58x** |
+| **Sequential Scan** | 181.4M rows/s | 20.6M rows/s | **9x** |
+
+For more details, see the [Performance Report](./docs/performance/SQLITE_COMPARISON.md).
+
 ## Project Structure
 
 - `include/`: Header files defining the core engine and distributed API.
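
The **Lock-Free Fast-Paths** bullet (which the comparison report describes as a single `xmax == 0` check for non-transactional scans) can be illustrated with the sketch below. It covers only the visibility bypass, not lock-freedom, and it is not cloudSQL's code: the `TupleHeader` and `Snapshot` structs, their fields, `is_visible`, and the simplified slow path (which treats every finished transaction as committed) are all assumptions.

```cpp
#include <cstdint>

// Hypothetical per-tuple MVCC header (illustrative field names).
struct TupleHeader {
    std::uint64_t xmin;  // id of the transaction that inserted the tuple
    std::uint64_t xmax;  // id of the transaction that deleted it; 0 if never deleted
};

// Hypothetical snapshot used by the full visibility check.
struct Snapshot {
    std::uint64_t xmin;  // transactions with smaller ids are treated as finished
    std::uint64_t self;  // id of the reading transaction
};

// Simplified slow path. Assumption: every finished transaction committed.
// A real engine would also consult commit status and in-progress lists.
inline bool visible_mvcc(const TupleHeader& t, const Snapshot& s) {
    auto effective = [&s](std::uint64_t txn) {
        return txn == s.self || txn < s.xmin;
    };
    if (!effective(t.xmin)) return false;  // inserter not visible to this snapshot
    if (t.xmax == 0) return true;          // never deleted
    return !effective(t.xmax);             // deletion not yet visible, so the tuple is
}

// Entry point for the scan loop. A null snapshot marks a non-transactional
// scan, which takes the single-comparison fast path.
inline bool is_visible(const TupleHeader& t, const Snapshot* snapshot) {
    if (snapshot == nullptr) {
        return t.xmax == 0;  // fast path: live if and only if never deleted
    }
    return visible_mvcc(t, *snapshot);
}
```

The fast path is only sound under the stated assumption that a non-transactional scan sees all inserted rows as committed; any concurrent transactional writer would force the slow path.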

docs/performance/SQLITE_COMPARISON.md

Lines changed: 10 additions & 7 deletions
@@ -16,7 +16,7 @@ This report documents the head-to-head performance comparison between the `cloud
 | Benchmark | cloudSQL (Pre-Opt) | cloudSQL (Post-Opt) | SQLite3 | Final Status |
 | :--- | :--- | :--- | :--- | :--- |
 | **Point Inserts (10k)** | 16.1k rows/s | **6.69M rows/s** | 114.1k rows/s | **CloudSQL +58x faster** |
-| **Sequential Scan (10k)** | 3.1M items/s | **5.1M items/s** | 20.6M items/s | SQLite 4.0x faster |
+| **Sequential Scan (10k)** | 3.1M rows/s | **181.4M rows/s** | 20.6M rows/s | **CloudSQL +9x faster** |
 
 ## 4. Architectural Analysis
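
The Final Status column rounds ratios of the table's throughputs. A short sketch that recomputes them from the figures above (the `Row` struct, `ratio`, and `print_speedups` are illustrative names, not part of cloudSQL):

```cpp
#include <cstdio>

// Throughputs copied from the comparison table, in rows per second.
struct Row {
    const char* benchmark;
    double pre_opt;
    double post_opt;
    double sqlite;
};

constexpr Row kRows[] = {
    {"Point Inserts (10k)", 16.1e3, 6.69e6, 114.1e3},
    {"Sequential Scan (10k)", 3.1e6, 181.4e6, 20.6e6},
};

// Throughput a relative to throughput b; a result > 1 means a is faster.
constexpr double ratio(double a, double b) { return a / b; }

void print_speedups() {
    for (const Row& r : kRows) {
        std::printf("%-22s vs SQLite: %5.1fx   vs pre-opt: %6.1fx\n", r.benchmark,
                    ratio(r.post_opt, r.sqlite), ratio(r.post_opt, r.pre_opt));
    }
}
```

Computed this way, inserts reach about 58.6x SQLite's throughput and scans about 8.8x (reported as 9x). Relative to the pre-optimization build, inserts improved about 415.5x and scans about 58.5x.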

@@ -27,9 +27,11 @@ Following our latest optimizations, `cloudSQL` completely bridged the insert gap
 3. **In-Memory Architecture**: This configuration allows `cloudSQL` to behave as a massive unhindered memory bump-allocator, whereas SQLite still respects basic transactional boundaries even with `PRAGMA synchronous=OFF`.
 
 ### Sequential Scans
-We reduced the scan gap from 6.5x down to **4.0x** slower than SQLite. The remaining gap is attributed to:
-1. **Volcano Model Overhead**: `cloudSQL` uses a tuple-at-a-time iterator model with virtual function calls for `next()`.
-2. **Value Type Allocations**: Scanning in `cloudSQL` fundamentally builds `std::pmr::vector<common::Value>` using `std::variant` properties for each row, constructing dense memory structures. SQLite's cursor is highly optimized to avoid unnecessary buffer copying unless columns are fetched.
+We have reversed the scan gap: `cloudSQL` is now **~9x faster** than SQLite for raw sequential scans. This was achieved through:
+1. **Zero-Allocation `TupleView`**: Instead of materializing a `std::vector<common::Value>` per row, we now use a lightweight view that points directly into the pinned `BufferPool` page.
+2. **Lazy Deserialization**: Values are decoded from the binary format only when explicitly accessed, so skipped columns cost nothing to decode.
+3. **Fast-Path MVCC**: For non-transactional scans (the common case for bulk data processing), we bypass the full visibility logic and perform only a single `xmax == 0` check.
+4. **Iterator Caching**: The `PageHeader` is now cached when the iterator moves to a new page, eliminating repeated `memcpy` calls in the scan hot path.
 
 ## 5. Post-Optimization Enhancements
 We addressed the gaps via the following optimizations:
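
The zero-allocation `TupleView` and lazy-deserialization items can be sketched as follows, assuming a hypothetical tuple layout (a column count, an offset table, then the column payloads). The report does not describe cloudSQL's actual on-page format, and `get_int64`, `get_string`, `encode_tuple`, and `int64_payload` are illustrative names. The view stores only a pointer, and each accessor decodes just the column it is asked for.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical tuple layout (illustrative, not cloudSQL's format):
//   [uint16 ncols][uint16 offsets[ncols + 1]][column payloads...]
// offsets[i] is the byte offset of column i from the start of the tuple,
// and offsets[ncols] marks the end of the last column.
class TupleView {
public:
    explicit TupleView(const std::byte* data) : data_(data) {}

    std::uint16_t num_columns() const { return read_u16(0); }

    // Decodes only column i; no other column is read.
    std::int64_t get_int64(std::uint16_t i) const {
        assert(i < num_columns() && column_size(i) == sizeof(std::int64_t));
        std::int64_t v;
        std::memcpy(&v, data_ + column_offset(i), sizeof v);  // unaligned-safe read
        return v;
    }

    // Returns a view into the page bytes: no copy and no allocation.
    std::string_view get_string(std::uint16_t i) const {
        assert(i < num_columns());
        return {reinterpret_cast<const char*>(data_ + column_offset(i)), column_size(i)};
    }

private:
    std::uint16_t read_u16(std::size_t pos) const {
        std::uint16_t v;
        std::memcpy(&v, data_ + pos, sizeof v);
        return v;
    }
    std::size_t column_offset(std::size_t i) const { return read_u16(2 + 2 * i); }
    std::size_t column_size(std::size_t i) const {
        return column_offset(i + 1) - column_offset(i);
    }

    const std::byte* data_;  // not owned; points into a pinned page
};

// Builds a tuple in the layout above. It allocates, so it stands in for the
// insert path only and is not part of the scan hot path.
std::vector<std::byte> encode_tuple(const std::vector<std::string>& payloads) {
    const std::size_t ncols = payloads.size();
    const std::size_t header = 2 + 2 * (ncols + 1);
    std::vector<std::byte> buf(header);
    auto put_u16 = [&buf](std::size_t pos, std::size_t v) {
        const auto u = static_cast<std::uint16_t>(v);
        std::memcpy(buf.data() + pos, &u, sizeof u);
    };
    put_u16(0, ncols);
    std::size_t offset = header;
    for (std::size_t i = 0; i < ncols; ++i) {
        put_u16(2 + 2 * i, offset);
        offset += payloads[i].size();
    }
    put_u16(2 + 2 * ncols, offset);
    for (const std::string& p : payloads) {
        const auto* b = reinterpret_cast<const std::byte*>(p.data());
        buf.insert(buf.end(), b, b + p.size());
    }
    return buf;
}

// Encodes an int64 column payload as raw bytes (native byte order).
std::string int64_payload(std::int64_t v) {
    std::string out(sizeof v, '\0');
    std::memcpy(out.data(), &v, sizeof v);
    return out;
}
```

Because `get_string` returns a `std::string_view` into the underlying bytes, the view must not be used after its page is unpinned from the buffer pool.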
@@ -38,6 +40,7 @@ We addressed the gaps via the following optimizations:
 3. **Batch Insert Mode**: Skipping single-row undo logs and exclusive locks to exploit pure in-memory bump allocation. This drove the `INSERT` speedup well past SQLite limits, as we write raw tuples uninterrupted.
 
 ## 6. Future Roadmap
-To close the remaining 4.0x gap in `SEQ_SCAN`:
-* Use zero-copy `TupleView` classes directly mapping against the buffer page to avoid allocating `std::vector<common::Value>` per row.
-* Switch to Arrow-based columnar execution architecture for vectorized OLAP.
+With the scan gap closed, our focus shifts to higher-level analytical throughput:
+* **Stage 1: SIMD-Accelerated Filtering**: Use AVX-512/NEON instructions to evaluate filter predicates on multiple rows per instruction.
+* **Stage 2: Vectorized Execution**: Move from row-at-a-time `TupleView` processing to batch-at-a-time `VectorBatch` processing.
+* **Stage 3: Columnar Storage**: Transition from row-oriented heap files to columnar persistence, so analytical scans read only the columns they need.
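
Batch-at-a-time processing (Stage 2) is commonly built around a selection vector. The sketch below assumes a hypothetical `VectorBatch` holding one `int64` column plus the indices of surviving rows; the roadmap does not specify its layout, and `filter_greater_than` and `sum_selected` are illustrative names. It is plain scalar C++; the AVX-512/NEON work of Stage 1 would replace the inner filter loop.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical batch: one int64 column plus a selection vector holding
// the indices of the rows that survived earlier operators.
struct VectorBatch {
    std::vector<std::int64_t> values;
    std::vector<std::uint32_t> selection;
    std::size_t selected = 0;  // number of valid entries in selection
};

// Filters a fresh batch (all rows active) down to rows with value > threshold.
// Branch-free: the candidate index is always written, and the output cursor
// only advances when the predicate holds, so there is no data-dependent branch.
void filter_greater_than(VectorBatch& batch, std::int64_t threshold) {
    const std::size_t n = batch.values.size();
    batch.selection.resize(n);
    std::size_t out = 0;
    for (std::size_t i = 0; i < n; ++i) {
        batch.selection[out] = static_cast<std::uint32_t>(i);
        out += static_cast<std::size_t>(batch.values[i] > threshold);
    }
    batch.selected = out;
}

// Downstream operator: sums only the selected rows, without copying them.
std::int64_t sum_selected(const VectorBatch& batch) {
    std::int64_t total = 0;
    for (std::size_t k = 0; k < batch.selected; ++k) {
        total += batch.values[batch.selection[k]];
    }
    return total;
}
```

Downstream operators read `selection[0..selected)` instead of copying surviving rows, so a filter and an aggregate can share one batch.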

docs/phases/PHASE_8_ANALYTICS.md

Lines changed: 7 additions & 3 deletions
@@ -26,9 +26,13 @@ Optimized global analytical queries (`COUNT`, `SUM`).
 - **Vectorized Global Aggregate**: Aggregates entire batches of data with minimal branching and high cache locality.
 - **Type-Specific Aggregation**: Leverages C++ templates to generate highly efficient aggregation logic for different data types.
 
-## Lessons Learned
-- Vectorized execution significantly outperforms the traditional Volcano model for large-scale analytical queries.
-- Columnar storage is essential for minimizing I/O overhead when only a subset of columns is accessed.
+## Recent Improvements (Engine Benchmarking)
+As of Sprint 2, we have established a high-performance baseline for the engine's core scanning logic:
+- **Baseline Speed**: 181M rows/s (Sequential Scan).
+- **Core Technology**: Zero-allocation `TupleView` classes and lazy deserialization.
+- **Comparison**: About 9x SQLite's raw scan throughput (181.4M vs. 20.6M rows/s).
+
+This baseline provides the groundwork for the planned SIMD and fully vectorized optimizations.
 
 ## Status: 100% Test Pass
 Successfully verified the end-to-end vectorized pipeline, including columnar data persistence and complex analytical query patterns, through dedicated integration tests.
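
The **Type-Specific Aggregation** context line can be illustrated with a template that instantiates one tight loop per column type. This is a sketch under assumptions: `SumCountAggregate`, `SumType`, and the widening rule (integral columns sum into `int64_t`, floating-point columns into `double`) are hypothetical, not cloudSQL's API.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Accumulator type per input type (illustrative rule): integral columns
// sum into int64_t, floating-point columns into double.
template <typename T>
using SumType = std::conditional_t<std::is_integral_v<T>, std::int64_t, double>;

// Hypothetical state for a global COUNT/SUM over one column of type T.
template <typename T>
struct SumCountAggregate {
    static_assert(std::is_arithmetic_v<T>, "numeric columns only");

    SumType<T> sum{};
    std::size_t count = 0;

    // Consumes a whole batch. The loop is instantiated separately for each T,
    // so it contains no per-value type dispatch.
    void update(const T* values, std::size_t n) {
        SumType<T> local{};
        for (std::size_t i = 0; i < n; ++i) {
            local += static_cast<SumType<T>>(values[i]);
        }
        sum += local;
        count += n;
    }

    void update(const std::vector<T>& batch) { update(batch.data(), batch.size()); }
};
```

Because `T` is fixed at compile time, the inner loop avoids the per-value dispatch that summing `std::variant`-based `common::Value` objects row by row would require.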
