SQL API: Add schema-aware array-rows response + queryRef/schemaRef caching (Kalam-link friendly) #50

@jamals86

Description


Problem

Today /v1/api/sql returns result rows as JSON objects plus a columns list, and the “typed” mode wraps each value. This makes it hard for Kalam-link to:

  • reliably understand the returned schema (types/nullability/order)
  • decode efficiently (objects are heavier than arrays)
  • cache decoding metadata safely across requests (schema drift vs query hash)

Relevant code:

  • Request model: sql_request.rs
  • Response model: sql_response.rs
  • Handler: sql_handler.rs
  • ScalarValue→JSON conversions: arrow_json_conversion.rs

Goals

  1. Make the REST SQL response self-describing (Arrow schema), so clients know exact output types/order.
  2. Make the payload efficient (rows as arrays, stable column order).
  3. Enable safe client+server caching:
    • Server caches “prepared query template” (queryRef)
    • Client caches schema decoding (schemaRef → schema)
  4. Keep ScalarValue→JSON conversion logic centralized (no duplicate conversion paths).
  5. Return a schema whose field data types are expressed as KalamDataType.

Non-goals (for this issue)

  • Changing WebSocket payload formats (can follow later, but this issue targets REST /v1/api/sql).
  • Introducing a full GraphQL layer or persisted-query registry UI.

Proposed API changes

New request fields (backward-compatible)

Extend the existing request model with optional fields:

  • queryRef?: { version: number, sha256: string }
  • sql?: string (already exists; allow either sql OR queryRef)
  • params?: JsonValue[] (already exists)
  • includeSchema?: boolean (defaults to false when the client sends knownSchemaRefs, otherwise true on the first query)
  • knownSchemaRefs?: string[] (client tells server which schema hashes it already has)
  • resultFormat?: "array_rows" | "object_rows" (default object_rows for backward compat; Kalam-link uses array_rows)

Notes:

  • Keep serialization_mode as-is (simple vs typed value encoding), but do not change row shape by mode.
  • Multi-statement + params stays rejected (current behavior), unless expanded later.
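The extended request shape and its mutual-exclusion rule can be sketched as follows. This is a Python sketch for illustration only (the real model is a Rust serde struct in sql_request.rs); the `validate` helper name and the exactly-one-of rule for `sql`/`queryRef` are assumptions drawn from the field list above.

```python
from typing import List, Literal, TypedDict

class QueryRef(TypedDict):
    version: int
    sha256: str

class SqlRequest(TypedDict, total=False):
    # Existing fields
    sql: str
    params: List[object]
    serialization_mode: str
    # Proposed optional fields (all backward-compatible)
    queryRef: QueryRef
    resultFormat: Literal["array_rows", "object_rows"]
    includeSchema: bool
    knownSchemaRefs: List[str]

def validate(req: SqlRequest) -> None:
    # Assumed rule: a request must carry exactly one of sql / queryRef.
    if ("sql" in req) == ("queryRef" in req):
        raise ValueError("provide either sql or queryRef, not both or neither")
```

A request with only `queryRef` + `params` is what Step 2 of the examples below sends.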

New response fields (for SELECT-like results)

For each QueryResult (or a new QueryResultV2):

  • queryRef?: { version: number, sha256: string } (returned when request used sql)
  • schemaRef?: { version: number, sha256: string } (always returned for row results)
  • schema?: { fields: [...], metadata?: ... } (returned only when includeSchema=true OR client doesn’t have schemaRef)
  • rows: array of rows
    • when resultFormat="array_rows": rows: JsonValue[][]
    • when resultFormat="object_rows": keep current rows: object[]
  • Deprecate columns when schema is present (or keep temporarily but treat as redundant)
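The server-side decision of when to inline the full schema can be sketched as a small helper. This is an illustrative Python sketch (the real logic belongs in the Rust handler); the `attach_schema` name and signature are hypothetical.

```python
def attach_schema(result: dict, schema: dict, schema_sha: str,
                  known_schema_refs: list, include_schema: bool) -> dict:
    # Always attach schemaRef so the client can detect drift.
    result["schemaRef"] = {"version": 1, "sha256": schema_sha}
    # Inline the full schema only when the client asked for it
    # or does not already hold this hash.
    if include_schema or schema_sha not in known_schema_refs:
        result["schema"] = schema
    return result
```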

Optional endpoint (recommended for robustness)

  • GET /v1/api/schema/{schemaSha256}
    • Returns { schemaRef, schema }
    • Lets clients recover schema without re-running the query.

Hashing / caching semantics (critical)

schemaRef (correctness)

  • schemaRef.sha256 = sha256(canonical_arrow_schema_json)
  • Canonicalization requirements:
    • stable field order
    • include name, data_type, nullable (and ideally metadata)
  • Must be computed from the output schema of the query (post-projection), not just table schema.
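A minimal sketch of the canonicalization + hashing, in Python for brevity (the Rust implementation would use serde_json plus the sha2 crate). Note that field *order* stays significant because it defines the row layout, while key order within each field object is neutralized; the `schema_sha256` name is an assumption.

```python
import hashlib
import json

def schema_sha256(fields, metadata=None):
    # Keep field list order (it defines positional row layout); pick the
    # hashed keys explicitly so extra/unordered keys cannot change the hash.
    canonical = {
        "fields": [
            {"name": f["name"], "data_type": f["data_type"], "nullable": f["nullable"]}
            for f in fields
        ],
        "metadata": metadata or {},
    }
    # sort_keys + fixed separators make the JSON byte-for-byte stable.
    blob = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()
```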

queryRef (server perf)

Used to let the client send only {queryRef, params} on subsequent calls.

  • queryRef.sha256 should be derived from at least:
    • normalized SQL template (no params embedded)
    • relevant session settings that can affect planning/schema
    • auth/tenant scope key (so we never reuse a plan across incompatible permission contexts)

A safe starting scope key could include: user role + namespace_id + any “current user” context.
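The derivation above can be sketched as a single hash over the three inputs. This Python sketch only collapses whitespace as a stand-in for SQL normalization (a real implementation should normalize via the SQL parser's canonical form, since naive text munging can conflate distinct queries); the `query_ref_sha256` name is hypothetical.

```python
import hashlib

def query_ref_sha256(sql: str, session_settings: dict, scope_key: str) -> str:
    # Collapse whitespace so formatting differences don't fragment the cache.
    # (Placeholder: real normalization should come from the SQL parser.)
    normalized_sql = " ".join(sql.split())
    # Sort settings so map iteration order cannot change the hash.
    settings = ";".join(f"{k}={session_settings[k]}" for k in sorted(session_settings))
    # Unit-separator joins keep the three components unambiguous.
    payload = "\x1f".join([normalized_sql, settings, scope_key])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

The scope key is what prevents a plan cached for one tenant from being reused by another.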

Also define server cache behavior:

  • LRU + TTL eviction to keep memory bounded; use Moka
  • Why Moka: async-safe, supports both TTL and size-based eviction, lock-free reads
  • Cache entries invalidated on schemaRef change, or simply allow schemaRef mismatch to trigger schema resend.
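The intended eviction semantics can be illustrated with a toy cache (the production cache would be Moka in Rust; this Python class exists only to pin down the LRU + TTL behavior, and its name is hypothetical).

```python
import time
from collections import OrderedDict

class LruTtlCache:
    """Toy LRU + TTL cache illustrating the intended eviction semantics."""

    def __init__(self, max_size: int, ttl_seconds: float):
        self.max_size = max_size
        self.ttl = ttl_seconds
        self._entries = OrderedDict()  # key -> (value, inserted_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, inserted_at = entry
        if time.monotonic() - inserted_at > self.ttl:
            del self._entries[key]      # expired: drop and report a miss
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._entries[key] = (value, time.monotonic())
        self._entries.move_to_end(key)
        if len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # evict least recently used
```

A schemaRef mismatch does not need explicit invalidation here: the server simply resends the schema, as Step 3 below shows.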

Concrete examples (end-to-end)

Step 1: First call (client only knows SQL)

Request:

POST /v1/api/sql
{
  "sql": "select * from chat.messages where conversation_id = $1",
  "params": ["conv_123"],
  "serialization_mode": "simple",
  "resultFormat": "array_rows",
  "includeSchema": true
}

Response:

200 OK
{
  "status": "success",
  "results": [
    {
      "queryRef": { "version": 1, "sha256": "q_7a1b...e21" },
      "schemaRef": { "version": 1, "sha256": "s_55c9...91a" },
      "schema": {
        "fields": [
          {"name":"msg_id","data_type":"Int64","nullable":false},
          {"name":"conversation_id","data_type":"Utf8","nullable":false},
          {"name":"from","data_type":"Utf8","nullable":false},
          {"name":"timestamp","data_type":"Timestamp(Microsecond, None)","nullable":false},
          {"name":"content","data_type":"Utf8","nullable":true},
          {"name":"metadata","data_type":"Utf8","nullable":true}
        ]
      },
      "rows": [
        ["123456789", "conv_123", "alice", 1699000000000000, "Hello", null]
      ],
      "row_count": 1
    }
  ],
  "took": 15.0
}
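Decoding on the client side is then just a zip of the cached schema's field order with each positional row. A minimal Python sketch (the real client is Kalam-link; `decode_row` is a hypothetical helper name):

```python
def decode_row(fields, row):
    # Field order in the schema defines the positional layout of array_rows,
    # so pairing them up is all that is needed to rebuild named records.
    return {f["name"]: value for f, value in zip(fields, row)}

# Abbreviated schema and row from the Step 1 response above:
fields = [
    {"name": "msg_id", "data_type": "Int64", "nullable": False},
    {"name": "conversation_id", "data_type": "Utf8", "nullable": False},
    {"name": "from", "data_type": "Utf8", "nullable": False},
]
row = ["123456789", "conv_123", "alice"]
```

This is also why the hash must cover the post-projection output schema: a stale field order would silently mislabel every value.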

Step 2: Subsequent call (client sends queryRef + params only)

Request:

POST /v1/api/sql
{
  "queryRef": { "version": 1, "sha256": "q_7a1b...e21" },
  "params": ["conv_999"],
  "serialization_mode": "simple",
  "resultFormat": "array_rows",
  "knownSchemaRefs": ["s_55c9...91a"]
}

Response (schema omitted, but schemaRef included):

200 OK
{
  "status": "success",
  "results": [
    {
      "schemaRef": { "version": 1, "sha256": "s_55c9...91a" },
      "rows": [
        ["123456790", "conv_999", "bob", 1699000000001234, "Hi", null]
      ],
      "row_count": 1
    }
  ],
  "took": 3.2
}

Step 3: Schema drift (ALTER TABLE / view change / SELECT * expansion)

The client sends the same request as in Step 2; the server detects that the schemaRef has changed and returns the schema again:

200 OK
{
  "status": "success",
  "results": [
    {
      "schemaRef": { "version": 1, "sha256": "s_a912...44f" },
      "schema": {
        "fields": [
          {"name":"msg_id","data_type":"Int64","nullable":false},
          "...",
          {"name":"edited_at","data_type":"Timestamp(Microsecond, None)","nullable":true}
        ]
      },
      "rows": [
        ["123456789", "conv_123", "alice", 1699000000000000, "Hello", null, null]
      ],
      "row_count": 1
    }
  ],
  "took": 3.4
}
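On the client, drift handling reduces to a small cache-update step. A hedged Python sketch (the `handle_result` name and the error-on-miss behavior are assumptions; on a miss the client would recover via GET /v1/api/schema/{schemaSha256} or re-request with includeSchema=true):

```python
def handle_result(result: dict, schema_cache: dict) -> dict:
    """Return the schema for a result, updating the client-side cache.

    schema_cache maps schemaRef sha256 -> schema dict.
    """
    sha = result["schemaRef"]["sha256"]
    if "schema" in result:
        # First sighting or drift: the server inlined the schema, store it.
        schema_cache[sha] = result["schema"]
    schema = schema_cache.get(sha)
    if schema is None:
        # Miss without an inline schema: fall back to the schema endpoint.
        raise LookupError(f"unknown schemaRef {sha}")
    return schema
```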

Important considerations / pitfalls

  • Do not key server queryRef cache only by SQL text; it must be scoped to avoid cross-user leakage and schema mismatches.
  • schemaRef should be based on output schema, not physical table schema.
  • Current masking logic in the handler should be reviewed for array row format (masking by column name still possible, but row shape changes).
  • Keep ScalarValue→JSON conversion centralized (extend conversion utilities rather than adding one-off conversions).

Implementation sketch (high level)

  1. Add new response types/fields in sql_response.rs
    • Support rows as either array-rows or object-rows (feature-flagged by resultFormat)
    • Add schemaRef, optional schema, optional queryRef
  2. Extend request model in sql_request.rs
  3. Add schema serialization helper + sha256 hashing
    • Canonical Arrow schema JSON + sha256
  4. Extend conversion in arrow_json_conversion.rs
    • Add RecordBatch → array-rows conversion that still calls the single ScalarValue→JSON function
  5. Update handler logic in sql_handler.rs
    • If request has queryRef, resolve cached SQL/plan; else compute and return queryRef
    • Always attach schemaRef; include schema only if requested/needed
  6. Add tests
    • schemaRef stable for same schema
    • schema omitted when knownSchemaRefs contains it
    • schema included when schema changes

Acceptance criteria

  • REST SQL endpoint supports resultFormat="array_rows" with stable ordering matching Arrow schema fields.
  • Response includes schemaRef for row-returning statements.
  • Response includes schema only when needed (first call / cache miss / schema drift).
  • Client can send queryRef + params without sending SQL on subsequent calls.
  • No duplicate ScalarValue→JSON conversion logic introduced.
  • Backward compatibility preserved (existing clients can keep using object rows).
