Align datastore_search timestamp output with the dump endpoint #9
Both `datastore_search` and `datastore_dump` (CSV / NDJSON) now emit TIMESTAMP / DATETIME columns in the same fixed-shape ISO 8601 form: `2026-06-08T00:00:00`, UTC implicit, no offset, no fractional seconds. Round-trip works in both directions: BigQuery's `CAST(STRING AS TIMESTAMP)` accepts the no-TZ form and assumes UTC, so a dumped value can be re-upserted unchanged.
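To make the round-trip claim concrete, here is a minimal Python sketch of the same idea outside BigQuery. It assumes the consumer treats an offset-less string as UTC, as the PR says `CAST(STRING AS TIMESTAMP)` does. The helper names `render` and `parse_as_utc` are hypothetical and not part of this PR.

```python
from datetime import datetime, timezone

# Same pattern the PR passes to FORMAT_TIMESTAMP.
FIXED_SHAPE = "%Y-%m-%dT%H:%M:%S"


def render(ts: datetime) -> str:
    """Render an aware datetime in the fixed shape: UTC, no offset, no fraction."""
    return ts.astimezone(timezone.utc).strftime(FIXED_SHAPE)


def parse_as_utc(text: str) -> datetime:
    """Parse the fixed shape, assuming UTC because the string carries no offset."""
    return datetime.strptime(text, FIXED_SHAPE).replace(tzinfo=timezone.utc)


original = datetime(2026, 6, 8, 0, 0, 0, tzinfo=timezone.utc)
dumped = render(original)
restored = parse_as_utc(dumped)

print(dumped)                # 2026-06-08T00:00:00
print(restored == original)  # True
```

A value with sub-second precision would come back truncated to the second, since the fixed shape carries no fractional part.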
- New `format_select_column(name, bq_type)` helper in `bigquery/lib.py` — single source of truth for the SELECT-list expression. TIMESTAMP -> `FORMAT_TIMESTAMP('%Y-%m-%dT%H:%M:%S', col, 'UTC') AS col`; DATETIME -> `FORMAT_DATETIME('%Y-%m-%dT%H:%M:%S', col) AS col`; everything else passes through.
- `backend.py::_build_export_select` (dump) — thin loop calling the helper. Same SQL as before, no behaviour change.
- `bigquery/search.py::_project_column` — wraps Frictionless `datetime` columns by translating to BQ `TIMESTAMP` via `bigquery_type` and calling the helper. `_project_column` is used by both `build_search` and `build_count` so DISTINCT semantics stay consistent between the data query and the count.
- `build_search` ORDER BY now references ``t.`col` `` rather than the bare column name, because the projection alias for datetime columns shouldn't shadow the native TIMESTAMP at sort time. ISO 8601 sorts lexicographically the same as the underlying TIMESTAMP, so this is defensive future-proofing.
- `datastore_search_sql` is intentionally untouched (user-supplied SQL stays verbatim).
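The helper and the qualified ORDER BY described in the list above could look roughly like the sketch below. This is an illustration under assumptions, not the code in `bigquery/lib.py`: the backtick quoting, the `build_select` function, and the table name are hypothetical, and no identifier escaping is attempted.

```python
FIXED_SHAPE = "%Y-%m-%dT%H:%M:%S"


def format_select_column(name: str, bq_type: str) -> str:
    """Return the SELECT-list expression for one column (illustrative only)."""
    col = f"`{name}`"
    if bq_type == "TIMESTAMP":
        return f"FORMAT_TIMESTAMP('{FIXED_SHAPE}', {col}, 'UTC') AS {col}"
    if bq_type == "DATETIME":
        return f"FORMAT_DATETIME('{FIXED_SHAPE}', {col}) AS {col}"
    # Every other type passes through unchanged.
    return col


def build_select(table: str, columns: dict[str, str], sort: str) -> str:
    """Hypothetical stand-in for build_search: project via the helper, sort on t.`col`."""
    select_list = ", ".join(format_select_column(n, t) for n, t in columns.items())
    # Qualifying with the table alias targets the native column, not the string alias.
    return f"SELECT {select_list} FROM `{table}` AS t ORDER BY t.`{sort}`"


sql = build_select(
    "proj.ds.events",  # placeholder table name
    {"id": "INT64", "_updated_at": "TIMESTAMP"},
    "_updated_at",
)
print(sql)
```

Routing both the dump and the search projection through one function is what keeps the two endpoints from drifting apart again.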
Tests:
- `tests/engines/bigquery/test_tables.py` — SQL-shape assertions updated for the new ORDER BY and the wrapped `_updated_at` projection.
- `tests/test_datastore_dump.py` — already pinned to the new format.
- 3 `NotFoundError` matchers in `test_tables.py` flipped "not declared" -> "not found" to match the shortened messages on main (upsert / search / info paths).
Test infrastructure:
- `tests/conftest.py` forces `AUTH_TYPE=ckan` (not `setdefault`) so a developer `.env` that has `AUTH_TYPE=anonymous` can't silently reroute the dict-resource create branch and bypass the read-only force guard, which would break ~7 endpoint tests locally. CI is unaffected (no `.env`). The `CKAN_URL` line above stays `setdefault` so a CI-supplied value still wins.
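The difference between forcing a value and using `setdefault` can be seen in a few lines. This sketch only simulates the situation: the first assignment stands in for whatever loads a developer `.env`, and the `CKAN_URL` value is a placeholder rather than anything taken from the PR.

```python
import os

# Simulate what loading a developer .env would have put in the environment.
os.environ["AUTH_TYPE"] = "anonymous"
os.environ.pop("CKAN_URL", None)

# setdefault only fills a missing key, so the .env value survives.
os.environ.setdefault("AUTH_TYPE", "ckan")
after_setdefault = os.environ["AUTH_TYPE"]

# Plain assignment always wins, which is what makes the test run hermetic.
os.environ["AUTH_TYPE"] = "ckan"
after_assignment = os.environ["AUTH_TYPE"]

# CKAN_URL keeps setdefault: it applies only when nothing else has set it.
os.environ.setdefault("CKAN_URL", "http://ckan.test")  # placeholder URL

print(after_setdefault)  # anonymous
print(after_assignment)  # ckan
```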
Walkthrough
This PR standardizes BigQuery datetime output across search, export, and count operations by introducing a shared column formatter that converts TIMESTAMP/DATETIME to UTC ISO-8601 strings without fractional seconds. Error messages for missing resources are unified, and test infrastructure enforces consistent CKAN authentication throughout the suite.
Summary
`datastore_search` and `datastore_dump` (CSV / NDJSON) now emit `TIMESTAMP` / `DATETIME` columns in the same fixed ISO 8601 shape: `2026-06-08T00:00:00`, UTC implicit, no offset, no fractional seconds. Round-trip stays clean: BigQuery's `CAST(STRING AS TIMESTAMP)` accepts the no-TZ form and assumes UTC, so a dumped value can be re-upserted unchanged. `datastore_search_sql` is intentionally untouched; user SQL stays verbatim.

Why
Before this change, a single TIMESTAMP value rendered three different ways:
- `datastore_dump` (CSV / NDJSON): `"2026-06-08T00:00:00"`
- `datastore_search`, JSON: `"2026-06-08T00:00:00+00:00"` (orjson default)
- `datastore_search`, CSV / TSV: `"2026-06-08 00:00:00+00:00"` (`str(datetime)` via `csv.writer`)

Clients that string-compare or naively parse ISO 8601 hit a footgun. Unifying on a single BigQuery-side cast removes the divergence at source.
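The three pre-change renderings can be reproduced with the standard library alone. In this sketch `isoformat()` stands in for the JSON path (an assumption: it yields the same `+00:00` form for a whole-second UTC value), while the CSV cell comes from a real `csv.writer`.

```python
import csv
import io
from datetime import datetime, timezone

value = datetime(2026, 6, 8, 0, 0, 0, tzinfo=timezone.utc)

# Fixed shape used by the dump endpoint.
fixed = value.strftime("%Y-%m-%dT%H:%M:%S")

# isoformat() stands in for the JSON path; it keeps the +00:00 offset.
iso = value.isoformat()

# csv.writer calls str() on non-string cells, which yields a space separator.
buf = io.StringIO()
csv.writer(buf).writerow([value])
csv_cell = buf.getvalue().strip()

print(fixed)     # 2026-06-08T00:00:00
print(iso)       # 2026-06-08T00:00:00+00:00
print(csv_cell)  # 2026-06-08 00:00:00+00:00
```

All three strings denote the same instant, yet no two compare equal, which is the string-comparison hazard described above.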
Code change
- `bigquery/lib.py`: new `format_select_column(name, bq_type)` helper. Single source of truth for the SELECT-list expression. TIMESTAMP → `FORMAT_TIMESTAMP('%Y-%m-%dT%H:%M:%S', col, 'UTC') AS col`; DATETIME → `FORMAT_DATETIME(...)`; other types pass through.
- `backend.py::_build_export_select` (dump): thin loop calling the helper. No behaviour change; existing dump tests still pass byte-for-byte.
- `bigquery/search.py::_project_column`: translates Frictionless `datetime` → BQ `TIMESTAMP` via `bigquery_type` and calls the same helper. Used by both `build_search` and `build_count` so the data query and the DISTINCT count agree.
- `build_search` ORDER BY now references ``t.`col` `` rather than the bare column, which is defensive against the projection alias shadowing the native TIMESTAMP at sort time. ISO 8601 happens to lex-sort the same way, so behaviour is unchanged today; the explicit reference future-proofs us against format tweaks.

Trade-offs called out
`STRING` for the columns that used to be TIMESTAMP. The Frictionless schema returned by `datastore_info` still says `datetime`; we're just formatting the value at read time, not changing storage.

Tests
- `tests/engines/bigquery/test_tables.py`: SQL-shape assertions updated for the new ORDER BY (``t.`col` ``) and the wrapped `_updated_at` projection.
- `tests/test_datastore_dump.py`: already pinned to the new format.
- `NotFoundError` matchers flipped `"not declared"` → `"not found"` to match the shortened messages on main's `upsert` / `search` / `info` paths (the delete + dump paths kept their longer messages and aren't touched).

Test-infra: hermetic `AUTH_TYPE` in conftest
While debugging the test suite locally I tripped over an existing latent issue: `tests/conftest.py` already neutralizes BigQuery env vars and `CKAN_URL`, but not `AUTH_TYPE`. A developer `.env` containing `AUTH_TYPE=anonymous` silently reroutes the dict-resource create branch and skips the read-only force guard, so ~7 endpoint tests start failing locally even though CI is fine (no `.env`).
Fix in this PR: force `os.environ["AUTH_TYPE"] = "ckan"` at the top of `conftest.py` (not `setdefault`, because the dev `.env` value is already set). The `CKAN_URL` line above stays `setdefault` so a CI-supplied value still wins. Bundled here because it's directly tied to the test updates.

Verification
- `pytest -q`: 333 passing, no skips, no flake.
- `ruff check`: clean.
- ``FORMAT_TIMESTAMP('%Y-%m-%dT%H:%M:%S', `col`, 'UTC') AS `col` ``; dump produces an identical column expression in `EXPORT DATA`.

Summary by CodeRabbit
Bug Fixes
Improvements
YYYY-MM-DDTHH:MM:SS) without timezone offsets or fractional seconds.