This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
This is a DuckDB extension that adds cryptographic hash functions, HMAC calculation capabilities, and cryptographically secure random byte generation using OpenSSL and BLAKE3.
The extension is implemented in C++ and uses OpenSSL's EVP API for most cryptographic operations, with BLAKE3 provided by a vendored library:
- C++ implementation (
src/):src/crypto_extension.cpp: Main extension entry point, registers all functions with DuckDBsrc/crypto_hash.cpp: Core hash, HMAC, and random byte implementationssrc/query_farm_telemetry.cpp: Telemetry integration- Implements four DuckDB functions:
crypto_hash()- Scalar function for hashing various data typescrypto_hmac()- Scalar function for HMAC computationcrypto_hash_agg()- Aggregate function for hashing multiple rowscrypto_random_bytes()- Scalar function for generating random bytes
- Uses standard CMake
find_package(OpenSSL)to locate and link OpenSSL - Links against OpenSSL::SSL and OpenSSL::Crypto for both static and loadable extension variants
- No external code generation tools required
For debug
VCPKG_TOOLCHAIN_PATH=`pwd`/vcpkg/scripts/buildsystems/vcpkg.cmake GEN=ninja make debugBuilds:
./build/debug/duckdb- DuckDB shell with extension pre-loaded./build/debug/test/unittest- Test runner with extension linked./build/debug/extension/crypto/crypto.duckdb_extension- Loadable extension binary
For release builds:
VCPKG_TOOLCHAIN_PATH=`pwd`/vcpkg/scripts/buildsystems/vcpkg.cmake GEN=ninja make releaseBuilds:
./build/release/duckdb- DuckDB shell with extension pre-loaded./build/release/test/unittest- Test runner with extension linked./build/release/extension/crypto/crypto.duckdb_extension- Loadable extension binary
Note: Requires OpenSSL to be installed on your system.
make test_debugRuns SQL tests located in test/sql/*.test but uses the debug build.
./build/release/duckdbLaunches DuckDB with the extension already loaded.
Syntax: crypto_hash(algorithm, value) → BLOB
Computes a cryptographic hash of the input value. Supports multiple data types:
- Strings: VARCHAR, BLOB
- Integers: TINYINT, SMALLINT, INTEGER, BIGINT, HUGEINT, UTINYINT, USMALLINT, UINTEGER, UBIGINT, UHUGEINT
- Floating point: FLOAT, DOUBLE
- Other: BOOLEAN, DATE, TIME, TIMESTAMP, UUID
- Lists: Arrays of any supported fixed-length types (e.g.,
INTEGER[],VARCHAR[],BLOB[])- NULL elements inside lists are not supported
- Nested lists (lists of lists, lists of structs, etc.) are not supported
- For VARCHAR/BLOB lists, each element's length is hashed before its content to prevent length extension attacks
Syntax: crypto_hmac(algorithm, key, message) → BLOB
Computes an HMAC using the specified algorithm, key, and message. All algorithms supported except BLAKE3 requires exactly 32 bytes for the key.
Syntax: crypto_hash_agg(algorithm, value ORDER BY sort_expression) → BLOB
Aggregate function that computes a hash over multiple rows. ORDER BY is required to ensure deterministic results. Produces the same hash as crypto_hash() would for an equivalent ordered list. Returns NULL for empty result sets.
Syntax: crypto_random_bytes(length) → BLOB
Generates cryptographically secure random bytes using OpenSSL's RAND_bytes(). Length must be between 1 and 4,294,967,295 bytes (4GB - 1, the maximum BLOB size in DuckDB). This function is marked as VOLATILE so each call produces different random bytes.
The extension supports these algorithms (defined in src/crypto_hash.cpp:GetDigestMap()):
- blake3 - 32 bytes (separate vendored library, not from OpenSSL)
- blake2b-512 - 64 bytes
- keccak224, keccak256, keccak384, keccak512 - mapped to SHA3 variants
- md4, md5 - 16 bytes (deprecated, may not work on some systems)
- sha1 - 20 bytes
- sha2-224, sha2-256, sha2-384, sha2-512 - SHA-2 family
- sha3-224, sha3-256, sha3-384, sha3-512 - SHA-3 family
All functions (crypto_hash(), crypto_hmac(), and crypto_hash_agg()) support all these algorithms.
Note: Keccak is mapped to SHA3 in OpenSSL. True Keccak (pre-standardization) differs slightly from SHA3.
- Add the new algorithm to
GetDigestMap()insrc/crypto_hash.cpp - Map it to the appropriate OpenSSL EVP_MD function (e.g.,
{"sha2-512", []() { return EVP_sha512(); }}) - Add test vectors to
test/sql/crypto.testand/ortest/sql/crypto_hash.test - Update README.md with the new algorithm
- The
LookupAlgorithm()function insrc/crypto_extension.cpphandles algorithm lookup for the main hash function
- Update
CryptoScalarHashFun()insrc/crypto_extension.cppto handle the new type - Ensure the type's binary representation is hashable
- Add corresponding test cases in
test/sql/crypto.test - Update README.md to document the new type support
- For aggregate support, update
RegisterHashAggType()calls inLoadInternal()
- BLAKE3: Handled separately using the vendored BLAKE3 library (
blake3.h), not through OpenSSL - OpenSSL algorithms: Use the EVP API through
GetDigestMap()which returns a lambda that calls the appropriateEVP_*()function - Algorithm lookup:
LookupAlgorithm()insrc/crypto_extension.cppreturnsnullptrfor BLAKE3, or the EVP_MD pointer for OpenSSL algorithms
- Lists are hashed element-by-element in order
- For VARCHAR/BLOB elements: each element is hashed as
[8-byte length][content]to prevent length extension attacks - For fixed-length types: only the raw binary data is hashed
- List hashing code is in
HashListElementBlake3()andHashListElementEVP()helper functions
crypto_hash_agg()usesHashAggregateStateto maintain state across rows- Requires ORDER BY clause - enforced by checking for combining operations
- Produces identical output to
crypto_hash()on an equivalent ordered list - Supports same algorithms and types as scalar function
The C++ implementation throws DuckDB exceptions:
InvalidInputException: For invalid algorithm names, unsupported types, NULL list elements, invalid random byte lengthsInternalException: For OpenSSL operation failures (context creation, digest operations, random byte generation)
Exceptions are caught by DuckDB's executor and presented to the user.
The extension has comprehensive SQL-based tests:
test/sql/crypto.test- Main test suite covering all functions, algorithms, data types, and error casestest/sql/crypto_hash.test- Focused test suite for hash algorithms with known test vectors
Tests are run using:
make test_debug # Run tests with debug buildTest coverage includes:
- All hash algorithms with known test vectors
- All supported data types (integers, floats, booleans, dates, UUIDs, etc.)
- List hashing with different element types
- HMAC computation with all algorithms
- Aggregate hashing with ORDER BY requirements
- Random byte generation with various lengths
- Error cases (invalid algorithms, unsupported types, NULL list elements, etc.)
The repository uses the DuckDB extension template's CI system:
- Build configuration is in
extension_config.cmake - CI tools are in
extension-ci-tools/(git submodule) - The Makefile includes
extension-ci-tools/makefiles/duckdb_extension.Makefile