CHANGELOG

v1.1.0 (2025-11-25)

Bug Fixes

  • Missing S3_EXPORTS_PREFIX and S3_BACKUPS_PREFIX (93b8bb5)

Chores

  • Implement just command for migrating from a sqlite to a postgres catalog on ducklake (81e082f)

  • Normalize etl related commands, create tl only commands, and global etl, ingest, and tl commands (c7dca94)

  • deps: Bump up duckdb and switch dbt-duckdb to stable non-git version (2c6bb02)

Code Style

  • Remove new line (8b2a148)

  • Wrap sql script path in double quotes (0894362)

Documentation

  • Update file structure and dlctl commands to match new postgres catalog backups (0febe4a)

Features

  • Add env vars required to get lakehouse connectivity (9cf6bf5)

  • Migrate ducklake catalog from sqlite to postgres (c077595)

  • Migrate the dlctl backup from sqlite to postgres support (0fe2c6d)

Refactoring

  • Change engine_db variable scope (not used anywhere else) (2fad9de)

  • Extract sql script to a separate file (66e282c)

  • Reformat inline sql into fewer columns (5a81765)

  • Remove unused imports (acaccd0)

  • Rename env var for postgres root password (17b82a0)

v1.0.1 (2025-10-21)

Bug Fixes

  • Kuzu overwrite would fail when other files remained in the directory (186a774)

Chores

  • Add overwrite to graph load commands for idempotency (8de97c9)

  • Bump up datalab locked version to 1.0.0 (latest) (d58130b)

  • Just command to setup dev tools (345aecf)

  • Setup nbstripout so that notebooks are not committed with any output (3bdef36)

  • Strip outputs from existing notebooks (cf439e0)

v1.0.0 (2025-10-20)

Bug Fixes

  • Add default value to env vars on dbt models (eef59ff)

  • Add missing curl and jq dependencies (99ce192)

  • Add missing curl dependency (0d4609f)

  • Add missing env vars (a5e153d)

  • Add missing graph load command, add spacing and comment sections (f9150bc)

  • Add universe apt repo (ce5e18a)

  • Change to correct context path (4534a27)

  • Correct command to ml server (29dfbf7)

  • Destroy services was not switching to the appropriate docker context (3aee07e)

  • Enable prevent_destroy so we can tune cores and memory without risking destruction in the future (ae7947d)

  • Expired password by default and dhcp hostname broadcast (2e5bc08)

  • Force build to ensure image is rebuilt when required (d84cc77)

  • Incorrect create user script path (22e6e0f)

  • Indentation (c7f1b17)

  • Mark env var secrets as sensitive (c52f940)

  • Mark gitlab token as sensitive (3832bb1)

  • Missing add-apt-repository (2e2cf53)

  • Missing backslash (3ab832d)

  • Overcommit gitlab resources with 4 cores and 8 GiB without ballooning, for stability (48622fe)

  • Portainer would collide with minio when running locally (fcc0093)

  • Preinstall missing jq (4c80525)

  • Pushing large images would fail without checksum_disabled on the registry storage (eae3d6f)

  • Re-enable prevent_destroy (94a5d69)

  • Script executable permissions (17bfd3b)

  • Set env file to the ci project dir (2642e03)

  • Set options that reconfigure couldn't using gitlab-rails console (d4e88cd)

  • Should be secret_key, not a duplicate access_key (e10b44e)

  • Should be the node name, not the endpoint (09816d4)

  • Tfstate backups have a timestamp before the backup extension (74261fe)

  • Use env var instead of input, return empty string on null (f7ea5db)

  • Use env var to expose external kafka advertised listener (355c787)

  • Variables were not being expanded (5c97801)

  • Wrong env filename (0d26f03)

Chores

  • Add gitlab bucket to default buckets (f56252c)

  • Add mlflow bucket to default foundation layer minio buckets (63a6ab9)

  • Bump up mlflow from 3.2.0 to 3.4.0 (cd3760d)

  • Enable prevent_destroy for docker and gitlab vms (0b1e914)

  • Ignore terraform state files and tfvars (4e3bc78)

  • Increase cores and memory for gitlab, reducing memory for docker-apps (b65292e)

  • Increase the number of concurrent jobs (602b85a)

  • Minimal s3 config example (22bddc6)

  • Normalize tags to use layer number and name (ea5ef9d)

  • Postgres root password config (34bfb06)

  • deps: Add http provider (0f90970)

  • deps: Update uv.lock with the latest datalab version (5f3c233)

Continuous Integration

  • Add explicit stage to postgres template (840c4f8)

  • Add explicit stages block (b7ee333)

  • Add explicit stages to kafka and ollama templates (b56bd57)

  • Add missing provision stage (f2091c6)

  • Add missing stage (abe9df5)

  • Automated apps deployment (39facd0)

  • Change entry point to force trigger (9a86ada)

  • Change entry point to force trigger (093a8a9)

  • Create_db now returns a credentials.env file that expires in 15 min (ed66f69)

  • Deploy job only runs when services are changed (cc857df)

  • Docker compose stack update (6785af2)

  • Drop intermediary job (d3649c4)

  • Ensure both topics are created (6668b22)

  • Fix changes rule with matches to files (08c6d1d)

  • Fix syntax for manual triggering conditions (33d667b)

  • Implement kafka topic and topic consumer group creation (cd3f8ea)

  • Implement ollama model pull (2956c09)

  • Improve logging messages (30f9b31)

  • Kafka and ollama provisioner templates to use on external projects (e1acd62)

  • Manual trigger option for services and apps deployment (685ecda)

  • Missing variable name for update (cadf82c)

  • Postgres provisioning job template for including on other projects (2c3008e)

  • Postgres user and db creation, kafka initial setup (1cd1ec2)

  • Provision kafka topics and groups explicitly for the mlserver app (fdaeaa0)

  • Refactor so variables are on top (bfb8847)

  • Refactor with more general job names (c04c149)

  • Remove redundant when manual that was blocking the job (d4f7305)

  • Remove unrequired sleep (1bab0dc)

  • Replace before_script with custom ubuntu image (6271461)

  • Reproduce rules from apps deploy job (f40fde4)

  • Rollback to production topic and group names (2dc2e6b)

  • Soft fail when db exists (82415ba)

  • Stub for ollama jobs (17ee3de)

  • Switch back to inline scripts for postgres (59f7e81)

  • Testing docker access (92c35d1)

  • Testing file list (f6257bd)

  • Testing topic and group provisioning (92eba7a)

  • Trigger on changes to the deploy template (01e1048)

  • Upsert behavior for PSQL_SECRETS (efc00ba)

  • fix: Missing backslash (8627b5c)

  • fix: Remove -it from docker command (eadcdab)

  • fix: Single line command (024fc4a)

  • fix: Try again (eab2623)

  • refactor: Clarify log message (1e2a976)

  • refactor: Improve description (c2f5472)

  • refactor: Rename job to init consumer (bcaedb5)

Documentation

  • Update requirements and quick start (b1ef974)

  • Update with latest workflows and add missing documentation (ba48b98)

    • Add missing requirements.

    • Add missing ml/ and infra/ components.

    • Add missing shared/ modules.

    • Add PostgreSQL dotenv config.

    • Add missing dotenv configs: datalab, DuckLake, Kuzu, Ollama, MLflow, and Kafka.

    • Add docs for CLI test and docs commands.

    • Add docs for ML CLI commands.

    • Add just commands documentation.

Features

  • Add confirmation task, destroy services but never volumes (43fa345)

  • Add custom image for gitlab runner (91c86ae)

  • Add GITLAB_TOKEN to CI/CD variables (2602d5b)

  • Add missing docker configs to use the nvidia gpu (a005a68)

  • Add missing registry config and configure a remote docker gitlab runner (81784cd)

  • Add open webui and switch to plain http on portainer (4c23e64)

  • Add volume to open webui for persistence (0d4c883)

  • Basic Docker VM provisioning (untested) (1f2f7fd)

  • Basic gitlab-terraform project to handle CI/CD variables (1e04c04)

  • Basic Terraform project with stored state using S3 backend (d14b5cb)

  • Basic Terraform setup for provisioning an LXC running MinIO on Proxmox (60cb6b2)

  • Change to official open-webui image and preconfigure the ollama endpoint (fd56c92)

  • Configs now done via gitlab.rb directly, added container registry (1a79081)

  • Data lab infra config check tasks (b861d95)

  • Disable usage tracking and user creation, remove unused gitlab-env, and fix indentation (4c2333a)

  • Dockerized ml server (544427f)

  • Dotenv loading into gitlab ci/cd vars now working (1dd5691)

  • Env var configurable topics and consumer groups (87d7b55)

  • Extract MinIO environment variables from the install script into Terraform and produce random passwords (d6ab46e)

  • First gitlab working deployment, and docker and gitlab now split into separate tf files (09a7ca0)

  • Gpu passthrough for docker-shared VM (12eecef)

  • Improve error control and add credentials printing task (406f5b8)

  • Improve postgres workflow for credentials and db creation (ac7c573)

  • Optional NVIDIA driver install for Docker VMs (cloud-config now a template) (8a2abfb)

  • Overall task cleanup, add infra provisioning for services layer and destruction tasks (f88158c)

  • Postgres deployment (f8f7db2)

  • Preconfigure gitlab container registry as an insecure registry for all docker vms (5520d35)

  • Refactor docker-compose.yml into the services layer compose file, adding portainer and limiting minio to the dev profile (fcb43e0)

BREAKING CHANGE: There is no longer a docker-compose.yml, as it will be integrated into infra/services/docker/compose.yml with MinIO available only under the dev profile.

  • S3 config variables (required by gitlab) (a544409)

  • Set a fixed port for MinIO's console (78741fe)

  • Simplify showing credentials and add validation (d0a7d36)

  • Simplify the way the custom docker context is accessed (57977f4)

  • Tcp listening for remote access (a0caf5a)

  • Update PSQL_SECRETS env var (0dbd79d)

Refactoring

  • Better defaults, less redundant title comments (6c3f42b)

  • Explicitly use true/false for masked and improve formatting (4e63668)

  • Extract scripts from templates (cdf5283)

  • Fix linting issues (01bdae2)

  • Make it clear that create database can safely fail (9fff4c1)

  • Move services docker files into its own directory (2ae6fee)

  • Normalize comment title formatting (9b255d3)

  • Remove explicit user (9d2118a)

  • Rename the container resource to minio (3c41783)

  • Rename to gitlab (c2c383c)

  • Rollback to inline scripts, move templates to root of dotci (f0d12b8)

  • Split docker vm passwords into multiple outputs again (23b4cbe)

  • Split script into multiple lines for psql_create_db (10444ff)

  • Switch to single rootfs volume and improve resource naming (f773714)

v0.7.0 (2025-08-28)

Bug Fixes

  • Lakehouse is now a singleton, to avoid initialization when running the help command (ca5a7ea)

  • Normalize loggers to use loguru via an intercept handler (d18f572)

  • Shift should be drift, and count plot should be stacked (62aefb2)

Chores

  • Add a default task that lists all just tasks (897c520)

  • Add missing help message and fix the one for ml monitor plot (7b17e14)

Features

  • Improve performance of REST API by moving Kafka payload queueing to the background (2fe859d)

Refactoring

  • Drop unused matplotlib imports (48b9e31)

  • Remove unused imports (f1f8a1f)

v0.6.0 (2025-08-25)

Bug Fixes

  • Attempt to solve group coordinator errors (25e9cf1)

  • Capture asyncio cancel exception (5f5f07f)

  • Consumer task was meant to be awaited from inside the loop (0bede44)

  • Correct model uri scheme (a6c589b)

  • Dataframe was being forced through the model loaded using mlflow.pyfunc.load, so now we handle multiple input types (4d9541e)

  • Handle failed runs and drop unrequired columns from logged inputs (29f9259)

  • Kafka now runs and initializes properly (b94dfc4)

  • Mlflow healthcheck, switch to kafka's official image (cea3edf)

  • Model needs to be initialized every time, otherwise there is a memory leak (e53c85f)

  • Move mlflow.db to root since db directory didn't exist (3fed7f1)

  • Ollama will now default to CPU when GPU is not available (a13fd72)

This will, most likely, make it unusable, but at least it won't stop the other services from starting and working as expected.

  • Positive label probability selection (00d738e)

  • Queue logic incompatible with list logic, always flush in the end (9d59f90)

  • Requests cache was causing memory overload (b734e08)

  • Schema name, remove unused tasks (e1944fa)

  • Train/test split now separate from cross-validation (train only) (11b448e)

  • Transform failed when other datasets were not ingested (029e8c4)

  • Update to new lakehouse schema (1240dc9)

  • Update to new ml types and lakehouse schema (063d28b)

Chores

  • Add a second topic for updating inference results with user feedback (6abb221)

  • Add config for new stage catalog with secure storage (1009792)

  • Add config for pairs of topic and expected consumer group (6224c6c)

  • Add config for stage catalog with secure storage (7d3fbb9)

  • Add kafka config section (175474f)

  • Add name to each asyncio task (dad0fbe)

  • Create justfile with tasks from previous and upcoming videos (9289e70)

  • Delete unused test module (04964ef)

  • Reduce sample fraction (a9ab9a7)

  • Rename insert/update to result/feedback to match new event topics (4ae3ef1)

  • Setup mlflow service with sqlite and s3 (3a0f1ca)

  • deps: Add anyascii and inflection for a more robust sanitization, add just for task running, add xgboost for ML project (17cbf48)

  • deps: Add faker to create random dates (0814e54)

  • deps: Add fastapi and uvicorn for the ml server (5907a38)

  • deps: Add joblib to use Memory for caching (1676db1)

  • deps: Add kafka official library (2de36e8)

  • deps: Add mlflow (5d0b94a)

  • deps: Add pip so its version is properly detected by mlflow during model logging (63edc5a)

  • deps: Add scikit-learn (204411b)

  • deps: Add sentence transformers for text embedding (0a852d8)

  • deps: Downgrade from 3.0.3 to 3.0.2 due to mlflow compatibility (8e2d1bd)

  • deps: Replace confluent-kafka with aiokafka (a052aa8)

Features

  • Add 3-folds (9c58bfb)

  • Add create_at timestamp that defaults to the current date (6badaf2)

  • Add custom MLflow user for tracking (4818df8)

  • Add model logging to mlflow_end_run (54027f2)

  • Add monitor compute task (36d2b91)

  • Add monitor dataset to mlops etl pipeline (c227f62)

  • Add monitor plot task (27bd6bb)

  • Add reload option to use during development (5e175ad)

  • Add sample fraction parameter (813f031)

  • Add train and test tasks, update ETL task with transformation (54e1adb)

  • Apache Kafka server (b2acb5d)

  • Basic training pipelines and CLI command (e27cc5d)

  • Check for curl and add -f to ensure the task fails when status code is >= 400 (b02c5a1)

  • Default to 3-folds, since it is now supported (daf3db9)

  • Drop tasks for mlflow model server (images are too bloated) (c20ec10)

  • Enable artifacts proxy and install boto3 as a dependency (8e844a6)

  • End-to-end kafka producer/consumer implementation (c1fbbc6)

  • Endpoint to flush inference log, refactor inference request to handle A/B/n testing (8ba2a7e)

  • Feature pipelines for TF-IDF and sentence transformers (70c4182)

  • Feedback is now an array and created_at keeps track of time (19d0527)

  • Generic ml dataset loader function (f364327)

  • Health check endpoint, and refactor insert/update to results/feedback for clarity (d00f612)

  • Implement dataset transformation with train/test split and 5-folds and 10-folds (3e552e0)

  • Implement scaffolding for monitoring statistics computation class and count stat (6d25335)

  • Inference API request (ccbc242)

  • Inference feedback update workflow (9127a92)

  • Load ml inference results for a date range (d6df078)

  • Load monitor stats (5969151)

  • Ml monitor is now ml monitor compute and since/until are unspecified by default (5e66a95)

  • Ml monitor plot command (9dc514c)

  • Ml server start command (3261c06)

  • Mlflow docker image building, container running and server testing tasks (054f499)

  • Mlflow tracking URI env var (5511986)

  • Mlops train per method and features (added), split everything into individual tasks (1e897f3)

  • Monitoring metrics for prediction and feature drift, estimated performance, and user evaluation (72286b6)

  • The latest model is now always used to build the docker image (6fbb824)

  • Output is now probabilities, so threshold can be set externally (18d814a)

  • Payloads is now types and inference request contains one or multiple models (d7b31b7)

  • Plotting for model monitoring statistics (f14d51e)

  • Query latest snapshot_id (version) (779aa61)

  • Replace slugify with custom sanitization alternative (cf437c3)

  • Rolling prediction drift computation (29c439a)

  • Safer attach (if not exists only) (0ebf16d)

  • Scaffolding for inference simulation and monitoring (5d78f5e)

  • Scaffolding for ML CLI and workflow functions (647559f)

  • Scaffolding for ml server (3f3d05d)

  • Schema and count functions, remove redundant initialization code (same as generate_init_sql) (b816c9b)

  • Separate tracking from training logic, and use PandasDataset instead of custom dataset (80b895a)

  • Start consumer thread with ml server (7523cfb)

  • Support for 3-fold dataset loading and CV training (106a6ef)

  • Support for catalogs with secure storage (e96ce72)

  • Switch to expression api and make since/until optional, and implement loading for monitor stats (368d039)

  • Task to generate init SQL, also used on check fail, rename mlflow model server tasks, add mlops-serve task (a79c379)

  • Task to run inference simulation using monitor dataset (fa48f2c)

  • Tasks to test inference and logging requests (cc1aa31)

  • Time tracking logging utils (d59abe4)

  • Train and test set loaders for document datasets (9bbd7c9)

  • Transformation for monitor depression dataset (5fb5d78)

  • Update to new ml types, fix lakehouse collision with transform, and improve API to make flush usable externally (68ef918)

  • Working inference simulation (6a3fcab)

Refactoring

  • Cleaner variable names and paths (1d52213)

  • Correct and improve log messages (c8fc5ca)

  • Easier naming for MLOps tasks (6ac1853)

  • Extract color functions into a shared module (6a859ef)

  • Extract prediction from server to its own module using joblib Memory for cache (60467b7)

  • Lakehouse logging is now the default (de16c77)

  • Move server health check to server module (c63cd22)

  • Normalize method names and group by context (c01be6c)

  • Normalize ml dataset column names and target type (bee2af9)

  • Remove redundant case for computing the S3 prefix (b836af2)

  • Remove unneeded echo (5e6dbb9)

  • Rename all init containers with init suffix (99e3939)

  • Rename n_folds to k_folds (b3ac002)

  • Set random model section log level to debug (228b805)

v0.5.0 (2025-08-05)

Bug Fixes

  • Any positive ESI is now considered competition, and is separate from intensity (25844f1)

  • Log file relative path to cwd failed when not directly contained using Path (e4f5b62)

Chores

  • Commit notebook generated during video recording (454d0dd)

  • deps: Add adjustText to optionally fix rendering of overlapping node labels (36cbc33)

  • deps: Add geopandas to plot maps (62d5ef1)

  • deps: Add jupyterlab, matplotlib, and networkx for graph data science (e29c08f)

  • deps: Remove unneeded adjustText and add scipy back as a requirement for networkx layout computation (76ef5d4)

Features

  • Add CLI support for computing the CON score (8c94f6e)

  • Add edge arrows and node colors per label (ed56184)

  • Add graph analytics module, starting with a CON score (ff1f926)

  • Add graph transparency and improve labels (02dc859)

  • Add scale to arrow placement, add optional visualization weight (9190d2c)

  • Compare communities and components, study economic pressure (afceea8)

  • Competition network analysis, including community and weak component analysis (62e54fd)

  • Create a basic graph theme matching DLT (3210fa5)

  • Dominating and weaker economy individual analysis (986a2d6)

  • Edge direction now based on common exports, from highest to lowest total amount (77325bd)

  • Improve graph plotting and add map plotting (266dfca)

  • Networkx graph plot helper to use with notebooks (a36b6c9)

  • Revisited the whole notebook, restructuring and adding depth where needed (6a3dcb1)

  • Script to easily convert Jupyter Notebooks to markdown (4b0c792)

  • Set label with a property per node type and render labels without overlapping (8c0b6fb)

  • Setup notebook for graph data science (1d96e63)

  • Support for loading Parquet into DuckLake from Python (4035f63)

  • Trade alignment analysis (80d5ef1)

  • Trade alignment analysis (cont) (da6e848)

Refactoring

  • Different score reset strategy (d4d7d9d)

  • No longer setting flags for dominating and weaker (d8013c4)

  • Remove unused import (65defb1)

  • Replace os.path ops with Path ops (84c73a9)

  • Use kuzu extension instead of kz (d815cef)

  • Use ref instead of hardcoded FQN (ba6de1a)

v0.4.0 (2025-07-16)

Bug Fixes

  • Add missing schema configs for new econ comp models (c4daafb)

  • Edges needed to be defined based on node_id, which required these changes (398ba70)

  • Remove inexistent property (918f23a)

  • Remove not null tests where they were not required (43efc61)

  • Remove product parent relationship, as there is no multi-level data here (2d26651)

  • Remove repeated country pairs in reverse order (1f2f867)

  • Required aggregation per country and product, disregarding partner (635dc72)

  • Types and missing null strings (40a79d7)

Chores

  • Add cypher script to compute music_taste graph stats (7a0a48d)

  • Add env var for econ comp graph db (3e34e80)

  • Configs for analytics mart (40dee56)

  • Re-enable requests-cache with streaming (62c7dff)

  • Rename KuzuDBs to match new single-file format (0e797ae)

  • Simplify music taste graph stats script (5b964fb)

  • Upgrade explorer script to work with kuzu 0.11.0 (36f6cf7)

  • deps: Add humanize to print byte sizes in human-readable format (6238484)

  • deps: Add requests cache dep (b7c5fd5)

  • deps: Add tqdm dep for tracking download progress (5e2ba51)

  • deps: Bump up kuzu to 0.11.0 (74f2f4f)

  • deps: Bump up version inside uv.lock (7124ff4)

Documentation

  • Fill-in the missing schema models for analytics, and econ_comp nodes and edges (aa65fcd)

Features

  • Add model selection CLI option to test cmd (499bac0)

  • Aggregated view for 2020-2023 trade covering recent years (c579742)

  • Cli command to expunge/clean cache (f412b51)

  • Complete dataset template for The Atlas of Economic Complexity (6e2cb9c)

  • Country and product nodes, product-country export and import edges, and product parent edges (cca6d5c)

  • Country-country ESI calculation (0ca0346)

  • Datacite working downloader (bf09fb1)

  • Ingest country classification data (09c3ac7)

  • Logic changed to account for the last 3 years in data instead of a fixed range (8599498)

  • Move cache to shared level and add expunge function and requests cache (805511f)

  • Rename 2020-2023 to latest 3y and add schema for country-country metrics (af044f8)

  • Select top 5% ESI country-country relations for edges (3356e4f)

  • Skip cache for downloads and display progress bar (039e08a)

  • Split ingestion into multiple modules and add dataset templates (8e3c6b8)

  • Stage transformations for TAoEC (6e082e3)

  • Support for cache usage statistic printing (436391b)

  • Support for loading econ_comp graph (93396df)

Performance Improvements

  • Increase chunk size and make sure temp files are cleaned even when the script is stopped (39943df)

Refactoring

  • Log debug message containing produced context (5917a15)

  • Rename context to entities when referring to entity nodes (ff6e0df)

Testing

  • Ensure ESI is within a 0..1 range (d1ef5ce)

v0.3.0 (2025-07-08)

Bug Fixes

  • Add error control to the GraphRAG chain (4f015ca)

Chores

  • deps: Add colorama to color error messages (389a8a1)

Features

  • Graph rag CLI options for interactive and direct querying (8f54d81)

Refactoring

v0.2.0 (2025-07-04)

Bug Fixes

  • Correct logic for deleting vector index if exists (516b677)

Chores

  • Add missing word in prompt (2001d8d)

  • Container names will now use the default naming schema (6d267b8)

  • Ensure predictable table indexing order (4547bd3)

  • Graph retriever and context assembler class scaffolds (eae806d)

  • Make sure kuzudb-explorer is using a fixed image version (0.10.0 currently) (80c8aca)

  • Path combination and scaffolding for hydrating (1c7db62)

  • Prefix log message is now debug-level (de7d708)

  • Print version from pyproject.toml via CLI argument (2fa5b86)

  • Remove unused semantic-release config (1692e14)

This option was set in the wrong location, so it did nothing. We don't need it.

  • Replace default nomic-embed-text ollama model with phi4:latest (ee324f1)

  • Setup ollama service and add env var for default model install (4af078b)

  • deps: Add ollama dependency (4d1608d)

  • deps: Add pytest to dev deps and configure default CLI options (baabcd5)

  • deps: Langchain with ollama support, and a prompt helper library (4565ec9)

  • deps: Langchain-kuzu (eed603d)

  • deps: More-itertools (ecb7f9c)

Continuous Integration

  • Add missing version to semantic-version command (c6facd1)

  • Fix call to semantic release using a function (d577a45)

  • Fix changelog_file config location (b5bb8d7)

  • Fix pyproject.toml version setting for semantic release (db96d22)

  • Remove redundant build option, already set on pyproject.toml (e8f6d6b)

Documentation

  • Add knn method info to clarify the max_distance param (0fdf01f)

Features

  • Add file logging by default (and option to disable) (2f9a36e)

  • Add final answer pipeline and improve interactive mode (58bff5a)

  • Basic prompt for graph RAG and langchain scaffolding (50173de)

  • Combined knn step for context assembler (33b20ab)

  • Context assembly based on ANN, paths to neighbors, and random walks from neighbors (9323352)

  • Cypher friendly schema format (87f8171)

  • First working NER implementation based on langchain-kuzu (a743062)

  • Graphrag is now a LangChain Runnable and components became methods (cd04d33)

  • Knn query support (2bca4a0)

  • Knn, shortest paths sampler and random walk computation for context assembler (22d4f0a)

  • Kuzudb-explorer launcher script now handles different paths (4dc65a9)

  • Lazy singleton S3 resource and bucket connection (63388a1)

  • Ollama service with gemma3 and nomic-embed-text (83b68dd)

  • Path hydration and bulk description (97ea465)

  • Return paths as interleavings of node_id and rel label (17b790a)

  • Support for indexing embeddings (c687f81)

  • graph.ops: Automatically add a custom embeddings column to all node tables (1900f21)

Closes #2

  • graph.ops: Produce node schema with properties names and types (291d42f)

Performance Improvements

  • Migrated from KuzuQAChain to a custom strategy still based on langchain-kuzu (ebce585)

Refactoring

  • Change property match to WHERE cond and lower the temperature (f0f9198)

Testing

  • Correct paths_df fixture and add missing exclude_props (c167b0c)

  • Invoke test for GraphRAG runnable (f724224)

  • Move graph db check to global fixtures (d2963e3)

  • Print final chain output (40f2d14)

  • Setup ops and paths_df to test path_descriptions() (3f3c160)

  • Tests will only print logs to stderr and always use debug level (fafb3bf)

v0.1.0 (2025-06-25)

Bug Fixes

  • Add node_id to all nodes (f927dcd)

  • Batch should be column, not parameters (73eeb9e)

  • Condition for ignoring files during deletion (9da0e0f)

The manifest.json was being deleted by mistake.

  • Correct name for placeholder models (fa07609)

feat: implement all missing edge models

  • Ducklake integration using dev version for upcoming dbt-duckdb 1.4.1 (effe0d7)

  • Duplicate alias for source_id and target_id columns (d6b6790)

  • Ensure tags are checked out (d833c89)

  • Generate sequential node ID globally for all nodes (8d019ac)

  • Genre loading queries (abc6833)

refactor: reorganize models into stage and marts

feat: support for edge loading (untested)

  • Genre nodes become a single table to ensure uniqueness (7ec7f03)

  • Incorrect S3 secret variable (6fa7394)

  • Missing description for playcount (c505c9c)

  • Missing node ID dataset-based prefix (bfecf9f)

  • Missing nodes prefix on ref table (58b04f5)

  • Missing underscore after prefix (3a0d6d9)

  • No longer defaulting to upstream dependencies (6d2d68a)

  • Regression introduced by removing key_parts (668a31c)

  • Removed extra bracket in log message (782bcc9)

  • Should be alias, not name (d03e079)

  • Should be list of list, not list of tuple (c3b7419)

  • Sqlite prefix missing (78bae87)

  • Switch to single table for genre nodes (6d5dd1f)

  • Update graph loading process based on new config schema (eebc677)

  • Update prune to use class prefix (49ca20f)

  • Using map instead of list per node embedding (e6f1caf)

fix: add missing schema alter to add embedding property to all nodes

  • Wrapper to copy from data mart via a temporary file (3cd0268)

  • Wrong column name in schema (af2a693)

  • Wrong filename case, should be RO, not ro (905a303)

  • Wrong model name in schema (588f3bc)

  • Wrong reference, missing schema prefix (701fb1d)

  • Wrong variable order in log message (40cb055)

Chores

  • Add description and pandas dep (b7c40d0)

  • Add DUCKLAKE_PATH to .env (3ab2bee)

  • Add kuzu as a dependency (f1e2a5c)

  • Add S3 prefix for exports (88ee16c)

  • Add solid background to diagram (7499f46)

  • Add torch, torch-sparse, and torch-geometric deps (5ddd0f8)

  • Better schema name organization for graphs (948759a)

  • Click and minio deps (c6e450f)

  • Default to eu-west-1, as MinIO also defaulted to it (6afa282)

  • Delete example models (7012b5b)

  • Fix version for python-semantic-release to match deps (3ff93d9)

  • Github dark mode background color (ca97d44)

  • Gitignore vscode directory (a2fcabd)

  • Initial log message for export (1207e93)

  • Initial log string is now a welcome string (0edefc6)

  • Make sure we start from 0.1.0, not 1.0.0 (cb2b7c5)

  • Remove unused dep (ba09c90)

  • Remove unused deps and update docs referring to them (7da0d24)

  • Replace with official GHA for python-semantic-release (7301029)

  • Script to launch temporary docker container with KuzuDB Explorer for a database (52b617a)

  • Setup dlctl CLI tool (replaces Makefile) (405d800)

  • Simplify node and edge schemas, using Gremlin-like notation (887dddc)

  • Solid background in individual rectangles (8f68204)

  • Switch to a multi-database marts config (1cf615c)

  • Temporarily removed (bd26438)

Schema was outdated and was blocking dbt run.

  • Update config to match multi-database marts (d47f96f)

  • Won't use the extra command in favor of one entry point (698a3d1)

  • deps: Move python-semantic-release to dev deps (9df3ed6)

Documentation

  • Add graph and shared (866b37c)

  • Add specification for exports pruning (e42acbf)

  • Dependency management development instructions (2f9393f)

  • Duckdb init script description (94b1959)

  • End-to-end documentation (190ef1b)

  • Fix section links (e00a537)

  • Latest.json is now manifest.json (6cdb057)

  • Remove suffix from info boxes (170ebff)

  • Requirements, quick start, architecture diagram (8009821)

  • Schemas for nodes and edges of the music graph (409b4cb)

  • Structured sections for README (5a6c128)

  • Update schema for the DSN and MSDSL datasets (9a56aaa)

  • Update storage layout (f60d021)

  • Update storage layout (1383bfd)

  • Update storage layout and specification for the ingest command (b66cffe)

  • Using generic dark background (85476f1)

Features

  • Add CLI args for read only and to reset (dbfefa8)

Container is now kept between sessions, unless explicitly reset.

  • Backup restore can now specify source date (62a52af)

  • Basic support for dbt run via dlctl transform (b3f6240)

  • Catalog backup and restore (cb87830)

refactor: prefix is now set when instantiating Storage

feat: Storage can now upload/download files or a directory

  • Create directories for DuckDB databases (3976b19)

This way we can set the marts databases to be stored under local/marts/.

  • Dbt debug option (f64313e)

  • Debug option controlling log level (9e5f030)

  • Directory structure for a DuckLake lakehouse (87ed579)

  • Dlctl tools generate-init-sql (c24ca39)

This will output into local/init.sql. The scripts/init.example.sql or the gitignored scripts/init.sql are no longer used.

docs: add a help message to all commands

  • Duckdb CLI init script to connect to the lakehouse (788691f)

  • Error control for empty results (d5e1d65)

  • Exception capture for KuzuOps (6a2a498)

  • Export scripts that output parquet (218bfeb)

  • FRP node embedding over KuzuDB (68a7514)

  • Improve backups listing (8c5284d)

  • Load all genres tables based on shared macro (ed8197b)

  • Ls and prune for ingest and exports (e9a3505)

fix: add missing manifest to exports

fix: ignored file filtering

fix: add prefix logic to upload manifest

  • Minio docker service (00f2dab)

  • Node embedding computation and graph DB update (579b477)

  • Option to use latest export when loading a graph (3b00439)

refactor: exports now stored using the same directory structure as marts

  • QoL for CLI parameters and defaults, and logging (c6ee19b)

  • Quality of life for explorer startup and exit (4562ab6)

  • Replace export scripts with a load method from the new graph package (8c356d6)

  • Replace load_dotenv with proper validation via environs (aacb1a3)

refactor: centralize storage and environment variable loading into shared packages

refactor: improve function naming and arguments

feat: set placeholder upload as optional

refactor: rename latest.json to manifest.json

feat: storage now implements an env var loader with latest file paths

  • Schema name without the 'main_' prefix (37b0bff)

  • Setup semantic releases (ef856fd)

  • Strip schema name from table name (fc83dfa)

  • Stubs for node embedding computation command (f220fc2)

  • Support file downloading from object storage (eff8a8b)

  • Support for kaggle and hugging face ingestion (39f8fb5)

  • Support for manual dataset ingestion (3b6c3d1)

  • Support for running a subset of models during transform (4986aee)

Performance Improvements

  • Switch to UNPIVOT strategy (51059e2)

Refactoring

  • Better naming scheme for graph schemas, and node and edge tables (1716d17)

  • Cleanup file to avoid inline comments (498534d)

  • Con is now conn (cc99988)

  • Embedding batch updates now handled directly by NodeEmbedder (5dc6e51)

  • Genres/nodes and edges are now stored in the graphs mart (9963909)

docs: schemas updated with node and edge information and basic testing

  • Graph manager is now ops (c84dbf5)

  • Improved docs and better naming for DuckLake DBs (0b8097d)

  • Latest export is now default, but re-exporting can be forced (54324c3)

  • Log exception message without stack trace (1d52f40)

  • Name source and target columns (b0c83fd)

feat: cast node IDs to integer

  • Nodes and edges directories to match graph DB loading format (86ef29e)

feat: million song dataset, spotify and lastfm transformation

feat: improve deezer genres and edges mart table schemas

  • Overall simplification of the explore graph script (94b0bb0)

  • QoL, log message in lower case after colon (76cccc8)

  • QoL, log message now includes epoch (51c96cd)

  • Remove source from edges (ca15947)

  • Remove unneeded echoes (a5f86a0)

  • Rename models to include a schema prefix (dd263ab)

feat: implement missing node models

  • Rename music graph back to music_taste (d0d7a57)

  • S3 access key and secret renamed to reflect common naming schema (d00a4c9)

  • Table materialization is now default (35dc856)

  • Taking advantage of parents accessor (1038678)

  • Tools and utils moved to shared (3900df1)

feat: init SQL can now be returned as a string

fix: lakehouse relied on an init script that's no longer there

Using generate_init_sql to produce a string with the required SQL instead.

chore: uncommented code that didn't run due to KuzuDB bug

  • Util is now templates for clarity (7518c06)

chore: groups no longer invokable without arguments

This had been added for better performance, but did nothing.

refactor: split export into standalone feature

Extracted from graph load and integrated into the existing export command (renamed from exports to export).

Testing

  • Column only contains positive integers (c4459fb)

  • List/array not null or empty (3ff573f)

  • Make sure node IDs are globally unique (5d9a1ff)