- Missing S3_EXPORTS_PREFIX and S3_BACKUPS_PREFIX (93b8bb5)

- Implement just command for migrating from a sqlite to a postgres catalog on ducklake (81e082f)
- Normalize etl related commands, create tl only commands, and global etl, ingest, and tl commands (c7dca94)
- deps: Bump up duckdb and switch dbt-duckdb to stable non-git version (2c6bb02)

- Update file structure and dlctl commands to match new postgres catalog backups (0febe4a)

- Add env vars required to get lakehouse connectivity (9cf6bf5)
- Migrate ducklake catalog from sqlite to postgres (c077595)
- Migrate the dlctl backup from sqlite to postgres support (0fe2c6d)

- Change engine_db variable scope (not used anywhere else) (2fad9de)
- Extract sql script to a separate file (66e282c)
- Reformat inline sql into fewer columns (5a81765)
- Remove unused imports (acaccd0)
- Rename env var for postgres root password (17b82a0)

- Kuzu overwrite would fail when other files remained in the directory (186a774)

- Add overwrite to graph load commands for idempotency (8de97c9)
- Bump up datalab locked version to 1.0.0 (latest) (d58130b)
- Just command to setup dev tools (345aecf)
- Setup nbstripout so that notebooks are not committed with any output (3bdef36)
- Strip outputs from existing notebooks (cf439e0)

- Add default value to env vars on dbt models (eef59ff)
- Add missing curl and jq dependencies (99ce192)
- Add missing curl dependency (0d4609f)
- Add missing env vars (a5e153d)
- Add missing graph load command, add spacing and comment sections (f9150bc)
- Add universe apt repo (ce5e18a)
- Change to correct context path (4534a27)
- Correct command to ml server (29dfbf7)
- Destroy services was not switching to the appropriate docker context (3aee07e)
- Enable prevent_destroy so we can tune cores and memory without risking destruction in the future (ae7947d)
- Expired password by default and dhcp hostname broadcast (2e5bc08)
- Force build to ensure image is rebuilt when required (d84cc77)
- Incorrect create user script path (22e6e0f)
- Indentation (c7f1b17)
- Mark env var secrets as sensitive (c52f940)
- Mark gitlab token as sensitive (3832bb1)
- Missing add-apt-repository (2e2cf53)
- Missing backslash (3ab832d)
- Overcommit gitlab resources with 4 cores and 8 GiB without ballooning, for stability (48622fe)
- Portainer would collide with minio when running locally (fcc0093)
- Preinstall missing jq (4c80525)
- Pushing large images would fail without checksum_disabled on the registry storage (eae3d6f)
- Re-enable prevent_destroy (94a5d69)
- Script executable permissions (17bfd3b)
- Set env file to the ci project dir (2642e03)
- Set options that reconfigure couldn't using gitlab-rails console (d4e88cd)
- Should be secret_key, not a duplicate access_key (e10b44e)
- Should be the node name, not the endpoint (09816d4)
- Tfstate backups have a timestamp before the backup extension (74261fe)
- Use env var instead of input, return empty string on null (f7ea5db)
- Use env var to expose external kafka advertised listener (355c787)
- Variables were not being expanded (5c97801)
- Wrong env filename (0d26f03)

- Add gitlab bucket to default buckets (f56252c)
- Add mlflow bucket to default foundation layer minio buckets (63a6ab9)
- Bump up mlflow from 3.2.0 to 3.4.0 (cd3760d)
- Enable prevent_destroy for docker and gitlab vms (0b1e914)
- Ignore terraform state files and tfvars (4e3bc78)
- Increase cores and memory for gitlab, reducing memory for docker-apps (b65292e)
- Increase the number of concurrent jobs (602b85a)
- Minimal s3 config example (22bddc6)
- Normalize tags to use layer number and name (ea5ef9d)
- Postgres root password config (34bfb06)
- deps: Add http provider (0f90970)
- deps: Update uv.lock with the latest datalab version (5f3c233)

- Add explicit stage to postgres template (840c4f8)
- Add explicit stages block (b7ee333)
- Add explicit stages to kafka and ollama templates (b56bd57)
- Add missing provision stage (f2091c6)
- Add missing stage (abe9df5)
- Automated apps deployment (39facd0)
- Change entry point to force trigger (9a86ada)
- Change entry point to force trigger (093a8a9)
- Create_db now returns a credentials.env file that expires in 15 min (ed66f69)
- Deploy job only runs when services are changed (cc857df)
- Docker compose stack update (6785af2)
- Drop intermediary job (d3649c4)
- Ensure both topics are created (6668b22)
- Fix changes rule with matches to files (08c6d1d)
- Fix syntax for manual triggering conditions (33d667b)
- Implement kafka topic and topic consumer group creation (cd3f8ea)
- Implement ollama model pull (2956c09)
- Improve logging messages (30f9b31)
- Kafka and ollama provisioner templates to use on external projects (e1acd62)
- Manual trigger option for services and apps deployment (685ecda)
- Missing variable name for update (cadf82c)
- Postgres provisioning job template for including on other projects (2c3008e)
- Postgres user and db creation, kafka initial setup (1cd1ec2)
- Provision kafka topics and groups explicitly for the mlserver app (fdaeaa0)
- Refactor so variables are on top (bfb8847)
- Refactor with more general job names (c04c149)
- Remove redundant when manual that was blocking the job (d4f7305)
- Remove unrequired sleep (1bab0dc)
- Replace before_script with custom ubuntu image (6271461)
- Reproduce rules from apps deploy job (f40fde4)
- Rollback to production topic and group names (2dc2e6b)
- Soft fail when db exists (82415ba)
- Stub for ollama jobs (17ee3de)
- Switch back to inline scripts for postgres (59f7e81)
- Testing docker access (92c35d1)
- Testing file list (f6257bd)
- Testing topic and group provisioning (92eba7a)
- Trigger on changes to the deploy template (01e1048)
- Upsert behavior for PSQL_SECRETS (efc00ba)
- fix: Missing backslash (8627b5c)
- fix: Remove -it from docker command (eadcdab)
- fix: Single line command (024fc4a)
- fix: Try again (eab2623)
- refactor: Clarify log message (1e2a976)
- refactor: Improve description (c2f5472)
- refactor: Rename job to init consumer (bcaedb5)

- Update requirements and quick start (b1ef974)
- Update with latest workflows and add missing documentation (ba48b98)
  - Add missing requirements.
  - Add missing ml/ and infra/ components.
  - Add missing shared/ modules.
  - Add PostgreSQL dotenv config.
  - Add missing dotenv configs: datalab, DuckLake, Kuzu, Ollama, MLflow, and Kafka.
  - Add docs for CLI test and docs commands.
  - Add docs for ML CLI commands.
  - Add just commands documentation.

- Add confirmation task, destroy services but never volumes (43fa345)
- Add custom image for gitlab runner (91c86ae)
- Add GITLAB_TOKEN to CI/CD variables (2602d5b)
- Add missing docker configs to use the nvidia gpu (a005a68)
- Add missing registry config and configure a remote docker gitlab runner (81784cd)
- Add open webui and switch to plain http on portainer (4c23e64)
- Add volume to open webui for persistence (0d4c883)
- Basic Docker VM provisioning (untested) (1f2f7fd)
- Basic gitlab-terraform project to handle CI/CD variables (1e04c04)
- Basic Terraform project with stored state using S3 backend (d14b5cb)
- Basic Terraform setup for provisioning an LXC running MinIO on Proxmox (60cb6b2)
- Change to official open-webui image and preconfigure the ollama endpoint (fd56c92)
- Configs now done via gitlab.rb directly, added container registry (1a79081)
- Data lab infra config check tasks (b861d95)
- Disable usage tracking and user creation, remove unused gitlab-env, and fix indentation (4c2333a)
- Dockerized ml server (544427f)
- Dotenv loading into gitlab ci/cd vars now working (1dd5691)
- Env var configurable topics and consumer groups (87d7b55)
- Extract MinIO environment variables from the install script into Terraform and produce random passwords (d6ab46e)
- First gitlab working deployment, and docker and gitlab now split into separate tf files (09a7ca0)
- Gpu passthrough for docker-shared VM (12eecef)
- Improve error control and add credentials printing task (406f5b8)
- Improve postgres workflow for credentials and db creation (ac7c573)
- Optional NVIDIA driver install for Docker VMs (cloud-config now a template) (8a2abfb)
- Overall task cleanup, add infra provisioning for services layer and destruction tasks (f88158c)
- Postgres deployment (f8f7db2)
- Preconfigure gitlab container registry as an insecure registry for all docker vms (5520d35)
- Refactor docker-compose.yml into the services layer compose file, adding portainer and limiting minio to the dev profile (fcb43e0)
  BREAKING CHANGE: There no longer is a docker-compose.yml, as it will be integrated into infra/services/docker/compose.yml with MinIO available only under the dev project.
- S3 config variables (required by gitlab) (a544409)
- Set a fixed port for MinIO's console (78741fe)
- Simplify showing credentials and add validation (d0a7d36)
- Simplify the way the custom docker context is accessed (57977f4)
- Tcp listening for remote access (a0caf5a)
- Update PSQL_SECRETS env var (0dbd79d)

- Better defaults, less redundant title comments (6c3f42b)
- Explicitly use true/false for masked and improve formatting (4e63668)
- Extract scripts from templates (cdf5283)
- Fix linting issues (01bdae2)
- Make it clear that create database can safely fail (9fff4c1)
- Move services docker files into its own directory (2ae6fee)
- Normalize comment title formatting (9b255d3)
- Remove explicit user (9d2118a)
- Rename the container resource to minio (3c41783)
- Rename to gitlab (c2c383c)
- Rollback to inline scripts, move templates to root of dotci (f0d12b8)
- Split docker vm passwords into multiple outputs again (23b4cbe)
- Split script into multiple lines for psql_create_db (10444ff)
- Switch to single rootfs volume and improve resource naming (f773714)

- Lakehouse is now a singleton, to avoid initialization when running the help command (ca5a7ea)
- Normalize loggers to use loguru via an intercept handler (d18f572)
- Shift should be drift, and count plot should be stacked (62aefb2)

- Add a default task that lists all just tasks (897c520)
- Add missing help message and fix the one for ml monitor plot (7b17e14)

- Improve performance of REST API by moving Kafka payload queueing to the background (2fe859d)

- Attempt to solve group coordinator errors (25e9cf1)
- Capture asyncio cancel exception (5f5f07f)
- Consumer task was meant to be awaited from inside the loop (0bede44)
- Correct model uri scheme (a6c589b)
- Dataframe was being forced through the model loaded using mlflow.pyfunc.load, so now we handle multiple input types (4d9541e)
- Handle failed runs and drop unrequired columns from logged inputs (29f9259)
- Kafka now runs and initializes properly (b94dfc4)
- Mlflow healthcheck, switch to kafka's official image (cea3edf)
- Model needs to be initialized every time, otherwise there is a memory leak (e53c85f)
- Move mlflow.db to root since db directory didn't exist (3fed7f1)
- Ollama will now default to CPU when GPU is not available (a13fd72)
  This will, most likely, make it unusable, but at least it won't stop the other services from starting and working as expected.
- Positive label probability selection (00d738e)
- Queue logic incompatible with list logic, always flush in the end (9d59f90)
- Requests cache was causing memory overload (b734e08)
- Schema name, remove unused tasks (e1944fa)
- Train/test split now separate from cross-validation (train only) (11b448e)
- Transform failed when other datasets were not ingested (029e8c4)
- Update to new lakehouse schema (1240dc9)
- Update to new ml types and lakehouse schema (063d28b)

- Add a second topic for updating inference results with user feedback (6abb221)
- Add config for new stage catalog with secure storage (1009792)
- Add config for pairs of topic and expected consumer group (6224c6c)
- Add config for stage catalog with secure storage (7d3fbb9)
- Add kafka config section (175474f)
- Add name to each asyncio task (dad0fbe)
- Create justfile with tasks from previous and upcoming videos (9289e70)
- Delete unused test module (04964ef)
- Reduce sample fraction (a9ab9a7)
- Rename insert/update to result/feedback to match new event topics (4ae3ef1)
- Setup mlflow service with sqlite and s3 (3a0f1ca)
- deps: Add anyascii and inflection for a more robust sanitization, add just for task running, add xgboost for ML project (17cbf48)
- deps: Add faker to create random dates (0814e54)
- deps: Add fastapi and uvicorn for the ml server (5907a38)
- deps: Add joblib to use Memory for caching (1676db1)
- deps: Add kafka official library (2de36e8)
- deps: Add mlflow (5d0b94a)
- deps: Add pip so its version is properly detected by mlflow during model logging (63edc5a)
- deps: Add scikit-learn (204411b)
- deps: Add sentence transformers for text embedding (0a852d8)
- deps: Downgrade from 3.0.3 to 3.0.2 due to mlflow compatibility (8e2d1bd)
- deps: Replace confluent-kafka with aiokafka (a052aa8)

- Add 3-folds (9c58bfb)
- Add create_at timestamp that defaults to the current date (6badaf2)
- Add custom MLflow user for tracking (4818df8)
- Add model logging to mlflow_end_run (54027f2)
- Add monitor compute task (36d2b91)
- Add monitor dataset to mlops etl pipeline (c227f62)
- Add monitor plot task (27bd6bb)
- Add reload option to use during development (5e175ad)
- Add sample fraction parameter (813f031)
- Add train and test tasks, update ETL task with transformation (54e1adb)
- Apache Kafka server (b2acb5d)
- Basic training pipelines and CLI command (e27cc5d)
- Check for curl and add -f to ensure the task fails when status code is >= 400 (b02c5a1)
- Default to 3-folds, since it is now supported (daf3db9)
- Drop tasks for mlflow model server (images are too bloated) (c20ec10)
- Enable artifacts proxy and install boto3 as a dependency (8e844a6)
- End-to-end kafka producer/consumer implementation (c1fbbc6)
- Endpoint to flush inference log, refactor inference request to handle A/B/n testing (8ba2a7e)
- Feature pipelines for TF-IDF and sentence transformers (70c4182)
- Feedback is now an array and created_at keeps track of time (19d0527)
- Generic ml dataset loader function (f364327)
- Health check endpoint, and refactor insert/update to results/feedback for clarity (d00f612)
- Implement dataset transformation with train/test split and 5-folds and 10-folds (3e552e0)
- Implement scaffolding for monitoring statistics computation class and count stat (6d25335)
- Inference API request (ccbc242)
- Inference feedback update workflow (9127a92)
- Load ml inference results for a date range (d6df078)
- Load monitor stats (5969151)
- Ml monitor is now ml monitor compute and since/until are unspecified by default (5e66a95)
- Ml monitor plot command (9dc514c)
- Ml server start command (3261c06)
- Mlflow docker image building, container running and server testing tasks (054f499)
- Mlflow tracking URI env var (5511986)
- Mlops train per method and features (added), split everything into individual tasks (1e897f3)
- Monitoring metrics for prediction and feature drift, estimated performance, and user evaluation (72286b6)
- Now always the latest model is used to build the docker image (6fbb824)
- Output is now probabilities, so threshold can be set externally (18d814a)
- Payloads is now types and inference request contains one or multiple models (d7b31b7)
- Plotting for model monitoring statistics (f14d51e)
- Query latest snapshot_id (version) (779aa61)
- Replace slugify with custom sanitization alternative (cf437c3)
- Rolling prediction drift computation (29c439a)
- Safer attach (if not exists only) (0ebf16d)
- Scaffolding for inference simulation and monitoring (5d78f5e)
- Scaffolding for ML CLI and workflow functions (647559f)
- Scaffolding for ml server (3f3d05d)
- Schema and count functions, remove redundant initialization code (same as generate_init_sql) (b816c9b)
- Separate tracking from training logic, and use PandasDataset instead of custom dataset (80b895a)
- Start consumer thread with ml server (7523cfb)
- Support for 3-fold dataset loading and CV training (106a6ef)
- Support for catalogs with secure storage (e96ce72)
- Switch to expression api and make since/until optional, and implement loading for monitor stats (368d039)
- Task to generate init SQL, also used on check fail, rename mlflow model server tasks, add mlops-serve task (a79c379)
- Task to run inference simulation using monitor dataset (fa48f2c)
- Tasks to test inference and logging requests (cc1aa31)
- Time tracking logging utils (d59abe4)
- Train and test set loaders for document datasets (9bbd7c9)
- Transformation for monitor depression dataset (5fb5d78)
- Update to new ml types, fix lakehouse collision with transform, and improve API to make flush usable externally (68ef918)
- Working inference simulation (6a3fcab)

- Cleaner variable names and paths (1d52213)
- Correct and improve log messages (c8fc5ca)
- Easier naming for MLOps tasks (6ac1853)
- Extract color functions into a shared module (6a859ef)
- Extract prediction from server to its own module using joblib Memory for cache (60467b7)
- Lakehouse logging is now the default (de16c77)
- Move server health check to server module (c63cd22)
- Normalize method names and group by context (c01be6c)
- Normalize ml dataset column names and target type (bee2af9)
- Remove redundant case for computing the S3 prefix (b836af2)
- Remove unneeded echo (5e6dbb9)
- Rename all init containers with init suffix (99e3939)
- Rename n_folds to k_folds (b3ac002)
- Set random model section log level to debug (228b805)

- Any positive ESI is now considered competition, and is separate from intensity (25844f1)
- Log file relative path to cwd failed when not directly contained using Path (e4f5b62)

- Commit notebook generated during video recording (454d0dd)
- deps: Add adjustText to optionally fix rendering of overlapping node labels (36cbc33)
- deps: Add geopandas to plot maps (62d5ef1)
- deps: Add jupyterlab, matplotlib, and networkx for graph data science (e29c08f)
- deps: Remove unneeded adjustText and add scipy back as a requirement for networkx layout computation (76ef5d4)

- Add CLI support for computing the CON score (8c94f6e)
- Add edge arrows and node colors per label (ed56184)
- Add graph analytics module, starting with a CON score (ff1f926)
- Add graph transparency and improve labels (02dc859)
- Add scale to arrow placement, add optional visualization weight (9190d2c)
- Compare communities and components, study economical pressure (afceea8)
- Competition network analysis, including community and weak component analysis (62e54fd)
- Create a basic graph theme matching DLT (3210fa5)
- Dominating and weaker economy individual analysis (986a2d6)
- Edge direction now based on common exports, from highest to lowest total amount (77325bd)
- Improve graph plotting and add map plotting (266dfca)
- Networkx graph plot helper to use with notebooks (a36b6c9)
- Revisited the whole notebook, restructuring and adding depth where needed (6a3dcb1)
- Script to easily convert Jupyter Notebooks to markdown (4b0c792)
- Set label w/ prop per node type and render label wo/ overlapping (8c0b6fb)
- Setup notebook for graph data science (1d96e63)
- Support for loading Parquet into DuckLake from Python (4035f63)
- Trade alignment analysis (80d5ef1)
- Trade alignment analysis (cont) (da6e848)

- Different score reset strategy (d4d7d9d)
- No longer setting flags for dominating and weaker (d8013c4)
- Remove unused import (65defb1)
- Replace os.path ops with Path ops (84c73a9)
- Use kuzu extension instead of kz (d815cef)
- Use ref instead of hardcoded FQN (ba6de1a)

- Add missing schema configs for new econ comp models (c4daafb)
- Edges needed to be defined based on node_id, which required these changes (398ba70)
- Remove inexistent property (918f23a)
- Remove not null tests where they were not required (43efc61)
- Remove product parent relationship, as there is no multi-level data here (2d26651)
- Remove repeated country pairs in reverse order (1f2f867)
- Required aggregation per country and product, disregarding partner (635dc72)
- Types and missing null strings (40a79d7)

- Add cypher script to compute music_taste graph stats (7a0a48d)
- Add env var for econ comp graph db (3e34e80)
- Configs for analytics mart (40dee56)
- Re-enable requests-cache with streaming (62c7dff)
- Rename KuzuDBs to match new single-file format (0e797ae)
- Simplify music taste graph stats script (5b964fb)
- Upgrade explorer script to work with kuzu 0.11.0 (36f6cf7)
- deps: Add humanize to print byte sizes in human-readable format (6238484)
- deps: Add requests cache dep (b7c5fd5)
- deps: Add tqdm dep for tracking download progress (5e2ba51)
- deps: Bump up kuzu to 0.11.0 (74f2f4f)
- deps: Bump up version inside uv.lock (7124ff4)

- Fill-in the missing schema models for analytics, and econ_comp nodes and edges (aa65fcd)

- Add model selection CLI option to test cmd (499bac0)
- Aggregated view for 2020-2023 trade covering recent years (c579742)
- Cli command to expunge/clean cache (f412b51)
- Complete dataset template for The Atlas of Economic Complexity (6e2cb9c)
- Country and product nodes, product-country export and import edges, and product parent edges (cca6d5c)
- Country-country ESI calculation (0ca0346)
- Datacite working downloader (bf09fb1)
- Ingest country classification data (09c3ac7)
- Logic changed to account for the last 3 years in data instead of a fixed range (8599498)
- Move cache to shared level and add expunge function and requests cache (805511f)
- Rename 2020-2023 to latest 3y and add schema for country-country metrics (af044f8)
- Select top 5% ESI country-country relations for edges (3356e4f)
- Skip cache for downloads and display progress bar (039e08a)
- Split ingestion into multiple modules and add dataset templates (8e3c6b8)
- Stage transformations for TAoEC (6e082e3)
- Support for cache usage statistic printing (436391b)
- Support for loading econ_comp graph (93396df)

- Increase chunk size and make sure temp files are cleaned even when the script is stopped (39943df)

- Log debug message containing produced context (5917a15)
- Rename context to entities when referring to entity nodes (ff6e0df)

- Ensure ESI is within a 0..1 range (d1ef5ce)

- Add error control to the GraphRAG chain (4f015ca)

- deps: Add colorama to color error messages (389a8a1)

- Graph rag CLI options for interactive and direct querying (8f54d81)

- Remove unused import (c5bfb82)

- Correct logic for deleting vector index if exists (516b677)

- Add missing word in prompt (2001d8d)
- Container names will now use the default naming schema (6d267b8)
- Ensure predictable table indexing order (4547bd3)
- Graph retriever and context assembler class scaffolds (eae806d)
- Make sure kuzudb-explorer is using a fixed image version (0.10.0 currently) (80c8aca)
- Path combination and scaffolding for hydrating (1c7db62)
- Prefix log message is now debug-level (de7d708)
- Print version from pyproject.toml via CLI argument (2fa5b86)
- Remove unused semantic-release config (1692e14)
  This option was set in the wrong location, so it did nothing. We don't need it.
- Replace default nomic-embed-text ollama model with phi4:latest (ee324f1)
- Setup ollama service and add env var for default model install (4af078b)
- deps: Add ollama dependency (4d1608d)
- deps: Add pytest to dev deps and configure default CLI options (baabcd5)
- deps: Langchain with ollama support, and a prompt helper library (4565ec9)
- deps: Langchain-kuzu (eed603d)
- deps: More-itertools (ecb7f9c)

- Add missing version to semantic-version command (c6facd1)
- Fix call to semantic release using a function (d577a45)
- Fix changelog_file config location (b5bb8d7)
- Fix pyproject.toml version setting for semantic release (db96d22)
- Remove redundant build option, already set on pyproject.toml (e8f6d6b)

- Add knn method info to clarify the max_distance param (0fdf01f)

- Add file logging by default (and option to disable) (2f9a36e)
- Add final answer pipeline and improve interactive mode (58bff5a)
- Basic prompt for graph RAG and langchain scaffolding (50173de)
- Combined knn step for context assembler (33b20ab)
- Context assembly based on ANN, paths to neighbors, and random walks from neighbors (9323352)
- Cypher friendly schema format (87f8171)
- First working NER implementation based on langchain-kuzu (a743062)
- Graphrag is now a LangChain Runnable and components became methods (cd04d33)
- Knn query support (2bca4a0)
- Knn, shortest paths sampler and random walk computation for context assembler (22d4f0a)
- Kuzudb-explorer launcher script now handles different paths (4dc65a9)
- Lazy singleton S3 resource and bucket connection (63388a1)
- Ollama service with gemma3 and nomic-embed-text (83b68dd)
- Path hydration and bulk description (97ea465)
- Return paths as interleavings of node_id and rel label (17b790a)
- Support for indexing embeddings (c687f81)
- graph.ops: Automatically add a custom embeddings column to all node tables (1900f21)
  Closes #2
- graph.ops: Produce node schema with properties names and types (291d42f)

- Migrated from KuzuQAChain to a custom strategy still based on langchain-kuzu (ebce585)

- Change property match to WHERE cond and lower the temperature (f0f9198)

- Correct paths_df fixture and add missing exclude_props (c167b0c)
- Invoke test for GraphRAG runnable (f724224)
- Move graph db check to global fixtures (d2963e3)
- Print final chain output (40f2d14)
- Setup ops and paths_df to test path_descriptions() (3f3c160)
- Tests will only print logs to stderr and always use debug level (fafb3bf)

- Add node_id to all nodes (f927dcd)
- Batch should be column, not parameters (73eeb9e)
- Condition for ignoring files during deletion (9da0e0f)
  The manifest.json was being deleted by mistake.
- Correct name for placeholder models (fa07609)
  feat: implement all missing edge models
- Ducklake integration using dev version for upcoming dbt-duckdb 1.4.1 (effe0d7)
- Duplicate alias for source_id and target_id columns (d6b6790)
- Ensure tags are checked out (d833c89)
- Generate sequential node ID globally for all nodes (8d019ac)
- Genre loading queries (abc6833)
  refactor: reorganize models into stage and marts
  feat: support for edge loading (untested)
- Genre nodes become a single table to ensure uniqueness (7ec7f03)
- Incorrect S3 secret variable (6fa7394)
- Missing description for playcount (c505c9c)
- Missing node ID dataset-based prefix (bfecf9f)
- Missing nodes prefix on ref table (58b04f5)
- Missing underscore after prefix (3a0d6d9)
- No longer defaulting to upstream dependencies (6d2d68a)
- Regression introduced by removing key_parts (668a31c)
- Removed extra bracket in log message (782bcc9)
- Should be alias, not name (d03e079)
- Should be list of list, not list of tuple (c3b7419)
- Sqlite prefix missing (78bae87)
- Switch to single table for genre nodes (6d5dd1f)
- Update graph loading process based on new config schema (eebc677)
- Update prune to use class prefix (49ca20f)
- Using map instead of list per node embedding (e6f1caf)
  fix: add missing schema alter to add embedding property to all nodes
- Wrapper to copy from data mart via a temporary file (3cd0268)
- Wrong column name in schema (af2a693)
- Wrong filename case, should be RO, not ro (905a303)
- Wrong model name in schema (588f3bc)
- Wrong reference, missing schema prefix (701fb1d)
- Wrong variable order in log message (40cb055)

- Add description and pandas dep (b7c40d0)
- Add DUCKLAKE_PATH to .env (3ab2bee)
- Add kuzu as a dependency (f1e2a5c)
- Add S3 prefix for exports (88ee16c)
- Add solid background to diagram (7499f46)
- Add torch, torch-sparse, and torch-geometric deps (5ddd0f8)
- Better schema name organization for graphs (948759a)
- Click and minio deps (c6e450f)
- Default to eu-west-1, as MinIO also defaulted to it (6afa282)
- Delete example models (7012b5b)
- Fix version for python-semantic-release to match deps (3ff93d9)
- Github dark mode background color (ca97d44)
- Gitignore vscode directory (a2fcabd)
- Initial log message for export (1207e93)
- Initial log string is now a welcome string (0edefc6)
- Make sure we start from 0.1.0, not 1.0.0 (cb2b7c5)
- Remove unused dep (ba09c90)
- Remove unused deps and update docs referring to them (7da0d24)
- Replace with official GHA for python-semantic-release (7301029)
- Script to launch temporary docker container with KuzuDB Explorer for a database (52b617a)
- Setup dltctl CLI tool (replaces Makefile) (405d800)
- Simplify node and edge schemas, using Gremlin-like notation (887dddc)
- Solid background in individual rectangles (8f68204)
- Switch to a multi-database marts config (1cf615c)
- Temporarily removed (bd26438)
  Schema was outdated and was blocking dbt run.
- Update config to match multi-database marts (d47f96f)
- Won't use the extra command in favor of one entry point (698a3d1)
- deps: Move python-semantic-release to dev deps (9df3ed6)

- Add graph and shared (866b37c)
- Add specification for exports pruning (e42acbf)
- Dependency management development instructions (2f9393f)
- Duckdb init script description (94b1959)
- End-to-end documentation (190ef1b)
- Fix section links (e00a537)
- Latest.json is now manifest.json (6cdb057)
- Remove suffix from info boxes (170ebff)
- Requirements, quick start, architecture diagram (8009821)
- Schemas for nodes and edges of the music graph (409b4cb)
- Structured sections for README (5a6c128)
- Update schema for the DSN and MSDSL datasets (9a56aaa)
- Update storage layout (f60d021)
- Update storage layout (1383bfd)
- Update storage layout and specification for the ingest command (b66cffe)
- Using generic dark background (85476f1)

- Add CLI args for read only and to reset (dbfefa8)
  Container is now kept between sessions, unless explicitly reset.
- Backup restore can now specify source date (62a52af)
- Basic support for dbt run via dlctl transform (b3f6240)
- Catalog backup and restore (cb87830)
  refactor: prefix is now set when instancing Storage
  feat: Storage can now upload/download files or a directory
- Create directories for DuckDB databases (3976b19)
  This way we can set the marts databases to be stored under local/marts/.
- Dbt debug option (f64313e)
- Debug option controlling log level (9e5f030)
- Directory structure for a DuckLake lakehouse (87ed579)
- Dlctl tools generate-init-sql (c24ca39)
  This will output into local/init.sql. The scripts/init.example.sql or the gitignored scripts/init.sql are no longer used.
  docs: add a help message to all commands
- Duckdb CLI init script to connect to the lakehouse (788691f)
- Error control for empty results (d5e1d65)
- Exception capture for KuzuOps (6a2a498)
- Export scripts that output parquet (218bfeb)
- Frp node embedding over KuzuDB (68a7514)
- Improve backups listing (8c5284d)
- Load all genres tables based on shared macro (ed8197b)
- Ls and prune for ingest and exports (e9a3505)
  fix: add missing manifest to exports
  fix: ignored file filtering
  fix: add prefix logic to upload manifest
- Minio docker service (00f2dab)
- Node embedding computation and graph DB update (579b477)
- Option to use latest export when loading a graph (3b00439)
  refactor: exports now stored using the same directory structure as marts
- Qol for CLI parameters and defaults, and logging (c6ee19b)
- Quality of life for explorer startup and exit (4562ab6)
- Replace export scripts with a load method from the new graph package (8c356d6)
- Replace load_dotenv with proper validation via environs (aacb1a3)
  refactor: centralize storage and environment variable loading into shared packages
  refactor: improve function naming and arguments
  feat: set placeholder upload as optional
  refactor: rename lastest.json to manifest.json
  feat: storage now implements an env var loader with latest file paths
- Schema name without the 'main_' prefix (37b0bff)
- Setup semantic releases (ef856fd)
- Strip schema name from table name (fc83dfa)
- Stubs for node computation embedding command (f220fc2)
- Support file downloading from object storage (eff8a8b)
- Support for kaggle and hugging face ingestion (39f8fb5)
- Support for manual dataset ingestion (3b6c3d1)
- Support for running a subset of models during transform (4986aee)

- Switch to UNPIVOT strategy (51059e2)

- Better naming scheme for graph schemas, and node and edge tables (1716d17)
- Cleanup file to avoid inline comments (498534d)
- Con is now conn (cc99988)
- Embedding batch updates now handled directly by NodeEmbedder (5dc6e51)
- Genres/nodes and edges are now stored in the graphs mart (9963909)
  docs: schemas updated with node and edge information and basic testing
- Graph manager is now ops (c84dbf5)
- Improved docs and better naming for DuckLake DBs (0b8097d)
- Latest export is now default, but re-exporting can be forced (54324c3)
- Log exception message without stack trace (1d52f40)
- Name source and target columns (b0c83fd)
  feat: cast node IDs to integer
- Nodes and edges directories to match graph DB loading format (86ef29e)
  feat: million song dataset, spotify and lastfm transformation
  feat: improve deezer genres and edges mart table schemas
- Overall simplification of the explore graph script (94b0bb0)
- Qol, log message in lower case after colon (76cccc8)
- Qol, log message now includes epoch (51c96cd)
- Remove source from edges (ca15947)
- Remove unneeded echos (a5f86a0)
- Rename models to include a schema prefix (dd263ab)
  feat: implement missing node models
- Rename music graph back to music_taste (d0d7a57)
- S3 access key and secret renamed to reflect common naming schema (d00a4c9)
- Table materialization is now default (35dc856)
- Taking advantage of parents accessor (1038678)
- Tools and utils moved to shared (3900df1)
  feat: init SQL can now be returned as a string
  fix: lakehouse relied on an init script that's no longer there
  Using generate_init_sql to produce a string with the required SQL instead.
  chore: uncommented code that didn't run due to KuzuDB bug
- Util is now templates for clarity (7518c06)
  chore: groups no longer invokable without arguments
  This had been added for better performance, but did nothing.
  refactor: split export into standalone feature
  Extracted from graph load and integrated into the existing export command (renamed from exports to export).