Merged
33 commits
07dd467
feat(duckdb): Add DuckDB transpiler for VTL execution (#477)
javihern98 Feb 3, 2026
0a9cb84
Merge origin/main into duckdb/main
javihern98 Feb 3, 2026
a1417d0
Duckdb/structure refactoring (#491)
javihern98 Feb 6, 2026
51b0015
Merge branch 'duckdb/main' of github.com:Meaningful-Data/vtlengine in…
javihern98 Feb 9, 2026
e60f1ce
Added env variable VTL_MAX_TEMP_DIRECTORY_SIZE to handle temp directo…
javihern98 Feb 11, 2026
ceb59b1
Merge branch 'main' of github.com:Meaningful-Data/vtlengine into duck…
javihern98 Feb 13, 2026
902ec94
Implemented base AST to SQL Query formatter (#516)
mla2001 Feb 18, 2026
0bcc81c
Merged main into duckdb_main (#536)
mla2001 Feb 25, 2026
0a21074
Minor fix
mla2001 Feb 25, 2026
d44800f
Merge remote-tracking branch 'origin/main' into duckdb/main
mla2001 Feb 25, 2026
fffc4aa
Bump main 1.6.0rc4 into duckdb/main (#566)
mla2001 Mar 5, 2026
50a9904
Fix #568: (Duckdb) Fix all remaining DuckDB errors unrelated to Time …
mla2001 Mar 12, 2026
a5cd914
Implement 476: (Duckdb) Implement hierarchy operators (#601)
mla2001 Mar 17, 2026
25b4330
Fix #603: Custom STRUCT types for TimePeriod and TimeInterval (#604)
javihern98 Mar 18, 2026
fd476c3
Update #476 (#605)
mla2001 Mar 18, 2026
279f3d8
Fix #519: Implement DuckDB time operators (#606)
javihern98 Mar 18, 2026
f372145
Implement #475: (DuckDB) Implement SDMX loading (#608)
javihern98 Mar 18, 2026
73ab4f3
Reconcile duckdb/main with main and remove s3fs dependency (#614)
javihern98 Mar 20, 2026
963c1bf
Merge remote-tracking branch 'origin/main' into merge-main-into-duckdb
javihern98 Mar 20, 2026
6a99cc6
Merge pull request #615 from Meaningful-Data/merge-main-into-duckdb
javihern98 Mar 20, 2026
983b338
Remove S3 URI support from pandas backend
javihern98 Mar 20, 2026
2e969ed
Document S3 URI support via DuckDB backend in run() docstring
javihern98 Mar 20, 2026
8432946
Merge pull request #616 from Meaningful-Data/remove-s3-pandas-path
javihern98 Mar 20, 2026
d5ef70a
Route all test patterns through DuckDB backend when configured
javihern98 Mar 20, 2026
b4da442
Merge pull request #618 from Meaningful-Data/cr-duckdb-test-routing
javihern98 Mar 20, 2026
b3dd930
Route all remaining test patterns through run() API
javihern98 Mar 20, 2026
098fa86
Revert "Route all remaining test patterns through run() API"
javihern98 Mar 20, 2026
791a2bb
Route all remaining test patterns through run() API (#619)
javihern98 Mar 23, 2026
72d303b
Final checks and tests handling for Duckdb (#623)
mla2001 Apr 10, 2026
3fa9227
Fix #657: Clean Duckdb transpiler (#671)
mla2001 Apr 29, 2026
413a868
Merged main into duckdb/main (#678)
albertohernandez1995 Apr 29, 2026
4f41319
Merge main into duckdb main (#683)
mla2001 May 5, 2026
9fb196c
Merge origin/main into duckdb/main to reconcile topology
javihern98 May 5, 2026
14 changes: 10 additions & 4 deletions .github/workflows/testing.yml
@@ -2,9 +2,9 @@ name: Testing
 
 on:
   push:
-    branches: [ "main", "dev" ]
+    branches: [ "main", "duckdb/main", "dev" ]
   pull_request:
-    branches: [ "main", "dev" ]
+    branches: [ "main", "duckdb/main", "dev" ]
   workflow_dispatch:
 
 permissions:
@@ -53,7 +53,13 @@ jobs:
         run: poetry run ruff check --output-format=github
       - name: Run type checks
         run: poetry run mypy --show-error-codes --pretty
-      - name: Run tests
+      - name: Run tests with pandas backend
+        env:
+          VTL_ENGINE_BACKEND: pandas
+        run: poetry run pytest -n auto --verbose --tb=short --durations=10
+      - name: Run tests with duckdb backend
+        env:
+          VTL_ENGINE_BACKEND: duckdb
         run: poetry run pytest --cov=vtlengine -n auto --verbose --tb=short --strict-markers --strict-config --durations=10
       - name: Check coverage
-        run: poetry run coverage report --fail-under=90
+        run: poetry run coverage report --fail-under=85
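The two test steps in this workflow differ only in the `VTL_ENGINE_BACKEND` environment variable. How the test suite consumes that variable is not shown in this diff; the following is a minimal sketch, assuming it is read once and validated. The function name `resolve_backend`, the `pandas` default, and the use of `ValueError` are hypothetical.

```python
import os
from typing import Mapping, Optional

# Backend names taken from the workflow steps; the set itself is an assumption.
VALID_BACKENDS = {"pandas", "duckdb"}


def resolve_backend(env: Optional[Mapping[str, str]] = None, default: str = "pandas") -> str:
    """Hypothetical helper: pick the backend from VTL_ENGINE_BACKEND."""
    source = os.environ if env is None else env
    value = source.get("VTL_ENGINE_BACKEND", default).strip().lower()
    if value not in VALID_BACKENDS:
        raise ValueError(f"Unsupported VTL_ENGINE_BACKEND: {value!r}")
    return value
```

Passing the environment as a mapping keeps the helper testable without mutating `os.environ`.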
16 changes: 12 additions & 4 deletions .github/workflows/ubuntu_test_24_04.yml
@@ -2,9 +2,9 @@ name: Ubuntu 24.04 Tests
 
 on:
   push:
-    branches: [ "main", "dev" ]
+    branches: [ "main", "duckdb/main", "dev" ]
   pull_request:
-    branches: [ "main", "dev" ]
+    branches: [ "main", "duckdb/main", "dev" ]
 
 permissions:
   contents: read
@@ -37,6 +37,7 @@ jobs:
             python3-jsonschema \
             python3-networkx \
             python3-sqlglot \
+            python3-psutil \
             python3-pytest \
             cmake \
             g++ \
@@ -49,7 +50,7 @@ jobs:
             sdmxschemas==1.0.0 \
             parsy==2.2 \
             msgspec==0.19.0 \
-            duckdb==1.1 \
+            duckdb==1.4.1 \
             pysdmx==1.9.0
 
       - name: Download ANTLR4 C++ runtime
@@ -65,5 +66,12 @@ jobs:
       - name: Install C++ parser
         run: pip install --break-system-packages --no-deps .cpp-wheel/*.whl
 
-      - name: Run tests
+      - name: Run tests (pandas backend)
+        env:
+          VTL_ENGINE_BACKEND: pandas
         run: pytest --verbose --tb=short --strict-markers --strict-config --durations=10
+
+      - name: Run tests (duckdb backend)
+        env:
+          VTL_ENGINE_BACKEND: duckdb
+        run: pytest --verbose --tb=short --strict-markers --strict-config --durations=10
4 changes: 2 additions & 2 deletions .github/workflows/version.yml
@@ -2,9 +2,9 @@ name: Version Consistency Check
 
 on:
   push:
-    branches: [ main ]
+    branches: [ main, "duckdb/main" ]
   pull_request:
-    branches: [ main ]
+    branches: [ main, "duckdb/main" ]
 
 permissions:
   contents: read
3 changes: 3 additions & 0 deletions .gitignore
@@ -194,3 +194,6 @@ build/
 # Claude Code settings
 .claude/*
 !.claude/CLAUDE.md
+
+# Third-party files that we want to ignore
+third_party/*
1,113 changes: 32 additions & 1,081 deletions poetry.lock

Large diffs are not rendered by default.

5 changes: 1 addition & 4 deletions pyproject.toml
@@ -42,12 +42,9 @@ dependencies = [
     "pyarrow>=14.0,<25.0",
     "numpy>=2.0.2,<2.1; python_version < '3.10'",
     "numpy>=2.2.0,<2.5; python_version >= '3.10'",
+    "psutil>=7.2,<8.0"
 ]
 
-[project.optional-dependencies]
-s3 = ["s3fs>=2022.11.0"]
-all = ["s3fs>=2022.11.0"]
-
 [project.urls]
 Repository = 'https://github.com/Meaningful-Data/vtlengine'
 Documentation = 'https://docs.vtlengine.meaningfuldata.eu'
61 changes: 34 additions & 27 deletions src/vtlengine/API/_InternalApi.py
@@ -17,7 +17,6 @@
 )
 
 from vtlengine import AST as AST
-from vtlengine.__extras_check import __check_s3_extra
 from vtlengine.AST import Assignment, DPRuleset, HRuleset, Operator, PersistentAssignment, Start
 from vtlengine.AST.ASTString import ASTString
 from vtlengine.DataTypes import SCALAR_TYPES
@@ -77,15 +76,9 @@ def _extract_data_type(component: Dict[str, Any]) -> Tuple[str, Any]:
     Raises:
         InputValidationException: If the data type key or value is invalid
     """
-    if "type" in component:
-        key = "type"
-        value = component["type"]
-    else:
-        key = "data_type"
-        value = component["data_type"]
-
-    check_key(key, _SCALAR_TYPE_KEYS, value)
-    return key, SCALAR_TYPES[value]
+    key = "type" if "type" in component else "data_type"
+    check_key(key, _SCALAR_TYPE_KEYS, component[key])
+    return key, SCALAR_TYPES[component[key]]
 
 
 def _load_dataset_from_structure(
@@ -211,25 +204,27 @@ def _load_single_datapoint(
     plain CSV, SDMX-CSV, and SDMX-ML file formats.
 
     Args:
-        datapoint: Path or S3 URI to the datapoint file.
+        datapoint: Path to the datapoint file.
         sdmx_mappings: Optional mapping from SDMX URNs to VTL dataset names.
     """
     if not isinstance(datapoint, (str, Path)):
         raise InputValidationException(
-            code="0-1-1-2", input=datapoint, message="Input must be a Path or an S3 URI"
+            code="0-1-1-2", input=datapoint, message="Input must be a Path"
         )
     # Handling of str values
     if isinstance(datapoint, str):
         if "s3://" in datapoint:
-            __check_s3_extra()
-            dataset_name = datapoint.split("/")[-1].removesuffix(".csv")
-            return {dataset_name: datapoint}
-        # Converting to Path object if it is not an S3 URI
+            raise InputValidationException(
+                code="0-1-1-2",
+                input=datapoint,
+                message="S3 URIs are only supported with use_duckdb=True.",
+            )
+        # Converting to Path object
         try:
             datapoint = Path(datapoint)
         except Exception:
             raise InputValidationException(
-                code="0-1-1-2", input=datapoint, message="Input must refer to a Path or an S3 URI"
+                code="0-1-1-2", input=datapoint, message="Input must refer to a Path"
             )
     # Validation of Path object
     if not datapoint.exists():
@@ -274,7 +269,7 @@ def _load_datapoints_path(
     happens in load_datapoints() which supports both formats.
 
     Args:
-        datapoints: Dict, List, or single Path/S3 URI with datapoints.
+        datapoints: Dict, List, or single Path with datapoints.
         sdmx_mappings: Optional mapping from SDMX URNs to VTL dataset names.
 
     Returns:
@@ -294,11 +289,17 @@ def _load_datapoints_path(
                 raise InputValidationException(
                     code="0-1-1-2",
                     input=datapoint,
-                    message="Datapoints dictionary values must be Paths or S3 URIs.",
+                    message="Datapoints dictionary values must be Paths.",
                 )
 
             # Convert string to Path if not S3 or URL
-            if isinstance(datapoint, str) and "s3://" not in datapoint and not _is_url(datapoint):
+            if isinstance(datapoint, str) and _is_s3_uri(datapoint):
+                raise InputValidationException(
+                    code="0-1-1-2",
+                    input=datapoint,
+                    message="S3 URIs are only supported with use_duckdb=True.",
+                )
+            if isinstance(datapoint, str) and not _is_url(datapoint):
                 datapoint = Path(datapoint)
 
             # Validate file exists
@@ -522,14 +523,14 @@ def load_datasets_with_data(
         not isinstance(v, (str, Path)) for v in datapoints.values()
     ):
         raise InputValidationException(
-            "Invalid datapoints. All values in the dictionary must be Paths or S3 URIs, "
+            "Invalid datapoints. All values in the dictionary must be Paths, "
             "or all values must be Pandas Dataframes."
         )
 
-    # Handling Individual, List or Dict of Paths, S3 URIs, or URLs
+    # Handling Individual, List or Dict of Paths or URLs
     # At this point, datapoints is narrowed to exclude None and Dict[str, DataFrame]
     # All file types (CSV, SDMX) are returned as paths for lazy loading
-    # URLs are preserved as strings (like S3 URIs)
+    # URLs are preserved as strings
     datapoints_paths = _load_datapoints_path(
         cast(Union[Dict[str, Union[str, Path]], List[Union[str, Path]], str, Path], datapoints),
         sdmx_mappings=sdmx_mappings,
@@ -741,10 +742,11 @@ def _check_output_folder(output_folder: Union[str, Path]) -> None:
     """
     if isinstance(output_folder, str):
         if "s3://" in output_folder:
-            __check_s3_extra()
-            if not output_folder.endswith("/"):
-                raise DataLoadError("0-3-1-2", folder=str(output_folder))
-            return
+            raise InputValidationException(
+                code="0-1-1-2",
+                input=output_folder,
+                message="S3 URIs are only supported with use_duckdb=True.",
+            )
         try:
             output_folder = Path(output_folder)
         except Exception:
@@ -900,6 +902,11 @@ def ast_to_sdmx(ast: AST.Start, agency_id: str, id: str, version: str) -> Transf
     return transformation_scheme
 
 
+def _is_s3_uri(value: Any) -> bool:
+    """Check if a value is an S3 URI."""
+    return isinstance(value, str) and "s3://" in value
+
+
 def _is_url(value: Any) -> bool:
     """
     Check if a value is an HTTP/HTTPS URL.