Skip to content

Implement DWG converter endpoint#1

Open
willy1989cv wants to merge 6 commits into
mainfrom
feat/dwg-converter
Open

Implement DWG converter endpoint#1
willy1989cv wants to merge 6 commits into
mainfrom
feat/dwg-converter

Conversation

@willy1989cv

@willy1989cv willy1989cv commented May 31, 2026

Copy link
Copy Markdown
Member

Summary by CodeRabbit

  • New Features

    • Added DWG file preview conversion feature that generates PNG previews from uploaded or linked DWG resources
    • Exposed new API endpoint /api/3/action/convert_dwg for accessing preview generation
    • Implemented preview caching to improve performance
    • Added configurable settings for preview generation (timeouts, image dimensions, cache location)
  • Documentation

    • Expanded README with comprehensive setup and configuration guidance for the preview feature
  • Chores

    • Updated project dependencies
    • Updated repository ignore rules for build artifacts

@coderabbitai

coderabbitai Bot commented May 31, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This PR adds a complete DWG-to-PNG preview feature to the Malmö CKAN extension, enabling dynamic generation and caching of preview images from uploaded or external DWG files via external ODA converter and Python rendering libraries.

Changes

DWG Preview PNG Conversion

Layer / File(s) Summary
Project setup and dependencies
.gitignore, requirements.txt, setup.py
Python cache ignore patterns, new runtime dependencies (requests, ezdxf, matplotlib, Pillow), and updated setuptools install_requires.
Configuration framework
ckanext/malmo/dwg_preview/__init__.py, ckanext/malmo/dwg_preview/config.py
Package public API export, defaults constants, PreviewConfig frozen dataclass with from_ckan_config() factory, and typed config parsers with validation and bounds enforcement.
ODA DWG-to-DXF conversion
ckanext/malmo/dwg_preview/oda.py
Subprocess wrapper that stages input, builds ODA command with optional xvfb-run, executes with timeout handling, validates DXF output, and converts failures to CKAN ValidationError.
DXF-to-PNG rendering and layout selection
ckanext/malmo/dwg_preview/render.py
Main render_dxf_to_png entry point: loads DXF, discovers and prioritizes layout candidates by viewport/entity counts, renders via ezdxf and matplotlib with configurable margins, validates coverage against thresholds, conditionally re-renders with tighter bounds, and measures final coverage via pixel scanning.
Preview caching utilities
ckanext/malmo/dwg_preview/cache.py
SHA-256 content hashing, deterministic cache path construction from resource id and file hash, cache validity checks, and atomic cache writes via temp file and os.replace.
Service orchestration and resource handling
ckanext/malmo/dwg_preview/service.py
build_preview_payload orchestrator: validates and fetches CKAN resources, applies DWG detection heuristics, stages DWG from uploads or HTTP sources with size enforcement, computes content hash, serves cached PNG or generates new preview via DWG→DXF→PNG pipeline, returns payload with PNG bytes and metadata.
CKAN action and plugin wiring
ckanext/malmo/logic/action.py, ckanext/malmo/plugin.py
Side-effect-free CKAN action convert_dwg that delegates to service, and plugin implementing IBlueprint with action registration and get_blueprint() method.
HTTP endpoint and Flask route
ckanext/malmo/views.py
Blueprint exposing /api/3/action/convert_dwg handling GET/POST, normalizing payloads, building CKAN context, calling action, returning preview bytes with proper headers, and mapping ValidationError to HTTP 409 or exceptions to HTTP 500.
Feature documentation
README.md
Comprehensive docs for DWG Preview: pipeline overview, runtime requirements (ODA on PATH, xvfb, rendering libraries), CKAN configuration options (timeouts, limits, rendering, cache), and Docker setup with optional local ODA asset override.

🎯 3 (Moderate) | ⏱️ ~25 minutes

🐰 A conversion factory born from dust and dreams,
DWG to PNG flows through rendering streams,
Layouts dance with coverage and care,
While caches keep previews beyond compare!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 3.90% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Implement DWG converter endpoint' accurately summarizes the main change: adding a new CKAN action endpoint to convert DWG files to PNG previews.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/dwg-converter

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@willy1989cv willy1989cv requested a review from mpolidori May 31, 2026 20:41

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (1)
ckanext/malmo/views.py (1)

49-50: ⚡ Quick win

Escape the preview filename before putting it in Content-Disposition.

payload["filename"] comes from resource metadata, so quotes, CR/LF, or non-ASCII characters can generate a broken header here. Normalizing the fallback filename and adding an RFC 5987 filename* value avoids malformed responses for perfectly valid resource names.

Suggested fix
+from urllib.parse import quote
+
@@
 def dwg_preview_convert() -> flask.Response:
@@
     response = flask.Response(payload["content"], mimetype=payload["mimetype"])
-    response.headers["Content-Disposition"] = f'inline; filename="{payload["filename"]}"'
+    response.headers["Content-Disposition"] = _build_content_disposition(payload["filename"])
     response.headers["Cache-Control"] = "private, no-store, max-age=0"
     return response
@@
 def _validation_error_response(error: ValidationError) -> tuple[flask.Response, int]:
@@
     )
+
+
+def _build_content_disposition(filename: str) -> str:
+    fallback = (
+        filename.replace("\\", "_")
+        .replace('"', "_")
+        .replace("\r", "_")
+        .replace("\n", "_")
+    )
+    return f'inline; filename="{fallback}"; filename*=UTF-8\'\'{quote(filename)}'
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ckanext/malmo/views.py` around lines 49 - 50, The Content-Disposition header
uses raw payload["filename"] which may contain quotes, CR/LF or non-ASCII chars;
sanitize and normalize it before use: strip/control-character and quote chars
from payload["filename"], produce an ASCII fallback via
unicodedata.normalize('NFKD') and encoding errors='ignore' (or a safe fallback
like "preview"), percent-encode the original UTF-8 name per RFC 5987 for
filename* (e.g. "UTF-8''%..." form), and set
response.headers["Content-Disposition"] to include both a safe quoted filename
fallback and the RFC5987 filename* alternative (while keeping the existing
response creation code that uses payload["content"] and payload["mimetype"]).
Ensure these changes are applied where response is created and header is set
(the response variable and Content-Disposition assignment).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ckanext/malmo/dwg_preview.py`:
- Around line 138-148: resource["url"] is passed directly into _download_to_path
which allows SSRF via redirects or hostnames resolving to
loopback/RFC1918/metadata addresses; add server-side destination validation
before calling _download_to_path and enforce it for every redirect hop by either
(a) resolving the URL hostname(s) with socket.getaddrinfo/ipaddress and
rejecting addresses in loopback/private/link-local/multicast/reserved ranges, or
(b) disabling automatic redirects and validating the Location header host for
each redirect then resolving and checking its IPs before following; implement
the check as a reusable helper (e.g., validate_download_target(url) used where
resource_url is read) and update _download_to_path or its requests usage to
ensure per-redirect validation is applied.
- Line 141: The current log call logs raw external URLs (log.info("Downloading
external DWG resource=%s url=%s", resource.get("id"), resource_url)) which can
leak presigned query strings or credentials; change it to avoid logging the full
resource_url by either logging only resource.get("id") or a redacted URL (strip
query string and any userinfo/credentials and optionally keep only host/path)
before passing to log.info; implement a small helper (e.g.,
redact_url(resource_url) or redacted_resource_url variable) that uses
urllib.parse to remove .query and .username/.password and replace the existing
log.info call to use the redacted value or just the resource id.

In `@README.md`:
- Line 3: Replace the garbled "Malm?" string in the README text ("Customizations
for the City of Malm? CKAN instance.") with the correct "Malmö" and ensure the
README.md is saved with UTF-8 encoding (or re-encode the file to UTF-8) so the
"ö" character is preserved; update the literal in README.md and verify the
repo/file encoding settings to prevent future character corruption.
- Around line 43-47: Update the README entries for
ckanext.malmo.dwg_preview_download_timeout and
ckanext.malmo.dwg_preview_max_download_bytes to include their default values
from the implementation: DEFAULT_DOWNLOAD_TIMEOUT_SECONDS (30 seconds) and
DEFAULT_MAX_DOWNLOAD_BYTES (100 * 1024 * 1024 bytes). Locate the two bullet
items describing these config keys and append the default values and units
(e.g., "default: 30s" and "default: 100 MB") so the docs directly reflect the
constants DEFAULT_DOWNLOAD_TIMEOUT_SECONDS and DEFAULT_MAX_DOWNLOAD_BYTES.

In `@requirements.txt`:
- Around line 1-4: Update requirements.txt to pin package versions for security
and reproducibility: change the requests entry to "requests>=2.33.0,<3" and
PyMuPDF to "PyMuPDF>=1.26.7,<2", and also pin html2text and ezdxf to specific
stable ranges (for example a recent compatible release like
"html2text>=2020.1.16,<2024" and "ezdxf>=0.18.3,<1") so installs are
deterministic and avoid known vulnerabilities; ensure you replace the unpinned
package lines (html2text, requests, ezdxf, PyMuPDF) with these pinned specifiers
in requirements.txt.

In `@setup.py`:
- Around line 18-21: The install_requires list in setup.py currently lists
unpinned packages ('html2text', 'requests', 'ezdxf', 'PyMuPDF'); update the
install_requires entry to pin each dependency to a safe, reproducible version
range (e.g., using exact or range pins like >= and <) for 'html2text',
'requests', 'ezdxf', and 'PyMuPDF' to prevent pulling insecure or breaking
releases—choose vetted versions consistent with your CKAN compatibility and
mirror the locks used in requirements.txt or the verification script.

---

Nitpick comments:
In `@ckanext/malmo/views.py`:
- Around line 49-50: The Content-Disposition header uses raw payload["filename"]
which may contain quotes, CR/LF or non-ASCII chars; sanitize and normalize it
before use: strip/control-character and quote chars from payload["filename"],
produce an ASCII fallback via unicodedata.normalize('NFKD') and encoding
errors='ignore' (or a safe fallback like "preview"), percent-encode the original
UTF-8 name per RFC 5987 for filename* (e.g. "UTF-8''%..." form), and set
response.headers["Content-Disposition"] to include both a safe quoted filename
fallback and the RFC5987 filename* alternative (while keeping the existing
response creation code that uses payload["content"] and payload["mimetype"]).
Ensure these changes are applied where response is created and header is set
(the response variable and Content-Disposition assignment).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d8be10b1-157c-4052-905c-b0f2b6288c40

📥 Commits

Reviewing files that changed from the base of the PR and between f4c6695 and 94fdcf2.

📒 Files selected for processing (8)
  • .gitignore
  • README.md
  • ckanext/malmo/dwg_preview.py
  • ckanext/malmo/logic/action.py
  • ckanext/malmo/plugin.py
  • ckanext/malmo/views.py
  • requirements.txt
  • setup.py

Comment thread ckanext/malmo/dwg_preview.py Outdated
Comment thread ckanext/malmo/dwg_preview.py Outdated
Comment thread README.md Outdated
Comment thread README.md Outdated
Comment thread requirements.txt Outdated
Comment on lines +1 to +4
html2text
requests
ezdxf
PyMuPDF

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check latest versions from PyPI and scan for vulnerabilities

echo "=== Checking latest PyPI versions ==="
for pkg in html2text requests ezdxf PyMuPDF; do
  echo -n "$pkg: "
  curl -s "https://pypi.org/pypi/$pkg/json" | jq -r '.info.version'
done

echo -e "\n=== Scanning for security advisories ==="
for pkg in html2text requests ezdxf PyMuPDF; do
  echo "--- $pkg ---"
  gh api graphql -f query="
  {
    securityVulnerabilities(first: 5, ecosystem: PIP, package: \"$pkg\") {
      nodes {
        advisory {
          summary
          severity
          publishedAt
        }
        vulnerableVersionRange
        firstPatchedVersion {
          identifier
        }
      }
    }
  }" | jq '.data.securityVulnerabilities.nodes[] | select(.vulnerableVersionRange != null)'
done

Repository: datopian/ckanext-malmo

Length of output: 1955


Pin dependency versions in requirements.txt (security + reproducibility).

requirements.txt lines 1-4 are unpinned, so installs can pull in vulnerable/non-reproducible versions.

  • requests: security advisories are fixed starting at 2.33.0 → use requests>=2.33.0,<3
  • PyMuPDF: vulnerability fixed starting at 1.26.7 (range >=1.26.5,<1.26.7) → use PyMuPDF>=1.26.7,<2
  • Also pin html2text and ezdxf (no advisories returned by the scan, but unpinned versions still break reproducibility)
🧰 Tools
🪛 OSV Scanner (2.3.8)

[HIGH] 2-2: requests 2.9.2: undefined

(PYSEC-2018-28)


[HIGH] 2-2: requests 2.9.2: undefined

(PYSEC-2023-74)


[HIGH] 2-2: requests 2.9.2: Requests vulnerable to .netrc credentials leak via malicious URLs

(GHSA-9hjg-9r4m-mvj7)


[HIGH] 2-2: requests 2.9.2: Requests Session object does not verify requests after making first request with verify=False

(GHSA-9wx4-h78v-vm56)


[HIGH] 2-2: requests 2.9.2: Requests has Insecure Temp File Reuse in its extract_zipped_paths() utility function

(GHSA-gc5v-m9x4-r6x2)


[HIGH] 2-2: requests 2.9.2: Unintended leak of Proxy-Authorization header in requests

(GHSA-j8r2-6x86-q33q)


[HIGH] 2-2: requests 2.9.2: Insufficiently Protected Credentials in Requests

(GHSA-x84v-xcm2-53pg)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@requirements.txt` around lines 1 - 4, Update requirements.txt to pin package
versions for security and reproducibility: change the requests entry to
"requests>=2.33.0,<3" and PyMuPDF to "PyMuPDF>=1.26.7,<2", and also pin
html2text and ezdxf to specific stable ranges (for example a recent compatible
release like "html2text>=2020.1.16,<2024" and "ezdxf>=0.18.3,<1") so installs
are deterministic and avoid known vulnerabilities; ensure you replace the
unpinned package lines (html2text, requests, ezdxf, PyMuPDF) with these pinned
specifiers in requirements.txt.

Comment thread setup.py Outdated
Comment on lines +18 to +21
'html2text',
'requests',
'ezdxf',
'PyMuPDF',

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pin dependency versions in install_requires to ensure secure and reproducible installs.

The install_requires list lacks version constraints for all four dependencies, creating the same security and reproducibility risks flagged in requirements.txt:

  • Security: Unpinned requests can install versions with known CVEs (credentials leaks, session handling flaws).
  • Reproducibility: Different environments may resolve different versions.
  • Stability: Breaking changes in dependencies can silently break the extension.

Since setup.py is the authoritative source for package installation, version pins here are critical.

🔒 Recommended: Add version constraints
     install_requires=[
-        'html2text',
-        'requests',
-        'ezdxf',
-        'PyMuPDF',
+        'html2text>=2023.10.0,<2024',
+        'requests>=2.32.0,<3',
+        'ezdxf>=1.1.0,<2',
+        'PyMuPDF>=1.23.0,<2',
     ],

These are example constraints based on typical versions; verify the latest secure releases and compatibility with your CKAN environment before applying.

Note: The verification script in the requirements.txt review will help identify the latest secure versions for all four dependencies.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
'html2text',
'requests',
'ezdxf',
'PyMuPDF',
'html2text>=2023.10.0,<2024',
'requests>=2.32.0,<3',
'ezdxf>=1.1.0,<2',
'PyMuPDF>=1.23.0,<2',
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@setup.py` around lines 18 - 21, The install_requires list in setup.py
currently lists unpinned packages ('html2text', 'requests', 'ezdxf', 'PyMuPDF');
update the install_requires entry to pin each dependency to a safe, reproducible
version range (e.g., using exact or range pins like >= and <) for 'html2text',
'requests', 'ezdxf', and 'PyMuPDF' to prevent pulling insecure or breaking
releases—choose vetted versions consistent with your CKAN compatibility and
mirror the locks used in requirements.txt or the verification script.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
ckanext/malmo/views.py (1)

49-51: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Sanitize filename in Content-Disposition header to prevent header injection.

The filename value is derived from user-controlled resource metadata (name or URL). If it contains quotes, newlines, or non-ASCII characters, it could cause header injection or parsing issues. RFC 6266 recommends escaping or using filename* for non-ASCII.

🔒 Proposed fix
+import re
+
+def _sanitize_filename(filename: str) -> str:
+    """Sanitize filename for Content-Disposition header."""
+    # Remove or replace problematic characters
+    sanitized = re.sub(r'[\r\n\x00]', '', filename)
+    sanitized = sanitized.replace('"', '\\"')
+    # Limit to ASCII printable for basic filename parameter
+    sanitized = ''.join(c if 32 <= ord(c) < 127 else '_' for c in sanitized)
+    return sanitized or 'preview.png'
+
 # In convert_dwg():
-    response.headers["Content-Disposition"] = f'inline; filename="{payload["filename"]}"'
+    safe_filename = _sanitize_filename(payload["filename"])
+    response.headers["Content-Disposition"] = f'inline; filename="{safe_filename}"'
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ckanext/malmo/views.py` around lines 49 - 51, The Content-Disposition header
uses payload["filename"] directly which risks header injection and parsing
issues; sanitize and encode it before setting the header:
strip/control-character-clean the filename (remove CR/LF and other control
chars), replace or remove double quotes and backslashes, fall back to a safe
basename or generated safe name if empty, and percent-encode non-ASCII bytes to
set a RFC6266-compliant filename* value; then set
response.headers["Content-Disposition"] with a safe quoted filename and also
include filename* with the UTF-8 percent-encoded value so non-ASCII names are
handled safely (apply these changes around the response creation where response,
payload and Content-Disposition are used).
♻️ Duplicate comments (2)
requirements.txt (1)

1-5: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Pin all dependency versions to prevent security vulnerabilities and ensure reproducibility.

This concern was already raised in previous reviews. The unpinned dependencies expose the installation to known vulnerabilities:

  • Pillow (line 5): Static analysis flags 9 CRITICAL CVEs including arbitrary code execution (GHSA-3f63-hfp8-52jq), buffer overflow (GHSA-44wm-f244-xhp3), and DoS attacks
  • requests (line 2): Known security advisories for credential leaks and session handling flaws
  • All dependencies: Unpinned versions break reproducibility across environments

Recommend pinning to secure versions with upper bounds, e.g., Pillow>=10.0.0,<11 and requests>=2.32.0,<3.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@requirements.txt` around lines 1 - 5, The requirements.txt currently lists
unpinned packages (html2text, requests, ezdxf, matplotlib, Pillow) which risks
security issues and non-reproducible installs; update each entry to use a pinned
safe minimum plus an upper bound (for example use Pillow>=10.0.0,<11 and
requests>=2.32.0,<3) and choose vetted versions for html2text, ezdxf and
matplotlib similarly (e.g., >=stable_version,<next_major) so every package line
in requirements.txt is version-pinned with an upper bound to lock dependencies
and mitigate CVEs.
setup.py (1)

17-23: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Pin all install_requires versions to ensure secure and reproducible installs.

This concern was already raised in previous reviews. The install_requires list lacks version constraints, creating the same security and reproducibility risks flagged in requirements.txt. Since setup.py is the authoritative source for package installation, version pins here are critical.

Apply version constraints matching the secure ranges identified for requirements.txt.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@setup.py` around lines 17 - 23, The install_requires list in setup.py
currently lacks version pins; update the install_requires array to include the
same constrained versions/ranges used in requirements.txt (e.g., pin html2text,
requests, ezdxf, matplotlib, Pillow to the approved secure versions), ensuring
each package in install_requires has the matching version specifier (== or >=,<=
as used in requirements.txt) so setup.py becomes the authoritative, reproducible
source for installs.
🧹 Nitpick comments (7)
ckanext/malmo/dwg_preview/service.py (1)

94-99: 💤 Low value

Add exception chaining for better debugging.

The re-raised ValidationError loses the original exception context, which can complicate debugging. While the user-facing message is intentionally simplified, preserving the chain helps maintainers trace issues.

♻️ Proposed fix
     try:
         return toolkit.get_action("resource_show")(context, {"id": resource_id})
     except NotFound:
-        raise ValidationError({"id": ["Resource does not exist"]})
+        raise ValidationError({"id": ["Resource does not exist"]}) from None
     except NotAuthorized:
-        raise ValidationError({"id": ["User cannot view this resource"]})
+        raise ValidationError({"id": ["User cannot view this resource"]}) from None
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ckanext/malmo/dwg_preview/service.py` around lines 94 - 99, The except blocks
around toolkit.get_action("resource_show")(context, {"id": resource_id}) should
preserve the original exception context: catch NotFound and NotAuthorized as e
(e.g., except NotFound as e:) and re-raise the ValidationError using exception
chaining (raise ValidationError({"id": ["Resource does not exist"]}) from e and
raise ValidationError({"id": ["User cannot view this resource"]}) from e) so the
original traceback is retained for debugging while keeping the simplified
user-facing messages.

Source: Linters/SAST tools

ckanext/malmo/dwg_preview/oda.py (1)

87-100: ⚡ Quick win

Consider adding exception chaining for better debugging.

The TimeoutExpired and OSError exception handlers raise ValidationError without preserving the exception chain. Adding from err would improve debugging by preserving the full stack trace.

♻️ Proposed fix
     except subprocess.TimeoutExpired:
-        raise ValidationError({"conversion": [f"Conversion exceeded the timeout of {timeout} seconds"]})
+        raise ValidationError({"conversion": [f"Conversion exceeded the timeout of {timeout} seconds"]}) from None
     except OSError as err:
-        raise ValidationError({"conversion": [f"Conversion process failed to start: {err}"]})
+        raise ValidationError({"conversion": [f"Conversion process failed to start: {err}"]}) from err
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ckanext/malmo/dwg_preview/oda.py` around lines 87 - 100, The except blocks in
_run_subprocess swallow the original exceptions; modify them to use exception
chaining so the original trace is preserved: when catching
subprocess.TimeoutExpired (the variable e or similar) re-raise the
ValidationError using "from" (e.g., raise ValidationError({...}) from e) and do
the same for the OSError handler (raise ValidationError({...}) from err) so both
TimeoutExpired and OSError are chained into the new ValidationError.
ckanext/malmo/dwg_preview/render.py (5)

308-342: ⚡ Quick win

Consider adding exception chaining for better debugging.

Line 312 raises ValidationError without preserving the exception chain. Adding from err would improve debugging.

♻️ Proposed fix
     except ImportError as err:
-        raise ValidationError({"converter": [f"Image validation dependency is not installed: {err}"]})
+        raise ValidationError({"converter": [f"Image validation dependency is not installed: {err}"]}) from err
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ckanext/malmo/dwg_preview/render.py` around lines 308 - 342, In
_measure_rendered_preview, when catching ImportError as err you should re-raise
the ValidationError using exception chaining so the original ImportError is
preserved for debugging; locate the except ImportError as err block and raise
ValidationError({"converter": [f"Image validation dependency is not installed:
{err}"]}) from err to maintain the exception chain and aid troubleshooting.

56-81: ⚡ Quick win

Consider adding exception chaining for better debugging.

Lines 61 and 73 raise ValidationError without preserving the exception chain. Adding from err would improve debugging.

♻️ Proposed fix
     except ImportError as err:
-        raise ValidationError({"converter": [f"DXF renderer dependency is not installed: {err}"]})
+        raise ValidationError({"converter": [f"DXF renderer dependency is not installed: {err}"]}) from err
 
     try:
         document = ezdxf.readfile(dxf_path)
         log.info("Loaded DXF document path=%s using fast read path", dxf_path)
         return document
     except Exception as fast_err:
         log.warning("Fast DXF load failed for %s, retrying recovery path: %s", dxf_path, fast_err)
 
     try:
         document, auditor = recover.readfile(dxf_path)
     except Exception as err:
-        raise ValidationError({"conversion": [f"Generated DXF could not be parsed: {err}"]})
+        raise ValidationError({"conversion": [f"Generated DXF could not be parsed: {err}"]}) from err
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ckanext/malmo/dwg_preview/render.py` around lines 56 - 81, In
_load_dxf_document, the ValidationError raises lose the original exceptions'
context; update the two raise statements that handle ImportError (caught as err)
and the recover.readfile Exception (caught as err) to use exception chaining
(raise ValidationError(...) from err) so the original traceback is preserved for
debugging; locate these raises in the _load_dxf_document function where
ImportError is caught and where recover.readfile failure is handled and add
"from err" to each raise.

135-221: ⚡ Quick win

Consider adding exception chaining for better debugging.

Lines 151 and 219 raise ValidationError without preserving the exception chain. Adding from err would improve debugging.

♻️ Proposed fix
     except ImportError as err:
-        raise ValidationError({"converter": [f"PNG rendering dependency is not installed: {err}"]})
+        raise ValidationError({"converter": [f"PNG rendering dependency is not installed: {err}"]}) from err
 
     # ... rendering logic ...
     
     except ValidationError:
         raise
     except Exception as err:
-        raise ValidationError({"conversion": [f'DXF raster rendering failed for layout "{layout_name}": {err}']})
+        raise ValidationError({"conversion": [f'DXF raster rendering failed for layout "{layout_name}": {err}']}) from err
     finally:
         plt.close(figure)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ckanext/malmo/dwg_preview/render.py` around lines 135 - 221, The except
blocks in _render_layout catch errors into err but re-raise ValidationError
without exception chaining; update the two raises inside the ImportError handler
(the one that currently raises ValidationError({"converter": [...]}) ) and the
final generic except Exception as err (the one that raises
ValidationError({"conversion": [...]}) ) to include "from err" so the original
exception is preserved when raising ValidationError.

321-321: ⚖️ Poor tradeoff

Hardcoded pixel threshold may cause incorrect coverage for certain drawing styles.

The threshold < 245 on line 321 determines which pixels count as "occupied" versus "empty" when measuring preview coverage. This hardcoded value assumes white/near-white backgrounds and dark content, but may produce incorrect coverage measurements for:

  • Drawings with light gray backgrounds
  • Drawings with very light line colors (e.g., light blue, light gray)
  • Drawings rendered with non-standard color schemes

This could cause valid previews to be incorrectly rejected as "too sparse" or vice versa.

Consider making this threshold configurable via PreviewConfig if users report coverage detection issues with their specific DWG files. For now, the current value is reasonable for typical black-on-white CAD drawings.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ckanext/malmo/dwg_preview/render.py` at line 321, Replace the hardcoded pixel
threshold check "if pixels[x, y] < 245:" with a configurable threshold pulled
from the preview configuration (e.g., PreviewConfig.coverage_threshold or
preview_config.coverage_threshold) while keeping 245 as the default; update the
code that computes coverage to reference that config value (replace uses of the
literal 245 with the config symbol) and ensure PreviewConfig is extended to
expose a sensible default and be passed into the function that reads "pixels" so
callers can override the threshold when needed.

240-247: 💤 Low value

Consider logging skipped entities for better observability.

The try-except-continue block on lines 245-246 silently skips entities that don't have a dxftype() method. While this defensive handling prevents crashes, logging these exceptions at debug level would help diagnose issues with malformed DXF files.

♻️ Proposed improvement
 def _iter_entity_types(layout: Any) -> list[str]:
     types: list[str] = []
     for entity in layout:
         try:
             types.append(str(entity.dxftype()).upper())
         except Exception:
+            log.debug("Skipping entity without dxftype in layout iteration")
             continue
     return types
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ckanext/malmo/dwg_preview/render.py` around lines 240 - 247, The function
_iter_entity_types silently skips entities that raise in entity.dxftype();
update it to log skipped entities at debug level so malformed DXF data is
observable: catch the exception as e, obtain a module logger (e.g.,
logging.getLogger(__name__) if not already present), and call logger.debug with
the exception and a brief identifier for the problematic entity (e.g.,
repr(entity) or type(entity)) before continuing; keep the existing behavior of
appending valid dxftype() values to types.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ckanext/malmo/dwg_preview/config.py`:
- Around line 78-87: The image dimension config lacks an upper bound and can be
set to extremely large values; update the two calls to _get_int that set
image_width (using DEFAULT_IMAGE_WIDTH) and image_height (using
DEFAULT_IMAGE_HEIGHT) to pass maximum=8192, and modify the _get_int function
implementation/signature to accept and validate a maximum parameter (raising or
clipping on violation) so matplotlib usage in render.py (figure.figsize) cannot
receive unbounded sizes.
- Around line 94-123: The listed coverage/ratio configs (min_content_coverage,
max_initial_coverage, min_occupied_width_ratio, min_occupied_height_ratio) must
be validated to be ≤ 1.0; update the calls in config initialization to pass
maximum=1.0 for those _get_float invocations and modify the _get_float function
signature/implementation to accept a maximum parameter and raise a clear
configuration error when the parsed float > maximum (in addition to existing
minimum checks) so invalid proportions fail fast with a descriptive message.

---

Outside diff comments:
In `@ckanext/malmo/views.py`:
- Around line 49-51: The Content-Disposition header uses payload["filename"]
directly which risks header injection and parsing issues; sanitize and encode it
before setting the header: strip/control-character-clean the filename (remove
CR/LF and other control chars), replace or remove double quotes and backslashes,
fall back to a safe basename or generated safe name if empty, and percent-encode
non-ASCII bytes to set a RFC6266-compliant filename* value; then set
response.headers["Content-Disposition"] with a safe quoted filename and also
include filename* with the UTF-8 percent-encoded value so non-ASCII names are
handled safely (apply these changes around the response creation where response,
payload and Content-Disposition are used).

---

Duplicate comments:
In `@requirements.txt`:
- Around line 1-5: The requirements.txt currently lists unpinned packages
(html2text, requests, ezdxf, matplotlib, Pillow) which risks security issues and
non-reproducible installs; update each entry to use a pinned safe minimum plus
an upper bound (for example use Pillow>=10.0.0,<11 and requests>=2.32.0,<3) and
choose vetted versions for html2text, ezdxf and matplotlib similarly (e.g.,
>=stable_version,<next_major) so every package line in requirements.txt is
version-pinned with an upper bound to lock dependencies and mitigate CVEs.

In `@setup.py`:
- Around line 17-23: The install_requires list in setup.py currently lacks
version pins; update the install_requires array to include the same constrained
versions/ranges used in requirements.txt (e.g., pin html2text, requests, ezdxf,
matplotlib, Pillow to the approved secure versions), ensuring each package in
install_requires has the matching version specifier (== or >=,<= as used in
requirements.txt) so setup.py becomes the authoritative, reproducible source for
installs.

---

Nitpick comments:
In `@ckanext/malmo/dwg_preview/oda.py`:
- Around line 87-100: The except blocks in _run_subprocess swallow the original
exceptions; modify them to use exception chaining so the original trace is
preserved: when catching subprocess.TimeoutExpired (the variable e or similar)
re-raise the ValidationError using "from" (e.g., raise ValidationError({...})
from e) and do the same for the OSError handler (raise ValidationError({...})
from err) so both TimeoutExpired and OSError are chained into the new
ValidationError.

In `@ckanext/malmo/dwg_preview/render.py`:
- Around line 308-342: In _measure_rendered_preview, when catching ImportError
as err you should re-raise the ValidationError using exception chaining so the
original ImportError is preserved for debugging; locate the except ImportError
as err block and raise ValidationError({"converter": [f"Image validation
dependency is not installed: {err}"]}) from err to maintain the exception chain
and aid troubleshooting.
- Around line 56-81: In _load_dxf_document, the ValidationError raises lose the
original exceptions' context; update the two raise statements that handle
ImportError (caught as err) and the recover.readfile Exception (caught as err)
to use exception chaining (raise ValidationError(...) from err) so the original
traceback is preserved for debugging; locate these raises in the
_load_dxf_document function where ImportError is caught and where
recover.readfile failure is handled and add "from err" to each raise.
- Around line 135-221: The except blocks in _render_layout catch errors into err
but re-raise ValidationError without exception chaining; update the two raises
inside the ImportError handler (the one that currently raises
ValidationError({"converter": [...]}) ) and the final generic except Exception
as err (the one that raises ValidationError({"conversion": [...]}) ) to include
"from err" so the original exception is preserved when raising ValidationError.
- Line 321: Replace the hardcoded pixel threshold check "if pixels[x, y] < 245:"
with a configurable threshold pulled from the preview configuration (e.g.,
PreviewConfig.coverage_threshold or preview_config.coverage_threshold) while
keeping 245 as the default; update the code that computes coverage to reference
that config value (replace uses of the literal 245 with the config symbol) and
ensure PreviewConfig is extended to expose a sensible default and be passed into
the function that reads "pixels" so callers can override the threshold when
needed.
- Around line 240-247: The function _iter_entity_types silently skips entities
that raise in entity.dxftype(); update it to log skipped entities at debug level
so malformed DXF data is observable: catch the exception as e, obtain a module
logger (e.g., logging.getLogger(__name__) if not already present), and call
logger.debug with the exception and a brief identifier for the problematic
entity (e.g., repr(entity) or type(entity)) before continuing; keep the existing
behavior of appending valid dxftype() values to types.

In `@ckanext/malmo/dwg_preview/service.py`:
- Around line 94-99: The except blocks around
toolkit.get_action("resource_show")(context, {"id": resource_id}) should
preserve the original exception context: catch NotFound and NotAuthorized as e
(e.g., except NotFound as e:) and re-raise the ValidationError using exception
chaining (raise ValidationError({"id": ["Resource does not exist"]}) from e and
raise ValidationError({"id": ["User cannot view this resource"]}) from e) so the
original traceback is retained for debugging while keeping the simplified
user-facing messages.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8d8cfd43-20b0-45d4-be6f-7f37acbfbcd6

📥 Commits

Reviewing files that changed from the base of the PR and between 94fdcf2 and 8612c92.

📒 Files selected for processing (12)
  • README.md
  • ckanext/malmo/dwg_preview/__init__.py
  • ckanext/malmo/dwg_preview/cache.py
  • ckanext/malmo/dwg_preview/config.py
  • ckanext/malmo/dwg_preview/oda.py
  • ckanext/malmo/dwg_preview/render.py
  • ckanext/malmo/dwg_preview/service.py
  • ckanext/malmo/logic/action.py
  • ckanext/malmo/plugin.py
  • ckanext/malmo/views.py
  • requirements.txt
  • setup.py
✅ Files skipped from review due to trivial changes (2)
  • ckanext/malmo/dwg_preview/init.py
  • README.md

Comment on lines +78 to +87
image_width=_get_int(
"ckanext.malmo.dwg_preview_image_width",
DEFAULT_IMAGE_WIDTH,
minimum=256,
),
image_height=_get_int(
"ckanext.malmo.dwg_preview_image_height",
DEFAULT_IMAGE_HEIGHT,
minimum=256,
),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add maximum bounds to image dimension config to prevent memory exhaustion.

The image_width and image_height config values are passed directly to matplotlib's figure.figsize (render.py line 152) with only minimum validation. Maliciously or mistakenly configured large values (e.g., 100000x100000) could cause out-of-memory crashes.

Add maximum validation (e.g., maximum=8192) to both _get_int calls to cap resource usage at reasonable limits.

🛡️ Proposed fix to add maximum bounds
         image_width=_get_int(
             "ckanext.malmo.dwg_preview_image_width",
             DEFAULT_IMAGE_WIDTH,
             minimum=256,
+            maximum=8192,
         ),
         image_height=_get_int(
             "ckanext.malmo.dwg_preview_image_height",
             DEFAULT_IMAGE_HEIGHT,
             minimum=256,
+            maximum=8192,
         ),

Update _get_int signature and implementation to accept and enforce the maximum parameter.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ckanext/malmo/dwg_preview/config.py` around lines 78 - 87, The image
dimension config lacks an upper bound and can be set to extremely large values;
update the two calls to _get_int that set image_width (using
DEFAULT_IMAGE_WIDTH) and image_height (using DEFAULT_IMAGE_HEIGHT) to pass
maximum=8192, and modify the _get_int function implementation/signature to
accept and validate a maximum parameter (raising or clipping on violation) so
matplotlib usage in render.py (figure.figsize) cannot receive unbounded sizes.

Comment on lines +94 to +123
min_content_coverage=_get_float(
"ckanext.malmo.dwg_preview_min_content_coverage",
DEFAULT_MIN_CONTENT_COVERAGE,
minimum=0.00001,
),
max_initial_coverage=_get_float(
"ckanext.malmo.dwg_preview_max_initial_coverage",
DEFAULT_MAX_INITIAL_COVERAGE,
minimum=0.001,
),
retry_render_margin=_get_float(
"ckanext.malmo.dwg_preview_retry_render_margin",
DEFAULT_RETRY_RENDER_MARGIN,
minimum=0.0,
),
lineweight_scaling=_get_float(
"ckanext.malmo.dwg_preview_lineweight_scaling",
DEFAULT_LINEWEIGHT_SCALING,
minimum=0.1,
),
min_occupied_width_ratio=_get_float(
"ckanext.malmo.dwg_preview_min_occupied_width_ratio",
DEFAULT_MIN_OCCUPIED_WIDTH_RATIO,
minimum=0.0,
),
min_occupied_height_ratio=_get_float(
"ckanext.malmo.dwg_preview_min_occupied_height_ratio",
DEFAULT_MIN_OCCUPIED_HEIGHT_RATIO,
minimum=0.0,
),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate that coverage and ratio config values are ≤ 1.0.

Coverage and ratio fields represent proportions and should not exceed 1.0:

  • min_content_coverage (line 94-98)
  • max_initial_coverage (line 99-103)
  • min_occupied_width_ratio (line 114-118)
  • min_occupied_height_ratio (line 119-123)

Values > 1.0 would cause confusing validation failures downstream. Add maximum validation to fail fast with clear config errors.

✅ Proposed fix to validate upper bounds
         min_content_coverage=_get_float(
             "ckanext.malmo.dwg_preview_min_content_coverage",
             DEFAULT_MIN_CONTENT_COVERAGE,
             minimum=0.00001,
+            maximum=1.0,
         ),
         max_initial_coverage=_get_float(
             "ckanext.malmo.dwg_preview_max_initial_coverage",
             DEFAULT_MAX_INITIAL_COVERAGE,
             minimum=0.001,
+            maximum=1.0,
         ),
         # ... similar for retry_render_margin if it's also a ratio ...
         min_occupied_width_ratio=_get_float(
             "ckanext.malmo.dwg_preview_min_occupied_width_ratio",
             DEFAULT_MIN_OCCUPIED_WIDTH_RATIO,
             minimum=0.0,
+            maximum=1.0,
         ),
         min_occupied_height_ratio=_get_float(
             "ckanext.malmo.dwg_preview_min_occupied_height_ratio",
             DEFAULT_MIN_OCCUPIED_HEIGHT_RATIO,
             minimum=0.0,
+            maximum=1.0,
         ),

Update _get_float signature and implementation to accept and enforce the maximum parameter.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ckanext/malmo/dwg_preview/config.py` around lines 94 - 123, The listed
coverage/ratio configs (min_content_coverage, max_initial_coverage,
min_occupied_width_ratio, min_occupied_height_ratio) must be validated to be ≤
1.0; update the calls in config initialization to pass maximum=1.0 for those
_get_float invocations and modify the _get_float function
signature/implementation to accept a maximum parameter and raise a clear
configuration error when the parsed float > maximum (in addition to existing
minimum checks) so invalid proportions fail fast with a descriptive message.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants