Implement DWG converter endpoint#1
Conversation
📝 WalkthroughWalkthroughThis PR adds a complete DWG-to-PNG preview feature to the Malmö CKAN extension, enabling dynamic generation and caching of preview images from uploaded or external DWG files via external ODA converter and Python rendering libraries. ChangesDWG Preview PNG Conversion
🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
🧹 Nitpick comments (1)
ckanext/malmo/views.py (1)
49-50: ⚡ Quick winEscape the preview filename before putting it in
Content-Disposition.
payload["filename"]comes from resource metadata, so quotes, CR/LF, or non-ASCII characters can generate a broken header here. Normalizing the fallback filename and adding an RFC 5987filename*value avoids malformed responses for perfectly valid resource names.Suggested fix
+from urllib.parse import quote + @@ def dwg_preview_convert() -> flask.Response: @@ response = flask.Response(payload["content"], mimetype=payload["mimetype"]) - response.headers["Content-Disposition"] = f'inline; filename="{payload["filename"]}"' + response.headers["Content-Disposition"] = _build_content_disposition(payload["filename"]) response.headers["Cache-Control"] = "private, no-store, max-age=0" return response @@ def _validation_error_response(error: ValidationError) -> tuple[flask.Response, int]: @@ ) + + +def _build_content_disposition(filename: str) -> str: + fallback = ( + filename.replace("\\", "_") + .replace('"', "_") + .replace("\r", "_") + .replace("\n", "_") + ) + return f'inline; filename="{fallback}"; filename*=UTF-8\'\'{quote(filename)}'🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ckanext/malmo/views.py` around lines 49 - 50, The Content-Disposition header uses raw payload["filename"] which may contain quotes, CR/LF or non-ASCII chars; sanitize and normalize it before use: strip/control-character and quote chars from payload["filename"], produce an ASCII fallback via unicodedata.normalize('NFKD') and encoding errors='ignore' (or a safe fallback like "preview"), percent-encode the original UTF-8 name per RFC 5987 for filename* (e.g. "UTF-8''%..." form), and set response.headers["Content-Disposition"] to include both a safe quoted filename fallback and the RFC5987 filename* alternative (while keeping the existing response creation code that uses payload["content"] and payload["mimetype"]). Ensure these changes are applied where response is created and header is set (the response variable and Content-Disposition assignment).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@ckanext/malmo/dwg_preview.py`:
- Around line 138-148: resource["url"] is passed directly into _download_to_path
which allows SSRF via redirects or hostnames resolving to
loopback/RFC1918/metadata addresses; add server-side destination validation
before calling _download_to_path and enforce it for every redirect hop by either
(a) resolving the URL hostname(s) with socket.getaddrinfo/ipaddress and
rejecting addresses in loopback/private/link-local/multicast/reserved ranges, or
(b) disabling automatic redirects and validating the Location header host for
each redirect then resolving and checking its IPs before following; implement
the check as a reusable helper (e.g., validate_download_target(url) used where
resource_url is read) and update _download_to_path or its requests usage to
ensure per-redirect validation is applied.
- Line 141: The current log call logs raw external URLs (log.info("Downloading
external DWG resource=%s url=%s", resource.get("id"), resource_url)) which can
leak presigned query strings or credentials; change it to avoid logging the full
resource_url by either logging only resource.get("id") or a redacted URL (strip
query string and any userinfo/credentials and optionally keep only host/path)
before passing to log.info; implement a small helper (e.g.,
redact_url(resource_url) or redacted_resource_url variable) that uses
urllib.parse to remove .query and .username/.password and replace the existing
log.info call to use the redacted value or just the resource id.
In `@README.md`:
- Line 3: Replace the garbled "Malm?" string in the README text ("Customizations
for the City of Malm? CKAN instance.") with the correct "Malmö" and ensure the
README.md is saved with UTF-8 encoding (or re-encode the file to UTF-8) so the
"ö" character is preserved; update the literal in README.md and verify the
repo/file encoding settings to prevent future character corruption.
- Around line 43-47: Update the README entries for
ckanext.malmo.dwg_preview_download_timeout and
ckanext.malmo.dwg_preview_max_download_bytes to include their default values
from the implementation: DEFAULT_DOWNLOAD_TIMEOUT_SECONDS (30 seconds) and
DEFAULT_MAX_DOWNLOAD_BYTES (100 * 1024 * 1024 bytes). Locate the two bullet
items describing these config keys and append the default values and units
(e.g., "default: 30s" and "default: 100 MB") so the docs directly reflect the
constants DEFAULT_DOWNLOAD_TIMEOUT_SECONDS and DEFAULT_MAX_DOWNLOAD_BYTES.
In `@requirements.txt`:
- Around line 1-4: Update requirements.txt to pin package versions for security
and reproducibility: change the requests entry to "requests>=2.33.0,<3" and
PyMuPDF to "PyMuPDF>=1.26.7,<2", and also pin html2text and ezdxf to specific
stable ranges (for example a recent compatible release like
"html2text>=2020.1.16,<2024" and "ezdxf>=0.18.3,<1") so installs are
deterministic and avoid known vulnerabilities; ensure you replace the unpinned
package lines (html2text, requests, ezdxf, PyMuPDF) with these pinned specifiers
in requirements.txt.
In `@setup.py`:
- Around line 18-21: The install_requires list in setup.py currently lists
unpinned packages ('html2text', 'requests', 'ezdxf', 'PyMuPDF'); update the
install_requires entry to pin each dependency to a safe, reproducible version
range (e.g., using exact or range pins like >= and <) for 'html2text',
'requests', 'ezdxf', and 'PyMuPDF' to prevent pulling insecure or breaking
releases—choose vetted versions consistent with your CKAN compatibility and
mirror the locks used in requirements.txt or the verification script.
---
Nitpick comments:
In `@ckanext/malmo/views.py`:
- Around line 49-50: The Content-Disposition header uses raw payload["filename"]
which may contain quotes, CR/LF or non-ASCII chars; sanitize and normalize it
before use: strip/control-character and quote chars from payload["filename"],
produce an ASCII fallback via unicodedata.normalize('NFKD') and encoding
errors='ignore' (or a safe fallback like "preview"), percent-encode the original
UTF-8 name per RFC 5987 for filename* (e.g. "UTF-8''%..." form), and set
response.headers["Content-Disposition"] to include both a safe quoted filename
fallback and the RFC5987 filename* alternative (while keeping the existing
response creation code that uses payload["content"] and payload["mimetype"]).
Ensure these changes are applied where response is created and header is set
(the response variable and Content-Disposition assignment).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d8be10b1-157c-4052-905c-b0f2b6288c40
📒 Files selected for processing (8)
.gitignoreREADME.mdckanext/malmo/dwg_preview.pyckanext/malmo/logic/action.pyckanext/malmo/plugin.pyckanext/malmo/views.pyrequirements.txtsetup.py
| html2text | ||
| requests | ||
| ezdxf | ||
| PyMuPDF |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check latest versions from PyPI and scan for vulnerabilities
echo "=== Checking latest PyPI versions ==="
for pkg in html2text requests ezdxf PyMuPDF; do
echo -n "$pkg: "
curl -s "https://pypi.org/pypi/$pkg/json" | jq -r '.info.version'
done
echo -e "\n=== Scanning for security advisories ==="
for pkg in html2text requests ezdxf PyMuPDF; do
echo "--- $pkg ---"
gh api graphql -f query="
{
securityVulnerabilities(first: 5, ecosystem: PIP, package: \"$pkg\") {
nodes {
advisory {
summary
severity
publishedAt
}
vulnerableVersionRange
firstPatchedVersion {
identifier
}
}
}
}" | jq '.data.securityVulnerabilities.nodes[] | select(.vulnerableVersionRange != null)'
doneRepository: datopian/ckanext-malmo
Length of output: 1955
Pin dependency versions in requirements.txt (security + reproducibility).
requirements.txt lines 1-4 are unpinned, so installs can pull in vulnerable/non-reproducible versions.
requests: security advisories are fixed starting at 2.33.0 → userequests>=2.33.0,<3PyMuPDF: vulnerability fixed starting at 1.26.7 (range>=1.26.5,<1.26.7) → usePyMuPDF>=1.26.7,<2- Also pin
html2textandezdxf(no advisories returned by the scan, but unpinned versions still break reproducibility)
🧰 Tools
🪛 OSV Scanner (2.3.8)
[HIGH] 2-2: requests 2.9.2: undefined
(PYSEC-2018-28)
[HIGH] 2-2: requests 2.9.2: undefined
(PYSEC-2023-74)
[HIGH] 2-2: requests 2.9.2: Requests vulnerable to .netrc credentials leak via malicious URLs
[HIGH] 2-2: requests 2.9.2: Requests Session object does not verify requests after making first request with verify=False
[HIGH] 2-2: requests 2.9.2: Requests has Insecure Temp File Reuse in its extract_zipped_paths() utility function
[HIGH] 2-2: requests 2.9.2: Unintended leak of Proxy-Authorization header in requests
[HIGH] 2-2: requests 2.9.2: Insufficiently Protected Credentials in Requests
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@requirements.txt` around lines 1 - 4, Update requirements.txt to pin package
versions for security and reproducibility: change the requests entry to
"requests>=2.33.0,<3" and PyMuPDF to "PyMuPDF>=1.26.7,<2", and also pin
html2text and ezdxf to specific stable ranges (for example a recent compatible
release like "html2text>=2020.1.16,<2024" and "ezdxf>=0.18.3,<1") so installs
are deterministic and avoid known vulnerabilities; ensure you replace the
unpinned package lines (html2text, requests, ezdxf, PyMuPDF) with these pinned
specifiers in requirements.txt.
| 'html2text', | ||
| 'requests', | ||
| 'ezdxf', | ||
| 'PyMuPDF', |
There was a problem hiding this comment.
Pin dependency versions in install_requires to ensure secure and reproducible installs.
The install_requires list lacks version constraints for all four dependencies, creating the same security and reproducibility risks flagged in requirements.txt:
- Security: Unpinned
requestscan install versions with known CVEs (credentials leaks, session handling flaws). - Reproducibility: Different environments may resolve different versions.
- Stability: Breaking changes in dependencies can silently break the extension.
Since setup.py is the authoritative source for package installation, version pins here are critical.
🔒 Recommended: Add version constraints
install_requires=[
- 'html2text',
- 'requests',
- 'ezdxf',
- 'PyMuPDF',
+ 'html2text>=2023.10.0,<2024',
+ 'requests>=2.32.0,<3',
+ 'ezdxf>=1.1.0,<2',
+ 'PyMuPDF>=1.23.0,<2',
],These are example constraints based on typical versions; verify the latest secure releases and compatibility with your CKAN environment before applying.
Note: The verification script in the requirements.txt review will help identify the latest secure versions for all four dependencies.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| 'html2text', | |
| 'requests', | |
| 'ezdxf', | |
| 'PyMuPDF', | |
| 'html2text>=2023.10.0,<2024', | |
| 'requests>=2.32.0,<3', | |
| 'ezdxf>=1.1.0,<2', | |
| 'PyMuPDF>=1.23.0,<2', |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@setup.py` around lines 18 - 21, The install_requires list in setup.py
currently lists unpinned packages ('html2text', 'requests', 'ezdxf', 'PyMuPDF');
update the install_requires entry to pin each dependency to a safe, reproducible
version range (e.g., using exact or range pins like >= and <) for 'html2text',
'requests', 'ezdxf', and 'PyMuPDF' to prevent pulling insecure or breaking
releases—choose vetted versions consistent with your CKAN compatibility and
mirror the locks used in requirements.txt or the verification script.
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ckanext/malmo/views.py (1)
49-51:⚠️ Potential issue | 🟠 Major | ⚡ Quick winSanitize filename in Content-Disposition header to prevent header injection.
The
filenamevalue is derived from user-controlled resource metadata (name or URL). If it contains quotes, newlines, or non-ASCII characters, it could cause header injection or parsing issues. RFC 6266 recommends escaping or usingfilename*for non-ASCII.🔒 Proposed fix
+import re + +def _sanitize_filename(filename: str) -> str: + """Sanitize filename for Content-Disposition header.""" + # Remove or replace problematic characters + sanitized = re.sub(r'[\r\n\x00]', '', filename) + sanitized = sanitized.replace('"', '\\"') + # Limit to ASCII printable for basic filename parameter + sanitized = ''.join(c if 32 <= ord(c) < 127 else '_' for c in sanitized) + return sanitized or 'preview.png' + # In convert_dwg(): - response.headers["Content-Disposition"] = f'inline; filename="{payload["filename"]}"' + safe_filename = _sanitize_filename(payload["filename"]) + response.headers["Content-Disposition"] = f'inline; filename="{safe_filename}"'🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ckanext/malmo/views.py` around lines 49 - 51, The Content-Disposition header uses payload["filename"] directly which risks header injection and parsing issues; sanitize and encode it before setting the header: strip/control-character-clean the filename (remove CR/LF and other control chars), replace or remove double quotes and backslashes, fall back to a safe basename or generated safe name if empty, and percent-encode non-ASCII bytes to set a RFC6266-compliant filename* value; then set response.headers["Content-Disposition"] with a safe quoted filename and also include filename* with the UTF-8 percent-encoded value so non-ASCII names are handled safely (apply these changes around the response creation where response, payload and Content-Disposition are used).
♻️ Duplicate comments (2)
requirements.txt (1)
1-5:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winPin all dependency versions to prevent security vulnerabilities and ensure reproducibility.
This concern was already raised in previous reviews. The unpinned dependencies expose the installation to known vulnerabilities:
- Pillow (line 5): Static analysis flags 9 CRITICAL CVEs including arbitrary code execution (GHSA-3f63-hfp8-52jq), buffer overflow (GHSA-44wm-f244-xhp3), and DoS attacks
- requests (line 2): Known security advisories for credential leaks and session handling flaws
- All dependencies: Unpinned versions break reproducibility across environments
Recommend pinning to secure versions with upper bounds, e.g.,
Pillow>=10.0.0,<11andrequests>=2.32.0,<3.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@requirements.txt` around lines 1 - 5, The requirements.txt currently lists unpinned packages (html2text, requests, ezdxf, matplotlib, Pillow) which risks security issues and non-reproducible installs; update each entry to use a pinned safe minimum plus an upper bound (for example use Pillow>=10.0.0,<11 and requests>=2.32.0,<3) and choose vetted versions for html2text, ezdxf and matplotlib similarly (e.g., >=stable_version,<next_major) so every package line in requirements.txt is version-pinned with an upper bound to lock dependencies and mitigate CVEs.setup.py (1)
17-23:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winPin all
install_requiresversions to ensure secure and reproducible installs.This concern was already raised in previous reviews. The
install_requireslist lacks version constraints, creating the same security and reproducibility risks flagged inrequirements.txt. Sincesetup.pyis the authoritative source for package installation, version pins here are critical.Apply version constraints matching the secure ranges identified for
requirements.txt.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@setup.py` around lines 17 - 23, The install_requires list in setup.py currently lacks version pins; update the install_requires array to include the same constrained versions/ranges used in requirements.txt (e.g., pin html2text, requests, ezdxf, matplotlib, Pillow to the approved secure versions), ensuring each package in install_requires has the matching version specifier (== or >=,<= as used in requirements.txt) so setup.py becomes the authoritative, reproducible source for installs.
🧹 Nitpick comments (7)
ckanext/malmo/dwg_preview/service.py (1)
94-99: 💤 Low valueAdd exception chaining for better debugging.
The re-raised
ValidationErrorloses the original exception context, which can complicate debugging. While the user-facing message is intentionally simplified, preserving the chain helps maintainers trace issues.♻️ Proposed fix
try: return toolkit.get_action("resource_show")(context, {"id": resource_id}) except NotFound: - raise ValidationError({"id": ["Resource does not exist"]}) + raise ValidationError({"id": ["Resource does not exist"]}) from None except NotAuthorized: - raise ValidationError({"id": ["User cannot view this resource"]}) + raise ValidationError({"id": ["User cannot view this resource"]}) from None🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ckanext/malmo/dwg_preview/service.py` around lines 94 - 99, The except blocks around toolkit.get_action("resource_show")(context, {"id": resource_id}) should preserve the original exception context: catch NotFound and NotAuthorized as e (e.g., except NotFound as e:) and re-raise the ValidationError using exception chaining (raise ValidationError({"id": ["Resource does not exist"]}) from e and raise ValidationError({"id": ["User cannot view this resource"]}) from e) so the original traceback is retained for debugging while keeping the simplified user-facing messages.Source: Linters/SAST tools
ckanext/malmo/dwg_preview/oda.py (1)
87-100: ⚡ Quick winConsider adding exception chaining for better debugging.
The
TimeoutExpiredandOSErrorexception handlers raiseValidationErrorwithout preserving the exception chain. Addingfrom errwould improve debugging by preserving the full stack trace.♻️ Proposed fix
except subprocess.TimeoutExpired: - raise ValidationError({"conversion": [f"Conversion exceeded the timeout of {timeout} seconds"]}) + raise ValidationError({"conversion": [f"Conversion exceeded the timeout of {timeout} seconds"]}) from None except OSError as err: - raise ValidationError({"conversion": [f"Conversion process failed to start: {err}"]}) + raise ValidationError({"conversion": [f"Conversion process failed to start: {err}"]}) from err🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ckanext/malmo/dwg_preview/oda.py` around lines 87 - 100, The except blocks in _run_subprocess swallow the original exceptions; modify them to use exception chaining so the original trace is preserved: when catching subprocess.TimeoutExpired (the variable e or similar) re-raise the ValidationError using "from" (e.g., raise ValidationError({...}) from e) and do the same for the OSError handler (raise ValidationError({...}) from err) so both TimeoutExpired and OSError are chained into the new ValidationError.ckanext/malmo/dwg_preview/render.py (5)
308-342: ⚡ Quick winConsider adding exception chaining for better debugging.
Line 312 raises
ValidationErrorwithout preserving the exception chain. Addingfrom errwould improve debugging.♻️ Proposed fix
except ImportError as err: - raise ValidationError({"converter": [f"Image validation dependency is not installed: {err}"]}) + raise ValidationError({"converter": [f"Image validation dependency is not installed: {err}"]}) from err🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ckanext/malmo/dwg_preview/render.py` around lines 308 - 342, In _measure_rendered_preview, when catching ImportError as err you should re-raise the ValidationError using exception chaining so the original ImportError is preserved for debugging; locate the except ImportError as err block and raise ValidationError({"converter": [f"Image validation dependency is not installed: {err}"]}) from err to maintain the exception chain and aid troubleshooting.
56-81: ⚡ Quick winConsider adding exception chaining for better debugging.
Lines 61 and 73 raise
ValidationErrorwithout preserving the exception chain. Addingfrom errwould improve debugging.♻️ Proposed fix
except ImportError as err: - raise ValidationError({"converter": [f"DXF renderer dependency is not installed: {err}"]}) + raise ValidationError({"converter": [f"DXF renderer dependency is not installed: {err}"]}) from err try: document = ezdxf.readfile(dxf_path) log.info("Loaded DXF document path=%s using fast read path", dxf_path) return document except Exception as fast_err: log.warning("Fast DXF load failed for %s, retrying recovery path: %s", dxf_path, fast_err) try: document, auditor = recover.readfile(dxf_path) except Exception as err: - raise ValidationError({"conversion": [f"Generated DXF could not be parsed: {err}"]}) + raise ValidationError({"conversion": [f"Generated DXF could not be parsed: {err}"]}) from err🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ckanext/malmo/dwg_preview/render.py` around lines 56 - 81, In _load_dxf_document, the ValidationError raises lose the original exceptions' context; update the two raise statements that handle ImportError (caught as err) and the recover.readfile Exception (caught as err) to use exception chaining (raise ValidationError(...) from err) so the original traceback is preserved for debugging; locate these raises in the _load_dxf_document function where ImportError is caught and where recover.readfile failure is handled and add "from err" to each raise.
135-221: ⚡ Quick winConsider adding exception chaining for better debugging.
Lines 151 and 219 raise
ValidationErrorwithout preserving the exception chain. Addingfrom errwould improve debugging.♻️ Proposed fix
except ImportError as err: - raise ValidationError({"converter": [f"PNG rendering dependency is not installed: {err}"]}) + raise ValidationError({"converter": [f"PNG rendering dependency is not installed: {err}"]}) from err # ... rendering logic ... except ValidationError: raise except Exception as err: - raise ValidationError({"conversion": [f'DXF raster rendering failed for layout "{layout_name}": {err}']}) + raise ValidationError({"conversion": [f'DXF raster rendering failed for layout "{layout_name}": {err}']}) from err finally: plt.close(figure)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ckanext/malmo/dwg_preview/render.py` around lines 135 - 221, The except blocks in _render_layout catch errors into err but re-raise ValidationError without exception chaining; update the two raises inside the ImportError handler (the one that currently raises ValidationError({"converter": [...]}) ) and the final generic except Exception as err (the one that raises ValidationError({"conversion": [...]}) ) to include "from err" so the original exception is preserved when raising ValidationError.
321-321: ⚖️ Poor tradeoffHardcoded pixel threshold may cause incorrect coverage for certain drawing styles.
The threshold
< 245on line 321 determines which pixels count as "occupied" versus "empty" when measuring preview coverage. This hardcoded value assumes white/near-white backgrounds and dark content, but may produce incorrect coverage measurements for:
- Drawings with light gray backgrounds
- Drawings with very light line colors (e.g., light blue, light gray)
- Drawings rendered with non-standard color schemes
This could cause valid previews to be incorrectly rejected as "too sparse" or vice versa.
Consider making this threshold configurable via
PreviewConfigif users report coverage detection issues with their specific DWG files. For now, the current value is reasonable for typical black-on-white CAD drawings.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ckanext/malmo/dwg_preview/render.py` at line 321, Replace the hardcoded pixel threshold check "if pixels[x, y] < 245:" with a configurable threshold pulled from the preview configuration (e.g., PreviewConfig.coverage_threshold or preview_config.coverage_threshold) while keeping 245 as the default; update the code that computes coverage to reference that config value (replace uses of the literal 245 with the config symbol) and ensure PreviewConfig is extended to expose a sensible default and be passed into the function that reads "pixels" so callers can override the threshold when needed.
240-247: 💤 Low valueConsider logging skipped entities for better observability.
The
try-except-continueblock on lines 245-246 silently skips entities that don't have adxftype()method. While this defensive handling prevents crashes, logging these exceptions at debug level would help diagnose issues with malformed DXF files.♻️ Proposed improvement
def _iter_entity_types(layout: Any) -> list[str]: types: list[str] = [] for entity in layout: try: types.append(str(entity.dxftype()).upper()) except Exception: + log.debug("Skipping entity without dxftype in layout iteration") continue return types🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ckanext/malmo/dwg_preview/render.py` around lines 240 - 247, The function _iter_entity_types silently skips entities that raise in entity.dxftype(); update it to log skipped entities at debug level so malformed DXF data is observable: catch the exception as e, obtain a module logger (e.g., logging.getLogger(__name__) if not already present), and call logger.debug with the exception and a brief identifier for the problematic entity (e.g., repr(entity) or type(entity)) before continuing; keep the existing behavior of appending valid dxftype() values to types.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@ckanext/malmo/dwg_preview/config.py`:
- Around line 78-87: The image dimension config lacks an upper bound and can be
set to extremely large values; update the two calls to _get_int that set
image_width (using DEFAULT_IMAGE_WIDTH) and image_height (using
DEFAULT_IMAGE_HEIGHT) to pass maximum=8192, and modify the _get_int function
implementation/signature to accept and validate a maximum parameter (raising or
clipping on violation) so matplotlib usage in render.py (figure.figsize) cannot
receive unbounded sizes.
- Around line 94-123: The listed coverage/ratio configs (min_content_coverage,
max_initial_coverage, min_occupied_width_ratio, min_occupied_height_ratio) must
be validated to be ≤ 1.0; update the calls in config initialization to pass
maximum=1.0 for those _get_float invocations and modify the _get_float function
signature/implementation to accept a maximum parameter and raise a clear
configuration error when the parsed float > maximum (in addition to existing
minimum checks) so invalid proportions fail fast with a descriptive message.
---
Outside diff comments:
In `@ckanext/malmo/views.py`:
- Around line 49-51: The Content-Disposition header uses payload["filename"]
directly which risks header injection and parsing issues; sanitize and encode it
before setting the header: strip/control-character-clean the filename (remove
CR/LF and other control chars), replace or remove double quotes and backslashes,
fall back to a safe basename or generated safe name if empty, and percent-encode
non-ASCII bytes to set a RFC6266-compliant filename* value; then set
response.headers["Content-Disposition"] with a safe quoted filename and also
include filename* with the UTF-8 percent-encoded value so non-ASCII names are
handled safely (apply these changes around the response creation where response,
payload and Content-Disposition are used).
---
Duplicate comments:
In `@requirements.txt`:
- Around line 1-5: The requirements.txt currently lists unpinned packages
(html2text, requests, ezdxf, matplotlib, Pillow) which risks security issues and
non-reproducible installs; update each entry to use a pinned safe minimum plus
an upper bound (for example use Pillow>=10.0.0,<11 and requests>=2.32.0,<3) and
choose vetted versions for html2text, ezdxf and matplotlib similarly (e.g.,
>=stable_version,<next_major) so every package line in requirements.txt is
version-pinned with an upper bound to lock dependencies and mitigate CVEs.
In `@setup.py`:
- Around line 17-23: The install_requires list in setup.py currently lacks
version pins; update the install_requires array to include the same constrained
versions/ranges used in requirements.txt (e.g., pin html2text, requests, ezdxf,
matplotlib, Pillow to the approved secure versions), ensuring each package in
install_requires has the matching version specifier (== or >=,<= as used in
requirements.txt) so setup.py becomes the authoritative, reproducible source for
installs.
---
Nitpick comments:
In `@ckanext/malmo/dwg_preview/oda.py`:
- Around line 87-100: The except blocks in _run_subprocess swallow the original
exceptions; modify them to use exception chaining so the original trace is
preserved: when catching subprocess.TimeoutExpired (the variable e or similar)
re-raise the ValidationError using "from" (e.g., raise ValidationError({...})
from e) and do the same for the OSError handler (raise ValidationError({...})
from err) so both TimeoutExpired and OSError are chained into the new
ValidationError.
In `@ckanext/malmo/dwg_preview/render.py`:
- Around line 308-342: In _measure_rendered_preview, when catching ImportError
as err you should re-raise the ValidationError using exception chaining so the
original ImportError is preserved for debugging; locate the except ImportError
as err block and raise ValidationError({"converter": [f"Image validation
dependency is not installed: {err}"]}) from err to maintain the exception chain
and aid troubleshooting.
- Around line 56-81: In _load_dxf_document, the ValidationError raises lose the
original exceptions' context; update the two raise statements that handle
ImportError (caught as err) and the recover.readfile Exception (caught as err)
to use exception chaining (raise ValidationError(...) from err) so the original
traceback is preserved for debugging; locate these raises in the
_load_dxf_document function where ImportError is caught and where
recover.readfile failure is handled and add "from err" to each raise.
- Around line 135-221: The except blocks in _render_layout catch errors into err
but re-raise ValidationError without exception chaining; update the two raises
inside the ImportError handler (the one that currently raises
ValidationError({"converter": [...]}) ) and the final generic except Exception
as err (the one that raises ValidationError({"conversion": [...]}) ) to include
"from err" so the original exception is preserved when raising ValidationError.
- Line 321: Replace the hardcoded pixel threshold check "if pixels[x, y] < 245:"
with a configurable threshold pulled from the preview configuration (e.g.,
PreviewConfig.coverage_threshold or preview_config.coverage_threshold) while
keeping 245 as the default; update the code that computes coverage to reference
that config value (replace uses of the literal 245 with the config symbol) and
ensure PreviewConfig is extended to expose a sensible default and be passed into
the function that reads "pixels" so callers can override the threshold when
needed.
- Around line 240-247: The function _iter_entity_types silently skips entities
that raise in entity.dxftype(); update it to log skipped entities at debug level
so malformed DXF data is observable: catch the exception as e, obtain a module
logger (e.g., logging.getLogger(__name__) if not already present), and call
logger.debug with the exception and a brief identifier for the problematic
entity (e.g., repr(entity) or type(entity)) before continuing; keep the existing
behavior of appending valid dxftype() values to types.
In `@ckanext/malmo/dwg_preview/service.py`:
- Around line 94-99: The except blocks around
toolkit.get_action("resource_show")(context, {"id": resource_id}) should
preserve the original exception context: catch NotFound and NotAuthorized as e
(e.g., except NotFound as e:) and re-raise the ValidationError using exception
chaining (raise ValidationError({"id": ["Resource does not exist"]}) from e and
raise ValidationError({"id": ["User cannot view this resource"]}) from e) so the
original traceback is retained for debugging while keeping the simplified
user-facing messages.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8d8cfd43-20b0-45d4-be6f-7f37acbfbcd6
📒 Files selected for processing (12)
README.mdckanext/malmo/dwg_preview/__init__.pyckanext/malmo/dwg_preview/cache.pyckanext/malmo/dwg_preview/config.pyckanext/malmo/dwg_preview/oda.pyckanext/malmo/dwg_preview/render.pyckanext/malmo/dwg_preview/service.pyckanext/malmo/logic/action.pyckanext/malmo/plugin.pyckanext/malmo/views.pyrequirements.txtsetup.py
✅ Files skipped from review due to trivial changes (2)
- ckanext/malmo/dwg_preview/init.py
- README.md
| image_width=_get_int( | ||
| "ckanext.malmo.dwg_preview_image_width", | ||
| DEFAULT_IMAGE_WIDTH, | ||
| minimum=256, | ||
| ), | ||
| image_height=_get_int( | ||
| "ckanext.malmo.dwg_preview_image_height", | ||
| DEFAULT_IMAGE_HEIGHT, | ||
| minimum=256, | ||
| ), |
There was a problem hiding this comment.
Add maximum bounds to image dimension config to prevent memory exhaustion.
The image_width and image_height config values are passed directly to matplotlib's figure.figsize (render.py line 152) with only minimum validation. Maliciously or mistakenly configured large values (e.g., 100000x100000) could cause out-of-memory crashes.
Add maximum validation (e.g., maximum=8192) to both _get_int calls to cap resource usage at reasonable limits.
🛡️ Proposed fix to add maximum bounds
image_width=_get_int(
"ckanext.malmo.dwg_preview_image_width",
DEFAULT_IMAGE_WIDTH,
minimum=256,
+ maximum=8192,
),
image_height=_get_int(
"ckanext.malmo.dwg_preview_image_height",
DEFAULT_IMAGE_HEIGHT,
minimum=256,
+ maximum=8192,
),Update _get_int signature and implementation to accept and enforce the maximum parameter.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@ckanext/malmo/dwg_preview/config.py` around lines 78 - 87, The image
dimension config lacks an upper bound and can be set to extremely large values;
update the two calls to _get_int that set image_width (using
DEFAULT_IMAGE_WIDTH) and image_height (using DEFAULT_IMAGE_HEIGHT) to pass
maximum=8192, and modify the _get_int function implementation/signature to
accept and validate a maximum parameter (raising or clipping on violation) so
matplotlib usage in render.py (figure.figsize) cannot receive unbounded sizes.
| min_content_coverage=_get_float( | ||
| "ckanext.malmo.dwg_preview_min_content_coverage", | ||
| DEFAULT_MIN_CONTENT_COVERAGE, | ||
| minimum=0.00001, | ||
| ), | ||
| max_initial_coverage=_get_float( | ||
| "ckanext.malmo.dwg_preview_max_initial_coverage", | ||
| DEFAULT_MAX_INITIAL_COVERAGE, | ||
| minimum=0.001, | ||
| ), | ||
| retry_render_margin=_get_float( | ||
| "ckanext.malmo.dwg_preview_retry_render_margin", | ||
| DEFAULT_RETRY_RENDER_MARGIN, | ||
| minimum=0.0, | ||
| ), | ||
| lineweight_scaling=_get_float( | ||
| "ckanext.malmo.dwg_preview_lineweight_scaling", | ||
| DEFAULT_LINEWEIGHT_SCALING, | ||
| minimum=0.1, | ||
| ), | ||
| min_occupied_width_ratio=_get_float( | ||
| "ckanext.malmo.dwg_preview_min_occupied_width_ratio", | ||
| DEFAULT_MIN_OCCUPIED_WIDTH_RATIO, | ||
| minimum=0.0, | ||
| ), | ||
| min_occupied_height_ratio=_get_float( | ||
| "ckanext.malmo.dwg_preview_min_occupied_height_ratio", | ||
| DEFAULT_MIN_OCCUPIED_HEIGHT_RATIO, | ||
| minimum=0.0, | ||
| ), |
There was a problem hiding this comment.
Validate that coverage and ratio config values are ≤ 1.0.
Coverage and ratio fields represent proportions and should not exceed 1.0:
min_content_coverage(line 94-98)max_initial_coverage(line 99-103)min_occupied_width_ratio(line 114-118)min_occupied_height_ratio(line 119-123)
Values > 1.0 would cause confusing validation failures downstream. Add maximum validation to fail fast with clear config errors.
✅ Proposed fix to validate upper bounds
min_content_coverage=_get_float(
"ckanext.malmo.dwg_preview_min_content_coverage",
DEFAULT_MIN_CONTENT_COVERAGE,
minimum=0.00001,
+ maximum=1.0,
),
max_initial_coverage=_get_float(
"ckanext.malmo.dwg_preview_max_initial_coverage",
DEFAULT_MAX_INITIAL_COVERAGE,
minimum=0.001,
+ maximum=1.0,
),
# ... similar for retry_render_margin if it's also a ratio ...
min_occupied_width_ratio=_get_float(
"ckanext.malmo.dwg_preview_min_occupied_width_ratio",
DEFAULT_MIN_OCCUPIED_WIDTH_RATIO,
minimum=0.0,
+ maximum=1.0,
),
min_occupied_height_ratio=_get_float(
"ckanext.malmo.dwg_preview_min_occupied_height_ratio",
DEFAULT_MIN_OCCUPIED_HEIGHT_RATIO,
minimum=0.0,
+ maximum=1.0,
),Update _get_float signature and implementation to accept and enforce the maximum parameter.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@ckanext/malmo/dwg_preview/config.py` around lines 94 - 123, The listed
coverage/ratio configs (min_content_coverage, max_initial_coverage,
min_occupied_width_ratio, min_occupied_height_ratio) must be validated to be ≤
1.0; update the calls in config initialization to pass maximum=1.0 for those
_get_float invocations and modify the _get_float function
signature/implementation to accept a maximum parameter and raise a clear
configuration error when the parsed float > maximum (in addition to existing
minimum checks) so invalid proportions fail fast with a descriptive message.
Summary by CodeRabbit
New Features
/api/3/action/convert_dwgfor accessing preview generationDocumentation
Chores