### Fixed

## [0.3.13] - 2025-09-07

### Added
- **Security Enhancements**:
  - Implemented a `sanitize_input` function in `prompt.py` that mitigates prompt injection attacks by removing or escaping common injection patterns and sensitive keywords (e.g., "ignore previous instructions", "grading logic").
  - Added input sanitization for student code, README content, and JSON reports before they are included in LLM prompts.
  - Introduced random delimiters that wrap untrusted content, making it harder for injected text to manipulate the prompt.
  - Added a `sleep 1` command in `build.yml` that introduces a brief delay before `entrypoint.py` runs, to stabilize pipeline execution.
  - Added guardrail instructions in `prompt.py` that restrict the LLM to providing feedback based solely on the provided test results, student code, and assignment instructions.
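
The entries above describe pattern filtering and random delimiters only at a high level. The following minimal sketch shows how the two ideas can fit together. It assumes a regex-based filter that replaces matches rather than escaping them. The pattern list, the `sanitize_input_sketch` and `wrap_with_random_delimiter` names, and the `[REDACTED]` placeholder are all hypothetical and are not taken from `prompt.py`.

```python
import re
import secrets

# Hypothetical pattern list; the real one in prompt.py may differ.
INJECTION_PATTERNS = [
    r"ignore\s+(all\s+)?previous\s+instructions",
    r"grading\s+logic",
]


def sanitize_input_sketch(text: str) -> str:
    """Replace known injection phrases (case-insensitive) with a placeholder."""
    for pattern in INJECTION_PATTERNS:
        text = re.sub(pattern, "[REDACTED]", text, flags=re.IGNORECASE)
    return text


def wrap_with_random_delimiter(label: str, text: str) -> str:
    """Wrap sanitized, untrusted text in a delimiter that its author cannot predict."""
    token = secrets.token_hex(8)  # 16 hex characters, freshly drawn on every call
    return f"<<{label}-{token}>>\n{sanitize_input_sketch(text)}\n<</{label}-{token}>>"
```

A fresh token is drawn on each call, so submitted text cannot contain the closing delimiter in advance. Random delimiters rely on exactly that property.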

### Changed
- **Build Workflow** (`build.yml`):
  - Reordered the model matrix to list `claude-sonnet-4-20250514` and `google/gemma-2-9b-it` first, keeping the model testing order consistent.
  - Updated `actions/checkout` from `v4` to `v5` for improved performance and compatibility.
  - Changed the default model from `gemini` to `gemini-2.5-flash` for improved performance and accuracy.
- **README** (`README.md`):
  - Updated the project description to highlight the enhanced security against prompt injection attacks.
  - Added details about the new security features (input sanitization and random delimiters) to the "Key Features" and "Notes" sections.
  - Changed the `report-files`, `student-files`, and `readme-path` input parameters to have no default values, so they must be configured explicitly.
  - Updated the default model in the input table and example workflow to `gemini-2.5-flash`.
  - Added a note about prompt injection mitigation that recommends use only in controlled environments.
  - Added advanced parsing for stronger prompt injection defenses to the list of future enhancements.
  - Added troubleshooting guidance for prompt injection anomalies.
  - Updated the acknowledgments to credit `Gemini 2.5 Flash` instead of `Gemini 2.0 Flash`.
- **Prompt Logic** (`prompt.py`):
  - Applied input sanitization to `longrepr`, `stderr`, student code, README content, and locale files to prevent prompt injection.
  - Added a guardrail instruction in `get_initial_instruction` that enforces strict adherence to the feedback task.
  - Improved type hints and function signatures for better code clarity and maintainability.
  - Changed string handling to replace newlines with spaces and limit input length to 10,000 characters, so that input cannot disrupt the prompt structure.
- **Tests** (`test_prompt.py`):
  - Improved test assertions with descriptive error messages for easier debugging.
  - Simplified test code by removing redundant tuple conversions and `map` operations.
  - Updated test fixtures and assertions to align with sanitized input handling.
  - Renamed test functions to follow a consistent naming convention (e.g., `test__exclude_common_contents__single` to `test_exclude_common_contents__single`).
  - Updated the `expected_default_gemini_model` fixture to reflect the new default model, `gemini-2.5-flash`.
  - Enhanced `test_collect_longrepr__compare_contents` to track found markers and report missing ones explicitly.
  - Corrected minor typos and formatting in test comments and strings (e.g., standardized localized terms such as "Foutmelding" for Dutch).
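
The Prompt Logic entry mentions replacing newlines with spaces and capping input at 10,000 characters. The following minimal sketch shows that normalization step. It assumes that long input is truncated rather than rejected. `normalize_for_prompt` and `MAX_INPUT_CHARS` are hypothetical names; only the 10,000-character limit comes from this changelog.

```python
MAX_INPUT_CHARS = 10_000  # limit stated in the changelog entry


def normalize_for_prompt(text: str, limit: int = MAX_INPUT_CHARS) -> str:
    """Flatten line breaks and cap length so untrusted text cannot open new prompt sections."""
    # Handle \r\n first so Windows line endings become one space, not two.
    flattened = text.replace("\r\n", " ").replace("\r", " ").replace("\n", " ")
    return flattened[:limit]
```

One consequence of this design is that the sketch discards any multi-line layout, such as indentation across lines in a traceback, before the model sees the text.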

### Fixed
- Ensured consistent handling of README content by sanitizing inputs in `assignment_instruction` and `exclude_common_contents`, preventing malicious patterns from affecting prompt generation.
- Fixed potential issues in the test suite by adding explicit type checks and non-empty string validation to `test_collect_longrepr__has_list_items_len`.
- Fixed missing marker detection in `test_collect_longrepr__compare_contents` by tracking found markers instead of modifying the marker list.
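
Two entries describe reworking `test_collect_longrepr__compare_contents` so that it records which markers were found instead of removing entries from the marker list. The following minimal sketch shows that pattern. It assumes markers are plain substrings of the collected output. `find_missing_markers`, `test_all_markers_present`, and the sample strings are hypothetical and are not taken from `test_prompt.py`.

```python
def find_missing_markers(text: str, markers: list[str]) -> list[str]:
    """Return the markers absent from text, leaving the caller's list untouched."""
    found = {marker for marker in markers if marker in text}
    # Preserve the original marker order in the report.
    return [marker for marker in markers if marker not in found]


def test_all_markers_present() -> None:
    markers = ["AssertionError", "expected"]
    output = "AssertionError: expected 3, got 2"
    missing = find_missing_markers(output, markers)
    # A descriptive message names exactly which markers were not found.
    assert not missing, f"Missing markers in longrepr: {missing}"
    assert markers == ["AssertionError", "expected"]  # the list was not mutated
```

Calling `list.remove` on a list while iterating over it skips the following element. That is one common pitfall a separate `found` set avoids.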