All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Security Enhancements:
  - Implemented a `sanitize_input` function in `prompt.py` to prevent prompt injection attacks by removing or escaping common injection patterns and sensitive keywords (e.g., "ignore previous instructions", "grading logic").
  - Added input sanitization for student code, README content, and JSON reports to ensure safe inclusion in LLM prompts.
  - Introduced random delimiters to wrap content and prevent malicious prompt manipulation.
  - Added a `sleep 1` command in `build.yml` to stabilize pipeline execution by introducing a brief delay before running `entrypoint.py`.
  - Added guardrail instructions in `prompt.py` to restrict LLM behavior to providing feedback based solely on the provided test results, student code, and assignment instructions.
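The sanitization and random-delimiter ideas above can be sketched as follows. This is a minimal illustration: the pattern list, function names, and delimiter format are assumptions, not the actual contents of `prompt.py`.

```python
import re
import secrets

# Illustrative patterns only; the real sanitize_input may cover more cases.
SUSPICIOUS_PATTERNS = [
    r"ignore\s+previous\s+instructions",
    r"grading\s+logic",
]

def sanitize_input(text: str, max_len: int = 10_000) -> str:
    """Neutralize common injection phrases, flatten newlines, and cap length."""
    for pattern in SUSPICIOUS_PATTERNS:
        text = re.sub(pattern, "[removed]", text, flags=re.IGNORECASE)
    # Newlines are replaced so untrusted text cannot imitate prompt structure.
    text = text.replace("\n", " ")
    return text[:max_len]

def wrap_with_random_delimiter(label: str, content: str) -> str:
    """Wrap untrusted content in delimiters an attacker cannot predict."""
    token = secrets.token_hex(8)
    return f"<{label}-{token}>\n{content}\n</{label}-{token}>"
```

Because the delimiter token is random per run, injected text cannot close the wrapper and escape into the surrounding prompt.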
- Build Workflow (`build.yml`):
  - Reordered the model matrix to prioritize `claude-sonnet-4-20250514` and `google/gemma-2-9b-it`, ensuring consistency in model testing order.
  - Updated `actions/checkout` from `v4` to `v5` for improved performance and compatibility.
  - Changed default model from `gemini` to `gemini-2.5-flash` for improved performance and accuracy.
- README (`README.md`):
  - Updated project description to highlight enhanced security against prompt injection attacks.
  - Added details about new security features (input sanitization and random delimiters) in the "Key Features" and "Notes" sections.
  - Changed input parameters (`report-files`, `student-files`, `readme-path`) to have no default values, requiring explicit configuration for clarity.
  - Updated default model in the input table and example workflow to `gemini-2.5-flash`.
  - Added a note about prompt injection mitigation, emphasizing use in controlled environments.
  - Updated future enhancements to include advanced parsing for stronger prompt injection defenses.
  - Added troubleshooting guidance for prompt injection anomalies.
  - Updated acknowledgments to reflect contributions from `Gemini 2.5 Flash` instead of `Gemini 2.0 Flash`.
- Prompt Logic (`prompt.py`):
  - Applied input sanitization to `longrepr`, `stderr`, student code, README content, and locale files to prevent prompt injection.
  - Added a guardrail instruction in `get_initial_instruction` to enforce strict adherence to feedback tasks.
  - Improved type hints and function signatures for better code clarity and maintainability.
  - Optimized string handling by replacing newlines with spaces and limiting input length to 10,000 characters to prevent prompt structure disruption.
- Tests (`test_prompt.py`):
  - Improved test assertions with descriptive error messages for better debugging.
  - Simplified test code by removing redundant tuple conversions and map operations.
  - Updated test fixtures and assertions to align with sanitized input handling.
  - Renamed test functions to follow a consistent naming convention (e.g., `test__exclude_common_contents__single` to `test_exclude_common_contents__single`).
  - Updated the `expected_default_gemini_model` fixture to reflect the new default model `gemini-2.5-flash`.
  - Enhanced `test_collect_longrepr__compare_contents` to track found markers and report missing ones explicitly.
  - Corrected minor typos and formatting in test comments and strings (e.g., standardized language terms like "Foutmelding" for Dutch).
- Ensured consistent handling of README content by sanitizing inputs in `assignment_instruction` and `exclude_common_contents` to prevent malicious patterns from affecting prompt generation.
- Fixed potential issues in the test suite by adding explicit type checking and non-empty string validation in `test_collect_longrepr__has_list_items_len`.
- Addressed missing marker detection in `test_collect_longrepr__compare_contents` by tracking found markers instead of modifying the marker list.
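The marker-tracking fix can be illustrated with a small helper (the name and shape are illustrative, not taken from `test_prompt.py`): collecting found markers into a set avoids the classic bug of mutating a list while iterating over it.

```python
def find_missing_markers(markers: list[str], text: str) -> list[str]:
    """Report which expected markers are absent from text, without mutating the input list."""
    found = {m for m in markers if m in text}
    return [m for m in markers if m not in found]
```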
- Support for multiple LLM providers (Claude, Gemini, Grok, NVIDIA NIM, Perplexity) with dedicated configuration classes in `llm_configs.py`.
- `llm_client.py` for robust API interaction with retry, timeout, and error handling.
- Comprehensive integration tests in `build.yml` for multiple models (`gemini-2.5-flash`, `grok-code-fast`, `claude-sonnet-4-20250514`, `google/gemma-2-9b-it`, `sonar`) using both action and environment variable inputs.
- Model-to-provider mapping in `get_model_key_from_env()` to handle precise model IDs.
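A model-to-provider mapping of this kind might look as follows. The table contents, environment variable names (other than `GOOGLE_API_KEY`), and fallback behavior are assumptions for illustration; the real `get_model_key_from_env()` may differ.

```python
import os

# Hypothetical mapping from model ID to the env var holding its API key.
MODEL_TO_PROVIDER = {
    "gemini-2.5-flash": "GOOGLE_API_KEY",
    "grok-code-fast": "XAI_API_KEY",
    "claude-sonnet-4-20250514": "ANTHROPIC_API_KEY",
    "google/gemma-2-9b-it": "NVIDIA_API_KEY",
    "sonar": "PERPLEXITY_API_KEY",
}

def get_model_key_from_env(default_model: str = "gemini-2.5-flash") -> tuple[str, str]:
    """Resolve (model, api_key) from the environment, falling back to the default model."""
    model = os.getenv("INPUT_MODEL", "") or default_model
    if model not in MODEL_TO_PROVIDER:
        model = default_model  # unknown model ID: fall back
    key = os.getenv(MODEL_TO_PROVIDER[model], "")
    return model, key
```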
- Updated `entrypoint.py` to support flexible model selection with fallback to `gemini-2.5-flash`.
- Modified `.dockerignore` and `Dockerfile` to include `llm_client.py` and `llm_configs.py`.
- Updated `action.yml` inputs: `report-files`, `student-files`, `readme-path` now required; `model` defaults to `gemini`; added `fail-expected`.
- Updated `GeminiConfig` default model to `gemini-2.5-flash` from `gemini-2.0-flash`.
- Updated `GrokConfig` default model to `grok-code-fast` from `grok-2-1212`.
- Enhanced `tests/test_entrypoint.py` to align with the new fallback model (`gemini-2.5-flash`) and added a test for `INPUT_API-KEY`.
- Improved logging checks in `tests/test_llm_client.py` for robustness.
- Added file markers in `tests/test_integration.py` and `tests/test_prompt.py`.
- Resolved 404 errors in integration tests for `google/gemma-2-9b-it` and `sonar` by adding model-to-provider mapping in `get_model_key_from_env()`.
- Fixed test failures in `test_get_model_key_from_env__fallback_gemini` and `test_get_model_key_from_env__no_model_fallback_gemini` by updating the expected model to `gemini-2.5-flash`.
- Flexible LLM Selection: Added support for Claude, Grok, Nvidia NIM, and Perplexity alongside Gemini, with automatic fallback to Gemini or a single available API key if the specified model's key is unavailable (`entrypoint.py`).
- C/C++ Support: Extended feedback capabilities to C/C++ assignments, using `pytest` and `pytest-json-report` to analyze compiled code (e.g., via `ctypes`) in a Docker environment (`README.md`).
- Comprehensive Testing: Added new test cases in `test_entrypoint.py` to cover single-key scenarios, Gemini fallback, empty key handling, and invalid model cases.
- Updated README: Renamed the project to "AI Code Tutor" to reflect C/C++ and Python support. Clarified dependencies (`pytest==8.3.5`, `pytest-json-report==1.5.0`, `pytest-xdist==3.6.1`, `requests==2.32.4`) and API key setup with the `INPUT_` prefix (e.g., `INPUT_GOOGLE_API_KEY`). Improved the YAML example for GitHub Classroom with Docker-based C/C++ testing.
- Enhanced Model Selection Logic: Refactored `get_model_key_from_env` in `entrypoint.py` to handle a missing `INPUT_MODEL`, prioritize the specified model, and fall back to Gemini or a single available key. Improved error messages for clarity.
- Improved Error Handling: Added `try-except` for writing to `GITHUB_STEP_SUMMARY` in `entrypoint.py` to handle permission issues in GitHub Actions.
- Test Refinements: Removed the `mock_env_api_keys` fixture in `test_entrypoint.py` for better isolation, using `monkeypatch` directly. Updated tests to align with the new model selection logic and `ValueError` exceptions.
- Code Cleanup: Streamlined `entrypoint.py` by removing redundant comments, improving type hints (e.g., `key: str` in `get_startwith`), and organizing `get_config_class` logic with a separate `get_config_class_dict`.
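Guarding the `GITHUB_STEP_SUMMARY` write might be sketched as below; the helper name is an illustrative assumption, not the function in `entrypoint.py`. The point is that a permission or filesystem error should be logged, not crash the action.

```python
import os

def write_step_summary(feedback: str) -> None:
    """Append feedback to the GitHub Actions step summary, tolerating write failures."""
    path = os.getenv("GITHUB_STEP_SUMMARY")
    if not path:
        return  # not running inside GitHub Actions
    try:
        with open(path, "a", encoding="utf-8") as f:
            f.write(feedback + "\n")
    except OSError as err:
        # Permission issues on the summary file should not fail the workflow.
        print(f"could not write step summary: {err}")
```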
- API Key Handling: Fixed potential `KeyError` in `entrypoint.py` by using `os.getenv` with defaults, ensuring robust environment variable access.
- Test Accuracy: Corrected `test_get_model_key_from_env__invalid_model_no_gemini` to expect valid fallback to a single available key (e.g., Claude) instead of an error.
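The `os.getenv` fix is a two-line pattern; the helper name here is illustrative:

```python
import os

def read_api_key(name: str) -> str:
    # os.environ[name] raises KeyError when the variable is unset;
    # os.getenv(name, default) returns the default instead.
    return os.getenv(name, "")
```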
- Docker Badges: Temporarily removed Docker Hub badges from `README.md` to align with updated deployment instructions (pending re-addition with verified image updates).
- Added logging info for the explanation language used in the entrypoint.
- Upgraded the default Claude model from `claude-3-haiku-20240307` to `claude-sonnet-4-20250514`.
- Increased the maximum tokens for Claude API requests from 384 to 1024.
- Replaced the assertion for unexpected test failures with a `pass` statement (tests now proceed without failing the workflow on unexpected errors unless explicitly expected).
- Pinned dependencies: `pytest` to version 8.3.5 and `requests` to version 2.32.4.
- Copyright registration info
- Bumped `astral-sh/setup-uv` to `@v6`.
- Bumped `requests` to 2.32.4.
- Multi-LLM Support: Added support for Gemini, Grok, Nvidia NIM, Claude, and Perplexity via `llm_client.py` and `llm_configs.py`.
- Command-Line Interface: Introduced `main.py` for standalone feedback generation.
- Pyproject.toml: Added for dependency management with `pytest` and `requests`.
- Testing Enhancements: Included `test_llm_client.py` and `test_llm_configs.py` for unit testing LLM components.
- Integration Testing: Added xAI Grok API integration test in `build.yml`.
- Docker Hub badges for version and image size in `README.md`.
- Explicit `elif n_failed == 0: pass` in `entrypoint.py` for clarity.
- Refactored AI Tutor: Replaced `ai_tutor.py` with `prompt.py` for modular prompt engineering.
- Entrypoint Overhaul: Updated `entrypoint.py` to support multiple LLMs, improved error handling, and added repository context.
- Docker Configuration: Updated `.dockerignore` and `Dockerfile` to reflect `prompt.py`.
- GitHub Workflows: Modified `build.yml` to use the `uv` package manager and Python 3.11.
- Testing Updates: Renamed `test_ai_tutor.py` to `test_prompt.py` and updated tests for `prompt.py`. Adjusted `test_entrypoint.py` and `test_integration.py`.
- Reorganized `README.md` for better structure and a streamlined "Troubleshooting" section.
- Do not collect `longrepr` from skipped tests.
- Gemini-Specific Logic: Removed from `ai_tutor.py`, replaced by a generic LLM client.
- Deprecated Dependencies: Eliminated direct `pip` dependency management in favor of `uv` and `pyproject.toml`.
- Error Handling: Enhanced robustness in `entrypoint.py` and `llm_client.py` for file validation, API keys, and network errors.
- Test Coverage: Updated `test_entrypoint.py` to use `ValueError` instead of `AssertionError` for invalid paths.
- Add `workflow_dispatch` trigger to the GitHub Actions workflow.
- Remove `GITHUB_OUTPUT` file writing.
- Remove raising an additional exception when `n_failed` is non-zero.
- Write `feedback` to `$GITHUB_STEP_SUMMARY` if it exists. This will show up under the GitHub Actions job graph.
- Include available `stderr` values when generating comments; expected to be helpful in some cases.
- For the `edu-base` docker image, write to `GITHUB_OUTPUT` only if the environment variable exists. If the script is running within the docker container, there would be no `GITHUB_OUTPUT` environment variable.
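The existence check above amounts to a short guard; the helper name is illustrative, but the `name=value` line format is what GitHub Actions expects in the `GITHUB_OUTPUT` file.

```python
import os

def write_github_output(name: str, value: str) -> None:
    """Write a step output only when GITHUB_OUTPUT is set (absent inside the docker container)."""
    path = os.getenv("GITHUB_OUTPUT")
    if path is None:
        return  # running outside GitHub Actions: nothing to do
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{name}={value}\n")
```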
- Manual publishing
- No `pip` cache, to save Docker image size.
- Dropped support for `linux/386`, `linux/arm/v7`, and `linux/arm/v6` to save Docker image size. Expected GitHub Actions runners: AMD64 or ARM64.
- Added the ability to specify the Gemini model when using the AI Tutor GitHub Action. A new `model` input has been added to the action, allowing users to select different Gemini models. The default model remains `gemini-2.0-flash` for backward compatibility.
- Added Norwegian support
- Added Docker Image Build & Push to the CI/CD pipeline. Please use the following line: `uses: docker://ghcr.io/github-id/action-name:tag`
- Removed the major version number yaml, as it seemingly was not working in the action.
- MECE principle in comment generation
- If failed, assert the error message.
- `fail-expected` argument to fail the test if the expected fail count is not correct.
- Add a feature to exclude common content of the README.md assignment instruction.
  - Common content is marked by the starting marker `From here is common to all assignments.` and the ending marker `Until here is common to all assignments.` in the README.md file, surrounded by double backtick (ASCII 96) characters.
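The marker-based exclusion can be sketched as below. The function name mirrors `exclude_common_contents` mentioned elsewhere in this changelog, but the body is an illustrative assumption, not the project's implementation.

```python
# Markers as they appear in the README, surrounded by double backticks.
START = "``From here is common to all assignments.``"
END = "``Until here is common to all assignments.``"

def exclude_common_contents(readme: str) -> str:
    """Drop the block between the common-content markers; leave text unchanged if markers are absent or malformed."""
    start = readme.find(START)
    end = readme.find(END)
    if start == -1 or end == -1 or end < start:
        return readme
    return readme[:start] + readme[end + len(END):]
```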
- Add start and end markers to mutable code block in the prompt for Gemini.
- Update license to BSD 3-Clause + Do Not Harm.
- Change the default value of `fail-expected` to `false`.
- Improve prompt for Gemini:
  - Add header and footer to the prompt.
  - Modify the instruction for failed tests to "Please generate comments mutually exclusive and collectively exhaustive for the following failed test cases.".
  - Add start and end markers to mutable code block.
- Removed 'Currently Korean Only' from README.md.
- Swedish support
- Move README content before pytest `longrepr` in the prompt.
- Bahasa Indonesia support
- Nederlands support
- Vietnamese support
- Italian support
- API key as a required input.
- Future plans & more in README.md.
- International support
- Append integration step feedback output to the `GITHUB_OUTPUT` file of the verification step.
- Environment variable `GOOGLE_API_KEY`.
- Integration test
- Initially released