This is the official repository for "Knowledge Matters: Injecting Project and Testing Knowledge into LLM-based Unit Test Generation".
The file structure of the repository is as follows:
KTester
|—— code: The source code of KTester.
| |—— Java
| | |—— project-index-builder
| | └—— project-info-process
| |—— procedure
| |—— templates
| |—— tools
| |—— settings.py
| |—— preparation.py: code for workspace preparation and project knowledge building.
| |—— generate_unit_test.py: generates unit test classes for focal methods.
| └—— evaluation.py: code for running the evaluations in the paper.
|—— data:
| |—— dataset_info.json: Basic information about the evaluation dataset, including projects, target classes, focal methods and file paths.
| └—— project_index: Knowledge extracted from the projects in the dataset.
└—— Readme.md
Environment for our experiments:
- Java: openjdk 17.0.12 2024-07-16
- Python: 3.13.0
- Maven: Apache Maven 3.9.9
Set JVM language to English:
setx _JAVA_OPTIONS "-Duser.language=en -Duser.country=US -Dfile.encoding=UTF-8"
Remember to set the environment variable JAVA_HOME to the path of your Java installation.
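Before moving on, it can help to confirm that the tools and variables mentioned above are visible from Python. The following is only a minimal sketch and not part of KTester: the function name `environment_report` is hypothetical, and it only reports what it finds rather than enforcing the exact versions used in our experiments.

```python
import os
import shutil
import sys


def environment_report(env=None):
    """Collect basic facts about the local setup (hypothetical helper)."""
    # Allow a custom mapping to be passed in; default to the real environment.
    env = os.environ if env is None else env
    return {
        # Version of the interpreter running this script.
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        # Whether the java and mvn executables can be found on PATH.
        "java_on_path": shutil.which("java") is not None,
        "mvn_on_path": shutil.which("mvn") is not None,
        # Values of the variables discussed above, or None if unset.
        "JAVA_HOME": env.get("JAVA_HOME"),
        "_JAVA_OPTIONS": env.get("_JAVA_OPTIONS"),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If `JAVA_HOME` or `_JAVA_OPTIONS` prints as `None`, the variable is not set in the current shell. Note that a value set with `setx` only becomes visible in terminals opened afterwards.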
Download Python dependencies:
pip install -r code/requirements.txt
Note: If your operating system is Linux/macOS, search the repository for this line of code:
jpype.startJVM(jpype.getDefaultJVMPath(), '-Xmx4g', "-Djava.class.path=./Java/project-info-process.jar;./Java/project-index-builder.jar")
and replace it with the following line, which uses ":" instead of ";" as the classpath separator:
jpype.startJVM(jpype.getDefaultJVMPath(), '-Xmx4g', "-Djava.class.path=./Java/project-info-process.jar:./Java/project-index-builder.jar")
- Download the dataset (see #dataset-and-evaluation-results).
- Rename code/settings.py.template to code/settings.py and complete the settings.
- Run the following commands:
cd code
# prepare workspace
python preparation.py -W
# extract project knowledge (the results are already in "data/project_index", so you can skip this step)
python preparation.py -P
# generate unit tests
python generate_unit_test.py
# run unit tests and collect coverage
python evaluation.py --operation coverage
# collect baseline results:
python evaluation.py --operation baseline
In addition, we provide the parameter -F <log file path> to generate logs for the entire process.
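Regarding the Linux/macOS note in the setup steps above: one possible way to avoid editing the classpath separator by hand is to build the option with `os.pathsep`, which is ";" on Windows and ":" on Linux/macOS. This is a sketch only and not how the repository currently does it. The helper name `build_classpath_option` is hypothetical, and the jar paths are copied from the `jpype.startJVM` call quoted earlier.

```python
import os

# Jar locations as they appear in the startJVM call in this repository.
JARS = [
    "./Java/project-info-process.jar",
    "./Java/project-index-builder.jar",
]


def build_classpath_option(jars, sep=os.pathsep):
    """Join jar paths with the platform's path separator (hypothetical helper)."""
    return "-Djava.class.path=" + sep.join(jars)


if __name__ == "__main__":
    # Prints the option string for the current operating system.
    print(build_classpath_option(JARS))
    # Assumed usage, mirroring the existing call:
    #   import jpype
    #   jpype.startJVM(jpype.getDefaultJVMPath(), "-Xmx4g",
    #                  build_classpath_option(JARS))
```

With this approach the same line works on all three platforms, at the cost of a small change to the existing code.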
The full dataset can be downloaded from this link. It contains 3 zip files:
- projects: Maven Java projects used in the dataset.
- project_lucene: Function-level indexes used to construct prompts for the focal methods. Unzip this file to ./data/project_index/lucene.
- evaluation_results: Prompts, test classes generated by KTester, and coverage reports.
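The unzip step for project_lucene can also be scripted. The sketch below makes two assumptions that you should check against your download: that the archive is named `project_lucene.zip`, and that its contents sit at the top level of the archive rather than inside a wrapping folder. The function name `extract_archive` is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Extract zip_path into dest_dir and return the extracted file paths."""
    dest = Path(dest_dir)
    # Create the target directory (and any parents) if it does not exist yet.
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    # List extracted files relative to dest, using forward slashes.
    return sorted(
        path.relative_to(dest).as_posix()
        for path in dest.rglob("*")
        if path.is_file()
    )


if __name__ == "__main__":
    # Assumed archive name; run from the repository root.
    files = extract_archive("project_lucene.zip", "./data/project_index/lucene")
    print(f"Extracted {len(files)} files")
```

If the archive does contain a wrapping folder, move its contents up one level so that the index files end up directly under ./data/project_index/lucene.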