This project is not a final version of the tool.
The codebase may contain:
- experimental implementations
- incomplete features
- potential bugs
- future changes and improvements
The goal of this project is to support empirical software engineering research by analyzing how End-to-End (E2E) test smells evolve over time in software repositories.
Specifically, this tool aims to:
- detect End-to-End test smells in E2E test suites
- analyze their historical evolution across commits
- store the results in structured databases for further analysis
- use the collected structured data for empirical studies
The system combines static analysis and repository mining.
Two main components are used:
- E2E Test Smell Detection
A detector identifies smells inside test files.
- Historical Repository Mining
The commit history of each repository is analyzed to understand when smells appear and evolve.
The final output is stored in SQLite databases.
The analysis pipeline follows these steps:
Repository Dataset
│
│
▼
E2E Test Smell Detector
(static analysis)
│
│
▼
CSV Files
(typescript_analysis.csv / javascript_analysis.csv)
│
│
▼
Historical Analyzer
(PyDriller commit mining)
│
│
▼
SQLite Databases
(historical_smellsJS.db / historical_smellsTS.db)
│
│
▼
Empirical analysis
(scripts and results in /analyses)
project-root
│
├── analyses/
│ │
│ ├── e2e_smells_analyzer.py
│ ├── e2e_smells_report_plots.py
│ ├── e2e_empirical_analyzer.R
│ ├── R_scripts/
│ │ ├── main.R
│ │ └── jobs/
│ │ ├── 01_ridge_release.R
│ │ ├── 02_smells_catalog.R
│ │ ├── 03_startup_tables.R
│ │ └── 04_ownership_tables.R
│ │
│
├── history_smells-analyzerJS.py
├── history_smells-analyzerTS.py
├── requirements.txt
│
└── e2e-test-smell-analyzer/
Before running the project ensure the following tools are installed:
- Python 3.8+
- pip
- Git
- R
Download the repository e2e-test-smell-analyzer as a .zip file.
Then:
- Extract the archive
- Copy the folder into this project directory
- Open a terminal and move inside the folder
cd e2e-test-smell-analyzer
Follow the instructions contained in the README of the detector in order to install its dependencies and configure the environment.
Once configured, run the detector to generate the following files:
typescript_analysis.csv
javascript_analysis.csv
These files contain the detected E2E test smells.
Move to the root directory of this project and run:
pip install -r requirements.txt
Execute the following scripts:
python history_smells-analyzerJS.py
python history_smells-analyzerTS.py
These scripts will analyze the commit history of the repositories and collect information about the evolution of the detected test smells.
The system generates two SQLite databases:
- historical_smellsJS.db
- historical_smellsTS.db
The databases include information such as:
- repository name
- test file path
- framework
- commit SHA
- commit date
- commit author
- commit message
- smell type
- nearest and earliest future release date
- nearest and earliest future release version
- class name
- method name
- line number
These datasets can later be used for empirical analysis or statistical studies.
The tool supports several environment variables to control the analysis.
| Variable | Description |
|---|---|
E2E_TEST_MODE |
Enables test mode with a reduced dataset |
E2E_TEST_MAX_REPOS |
Maximum number of repositories to analyze |
E2E_NUM_WORKERS |
Number of parallel workers |
E2E_MAX_COMMITS |
Maximum number of commits analyzed per repository |
E2E_DB_PATH |
Path/name of the output database |
E2E_CLONE_ROOT |
Directory where repositories are cloned |
$env:E2E_TEST_MODE='1'
$env:E2E_TEST_MAX_REPOS='1'
$env:E2E_NUM_WORKERS='2'
python history_smells-analyzer.py
This configuration:
- enables test mode
- analyzes only one repository
- uses two parallel workers
This section describes the scripts available in the analyses/ folder for generating reports and visualizations from the historical databases.
The following section concerns the analysis and visualization of data for a single repository/file.
Script: analyses/e2e_smells_analyzer.py
Generates ownership/smell analytics for a test file and persists the results in SQLite.
The script uses the selected historical DB as the main output and stores structured report data in:
report_summaryreport_commit_detailsreport_developer_details
Text report generation (.txt) is secondary and only produced on request.
python analyses/e2e_smells_analyzer.py <repository> <file_name> --dataset js/ts
Generate also a .txt file (optional):
python analyses/e2e_smells_analyzer.py <repository> <file_name> --dataset js/ts --output analyses/reports/js/my_report.txt
⏩ Batch execution (all files with smells):
python analyses/e2e_smells_analyzer.py --dataset js/ts
Generate also .txt reports in batch mode (optional):
python analyses/e2e_smells_analyzer.py --dataset js/ts --write-txt
✴️ Main parameters:
<repository>: repository name as stored in the DB<file_name>: path or name of the test file--dataset: required,jsorts--db: (optional) path to the SQLite database--report-db: (optional) output DB path for report tables (default: same DB selected with--db/--dataset)--output: (optional, single mode) output.txtpath--write-txt: (optional, batch mode) also generates.txtreports
🔀 Output:
- Primary output: report data saved in SQLite report tables.
- Secondary output:
.txtfiles only when requested with--output(single mode) or--write-txt(batch mode), saved inanalyses/reports/<dataset>/.
The summary now includes these transition metrics:
Introduction commits (new smells introduced)Improving commits (smells removed)Worsening commits (smells added)
Script: analyses/e2e_smells_report_plots.py
After creating textual reports with analyses/e2e_smells_analyzer.py, this script generates a complete set of 5 charts for each report file:
- Smell distribution
- Smell evolution over time (with release lines)
- Ownership plot
- Smells vs release distance
- Smell co-occurrence heatmap
python analyses/e2e_smells_report_plots.py --report analyses/reports/ts/<report_file>.txt
⏩ Batch execution (all reports):
python analyses/e2e_smells_report_plots.py --reports-dir analyses/reports
✴️ Main parameters:
--report: (optional) path to one report.txtfile--reports-dir: (optional) root folder of report files (default:analyses/reports)--output-root: (optional) root output folder (default:analyses/plots)
🔀 Output:
Plots are saved under analyses/plots/ts/ and analyses/plots/js/, inside one folder per repository/file (<repo>_<file>).
The following section concerns statistical analyses and visualizations that aggregate data from all repositories/files, for a general study of the entire work.
analyses/e2e_smells_analyzer.py
Script:
analyses/e2e_empirical_analyzer.R
This script performs aggregated empirical analyses from historical_smellsJS.db and historical_smellsTS.db.
Main outputs include:
- ridge plots of signed distance (days) from closest release by transition type (
No-change,Initial,Improving,Worsening) - release proximity summary tables
- startup-bin tables by bad practice and variation type
- ownership/newcomer tables by bad practice and variation type
- bad-smell incidence tables (global by language and detailed by framework)
The script reads data from:
report_commit_detailshistorical_smellsreport_developer_details
in:
historical_smellsJS.dbhistorical_smellsTS.db
From project root:
Rscript analyses/e2e_empirical_analyzer.RFrom analyses/:
Rscript .\e2e_empirical_analyzer.RSelective modes:
--ridge-release-only--smells-only--startup-tables-only--ownership-tables-only--incidence-only
Incidence mode generates CSV summaries from both databases with:
- bad smell occurrences across all commits
- number of distinct test files affected by each bad smell
- split by language (JavaScript / TypeScript)
- split by framework inside each language
Optional parameters:
--js-db <path>--ts-db <path>
Orchestrator script:
analyses/R_scripts/main.R
Run all jobs sequentially:
Rscript analyses/R_scripts/main.RRun all jobs in parallel:
Rscript analyses/R_scripts/main.R --parallel --workers 2Run one specific job:
Rscript analyses/R_scripts/main.R --only ridge
Rscript analyses/R_scripts/main.R --only smells
Rscript analyses/R_scripts/main.R --only startup
Rscript analyses/R_scripts/main.R --only ownership✴️ Main options for main.R:
--parallel--workers <n>--only ridge|smells|startup|ownership--js-db <path>--ts-db <path>
analyses/plots/release_distance_ridge_js_R.pnganalyses/plots/release_distance_ridge_ts_R.pnganalyses/plots/release_cycle_proximity_summary_js_R.pnganalyses/plots/release_cycle_proximity_summary_ts_R.pnganalyses/plots/test_smells_catalog_table_R.pnganalyses/plots/startup_bins_by_bp_variation_js_R.pnganalyses/plots/startup_bins_by_bp_variation_ts_R.pnganalyses/plots/ownership_newcomer_by_bp_variation_js_R.pnganalyses/plots/ownership_newcomer_by_bp_variation_ts_R.pnganalyses/reports/bad_smells_incidence_language_R.csvanalyses/reports/bad_smells_incidence_framework_R.csv
- Vincenzo Di Nardo




