Create a KPI dashboard for quality numbers #448
a6a2efd to 7380bb8
env:
  ANDROID_HOME: ""
  ANDROID_SDK_ROOT: ""
  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
We should not have things different in the release workflow than in the others. So we either add this everywhere or nowhere.
Can you please also state in the commit message why this change is needed?
Node 20 will be deprecated next month on GitHub Actions runners. I can add this to the commit message:
https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/
os.system(
    f"{code_ql_path} database analyze -j=0 {database_location} --format=sarifv2.1.0 --output={output_base}/codeql.sarif")
os.system(
    f"{code_ql_path} database analyze -j=0 {database_location} --format=csv --output={output_base}/codeql.csv")
We should keep the CSV output for direct human readability.
    f"{code_ql_path} database analyze -j=0 {database_location} --format=csv --output={output_base}/codeql.csv")

# Analyze: run MISRA/AUTOSAR queries and produce SARIF.
# --ram: cap at 5 GB to prevent swap thrashing on GitHub runners (7 GB total)
We should not make this the default. We should add an extra option for GitHub runners that we then enable only in the CI, where these parameters are changed.
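One way to read this request is an opt-in switch in the script that builds the `database analyze` command. The sketch below is only an illustration: the function name `build_analyze_cmd` and the parameter `github_runner_limits` are hypothetical, and the limit values are copied from the commit message in this PR rather than verified against the CodeQL CLI.

```python
from typing import List


def build_analyze_cmd(codeql_path: str, database: str, output: str,
                      github_runner_limits: bool = False) -> List[str]:
    """Build the argument list for `codeql database analyze`.

    Hypothetical helper: resource caps are added only when the caller
    asks for them, so local runs keep the previous behavior.
    """
    cmd = [codeql_path, "database", "analyze"]
    if github_runner_limits:
        # Values as stated in this PR's commit message (assumed, not tuned here).
        cmd += ["--ram", "5000", "--timeout", "20", "-j", "2"]
    else:
        # Previous default from the diff: let CodeQL pick the thread count.
        cmd += ["-j=0"]
    cmd += [database, "--format=sarifv2.1.0", f"--output={output}"]
    return cmd
```

The CI workflow would then pass the switch explicitly, while developers running the target locally would not be affected.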
- name: Set conclusion
  id: set-conclusion
  run: |
    if [[ "${{ steps.run-coverage.outcome }}" == "success" ]]; then
      echo "conclusion=success" >> $GITHUB_OUTPUT
    else
      echo "conclusion=failure" >> $GITHUB_OUTPUT
    fi
Why is this needed? Can we try to remove this again, please?
Because the coverage step has continue-on-error: true, meaning that if bazel coverage fails, the job doesn't stop; it keeps running. Without the "Set conclusion" step, the caller nightly_quality.yml has no way to know whether coverage actually passed or failed; it only sees the job as success because continue-on-error suppresses the failure. In short, continue-on-error hides the failure from GitHub's job status, and "Set conclusion" exists to un-hide it for the dashboard. For a nightly quality pipeline, partial data is actually useful: if coverage fails at night, you want something to look at the next morning rather than an empty artifact.
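For illustration, the same mapping can be written as a small Python function. This is a sketch only, since the workflow itself does this in bash; the function name `write_conclusion` is hypothetical. It relies on the documented behavior that a step writes `key=value` lines to the file named by the `GITHUB_OUTPUT` environment variable.

```python
import os


def write_conclusion(step_outcome: str) -> str:
    """Map a step outcome to 'success' or 'failure' and publish it.

    Anything other than 'success' (for example 'failure', 'cancelled'
    or 'skipped') is reported as 'failure', mirroring the else branch
    of the bash step.
    """
    conclusion = "success" if step_outcome == "success" else "failure"
    output_path = os.environ.get("GITHUB_OUTPUT")
    if output_path:
        # Append, because other steps may already have written outputs.
        with open(output_path, "a", encoding="utf-8") as fh:
            fh.write(f"conclusion={conclusion}\n")
    return conclusion
```

The point of the explicit output is that the caller reads this value instead of the job status, which continue-on-error always reports as success.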
@@ -0,0 +1,179 @@
# *******************************************************************************
Can we first take care of just the code coverage, please, to reduce the scope of the PR?
# Restore KPI history from the previous gh-pages deployment so the
# dashboard can show delta badges and trend sparklines across runs.
- name: Restore KPI history
I thought we agreed that we do not want history at the moment?
# Deploy to GitHub Pages
# ------------------------------------------------------------------
- name: Deploy quality reports to GitHub Pages
  uses: peaceiris/actions-gh-pages@v4
This way it is not integrated into our Sphinx build. Maybe you can talk with Jochen about that.
Yes, that step deployed the quality reports directly to gh-pages as a separate, uncoordinated publish. I changed nightly_quality.yml to upload the quality reports as an artifact instead of deploying; docs.yml now triggers on its completion and deploys everything in one shot.
562839a to 364db19
… only
- Use subprocess.run instead of os.system for all CodeQL commands so errors are properly captured and logged.
- Add --ram 5000 --timeout 20 -j 2 to database analyze to prevent OOM and hung queries on GitHub runners.
- Remove CSV output; SARIF is sufficient for the CI quality report.
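The switch from os.system to subprocess.run described in this commit could look roughly like the following. It is a minimal sketch, not the code from the PR: the helper name `run_logged` and the logger name are assumptions.

```python
import logging
import subprocess
import sys
from typing import Sequence

logger = logging.getLogger("codeql_lint")  # hypothetical logger name


def run_logged(cmd: Sequence[str]) -> bool:
    """Run a command without a shell and log stderr if it fails.

    Unlike os.system, the exit code and the error output are captured,
    so a failed CodeQL invocation leaves a trace in the CI log.
    """
    result = subprocess.run(list(cmd), capture_output=True, text=True, check=False)
    if result.returncode != 0:
        logger.error("command failed (exit %d): %s\n%s",
                     result.returncode, " ".join(cmd), result.stderr.strip())
        return False
    return True


if __name__ == "__main__":
    # Usage example with a harmless command instead of a real CodeQL call.
    ok = run_logged([sys.executable, "-c", "print('hello')"])
    print("succeeded" if ok else "failed")
```

Passing the command as a list also avoids shell quoting problems with paths that contain spaces.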
- Add FORCE_JAVASCRIPT_ACTIONS_TO_NODE24 env var.
- Add 'conclusion' output (success/failure) to match the interface of clang_tidy.yml and codeql.yml.
- Add id and continue-on-error to the bazel coverage step so the job can report a conclusion even on test failures.
- Gate genhtml and archive steps on run-coverage outcome so they are skipped cleanly when coverage fails.
- Fix cache-save condition to also fire on scheduled (nightly) runs.
- Include raw LCOV .dat file in the artifact so the quality dashboard can read coverage percentages without re-running genhtml.
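Reading a coverage percentage from the raw LCOV data, as the last bullet describes, needs only the `LF:` (lines found) and `LH:` (lines hit) records of the tracefile. The sketch below assumes the dashboard wants a single overall line-coverage figure; the function name is hypothetical and is not taken from generate_dashboard.py.

```python
from typing import Iterable


def line_coverage_percent(lcov_lines: Iterable[str]) -> float:
    """Compute overall line coverage from LCOV tracefile lines.

    Sums LF (lines found) and LH (lines hit) over all source-file
    records; returns 0.0 when no instrumented lines are present.
    """
    found = 0
    hit = 0
    for raw in lcov_lines:
        line = raw.strip()
        if line.startswith("LF:"):
            found += int(line[3:])
        elif line.startswith("LH:"):
            hit += int(line[3:])
    return 100.0 * hit / found if found else 0.0
```

In practice the lines would come from opening the .dat file in the artifact, for example `line_coverage_percent(open(path, encoding="utf-8"))`.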
Both workflows are triggered only via workflow_call from nightly_quality.yml. Each exposes artifact-name and conclusion outputs so the caller can conditionally download reports and build a unified dashboard.

clang_tidy.yml:
- Runs 'bazel test --config=clang-tidy //...' with continue-on-error.
- Collects per-target *.AspectRulesLintClangTidy.out files and generates an HTML summary with error/warning counts and a findings table.

codeql.yml:
- Runs 'bazel run --config=codeql //quality/static_analysis:codeql_lint'.
- Collects SARIF output and generates an HTML summary from it.
- Sets a 180-minute job timeout to guard against hung analyses.
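The error/warning counts for the clang-tidy summary can be derived from the standard diagnostic line format `file:line:col: severity: message [check-name]`. The following sketch assumes the *.AspectRulesLintClangTidy.out files contain plain clang-tidy output in that format; the names `DIAG_RE` and `count_severities` are illustrative.

```python
import re
from collections import Counter

# Matches e.g. "src/a.cpp:10:5: warning: use nullptr [modernize-use-nullptr]".
# "note:" lines and source excerpts are deliberately not matched.
DIAG_RE = re.compile(
    r"^(?P<file>[^:\n]+):(?P<line>\d+):(?P<col>\d+): "
    r"(?P<severity>warning|error): (?P<message>.*?)"
    r"(?: \[(?P<check>[^\]]+)\])?$"
)


def count_severities(output_text: str) -> Counter:
    """Count clang-tidy warnings and errors in one output file's text."""
    counts: Counter = Counter()
    for line in output_text.splitlines():
        match = DIAG_RE.match(line)
        if match:
            counts[match.group("severity")] += 1
    return counts
```

The same match object also yields the file, line, and check name, which is what a findings table would need.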
Runs every night at midnight UTC (and on workflow_dispatch). Executes coverage, codeql, and clang-tidy in parallel as reusable workflow calls, then deploys all reports plus a unified KPI dashboard to GitHub Pages.
quality/dashboard/generate_dashboard.py:
- Parses CodeQL SARIF files, clang-tidy *.AspectRulesLintClangTidy.out files, and LCOV .dat data into a single Jinja2-rendered HTML page.
- Maintains a quality_history.json for KPI trend tracking across runs.
- Writes a GitHub Actions step summary with markdown KPI tables.

quality/dashboard/dashboard.html.j2:
- Dark-themed single-page dashboard with tabbed panels for CodeQL, Clang-Tidy and Coverage.
- Sortable/filterable findings tables, coverage progress bars, and a run-history table with trend indicators.
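As an illustration of the SARIF parsing step, a SARIF 2.1.0 log keeps its findings under `runs[].results[]`, each with an optional `level`. The sketch below is not the code from generate_dashboard.py; the function name is hypothetical, and treating a missing `level` as "warning" is a simplification, because the effective level can also come from the rule's default configuration.

```python
import json
from collections import Counter


def summarize_sarif(sarif_text: str) -> Counter:
    """Count SARIF results per level across all runs in one log."""
    doc = json.loads(sarif_text)
    counts: Counter = Counter()
    for run in doc.get("runs", []):
        for result in run.get("results", []):
            # Simplification: fall back to "warning" when no level is given.
            counts[result.get("level", "warning")] += 1
    return counts
```

A dashboard generator could feed these counts into the template context alongside the clang-tidy and coverage numbers.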
364db19 to bff61c4
Add a nightly CI pipeline that runs three quality jobs in parallel (coverage, CodeQL, and clang-tidy) and publishes all results to GitHub Pages under latest/quality/. A Jinja2-based dashboard aggregates the findings into a single page with KPI trend tracking across runs. The Sphinx documentation is extended with a dedicated quality reports page and a version switcher navbar, and on every push to main the docs automatically pull the latest nightly KPI numbers so they stay current without waiting for another nightly run.

Issue: SWP-262453