75 changes: 75 additions & 0 deletions .github/workflows/execution-report-heartbeat.yml
@@ -0,0 +1,75 @@
name: Execution Report Heartbeat

on:
  workflow_dispatch:
    inputs:
      lookback_hours:
        description: "Report lookback window in hours."
        required: false
        type: string
        default: "36"
      fail_workflow_on_alert:
        description: "Fail this workflow when an alert is emitted."
        required: false
        type: choice
        default: "true"
        options:
          - "true"
          - "false"
  schedule:
    - cron: "25 23 * * 1-5"

env:
  GCP_PROJECT_ID: longbridgequant
  GCP_WORKLOAD_IDENTITY_PROVIDER: projects/252919773759/locations/global/workloadIdentityPools/github-actions/providers/github-main
  GCP_WORKLOAD_IDENTITY_SERVICE_ACCOUNT: longbridge-platform-deploy@longbridgequant.iam.gserviceaccount.com

jobs:
  heartbeat:
    name: Check ${{ matrix.target.label }} execution report heartbeat
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        target:
          - label: PAPER
            environment: longbridge-paper
          - label: HK
            environment: longbridge-hk
          - label: SG
            environment: longbridge-sg
    permissions:
      contents: read
      id-token: write
    environment: ${{ matrix.target.environment }}
    env:
      RUNTIME_HEARTBEAT_NAME: LongBridgePlatform ${{ matrix.target.label }}
      RUNTIME_HEARTBEAT_REPORT_PLATFORM: longbridge
      RUNTIME_HEARTBEAT_ACCOUNT_SCOPE: ${{ vars.ACCOUNT_REGION }}
      RUNTIME_HEARTBEAT_REQUIRED_SERVICES: ${{ vars.RUNTIME_HEARTBEAT_REQUIRED_SERVICES || vars.CLOUD_RUN_SERVICE }}
      RUNTIME_HEARTBEAT_GCS_URIS: ${{ vars.RUNTIME_HEARTBEAT_GCS_URIS || vars.EXECUTION_REPORT_GCS_URI }}
      RUNTIME_HEARTBEAT_LOOKBACK_HOURS: ${{ inputs.lookback_hours || vars.RUNTIME_HEARTBEAT_LOOKBACK_HOURS || '36' }}

[P2] Tighten the default heartbeat window

With the scheduled job running only once per weekday at 23:25 UTC, the 36-hour default can let yesterday's accepted report satisfy today's check. For example, if the PAPER/US service wrote a report after Monday's close and then misses all Tuesday scheduler runs, Tuesday's heartbeat still finds that report within 36 hours and returns OK, so the missed daily completion is not alerted. Use a window that cannot include the prior trading day's completion, or derive the expected window per market/day.
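
The window arithmetic behind this concern can be checked with a short sketch. The timestamps are illustrative assumptions (a Monday report written shortly after a 20:00 UTC US close, and the Tuesday 23:25 UTC scheduled run), `within_lookback` is a hypothetical helper rather than a function from `scripts/execution_report_heartbeat.py`, and the 20-hour value is only an example of a tighter window, not a recommended setting.

```python
from datetime import datetime, timedelta, timezone


def within_lookback(report_time, check_time, lookback_hours):
    """Return True if report_time falls inside the window ending at check_time."""
    window_start = check_time - timedelta(hours=lookback_hours)
    return window_start <= report_time <= check_time


# Hypothetical timestamps: Monday's report, then Tuesday's scheduled check.
monday_report = datetime(2024, 6, 3, 20, 5, tzinfo=timezone.utc)
tuesday_check = datetime(2024, 6, 4, 23, 25, tzinfo=timezone.utc)

# The report is 27 h 20 min old, so a 36-hour window still accepts it.
stale_report_accepted = within_lookback(monday_report, tuesday_check, 36)

# An illustrative 20-hour window starts Tuesday 03:25 UTC and excludes it.
stale_report_rejected = not within_lookback(monday_report, tuesday_check, 20)

print(stale_report_accepted, stale_report_rejected)
```

Any lookback shorter than the gap between the prior day's completion and the current check would exclude the stale report. A single fixed value may not suit all three Environments, which is why the comment also suggests deriving the window per market and day.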


      RUNTIME_HEARTBEAT_FAIL_WORKFLOW_ON_ALERT: ${{ inputs.fail_workflow_on_alert || vars.RUNTIME_HEARTBEAT_FAIL_WORKFLOW_ON_ALERT || 'true' }}
      RUNTIME_HEARTBEAT_ACCEPT_STATUSES: ${{ vars.RUNTIME_HEARTBEAT_ACCEPT_STATUSES }}
      RUNTIME_HEARTBEAT_REJECT_STATUSES: ${{ vars.RUNTIME_HEARTBEAT_REJECT_STATUSES }}
      CLOUD_RUN_SERVICE: ${{ vars.CLOUD_RUN_SERVICE }}
      GLOBAL_TELEGRAM_CHAT_ID: ${{ vars.GLOBAL_TELEGRAM_CHAT_ID }}
      CRISIS_ALERT_TELEGRAM_CHAT_IDS: ${{ vars.CRISIS_ALERT_TELEGRAM_CHAT_IDS }}
      CRISIS_ALERT_TELEGRAM_API_BASE_URL: ${{ vars.CRISIS_ALERT_TELEGRAM_API_BASE_URL }}
      TELEGRAM_TOKEN: ${{ secrets.TELEGRAM_TOKEN }}
      CRISIS_ALERT_TELEGRAM_BOT_TOKEN: ${{ secrets.CRISIS_ALERT_TELEGRAM_BOT_TOKEN }}
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6

      - name: Authenticate to Google Cloud
        uses: google-github-actions/auth@v3
        with:
          workload_identity_provider: ${{ env.GCP_WORKLOAD_IDENTITY_PROVIDER }}
          service_account: ${{ env.GCP_WORKLOAD_IDENTITY_SERVICE_ACCOUNT }}

      - name: Set up gcloud
        uses: google-github-actions/setup-gcloud@v3

      - name: Check recent execution report
        run: python scripts/execution_report_heartbeat.py
19 changes: 19 additions & 0 deletions README.md
@@ -215,6 +215,17 @@ The scheduled guard runs every 30 minutes. For a missed-run heartbeat, set
for that Environment. The default leaves the heartbeat check off to avoid false
alerts outside active market windows.

`Execution Report Heartbeat` (`.github/workflows/execution-report-heartbeat.yml`)
is the stricter completion check. It runs once per GitHub Environment on
weekdays after the expected market windows and verifies that a recent runtime
report exists under that Environment's `EXECUTION_REPORT_GCS_URI`. It reads the
latest report JSON and alerts if no recent report exists or the recent reports
have rejected statuses such as `error`. The deploy service account needs object
read/list access on the report bucket.
Each matrix job checks its own Environment service. Override
`RUNTIME_HEARTBEAT_REQUIRED_SERVICES` only when an Environment intentionally
monitors a different service list.
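
The decision described above (alert when no recent report exists, or when the recent report carries a rejected status) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the contents of `scripts/execution_report_heartbeat.py`: the function name `evaluate_heartbeat`, the report dict shape (`status`, `timestamp`), and checking only the latest report in the window are all hypothetical choices. In the workflow, the lookback and rejected statuses would presumably come from `RUNTIME_HEARTBEAT_LOOKBACK_HOURS` and `RUNTIME_HEARTBEAT_REJECT_STATUSES`.

```python
from datetime import datetime, timedelta, timezone


def evaluate_heartbeat(reports, now, lookback_hours=36.0, reject_statuses=("error",)):
    """Return (verdict, reason) for a list of report dicts.

    Each report is assumed (hypothetically) to look like
    {"status": str, "timestamp": timezone-aware datetime}.
    """
    cutoff = now - timedelta(hours=lookback_hours)
    recent = [r for r in reports if cutoff <= r["timestamp"] <= now]
    if not recent:
        return "ALERT", "no report within lookback window"
    # Judge the newest report in the window; older ones are ignored here.
    latest = max(recent, key=lambda r: r["timestamp"])
    if latest["status"].lower() in reject_statuses:
        return "ALERT", "latest report has rejected status " + repr(latest["status"])
    return "OK", "recent accepted report found"


now = datetime(2024, 6, 4, 23, 25, tzinfo=timezone.utc)
ok_report = {"status": "ok", "timestamp": now - timedelta(hours=3)}
error_report = {"status": "error", "timestamp": now - timedelta(hours=1)}

print(evaluate_heartbeat([], now))
print(evaluate_heartbeat([ok_report], now))
print(evaluate_heartbeat([ok_report, error_report], now))
```

Judging only the newest report keeps the sketch short. A stricter variant could alert if any report in the window was rejected, which is closer to the "recent reports" wording above.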

### Deployment unit and naming

- `QuantPlatformKit` is only a shared dependency; Cloud Run still deploys `LongBridgePlatform` itself.
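@@ -435,6 +446,14 @@ OIDC/IAM/audience misconfiguration, Cloud Run returning 4xx/5xx, or the container at app-level Tel
`RUNTIME_GUARD_REQUIRE_SUCCESS=true`, and set `RUNTIME_GUARD_LOOKBACK_MINUTES` to a window that covers that Environment's
expected Scheduler run times. By default the heartbeat is not enforced, to avoid false alerts outside trading windows.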

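The stricter completion check is `Execution Report Heartbeat`
(`.github/workflows/execution-report-heartbeat.yml`). For each GitHub Environment, it runs on weekdays
after the expected market windows and checks the most recent runtime report JSON under that Environment's
`EXECUTION_REPORT_GCS_URI`, reading `status/stage/errors`. If no recent report exists, or the report has a
failure status such as `error`, it sends a Telegram alert. The GitHub deploy service account needs object
read/list access on the report bucket.
By default each matrix job checks its own Environment service. Override
`RUNTIME_HEARTBEAT_REQUIRED_SERVICES` only when an Environment needs to monitor a different service list.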

### Deployment unit and naming recommendations

- `QuantPlatformKit` is only a shared dependency and is not deployed separately; Cloud Run continues to deploy only `LongBridgePlatform`.