Merged
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added

+- Performance analytics — a new Performance page (`/jobs/performance`) shows per-job-class statistics derived from the history table: run count, average duration, p50, p95, min, and max; rows are sorted by p95 descending so the slowest classes appear first; a period filter (1h / 24h / 7d / All) scopes the dataset; each class name links to the History page pre-filtered to that class; business logic lives in a `JobPerformanceStats` service using a single pluck query with Ruby-side aggregation for DB-agnostic percentile computation
- Metrics / health endpoint — `GET /jobs/metrics.json` returns a JSON document with job counts (`ready`, `scheduled`, `claimed`, `blocked`, `failed`), throughput (`completed_1h`, `completed_24h`), per-queue depth and pause state, and process health (`total`, `healthy`, `stale`, `by_kind`); when `slow_job_threshold` is configured, a `slow_jobs` count is also included; the endpoint goes through the same authentication and `connects_to` middleware as all other routes
- Recurring task "Run Now" — a "Run Now" button on the Recurring Tasks page triggers `task.enqueue(at: Time.current)` to enqueue the job immediately without waiting for its next scheduled run; SolidQueue's `RecurringExecution` deduplication prevents double-enqueuing
- Read replica support — when `connects_to` is set to `{ reading: <role>, writing: <role> }`, the engine automatically routes GET requests to the reading role and mutating requests (POST/DELETE/PATCH) to the writing role via `ActiveRecord::Base.connected_to(role:)`; passing any other hash (e.g. `{ role: :writing }`, `{ shard: :name }`) falls through to `connected_to` directly; defaults to `nil` so single-database setups are unaffected
2 changes: 1 addition & 1 deletion README.md
@@ -50,6 +50,7 @@ SolidQueueWeb surfaces all of this in a browser UI available at any route you ch
- **CSV export** — "Export CSV" button on the jobs, failed jobs, and history pages downloads all records matching the current filters; columns are tailored per view
- **Slow job detection** — when `slow_job_threshold` is configured, claimed jobs running longer than the threshold are flagged with an orange row, a "slow" badge, and a "Running For" duration column on the Running tab; a "Slow Jobs" warning card appears on the dashboard with a link to the Running tab
- **Webhook alerts** — set `alert_webhook_url` and `alert_failure_threshold` to receive a POST request whenever the failed job count meets or exceeds the threshold; fires asynchronously so dashboard performance is unaffected; a configurable cooldown (default 1 h) prevents repeated alerts while the count stays elevated
+- **Performance analytics** — per-job-class statistics at `/jobs/performance` showing run count, average, p50, p95, min, and max duration; sorted by p95 descending so the slowest classes surface first; period filter scopes to 1h / 24h / 7d or all time; each class name links to the filtered History view
- **Metrics / health endpoint** — `GET /jobs/metrics.json` returns a machine-readable JSON document with job counts, throughput, per-queue depth and pause state, and process health summary; suitable for Prometheus scraping, uptime monitors, or external dashboards; `slow_jobs` count included when `slow_job_threshold` is configured

## Screenshots
@@ -212,7 +213,6 @@ Planned features, roughly ordered by priority:
- Bulk scheduled job actions — "Run All Now" button on the Scheduled tab, mirroring the "Retry All" pattern on the Failed Jobs page

**Observability**
-- Performance analytics — average and percentile (p50/p95) duration per job class derived from the history table; surfaces slow job types before they become a problem
- Priority filter — filter and sort the jobs list by Solid Queue job priority

**Notifications**
12 changes: 12 additions & 0 deletions app/controllers/solid_queue_web/performance_controller.rb
@@ -0,0 +1,12 @@
module SolidQueueWeb
  # Renders per-job-class duration statistics for finished jobs.
  class PerformanceController < ApplicationController
    def index
      # Unknown or missing period values resolve to nil, meaning "all time".
      @period = params[:period].presence_in(PERIOD_DURATIONS.keys)

      scope = SolidQueue::Job.where.not(finished_at: nil)
      scope = scope.where("finished_at >= ?", PERIOD_DURATIONS[@period].ago) if @period.present?

      @rows = JobPerformanceStats.new(scope).rows
    end
  end
end
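
Reviewer note on the period handling above: `PERIOD_DURATIONS` is not defined anywhere in this diff, so the sketch below uses a hypothetical stand-in (`PERIOD_SECONDS`, with plain integers instead of ActiveSupport durations) to illustrate how an allow-listed `period` param could map to a cutoff time, with anything unrecognized falling back to all time. The keys come from the period links in the view; `resolve_period` and `cutoff_for` are illustrative names, not part of the engine.

```ruby
# Hypothetical stand-in for PERIOD_DURATIONS (values in seconds).
PERIOD_SECONDS = { "1h" => 3_600, "24h" => 86_400, "7d" => 604_800 }.freeze

# Returns the period key if it is allow-listed, otherwise nil ("all time").
def resolve_period(param)
  PERIOD_SECONDS.key?(param) ? param : nil
end

# Returns the earliest finished_at to include, or nil for no lower bound.
def cutoff_for(period, now: Time.now)
  return nil if period.nil?

  now - PERIOD_SECONDS.fetch(period)
end

now = Time.at(1_000_000)
p resolve_period("24h")                              # allow-listed key is kept
p resolve_period("30d")                              # unknown key becomes nil
p cutoff_for("1h", now: now) == Time.at(996_400)     # one hour before `now`
```

Because only allow-listed keys survive, a crafted `period` value never reaches the SQL fragment or the hash lookup.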
38 changes: 38 additions & 0 deletions app/services/solid_queue_web/job_performance_stats.rb
@@ -0,0 +1,38 @@
module SolidQueueWeb
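  # Computes per-job-class duration statistics from a scope of finished jobs.
  class JobPerformanceStats
    Row = Struct.new(:class_name, :count, :avg, :p50, :p95, :min, :max, keyword_init: true)

    def initialize(scope)
      @scope = scope
    end

    # Returns one Row per job class, slowest p95 first.
    def rows
      # A single query; aggregation happens in Ruby so it works on any database.
      grouped = @scope.pluck(:class_name, :created_at, :finished_at)
                      .group_by(&:first)

      grouped.map do |class_name, records|
        # Duration runs from created_at to finished_at, in seconds,
        # so it includes any time the job spent waiting in the queue.
        durations = records.map { |_, created, finished| (finished - created).to_f }.sort
        Row.new(
          class_name: class_name,
          count: durations.size,
          avg: mean(durations),
          p50: percentile(durations, 50),
          p95: percentile(durations, 95),
          min: durations.first,
          max: durations.last
        )
      end.sort_by { |r| -r.p95 }
    end

    private

    def mean(sorted)
      sorted.sum / sorted.size
    end

    # Nearest-rank percentile over a non-empty, ascending-sorted array.
    def percentile(sorted, pct)
      idx = [(pct / 100.0 * sorted.size).ceil - 1, 0].max
      sorted[idx]
    end
  end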
end
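
Reviewer note: the `percentile` method above appears to follow what is usually called the nearest-rank method, where the p-th percentile of n sorted values is the value at 1-based rank ⌈p/100 · n⌉. The standalone sketch below (plain Ruby, no Rails; `nearest_rank` is an illustrative name) walks through the index arithmetic on inputs shaped like those in the specs.

```ruby
# Nearest-rank percentile: the value at 1-based rank ceil(pct / 100 * n).
# `sorted` must be non-empty and in ascending order.
def nearest_rank(sorted, pct)
  idx = [(pct / 100.0 * sorted.size).ceil - 1, 0].max
  sorted[idx]
end

five   = [10.0, 20.0, 30.0, 40.0, 50.0]
twenty = (1..20).map(&:to_f)

p nearest_rank(five, 50)    # rank ceil(2.5)  = 3  -> 30.0
p nearest_rank(five, 95)    # rank ceil(4.75) = 5  -> 50.0
p nearest_rank(twenty, 95)  # rank ceil(19.0) = 19 -> 19.0
p nearest_rank([7.0], 95)   # single sample        -> 7.0
```

One consequence worth keeping in mind: with fewer than 20 samples, ⌈0.95 · n⌉ equals n, so p95 is simply the maximum, and the p95 sort order is effectively a max-duration order for rarely-run classes.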
1 change: 1 addition & 0 deletions app/views/layouts/solid_queue_web/application.html.erb
@@ -21,6 +21,7 @@
<li><%= link_to "Queues", queues_path, class: current_page?(queues_path) ? "active" : "", aria: { current: current_page?(queues_path) ? "page" : nil } %></li>
<li><%= link_to "Jobs", jobs_path, class: current_page?(jobs_path) ? "active" : "", aria: { current: current_page?(jobs_path) ? "page" : nil } %></li>
<li><%= link_to "History", history_path, class: current_page?(history_path) ? "active" : "", aria: { current: current_page?(history_path) ? "page" : nil } %></li>
+<li><%= link_to "Performance", performance_path, class: current_page?(performance_path) ? "active" : "", aria: { current: current_page?(performance_path) ? "page" : nil } %></li>
<li><%= link_to "Failed", failed_jobs_path, class: current_page?(failed_jobs_path) ? "active" : "", aria: { current: current_page?(failed_jobs_path) ? "page" : nil } %></li>
<li><%= link_to "Recurring", recurring_tasks_path, class: current_page?(recurring_tasks_path) ? "active" : "", aria: { current: current_page?(recurring_tasks_path) ? "page" : nil } %></li>
<li><%= link_to "Processes", processes_path, class: current_page?(processes_path) ? "active" : "", aria: { current: current_page?(processes_path) ? "page" : nil } %></li>
50 changes: 50 additions & 0 deletions app/views/solid_queue_web/performance/index.html.erb
@@ -0,0 +1,50 @@
<div class="sqd-page-header">
  <h1 class="sqd-page-title">Performance</h1>
</div>

<form class="sqd-search" action="<%= performance_path %>" method="get">
  <div class="sqd-period-filter" role="group" aria-label="Time period">
    <%= link_to "All", performance_path, class: @period.nil? ? "active" : "", aria: { current: @period.nil? ? "true" : nil } %>
    <%= link_to "1h", performance_path(period: "1h"), class: @period == "1h" ? "active" : "", aria: { current: @period == "1h" ? "true" : nil } %>
    <%= link_to "24h", performance_path(period: "24h"), class: @period == "24h" ? "active" : "", aria: { current: @period == "24h" ? "true" : nil } %>
    <%= link_to "7d", performance_path(period: "7d"), class: @period == "7d" ? "active" : "", aria: { current: @period == "7d" ? "true" : nil } %>
  </div>
</form>

<% if @rows.any? %>
  <div class="sqd-card" style="margin-top: 1rem;">
    <table>
      <thead>
        <tr>
          <th scope="col">Job Class</th>
          <th scope="col" style="text-align: right;">Runs</th>
          <th scope="col" style="text-align: right;">Avg</th>
          <th scope="col" style="text-align: right;">p50</th>
          <th scope="col" style="text-align: right;">p95</th>
          <th scope="col" style="text-align: right;">Min</th>
          <th scope="col" style="text-align: right;">Max</th>
        </tr>
      </thead>
      <tbody>
        <% @rows.each do |row| %>
          <tr>
            <td>
              <%= link_to row.class_name, history_path(q: row.class_name, period: @period),
                    class: "sqd-table-link" %>
            </td>
            <td class="sqd-mono" style="text-align: right;"><%= row.count %></td>
            <td class="sqd-mono" style="text-align: right;"><%= format_duration(row.avg) %></td>
            <td class="sqd-mono" style="text-align: right;"><%= format_duration(row.p50) %></td>
            <td class="sqd-mono" style="text-align: right;"><%= format_duration(row.p95) %></td>
            <td class="sqd-mono" style="text-align: right;"><%= format_duration(row.min) %></td>
            <td class="sqd-mono" style="text-align: right;"><%= format_duration(row.max) %></td>
          </tr>
        <% end %>
      </tbody>
    </table>
  </div>
<% else %>
  <div class="sqd-card" style="margin-top: 1rem;">
    <div class="sqd-empty">No finished jobs found<%= " in the last #{@period}" if @period %>.</div>
  </div>
<% end %>
5 changes: 3 additions & 2 deletions config/routes.rb
@@ -3,8 +3,9 @@
   resource :blocked_jobs, only: [:destroy]

   get "metrics", to: "metrics#index", as: :metrics, defaults: { format: :json }
-  get "search", to: "search#index", as: :search
-  get "history", to: "history#index", as: :history
+  get "search",      to: "search#index",      as: :search
+  get "history",     to: "history#index",     as: :history
+  get "performance", to: "performance#index", as: :performance

   resources :recurring_tasks, only: [:index], param: :key do
     resource :run, only: [:create], controller: "recurring_tasks/runs"
96 changes: 96 additions & 0 deletions spec/requests/solid_queue_web/performance_spec.rb
@@ -0,0 +1,96 @@
require "rails_helper"

RSpec.describe "Performance", type: :request do
  def finished_job(class_name:, duration_seconds:, finished_ago: 1.hour)
    finished = finished_ago.ago
    job = SolidQueue::Job.new(
      queue_name: "default", class_name: class_name,
      arguments: {}.to_json, priority: 0, active_job_id: SecureRandom.uuid
    )
    job.finished_at = finished
    job.created_at = finished - duration_seconds
    job.updated_at = finished
    job.save!(validate: false)
    job
  end

  describe "GET /jobs/performance" do
    it "returns HTTP success" do
      get "/jobs/performance"
      expect(response).to have_http_status(:ok)
    end

    it "displays the Performance heading" do
      get "/jobs/performance"
      expect(response.body).to include("Performance")
    end

    it "shows an empty state when no finished jobs exist" do
      get "/jobs/performance"
      expect(response.body).to include("No finished jobs found")
    end

    it "renders a row for each distinct job class" do
      finished_job(class_name: "AlphaJob", duration_seconds: 10)
      finished_job(class_name: "BetaJob", duration_seconds: 20)

      get "/jobs/performance"
      expect(response.body).to include("AlphaJob")
      expect(response.body).to include("BetaJob")
    end

    it "links each job class to the history page filtered by that class" do
      finished_job(class_name: "AlphaJob", duration_seconds: 10)

      get "/jobs/performance"
      expect(response.body).to include("/jobs/history?q=AlphaJob")
    end

    it "renders period filter pills" do
      get "/jobs/performance"
      expect(response.body).to include("1h")
      expect(response.body).to include("24h")
      expect(response.body).to include("7d")
    end

    it "filters results to the selected period" do
      finished_job(class_name: "RecentJob", duration_seconds: 5, finished_ago: 30.minutes)
      finished_job(class_name: "OldJob", duration_seconds: 5, finished_ago: 48.hours)

      get "/jobs/performance", params: { period: "24h" }
      expect(response.body).to include("RecentJob")
      expect(response.body).not_to include("OldJob")
    end

    it "shows empty state message with period when no jobs match the filter" do
      get "/jobs/performance", params: { period: "1h" }
      expect(response.body).to include("No finished jobs found in the last 1h")
    end

    it "sorts rows by p95 descending (slowest class first)" do
      finished_job(class_name: "FastJob", duration_seconds: 2)
      finished_job(class_name: "SlowJob", duration_seconds: 120)

      get "/jobs/performance"
      slow_pos = response.body.index("SlowJob")
      fast_pos = response.body.index("FastJob")
      expect(slow_pos).to be < fast_pos
    end

    describe "authentication" do
      after { SolidQueueWeb.instance_variable_set(:@authenticate, nil) }

      it "allows access when auth block returns truthy" do
        SolidQueueWeb.authenticate { true }
        get "/jobs/performance"
        expect(response).to have_http_status(:ok)
      end

      it "returns 401 when auth block returns falsy" do
        SolidQueueWeb.authenticate { false }
        get "/jobs/performance"
        expect(response).to have_http_status(:unauthorized)
      end
    end
  end
end
83 changes: 83 additions & 0 deletions spec/services/solid_queue_web/job_performance_stats_spec.rb
@@ -0,0 +1,83 @@
require "rails_helper"

RSpec.describe SolidQueueWeb::JobPerformanceStats do
  def finished_job(class_name:, duration_seconds:, finished_ago: 1.hour)
    finished = finished_ago.ago
    job = SolidQueue::Job.new(
      queue_name: "default", class_name: class_name,
      arguments: {}.to_json, priority: 0, active_job_id: SecureRandom.uuid
    )
    job.finished_at = finished
    job.created_at = finished - duration_seconds
    job.updated_at = finished
    job.save!(validate: false)
    job
  end

  let(:scope) { SolidQueue::Job.where.not(finished_at: nil) }

  describe "#rows" do
    it "returns an empty array when no finished jobs exist" do
      expect(described_class.new(scope).rows).to be_empty
    end

    it "returns one row per distinct job class" do
      finished_job(class_name: "AlphaJob", duration_seconds: 10)
      finished_job(class_name: "BetaJob", duration_seconds: 20)

      rows = described_class.new(scope).rows
      expect(rows.map(&:class_name)).to match_array(%w[AlphaJob BetaJob])
    end

    it "computes count correctly" do
      3.times { finished_job(class_name: "RepeatedJob", duration_seconds: 10) }

      row = described_class.new(scope).rows.find { |r| r.class_name == "RepeatedJob" }
      expect(row.count).to eq(3)
    end

    it "computes avg, min, and max correctly" do
      finished_job(class_name: "MathJob", duration_seconds: 10)
      finished_job(class_name: "MathJob", duration_seconds: 20)
      finished_job(class_name: "MathJob", duration_seconds: 30)

      row = described_class.new(scope).rows.find { |r| r.class_name == "MathJob" }
      expect(row.avg).to be_within(0.5).of(20)
      expect(row.min).to be_within(0.5).of(10)
      expect(row.max).to be_within(0.5).of(30)
    end

    it "computes p50 as the median" do
      [10, 20, 30, 40, 50].each { |d| finished_job(class_name: "P50Job", duration_seconds: d) }

      row = described_class.new(scope).rows.find { |r| r.class_name == "P50Job" }
      expect(row.p50).to be_within(0.5).of(30)
    end

    it "computes p95 near the high end of the distribution" do
      20.times { |i| finished_job(class_name: "P95Job", duration_seconds: i + 1) }

      row = described_class.new(scope).rows.find { |r| r.class_name == "P95Job" }
      expect(row.p95).to be_within(1).of(19)
    end

    it "sorts rows by p95 descending" do
      finished_job(class_name: "FastJob", duration_seconds: 2)
      finished_job(class_name: "SlowJob", duration_seconds: 120)

      rows = described_class.new(scope).rows
      expect(rows.first.class_name).to eq("SlowJob")
      expect(rows.last.class_name).to eq("FastJob")
    end

    it "respects a pre-filtered scope" do
      finished_job(class_name: "InScopeJob", duration_seconds: 10, finished_ago: 30.minutes)
      finished_job(class_name: "OutScopeJob", duration_seconds: 10, finished_ago: 48.hours)

      filtered = scope.where("finished_at >= ?", 1.hour.ago)
      rows = described_class.new(filtered).rows
      expect(rows.map(&:class_name)).to include("InScopeJob")
      expect(rows.map(&:class_name)).not_to include("OutScopeJob")
    end
  end
end
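
Reviewer note: the expectations in this spec can also be checked by hand without a database. The sketch below feeds hand-built `[class_name, created_at, finished_at]` tuples (the shape a three-column `pluck` returns) through the same group, sort, and nearest-rank steps as the service. `Stat` and `summarize` are illustrative names, and plain `Time` objects stand in for database timestamps, so this is an approximation of the service rather than the service itself.

```ruby
Stat = Struct.new(:class_name, :count, :avg, :p50, :p95, :min, :max, keyword_init: true)

# Nearest-rank percentile over a non-empty ascending array.
def percentile(sorted, pct)
  sorted[[(pct / 100.0 * sorted.size).ceil - 1, 0].max]
end

# tuples: array of [class_name, created_at, finished_at]
def summarize(tuples)
  tuples.group_by(&:first).map do |class_name, records|
    # Time - Time yields seconds as a Float.
    durations = records.map { |_, created, finished| (finished - created).to_f }.sort
    Stat.new(
      class_name: class_name, count: durations.size,
      avg: durations.sum / durations.size,
      p50: percentile(durations, 50), p95: percentile(durations, 95),
      min: durations.first, max: durations.last
    )
  end.sort_by { |s| -s.p95 } # slowest p95 first
end

t0 = Time.at(0)
tuples = [
  ["MathJob", t0, t0 + 10],
  ["MathJob", t0, t0 + 20],
  ["MathJob", t0, t0 + 30],
  ["SlowJob", t0, t0 + 120]
]

summarize(tuples).each do |s|
  puts format("%-8s runs=%d avg=%.1f p50=%.1f p95=%.1f", s.class_name, s.count, s.avg, s.p50, s.p95)
end
```

With three `MathJob` samples the p95 lands on the largest value (30 s), which is why the spec for p95 uses 20 samples to get a value below the maximum.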