diff --git a/CHANGELOG.md b/CHANGELOG.md index 2582114..7459ebd 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] +### Added + +- Error frequency report — `GET /failed_jobs/errors` groups all failed jobs by exception class and message prefix, showing count and an expandable sample backtrace per group; links through to a filtered failed jobs list via `?error_class=`; the failed jobs index gains an "Error Summary" button and shows an active-filter breadcrumb with a clear link + ## [1.0.0] - 2026-05-27 ### Added diff --git a/README.md b/README.md index 202c3fa..ca3dcac 100644 --- a/README.md +++ b/README.md @@ -21,23 +21,13 @@ Run: bundle install ``` -Mount the engine in `config/routes.rb`: - -```ruby -mount SolidStackWeb::Engine, at: "/solid_stack" -``` - -The dashboard will be available at `/solid_stack` (or whatever path you choose). - -### Install generator - Run the install generator to create a documented initializer and wire up the mount point in one step: ```bash rails generate solid_stack_web:install ``` -This creates `config/initializers/solid_stack_web.rb` with every configuration option commented inline, and injects `mount SolidStackWeb::Engine, at: "/solid_stack"` into `config/routes.rb`. +This creates `config/initializers/solid_stack_web.rb` with every configuration option commented inline, and injects `mount SolidStackWeb::Engine, at: "/solid_stack"` into `config/routes.rb`. The dashboard will then be available at `/solid_stack` (or whatever path you choose). --- @@ -77,7 +67,7 @@ This creates `config/initializers/solid_stack_web.rb` with every configuration o ## General configuration -Create an initializer at `config/initializers/solid_stack_web.rb`: +The install generator creates `config/initializers/solid_stack_web.rb` with all options documented inline. 
The available options are: ```ruby SolidStackWeb.configure do |config| @@ -159,6 +149,7 @@ The dashboard is designed to be mounted behind your application's existing authe - **Queue depth sparklines** — Queues index shows a 12-hour depth chart per queue; each bar is the ready-job count at an hourly snapshot with an instant hover tooltip - **Job detail page** — full arguments (pretty-printed JSON), queue, priority, enqueued time, Active Job ID, concurrency key, scheduled/blocked-until metadata, and a Discard button - **Failed jobs** — list with retry / discard / bulk retry / bulk discard; **Failed job detail page** — full error, backtrace, and an inline JSON argument editor; submit to update arguments and retry in one action +- **Error frequency report** — `GET /failed_jobs/errors` groups all failed jobs by exception class and message prefix with a count and expandable sample backtrace; links through to a filtered list for each error group - **Scheduled job management** — "Run Now" and offset buttons (+1h / +24h / +7d) per row update the scheduled time inline via Turbo Stream; "Run All Now (N)" back-dates all matching executions at once - **Recurring task list** — enumerates all `SolidQueue::RecurringTask` records with cron schedule, job class or command, queue, next-run and last-run times, and a static/dynamic badge; each row has a "Run Now" button - **Performance statistics page** — `GET /stats` aggregates finished jobs by class name with execution count, avg, p50, p95, min, and max duration; click any column header to sort; defaults to p95 descending diff --git a/ROADMAP.md b/ROADMAP.md index 7b09ddd..988dbb4 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -11,7 +11,6 @@ The path to v1.0.0 is staged: first achieve feature parity with `solid_queue_das > _Surface patterns in failures, not just individual failed jobs._ -- **Error frequency report** — group all failed jobs by error class + message prefix; show count and a sample backtrace; makes "ArgumentError (x212), 
TimeoutError (x88)" visible at a glance without clicking into each job - **Failed job trend chart** — "Failures — Last 12 Hours" sparkline on the queue dashboard overview card; makes failure spikes visible before you click into the failed jobs list - **P99 + standard deviation in performance stats** — extend the stats table with a 99th-percentile and std-dev column; high std dev signals inconsistent jobs worth investigating diff --git a/app/assets/stylesheets/solid_stack_web/_09_detail.css b/app/assets/stylesheets/solid_stack_web/_09_detail.css index 608f2a8..4c44a95 100644 --- a/app/assets/stylesheets/solid_stack_web/_09_detail.css +++ b/app/assets/stylesheets/solid_stack_web/_09_detail.css @@ -84,6 +84,29 @@ .sqw-value-truncated { font-size: 12px; margin-top: 0.5rem; } +.sqw-error-details > summary { + cursor: pointer; + list-style: none; + display: block; + max-width: 480px; +} +.sqw-error-details > summary::-webkit-details-marker { display: none; } +.sqw-error-details[open] > summary { margin-bottom: 0.5rem; } + +.sqw-error-backtrace { + font-family: ui-monospace, "SFMono-Regular", Menlo, monospace; + font-size: 11px; + background: var(--bg); + border: 1px solid var(--border); + border-radius: var(--radius); + padding: 0.5rem 0.75rem; + overflow-x: auto; + white-space: pre; + max-height: 200px; + overflow-y: auto; + margin-top: 0.25rem; +} + .sqw-link { color: var(--primary); text-decoration: none; } .sqw-link:hover { text-decoration: underline; } diff --git a/app/controllers/solid_stack_web/failed_jobs/errors_controller.rb b/app/controllers/solid_stack_web/failed_jobs/errors_controller.rb new file mode 100644 index 0000000..367ec61 --- /dev/null +++ b/app/controllers/solid_stack_web/failed_jobs/errors_controller.rb @@ -0,0 +1,9 @@ +module SolidStackWeb + module FailedJobs + class ErrorsController < ApplicationController + def index + @groups = ErrorFrequencyReport.new.groups + end + end + end +end diff --git 
a/app/controllers/solid_stack_web/failed_jobs_controller.rb b/app/controllers/solid_stack_web/failed_jobs_controller.rb index 966c51e..11c8c1a 100644 --- a/app/controllers/solid_stack_web/failed_jobs_controller.rb +++ b/app/controllers/solid_stack_web/failed_jobs_controller.rb @@ -4,6 +4,8 @@ def index respond_to do |format| format.html do scope = ::SolidQueue::FailedExecution.includes(:job).order(created_at: :desc) + @error_class = params[:error_class].presence + scope = scope.where(id: ids_for_error_class(@error_class)) if @error_class @pagy, @executions = pagy(scope) end format.csv do @@ -41,6 +43,15 @@ def retry private + def ids_for_error_class(ec) + ::SolidQueue::FailedExecution.pluck(:id, :error).filter_map do |id, raw| + error = raw.is_a?(Hash) ? raw : JSON.parse(raw) + id if error["exception_class"] == ec + rescue StandardError + nil + end + end + def failed_jobs_csv CSV.generate(headers: true) do |csv| csv << %w[id class_name queue_name error_class error_message failed_at] diff --git a/app/models/solid_stack_web/error_frequency_report.rb b/app/models/solid_stack_web/error_frequency_report.rb new file mode 100644 index 0000000..f3e8d0d --- /dev/null +++ b/app/models/solid_stack_web/error_frequency_report.rb @@ -0,0 +1,34 @@ +module SolidStackWeb + class ErrorFrequencyReport + Row = Data.define(:exception_class, :message_prefix, :count, :sample_backtrace) + + MESSAGE_LIMIT = 120 + + def groups + ::SolidQueue::FailedExecution + .order(created_at: :desc) + .each_with_object({}) do |execution, acc| + key = [execution.exception_class.to_s, message_prefix(execution.message)] + entry = acc[key] ||= { count: 0, sample_backtrace: nil } + entry[:count] += 1 + entry[:sample_backtrace] ||= execution.backtrace + end + .map do |(exception_class, prefix), data| + Row.new( + exception_class: exception_class, + message_prefix: prefix, + count: data[:count], + sample_backtrace: data[:sample_backtrace] + ) + end + .sort_by { |row| -row.count } + end + + private + + def 
message_prefix(message) + return "" if message.nil? + message.length > MESSAGE_LIMIT ? "#{message[0, MESSAGE_LIMIT]}…" : message + end + end +end diff --git a/app/views/solid_stack_web/failed_jobs/errors/index.html.erb b/app/views/solid_stack_web/failed_jobs/errors/index.html.erb new file mode 100644 index 0000000..cf67cab --- /dev/null +++ b/app/views/solid_stack_web/failed_jobs/errors/index.html.erb @@ -0,0 +1,48 @@ +
+

Error Summary

+
+ <%= link_to "← Failed Jobs", failed_jobs_path, class: "sqw-btn sqw-btn--muted sqw-btn--sm" %> +
+
+ +<% if @groups.any? %> + + + + + + + + + + + <% @groups.each do |group| %> + + + + + + + <% end %> + +
Error ClassMessageCountActions
<%= group.exception_class.presence || "—" %> + <% if group.sample_backtrace.present? %> +
+ + <%= group.message_prefix.presence || "—" %> + +
<%= Array(group.sample_backtrace).first(10).join("\n") %>
+
+ <% else %> + <%= group.message_prefix.presence || "—" %> + <% end %> +
<%= group.count %> + <%= link_to "View Jobs", failed_jobs_path(error_class: group.exception_class), + class: "sqw-btn sqw-btn--muted sqw-btn--sm" %> +
+<% else %> +
+

No failed jobs

+

All clear — your jobs are running without errors.

+
+<% end %> \ No newline at end of file diff --git a/app/views/solid_stack_web/failed_jobs/index.html.erb b/app/views/solid_stack_web/failed_jobs/index.html.erb index 3f3a8d9..29f1f2e 100644 --- a/app/views/solid_stack_web/failed_jobs/index.html.erb +++ b/app/views/solid_stack_web/failed_jobs/index.html.erb @@ -1,6 +1,15 @@
-

Failed Jobs

+
+

Failed Jobs

+ <% if @error_class %> +
+ Filtered by <%= @error_class %> + — <%= link_to "Clear filter", failed_jobs_path %> +
+ <% end %> +
+ <%= link_to "Error Summary", failed_job_errors_path, class: "sqw-btn sqw-btn--muted sqw-btn--sm" %> <%= link_to "Export CSV", failed_jobs_path(format: :csv), class: "sqw-btn sqw-btn--muted sqw-btn--sm", data: { turbo: false } %>
diff --git a/config/routes.rb b/config/routes.rb index 87f36a7..7d832b3 100644 --- a/config/routes.rb +++ b/config/routes.rb @@ -20,6 +20,8 @@ end end + get "failed_jobs/errors", to: "failed_jobs/errors#index", as: :failed_job_errors + resources :failed_jobs, only: [:index, :show, :destroy] do member { post :retry } resource :arguments, only: [:update], controller: "failed_jobs/arguments" diff --git a/spec/requests/solid_stack_web/failed_job_errors_spec.rb b/spec/requests/solid_stack_web/failed_job_errors_spec.rb new file mode 100644 index 0000000..3d39130 --- /dev/null +++ b/spec/requests/solid_stack_web/failed_job_errors_spec.rb @@ -0,0 +1,117 @@ +require "rails_helper" + +RSpec.describe "FailedJobErrors", type: :request do + let(:engine_root) { "/solid_stack" } + + def create_failed(exception_class: "RuntimeError", message: "something went wrong", class_name: "FailingJob") + SolidQueue::Job.skip_callback(:create, :after, :prepare_for_execution) + job = SolidQueue::Job.create!( + class_name:, queue_name: "default", priority: 0, + arguments: { "executions" => 0, "exception_executions" => {} } + ) + execution = SolidQueue::FailedExecution.create!( + job: job, + error: { exception_class:, message:, backtrace: ["app/jobs/failing_job.rb:5"] } + ) + SolidQueue::Job.set_callback(:create, :after, :prepare_for_execution) + execution + end + + describe "GET /failed_jobs/errors" do + it "returns 200" do + get "#{engine_root}/failed_jobs/errors" + expect(response).to have_http_status(:ok) + end + + it "shows an empty state when there are no failed jobs" do + get "#{engine_root}/failed_jobs/errors" + expect(response.body).to include("No failed jobs") + end + + it "groups jobs by exception class and shows the count" do + 2.times { create_failed(exception_class: "ArgumentError", message: "bad arg") } + create_failed(exception_class: "RuntimeError", message: "boom") + + get "#{engine_root}/failed_jobs/errors" + + expect(response.body).to include("ArgumentError") + 
expect(response.body).to include("RuntimeError") + expect(response.body).to include("2") + end + + it "shows the message prefix" do + create_failed(exception_class: "ArgumentError", message: "wrong number of arguments") + + get "#{engine_root}/failed_jobs/errors" + + expect(response.body).to include("wrong number of arguments") + end + + it "orders groups by count descending" do + 3.times { create_failed(exception_class: "ArgumentError") } + create_failed(exception_class: "RuntimeError") + + get "#{engine_root}/failed_jobs/errors" + + argument_pos = response.body.index("ArgumentError") + runtime_pos = response.body.index("RuntimeError") + expect(argument_pos).to be < runtime_pos + end + + it "renders a View Jobs link for each error group" do + create_failed(exception_class: "TimeoutError") + + get "#{engine_root}/failed_jobs/errors" + + expect(response.body).to include("View Jobs") + expect(response.body).to include("error_class=TimeoutError") + end + + it "includes a link back to the failed jobs list" do + get "#{engine_root}/failed_jobs/errors" + expect(response.body).to include("Failed Jobs") + end + end + + describe "GET /failed_jobs with error_class filter" do + it "filters the list to the given error class" do + create_failed(exception_class: "ArgumentError", class_name: "ArgJob") + create_failed(exception_class: "RuntimeError", class_name: "RuntimeJob") + + get "#{engine_root}/failed_jobs", params: { error_class: "ArgumentError" } + + expect(response.body).to include("ArgJob") + expect(response.body).not_to include("RuntimeJob") + end + + it "shows the active filter and a clear link" do + create_failed(exception_class: "ArgumentError") + + get "#{engine_root}/failed_jobs", params: { error_class: "ArgumentError" } + + expect(response.body).to include("ArgumentError") + expect(response.body).to include("Clear filter") + end + + it "shows all jobs when no filter is set" do + create_failed(exception_class: "ArgumentError", class_name: "ArgJob") + 
create_failed(exception_class: "RuntimeError", class_name: "RuntimeJob") + + get "#{engine_root}/failed_jobs" + + expect(response.body).to include("ArgJob") + expect(response.body).to include("RuntimeJob") + end + + it "skips executions with unparseable error data" do + execution = create_failed(exception_class: "ArgumentError", class_name: "ArgJob") + allow(::SolidQueue::FailedExecution).to receive(:pluck).and_return( + [[execution.id, "not valid json {{{"]] + ) + + get "#{engine_root}/failed_jobs", params: { error_class: "ArgumentError" } + + expect(response).to have_http_status(:ok) + end + end +end
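
Reviewer note: the grouping in `ErrorFrequencyReport#groups` can be tried outside Rails with a minimal plain-Ruby sketch like the one below. `FakeExecution` and `group_errors` are hypothetical names introduced only for illustration, standing in for `SolidQueue::FailedExecution` records and the `groups` method. `Struct` replaces `Data.define` so the sketch also runs on Ruby versions before 3.2. It is an approximation of the logic in the diff, not the engine's actual code path.

```ruby
# Hypothetical stand-in for SolidQueue::FailedExecution (illustration only).
FakeExecution = Struct.new(:exception_class, :message, :backtrace, keyword_init: true)

# Mirrors the Row value object from the diff, using Struct instead of Data.define.
Row = Struct.new(:exception_class, :message_prefix, :count, :sample_backtrace, keyword_init: true)

MESSAGE_LIMIT = 120

# Truncate long messages so near-identical errors fall into the same group.
def message_prefix(message)
  return "" if message.nil?
  message.length > MESSAGE_LIMIT ? "#{message[0, MESSAGE_LIMIT]}…" : message
end

# Group executions by [exception class, message prefix], keep the first
# backtrace seen per group as the sample, and sort by count descending.
def group_errors(executions)
  executions
    .each_with_object({}) do |execution, acc|
      key = [execution.exception_class.to_s, message_prefix(execution.message)]
      entry = acc[key] ||= { count: 0, sample_backtrace: nil }
      entry[:count] += 1
      entry[:sample_backtrace] ||= execution.backtrace
    end
    .map do |(exception_class, prefix), data|
      Row.new(
        exception_class: exception_class,
        message_prefix: prefix,
        count: data[:count],
        sample_backtrace: data[:sample_backtrace]
      )
    end
    .sort_by { |row| -row.count }
end
```

Because the sample backtrace is taken from the first execution encountered in each group, the `order(created_at: :desc)` in the diff means the sample comes from the most recent failure. The real method iterates every `FailedExecution` row in memory, so batching may be worth considering for large tables.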