fix: Optimize cleanup jobs, eliminate raw ES client usage, add test coverage #2267
niemyjski wants to merge 1 commit into
namespace Exceptionless.Tests.Jobs;

public class CleanupOrphanedDataJobTests : IntegrationTestsBase

Test names need three parts, and test bodies should follow arrange, act, assert. Check the whole PR.

}

[Fact]
public async Task CanCleanupMultipleSoftDeletedOrganizations()

Test names need three parts, and test bodies should follow arrange, act, assert. Check the whole PR.

    );
}

public async Task<IReadOnlyCollection<string>> GetDuplicateSignaturesAsync(int maxResults = 10000)

Make sure we have integration tests for this.

public async Task<IReadOnlyCollection<string>> GetDuplicateSignaturesAsync(int maxResults = 10000)
{
    var result = await CountAsync(q => q.FilterExpression("is_deleted:false")

Soft deletes should be filtered out by default.

if (buckets == null || buckets.Count == 0)
    return Array.Empty<string>();

Use pattern matching here and return [].
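
A minimal sketch of what this comment asks for, assuming C# 12 (collection expressions) is available. `BucketHelpers`, `KeysOrEmpty`, and the `KeyValuePair` bucket type are hypothetical stand-ins for the aggregation buckets in the hunk above, not the repository's actual types.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

public static class BucketHelpers
{
    // Hypothetical helper: `buckets` stands in for the terms-aggregation buckets.
    public static IReadOnlyCollection<string> KeysOrEmpty(
        IReadOnlyCollection<KeyValuePair<string, long>>? buckets)
    {
        // One pattern replaces `buckets == null || buckets.Count == 0`.
        if (buckets is null or { Count: 0 })
            return []; // collection expression instead of Array.Empty<string>()

        return buckets.Select(b => b.Key).ToArray();
    }
}
```

The property pattern `{ Count: 0 }` only matches a non-null collection, so the null check and the emptiness check stay in a single expression.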

public Task<long> RemoveAllByProjectIdsAsync(string[] projectIds)
{
    ArgumentNullException.ThrowIfNull(projectIds);
    if (projectIds.Length == 0)

Prefer `is` and `is not` over `==` and `!=`. Check the whole PR.

    return RemoveAllAsync(q => q.Organization(organizationIds));
}

public Task<long> ReassignStackAsync(IEnumerable<string> sourceStackIds, string targetStackId)

This feels dangerous. Where is it used? Make sure we have integration tests for this and for every method added to the repositories.

if (composite?.Buckets == null || composite.Buckets.Count == 0)
    return Array.Empty<string>();

Use pattern matching here and return []. Check the PR for other cases like this.

{
    var search = await _configuration.Client.SearchAsync<PersistentEvent>(s =>
    {
        s.Size(0).Aggregations(a => a

We should document here why we are using a composite aggregation, along with any performance concerns.

    Task<IReadOnlyCollection<string>> GetDistinctOrganizationIdsAsync(int batchSize, CompositeKeyResult? afterKey = null);
}

public class CompositeKeyResult

catch (Exception ex)
{
    error++;
    _logger.LogError(ex, "Error fixing duplicate stack {ProjectId} {SignatureHash}", projectId, signature);

Look at all the log statements here. We always include :{Message}.

await _stackRepository.SaveAsync(duplicateStacks);
await _stackRepository.SaveAsync(targetStack);

Why do we call this twice when a single call to the collection overload with [] would do?

namespace Exceptionless.Core.Jobs;

[Job(Description = "Deletes orphaned data.", IsContinuous = false)]
public class CleanupOrphanedDataJob : JobWithLockBase, IHealthCheck

Make sure we have 100% coverage for this before we change it, and that the tests pass on origin/main.

private async Task CleanupSoftDeletedProjectsAsync(JobContext context)
{
-   var projectResults = await _projectRepository.GetAllAsync(o => o.SoftDeleteMode(SoftDeleteQueryMode.DeletedOnly).SearchAfterPaging().PageLimit(5));
+   var projectResults = await _projectRepository.GetAllAsync(o => o.SoftDeleteMode(SoftDeleteQueryMode.DeletedOnly).SearchAfterPaging().PageLimit(100));

Should we be changing the page size in this PR? Wouldn't it be better to fetch smaller batches of projects, in case they are removed etc., and search over those?

    var projectResults = await _projectRepository.GetAllAsync(o => o.SoftDeleteMode(SoftDeleteQueryMode.DeletedOnly).SearchAfterPaging().PageLimit(100));
    _logger.CleanupProjectSoftDeletes(projectResults.Total);

    while (projectResults.Documents.Count > 0 && !context.CancellationToken.IsCancellationRequested)

During cleanup, what's renewing the job lock?

Force-pushed from 154223e to 7788c75.

Pull request overview
This PR refactors cleanup jobs to rely on repository abstractions, improves duplicate stack cleanup behavior, and adds integration coverage for cleanup/repository operations.
Changes:
- Reworked orphaned-data and cleanup jobs with lock renewal, cancellation checks, and repository-based deletes/updates.
- Added event repository helpers for bulk deletion, stack reassignment, and distinct-id aggregation.
- Added integration tests for cleanup pagination, retention, duplicate signatures, and event repository operations.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/Exceptionless.Core/Jobs/CleanupDataJob.cs | Extends lock duration and renews locks during paged cleanup/retention loops. |
| src/Exceptionless.Core/Jobs/CleanupOrphanedDataJob.cs | Replaces direct Elasticsearch calls with repository methods for orphan cleanup and duplicate-stack fixing. |
| src/Exceptionless.Core/Repositories/EventRepository.cs | Adds bulk delete, stack reassignment, and composite aggregation helpers. |
| src/Exceptionless.Core/Repositories/Interfaces/IEventRepository.cs | Exposes new event repository cleanup/query APIs and composite cursor type. |
| src/Exceptionless.Core/Repositories/StackRepository.cs | Adds duplicate signature aggregation lookup. |
| src/Exceptionless.Core/Repositories/Interfaces/IStackRepository.cs | Exposes duplicate signature lookup API. |
| tests/Exceptionless.Tests/Jobs/CleanupDataJobTests.cs | Adds cleanup pagination and retention integration coverage. |
| tests/Exceptionless.Tests/Jobs/CleanupOrphanedDataJobTests.cs | Adds integration coverage for orphan cleanup and duplicate stack merging. |
| tests/Exceptionless.Tests/Repositories/EventRepositoryTests.cs | Adds coverage for distinct ids, stack reassignment, and bulk delete helpers. |
| tests/Exceptionless.Tests/Repositories/StackRepositoryTests.cs | Adds duplicate signature repository coverage. |

public Task<long> ReassignStackAsync(IEnumerable<string> sourceStackIds, string targetStackId)
{
    ArgumentNullException.ThrowIfNull(sourceStackIds);
    ArgumentException.ThrowIfNullOrEmpty(targetStackId);

    return PatchAllAsync(
        q => q.Stack(sourceStackIds),
        new ScriptPatch("ctx._source.stack_id = params.targetStackId")

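
The hunk above passes `sourceStackIds` straight into the query filter. The PR description further down notes that an empty sequence would leave the patch unfiltered, so here is a rough, self-contained sketch of the "materialize, then return early" guard it describes. Everything in it is an assumption for illustration: `EventDoc`, `ReassignSketch`, and the in-memory `PatchAll` are hypothetical stand-ins and do not reproduce the repository's `PatchAllAsync`.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory event; only the stack id matters here.
public sealed class EventDoc
{
    public string Id { get; init; } = "";
    public string StackId { get; set; } = "";
}

public static class ReassignSketch
{
    // Stand-in for a patch-by-query call. Assumption: with no ids the
    // filter is skipped, so every event matches.
    private static long PatchAll(List<EventDoc> events, IReadOnlyCollection<string> stackIds, string target)
    {
        List<EventDoc> matches = stackIds.Count == 0
            ? events
            : events.Where(e => stackIds.Contains(e.StackId)).ToList();

        foreach (var e in matches)
            e.StackId = target;

        return matches.Count;
    }

    public static long ReassignStack(List<EventDoc> events, IEnumerable<string> sourceStackIds, string targetStackId)
    {
        ArgumentNullException.ThrowIfNull(sourceStackIds);
        ArgumentException.ThrowIfNullOrEmpty(targetStackId);

        // Materialize once so the emptiness check and the filter see the same ids.
        string[] ids = sourceStackIds.ToArray();
        if (ids is [])
            return 0; // nothing to move; never fall through to an unfiltered patch

        return PatchAll(events, ids, targetStackId);
    }
}
```

Materializing first also avoids enumerating a lazy `IEnumerable<string>` twice, once for the check and once for the filter.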
- var buckets = duplicateStackAgg.Aggregations.Terms("stacks")?.Buckets ?? new List<Nest.KeyedBucket<string>>();
- int total = buckets.Count;
+ var duplicateSignatures = await _stackRepository.GetDuplicateSignaturesAsync();

…overage

- Refactor CleanupOrphanedDataJob to use repository methods exclusively (eliminates all direct IElasticClient usage)
- Fix critical bug: @min_count:2 → @min:2 in GetDuplicateSignaturesAsync (invalid syntax was silently ignored, returning ALL signatures as duplicates)
- Add lock renewal at page boundaries in CleanupDataJob
- Add OperationCanceledException filter before generic catch
- Convert CompositeKeyResult class to record
- Use pattern matching (is null/is not null, is []) throughout
- Remove redundant is_deleted:false filter (repository handles soft deletes)
- Revert page size to 5 (2.5s sleep/item makes large pages impractical)
- Consolidate duplicate SaveAsync calls using spread syntax
- Add XML documentation for composite aggregation and script safety
- Add 8 new integration tests for EventRepository and StackRepository
- Rename all tests to Method_Given_Expected convention

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Force-pushed from 7788c75 to 3c5880b.

Summary
Comprehensive optimization of cleanup jobs: eliminate all direct Elasticsearch client usage, fix critical bugs, restore correct loop semantics, and add full integration test coverage.

Critical Bug Fixes
- `@min_count:2` → `@min:2` in `GetDuplicateSignaturesAsync`
  The `@min_count` syntax is silently ignored by Foundatio Parsers, causing the aggregation to return ALL signature hashes (not just duplicates). This would have caused `FixDuplicateStacks` to treat every stack as a duplicate on the next run.
- `FixDuplicateStacks` only ran one batch (regression from refactor)
  The original code looped until `GetDuplicateSignaturesAsync` returned empty. The refactor broke this. Restored the `while` loop with:
  - `ImmediateConsistency()` on `CountAsync` inside `GetDuplicateSignaturesAsync` (one refresh per batch, matching the original `Indices.RefreshAsync` call pattern, NOT per item)
- `ReassignStackAsync` data-loss hazard on empty sequence
  If `sourceStackIds` was empty, `PatchAllAsync` would have no stack filter and would reassign ALL events to the target stack. Added materialization + early return guard.
- `FixDuplicateStacks` event-first ordering
  Moved `ReassignStackAsync` before `SaveAsync` (soft-delete). If event reassignment fails, duplicate stacks remain visible and no data is lost. Previously: soft-delete first → event reassignment failure → orphan cleanup deletes the events.
- `GetDistinctFieldValuesAsync` afterKey cursor leak
  `afterKey` was only populated when `composite.AfterKey != null`, never cleared. Added: always clear cursor first, then repopulate. Callers checking `afterKey.AfterKey.Count > 0` now correctly detect end-of-pagination.

Architecture (eliminate raw ES client from jobs)
- Refactored `CleanupOrphanedDataJob` to use repository methods exclusively
- Added `GetDistinctFieldValuesAsync` using composite aggregation (encapsulated in repository; composite aggregation is not in Foundatio's DSL, so raw client use is justified and documented)
- Added `RemoveAllByProjectIds`/`RemoveAllByOrganizationIds`, `RemoveAllByStackIdsAsync` to `IEventRepository`
- Added `ReassignStackAsync` using parameterized Painless script
- Added `GetDuplicateSignaturesAsync` to `IStackRepository`

Other Fixes
- Added `OperationCanceledException` filter before generic catch
- Added lock renewal at page boundaries in `CleanupDataJob`
- Consolidated duplicate `SaveAsync` calls into one using spread syntax
- Removed redundant `is_deleted:false` filter (repository applies soft-delete filter by default)
- `CompositeKeyResult` converted from `class` to `record`
- Pattern matching (`is null`/`is not null`, `is []`)
- Added `:{Message}` to error log format strings

Rebased onto main
Resolved merge conflict with #2268 (Deleted counter tests); preserved all tests from both branches.

Test Coverage (39 job tests + 278 repo tests, all passing)
New job tests (named `Method_Given_Expected`):
- `RunAsync_SuspendedOrganization_SuspendsRelatedTokens`
- `RunAsync_SoftDeletedOrganization_RemovesAllRelatedData`
- `RunAsync_SoftDeletedProject_RemovesProjectAndEvents`
- `RunAsync_SoftDeletedStack_RemovesStackAndEvents`
- `RunAsync_EventsOutsideRetentionPeriod_RemovesExpiredEvents`
- `DeleteOrphanedEventsByStack_WithLargeDataset_DeletesAllOrphanedEvents`
- `CleanupSoftDeletedOrganizations_WithMultiplePages_RemovesAllData`
- `CleanupSoftDeletedStacks_WithMultiplePages_RemovesAllStacks`
- `EnforceRetention_WithMultipleOrganizations_RespectsPerOrgRetention`
- `EnforceRetention_WithEventsOutsideRetention_DeletesOnlyExpiredEvents`
- #2268 (merged)

New repository tests:
- `GetDistinctStackIds_WithMultipleStacks_ReturnsAllUniqueIds`
- `GetDistinctStackIds_WithPagination_ReturnsAllIds`
- `ReassignStack_WithSourceEvents_MovesAllEventsToTarget`
- `RemoveAllByProjectIds_WithMatchingEvents_RemovesAll`
- `RemoveAllByOrganizationIds_WithMatchingEvents_RemovesAll`
- `GetDuplicateSignatures_WithDuplicates_ReturnsSignatures`
- `GetDuplicateSignatures_WithNoDuplicates_ReturnsEmpty`
- `GetDuplicateSignatures_WithSoftDeletedStacks_ExcludesThem`
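
The afterKey cursor fix listed under Critical Bug Fixes can be illustrated with an in-memory stand-in. This is only a sketch under stated assumptions: `CursorState`, `CompositePagingSketch`, `GetPage`, `ReadAll`, and the `"value"` key are hypothetical, and a sorted list plays the role of the composite aggregation. The point it shows is the ordering: clear the cursor on every call, then repopulate it only when the page returned something.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical cursor holder, loosely modelled on a composite "after key".
public sealed class CursorState
{
    public Dictionary<string, object> AfterKey { get; } = new();
}

public static class CompositePagingSketch
{
    // Returns the next page of values that sort after the cursor.
    public static IReadOnlyCollection<string> GetPage(
        IReadOnlyList<string> sortedValues, int batchSize, CursorState cursor)
    {
        string? after = cursor.AfterKey.TryGetValue("value", out var v) ? v as string : null;

        string[] page = sortedValues
            .Where(x => after is null || string.CompareOrdinal(x, after) > 0)
            .Take(batchSize)
            .ToArray();

        // Always clear first so a stale key cannot survive the final, empty page.
        cursor.AfterKey.Clear();
        if (page.Length > 0)
            cursor.AfterKey["value"] = page[^1];

        return page;
    }

    // Caller loop: an empty cursor signals end-of-pagination.
    public static List<string> ReadAll(IReadOnlyList<string> sortedValues, int batchSize)
    {
        var cursor = new CursorState();
        var all = new List<string>();
        do
        {
            all.AddRange(GetPage(sortedValues, batchSize, cursor));
        } while (cursor.AfterKey.Count > 0);

        return all;
    }
}
```

Without the unconditional `Clear()`, the last non-empty key would stay in the cursor after the final page and the `do`/`while` loop in `ReadAll` would never terminate.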