Skip to content

Latest commit

 

History

History
510 lines (414 loc) · 19 KB

File metadata and controls

510 lines (414 loc) · 19 KB

Full-Text Search Implementation Plan for FailedMessageEntity

Overview

Add comprehensive full-text search capability to FailedMessageEntity across all supported databases (PostgreSQL, MySQL, SQL Server), following the pattern established in the Audit persistence layer (PR #5106).

User Requirements (Confirmed)

Search Scope

Headers - Search across all header values (from HeadersJson) ✅ Message Body - Include inline body content when below size threshold ❌ Denormalized fields - Skip (headers already contain this data) ❌ ProcessingAttemptsJson - Skip (not needed)

Inline Body Storage Pattern

Following the Audit implementation pattern from PostgreSQLAuditIngestionUnitOfWork.cs:

  • Add MaxBodySizeToStore configuration setting
  • Store message bodies inline in FailedMessageEntity when size ≤ threshold
  • Only use separate MessageBodyEntity table for large bodies
  • This enables body search without expensive JOINs

Implementation Approach

Computed Column Approach (matching Audit implementation)

  • Add tsvector/searchable text column
  • Use database-specific triggers/functions for auto-update
  • Native FTS capabilities per database

Current State Analysis

Existing Search Implementation

File: src/ServiceControl.Persistence.Sql.Core/Implementation/ErrorMessageDataStore.MessageQueries.cs

Current search is very limited:

query = query.Where(fm =>
    fm.MessageType!.Contains(searchTerms) ||
    fm.ExceptionMessage!.Contains(searchTerms) ||
    fm.UniqueMessageId.Contains(searchTerms));

Limitations:

  • Only searches 3 fields
  • Uses LIKE pattern (slow on large datasets)
  • No header or body search
  • No ranking or relevance

FailedMessageEntity Structure

File: src/ServiceControl.Persistence.Sql.Core/Entities/FailedMessageEntity.cs

Denormalized fields available for search:

  • MessageId (string, 200 chars)
  • MessageType (string, 500 chars) - currently searched
  • ExceptionType (string, 500 chars)
  • ExceptionMessage (string, 500 chars) - currently searched
  • UniqueMessageId (string, 200 chars) - currently searched
  • SendingEndpointName (string, 500 chars)
  • ReceivingEndpointName (string, 500 chars)
  • QueueAddress (string, 500 chars)
  • ConversationId (string, 200 chars)

JSON columns:

  • HeadersJson (jsonb/json/nvarchar(max)) - recently added, contains all headers
  • ProcessingAttemptsJson (jsonb/json/nvarchar(max)) - contains processing attempt metadata
  • FailureGroupsJson (jsonb/json/nvarchar(max)) - contains failure group information

Related entity:

  • MessageBodyEntity (separate table) - contains message body content

Reference Implementation (Audit PR #5106)

The PostgreSQL Audit implementation provides a proven pattern:

Key components:

  1. tsvector column named query for full-text search index
  2. PostgreSQL trigger to automatically update the tsvector on INSERT/UPDATE
  3. Weighted fields: Headers (priority A), Body (priority B)
  4. Query using websearch_to_tsquery with @@ operator
  5. GIN index on the tsvector column
  6. Autovacuum configuration for maintenance

Implementation Plan

Phase 1: Schema Changes

1.1 Add Inline Body Storage and Search Columns to FailedMessageEntity

File: src/ServiceControl.Persistence.Sql.Core/Entities/FailedMessageEntity.cs

Add two new columns:

// Inline body storage (for small messages below threshold)
public byte[]? Body { get; set; }

// Full-text search column (database-specific type)
public string? Query { get; set; } // Will be mapped to tsvector (PG), text (MySQL), nvarchar(max) (SQL Server)

Rationale:

  • Body column stores message content inline when size ≤ MaxBodySizeToStore threshold
  • Avoids JOIN with MessageBodyEntity for small messages (performance optimization)
  • Query column stores the searchable text vector/index (matches Audit implementation naming)

1.2 Configure Column Types per Database

Files:

  • src/ServiceControl.Persistence.Sql.Core/EntityConfigurations/FailedMessageConfiguration.cs - Add basic configuration
  • src/ServiceControl.Persistence.Sql.PostgreSQL/PostgreSqlDbContext.cs - Override to tsvector
  • src/ServiceControl.Persistence.Sql.MySQL/MySqlDbContext.cs - Override to text
  • src/ServiceControl.Persistence.Sql.SqlServer/SqlServerDbContext.cs - Override to nvarchar(max)

Configure column types:

  • Body: bytea (PostgreSQL), longblob (MySQL), varbinary(max) (SQL Server)
  • Query: text (PostgreSQL), text (MySQL), nvarchar(max) (SQL Server)

Note: We use text for all databases instead of tsvector for PostgreSQL. The tsvector conversion happens at query time using to_tsvector(), keeping the storage consistent across databases.

1.3 Add Configuration Setting

File: src/ServiceControl.Persistence.Sql.Core/SqlServerPersistenceConfiguration.cs

Add setting to control inline body storage threshold:

public int MaxBodySizeToStore { get; set; } = 102400; // 100KB default (matches Audit)

Phase 2: Database-Specific Setup

2.1 PostgreSQL Implementation

New file: src/ServiceControl.Persistence.Sql.PostgreSQL/FullTextSearchSetup.cs

Create setup class to handle:

  • query tsvector column configuration
  • GIN index creation
  • Trigger function creation for automatic tsvector updates
  • Autovacuum configuration

PostgreSQL Index Setup (simplified - no trigger):

-- GIN Index for fast full-text search on tsvector expression
CREATE INDEX idx_failed_messages_search
ON failed_messages
USING GIN(to_tsvector('english', COALESCE(query, '')));

-- Autovacuum configuration for high-throughput tables
ALTER TABLE failed_messages SET (
  autovacuum_vacuum_scale_factor = 0.05,
  autovacuum_analyze_scale_factor = 0.02
);

Key Points:

  • No trigger needed - Query column populated from application code (consistent with other databases)
  • Expression index - GIN index on to_tsvector('english', query) for full-text search
  • Autovacuum keeps statistics current for high INSERT volume
  • The query column type is text (consistent across databases)

2.2 MySQL Implementation

New file: src/ServiceControl.Persistence.Sql.MySQL/FullTextSearchSetup.cs

Create setup class to handle:

  • query text column for searchable text
  • FULLTEXT index creation

MySQL Setup Pattern:

-- FULLTEXT index on query column
CREATE FULLTEXT INDEX idx_failed_messages_search
ON failed_messages(query);

Note: MySQL doesn't support triggers that modify the same row, so the query column must be populated from application code during INSERT/UPDATE (see Phase 4).

2.3 SQL Server Implementation

New file: src/ServiceControl.Persistence.Sql.SqlServer/FullTextSearchSetup.cs

Create setup class to handle:

  • Full-text catalog creation
  • Full-text index on Query column

SQL Server Setup Pattern:

-- Create full-text catalog
IF NOT EXISTS (SELECT * FROM sys.fulltext_catalogs WHERE name = 'ft_failed_messages')
  CREATE FULLTEXT CATALOG ft_failed_messages;

-- Create full-text index on Query column
CREATE FULLTEXT INDEX ON FailedMessages(Query)
KEY INDEX PK_FailedMessages
ON ft_failed_messages
WITH CHANGE_TRACKING AUTO;

Note: SQL Server full-text indexing requires a primary key. The Query column must be populated from application code during INSERT/UPDATE (see Phase 4).

Phase 3: Migration Generation

3.1 Create New Migrations

Generate new migrations for each database provider that include:

  • Add SearchableTextJson column
  • Execute database-specific FTS setup scripts

Files to create:

  • PostgreSQL migration: Adds tsvector column, trigger, GIN index, autovacuum
  • MySQL migration: Adds text column, FULLTEXT index
  • SQL Server migration: Adds nvarchar(max) column, full-text catalog and index

3.2 Data Migration

For existing data, add migration step to populate the searchable text column:

  • PostgreSQL: Trigger will handle automatically on next update, or force update
  • MySQL: Application code updates during migration
  • SQL Server: Application code updates during migration

Phase 4: Application Layer Changes

4.1 Update Inline Body Storage Logic in RecoverabilityIngestionUnitOfWork

File: src/ServiceControl.Persistence.Sql.Core/Implementation/UnitOfWork/RecoverabilityIngestionUnitOfWork.cs

Modify RecordFailedProcessingAttempt to implement inline body storage pattern:

public async Task RecordFailedProcessingAttempt(
    MessageContext context,
    FailedMessage.ProcessingAttempt processingAttempt,
    List<FailedMessage.FailureGroup> groups)
{
    var uniqueMessageId = context.Headers.UniqueId();
    var contentType = GetContentType(context.Headers, MediaTypeNames.Text.Plain);
    var bodySize = context.Body.Length;

    // Determine if body should be stored inline based on size threshold
    byte[]? inlineBody = null;
    bool storeBodySeparately = bodySize > parent.Configuration.MaxBodySizeToStore;

    if (!storeBodySeparately && !context.Body.IsEmpty)
    {
        inlineBody = context.Body.ToArray(); // Store inline
    }

    // ... existing metadata and denormalization logic ...

    if (existingMessage != null)
    {
        // Update existing message
        existingMessage.ProcessingAttemptsJson = JsonSerializer.Serialize(attempts);
        existingMessage.FailureGroupsJson = JsonSerializer.Serialize(groups);
        existingMessage.HeadersJson = JsonSerializer.Serialize(processingAttempt.Headers);
        existingMessage.Body = inlineBody; // Update inline body
        existingMessage.Query = BuildSearchableText(processingAttempt.Headers, inlineBody); // Populate Query for all databases
        // ... other updates ...
    }
    else
    {
        // Create new message
        var failedMessageEntity = new FailedMessageEntity
        {
            // ... existing fields ...
            HeadersJson = JsonSerializer.Serialize(processingAttempt.Headers),
            Body = inlineBody, // Store inline body
            Query = BuildSearchableText(processingAttempt.Headers, inlineBody) // Populate Query for all databases
        };
        parent.DbContext.FailedMessages.Add(failedMessageEntity);
    }

    // Store body separately only if it exceeds threshold
    if (storeBodySeparately)
    {
        await StoreMessageBody(uniqueMessageId, context.Body, contentType, bodySize);
    }
}

// Helper method to build searchable text (for MySQL/SQL Server)
private string BuildSearchableText(Dictionary<string, string> headers, byte[]? body)
{
    var parts = new List<string>
    {
        string.Join(" ", headers.Values) // All header values
    };

    // Add body content if present and can be decoded as text
    if (body != null && body.Length > 0)
    {
        try
        {
            var bodyText = Encoding.UTF8.GetString(body);
            parts.Add(bodyText);
        }
        catch
        {
            // Skip non-text bodies
        }
    }

    return string.Join(" ", parts.Where(p => !string.IsNullOrWhiteSpace(p)));
}

Key Changes:

  • Check body size against MaxBodySizeToStore threshold
  • Store small bodies inline in Body column
  • Store large bodies in separate MessageBodyEntity table
  • Populate Query column for all databases (consistent application-level approach)
  • Build searchable text from headers + inline body
  • Query column stores plain text for all databases (PostgreSQL converts to tsvector at query time)

4.2 Create Full-Text Search Provider Interface

New file: src/ServiceControl.Persistence.Sql.Core/FullTextSearch/IFullTextSearchProvider.cs

public interface IFullTextSearchProvider
{
    IQueryable<FailedMessageEntity> ApplyFullTextSearch(
        IQueryable<FailedMessageEntity> query,
        string searchTerms);
}

4.3 Implement Database-Specific Providers

New file: src/ServiceControl.Persistence.Sql.PostgreSQL/PostgreSqlFullTextSearchProvider.cs

public class PostgreSqlFullTextSearchProvider : IFullTextSearchProvider
{
    public IQueryable<FailedMessageEntity> ApplyFullTextSearch(
        IQueryable<FailedMessageEntity> query,
        string searchTerms)
    {
        // Convert text to tsvector at query time, use websearch_to_tsquery for user-friendly search
        return query.FromSqlRaw(
            @"SELECT * FROM failed_messages
              WHERE to_tsvector('english', COALESCE(query, '')) @@ websearch_to_tsquery('english', {0})",
            searchTerms);
    }
}

New file: src/ServiceControl.Persistence.Sql.MySQL/MySqlFullTextSearchProvider.cs

public class MySqlFullTextSearchProvider : IFullTextSearchProvider
{
    public IQueryable<FailedMessageEntity> ApplyFullTextSearch(
        IQueryable<FailedMessageEntity> query,
        string searchTerms)
    {
        // Use NATURAL LANGUAGE MODE for user-friendly search
        return query.FromSqlRaw(
            @"SELECT * FROM failed_messages
              WHERE MATCH(query) AGAINST({0} IN NATURAL LANGUAGE MODE)",
            searchTerms);
    }
}

New file: src/ServiceControl.Persistence.Sql.SqlServer/SqlServerFullTextSearchProvider.cs

public class SqlServerFullTextSearchProvider : IFullTextSearchProvider
{
    public IQueryable<FailedMessageEntity> ApplyFullTextSearch(
        IQueryable<FailedMessageEntity> query,
        string searchTerms)
    {
        // Use CONTAINS for boolean full-text search
        return query.FromSqlRaw(
            @"SELECT * FROM FailedMessages
              WHERE CONTAINS(Query, {0})",
            searchTerms);
    }
}

4.4 Update Query Methods

File: src/ServiceControl.Persistence.Sql.Core/Implementation/ErrorMessageDataStore.MessageQueries.cs

Replace the current simple Contains() search with full-text search provider:

public Task<QueryResult<IList<MessagesView>>> GetAllMessagesForSearch(
    string searchTerms,
    PagingInfo pagingInfo,
    SortInfo sortInfo,
    DateTimeRange? timeSentRange = null)
{
    return ExecuteWithDbContext(async dbContext =>
    {
        var query = dbContext.FailedMessages.AsQueryable();

        // Apply full-text search
        if (!string.IsNullOrWhiteSpace(searchTerms))
        {
            query = _fullTextSearchProvider.ApplyFullTextSearch(query, searchTerms);
        }

        // Apply time range filter
        // ... existing code ...
    });
}

4.5 Register Providers in DI

Files:

  • src/ServiceControl.Persistence.Sql.PostgreSQL/PostgreSqlPersistence.cs
  • src/ServiceControl.Persistence.Sql.MySQL/MySqlPersistence.cs
  • src/ServiceControl.Persistence.Sql.SqlServer/SqlServerPersistence.cs

Register the appropriate provider for each database:

services.AddSingleton<IFullTextSearchProvider, PostgreSqlFullTextSearchProvider>();

Phase 5: Configuration

5.1 Add Settings

File: src/ServiceControl.Persistence.Sql.Core/SqlServerPersistenceConfiguration.cs (or equivalent)

Add configuration option:

public bool EnableFullTextSearchOnBodies { get; set; } = true;

Phase 6: Testing

6.1 Unit Tests

Create unit tests for:

  • Searchable text building logic
  • Each database provider's query generation

6.2 Integration Tests

Add integration tests to verify:

  • Full-text search across all three databases
  • Search ranking/relevance
  • Performance with large datasets
  • Migration success

Critical Files to Modify

Core Layer (Database-Agnostic)

  1. src/ServiceControl.Persistence.Sql.Core/Entities/FailedMessageEntity.cs - Add SearchableTextJson column
  2. src/ServiceControl.Persistence.Sql.Core/EntityConfigurations/FailedMessageConfiguration.cs - Configure column
  3. src/ServiceControl.Persistence.Sql.Core/Implementation/ErrorMessageDataStore.MessageQueries.cs - Update search methods
  4. src/ServiceControl.Persistence.Sql.Core/Implementation/UnitOfWork/RecoverabilityIngestionUnitOfWork.cs - Populate searchable text
  5. src/ServiceControl.Persistence.Sql.Core/Implementation/EditFailedMessagesManager.cs - Populate searchable text
  6. src/ServiceControl.Persistence.Sql.Core/FullTextSearch/IFullTextSearchProvider.cs - New interface

PostgreSQL-Specific

  1. src/ServiceControl.Persistence.Sql.PostgreSQL/PostgreSqlDbContext.cs - Override column type configuration
  2. src/ServiceControl.Persistence.Sql.PostgreSQL/FullTextSearchSetup.cs - New: FTS setup scripts
  3. src/ServiceControl.Persistence.Sql.PostgreSQL/PostgreSqlFullTextSearchProvider.cs - New: Query provider
  4. src/ServiceControl.Persistence.Sql.PostgreSQL/PostgreSqlPersistence.cs - Register provider
  5. src/ServiceControl.Persistence.Sql.PostgreSQL/Migrations/ - New migration

MySQL-Specific

  1. src/ServiceControl.Persistence.Sql.MySQL/MySqlDbContext.cs - Override column type configuration
  2. src/ServiceControl.Persistence.Sql.MySQL/FullTextSearchSetup.cs - New: FTS setup scripts
  3. src/ServiceControl.Persistence.Sql.MySQL/MySqlFullTextSearchProvider.cs - New: Query provider
  4. src/ServiceControl.Persistence.Sql.MySQL/MySqlPersistence.cs - Register provider
  5. src/ServiceControl.Persistence.Sql.MySQL/Migrations/ - New migration

SQL Server-Specific

  1. src/ServiceControl.Persistence.Sql.SqlServer/SqlServerDbContext.cs - Override column type configuration
  2. src/ServiceControl.Persistence.Sql.SqlServer/FullTextSearchSetup.cs - New: FTS setup scripts
  3. src/ServiceControl.Persistence.Sql.SqlServer/SqlServerFullTextSearchProvider.cs - New: Query provider
  4. src/ServiceControl.Persistence.Sql.SqlServer/SqlServerPersistence.cs - Register provider
  5. src/ServiceControl.Persistence.Sql.SqlServer/Migrations/ - New migration

Trade-offs and Considerations

Computed Column Approach (Recommended)

Pros:

  • Matches proven Audit implementation pattern
  • Better query performance (pre-computed search text)
  • Cleaner query code
  • Native database FTS capabilities

Cons:

  • More complex migrations
  • Additional storage for search column
  • Database-specific trigger/computed column setup
  • Requires careful testing per database

Alternative: Query-Time Composition

Pros:

  • Simpler schema
  • No additional storage
  • Easier to understand

Cons:

  • Slower queries (on-the-fly text assembly)
  • More complex application code
  • Harder to leverage native FTS features
  • Limited ranking/relevance capabilities

Summary

This plan implements comprehensive full-text search for FailedMessageEntity across PostgreSQL, MySQL, and SQL Server by:

  1. Adding inline body storage - Following the Audit pattern, small message bodies (≤100KB) are stored inline to avoid JOINs
  2. Using native FTS per database - PostgreSQL (tsvector/GIN), MySQL (FULLTEXT), SQL Server (full-text catalog)
  3. Prioritizing search fields - Headers (weight A/high priority) + Body (weight B/medium priority)
  4. Leveraging triggers - PostgreSQL auto-updates search vectors; MySQL/SQL Server use application code
  5. Maintaining backward compatibility - Replaces simple LIKE search with proper full-text search

Key Benefits:

  • Performance: Native FTS indexes provide O(log N) search instead of O(N) table scans
  • Relevance: Weighted fields (headers > body) improve search result quality
  • Scalability: Inline body storage reduces JOINs for 90%+ of messages
  • Consistency: Matches proven Audit implementation pattern