Skip to content

Enhance robustness of InternalOpenSearchSink tests through scenario-driven coverage, refactoring, and regression prevention#6146

Open
pCastq wants to merge 3 commits into
opensearch-project:mainfrom
pCastq:test/improve-sink-test-coverage-and-structure
Open

Enhance robustness of InternalOpenSearchSink tests through scenario-driven coverage, refactoring, and regression prevention#6146
pCastq wants to merge 3 commits into
opensearch-project:mainfrom
pCastq:test/improve-sink-test-coverage-and-structure

Conversation

@pCastq
Copy link
Copy Markdown
Contributor

@pCastq pCastq commented May 16, 2026

Summary

Adds integration and unit tests covering previously untested branches in InternalOpenSearchSink,
and refactors test classes to eliminate duplication using the Template Method pattern.
This improves overall robustness and helps prevent regressions.

Problem & Solution

Existing State

The codebase contains InternalOpenSearchSinkIntegrationTestAuditAlias, an integration test
that covers the metadata.hasAlias() == true branch with a real cluster configured with a write alias.

Coverage Gap

The complementary path—where the sink must create a concrete index with automatic date-based naming—
was not tested. Additionally, cluster-level failure scenarios (race conditions, acknowledged=false,
generic exceptions) could not be reliably reproduced on a single-node embedded cluster.

Solution

Three coordinated changes eliminate duplication and complete coverage:

  1. New integration test: InternalOpenSearchSinkIntegrationTestConcreteIndex
    exercises index creation on a plain cluster without aliases, covering the default path where
    indices follow the security-auditlog-YYYY.MM.dd pattern.

  2. Extract commonality: AbstractInternalOpenSearchSinkIntegrationTest
    holds the shared test methods (testPersistsAuditEventsToTarget, testAuditDocumentContainsMandatoryFields)
    that are valid regardless of cluster configuration. Both concrete-index and alias variants
    inherit and reuse these tests through method overrides (cluster(), auditTarget()),
    eliminating 68 lines of duplicate code.

  3. Unit test hard-to-reproduce scenarios: InternalOpenSearchSinkTest
    uses Mockito to cover branches that cannot be reliably triggered on an embedded cluster:

    • Race condition: two nodes concurrently creating the same index → ResourceAlreadyExistsException must be treated as success
    • Cluster failures: network, authorization, or I/O errors → must be absorbed and return false
    • acknowledged=false: cluster manager creates the index but times out on shard propagation → must return false gracefully

Changes

New Test Classes

  • InternalOpenSearchSinkIntegrationTestConcreteIndex:
    covers the index creation path with a plain, unaliased cluster; verifies that indices
    are created with the default date-based pattern (security-auditlog-YYYY.MM.dd)

  • InternalOpenSearchSinkTest:
    unit tests with Mockito covering non-reproducible cluster scenarios:
    race conditions (ResourceAlreadyExistsException), generic cluster failures,
    and unacknowledged responses

Refactored Test Classes

  • AbstractInternalOpenSearchSinkIntegrationTest:
    extracted shared integration tests that work against both concrete-index and alias variants;
    subclasses override cluster() and auditTarget() to inject their configuration.
    Eliminates duplication; each variant now contributes only its specialized behavior.

  • InternalOpenSearchSinkIntegrationTestAuditAlias:
    refactored to extend the abstract base; removed 3 duplicate methods
    (countAuditDocs, generateAuditEvent, testAuditDocumentsViaAliasContainMandatoryFields);
    now focuses solely on its unique test: testRecognizesAuditTargetAsWriteAlias

Testing

All existing tests continue to pass. New tests provide complete branch coverage
for InternalOpenSearchSink.createIndexIfAbsent().

Issues Resolved

Closes #6145

Check List

  • New functionality includes testing
  • New functionality has been documented
  • New Roles/Permissions have a corresponding security dashboards plugin PR
  • API changes companion pull request created
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…Sink

Signed-off-by: Pietro Paolo Castagna <PietroPaolo.Castagna@gmail.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 16, 2026

PR Reviewer Guide 🔍

(Review updated until commit b0ed4d2)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Possible Issue

The countAuditDocs method does not handle the case where getTotalHits() returns null. While Objects.requireNonNull is used, it will throw NullPointerException if the value is null, which may not be the intended behavior for a test helper method. This could cause test failures that are difficult to diagnose if the search response structure changes or if the cluster is in an unexpected state.

long countAuditDocs(Client client) {
    return Objects.requireNonNull(
        client.search(new SearchRequest(auditTarget()).source(new SearchSourceBuilder().query(QueryBuilders.matchAllQuery()).size(0)))
            .actionGet()
            .getHits()
            .getTotalHits()
    ).value();
Inconsistent Pattern

The test now uses a nested try-with-resources block where client is obtained from CLUSTER.getRestClient() inside the test method, but the outer scope already has access to the cluster via the cluster() method. This creates an inconsistent pattern compared to other tests in the same class that might use cluster().getRestClient(). While not a bug, this inconsistency could confuse maintainers.

try (TestRestClient client = CLUSTER.getRestClient(CLUSTER.getAdminCertificate())) {
    generateAuditEvent("_cluster/health");

    await().until(() -> {
        HttpResponse countResponse = client.postJson(AUDIT_ALIAS + "/_search", """
            {"query": {"match_all": {}}, "size": 0}
            """);
        countResponse.assertStatusCode(200);
        return countResponse.getLongFromJsonBody("/hits/total/value") > 0;
    });

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 16, 2026

PR Code Suggestions ✨

Latest suggestions up to b0ed4d2

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Strengthen date pattern validation

The regex pattern \d{4}\.\d{2}\.\d{2}$ accepts invalid dates like 9999.99.99.
Consider validating that the date components represent actual calendar dates (e.g.,
month 01-12, day 01-31) to prevent false positives when the index naming deviates
from expected format.

src/integrationTest/java/org/opensearch/security/auditlog/sink/InternalOpenSearchSinkIntegrationTestConcreteIndex.java [106-110]

 assertThat(
     "All audit indices must follow date-based pattern security-auditlog-YYYY.MM.dd",
-    Arrays.stream(response.indices()).allMatch(name -> name.matches(AUDIT_INDEX_PREFIX + "\\d{4}\\.\\d{2}\\.\\d{2}$")),
+    Arrays.stream(response.indices()).allMatch(name -> 
+        name.matches(AUDIT_INDEX_PREFIX + "\\d{4}\\.(0[1-9]|1[0-2])\\.(0[1-9]|[12]\\d|3[01])$")
+    ),
     is(true)
 );
Suggestion importance[1-10]: 6

__

Why: The improved regex validates month (01-12) and day (01-31) ranges, preventing false positives like 9999.99.99. This strengthens the test assertion, though the current pattern is likely sufficient for detecting major naming issues in practice.

Low
Add descriptive error message

The Objects.requireNonNull() call will throw NullPointerException if getTotalHits()
returns null, but this provides no context about which component failed. Consider
adding a descriptive error message to the requireNonNull() call to aid debugging
when the search response structure is unexpected.

src/integrationTest/java/org/opensearch/security/auditlog/sink/AbstractInternalOpenSearchSinkIntegrationTest.java [73-80]

 long countAuditDocs(Client client) {
     return Objects.requireNonNull(
         client.search(new SearchRequest(auditTarget()).source(new SearchSourceBuilder().query(QueryBuilders.matchAllQuery()).size(0)))
             .actionGet()
             .getHits()
-            .getTotalHits()
+            .getTotalHits(),
+        "getTotalHits() returned null for audit target: " + auditTarget()
     ).value();
 }
Suggestion importance[1-10]: 5

__

Why: Adding a descriptive error message to Objects.requireNonNull() improves debugging when getTotalHits() returns null. However, this is a minor enhancement since such failures are rare in integration tests and the stack trace would still identify the location.

Low

Previous suggestions

Suggestions up to commit f345baf
CategorySuggestion                                                                                                                                    Impact
General
Verify response status in audit event generation

The method silently ignores any exceptions thrown by restClient.get(path), including
authentication failures or network errors. This can lead to flaky tests where audit
events are not generated but the test continues without detection. Verify the
response status or propagate exceptions to ensure event generation succeeded.

src/integrationTest/java/org/opensearch/security/auditlog/sink/AbstractInternalOpenSearchSinkIntegrationTest.java [88-92]

 void generateAuditEvent(String path) {
     try (TestRestClient restClient = cluster().getRestClient(cluster().getAdminCertificate())) {
-        restClient.get(path);
+        TestRestClient.HttpResponse response = restClient.get(path);
+        if (response.getStatusCode() >= 400) {
+            throw new IllegalStateException("Failed to generate audit event for path: " + path + 
+                ", status: " + response.getStatusCode());
+        }
     }
 }
Suggestion importance[1-10]: 7

__

Why: Valid concern about silent failures in test helper methods. Verifying the response status would make tests more robust and failures easier to diagnose. The improved code properly checks for error status codes and provides clear error messages.

Medium
Document null constructor parameters with comments

Passing multiple null arguments to the constructor makes the test fragile and
unclear about which dependencies are actually required. If the constructor validates
these parameters or uses them in createIndexIfAbsent(), the test may fail
unexpectedly. Use mock objects or verify that null is acceptable for these specific
parameters.

src/test/java/org/opensearch/security/auditlog/sink/InternalOpenSearchSinkTest.java [90-99]

 sink = new InternalOpenSearchSink(
     "test-sink",
     Settings.EMPTY,
-    null,
-    null,
+    null,  // auditLogImpl - not used in createIndexIfAbsent
+    null,  // fallbackSink - not used in createIndexIfAbsent
     client,
     threadPool,
-    null,
+    null,  // auditConfig - not used in createIndexIfAbsent
     clusterService
 );
Suggestion importance[1-10]: 4

__

Why: While adding comments to clarify null parameters improves code readability, this is a minor documentation enhancement. The test is already working correctly, and the suggestion doesn't address any functional issue or bug.

Low
Possible issue
Add explicit null handling for total hits

The Objects.requireNonNull() call will throw NullPointerException if getTotalHits()
returns null, which can occur in certain OpenSearch versions or edge cases. This
will cause test failures with unclear error messages. Add explicit null handling
with a descriptive error message or return a default value.

src/integrationTest/java/org/opensearch/security/auditlog/sink/AbstractInternalOpenSearchSinkIntegrationTest.java [73-80]

 long countAuditDocs(Client client) {
-    return Objects.requireNonNull(
-        client.search(
-            new SearchRequest(auditTarget())
-                .source(new SearchSourceBuilder().query(QueryBuilders.matchAllQuery()).size(0))
-        ).actionGet().getHits().getTotalHits()
-    ).value();
+    var totalHits = client.search(
+        new SearchRequest(auditTarget())
+            .source(new SearchSourceBuilder().query(QueryBuilders.matchAllQuery()).size(0))
+    ).actionGet().getHits().getTotalHits();
+    
+    if (totalHits == null) {
+        throw new IllegalStateException("Search response returned null total hits for audit target: " + auditTarget());
+    }
+    return totalHits.value();
 }
Suggestion importance[1-10]: 6

__

Why: The suggestion correctly identifies that getTotalHits() could return null, and the improved code provides better error handling with a descriptive message. However, Objects.requireNonNull() already throws NullPointerException with the object reference, so this is more of a code quality improvement than a critical fix.

Low
Suggestions up to commit e23204b
CategorySuggestion                                                                                                                                    Impact
General
Add descriptive null-check error message

The Objects.requireNonNull() call will throw NullPointerException if getTotalHits()
returns null, but this provides no context about which audit target failed. Add a
descriptive error message to the requireNonNull() call to aid debugging when the
search response is unexpectedly null.

src/integrationTest/java/org/opensearch/security/auditlog/sink/AbstractInternalOpenSearchSinkIntegrationTest.java [73-80]

 long countAuditDocs(Client client) {
     return Objects.requireNonNull(
         client.search(
             new SearchRequest(auditTarget())
                 .source(new SearchSourceBuilder().query(QueryBuilders.matchAllQuery()).size(0))
-        ).actionGet().getHits().getTotalHits()
+        ).actionGet().getHits().getTotalHits(),
+        "Total hits must not be null for audit target: " + auditTarget()
     ).value();
 }
Suggestion importance[1-10]: 5

__

Why: Adding a descriptive error message to Objects.requireNonNull() improves debugging when getTotalHits() returns null. However, in practice, OpenSearch's search API rarely returns null for getTotalHits() in normal operation, making this a minor defensive improvement rather than a critical fix.

Low
Validate month and day ranges

The regex pattern \d{2} allows invalid month/day values like 99.99. While this may
be acceptable for basic pattern matching, consider validating that months are 01-12
and days are 01-31 to catch potential index naming bugs. Use (0[1-9]|1[0-2]) for
months and (0[1-9]|[12][0-9]|3[01]) for days.

src/integrationTest/java/org/opensearch/security/auditlog/sink/InternalOpenSearchSinkIntegrationTestConcreteIndex.java [111-114]

 assertThat("All audit indices must follow date-based pattern security-auditlog-YYYY.MM.dd",
     Arrays.stream(response.indices())
-        .allMatch(name -> name.matches(AUDIT_INDEX_PREFIX + "\\d{4}\\.\\d{2}\\.\\d{2}$")),
+        .allMatch(name -> name.matches(AUDIT_INDEX_PREFIX + "\\d{4}\\.(0[1-9]|1[0-2])\\.(0[1-9]|[12][0-9]|3[01])$")),
     is(true));
Suggestion importance[1-10]: 4

__

Why: While the stricter regex would catch invalid date components like 99.99, the current pattern \d{4}\.\d{2}\.\d{2} is sufficient for verifying the index follows the expected naming convention. The sink itself generates valid dates, so this enhancement offers minimal practical benefit in an integration test context.

Low

…riven coverage, refactoring, and regression prevention:

- Add InternalOpenSearchSinkIntegrationTestConcreteIndex: covers the index
  creation path with a plain concrete date-based index (no alias)
- Extract shared tests into AbstractInternalOpenSearchSinkIntegrationTest
  using Template Method pattern: testPersistsAuditEventsToTarget and
  testAuditDocumentContainsMandatoryFields now run against both concrete-index
  and alias variants without duplication
- Refactor InternalOpenSearchSinkIntegrationTestAuditAlias: now extends the
  abstract base, removing 3 duplicate methods (countAuditDocs, generateAuditEvent,
  testAuditDocumentsViaAliasContainMandatoryFields); retains only the specialized
  test testRecognizesAuditTargetAsWriteAlias
- Add InternalOpenSearchSinkTest unit tests: covers race condition
  (ResourceAlreadyExistsException), generic cluster failures, and
  acknowledged=false scenarios that cannot be tested reliably on
  single-node embedded cluster
- Fix resource leak in testCreatesAuditIndexAutomatically

Closes opensearch-project#6145

Signed-off-by: Pietro Paolo Castagna <PietroPaolo.Castagna@gmail.com>
@pCastq pCastq force-pushed the test/improve-sink-test-coverage-and-structure branch from e23204b to f345baf Compare May 16, 2026 01:04
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit f345baf

Signed-off-by: Pietro Paolo Castagna <PietroPaolo.Castagna@gmail.com>
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit b0ed4d2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enhancing robustness of Internal Sink through scenario-driven testing

1 participant