pick the earliest-occurring level keyword in detect_log_level #23

Open
HrachShah wants to merge 1 commit into main from fix/detect-log-level-pick-earliest-match

@HrachShah HrachShah commented Jun 25, 2026

Owner

What

utils.detect_log_level returns the first regex match in a hardcoded
CRITICAL > ERROR > WARNING > INFO > DEBUG > TRACE order, regardless of
where the keyword actually appears in the line.

Repro

Summary by Sourcery

Update log level detection to choose the earliest-occurring level keyword in a log line and add unit tests to cover the new behavior and edge cases.

Bug Fixes:

  • Fix misclassification of log lines that contain multiple log level keywords by selecting the leftmost match instead of a fixed severity order.

Tests:

  • Add unit tests for detect_log_level covering earliest-match selection, case insensitivity, timestamped lines, non-level lines, and ignoring substring matches.

detect_log_level walked a hardcoded list of (regex, level) tuples in
CRITICAL > ERROR > WARNING > INFO > DEBUG > TRACE order and returned
the first pattern that matched. Position, not severity, should take
precedence: whichever keyword appears first in the line is the
intent of the line. Without this, '2024-01-15 10:23:45 WARNING cannot
connect to CRITICAL service' was classified as CRITICAL even though
the line is a WARNING. The fix runs every level regex, finds the
leftmost match across all of them, and returns the level that owns
that match. A new tests/test_utils.py pins the new contract: the
leftmost keyword wins, levels embedded in later words do not
override it, and words like 'ERROR_RATE' or 'CRITICAL_ERROR' (which
contain a level keyword as part of a longer identifier) are correctly
ignored because the patterns use \b boundaries.
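
The contrast described above can be reproduced with a minimal sketch. The pattern table and both function names below are hypothetical stand-ins (the real list in src/log_analyzer_cli/utils.py is not shown on this page), so treat this as an illustration of the two selection rules rather than the code in the commit.

```python
import re

# Hypothetical pattern table; the actual list in utils.py is not shown here.
LEVEL_PATTERNS = [
    (re.compile(r"\bCRIT(?:ICAL)?\b"), "CRITICAL"),
    (re.compile(r"\bERR(?:OR)?\b"), "ERROR"),
    (re.compile(r"\bWARN(?:ING)?\b"), "WARNING"),
    (re.compile(r"\bINFO\b"), "INFO"),
    (re.compile(r"\bDEBUG\b"), "DEBUG"),
    (re.compile(r"\bTRACE\b"), "TRACE"),
]


def detect_by_severity_order(line: str) -> str:
    """Old rule: the first pattern in list order that matches anywhere wins."""
    upper = line.upper()
    for pattern, level in LEVEL_PATTERNS:
        if pattern.search(upper):
            return level
    return "UNKNOWN"


def detect_by_position(line: str) -> str:
    """New rule: the keyword with the smallest start offset wins."""
    upper = line.upper()
    hits = []
    for pattern, level in LEVEL_PATTERNS:
        match = pattern.search(upper)
        if match:
            hits.append((match.start(), level))
    # min() orders the (offset, level) tuples by offset first.
    return min(hits)[1] if hits else "UNKNOWN"


line = "2024-01-15 10:23:45 WARNING cannot connect to CRITICAL service"
print(detect_by_severity_order(line))  # CRITICAL
print(detect_by_position(line))        # WARNING
```

Because `_` counts as a word character, `\bERROR\b` finds no boundary inside `ERROR_RATE`, which is why such identifiers fall through to UNKNOWN under either rule.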

sourcery-ai Bot commented Jun 25, 2026

Reviewer's Guide

Update log level detection to choose the earliest-occurring level keyword in a line instead of a fixed severity order, and add unit tests covering the new behavior and existing edge cases.

Flow diagram for updated detect_log_level logic

flowchart TD
    A[detect_log_level line] --> B[Convert line to uppercase line_upper]
    B --> C[Initialize earliest_level = None]
    C --> D[Initialize earliest_index = infinity]
    D --> E[Iterate level_patterns]
    E --> F[re.finditer pattern line_upper]
    F --> G{Any matches?}
    G -->|No| H{More patterns?}
    H -->|Yes| E
    H -->|No| I{earliest_level is not None?}
    G -->|Yes| J[Take first_match.start]
    J --> K{first_match.start < earliest_index?}
    K -->|Yes| L[Update earliest_index and earliest_level]
    K -->|No| H
    L --> H
    I -->|Yes| M[Return earliest_level]
    I -->|No| N[Return UNKNOWN]

File-Level Changes

Change Details Files
Change log level detection to select the leftmost matching level keyword rather than the first match in a fixed severity-ordered pattern list.
  • Update detect_log_level docstring to describe earliest-occurring keyword behavior.
  • Introduce tracking variables earliest_level and earliest_index initialized to None and infinity, respectively.
  • Replace re.search with re.finditer to collect all matches for each level pattern in the input line.
  • Choose the level whose first match has the smallest start index across all patterns and return it if any match was found.
  • Preserve UNKNOWN as the default return value when no level keyword is present.
src/log_analyzer_cli/utils.py
Add unit tests to validate the new earliest-match behavior and guard existing behaviors like case-insensitivity and word-boundary matching.
  • Add tests for simple level keyword detection, including abbreviations like WARN, CRIT, and ERR.
  • Add tests for timestamp-prefixed log lines to ensure detection after leading timestamps.
  • Add tests verifying that when multiple level keywords are present, the earliest (leftmost) one is chosen, including cases where it is higher severity.
  • Add tests ensuring lines without level keywords return UNKNOWN, including empty and whitespace-only strings.
  • Add tests confirming that word-boundary-based patterns do not match level substrings inside other identifiers and that detection remains case-insensitive.
tests/test_utils.py
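
The test categories listed above might look roughly like the following sketch. The `detect_log_level` defined here is a hypothetical stand-in so the block runs on its own; the real tests presumably import the function from log_analyzer_cli.utils, and the test names and sample lines are assumptions, not copied from tests/test_utils.py.

```python
import re

# Hypothetical stand-in for log_analyzer_cli.utils.detect_log_level.
_PATTERNS = {
    "CRITICAL": r"\bCRIT(?:ICAL)?\b",
    "ERROR": r"\bERR(?:OR)?\b",
    "WARNING": r"\bWARN(?:ING)?\b",
    "INFO": r"\bINFO\b",
    "DEBUG": r"\bDEBUG\b",
    "TRACE": r"\bTRACE\b",
}


def detect_log_level(line: str) -> str:
    upper = line.upper()
    found = []
    for level, regex in _PATTERNS.items():
        match = re.search(regex, upper)
        if match:
            found.append((match.start(), level))
    return min(found)[1] if found else "UNKNOWN"


def test_leftmost_keyword_wins():
    assert detect_log_level("WARNING cannot connect to CRITICAL service") == "WARNING"


def test_leftmost_may_be_higher_severity():
    assert detect_log_level("ERROR while writing DEBUG output") == "ERROR"


def test_timestamp_prefix_and_case_insensitivity():
    assert detect_log_level("2024-01-15 10:23:45 INFO started") == "INFO"
    assert detect_log_level("warn: disk almost full") == "WARNING"


def test_lines_without_a_level_are_unknown():
    for line in ("", "   ", "hello world"):
        assert detect_log_level(line) == "UNKNOWN"


def test_identifier_substrings_are_ignored():
    assert detect_log_level("ERROR_RATE=0.2") == "UNKNOWN"
    assert detect_log_level("metric CRITICAL_ERROR seen") == "UNKNOWN"
```

The functions use plain `assert` statements, so they can be collected by pytest or called directly.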

@sourcery-ai sourcery-ai Bot left a comment

Hey - I've left some high level feedback:

  • You don't need to materialize all matches with list(re.finditer(...)); using next(re.finditer(...), None) and checking that single result would avoid unnecessary allocations while still letting you compare positions.
  • Consider avoiding float('inf') for earliest_index and instead initializing it to None and adjusting the comparison logic, which can make the intent clearer and removes reliance on magic sentinel values.
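
Both suggestions can be combined in one loop, sketched below under assumptions: the pattern table is a hypothetical, shortened stand-in, and the variable names follow the reviewer's guide rather than the actual diff. Note that `next(pattern.finditer(s), None)` returns the same match object as `pattern.search(s)`, so either spelling avoids building a list.

```python
import re
from typing import Optional

# Hypothetical, shortened pattern table for illustration only.
LEVEL_PATTERNS = [
    (re.compile(r"\bCRITICAL\b"), "CRITICAL"),
    (re.compile(r"\bERROR\b"), "ERROR"),
    (re.compile(r"\bWARNING\b"), "WARNING"),
]


def detect_log_level(line: str) -> str:
    upper = line.upper()
    earliest_index: Optional[int] = None  # None means "nothing found yet"
    earliest_level = "UNKNOWN"
    for pattern, level in LEVEL_PATTERNS:
        # Pull only the first match from the iterator; no list is allocated.
        first = next(pattern.finditer(upper), None)
        if first is None:
            continue
        if earliest_index is None or first.start() < earliest_index:
            earliest_index = first.start()
            earliest_level = level
    return earliest_level
```

The `None` sentinel makes the "no match yet" state explicit at the cost of one extra check in the comparison.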

coderabbitai Bot commented Jun 25, 2026

Warning

Review limit reached

@HrachShah, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 37 minutes and 3 seconds. Learn how PR review limits work.

📥 Commits

Reviewing files that changed from the base of the PR and between e93757f and 1cc6fa9.

📒 Files selected for processing (2)
  • src/log_analyzer_cli/utils.py
  • tests/test_utils.py