
fix: parse group/pipe job logs correctly #52

Merged
nh13 merged 1 commit into main from nh/fix-group-job-parsing on Mar 28, 2026

Conversation

Collaborator

@nh13 nh13 commented Mar 28, 2026

Summary

  • Snakemake indents log output for jobs within pipe()/group: blocks by 4 spaces, causing the parser to miss rule names and timestamps for grouped jobs
  • Fix: switch rule detection from RULE_START_PATTERN.match(line) to RULE_START_PATTERN.match(line.lstrip()) across all 8 call sites in core.py, and add indented-line handling to LogLineParser in line_parser.py
  • Fix: switch timestamp detection from TIMESTAMP_PATTERN.match(line) to TIMESTAMP_PATTERN.search(line), and remove the line.startswith("[") guards that blocked indented timestamps

Test plan

  • 13 new tests covering group job parsing across parse_running_jobs_from_log, parse_failed_jobs_from_log, parse_completed_jobs_from_log, parse_all_jobs_from_log, and LogLineParser
  • All 148 existing parser tests still pass
  • Full suite (1033 tests) passes
  • ruff, mypy clean

Closes #42

Summary by CodeRabbit

  • Bug Fixes

    • Log parsing is now whitespace-tolerant for indented rule/group blocks and timestamps; indented timestamps and rule starts are correctly recognized. Job completion counting now attributes completions to the correct individual jobs for accurate rule counts.
  • Tests

    • Added extensive tests covering indented/group job parsing (running, failed, completed, scheduled) and timestamp regressions; relaxed an integration assertion to allow CI variability in reported total job counts.


coderabbitai bot commented Mar 28, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3bff1483-0fbe-454e-bb39-2e832b5f142f

📥 Commits

Reviewing files that changed from the base of the PR and between 41b0ffa and 3c42206.

📒 Files selected for processing (4)
  • snakesee/parser/core.py
  • snakesee/parser/line_parser.py
  • tests/integration/test_workflows.py
  • tests/test_parser.py
✅ Files skipped from review due to trivial changes (1)
  • tests/integration/test_workflows.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • snakesee/parser/core.py

📝 Walkthrough

Walkthrough

Rule-start and timestamp detection were made whitespace-tolerant (matching on left-stripped lines). Indented/grouped log lines are routed to a new indented-line handler that recognizes indented timestamps and rule starts and flushes pending ERROR state. Completed-job attribution now uses a JOBID→rule mapping. Tests for group/pipe log parsing were added.

Changes

Cohort / File(s) Summary
Core parsing logic
snakesee/parser/core.py
Added job_rules: dict[str, str] to map JOBID→rule for completion attribution; switched rule-start and timestamp checks to use PATTERN.match(line.lstrip()) across multiple parsers to be whitespace-tolerant.
Indented line parsing
snakesee/parser/line_parser.py
Added LogLineParser._parse_indented_or_group_line(...) and routed all indented lines to it. Handler left-strips lines, recognizes indented timestamps and rule/checkpoint starts, flushes pending ERROR events, updates context, and delegates property parsing as needed.
Tests / Fixtures
tests/test_parser.py, tests/integration/test_workflows.py
Added extensive fixtures and unit/integration tests for grouped/pipe job logs (running/failed/completed/scheduled), regression tests for timestamp handling, and relaxed an integration assertion around PROGRESS-derived total_jobs.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through indents, sniffed each line,

JobIDs led me to rules that brightly shine,
Timestamps unmasked where spaces used to hide,
Grouped jobs stood up, no longer brushed aside,
🥕📜 Hop, parse, and find!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title 'fix: parse group/pipe job logs correctly' clearly and concisely describes the main change: fixing the parser to handle group/pipe job logs.
Description check ✅ Passed The description is well-structured with a clear summary of the problem, solution, and comprehensive test plan, exceeding the template requirements.
Linked Issues check ✅ Passed The PR fully addresses issue #42 by fixing the parser to detect group/pipe jobs through handling indented log lines and correctly extracting rule names and timestamps.
Out of Scope Changes check ✅ Passed All changes directly address the group/pipe job parsing problem, with parser logic updates, line parsing enhancements, comprehensive tests, and a focused integration test adjustment.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.



codecov bot commented Mar 28, 2026

Codecov Report

❌ Patch coverage is 90.90909% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.92%. Comparing base (4053236) to head (3c42206).
⚠️ Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
snakesee/parser/core.py 88.88% 2 Missing ⚠️
snakesee/parser/line_parser.py 92.30% 2 Missing ⚠️
Additional details and impacted files


@@            Coverage Diff             @@
##             main      #52      +/-   ##
==========================================
+ Coverage   88.59%   88.92%   +0.32%     
==========================================
  Files          52       52              
  Lines        4928     4955      +27     
==========================================
+ Hits         4366     4406      +40     
+ Misses        562      549      -13     
Files with missing lines Coverage Δ
snakesee/parser/core.py 85.23% <88.88%> (+2.79%) ⬆️
snakesee/parser/line_parser.py 97.74% <92.30%> (-0.97%) ⬇️


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
snakesee/parser/core.py (1)

266-270: ⚠️ Potential issue | 🟠 Major

parse_rules_from_log() still miscounts grouped completions.

This function still increments the last seen rule for every finish line. In grouped output, multiple rules can be opened before any of them finish, so the last rule block wins. With the grouped fixture added in this PR, both consumer completions would still be counted as producer, leaving rule-level historical stats wrong.

🔧 Suggested direction
 def parse_rules_from_log(log_path: Path) -> dict[str, int]:
     rule_counts: dict[str, int] = {}
     current_rule: str | None = None
+    job_rules: dict[str, str] = {}

     try:
         for line in log_path.read_text().splitlines():
             # Track current rule being executed
             if match := RULE_START_PATTERN.match(line.lstrip()):
                 current_rule = match.group(1)
+            elif match := JOBID_PATTERN.match(line):
+                if current_rule is not None:
+                    job_rules[match.group(1)] = current_rule
             # Count "Finished job" as rule completion
-            elif "Finished job" in line and current_rule is not None:
-                rule_counts[current_rule] = rule_counts.get(current_rule, 0) + 1
+            elif match := FINISHED_JOB_PATTERN.search(line):
+                if rule := job_rules.get(match.group(1), current_rule):
+                    rule_counts[rule] = rule_counts.get(rule, 0) + 1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@snakesee/parser/core.py` around lines 266 - 270, parse_rules_from_log()
misattributes "Finished job" lines to the last seen rule by using current_rule;
instead maintain a stack of open rules (e.g., rules_stack) where on
RULE_START_PATTERN.match(line.lstrip()) you push match.group(1), and on seeing
"Finished job" you pop from rules_stack (if non-empty) and increment rule_counts
for the popped rule; update references to current_rule (remove or keep only for
convenience) and ensure you guard against popping an empty stack.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@snakesee/parser/core.py`:
- Around line 332-333: TIMESTAMP_PATTERN is currently used with .search(), which
finds timestamps anywhere in a line and produces false positives; update every
check to use TIMESTAMP_PATTERN.match(line.lstrip()) instead (i.e., strip leading
whitespace and anchor to the start) in the functions
parse_running_jobs_from_log(), parse_failed_jobs_from_log(), and
_get_first_log_timestamp(), and where record_pending_error() is gated by a
TIMESTAMP_PATTERN check—replace the .search(...) calls at those sites so
timestamp detection only matches at the start of the trimmed line.
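The distinction the comment draws can be illustrated with a small sketch (the timestamp regex below is illustrative, not the project's actual TIMESTAMP_PATTERN):

```python
import re

# Illustrative timestamp pattern for lines like "[Mon Jan  6 10:00:00 2026]".
TIMESTAMP = re.compile(r"\[\w{3} \w{3} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}\]")

indented = "    [Mon Jan  6 10:00:00 2026]"
midline = "log message quoting [Mon Jan  6 10:00:01 2026] inline"

# .search() accepts both lines, including the mid-line false positive:
print(bool(TIMESTAMP.search(indented)), bool(TIMESTAMP.search(midline)))

# Anchoring on the left-stripped line accepts only a leading timestamp:
print(bool(TIMESTAMP.match(indented.lstrip())),
      bool(TIMESTAMP.match(midline.lstrip())))
```

The first print shows `True True` (the mid-line fragment slips through), the second `True False`, which is why the comment recommends `.match(line.lstrip())` over `.search()` for timestamp detection.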


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b74cfa2c-513d-4f1b-9708-1d0353eb8458

📥 Commits

Reviewing files that changed from the base of the PR and between 684359e and 227ea32.

📒 Files selected for processing (3)
  • snakesee/parser/core.py
  • snakesee/parser/line_parser.py
  • tests/test_parser.py

@nh13 nh13 force-pushed the nh/fix-group-job-parsing branch from 227ea32 to 3f2475d on March 28, 2026 16:01

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
snakesee/parser/line_parser.py (1)

276-285: Missing checkpoint/localcheckpoint handling for indented lines.

The non-indented rule start check (line 188-189) handles checkpoint and localcheckpoint prefixes, but this indented handler only checks for rule and localrule. If a checkpoint rule ever appears inside a group block, it won't be recognized as a rule start.

♻️ Proposed fix to add checkpoint support
         # Indented rule start: "    rule X:" or "    localrule X:"
-        if (first_stripped == "r" and stripped.startswith("rule ")) or (
-            first_stripped == "l" and stripped.startswith("localrule ")
+        if (
+            (first_stripped == "r" and stripped.startswith("rule "))
+            or (first_stripped == "l" and stripped.startswith("localrule "))
+            or (first_stripped == "c" and stripped.startswith("checkpoint "))
+            or (first_stripped == "l" and stripped.startswith("localcheckpoint "))
         ):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@snakesee/parser/line_parser.py` around lines 276 - 285, The indented-line
rule-start branch only checks for "r"/"rule" and "l"/"localrule" and thus misses
"checkpoint"/"localcheckpoint"; update the conditional that examines
first_stripped and stripped (the block using RULE_START_PATTERN.match(stripped))
to also accept the checkpoint prefixes the same way the non-indented handler
does, so that when RULE_START_PATTERN matches you still call
self.context.get_pending_error(), self.context.reset_for_new_rule(rule) and
append ParseEvent(ParseEventType.RULE_START, {"rule": rule}) for
checkpoint/localcheckpoint names as well (keep the existing use of
RULE_START_PATTERN, match.group(1), pending handling, reset_for_new_rule, and
ParseEvent creation).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a4f515dc-65cb-4dc2-9fc7-d4075c8b4df5

📥 Commits

Reviewing files that changed from the base of the PR and between 227ea32 and 3f2475d.

📒 Files selected for processing (3)
  • snakesee/parser/core.py
  • snakesee/parser/line_parser.py
  • tests/test_parser.py

@nh13 nh13 force-pushed the nh/fix-group-job-parsing branch 2 times, most recently from 46b52ac to 41b0ffa on March 28, 2026 22:34

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/integration/test_workflows.py`:
- Around line 52-56: The assertion in tests/integration/test_workflows.py is too
strict given EventState.add_event (in snakesee/state_comparison.py) and the
optional Event.total_jobs (in snakesee/types.py) which can leave total_jobs as
0,1,2,3,4; update the assertion that checks result.total_jobs so it allows any
integer in the inclusive range 0–4 (rather than only 0 or 4), i.e. validate
result.total_jobs is >= 0 and <= 4 (or use membership in range(0,5)) to cover
partial PROGRESS snapshots.

In `@tests/test_parser.py`:
- Around line 2822-2855: The tests never exercise a mid-line timestamp fragment,
so they don't catch regressions that use TIMESTAMP_PATTERN.search(); update the
fixtures so a non-timestamp line contains an embedded "[Mon ...]" fragment and
ensure the parser is driven into a pending-error state so the mid-line fragment
would matter — e.g., in test_midline_timestamp_does_not_corrupt_running_jobs (or
add a new test) change the log_content to start an error block (use
parse_failed_jobs_from_log or construct a pending error with the same shape as
other tests) then include a line like "some message [Mon Jan  6 10:00:01 2026]
continued text" to verify parse_running_jobs_from_log /
parse_failed_jobs_from_log does not prematurely close the block; this will
ensure the code path that would be broken by TIMESTAMP_PATTERN.search() is
actually exercised.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 74b448f7-c1a7-4328-8474-9a19b6dd14b2

📥 Commits

Reviewing files that changed from the base of the PR and between 3f2475d and 41b0ffa.

📒 Files selected for processing (4)
  • snakesee/parser/core.py
  • snakesee/parser/line_parser.py
  • tests/integration/test_workflows.py
  • tests/test_parser.py

Snakemake indents log output for jobs within group/pipe blocks by
4 spaces. The parser used RULE_START_PATTERN.match() anchored at
position 0 and line.startswith("[") checks that both fail on indented
lines, causing group jobs to be invisible or assigned wrong rule names.

Fix by using RULE_START_PATTERN.match(line.lstrip()) for rule detection
and TIMESTAMP_PATTERN.search() for timestamp detection across all
parser functions. Add _parse_indented_or_group_line() to LogLineParser
for the same handling in the streaming path.

Closes #42
@nh13 nh13 force-pushed the nh/fix-group-job-parsing branch from 41b0ffa to 3c42206 on March 28, 2026 22:52
@nh13 nh13 merged commit 75c9b27 into main Mar 28, 2026
8 checks passed
@nh13 nh13 deleted the nh/fix-group-job-parsing branch March 28, 2026 23:03


Development

Successfully merging this pull request may close these issues.

Is snakesee blind to job groups?

1 participant