cmd/loop: CLI session recording and replay tests #1072
starius wants to merge 16 commits into lightninglabs:master
Conversation
Summary of Changes (Gemini Code Assist): This pull request significantly enhances the testing infrastructure for the Loop CLI by introducing a robust session recording and replay mechanism. The primary goal is to establish a high-level, deterministic, and regression-resistant testing path that does not depend on a live loopd daemon or on specific local environment details. By capturing and replaying real command executions, the CLI's behavior, including output formatting, error messaging, and gRPC interactions, can be verified in a controlled and reproducible manner, greatly improving the reliability of future changes.
Activity
Code Review
This pull request introduces a comprehensive session recording and replay framework for testing the Loop CLI. This is a significant improvement that will make the CLI more robust and regression-resistant. The implementation is well-structured, abstracting away dependencies like the gRPC connection, clock, and stdio to allow for deterministic testing. The recorded test sessions are extensive and cover a wide range of commands and scenarios. The code is of high quality, and I only have one minor suggestion to improve the clarity of the replay logic.
6fca25d to 763c480
@starius, remember to re-request review from reviewers when ready
Make the output homedir-independent to facilitate tests.
abandonSwap used to access flag --id which does not exist.
Rebased, added recorded sessions for newly added commands and flags (the last commit).
My biggest question about the design is fixture maintenance. With 114 fixture files, what's the plan for bulk updates when the CLI output format changes? A few things would help long-term here.
I don't think it's a blocker for this test update. I just want to understand the maintenance story before this grows further.
// finalize writes the recorded session to disk once.
func (r *sessionRecorder) finalize(runErr error) error {
Minor: consider writing the session even if stopHooks fails, to preserve debugging data.
From Claude: the realistic risk is low. stopHooks() just calls unhook functions that restore the original file descriptors. Failures here would be unusual (maybe a closed pipe, or a dup2 failure); in normal operation this won't happen.
Its suggestion is:
r.finalizeOnce.Do(func() {
// Still try to stop hooks, but don't abandon ship on error
hookErr := r.stopHooks()
r.logExit(runErr)
// ... write session to disk ...
// Return hook error only if write succeeded
if finalizeErr == nil {
finalizeErr = hookErr
}
})
// copyExportedFields copies exported fields from src into dst.
func copyExportedFields(dst, src reflect.Value) {
	// Iterate fields and copy only exported ones.
Moderate concern here: this copies only exported fields from the cli.Command, cli.Flag, and cli.Argument structs. Here are the points where it's fragile:
- Unexported state is silently ignored
If urfave/cli stores important state in unexported fields, the clone won't have it. For example, if cli.StringFlag had:
type StringFlag struct {
Name string // exported - copied
value string // unexported - NOT copied
isSet bool // unexported - NOT copied
}
The test at line 1071 (require.False(t, clonedAlpha.IsSet())) actually relies on this behavior: it expects IsSet() to return false because the unexported state isn't copied. So currently this works in your favor.
- Structural assumptions about cli.Command
The code explicitly handles these fields (line 688-692):
- Flags
- MutuallyExclusiveFlags
- Arguments
- Commands
If urfave/cli v3.5 adds a new field like ConditionalFlags []cli.Flag, the cloning code won't deep-clone it - it'll share references with the original.
- Version pinning mitigates but doesn't eliminate risk
The repo is on urfave/cli v3.4.1. When you upgrade, the test TestCloneCommandForReplayResetsFlagState should catch obvious breaks, but it doesn't comprehensively cover all cli.Command fields.
Suggestions:
- Add a comment documenting the urfave/cli version dependency:
// cloneCommandForReplay deep-clones a command tree for deterministic replays.
// NOTE: This relies on urfave/cli/v3 internal structure. If upgrading cli
// versions, verify TestCloneCommandForReplayResetsFlagState still passes.
- Consider a simpler alternative: since this is test-only, you could just call newRootCommand() fresh for each test instead of cloning. Is cloning actually necessary?
	defer r.Close()

	writer := composeWriter(forward, onChunk)
	_, _ = io.Copy(writer, r)
Minor/Nit: You have a lot of error swallowing in this file. Consider logging io.Copy errors in session_stdio.go for debugging incomplete recordings, or documenting why they're intentionally ignored.
Claude suggests:
go func() {
defer wg.Done()
defer r.Close()
writer := composeWriter(forward, onChunk)
if _, err := io.Copy(writer, r); err != nil {
// At minimum, log it for debugging
log.Debugf("session stdout copy error: %v", err)
}
}()
Or track errors in a struct field and surface them in finalize():
type sessionRecorder struct {
// ...
copyErr error // Set if io.Copy fails
}
This work adds a repeatable, high-level testing path for the Loop CLI by recording real command runs and replaying them in tests. The goal is to make CLI behavior verifiable and regression-resistant without depending on a live loopd or local environment details. The recorded sessions become fixtures that exercise success and error paths across subcommands and flags, while replay ensures deterministic output checks in CI.
What changed
- Added a recording mode that captures real CLI runs when LOOP_SESSION_RECORD=true is set.
- Replaced the home directory with ~ and normalized JSON output and timestamps during replay so sessions are portable across machines.
- Stored the recorded sessions under cmd/loop/testdata/sessions/ and documented how to add more.
Why this approach
The CLI previously relied on manual testing or live daemons, which makes it hard to detect regressions and impossible to run in CI without complex setup. Recording/replay gives us deterministic, reproducible coverage of real behavior, including error messaging, output formatting, and gRPC request/response handling, while still exercising the production code paths.
How it works
Recording wraps the CLI process and intercepts stdio and gRPC calls, saving a JSON session file with events and metadata. Replay loads these events, supplies recorded input, and provides a recorded gRPC connection so commands run without a live daemon. Output is normalized where needed so sessions can be replayed across machines.
Testing
- Replay: go test ./cmd/loop -v
- Record: LOOP_SESSION_RECORD=true loop <command> --network regtest
Pull Request Checklist
- Update release_notes.md if your PR contains major features, breaking changes or bugfixes