Initial changes for stream logs command#7490
Initial changes for stream logs command#7490saanikaguptamicrosoft wants to merge 1 commit intoAzure:foundry-training-devfrom
Conversation
jongio
left a comment
There was a problem hiding this comment.
This adds a solid polling-based log streamer that follows the extension's established patterns for command setup and credential management. A few issues to address - one bug in the terminal-job detection, some Go modernization gaps, and structural duplication.
Summary: 2 CRITICAL, 2 HIGH, 4 MEDIUM, 2 LOW
CRITICAL:
- Probe mechanism for detecting terminal jobs with no logs can't work -
GetArtifactContentreturns(nil, nil)on 404, so job status is never read - Discovery loop has no retry limit - wrong job name = infinite loop
HIGH:
sort.Stringsinstead ofslices.Sort- doesn't match codebase (job_get.gousesslices.Sorted)consecErrorscounter doesn't reset on successful list, only on successful file reads
MEDIUM:
math.Minwith float64 casts - use built-inmin()instead- Read-and-print pattern duplicated 3 times - extract helper
- Final flush loop duplicates main poll logic (~35 lines)
IsStreamableStatusdoc comment says "StreamableStatuses returns"
LOW:
interface{}should beany(go fixcatches this)- PR description is empty
| fmt.Println("Use 'azd ai training job download' to download job artifacts.") | ||
| return nil | ||
| } | ||
|
|
There was a problem hiding this comment.
Bug: the probe mechanism can't work. GetArtifactContent returns (nil, nil) on HTTP 404 - the client closes the body and returns nil before this code can read X-VW-Job-Status. So probeResp is always nil here, lastJobStatus is never set, and the terminal-state check on line 126 never fires.
If a job finishes with no log files, this loop runs forever (only killed by Ctrl+C).
Fix: use apiClient.GetJob(ctx, name) to check status directly - same pattern as job_download.go:82:
job, jobErr := apiClient.GetJob(ctx, name)
if jobErr == nil && models.TerminalStatuses[job.Properties.Status] {
fmt.Printf("\nJob '%s' is in terminal state '%s' with no log files.\n", name, job.Properties.Status)
fmt.Println("Use 'azd ai training job download' to download job artifacts.")
return nil
}|
|
||
| fmt.Println("(Discovering log files...)") | ||
| time.Sleep(discoveryRetryDelay) | ||
| } |
There was a problem hiding this comment.
No retry limit on discovery. If the job name is misspelled or logs never appear, this loops forever with no upper bound. Every other retry path in this file has maxConsecErrors - discovery should too.
Add a max discovery attempts constant and check it:
const maxDiscoveryAttempts = 30 // 30 * 5s = 2.5 min max wait
discoveryAttempts := 0
// inside loop:
discoveryAttempts++
if discoveryAttempts >= maxDiscoveryAttempts {
return fmt.Errorf("no log files found after %d attempts - verify job name '%s' is correct", maxDiscoveryAttempts, name)
}| time.Sleep(discoveryRetryDelay) | ||
| } | ||
|
|
||
| sort.Strings(logFiles) |
There was a problem hiding this comment.
sort.Strings doesn't match the codebase. job_get.go uses slices.Sorted() consistently. Per AGENTS.md, prefer slices over sort.
| sort.Strings(logFiles) | |
| slices.Sort(logFiles) |
Also update the import to replace "sort" with "slices" (and the other sort.Strings on line 219).
| } | ||
| pollInterval = backoff(pollInterval) | ||
| continue | ||
| } |
There was a problem hiding this comment.
consecErrors only resets on successful file reads (line 262), not after a successful ListArtifactsInPath. A sequence like: list fails, list succeeds, read fails, list succeeds, read fails hits the limit even though recent operations succeeded. Reset after the successful list call too:
// After the artifactList loop (line 222), add:
consecErrors = 0|
|
||
| // backoff doubles the interval up to maxPollInterval. | ||
| func backoff(current time.Duration) time.Duration { | ||
| next := time.Duration(math.Min(float64(current*2), float64(maxPollInterval))) |
There was a problem hiding this comment.
math.Min with float64 casts is unnecessary - Go has built-in min() since 1.21. Also avoids potential precision loss with large time.Duration values.
| next := time.Duration(math.Min(float64(current*2), float64(maxPollInterval))) | |
| func backoff(current time.Duration) time.Duration { | |
| return min(current*2, maxPollInterval) | |
| } |
This also removes the "math" import.
| if !strings.HasSuffix(string(data), "\n") { | ||
| fmt.Println() | ||
| } | ||
| } |
There was a problem hiding this comment.
This "print data, ensure trailing newline" block appears 3 times (here, line 269, line 341). Extract it:
func printContent(data []byte) {
fmt.Print(string(data))
if !strings.HasSuffix(string(data), "\n") {
fmt.Println()
}
}| if contentLen > fs.offset { | ||
| fs.offset = contentLen | ||
| } | ||
| } |
There was a problem hiding this comment.
The final flush loop (lines 291-324) duplicates the file-reading logic from the main poll loop. If the read logic changes (e.g., error handling, offset tracking), both copies need updating. Consider extracting a pollFile(ctx, apiClient, jobName, path, fs) helper that both the main loop and final flush can call.
| "Running": true, | ||
| } | ||
|
|
||
| // StreamableStatuses returns true if the job status indicates streaming is appropriate. |
There was a problem hiding this comment.
Doc comment says "StreamableStatuses returns" but the function is IsStreamableStatus. Per Go conventions, doc comments should start with the function name.
| // StreamableStatuses returns true if the job status indicates streaming is appropriate. | |
| // IsStreamableStatus returns true if the job status indicates streaming is appropriate. |
No description provided.