Skip to content

Initial changes for stream logs command#7490

Open
saanikaguptamicrosoft wants to merge 1 commit intoAzure:foundry-training-devfrom
saanikaguptamicrosoft:saanika/stream
Open

Initial changes for stream logs command#7490
saanikaguptamicrosoft wants to merge 1 commit intoAzure:foundry-training-devfrom
saanikaguptamicrosoft:saanika/stream

Conversation

@saanikaguptamicrosoft
Copy link
Copy Markdown
Collaborator

No description provided.

Copy link
Copy Markdown
Member

@jongio jongio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This adds a solid polling-based log streamer that follows the extension's established patterns for command setup and credential management. A few issues to address - one bug in the terminal-job detection, some Go modernization gaps, and structural duplication.

Summary: 2 CRITICAL, 2 HIGH, 4 MEDIUM, 2 LOW

CRITICAL:

  • Probe mechanism for detecting terminal jobs with no logs can't work - GetArtifactContent returns (nil, nil) on 404, so job status is never read
  • Discovery loop has no retry limit - wrong job name = infinite loop

HIGH:

  • sort.Strings instead of slices.Sort - doesn't match codebase (job_get.go uses slices.Sorted)
  • consecErrors counter doesn't reset on successful list, only on successful file reads

MEDIUM:

  • math.Min with float64 casts - use built-in min() instead
  • Read-and-print pattern duplicated 3 times - extract helper
  • Final flush loop duplicates main poll logic (~35 lines)
  • IsStreamableStatus doc comment says "StreamableStatuses returns"

LOW:

  • interface{} should be any (go fix catches this)
  • PR description is empty

fmt.Println("Use 'azd ai training job download' to download job artifacts.")
return nil
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: the probe mechanism can't work. GetArtifactContent returns (nil, nil) on HTTP 404 - the client closes the body and returns nil before this code can read X-VW-Job-Status. So probeResp is always nil here, lastJobStatus is never set, and the terminal-state check on line 126 never fires.

If a job finishes with no log files, this loop runs forever (only killed by Ctrl+C).

Fix: use apiClient.GetJob(ctx, name) to check status directly - same pattern as job_download.go:82:

job, jobErr := apiClient.GetJob(ctx, name)
if jobErr == nil && models.TerminalStatuses[job.Properties.Status] {
    fmt.Printf("\nJob '%s' is in terminal state '%s' with no log files.\n", name, job.Properties.Status)
    fmt.Println("Use 'azd ai training job download' to download job artifacts.")
    return nil
}


fmt.Println("(Discovering log files...)")
time.Sleep(discoveryRetryDelay)
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No retry limit on discovery. If the job name is misspelled or logs never appear, this loops forever with no upper bound. Every other retry path in this file has maxConsecErrors - discovery should too.

Add a max discovery attempts constant and check it:

const maxDiscoveryAttempts = 30 // 30 * 5s = 2.5 min max wait
discoveryAttempts := 0
// inside loop:
discoveryAttempts++
if discoveryAttempts >= maxDiscoveryAttempts {
    return fmt.Errorf("no log files found after %d attempts - verify job name '%s' is correct", maxDiscoveryAttempts, name)
}

time.Sleep(discoveryRetryDelay)
}

sort.Strings(logFiles)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sort.Strings doesn't match the codebase. job_get.go uses slices.Sorted() consistently. Per AGENTS.md, prefer slices over sort.

Suggested change
sort.Strings(logFiles)
slices.Sort(logFiles)

Also update the import to replace "sort" with "slices" (and the other sort.Strings on line 219).

}
pollInterval = backoff(pollInterval)
continue
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consecErrors only resets on successful file reads (line 262), not after a successful ListArtifactsInPath. A sequence like: list fails, list succeeds, read fails, list succeeds, read fails hits the limit even though recent operations succeeded. Reset after the successful list call too:

// After the artifactList loop (line 222), add:
consecErrors = 0


// backoff doubles the interval up to maxPollInterval.
func backoff(current time.Duration) time.Duration {
next := time.Duration(math.Min(float64(current*2), float64(maxPollInterval)))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

math.Min with float64 casts is unnecessary - Go has built-in min() since 1.21. Also avoids potential precision loss with large time.Duration values.

Suggested change
next := time.Duration(math.Min(float64(current*2), float64(maxPollInterval)))
func backoff(current time.Duration) time.Duration {
return min(current*2, maxPollInterval)
}

This also removes the "math" import.

if !strings.HasSuffix(string(data), "\n") {
fmt.Println()
}
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This "print data, ensure trailing newline" block appears 3 times (here, line 269, line 341). Extract it:

func printContent(data []byte) {
    fmt.Print(string(data))
    if !strings.HasSuffix(string(data), "\n") {
        fmt.Println()
    }
}

if contentLen > fs.offset {
fs.offset = contentLen
}
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The final flush loop (lines 291-324) duplicates the file-reading logic from the main poll loop. If the read logic changes (e.g., error handling, offset tracking), both copies need updating. Consider extracting a pollFile(ctx, apiClient, jobName, path, fs) helper that both the main loop and final flush can call.

"Running": true,
}

// StreamableStatuses returns true if the job status indicates streaming is appropriate.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc comment says "StreamableStatuses returns" but the function is IsStreamableStatus. Per Go conventions, doc comments should start with the function name.

Suggested change
// StreamableStatuses returns true if the job status indicates streaming is appropriate.
// IsStreamableStatus returns true if the job status indicates streaming is appropriate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants