Skip to content

chore(mcp/docker): various improvements and housekeeping#2443

Draft
meowgorithm wants to merge 1 commit intomainfrom
docker-desktop-followup
Draft

chore(mcp/docker): various improvements and housekeeping#2443
meowgorithm wants to merge 1 commit intomainfrom
docker-desktop-followup

Conversation

@meowgorithm
Copy link
Copy Markdown
Member

No description provided.

@meowgorithm meowgorithm changed the title chore(mcp/docker): various improvements chore(mcp/docker): various improvements and housekeeping Mar 20, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refines how MCP tool binary payloads (image/audio Data) are normalized into padded base64, aiming to avoid misclassifying raw ASCII bytes as unpadded base64.

Changes:

  • Add a regression test ensuring raw ASCII like "abc" is encoded rather than treated as unpadded base64.
  • Replace permissive raw base64 decoding with a heuristic (decodeLikelyBase64) to reduce accidental decoding of plain-text bytes.
  • Simplify isValidBase64 by delegating to the new decoder logic.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
internal/agent/tools/mcp/tools.go Introduces heuristic base64 detection/decoding and updates validation to use it.
internal/agent/tools/mcp/tools_test.go Adds a test case covering ambiguous raw ASCII input that previously could be misdetected as base64.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +194 to +197
// decodeLikelyBase64 decodes canonical base64 and only accepts unpadded raw
// base64 when the input also contains a character that plain lowercase text
// cannot contain in base64. This avoids corrupting raw ASCII payload bytes like
// "abc" that RawStdEncoding could otherwise decode.
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc comment is hard to parse/technically inaccurate: “plain lowercase text cannot contain in base64” reads like lowercase text is invalid base64, but base64 can contain lowercase letters. Consider rephrasing to explicitly say you only treat unpadded base64 as base64 when it contains characters outside [a-z] (or otherwise clarify the heuristic/ambiguity being handled).

Suggested change
// decodeLikelyBase64 decodes canonical base64 and only accepts unpadded raw
// base64 when the input also contains a character that plain lowercase text
// cannot contain in base64. This avoids corrupting raw ASCII payload bytes like
// "abc" that RawStdEncoding could otherwise decode.
// decodeLikelyBase64 first tries to decode canonical (padded) base64. If that
// fails, it only treats the input as unpadded base64 when the input length is
// not a multiple of 4 and it contains at least one character outside 'a'–'z'
// (for example, A–Z, 0–9, '+' or '/'). This avoids misclassifying plain lowercase
// ASCII payloads like "abc" as base64, which RawStdEncoding could otherwise decode.

Copilot uses AI. Check for mistakes.
if len(s)%4 == 0 {
return nil, false
}
if !strings.ContainsAny(s, "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+/") {
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ContainsAny heuristic means valid unpadded raw base64 strings made entirely of lowercase letters (possible for some binary payloads) will be rejected and then double-encoded by ensureBase64. If that input can occur from MCP tool providers, this will corrupt media payloads. Either broaden the acceptance criteria for raw base64 or document that lowercase-only unpadded base64 is intentionally treated as plain text (and add a test to lock that behavior in).

Suggested change
if !strings.ContainsAny(s, "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+/") {
if !strings.ContainsAny(s, "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/") {

Copilot uses AI. Check for mistakes.
@@ -216,21 +228,6 @@ func decodeBase64(data []byte) ([]byte, bool) {

// isValidBase64 checks if the data appears to be valid base64-encoded content.
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isValidBase64 now delegates to decodeLikelyBase64, which intentionally rejects some valid raw base64 (e.g., lowercase-only unpadded). The function name and comment suggest a strict validity check, but the behavior is now a “likelihood” heuristic. Consider renaming (e.g., isLikelyBase64) or updating the comment to reflect the narrower acceptance rules.

Suggested change
// isValidBase64 checks if the data appears to be valid base64-encoded content.
// isValidBase64 returns true if the data is likely to be base64-encoded content.
// It uses decodeLikelyBase64, which employs a heuristic and may reject some
// technically valid base64 encodings (for example, certain unpadded, lowercase-
// only raw base64) to avoid misinterpreting plain ASCII data as base64.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants