chore(mcp/docker): various improvements and housekeeping by meowgorithm · Pull Request #2443 · charmbracelet/crush

meowgorithm · 2026-03-20T19:36:45Z

No description provided.


Copilot

Pull request overview

This PR refines how MCP tool binary payloads (image/audio Data) are normalized into padded base64, aiming to avoid misclassifying raw ASCII bytes as unpadded base64.

Changes:

Add a regression test ensuring raw ASCII like "abc" is encoded rather than treated as unpadded base64.
Replace permissive raw base64 decoding with a heuristic (decodeLikelyBase64) to reduce accidental decoding of plain-text bytes.
Simplify isValidBase64 by delegating to the new decoder logic.
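The heuristic described above can be sketched in Go. This is a reconstruction from the diff fragments and the suggested doc comment quoted in this review, assuming a string input; the real function in internal/agent/tools/mcp/tools.go may differ. The `ensureBase64` signature used here ([]byte in, padded base64 string out) is a hypothetical stand-in for the normalization step.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// decodeLikelyBase64 is a sketch reconstructed from the review fragments,
// not the exact code from the PR.
func decodeLikelyBase64(s string) ([]byte, bool) {
	// Canonical (padded) base64 is accepted as-is.
	if decoded, err := base64.StdEncoding.DecodeString(s); err == nil {
		return decoded, true
	}
	// Valid base64 whose length is a multiple of 4 already decoded above,
	// so there is nothing left for raw decoding to recover.
	if len(s)%4 == 0 {
		return nil, false
	}
	// Only trust unpadded base64 if it contains a character outside 'a'-'z';
	// plain lowercase text such as "abc" is rejected here.
	if !strings.ContainsAny(s, "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+/") {
		return nil, false
	}
	decoded, err := base64.RawStdEncoding.DecodeString(s)
	if err != nil {
		return nil, false
	}
	return decoded, true
}

// ensureBase64 (hypothetical signature) normalizes a payload into padded
// base64: likely-base64 input is re-encoded canonically, and anything else
// is treated as raw bytes and encoded.
func ensureBase64(data []byte) string {
	if decoded, ok := decodeLikelyBase64(string(data)); ok {
		return base64.StdEncoding.EncodeToString(decoded)
	}
	return base64.StdEncoding.EncodeToString(data)
}

func main() {
	fmt.Println(ensureBase64([]byte("aGVsbG8="))) // aGVsbG8= (already padded)
	fmt.Println(ensureBase64([]byte("aGVsbG8")))  // aGVsbG8= (raw, gets padded)
	fmt.Println(ensureBase64([]byte("abc")))      // YWJj (treated as raw ASCII)
}
```

The last case is the one the new regression test targets: without the character check, "abc" would be decoded by RawStdEncoding and its ASCII bytes lost.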

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
internal/agent/tools/mcp/tools.go	Introduces heuristic base64 detection/decoding and updates validation to use it.
internal/agent/tools/mcp/tools_test.go	Adds a test case covering ambiguous raw ASCII input that previously could be misdetected as base64.


Copilot · 2026-03-20T19:40:25Z

+// decodeLikelyBase64 decodes canonical base64 and only accepts unpadded raw
+// base64 when the input also contains a character that plain lowercase text
+// cannot contain in base64. This avoids corrupting raw ASCII payload bytes like
+// "abc" that RawStdEncoding could otherwise decode.


The doc comment is hard to parse and technically inaccurate: “plain lowercase text cannot contain in base64” reads as if lowercase text were invalid base64, but base64 can contain lowercase letters. Consider rephrasing to say explicitly that unpadded input is only treated as base64 when it contains characters outside [a-z] (or otherwise clarify the heuristic and the ambiguity it handles).

Suggested change

-// decodeLikelyBase64 decodes canonical base64 and only accepts unpadded raw
-// base64 when the input also contains a character that plain lowercase text
-// cannot contain in base64. This avoids corrupting raw ASCII payload bytes like
-// "abc" that RawStdEncoding could otherwise decode.
+// decodeLikelyBase64 first tries to decode canonical (padded) base64. If that
+// fails, it only treats the input as unpadded base64 when the input length is
+// not a multiple of 4 and it contains at least one character outside 'a'–'z'
+// (for example, A–Z, 0–9, '+' or '/'). This avoids misclassifying plain lowercase
+// ASCII payloads like "abc" as base64, which RawStdEncoding could otherwise decode.

Copilot · 2026-03-20T19:40:26Z

+	if len(s)%4 == 0 {
+		return nil, false
+	}
+	if !strings.ContainsAny(s, "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+/") {


The ContainsAny heuristic means valid unpadded raw base64 strings made entirely of lowercase letters (possible for some binary payloads) will be rejected and then double-encoded by ensureBase64. If that input can occur from MCP tool providers, this will corrupt media payloads. Either broaden the acceptance criteria for raw base64 or document that lowercase-only unpadded base64 is intentionally treated as plain text (and add a test to lock that behavior in).

Suggested change

-	if !strings.ContainsAny(s, "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+/") {
+	if !strings.ContainsAny(s, "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/") {

Copilot · 2026-03-20T19:40:26Z

@@ -216,21 +228,6 @@ func decodeBase64(data []byte) ([]byte, bool) {

 // isValidBase64 checks if the data appears to be valid base64-encoded content.


isValidBase64 now delegates to decodeLikelyBase64, which intentionally rejects some valid raw base64 (e.g., lowercase-only unpadded). The function name and comment suggest a strict validity check, but the behavior is now a “likelihood” heuristic. Consider renaming (e.g., isLikelyBase64) or updating the comment to reflect the narrower acceptance rules.

Suggested change

-// isValidBase64 checks if the data appears to be valid base64-encoded content.
+// isValidBase64 returns true if the data is likely to be base64-encoded content.
+// It uses decodeLikelyBase64, which employs a heuristic and may reject some
+// technically valid base64 encodings (for example, certain unpadded,
+// lowercase-only raw base64) to avoid misinterpreting plain ASCII data as base64.
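The gap between "valid" and "likely" that motivates the rename can be made concrete by running a strict check and the heuristic side by side. Both functions below are hypothetical sketches, not code from the PR: `isStrictlyValidBase64` accepts anything StdEncoding or RawStdEncoding can decode, while `isLikelyBase64` applies the narrower rules described in the suggested comments.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// isStrictlyValidBase64 (hypothetical) accepts padded or unpadded base64.
func isStrictlyValidBase64(data []byte) bool {
	s := string(data)
	if _, err := base64.StdEncoding.DecodeString(s); err == nil {
		return true
	}
	_, err := base64.RawStdEncoding.DecodeString(s)
	return err == nil
}

// isLikelyBase64 (hypothetical) accepts padded base64, and unpadded base64
// only when its length is not a multiple of 4 and it contains a character
// outside 'a'-'z'.
func isLikelyBase64(data []byte) bool {
	s := string(data)
	if _, err := base64.StdEncoding.DecodeString(s); err == nil {
		return true
	}
	if len(s)%4 == 0 || !strings.ContainsAny(s, "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+/") {
		return false
	}
	_, err := base64.RawStdEncoding.DecodeString(s)
	return err == nil
}

func main() {
	for _, in := range []string{"aGVsbG8=", "aGVsbG8", "abc"} {
		b := []byte(in)
		fmt.Printf("%q valid=%t likely=%t\n", in, isStrictlyValidBase64(b), isLikelyBase64(b))
	}
}
```

Only "abc" differs between the two columns: it is technically valid unpadded base64 but not "likely" under the heuristic, which is the behavior the updated comment documents.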

charmcli added the area: mcp label Mar 20, 2026

meowgorithm changed the title ~~chore(mcp/docker): various improvements~~ chore(mcp/docker): various improvements and housekeeping Mar 20, 2026

fix(mcp): avoid corrupting raw ASCII payloads during base64 normalization

3fcbf04

meowgorithm force-pushed the docker-desktop-followup branch from 53294d4 to 3fcbf04 March 20, 2026 19:37

meowgorithm requested a review from Copilot March 20, 2026 19:37

Copilot started reviewing on behalf of meowgorithm March 20, 2026 19:37

Copilot AI reviewed Mar 20, 2026

meowgorithm wants to merge 1 commit into main from docker-desktop-followup


3 participants
