Strip quotes from charset before Encoding.GetEncoding #23

Open

mrotondo wants to merge 1 commit into lennykean:master from mrotondo:fix-quoted-charset-encoding

@mrotondo


Problem

Decoding an MTOM response whose application/xop+xml root part declares a quoted charset — e.g. Content-Type: application/xop+xml; charset="utf-8"; type="text/xml" — throws:

System.ArgumentException: '"utf-8"' is not a supported encoding name.
   at WcfCoreMtomEncoder.MtomPart.GetStringContentForEncoder(MessageEncoder encoder)
   at WcfCoreMtomEncoder.MtomMessageEncoder.ReadMessage(Stream, Int32, String contentType)

A quoted charset value is valid per RFC 7231 §3.1.1.1 (a parameter value may be a token or a quoted-string), and System.Net.Http.Headers.MediaTypeHeaderValue.CharSet returns the value with the surrounding quotes intact (see dotnet/runtime#42079). That value — "utf-8", quotes included — is then passed to Encoding.GetEncoding, which rejects it.

This happens against real servers: IRS MeF's state services (Apache/Axiom) emit charset="utf-8" on the MTOM root part, so the client crashes before it can read the message.
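A minimal repro of the failure, written as a C# top-level program. It assumes a .NET runtime where MediaTypeHeaderValue.CharSet behaves as described above (quotes preserved). The program parses the root-part header in the form the PR describes and passes CharSet straight to Encoding.GetEncoding, the same way the pre-fix call sites do:

```csharp
#nullable enable
using System;
using System.Net.Http.Headers;
using System.Text;

// Root-part Content-Type with a quoted charset, as described in this PR.
MediaTypeHeaderValue contentType = MediaTypeHeaderValue.Parse(
    "application/xop+xml; charset=\"utf-8\"; type=\"text/xml\"");

// Assumption (per dotnet/runtime#42079): CharSet returns the raw
// parameter value, surrounding quotes included.
string? rawCharSet = contentType.CharSet;
Console.WriteLine($"CharSet = {rawCharSet}");

bool threw = false;
try
{
    // Same pattern as the pre-fix call sites: the raw value goes straight in.
    Encoding.GetEncoding(rawCharSet!);
}
catch (ArgumentException ex)
{
    threw = true;
    Console.WriteLine($"ArgumentException: {ex.Message}");
}
```

On an affected runtime this should print the quoted CharSet, then an ArgumentException whose message matches the one in the stack trace above.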

Fix

Trim surrounding quotes from CharSet before calling Encoding.GetEncoding, mirroring what the encoder already does for the type parameter (p.Value.Replace("\"", "")). Two call sites consumed the raw value:

  • MtomPart.GetStringContentForEncoder
  • MtomMessageEncoder.CreateStream

Both now use CharSet?.Trim('"') guarded by !string.IsNullOrEmpty(...), so an absent charset still falls back to Encoding.Default (no NullReferenceException).
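The guarded trim can be sketched as a standalone C# top-level program. ResolveCharSet is a hypothetical helper name chosen for illustration. The PR description only says both call sites now use CharSet?.Trim('"') behind a !string.IsNullOrEmpty check, so this sketch does not claim how the real code is factored:

```csharp
#nullable enable
using System;
using System.Text;

// ResolveCharSet is a hypothetical name used for illustration only; it is
// not taken from the PR, which applies the same expression at its call sites.
static Encoding ResolveCharSet(string? charSet)
{
    // Strip surrounding quotes: "utf-8" (with quotes) becomes utf-8.
    string? name = charSet?.Trim('"');

    // Absent or empty charset: fall back to Encoding.Default rather than
    // passing null or "" to GetEncoding, both of which throw.
    return !string.IsNullOrEmpty(name)
        ? Encoding.GetEncoding(name)
        : Encoding.Default;
}

// The four charset variants covered by the PR's test theory.
foreach (string? value in new[] { "\"utf-8\"", "\"UTF-8\"", "utf-8", null })
{
    Console.WriteLine($"{value ?? "(absent)"} -> {ResolveCharSet(value).WebName}");
}
```

Note that Trim('"') strips every leading and trailing quote character, not just one matched pair. For well-formed header values the result is the same.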

Tests

Adds a WcfCoreMtomEncoder.Tests project with an xUnit theory that decodes a multipart/related MTOM response through MtomMessageEncoder.ReadMessage with the root part's charset set four ways:

| charset value | before this PR | after this PR |
|---|---|---|
| charset="utf-8" (quoted, lower) | ArgumentException | passes |
| charset="UTF-8" (quoted, upper) | ArgumentException | passes |
| charset=utf-8 (unquoted) | passes | passes |
| (absent) | passes | passes |

Note: the test project targets net10.0 with System.ServiceModel.Primitives 10.x; adjust the TFM / package version to match your CI's SDK if needed.

A quoted charset parameter (e.g. charset="utf-8") is valid per RFC 7231 3.1.1.1,
and System.Net.Http's MediaTypeHeaderValue.CharSet returns the value with the
quotes intact. Passing it straight to Encoding.GetEncoding throws
ArgumentException: '"utf-8"' is not a supported encoding name.

Real-world servers send this: IRS MeF (Apache/Axiom) emits
application/xop+xml; charset="utf-8" on the MTOM root part, which crashes
decoding in both MtomPart.GetStringContentForEncoder and
MtomMessageEncoder.CreateStream.

Trim the surrounding quotes (mirroring the existing handling of the type
parameter) and guard against an absent charset. Adds an xUnit theory covering
quoted (lower/upper), unquoted, and absent charset.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

mrotondo (author) commented Jun 16, 2026

Please excuse the noise around the Claude-generated tests (though I do think they'd be good to have as part of the project). The critical pieces are the changes to MtomMessageEncoder.cs and MtomPart.cs, which I've also tested manually with a small script that exercises both code paths: it crashes before the change and works correctly after.

