Skip to content

Commit 5ca9ac9

Browse files
tglsfdcbigplaice
authored andcommitted
Make escaping functions retain trailing bytes of an invalid character.
Instead of dropping the trailing byte(s) of an invalid or incomplete multibyte character, replace only the first byte with a known-invalid sequence, and process the rest normally. This seems less likely to confuse incautious callers than the behavior adopted in 5dc1e42b4. While we're at it, adjust PQescapeStringInternal to produce at most one bleat about invalid multibyte characters per string. This matches the behavior of PQescapeInternal, and avoids the risk of producing tons of repetitive junk if a long string is simply given in the wrong encoding. This is a followup to the fixes for CVE-2025-1094, and should be included if cherry-picking those fixes. Author: Andres Freund <andres@anarazel.de> Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us> Reported-by: Jeff Davis <pgsql@j-davis.com> Discussion: https://postgr.es/m/20250215012712.45@rfd.leadboat.com Backpatch-through: 13
1 parent 0b1f363 commit 5ca9ac9

2 files changed

Lines changed: 65 additions & 95 deletions

File tree

src/fe_utils/string_utils.c

Lines changed: 34 additions & 57 deletions
Original file line numberDiff line numberDiff line change
@@ -194,40 +194,25 @@ stardand_fmtId(const char *rawid, int encoding)
194194
/* Slow path for possible multibyte characters */
195195
charlen = pg_encoding_mblen(encoding, cp);
196196

197-
if (remaining < charlen)
198-
{
199-
/*
200-
* If the character is longer than the available input,
201-
* replace the string with an invalid sequence. The invalid
202-
* sequence ensures that the escaped string will trigger an
203-
* error on the server-side, even if we can't directly report
204-
* an error here.
205-
*/
206-
enlargePQExpBuffer(id_return, 2);
207-
pg_encoding_set_invalid(encoding,
208-
id_return->data + id_return->len);
209-
id_return->len += 2;
210-
id_return->data[id_return->len] = '\0';
211-
212-
/* there's no more input data, so we can stop */
213-
break;
214-
}
215-
else if (pg_encoding_verifymbchar(encoding, cp, charlen) == -1)
197+
if (remaining < charlen ||
198+
pg_encoding_verifymbchar(encoding, cp, charlen) == -1)
216199
{
217200
/*
218201
* Multibyte character is invalid. It's important to verify
219-
* that as invalid multi-byte characters could e.g. be used to
202+
* that as invalid multibyte characters could e.g. be used to
220203
* "skip" over quote characters, e.g. when parsing
221204
* character-by-character.
222205
*
223-
* Replace the bytes corresponding to the invalid character
224-
* with an invalid sequence, for the same reason as above.
206+
* Replace the character's first byte with an invalid
207+
* sequence. The invalid sequence ensures that the escaped
208+
* string will trigger an error on the server-side, even if we
209+
* can't directly report an error here.
225210
*
226211
* It would be a bit faster to verify the whole string the
227212
* first time we encounter a set highbit, but this way we can
228-
* replace just the invalid characters, which probably makes
229-
* it easier for users to find the invalidly encoded portion
230-
* of a larger string.
213+
* replace just the invalid data, which probably makes it
214+
* easier for users to find the invalidly encoded portion of a
215+
* larger string.
231216
*/
232217
enlargePQExpBuffer(id_return, 2);
233218
pg_encoding_set_invalid(encoding,
@@ -236,11 +221,13 @@ stardand_fmtId(const char *rawid, int encoding)
236221
id_return->data[id_return->len] = '\0';
237222

238223
/*
239-
* Copy the rest of the string after the invalid multi-byte
240-
* character.
224+
* Handle the following bytes as if this byte didn't exist.
225+
* That's safer in case the subsequent bytes contain
226+
* characters that are significant for the caller (e.g. '>' in
227+
* html).
241228
*/
242-
remaining -= charlen;
243-
cp += charlen;
229+
remaining--;
230+
cp++;
244231
}
245232
else
246233
{
@@ -409,49 +396,39 @@ appendStringLiteral(PQExpBuffer buf, const char *str,
409396
/* Slow path for possible multibyte characters */
410397
charlen = PQmblen(source, encoding);
411398

412-
if (remaining < charlen)
413-
{
414-
/*
415-
* If the character is longer than the available input, replace
416-
* the string with an invalid sequence. The invalid sequence
417-
* ensures that the escaped string will trigger an error on the
418-
* server-side, even if we can't directly report an error here.
419-
*
420-
* We know there's enough space for the invalid sequence because
421-
* the "target" buffer is 2 * length + 2 long, and at worst we're
422-
* replacing a single input byte with two invalid bytes.
423-
*/
424-
pg_encoding_set_invalid(encoding, target);
425-
target += 2;
426-
427-
/* there's no more valid input data, so we can stop */
428-
break;
429-
}
430-
else if (pg_encoding_verifymbchar(encoding, source, charlen) == -1)
399+
if (remaining < charlen ||
400+
pg_encoding_verifymbchar(encoding, source, charlen) == -1)
431401
{
432402
/*
433403
* Multibyte character is invalid. It's important to verify that
434-
* as invalid multi-byte characters could e.g. be used to "skip"
404+
* as invalid multibyte characters could e.g. be used to "skip"
435405
* over quote characters, e.g. when parsing
436406
* character-by-character.
437407
*
438-
* Replace the bytes corresponding to the invalid character with
439-
* an invalid sequence, for the same reason as above.
408+
* Replace the character's first byte with an invalid sequence.
409+
* The invalid sequence ensures that the escaped string will
410+
* trigger an error on the server-side, even if we can't directly
411+
* report an error here.
412+
*
413+
* We know there's enough space for the invalid sequence because
414+
* the "target" buffer is 2 * length + 2 long, and at worst we're
415+
* replacing a single input byte with two invalid bytes.
440416
*
441417
* It would be a bit faster to verify the whole string the first
442418
* time we encounter a set highbit, but this way we can replace
443-
* just the invalid characters, which probably makes it easier for
444-
* users to find the invalidly encoded portion of a larger string.
419+
* just the invalid data, which probably makes it easier for users
420+
* to find the invalidly encoded portion of a larger string.
445421
*/
446422
pg_encoding_set_invalid(encoding, target);
447423
target += 2;
448-
remaining -= charlen;
449424

450425
/*
451-
* Copy the rest of the string after the invalid multi-byte
452-
* character.
426+
* Handle the following bytes as if this byte didn't exist. That's
427+
* safer in case the subsequent bytes contain important characters
428+
* for the caller (e.g. '>' in html).
453429
*/
454-
source += charlen;
430+
source++;
431+
remaining--;
455432
}
456433
else
457434
{

src/interfaces/libpq/fe-exec.c

Lines changed: 31 additions & 38 deletions
Original file line numberDiff line numberDiff line change
@@ -4076,6 +4076,7 @@ PQescapeStringInternal(PGconn *conn,
40764076
const char *source = from;
40774077
char *target = to;
40784078
size_t remaining = strnlen(from, length);
4079+
bool already_complained = false;
40794080

40804081
if (error)
40814082
*error = 0;
@@ -4102,65 +4103,57 @@ PQescapeStringInternal(PGconn *conn,
41024103
/* Slow path for possible multibyte characters */
41034104
charlen = pg_encoding_mblen(encoding, source);
41044105

4105-
if (remaining < charlen)
4106+
if (remaining < charlen ||
4107+
pg_encoding_verifymbchar(encoding, source, charlen) == -1)
41064108
{
41074109
/*
4108-
* If the character is longer than the available input, report an
4109-
* error if possible, and replace the string with an invalid
4110-
* sequence. The invalid sequence ensures that the escaped string
4111-
* will trigger an error on the server-side, even if we can't
4112-
* directly report an error here.
4110+
* Multibyte character is invalid. It's important to verify that
4111+
* as invalid multibyte characters could e.g. be used to "skip"
4112+
* over quote characters, e.g. when parsing
4113+
* character-by-character.
4114+
*
4115+
* Report an error if possible, and replace the character's first
4116+
* byte with an invalid sequence. The invalid sequence ensures
4117+
* that the escaped string will trigger an error on the
4118+
* server-side, even if we can't directly report an error here.
41134119
*
41144120
* This isn't *that* crucial when we can report an error to the
4115-
* caller, but if we can't, the caller will use this string
4116-
* unmodified and it needs to be safe for parsing.
4121+
* caller; but if we can't or the caller ignores it, the caller
4122+
* will use this string unmodified and it needs to be safe for
4123+
* parsing.
41174124
*
41184125
* We know there's enough space for the invalid sequence because
41194126
* the "to" buffer needs to be at least 2 * length + 1 long, and
41204127
* at worst we're replacing a single input byte with two invalid
41214128
* bytes.
4122-
*/
4123-
if (error)
4124-
*error = 1;
4125-
if (conn)
4126-
libpq_append_conn_error(conn, "incomplete multibyte character");
4127-
4128-
pg_encoding_set_invalid(encoding, target);
4129-
target += 2;
4130-
4131-
/* there's no more input data, so we can stop */
4132-
break;
4133-
}
4134-
else if (pg_encoding_verifymbchar(encoding, source, charlen) == -1)
4135-
{
4136-
/*
4137-
* Multibyte character is invalid. It's important to verify that
4138-
* as invalid multi-byte characters could e.g. be used to "skip"
4139-
* over quote characters, e.g. when parsing
4140-
* character-by-character.
4141-
*
4142-
* Replace the bytes corresponding to the invalid character with
4143-
* an invalid sequence, for the same reason as above.
41444129
*
41454130
* It would be a bit faster to verify the whole string the first
41464131
* time we encounter a set highbit, but this way we can replace
4147-
* just the invalid characters, which probably makes it easier for
4148-
* users to find the invalidly encoded portion of a larger string.
4132+
* just the invalid data, which probably makes it easier for users
4133+
* to find the invalidly encoded portion of a larger string.
41494134
*/
41504135
if (error)
41514136
*error = 1;
4152-
if (conn)
4153-
libpq_append_conn_error(conn, "invalid multibyte character");
4137+
if (conn && !already_complained)
4138+
{
4139+
if (remaining < charlen)
4140+
libpq_append_conn_error(conn, "incomplete multibyte character");
4141+
else
4142+
libpq_append_conn_error(conn, "invalid multibyte character");
4143+
/* Issue a complaint only once per string */
4144+
already_complained = true;
4145+
}
41544146

41554147
pg_encoding_set_invalid(encoding, target);
41564148
target += 2;
4157-
remaining -= charlen;
41584149

41594150
/*
4160-
* Copy the rest of the string after the invalid multi-byte
4161-
* character.
4151+
* Handle the following bytes as if this byte didn't exist. That's
4152+
* safer in case the subsequent bytes contain important characters
4153+
* for the caller (e.g. '>' in html).
41624154
*/
4163-
source += charlen;
4155+
source++;
4156+
remaining--;
41644157
}
41654158
else
41664159
{

0 commit comments

Comments
 (0)