[http] sanitise URL options before use them#22186
Conversation
Test Results 22 files 22 suites 3d 10h 33m 7s ⏱️ Results for commit ab58efa. ♻️ This comment has been updated with latest results. |
There was a problem hiding this comment.
In general I don't think we can really trust all this code to produce properly "sanitized" input. Sanitizers are notoriously hard to get right and I see many suspicious places in the Sniffer code.
So if these checks are in place as a measure against accidental errors, they may be useful, but I wouldn't consider them a measure against actual attacks.
Regardless, I put a couple of more specific comments.
Also, we definitely need tests for this.
| if (!value || !*value) | ||
| return ""; | ||
|
|
||
| TString res = value; |
There was a problem hiding this comment.
How long can value be? Since we're trying to sanitize the input, shouldn't we also put a reasonable max length?
There was a problem hiding this comment.
There is no such limit and I cannot make here any assumption.
civetweb cuts URL length by 16K.
There was a problem hiding this comment.
The backend is not guaranteed to be civetweb in principle, is it?
We should put an artificial limit here so we don't have to worry about huge allocations/reallocations/copies/iterations and we could in principle even preallocate the output string once and do a single pass over the string to validate it.
There was a problem hiding this comment.
In my mind such checks are not part of this PR.
They can be done in TUrl class which used to parse all url arguments
There was a problem hiding this comment.
I find it fragile to trust the assumption that all strings getting here come from some other place that already partially sanitized them, especially considering it's not even an explicit thing happening at the calling site.
That said, you can at least spare the strlen(value) == 0 and simply check res.Length() afterwards, to save one string scan.
| clean.Append('\\'); | ||
| clean.Append(c); | ||
| } else if (c > 31) | ||
| // ignore all special symbols |
There was a problem hiding this comment.
I think c == 127 is a "special" symbol (DEL)
There was a problem hiding this comment.
Probably I just will use !std::iscntrl(c)
There was a problem hiding this comment.
I'm not sure what this excludes exactly (does it exclude all ASCII non-printable characters? Does it depend on the locale? The man page makes it look like the case), so I was actually more comfortable with the explicit check.
There was a problem hiding this comment.
I do not like to exclude all non-latin symbols here - as long as they supported on all other places.
Main intention of this PR - prevent creation of string literals which can be mis-interpreted by cling.
There was a problem hiding this comment.
You are basically already excluding non-latin symbols by doing char-based string processing, so it's really about excluding special ASCII control characters that we don't expect to ever see...
To be more precise: if Unicode support is desired then we should make sure it works, as it's not really obvious at all that we are handling it correctly throughout the entire pipeline from start to finish (and I don't think we should bother)
There was a problem hiding this comment.
In DecodeUrlOptionValue test I add German letters.
| if (std::isalnum(c) || std::strchr(".:+-", c)) | ||
| sanitized.Append(c); | ||
| } else if ((c == '\"') || (c == '\\')) { | ||
| // escape quotes inside string |
There was a problem hiding this comment.
This is quite bizarre if I understand correctly: if an argument is "ab"cdef" we will parse it as the string "ab\"cdef"?
There was a problem hiding this comment.
It will be delivered to C++ as
new TMethodCall(obj_cl, method_name, "ab\"cdef" );
There was a problem hiding this comment.
Shouldn't we rather require all internal quotes to be already escaped in the argument? If the incoming argument is supposed to be a valid string literal it would not be valid if it contains unescaped quotes, which may indicate some other kind of error (on the user side or otherwise)
There was a problem hiding this comment.
Shouldn't we rather require all internal quotes to be already escaped in the argument?
On each layer escape character can disappear.
Quotes are very special - because when they goes into C++ interpreter they can break argument list and start new command. By such transformation we want guarantee that in any case interpreter will take provided string as single argument. Actually exactly such exploit is shown in [THttpServer-4] issue.
So attempt here to produce string which not closed in between.
There was a problem hiding this comment.
On each layer escape character can disappear.
That sounds like a bug from the previous layers...
Regardless, it would be nice to have some tests on this and comments to make it clear why we're doing this and exactly what form of strings do we accept from the user side
There was a problem hiding this comment.
There are many differences how quotes escaped in URL and in C++.
Main problem - string is used again in C++ script. So one need to provide/create escape characters.
There was a problem hiding this comment.
Even so, please add unit tests to verify/document the expected behavior of this code before merging.
There was a problem hiding this comment.
I add testing of 4 different combinations how quotes can appear in command definition and in command invocation.
There was a problem hiding this comment.
Thanks.
I think we should also add unit tests for the DecodeUrlOptionValue function, passing all kinds of "antagonistic" expressions that we expect to be sanitized by it.
There was a problem hiding this comment.
I add to last commit extra tests for DecodeUrlOptionValue method
Remove any special symbols Add escape characters for quote and escape itself Try to avoid manipulation of arguments for method execution
Avoid special characters as draw arguments
Always use DecodeUrlOptionValue method when processing URL arguments or URL string. Internally method provides escape symbols for quotes and backslash. If expecting numeric value - remove all symbols keeping alphanumeric, '.', '+', '-' and ':'
It allows to deserialize post data as ROOT object when processing exe.json request. While this can leads to arbitrary code loading and injection, disable this feature by default. Can be enabled back with: ``` serv->SetAllowPostObject(kTRUE); ```
While here arbitrary string injected into ProcessLine, ensure that only numeric argument is not quoted. All other arguments kinds will be quoted and prevent execution of potentially dangerous code
69ba55a to
c4d0d1f
Compare
| EXPECT_EQ(test.Decode("\"\"\"", kFALSE), "\"\\\"\""); | ||
|
|
||
| // keep quotes and keep german letters - remove new line | ||
| EXPECT_EQ(test.Decode("\"Gänse\nfüßchen\"", kFALSE), "\"Gänsefüßchen\""); |
There was a problem hiding this comment.
Could you please also add strings containing "operator-like" things, like someFunc("someArg"), someArray[3], and verify they get stripped?
There was a problem hiding this comment.
I add one - but only quotes are escaped, nothing else is changed.
Only later such special string will be surrounded by normal quotes or discarded while it does not match to numeric value.
Verify execution of several supported requests which can be handled by http server. Testing: - root.json - root.xml - exe.json - exe.json with POST data - item.json - cmd.json - multi.json Also verify basic functionality of TRootSniffer::DecodeUrlOptionValue method
URL syntax allow to provide different special symbols using
%12%20symbols.Also some situations un-escaped quotes can make a problems.
Therefore cleanup arguments from URL string adding C escape symbols.
Use it in
exe.jsonandcmd.json- main place where arguments which are coming via URLare formatted for
ProcessLine.Add sophisticated check in
cmd.jsonprocessing. One need to check quotes in argument andin command definition around argument. If none exists - only simple numeric values can be injected
without quotes. Avoid double quotes one both side of argument placeholder.
Provide special flag to enable processing of post data in the
exe.json, off by default.Provide testing for the main requests processing in the sniffer.
This emulates response of
THttpServerwith same kind of http requests.