adding the JsonShapeSerializer#3842
Conversation
| return schema.GetMemberName(); | ||
| } | ||
|
|
||
| void BeginStructure(const Schema&) {} |
There was a problem hiding this comment.
whats goin on here, shouldnt we be adding something to the stack when we begin/end a structure?
|
|
||
| void WriteBoolean(const Schema& schema, bool value) { | ||
| if (!m_stack.empty() && m_stack.back().isList) { | ||
| JsonValue v; |
There was a problem hiding this comment.
i dont think we wanna declare a JsonValue value here because thats gunna make a heap allocation in cjson
| Aws::Vector<JsonValue> listItems; | ||
| }; | ||
|
|
||
| Aws::Vector<StackEntry> m_stack; |
There was a problem hiding this comment.
OK i see the grand picture of this and this matches our performance profile issues. This is actually much more simple than you are making it, lets start with what we are trying to do:
- we are prioritizing code gen to avoid a DOM/in memory tree representation of the object like cjson does
- we are allocating all of the memory up front (or at least try to) so that that we're only allocating and deallocating memory once.
So to start JsonValue will NOT be used in this class. JsonValue calls cjson which will have a multiple heap allocations as it traverses the tree. so that is out the window.
We are going to make it simple we have a character buffer(your calling it a stack in this case), and we just write to it, nothing more.
The basic structure of this would look like
#include <vector>
class json_parser_no_dom {
public:
json_parser_no_dom(size_t buffer_size) {
buffer_.reserve(buffer_size);
}
json_parser_no_dom(const json_parser_no_dom &other) = delete;
json_parser_no_dom(json_parser_no_dom &&other) noexcept = default;
json_parser_no_dom & operator=(const json_parser_no_dom &other) = delete;
json_parser_no_dom & operator=(json_parser_no_dom &&other) noexcept = default;
//...
std::string serialize() { return buffer_; }
private:
std::string buffer_{};
};where in this case the parser has a pre-allocated buffer to write stuff into. now we just put stuff into it.
class json_parser_no_dom {
public:
void write_string(const std::string& str, const std::string& value) {
buffer_.append("\"");
buffer_.append(str);
buffer_.append("\":");
buffer_.append("\"");
buffer_.append(value);
buffer_.append("\"");
}
};where you could do something like
auto main() -> int {
json_parser_no_dom json_parser{100};
json_parser.begin_structure();
json_parser.write_string("foo", "bar");
json_parser.end_structure();
std::cout << json_parser.serialize() << std::endl;
return 0;
}and see
{"foo":"bar"}
so at a very high level I think the problem here is that you are relying on JsonValue and cjson to do the hard parts, and that actually means you are performing heap allocations. what you want to do is rely on the schema for tree walking, and then you just dump the tree to a buffer.
| void EndNestedStructure() override; | ||
|
|
||
| Aws::String GetPayload() const; | ||
| const Aws::String& GetPayload() const; |
There was a problem hiding this comment.
while returning by const Aws::String&avoids a copy, this would create a lifetime issues upstream where if JsonShapeSerializer goes out of scope you have a dangling ref. this also correlates to another comment i have which is that i dont like the temporary variables that we use to read into std::vector. so at a high level i think we have a few requirements:
- we want to avoid a copy when we call get payload
- after get payload is called, we can call get payload again, however the buffer at that point is "finalized" and we want to transfer ownership for the buffer to the called.
- we want to avoid temporary variables while writing values to underlying buffer.
So to solve all of these things i suggest that we:
- use std::string as the buffer, pre-allocated to 8KiB on the heap to avoid re-allocation.
- we create a operation that returns the buffer string by move that invalidates the serializer object however avoid a copy.
enum class error {
ALREADY_FINALIZED,
};
class buffer_builder {
public:
buffer_builder() {
buffer_.reserve(8192);
}
//methods to write to buffer
std::expected<std::string, error> serialize() {
if (state_ == state::FINALIZED) {
return std::unexpected(error::ALREADY_FINALIZED);
}
state_ = state::FINALIZED;
return std::move(buffer_);
}
private:
enum class state: uint8_t {
INITIALIZED,
FINALIZED,
} state_{state::INITIALIZED};
std::string buffer_;
};|
|
||
| static const int MAX_DEPTH = 64; | ||
|
|
||
| struct JsonShapeSerializer::Impl { |
There was a problem hiding this comment.
should probably be a class that has a constructor and fields that are private to ensure encapsulation of the buffer.
| using namespace smithy::schema; | ||
| using namespace Aws::Utils; | ||
|
|
||
| static const int MAX_DEPTH = 64; |
There was a problem hiding this comment.
cjson has this set at 1000, we should also likely set this the same. so we avoid any surprises
| struct JsonShapeSerializer::Impl { | ||
| Aws::String m_buf; | ||
| int m_depth = 0; | ||
| bool m_needsComma[MAX_DEPTH] = {}; |
There was a problem hiding this comment.
perfer c++ std::array or Aws::Array over c arrays
| WriteFieldName(schema); | ||
| auto start = m_buf.size(); | ||
| m_buf.resize(start + 11); | ||
| int written = snprintf(&m_buf[start],12,"%d", value); |
There was a problem hiding this comment.
i think this goes for all of the operations but we should stop using snprintf and .resize alltogether. this is a string buffer. you should just use the until to_string https://github.com/aws/aws-sdk-cpp/blob/main/src/aws-cpp-sdk-core/include/aws/core/utils/StringUtils.h#L189-L198 and just append the the string to the buffer, allowing the string implementation to handle the underlying buffer resize not us.
|
|
||
| using namespace smithy::schema; | ||
|
|
||
| class JsonShapeSerializerTest : public Aws::Testing::AwsCppSdkGTestSuite {}; |
There was a problem hiding this comment.
Test cases we don't have that we should have:
MAX_DEPTHenforcement- json escaped data
| m_buf.resize(start + writter); | ||
| } | ||
|
|
||
| void WriteString(const Schema& schema, const Aws::String& value) { |
There was a problem hiding this comment.
so in this case, we need to account for "json escaping strings", and if you dont, crafted json strings can break the string. smithy java extracts this to a utils file called JsonWriteUtils. and in the utils file the do json encoding. we idealliy should have a identical class called JsonWriteUtils that works the same way but is only internally visible to the SDK.
| } | ||
|
|
||
| Aws::String GetPayload() { | ||
| assert(!m_finalized); |
There was a problem hiding this comment.
assertions are only enabled in debug mode, what happens in release mode? shouldnt we return a Aws::Outcome to represent error?
| class Schema { | ||
| public: | ||
| Schema() = default; | ||
| Schema(const char* memberName, ShapeType type) |
There was a problem hiding this comment.
why not const Aws::String& memberName? lets avoid using c
Adding the implementation for JsonShapeSerializer
Check all that applies:
Check which platforms you have built SDK on to verify the correctness of this PR.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.