HPCC-35246 New function for deserializing all attributes#21165
HPCC-35246 New function for deserializing all attributes#21165streeterd wants to merge 5 commits intohpcc-systems:candidate-10.2.xfrom
Conversation
|
Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-35246 Jirabot Action Result: |
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (1)
|
8222d10 to
113d12d
Compare
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (1)
|
045c061 to
05598bb
Compare
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (1)
|
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (1)
|
Performance test 35246 vs 10.2.x - release build35246: 10.2.x Checking out to before this change: Test results: 35246 vs 10.2.x comparison table: |
0360611 to
17b6ad8
Compare
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (1)
|
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (1)
|
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (1)
|
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (1)
|
ghalliday
left a comment
There was a problem hiding this comment.
@streeterd looks good. A few relatively minor comments/improvements.
the atom version can be further improved, but that should be a separate PR (which I suspect is your intention).
| v->key.setPtr(isnocase() ? AttrStr::createNC(attrName) : AttrStr::create(attrName)); | ||
| if (arrayOwner) | ||
| { | ||
| CQualifierMap *map = arrayOwner->queryMap(); |
There was a problem hiding this comment.
This can be done outside the attribute loop, and I think arrayOwner is always null, so instead you could add
dbgassertex(!arrayOwner);
and delete this code.
| // Allocate memory for all attributes (existing + new) in one go, and copy existing attributes if needed. | ||
| unsigned insertPos = numAttrs; | ||
| AttrValue *newattrs = newAttrArray(insertPos + newAttrPairs); | ||
| if (attrs) |
There was a problem hiding this comment.
minor: You can add a dbgassert(numAttrs == 0), and remove this code. When deserializing there are never any existing values.
| const char *attrName = base + ctx.matchOffsets[i]; | ||
| const char *attrValue = base + ctx.matchOffsets[i + 1]; | ||
|
|
||
| if (arrayOwner) |
There was a problem hiding this comment.
This can be deleted - see comment on function above.
| PtrStrUnion<HashKeyElement> name; | ||
| protected: | ||
| virtual bool removeAttribute(const char *k) override; | ||
| static CriticalSection &queryHashCrit(); |
There was a problem hiding this comment.
minor: Not yet used. I don't think it needs to be here.
| { | ||
| // Since most nodes have <= 15 attribute name/value pairs; | ||
| // reserve space up front to avoid reallocations. | ||
| static constexpr size32_t expectedMaximumAttributeOffsetCount = 15 * 2; |
|
A decent improvement in performance. |
65a1687 to
3d1f417
Compare
1.Refactored attribute deserialization to remove unnecessary findAttribute calls. 2.Allocate memory for all attributes instead of individually. 3.hashcrit moved from deserialization to new function queryHashCrit() Signed-off-by: Dave Streeter <dave.streeter@lexisnexisrisk.com>
2.Remove queryHashCrit() as not yet needed. 3.Remove static from constexpr as not needed. Signed-off-by: Dave Streeter <dave.streeter@lexisnexisrisk.com>
3d1f417 to
6b1cd44
Compare
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (1)
|
ghalliday
left a comment
There was a problem hiding this comment.
A few more comments to clean the code up, but they could be done in a follow up PR.
| return; | ||
|
|
||
| // Allocate space for existing and new attributes | ||
| AttrValue *newAttrs = (AttrValue *)realloc(attrs, (numAttrs + newAttrPairs) * sizeof(AttrValue)); |
There was a problem hiding this comment.
dbgassertex(numAttrs == 0);
then directly assign to attrs (and avoid the add in this statement.
minor: The other code in setAttribute does not verify the realloc succeeded, so no need to do so here.
| attrs = newattrs; | ||
| numAttrs = insertPos + newAttrPairs; | ||
|
|
||
| // Set attribute values and update qualifier map if needed |
There was a problem hiding this comment.
// set attribute value. Qualifier map will never be set up, so no need to update.
| { | ||
| const char *attrName = base + ctx.matchOffsets[i]; | ||
| const char *attrValue = base + ctx.matchOffsets[i + 1]; | ||
| AttrValue *v = new (&attrs[numAttrs++]) AttrValue; // Initialize new AttrValue |
There was a problem hiding this comment.
In a later PR, it would be better to have a local variable incremented, and only assign to newAttrs once. It would be symetrical to CAtomPTree, and may avoid some memory load/stores. (It depends how clever the compiler is...)
| dbgassertex(numAttrs == 0); | ||
| unsigned insertPos = numAttrs; | ||
| AttrValue *newattrs = newAttrArray(insertPos + newAttrPairs); | ||
| attrs = newattrs; |
There was a problem hiding this comment.
minor: This could be cleaned up a bit. E.g. assign direct to attrs, set insertPos to 0. set numAttrs to newAttrPairs.
Fine to do in a separate PR.
jakesmith
left a comment
There was a problem hiding this comment.
@streeterd - looks like there's some build issues at the moment:
FAILED: system/jlib/CMakeFiles/jlib.dir/jptree.cpp.o
ccache /opt/rh/devtoolset-11/root/usr/bin/c++ -DCPPHTTPLIB_USE_POLL -DINLINE_GET_CYCLES_NOW -DJLIB_EXPORTS -DOPENTELEMETRY_ABI_VERSION_NO=1 -DPROTOBUF_USE_DLLS -DUCHAR_TYPE=uint16_t -DUSE_LIBMEMCACHED -DUSE_OPENTEL_GRPC -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE=1 -D_LARGEFILE_SOURCE=1 -D_NO_MYSQL_REPOSITORY -D_USE_APR -D_USE_AZURE -D_USE_CPPUNIT -D_USE_ICU -D_USE_LIBARCHIVE -D_USE_LIBXML2 -D_USE_LIBXSLT -D_USE_NUMA -D_USE_OPENLDAP -D_USE_OPENSSL -D_USE_OPENSSLV3 -D_USE_PARQUET -D_USE_ZLIB -D_USRDLL -D__USE_FILE_OFFSET64=1 -D__USE_LARGEFILE64=1 -Djlib_EXPORTS -I/hpcc-dev/HPCC-Platform/system/jlib -I/hpcc-dev/HPCC-Platform/system/win32 -I/hpcc-dev/HPCC-Platform/system/include -I/hpcc-dev/HPCC-Platform/system/lzma -I/hpcc-dev/HPCC-Platform/system/globalid -I/hpcc-dev/HPCC-Platform/system/security/shared -I/hpcc-dev/HPCC-Platform/system/security/cryptohelper -I/hpcc-dev/HPCC-Platform/system/httplib -I/hpcc-dev/build/system/jlib -I/hpcc-dev/build -I/hpcc-dev/build/oss -isystem /hpcc-dev/vcpkg_installed/x64-linux-dynamic/include -isystem /hpcc-dev/vcpkg_installed/x64-linux-dynamic/lib/pkgconfig/../../include -frtti -fPIC -fmessage-length=0 -Werror=format -Wformat-security -Wformat-nonliteral -pthread -Wuninitialized -Werror=return-type -Werror=format-nonliteral -Wno-psabi -Wparentheses -Werror=delete-non-virtual-dtor -Wall -Wextra -Wno-switch -Wno-unused-parameter -Werror -O3 -DNDEBUG -g -fno-inline-functions -g -fno-default-inline -fPIC -MD -MT system/jlib/CMakeFiles/jlib.dir/jptree.cpp.o -MF system/jlib/CMakeFiles/jlib.dir/jptree.cpp.o.d -o system/jlib/CMakeFiles/jlib.dir/jptree.cpp.o -c /hpcc-dev/HPCC-Platform/system/jlib/jptree.cpp
/hpcc-dev/HPCC-Platform/system/jlib/jptree.cpp:3832:6: error: no declaration matches 'void LocalPTree::deserializeAttributes(const char*, PTreeDeserializeContext&)'
3832 | void LocalPTree::deserializeAttributes(const char *base, PTreeDeserializeContext &ctx)
| ^~~~~~~~~~
/hpcc-dev/HPCC-Platform/system/jlib/jptree.cpp:3832:6: note: no functions named 'void LocalPTree::deserializeAttributes(const char*, PTreeDeserializeContext&)'
In file included from /hpcc-dev/HPCC-Platform/system/jlib/jptree.cpp:46:
/hpcc-dev/HPCC-Platform/system/jlib/jptree.ipp:903:17: note: 'class LocalPTree' defined here
903 | class jlib_decl LocalPTree : public PTree
| ^~~~~~~~~~
/hpcc-dev/HPCC-Platform/system/jlib/jptree.cpp:4063:6: error: no declaration matches 'void CAtomPTree::deserializeAttributes(const char*, PTreeDeserializeContext&)'
4063 | void CAtomPTree::deserializeAttributes(const char *base, PTreeDeserializeContext &ctx)
| ^~~~~~~~~~
/hpcc-dev/HPCC-Platform/system/jlib/jptree.cpp:4063:6: note: no functions named 'void CAtomPTree::deserializeAttributes(const char*, PTreeDeserializeContext&)'
In file included from /hpcc-dev/HPCC-Platform/system/jlib/jptree.cpp:46:
/hpcc-dev/HPCC-Platform/system/jlib/jptree.ipp:869:17: note: 'class CAtomPTree' defined here
869 | class jlib_decl CAtomPTree : public PTree
| ^~~~~~~~~~
2.Ensured deserializeAttributes function is called in deserialization. Signed-off-by: Dave Streeter <dave.streeter@lexisnexisrisk.com>
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (1)
|
2.Code cleanup in CAtomPTree::deserializeAttributes 3.Remove need to add numAttrs to newAttrPairs in LocalPTree::deserializeAttributes Signed-off-by: Dave Streeter <dave.streeter@lexisnexisrisk.com>
@jakesmith should be fixed in aff3127 |
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (1)
|
ghalliday
left a comment
There was a problem hiding this comment.
Looks good to me. Still some scope for cleaning the code up, but no reason not to merge.
jakesmith
left a comment
There was a problem hiding this comment.
@streeterd - please see comments.
| const char *attrValue = base + ctx.matchOffsets[i + 1]; | ||
| AttrValue *v = &newattrs[insertPos++]; | ||
|
|
||
| if (!v->key.set(attrName)) |
There was a problem hiding this comment.
attrHT needs protecting (as setAtttribute does).
To optimize, you could delay until needed in this needed (CLeavableCriticalBlock is likely helpful here).
| attrs = newattrs; | ||
| numAttrs = insertPos + newAttrPairs; | ||
|
|
||
| // Set attribute values and update qualifier map if needed |
There was a problem hiding this comment.
[comment generated by copilot]
correctness: Stale comment
The comment says "update qualifier map if needed" but the qualifier map is never updated (correctly so, given dbgassertex(!arrayOwner) above). Gavin already suggested a clearer replacement:
// Qualifier map will never be set up during deserialization, so no need to update.| return; | ||
|
|
||
| // Allocate space for existing and new attributes | ||
| AttrValue *newAttrs = (AttrValue *)realloc(attrs, (numAttrs + newAttrPairs) * sizeof(AttrValue)); |
There was a problem hiding this comment.
[comment generated by copilot]
style: Inconsistent variable naming
This uses newAttrs (camelCase) while the CAtomPTree version at line 4072 uses newattrs (lowercase), which matches the existing setAttribute() convention. Consider aligning to newattrs for consistency.
| dbgassertex(numAttrs == 0); | ||
| unsigned insertPos = 0; // Position to insert new attributes | ||
| AttrValue *newattrs = newAttrArray(insertPos + newAttrPairs); | ||
| attrs = newattrs; |
There was a problem hiding this comment.
attrs could be non-null, i.e. IPropertyTree::deserialize could at least theorectically be called on a tree with content already, so it should free it (freeAttrArray).
There was a problem hiding this comment.
With comment #21165 (comment) attrs is now asserted as null.
| // Allocate memory for all attributes (existing + new) in one go, and copy existing attributes if needed. | ||
| dbgassertex(numAttrs == 0); | ||
| unsigned insertPos = 0; // Position to insert new attributes | ||
| AttrValue *newattrs = newAttrArray(insertPos + newAttrPairs); |
There was a problem hiding this comment.
this needs protecting (as setAttribute does)
| virtual void serializeToStream(IBufferedSerialOutputStream &out) const = 0; | ||
| virtual void deserializeFromStream(IBufferedSerialInputStream &in, PTreeDeserializeContext &ctx) = 0; | ||
|
|
||
| virtual void deserializeAttributes(const char *base, PTreeDeserializeContext &ctx) = 0; |
There was a problem hiding this comment.
this doesn't need to be a exposed member of the interface (and it's at odds with the fact the concrete imlps. are in "protected".). It is and internal method only.
| VStringBuffer err("PTree deserialization error: out of memory reading attributes, numAttrs=%u, newAttrPairs=%u", numAttrs, newAttrPairs); | ||
| throwUnexpectedX(err.str()); | ||
| } | ||
| attrs = newAttrs; |
There was a problem hiding this comment.
there's nothing currently stopping IPT::deserialize (and hence this) being called on a tree with content, so 'attrs' may be non-null (and arrayOwner too).
I think in practice it is assumed that deserialize is always called on an empty tree though, so it's okay.
But if it's going to assert on dbgassertex(!arrayOwner), it should also dbgassertex(!attr) it should do so early on, and (be in unlikely wrappers), with a comment saying deserialize only supproted on empty tree.
There was a problem hiding this comment.
I could not directly wrap dbgassertex in an unlikely wrapper so I turned them into unlikely ifs.
| value = nullptr; | ||
| } | ||
|
|
||
| void PTree::deserializeAttributes(const char *base, PTreeDeserializeContext &ctx) |
There was a problem hiding this comment.
what is this implementation for? LocalPTree and CAtomPTree need a flavour of this, but nothing calls this version, and like PTree::setAttribute, this one should be left as abstract.
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (1)
|
214a012 to
a298496
Compare
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (1)
|
87979ed to
1c43a82
Compare
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (1)
|
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (1)
|
1c43a82 to
b05ef2b
Compare
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (1)
|
b05ef2b to
8dd6e99
Compare
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (1)
|
2.Ensure comments are correct 3.Correct variable name newattrs to newAttrs 4.Ensure deserializeAttributes is an internal method 5.Set dbgassertex as unlikely 6.Remove PTree::deserializeAttributes(...) as unused Signed-off-by: Dave Streeter <dave.streeter@lexisnexisrisk.com>
8dd6e99 to
0b62fc4
Compare
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (1)
|
Type of change:
Checklist:
Smoketest:
Testing:
The unit tests
PtreeThreadingStressTestandPTreeSerializationDeserializationTestwere used to test this change.