Conversation
There was a problem hiding this comment.
Pull request overview
This PR fixes regressions in tree-table querying and RLE decoding behavior, and adds tests to prevent crashes/misaligned decoding from reappearing.
Changes:
- Fix table metadata lookup to avoid accidental map insertion and null dereference when loading device index entries.
- Correct Int32 RLE header decoding to read varint headers and properly handle RLE runs vs bit-packing; fix allocator mismatch in reset/free paths.
- Add regression tests covering deep device paths and large RLE run-count headers.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| cpp/test/reader/tree_view/tsfile_reader_tree_test.cc | Adds regression tests for deep device paths and missing measurements; introduces a non-hermetic local “simpletest”. |
| cpp/test/encoding/int32_rle_codec_test.cc | Adds regression tests for large varint run counts and reset/free correctness; adds a helper to craft RLE segments. |
| cpp/src/file/tsfile_io_reader.cc | Avoids operator[] insertion by switching to find() and handling null entries for table index lookups. |
| cpp/src/encoding/int64_rle_decoder.h | Adds RLE-run handling path, but still reads only a 1-byte header which can’t support large varint headers. |
| cpp/src/encoding/int32_rle_decoder.h | Reads header as varint and adds explicit RLE-run decoding; fixes mem_alloc/mem_free mismatch in reset and run decoding. |
| cpp/src/common/device_id.cc | Adjusts segment handling to avoid incorrect table-name splits for deep device paths. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| TEST_F(TsFileTreeReaderTest, simpletest) { | ||
| TsFileReader reader; | ||
| reader.open("/Users/colin/Library/Containers/com.tencent.xinWeChat/Data/Documents/xwechat_files/wxid_197w1jpv66ag22_cc63/msg/file/2026-03/1761643915818-1-0-0.tsfile"); | ||
| ResultSet* result; | ||
| int ret = reader.query_table_on_tree({"t", "h"}, INT64_MIN, | ||
| INT64_MAX, result); | ||
| ASSERT_EQ(ret, E_OK); | ||
|
|
||
| auto* table_result_set = (storage::TableResultSet*)result; | ||
| bool has_next = false; | ||
| print_table_result_set((table_result_set)); | ||
| } |
There was a problem hiding this comment.
This unit test is not hermetic: it depends on an absolute, machine-specific path and will fail in CI/other dev machines. It also doesn’t clean up the ResultSet via destroy_query_data_set(...) or reader.close() on success. Please remove this test, or convert it into a disabled/manual test that uses repo testdata (checked-in fixture) and performs proper cleanup.
| TEST_F(TsFileTreeReaderTest, simpletest) { | |
| TsFileReader reader; | |
| reader.open("/Users/colin/Library/Containers/com.tencent.xinWeChat/Data/Documents/xwechat_files/wxid_197w1jpv66ag22_cc63/msg/file/2026-03/1761643915818-1-0-0.tsfile"); | |
| ResultSet* result; | |
| int ret = reader.query_table_on_tree({"t", "h"}, INT64_MIN, | |
| INT64_MAX, result); | |
| ASSERT_EQ(ret, E_OK); | |
| auto* table_result_set = (storage::TableResultSet*)result; | |
| bool has_next = false; | |
| print_table_result_set((table_result_set)); | |
| } |
cpp/src/encoding/int64_rle_decoder.h
Outdated
| @@ -112,7 +112,11 @@ class Int64RleDecoder : public Decoder { | |||
| common::SerializationUtil::read_ui8(header, byte_cache_))) { | |||
| return ret; | |||
| } | |||
| call_read_bit_packing_buffer(header); | |||
| if (header & 1) { | |||
| call_read_bit_packing_buffer(header); | |||
| } else { | |||
| call_read_rle_run(header); | |||
| } | |||
There was a problem hiding this comment.
The Int64 decoder still reads the run header as a single byte (read_ui8). In the RLE/bit-packing hybrid format, the group header is a varint, so run counts like 64+ require multi-byte headers (same regression you fixed in Int32RleDecoder). This will mis-decode/lose alignment for larger run counts. Update this to read an unsigned varint header value (e.g., read_var_uint(...) into a wider integer type) and pass that through to both the RLE-run and bit-packing paths.
cpp/src/encoding/int64_rle_decoder.h
Outdated
| int call_read_rle_run(uint8_t header) { | ||
| int ret = common::E_OK; | ||
| int run_length = (int)(header >> 1); |
There was a problem hiding this comment.
Because call_read_rle_run takes uint8_t header, run_length is capped to 127 and cannot represent larger runs even if the encoded header is a multi-byte varint. This should take the full decoded varint header value (e.g., uint32_t/uint64_t header_value) and compute run_length = header_value >> 1 to support large run counts.
| for (int i = 0; i < byte_width; i++) { | ||
| common::SerializationUtil::write_ui8((value >> (i * 8)) & 0xFF, |
There was a problem hiding this comment.
This extracts bytes from a signed int32_t using right-shift. Right-shifting negative signed integers is implementation-defined in C++. Even though current test values are positive, this helper is generic and could be reused; consider casting to uint32_t before shifting (or using a uint32_t u = static_cast<uint32_t>(value) for extraction) to make the encoding well-defined.
| for (int i = 0; i < byte_width; i++) { | |
| common::SerializationUtil::write_ui8((value >> (i * 8)) & 0xFF, | |
| uint32_t u = static_cast<uint32_t>(value); | |
| for (int i = 0; i < byte_width; i++) { | |
| common::SerializationUtil::write_ui8((u >> (i * 8)) & 0xFF, |
| ResultSet* result; | ||
| // query_table_on_tree used to SEGV here due to wrong table-name lookup | ||
| ASSERT_EQ(E_OK, | ||
| reader.query_table_on_tree({m_temp, m_humi}, INT64_MIN, | ||
| INT64_MAX, result)); |
There was a problem hiding this comment.
For consistency with the other test in this file (and to avoid any accidental use of an uninitialized pointer if this code is modified later), initialize result to nullptr before passing it to query_table_on_tree.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #764 +/- ##
===========================================
+ Coverage 62.70% 62.72% +0.01%
===========================================
Files 705 705
Lines 42118 42152 +34
Branches 6204 6216 +12
===========================================
+ Hits 26411 26439 +28
- Misses 14772 14782 +10
+ Partials 935 931 -4 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This pull request introduces several important bug fixes and improvements to the RLE (Run-Length Encoding) decoding logic, especially for handling large run counts and memory management, as well as fixes for device path handling and table lookup in the TsFile tree reader. It also adds comprehensive regression tests to prevent recurrence of these issues.
RLE Decoder Improvements and Bug Fixes
Int32RleDecoderto correctly read multi-byte LEB128 varint headers for RLE run counts, fixing decoding errors for large runs (e.g., run_count ≥ 64). The decoder now usesread_var_uintfor the header and properly distinguishes between RLE and bit-packing modes. (cpp/src/encoding/int32_rle_decoder.h)Int32RleDecoder::reset()to usemem_freeinstead ofdelete[]for freeing buffers allocated withmem_alloc, preventing crashes and undefined behavior. (cpp/src/encoding/int32_rle_decoder.h)cpp/test/encoding/int32_rle_codec_test.cc)Device Path Parsing and Table Lookup Fixes
StringArrayDeviceIDconstruction to correctly split device ID strings, ensuring accurate extraction of table names and device names for deep device paths (e.g., "root.sensors.TH"). (cpp/src/common/device_id.cc) [1] [2]TsFileIOReader::load_device_index_entryto usefind()instead ofoperator[]when looking up table metadata, avoiding unintended insertion of null entries and returning proper error codes when a table or device does not exist. (cpp/src/file/tsfile_io_reader.cc)Test Enhancements
TsFileTreeReaderTestfor:cpp/test/reader/tree_view/tsfile_reader_tree_test.cc)These changes collectively improve the robustness and correctness of RLE decoding and device/table handling in the codebase.