
Fix/tree query and rle decoder #764

Open
ColinLeeo wants to merge 5 commits into develop from fix/tree-query-and-rle-decoder


@ColinLeeo commented Apr 1, 2026

This pull request introduces several important bug fixes and improvements to the RLE (Run-Length Encoding) decoding logic, especially for handling large run counts and memory management, as well as fixes for device path handling and table lookup in the TsFile tree reader. It also adds comprehensive regression tests to prevent recurrence of these issues.

RLE Decoder Improvements and Bug Fixes

  • Updated Int32RleDecoder to correctly read multi-byte LEB128 varint headers for RLE run counts, fixing decoding errors for large runs (e.g., run_count ≥ 64). The decoder now uses read_var_uint for the header and properly distinguishes between RLE and bit-packing modes. (cpp/src/encoding/int32_rle_decoder.h)
  • Fixed memory management in Int32RleDecoder::reset() to use mem_free instead of delete[] for freeing buffers allocated with mem_alloc, preventing crashes and undefined behavior. (cpp/src/encoding/int32_rle_decoder.h)
  • Added and updated regression tests for RLE decoding, covering large run counts, multiple consecutive runs, and memory management after reset. (cpp/test/encoding/int32_rle_codec_test.cc)
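The header handling described above can be sketched as follows. This is a minimal illustration, not the code in cpp/src/encoding/int32_rle_decoder.h: `VarUint`, `decode_var_uint`, `is_bit_packed`, and `header_count` are hypothetical names, and a plain `std::vector` stands in for the decoder's byte stream. The mode bit and the `>> 1` shift follow the diff fragments quoted later in this conversation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type: the decoded header and the bytes it occupied.
struct VarUint {
    uint32_t value;
    size_t bytes_read;
};

// Decode an unsigned LEB128 varint starting at buf[pos]. Each byte carries
// 7 payload bits; a set high bit means another byte follows.
inline bool decode_var_uint(const std::vector<uint8_t>& buf, size_t pos,
                            VarUint& out) {
    uint32_t value = 0;
    int shift = 0;
    for (size_t i = pos; i < buf.size() && shift < 32; ++i, shift += 7) {
        value |= static_cast<uint32_t>(buf[i] & 0x7F) << shift;
        if ((buf[i] & 0x80) == 0) {
            out.value = value;
            out.bytes_read = i - pos + 1;
            return true;
        }
    }
    return false;  // truncated or over-long input
}

// The low bit selects the mode; the remaining bits carry the count.
inline bool is_bit_packed(uint32_t header) { return (header & 1u) != 0; }
inline uint32_t header_count(uint32_t header) { return header >> 1; }
```

With the bytes `0x80 0x01`, the sketch yields a header value of 128 read from two bytes, which is an RLE run with a count of 64, the smallest run count that no longer fits in a one-byte header.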

Device Path Parsing and Table Lookup Fixes

  • Fixed StringArrayDeviceID construction to correctly split device ID strings, ensuring accurate extraction of table names and device names for deep device paths (e.g., "root.sensors.TH"). (cpp/src/common/device_id.cc)
  • Improved TsFileIOReader::load_device_index_entry to use find() instead of operator[] when looking up table metadata, avoiding unintended insertion of null entries and returning proper error codes when a table or device does not exist. (cpp/src/file/tsfile_io_reader.cc)
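The difference between `find()` and `operator[]` on a map can be sketched as follows. The names here (`TableIndex`, `TableIndexMap`, `LookupStatus`, `lookup_table`) are hypothetical stand-ins chosen for illustration; the real container and error codes in cpp/src/file/tsfile_io_reader.cc may differ.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for per-table metadata.
struct TableIndex {};
using TableIndexMap = std::map<std::string, std::shared_ptr<TableIndex>>;

// Hypothetical status codes, not the project's real error codes.
enum LookupStatus { LOOKUP_OK = 0, LOOKUP_TABLE_NOT_EXIST = 1 };

// find() never modifies the map, so a miss is reported as an error
// instead of silently inserting a null entry that is dereferenced later.
inline LookupStatus lookup_table(const TableIndexMap& tables,
                                 const std::string& name,
                                 std::shared_ptr<TableIndex>& out) {
    auto it = tables.find(name);
    if (it == tables.end() || it->second == nullptr) {
        return LOOKUP_TABLE_NOT_EXIST;
    }
    out = it->second;
    return LOOKUP_OK;
}
```

By contrast, evaluating `tables["missing"]` on a non-const map default-constructs and inserts a null `shared_ptr` under that key, which is the kind of unintended insertion the bullet above describes.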

Test Enhancements

  • Added regression tests in TsFileTreeReaderTest for:
    • Querying devices with deep dot-separated paths, ensuring correct table and device name extraction.
    • Handling missing measurements gracefully without crashing, verifying that the correct error code is returned instead of a segmentation fault. (cpp/test/reader/tree_view/tsfile_reader_tree_test.cc)

These changes collectively improve the robustness and correctness of RLE decoding and device/table handling in the codebase.

@ColinLeeo requested a review from Copilot April 1, 2026 10:39

Copilot AI left a comment


Pull request overview

This PR fixes regressions in tree-table querying and RLE decoding behavior, and adds tests to prevent crashes/misaligned decoding from reappearing.

Changes:

  • Fix table metadata lookup to avoid accidental map insertion and null dereference when loading device index entries.
  • Correct Int32 RLE header decoding to read varint headers and properly handle RLE runs vs bit-packing; fix allocator mismatch in reset/free paths.
  • Add regression tests covering deep device paths and large RLE run-count headers.
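One way to make an allocator mismatch of this kind hard to reintroduce is to tie the release function to the buffer's type. The sketch below is an assumption for illustration only: `demo_alloc` and `demo_free` are hypothetical stand-ins built on `std::malloc` and `std::free`, not the project's `mem_alloc` and `mem_free`, whose signatures are not shown in this conversation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>

// Hypothetical stand-ins for a project-specific allocator pair.
inline void* demo_alloc(size_t n) { return std::malloc(n); }
inline void demo_free(void* p) { std::free(p); }

// Deleter that always releases through the matching free function,
// so delete[] can never be applied to this buffer by accident.
struct DemoFree {
    void operator()(uint8_t* p) const { demo_free(p); }
};
using DemoBuffer = std::unique_ptr<uint8_t[], DemoFree>;

inline DemoBuffer make_buffer(size_t n) {
    return DemoBuffer(static_cast<uint8_t*>(demo_alloc(n)));
}
```

Calling `reset()` on such a buffer releases it through `demo_free`, so a reset path cannot pair the allocation with the wrong deallocator.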

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| cpp/test/reader/tree_view/tsfile_reader_tree_test.cc | Adds regression tests for deep device paths and missing measurements; introduces a non-hermetic local “simpletest”. |
| cpp/test/encoding/int32_rle_codec_test.cc | Adds regression tests for large varint run counts and reset/free correctness; adds a helper to craft RLE segments. |
| cpp/src/file/tsfile_io_reader.cc | Avoids operator[] insertion by switching to find() and handling null entries for table index lookups. |
| cpp/src/encoding/int64_rle_decoder.h | Adds RLE-run handling path, but still reads only a 1-byte header which can’t support large varint headers. |
| cpp/src/encoding/int32_rle_decoder.h | Reads header as varint and adds explicit RLE-run decoding; fixes mem_alloc/mem_free mismatch in reset and run decoding. |
| cpp/src/common/device_id.cc | Adjusts segment handling to avoid incorrect table-name splits for deep device paths. |


Comment on lines +512 to +524

TEST_F(TsFileTreeReaderTest, simpletest) {
    TsFileReader reader;
    reader.open("/Users/colin/Library/Containers/com.tencent.xinWeChat/Data/Documents/xwechat_files/wxid_197w1jpv66ag22_cc63/msg/file/2026-03/1761643915818-1-0-0.tsfile");
    ResultSet* result;
    int ret = reader.query_table_on_tree({"t", "h"}, INT64_MIN,
                                         INT64_MAX, result);
    ASSERT_EQ(ret, E_OK);

    auto* table_result_set = (storage::TableResultSet*)result;
    bool has_next = false;
    print_table_result_set((table_result_set));
}

Copilot AI Apr 1, 2026


This unit test is not hermetic: it depends on an absolute, machine-specific path and will fail in CI/other dev machines. It also doesn’t clean up the ResultSet via destroy_query_data_set(...) or reader.close() on success. Please remove this test, or convert it into a disabled/manual test that uses repo testdata (checked-in fixture) and performs proper cleanup.

Suggested change
- TEST_F(TsFileTreeReaderTest, simpletest) {
-     TsFileReader reader;
-     reader.open("/Users/colin/Library/Containers/com.tencent.xinWeChat/Data/Documents/xwechat_files/wxid_197w1jpv66ag22_cc63/msg/file/2026-03/1761643915818-1-0-0.tsfile");
-     ResultSet* result;
-     int ret = reader.query_table_on_tree({"t", "h"}, INT64_MIN,
-                                          INT64_MAX, result);
-     ASSERT_EQ(ret, E_OK);
-     auto* table_result_set = (storage::TableResultSet*)result;
-     bool has_next = false;
-     print_table_result_set((table_result_set));
- }

Copilot uses AI. Check for mistakes.
Comment on lines +111 to +119
@@ -112,7 +112,11 @@ class Int64RleDecoder : public Decoder {
                     common::SerializationUtil::read_ui8(header, byte_cache_))) {
                 return ret;
             }
-            call_read_bit_packing_buffer(header);
+            if (header & 1) {
+                call_read_bit_packing_buffer(header);
+            } else {
+                call_read_rle_run(header);
+            }

Copilot AI Apr 1, 2026


The Int64 decoder still reads the run header as a single byte (read_ui8). In the RLE/bit-packing hybrid format, the group header is a varint, so run counts like 64+ require multi-byte headers (same regression you fixed in Int32RleDecoder). This will mis-decode/lose alignment for larger run counts. Update this to read an unsigned varint header value (e.g., read_var_uint(...) into a wider integer type) and pass that through to both the RLE-run and bit-packing paths.
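To see why a single-byte read loses alignment, it helps to look at the encoder side. The sketch below is an illustration under assumptions: `encode_var_uint` and `encode_rle_header` are hypothetical helpers, and the header layout (run count shifted left by one, low bit clear for an RLE run) is inferred from the `header & 1` and `header >> 1` fragments quoted in this review.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encode an unsigned LEB128 varint: 7 payload bits per byte, with the
// high bit set on every byte except the last.
inline std::vector<uint8_t> encode_var_uint(uint32_t value) {
    std::vector<uint8_t> out;
    while (value >= 0x80) {
        out.push_back(static_cast<uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<uint8_t>(value));
    return out;
}

// Assumed header layout for an RLE run: count in the upper bits, mode bit 0.
inline std::vector<uint8_t> encode_rle_header(uint32_t run_count) {
    return encode_var_uint(run_count << 1);
}
```

A run count of 63 fits in one header byte, while 64 produces the two bytes `0x80 0x01`. A decoder that consumes only the first byte leaves `0x01` in the stream, where it is then misread as part of the run's value.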

Comment on lines +129 to +131
int call_read_rle_run(uint8_t header) {
    int ret = common::E_OK;
    int run_length = (int)(header >> 1);

Copilot AI Apr 1, 2026


Because call_read_rle_run takes uint8_t header, run_length is capped to 127 and cannot represent larger runs even if the encoded header is a multi-byte varint. This should take the full decoded varint header value (e.g., uint32_t/uint64_t header_value) and compute run_length = header_value >> 1 to support large run counts.
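The effect of the narrow parameter type can be shown in isolation. This is a minimal sketch with hypothetical function names (`run_length_narrow`, `run_length_wide`), not the decoder's actual code.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the uint8_t parameter in the reviewed code: the caller's value
// is truncated to 8 bits before the shift, so the result is at most 127.
inline uint32_t run_length_narrow(uint8_t header) { return header >> 1; }

// Takes the full decoded varint value, so large run counts survive.
inline uint32_t run_length_wide(uint32_t header) { return header >> 1; }
```

For a run of 200 values the header is 400. Converted to `uint8_t` it becomes 144, so the narrow version reports a run length of 72, while the wide version reports 200.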

Comment on lines +178 to +179
for (int i = 0; i < byte_width; i++) {
common::SerializationUtil::write_ui8((value >> (i * 8)) & 0xFF,

Copilot AI Apr 1, 2026


This extracts bytes from a signed int32_t using right-shift. Right-shifting a negative signed integer is implementation-defined before C++20. Even though the current test values are positive, this helper is generic and could be reused; consider casting to uint32_t before shifting (for example, uint32_t u = static_cast<uint32_t>(value)) so that the extraction is well-defined under every language standard.

Suggested change
- for (int i = 0; i < byte_width; i++) {
-     common::SerializationUtil::write_ui8((value >> (i * 8)) & 0xFF,
+ uint32_t u = static_cast<uint32_t>(value);
+ for (int i = 0; i < byte_width; i++) {
+     common::SerializationUtil::write_ui8((u >> (i * 8)) & 0xFF,
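A self-contained version of the same idea, writing into a `std::vector` rather than the project's byte stream, might look like this. `to_le_bytes` is a hypothetical helper introduced for illustration; it is not part of the test file under review.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Extract the low byte_width bytes of value in little-endian order.
// The cast to uint32_t happens once, before any shift, so negative
// inputs are handled with well-defined unsigned arithmetic.
inline std::vector<uint8_t> to_le_bytes(int32_t value, int byte_width) {
    assert(byte_width >= 0 && byte_width <= 4);
    const uint32_t u = static_cast<uint32_t>(value);
    std::vector<uint8_t> out;
    for (int i = 0; i < byte_width; ++i) {
        out.push_back(static_cast<uint8_t>((u >> (i * 8)) & 0xFF));
    }
    return out;
}
```

For the input -2 with a width of 4, the helper produces `FE FF FF FF`, the two's-complement bytes in little-endian order.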

Comment on lines +461 to +465
ResultSet* result;
// query_table_on_tree used to SEGV here due to wrong table-name lookup
ASSERT_EQ(E_OK,
          reader.query_table_on_tree({m_temp, m_humi}, INT64_MIN,
                                     INT64_MAX, result));

Copilot AI Apr 1, 2026


For consistency with the other test in this file (and to avoid any accidental use of an uninitialized pointer if this code is modified later), initialize result to nullptr before passing it to query_table_on_tree.

@codecov-commenter

Codecov Report

❌ Patch coverage is 63.04348% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 62.72%. Comparing base (75e0664) to head (f5ee23c).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| cpp/src/encoding/int64_rle_decoder.h | 23.80% | 16 Missing ⚠️ |
| cpp/src/file/tsfile_io_reader.cc | 0.00% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #764      +/-   ##
===========================================
+ Coverage    62.70%   62.72%   +0.01%     
===========================================
  Files          705      705              
  Lines        42118    42152      +34     
  Branches      6204     6216      +12     
===========================================
+ Hits         26411    26439      +28     
- Misses       14772    14782      +10     
+ Partials       935      931       -4     


3 participants