[10017] Slice REE values along with run_ends #10094
As referenced in #9959 (comment), this could have some (minor) performance implications.

I'm not sure if we can go with this approach, since the contract for slicing states it is zero-copy 🤔
https://docs.rs/arrow/latest/arrow/array/trait.Array.html#tymethod.slice (though I'm not sure how strict this constraint is)

Yeah, I agree; I wanted to give it an attempt. The main issue is that if the run_ends aren't rewritten, the logical array that is expressed is incorrect. For example but in both arrays the run_ends are
Which issue does this PR close?
Closes #10017.
Rationale for this change
RunArray::slice() kept the full physical run_ends and values buffers, so downstream operations (e.g. length(), substring()) would iterate over physical runs outside the logical slice range. This is wasted work proportional to how narrow the slice is relative to the full array.
What changes are included in this PR?
RunArray::slice() now trims values and run_ends to only the physical runs that overlap the logical slice range.
Are these changes tested?
Yes
Are there any user-facing changes?
No
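
For reference, the trimming described under "What changes are included in this PR?" can be sketched roughly as follows. This is a minimal illustration over plain slices, not the code in this PR and not an arrow-rs API: `trim_runs` is a hypothetical helper, and it assumes `i32` run ends and a slice that lies within the logical length of the array.

```rust
use std::ops::Range;

/// Hypothetical helper (not part of arrow-rs): given the physical `run_ends`
/// of a run-end encoded array and a logical `offset`/`len`, return the range
/// of physical runs that overlap the slice and the run ends rewritten so
/// that they are relative to the start of the slice.
fn trim_runs(run_ends: &[i32], offset: usize, len: usize) -> (Range<usize>, Vec<i32>) {
    if len == 0 {
        return (0..0, Vec::new());
    }
    let end = offset + len;
    let total = run_ends.last().map_or(0, |&e| e as usize);
    assert!(end <= total, "slice out of bounds");

    // First physical run whose end lies strictly after `offset`.
    let first = run_ends.partition_point(|&e| (e as usize) <= offset);
    // First physical run whose end reaches the logical end of the slice.
    let last = run_ends.partition_point(|&e| (e as usize) < end);

    // Clamp each kept run end to the slice end, then shift it by `offset`.
    let rebased = run_ends[first..=last]
        .iter()
        .map(|&e| (e as usize).min(end) as i32 - offset as i32)
        .collect();
    (first..last + 1, rebased)
}

fn main() {
    // Logical array: a a b b b c c c c
    let run_ends = [2, 5, 9];
    let values = ["a", "b", "c"];

    // Logical slice of length 4 starting at offset 3: b b c c
    let (runs, new_ends) = trim_runs(&run_ends, 3, 4);
    println!("{:?} {:?}", &values[runs], new_ends); // prints ["b", "c"] [2, 4]
}
```

Note that the sketch allocates a new run_ends vector, which is the tension with the zero-copy contract raised in the conversation above; the values can still be taken as a sub-range of the original buffer.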