Is your feature request related to a problem or challenge?
Yes. The current implementation doesn't adjust the boundary for timestamp-related types, which can cause more partitions to be scanned during a query than necessary.
Describe the solution you'd like
The function project_binary adjusts the boundary where possible.
It does 4 things:
1. adjust boundary (e.g., the boundary for '<' becomes the boundary for '<=')
2. transform (e.g., converts a timestamp type to an int type)
3. fix negative values (e.g., timestamps earlier than the epoch)
4. adjust operator, which turns '<' into '<=' and '>' into '>='
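The four steps can be sketched roughly as follows. This is a minimal, self-contained model and not the crate's code: `Op`, `day_transform`, `fix_negative`, `adjust_operator`, and `project` are hypothetical names, timestamps are plain `i64` microseconds, and the pre-epoch fix is modeled as widening a negative upper bound by one day, which is a simplifying assumption.

```rust
/// Toy stand-in for the comparison operators (hypothetical, not `PredicateOperator`).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Lt,
    LtEq,
    Gt,
    GtEq,
}

const MICROS_PER_DAY: i64 = 86_400_000_000;

/// Step 1: move a strict boundary to the nearest inclusive one.
fn adjust_boundary(op: Op, micros: i64) -> i64 {
    match op {
        Op::Lt => micros - 1,
        Op::Gt => micros + 1,
        Op::LtEq | Op::GtEq => micros,
    }
}

/// Step 2: day transform, flooring so pre-epoch timestamps map to negative days.
fn day_transform(micros: i64) -> i64 {
    micros.div_euclid(MICROS_PER_DAY)
}

/// Step 3 (assumed, simplified): widen a negative upper bound by one day.
fn fix_negative(op: Op, day: i64) -> i64 {
    match op {
        Op::Lt | Op::LtEq if day < 0 => day + 1,
        _ => day,
    }
}

/// Step 4: strict operators become inclusive.
fn adjust_operator(op: Op) -> Op {
    match op {
        Op::Lt => Op::LtEq,
        Op::Gt => Op::GtEq,
        other => other,
    }
}

/// Runs the four steps in order and returns the projected operator and day.
fn project(op: Op, micros: i64) -> (Op, i64) {
    let boundary = adjust_boundary(op, micros);
    let day = fix_negative(op, day_transform(boundary));
    (adjust_operator(op), day)
}

fn main() {
    // ts > 1970-01-02T00:00:00 (86_400_000_000 micros) projects to day >= 1.
    println!("{:?}", project(Op::Gt, MICROS_PER_DAY));
}
```

Steps 1 and 4 only make sense together: relaxing the operator without first moving the boundary widens the predicate by one unit.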
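Currently, operation (1) does not support all timestamp-related types. In the snippet below, only PrimitiveType::Timestamp has an arm; every other type falls through to the `_` arm, which leaves the boundary unchanged: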
    PredicateOperator::LessThan => match (datum.data_type(), datum.literal()) {
        (PrimitiveType::Int, PrimitiveLiteral::Int(v)) => Some(Datum::int(v - 1)),
        (PrimitiveType::Long, PrimitiveLiteral::Long(v)) => Some(Datum::long(v - 1)),
        (PrimitiveType::Decimal { .. }, PrimitiveLiteral::Int128(v)) => {
            Some(Datum::decimal(decimal_from_i128_with_scale(v - 1, 0))?)
        }
        (PrimitiveType::Date, PrimitiveLiteral::Int(v)) => Some(Datum::date(v - 1)),
        (PrimitiveType::Timestamp, PrimitiveLiteral::Long(v)) => {
            Some(Datum::timestamp_micros(v - 1))
        }
        _ => Some(datum.to_owned()),
    },
    PredicateOperator::GreaterThan => match (datum.data_type(), datum.literal()) {
        (PrimitiveType::Int, PrimitiveLiteral::Int(v)) => Some(Datum::int(v + 1)),
        (PrimitiveType::Long, PrimitiveLiteral::Long(v)) => Some(Datum::long(v + 1)),
        (PrimitiveType::Decimal { .. }, PrimitiveLiteral::Int128(v)) => {
            Some(Datum::decimal(decimal_from_i128_with_scale(v + 1, 0))?)
        }
        (PrimitiveType::Date, PrimitiveLiteral::Int(v)) => Some(Datum::date(v + 1)),
        (PrimitiveType::Timestamp, PrimitiveLiteral::Long(v)) => {
            Some(Datum::timestamp_micros(v + 1))
        }
        _ => Some(datum.to_owned()),
    },
Because (1) can leave the boundary unchanged while (4) still relaxes the operator, we may scan more partitions than necessary.
For example, take the query ts < 2023-06-15T00:00:00Z (micros value 1686787200000000).
Under the current implementation:
- adjust_boundary: 1686787200000000 (unchanged)
- operator: turns from '<' to '<='
- day(1686787200000000) = 19523 (June 15th)
- result: p <= 19523, which scans June 15th too
With the boundary adjusted to 1686787199999999, day(1686787199999999) = 19522 and the result would be p <= 19522, which skips June 15th.
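The arithmetic in this example can be checked with a small standalone program. It is only a sketch: `day` and `project_less_than` are hypothetical helpers, and the day transform is assumed to be floor division of microseconds by the number of microseconds in a day.

```rust
/// Microseconds in one day.
const MICROS_PER_DAY: i64 = 86_400_000_000;

/// Day transform (assumed): days since 1970-01-01, using floor division.
fn day(micros: i64) -> i64 {
    micros.div_euclid(MICROS_PER_DAY)
}

/// Projects `ts < bound` onto a day partition and returns the inclusive
/// upper bound `d` of the resulting predicate `p <= d`.
/// `adjust` toggles the boundary adjustment (subtract one microsecond).
fn project_less_than(bound_micros: i64, adjust: bool) -> i64 {
    let boundary = if adjust { bound_micros - 1 } else { bound_micros };
    day(boundary)
}

fn main() {
    let bound = 1_686_787_200_000_000; // 2023-06-15T00:00:00Z in microseconds
    println!("without adjustment: p <= {}", project_less_than(bound, false));
    println!("with adjustment:    p <= {}", project_less_than(bound, true));
}
```

Without the adjustment the bound is day 19523 (June 15th); with it, the bound drops to day 19522 (June 14th), so the June 15th partition is pruned.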
Willingness to contribute
I can contribute to this feature independently
Code reference for the snippet quoted above: iceberg-rust/crates/iceberg/src/spec/transform.rs, lines 660 to 683 in 4b13ad5