Feature request: implement boundary adjustment for timestamp types #2421

@dentiny

Is your feature request related to a problem or challenge?

Yes. The current implementation doesn't adjust the boundary for timestamp-related types, which can cause more partitions to be scanned during a query than necessary.

Describe the solution you'd like

The function project_binary adjusts the boundary when possible.

It does 4 things, in this order:

  • adjust the boundary (e.g., rewrite the literal of a '<' predicate so that it is correct for '<=')
  • transform (e.g., convert a timestamp value to an int value)
  • fix negative values (i.e., timestamps earlier than the epoch)
  • adjust the operator, turning '<' into '<=' and '>' into '>='

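Currently, operation (1) doesn't cover every timestamp type. In the match below, only PrimitiveType::Timestamp has an arm; any other timestamp-related type falls through to the catch-all arm and is returned unchanged: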

PredicateOperator::LessThan => match (datum.data_type(), datum.literal()) {
    (PrimitiveType::Int, PrimitiveLiteral::Int(v)) => Some(Datum::int(v - 1)),
    (PrimitiveType::Long, PrimitiveLiteral::Long(v)) => Some(Datum::long(v - 1)),
    (PrimitiveType::Decimal { .. }, PrimitiveLiteral::Int128(v)) => {
        Some(Datum::decimal(decimal_from_i128_with_scale(v - 1, 0))?)
    }
    (PrimitiveType::Date, PrimitiveLiteral::Int(v)) => Some(Datum::date(v - 1)),
    (PrimitiveType::Timestamp, PrimitiveLiteral::Long(v)) => {
        Some(Datum::timestamp_micros(v - 1))
    }
    // Any type without an arm above is returned unchanged.
    _ => Some(datum.to_owned()),
},
PredicateOperator::GreaterThan => match (datum.data_type(), datum.literal()) {
    (PrimitiveType::Int, PrimitiveLiteral::Int(v)) => Some(Datum::int(v + 1)),
    (PrimitiveType::Long, PrimitiveLiteral::Long(v)) => Some(Datum::long(v + 1)),
    (PrimitiveType::Decimal { .. }, PrimitiveLiteral::Int128(v)) => {
        Some(Datum::decimal(decimal_from_i128_with_scale(v + 1, 0))?)
    }
    (PrimitiveType::Date, PrimitiveLiteral::Int(v)) => Some(Datum::date(v + 1)),
    (PrimitiveType::Timestamp, PrimitiveLiteral::Long(v)) => {
        Some(Datum::timestamp_micros(v + 1))
    }
    // Any type without an arm above is returned unchanged.
    _ => Some(datum.to_owned()),
},
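
The reason the v - 1 / v + 1 rewrite is safe for integer-backed literals can be checked directly: x < v holds exactly when x <= v - 1, and x > v holds exactly when x >= v + 1. Below is a minimal standalone sketch of that equivalence. The helper names are hypothetical and not part of the crate, and the use of checked arithmetic is an assumption of this sketch (the match arms above use plain v - 1 and v + 1).

```rust
/// Hypothetical helper: rewrites the exclusive upper bound `x < v`
/// as the inclusive bound `x <= v - 1`. Returns None on overflow.
fn exclusive_to_inclusive_upper(v: i64) -> Option<i64> {
    v.checked_sub(1)
}

/// Hypothetical helper: rewrites the exclusive lower bound `x > v`
/// as the inclusive bound `x >= v + 1`. Returns None on overflow.
fn exclusive_to_inclusive_lower(v: i64) -> Option<i64> {
    v.checked_add(1)
}

fn main() {
    // Microsecond value used in the example of this issue.
    let v: i64 = 1_686_787_200_000_000;
    let upper = exclusive_to_inclusive_upper(v).expect("no overflow");
    let lower = exclusive_to_inclusive_lower(v).expect("no overflow");

    // Both forms of each predicate agree on every value around the boundary.
    for x in (v - 3)..=(v + 3) {
        assert_eq!(x < v, x <= upper);
        assert_eq!(x > v, x >= lower);
    }
    println!("x < {v}  is equivalent to  x <= {upper}");
    println!("x > {v}  is equivalent to  x >= {lower}");
}
```

The same argument applies to any timestamp type whose literal is stored as an integer count of time units, which is why the adjustment can be extended to those types.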

When (1) is skipped but (4) still applies, we can scan more partitions than we should.
For example, take the query ts < 2023-06-15T00:00:00Z (= micros value 1686787200000000).
Under the current impl,

  • adjust_boundary: 1686787200000000 (unchanged)
  • operator: turns from '<' to '<='
  • day(1686787200000000) = 19523 (June 15th)
  • result: p <= 19523, which scans June 15th too

If the boundary were first adjusted to 1686787199999999, day(1686787199999999) = 19522 (June 14th) and the result would be p <= 19522, which skips June 15th.
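
The arithmetic in this example can be reproduced with a small standalone sketch. It does not use the crate's API: day and project_less_than are hypothetical stand-ins for the day transform and for steps (1), (2) and (4) of the projection, and the adjust flag exists only to compare the two behaviours side by side.

```rust
const MICROS_PER_DAY: i64 = 86_400_000_000;

/// Hypothetical stand-in for the day transform: days since 1970-01-01.
/// Floor division keeps pre-epoch values on the correct (earlier) day.
fn day(micros: i64) -> i64 {
    micros.div_euclid(MICROS_PER_DAY)
}

/// Hypothetical projection of `ts < bound_micros` onto a day partition.
/// Returns the inclusive upper bound `d` of the projected predicate `p <= d`.
/// `adjust` toggles step (1), the boundary adjustment.
fn project_less_than(bound_micros: i64, adjust: bool) -> i64 {
    // Step (1): move the boundary down by one microsecond, if enabled.
    let boundary = if adjust { bound_micros - 1 } else { bound_micros };
    // Step (2): apply the transform. Step (4), '<' to '<=', is implied by
    // the return value being an inclusive bound.
    day(boundary)
}

fn main() {
    // 2023-06-15T00:00:00Z in microseconds since the epoch.
    let bound: i64 = 1_686_787_200_000_000;

    let without = project_less_than(bound, false);
    let with = project_less_than(bound, true);

    println!("without adjustment: p <= {without}"); // p <= 19523 (June 15th)
    println!("with adjustment:    p <= {with}"); // p <= 19522 (June 14th)

    assert_eq!(without, 19523);
    assert_eq!(with, 19522);
}
```

The two results differ only when the boundary sits exactly at the start of a day, which is where the extra partition gets scanned.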

Willingness to contribute

I can contribute to this feature independently

Labels: enhancement (New feature or request)
