feat(cudf): Add GPU "and" and "or"#16913
feat(cudf): Add GPU "and" and "or"#16913simoneves wants to merge 18 commits intofacebookincubator:mainfrom
Conversation
✅ Deploy Preview for meta-velox ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
5a7edb8 to
9c3473c
Compare
Build Impact AnalysisFull build recommended. Files outside the dependency graph changed:
These directories are not fully covered by the dependency graph. A full build is the safest option. Fast path • Graph from main@8475135005b9ad1d137f0e828770e8f191b47c75 |
… PR facebookincubator#16913 from @simoneves) Cherry-picked PR facebookincubator#16913: LogicalFunction for GPU-side 'and'/'or' evaluation, split out from Decimal Part 2. Also fixes API compatibility issues from cherry-picked PRs: - veloxToCudfTypeId → veloxToCudfDataType (removed double-wrapping) - toCudfTable missing mr parameter in CudfEnforceSingleRow Made-with: Cursor
…38/Q9 - Fix cudf::numeric_scalar calls to pass stream/mr explicitly (banned default stream/mr in this build config) - Fix veloxToCudfTypeId → veloxToCudfDataType API rename - Fix toCudfTable missing mr parameter - Add debug logging for boolean marker null counts and NLJ build/probe Cherry-picked PRs integrated: - PR facebookincubator#16522 count(*)/count(NULL) from @mattgara - PR facebookincubator#16920 CudfEnforceSingleRow from @perlitz - PR facebookincubator#16913 LogicalFunction and/or from @simoneves Made-with: Cursor
ea6a443 to
54ec1c3
Compare
CI Failure Analysis
❌ Spark Aggregate Fuzzer — FUZZER Failure View logsFuzzer: Spark Aggregation Fuzzer ( Error: Velox and reference DB (Spark) results don't match for The fuzzer instance aborted with SIGABRT after the Correlation with PR changes:
Known issues:
Reproduce locally: ./_build/debug/velox/functions/sparksql/fuzzer/spark_aggregation_fuzzer_test \
--seed 296072680 \
--duration_sec 300 \
--minloglevel=0 \
--stderrthreshold=2Recommended fix: |
54ec1c3 to
4ed41bb
Compare
CI Failure Analysis
|
4ed41bb to
ec4dc5f
Compare
CI Failure Analysis
|
8ab90fd to
1eb3d51
Compare
| // subclass(es). | ||
| virtual void convertSplit(std::shared_ptr<ConnectorSplit> split); | ||
|
|
||
| // Construct a RowTypePtr for the table column names and types. |
There was a problem hiding this comment.
Probably should go in private: section since only being used in next() of CudfHiveDataSource for now
There was a problem hiding this comment.
It's actually only used in the constructor. What am I missing?
Regardless, agreed, and moved to private: section.
| const RowTypePtr CudfHiveDataSource::getTableRowType() { | ||
| if (tableHandle_->dataColumns()) { | ||
| std::vector<std::string> names; | ||
| std::vector<TypePtr> types; | ||
| for (const auto& name : readColumnNames_) { | ||
| auto parsedType = tableHandle_->dataColumns()->findChild(name); | ||
| names.emplace_back(std::move(name)); | ||
| types.push_back(parsedType); | ||
| } | ||
| return ROW(std::move(names), std::move(types)); | ||
| } | ||
| return outputType_; | ||
| } |
There was a problem hiding this comment.
Do we need to reconstruct this everytime or can we reuse anything here?
There was a problem hiding this comment.
The function seems pretty lightweight, but I made it cache the result of the first call anyway.
CI Failure Analysis
|
|
Un-drafting for review, as this might now land before other pending stuff |
CI Failure Analysis❌ Memory Arbitration Fuzzer — FUZZER Failure View logsFuzzer: Memory Arbitration Fuzzer (instance 4 of 4) Error: The task hung during execution (driver was in Instances 1, 2, and 3 passed. Only instance 4 (seed 878309077) failed. Correlation with PR changes: This failure is not related to the PR changes. PR #16913 modifies files exclusively under Known issues: This is a known, pre-existing flaky failure tracked by multiple open issues:
The most recent Fuzzer Jobs run on Reproduce locally: ./velox_memory_arbitration_fuzzer \
--seed 878309077 \
--duration_sec 300 \
--minloglevel=0Note: This is an intermittent hang, so reproduction with the same seed is not guaranteed. Recommended fix: No action needed on this PR. The failure is a known intermittent issue in the Memory Arbitration Fuzzer infrastructure, unrelated to the cuDF changes in this PR. |
pramodsatya
left a comment
There was a problem hiding this comment.
Thanks for adding these functions @simoneves, I had one suggestion.
| std::nullopt, | ||
| }); | ||
| auto input = makeRowVector({c0, c1}); | ||
| auto exprSet = compileExpression("c0 AND c1", rowType); |
There was a problem hiding this comment.
Please try
// Use this directly if you don't want it to cast the returned vector.
VectorPtr evaluate(
const std::string& expression,
const RowVectorPtr& data,
const std::optional<SelectivityVector>& rows = std::nullopt) {
auto typedExpr = makeTypedExpr(expression, asRowType(data->type()));
return evaluate(typedExpr, data, rows);
}
There was a problem hiding this comment.
@jinchengchenghh Sorry, but I'm not sure what you mean. I see that what you pasted is one of the other variants of functions::test::FunctionBaseTest::evaluate() in FunctionBaseTest.h but I don't see how it helps here.
There was a problem hiding this comment.
@jinchengchenghh please clarify this one, thx.
There was a problem hiding this comment.
Usually, we execute
auto result = evaluate("c0 AND c1", input)
auto expected = (generate by your self)
Since we can run the CPU tests to get the expected result, could you add a new function assertCpuAndGpuEqual in CudfFunctionBaseTest ?
There was a problem hiding this comment.
@jinchengchenghh I get it now, thanks. Refactored as requested. There is still some duplicate code between each pair of tests (some more duplicated than others) but I think each one of them is short enough that any further refactoring would not make them any easier to read, so I'll stop here.
CI Failure Analysis⌛ Ubuntu debug with system dependencies — TEST Failure (Timeout) View logsFailed test: The test binary was killed by ctest's timeout. The same timeout occurred on the automatic retry. In both runs, the test that was executing when the timeout hit was: The preceding variant ( 556 of 557 tests in the binary completed successfully. Test execution summary:
Correlation with PR changes:
Known issues:
Reproduce locally: ./_build/debug/velox/exec/tests/velox_exec_test_group7 --gtest_filter="ExchangeClientTest/ExchangeClientTest.lazyFetching/CompactRow"Note: This is a flaky hang, so it may not reproduce on every run. Recommended action: |
This PR implements GPU operators for boolean "and" and "or" in Function Mode.
It was split out of Decimal Part 2, as it has nothing to do with the rest of the decimal implementation, but is required for an all-GPU processing of TPC-DS.
Per @jinchengchenghh be aware that unlike the Velox CPU implementation, logical operations with multiple column inputs cannot short-cut once a terminating case (e.g.
AND falseorOR true) is found, so performance in the case of a large number of such inputs may not be optimal. I don't know how much that's worth worrying about in practice. Note that such a short-cut IS implemented for literal inputs (i.e.<any number of columns> AND falseor<any number of columns> OR truewill not evaluate the columns at all).Note that in most cases, such expressions are handled by AST, especially after the fix in #17236, but a Function Mode fallback is still required as a backup.