Expanded math ops #114
Open
ShrutheeshIR wants to merge 10 commits into
ShrutheeshIR
commented
Jun 20, 2026
ShrutheeshIR
commented
Jun 20, 2026
}

template <typename DataA, typename DataB, typename DataC>
inline constexpr auto blend(const DataA &a, const DataB &b, const DataC &mask ) -> DataC
Contributor
Author
Codegen can generate blend() ops, so it's good to handle them properly.
zkingston
requested changes
Jun 20, 2026
zkingston
left a comment
Collaborator
Overall good, please add the NEON and WASM equivalents.
Comment on lines +58 to +72
template <typename DataA, typename DataB, typename DataC>
inline constexpr auto blend(const DataA &a, const DataB &b, const DataC &mask) -> DataC
{
    if constexpr (std::is_arithmetic_v<DataC>)
    {
        return (mask >= 0) ? a : b;
    }
    else
    {
        DataC a_vec(a);
        DataC b_vec(b);
        return a_vec.blend(b_vec, mask);
    }
}
Collaborator
What's this extra blend instruction?
Contributor
Author
It's just a unified expression that covers both the scalar case and the vectorized conditional.
CppAD sometimes spits out something like
v[14] = blend(v[14], -0.9999, (v[14]) - (-0.9999));
and I want to be able to handle that, so I'll just pin it to the datatype of the first arg.
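To make the scalar/vector dispatch concrete, here is a minimal self-contained sketch of the `blend()` shown in the diff. `Vec4` is a hypothetical stand-in for the project's SIMD vector type, and its member `blend` is assumed to keep a lane where the mask is non-negative, matching the scalar branch. The real vector class may define this differently. As in the diff, the return type follows the mask argument.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for a SIMD vector type; NOT the project's real class.
struct Vec4
{
    std::array<float, 4> lanes{};

    Vec4() = default;
    // Broadcast constructor, so a scalar constant can be promoted to a vector.
    Vec4(float s) { lanes.fill(s); }
    Vec4(float a, float b, float c, float d) : lanes{{a, b, c, d}} {}

    // Assumed semantics: keep this lane where mask >= 0, else take other's lane.
    Vec4 blend(const Vec4 &other, const Vec4 &mask) const
    {
        Vec4 out;
        for (std::size_t i = 0; i < lanes.size(); ++i)
            out.lanes[i] = (mask.lanes[i] >= 0.0f) ? lanes[i] : other.lanes[i];
        return out;
    }
};

// Same shape as the function in the diff: one entry point for both cases.
template <typename DataA, typename DataB, typename DataC>
inline constexpr auto blend(const DataA &a, const DataB &b, const DataC &mask) -> DataC
{
    if constexpr (std::is_arithmetic_v<DataC>)
    {
        // Scalar path: an ordinary conditional expression.
        return (mask >= 0) ? a : b;
    }
    else
    {
        // Vector path: promote both operands to the mask's type, then blend per lane.
        DataC a_vec(a);
        DataC b_vec(b);
        return a_vec.blend(b_vec, mask);
    }
}

inline void blend_demo()
{
    // Scalar call shaped like the generated statement: clamps v from below.
    float v = -1.5f;
    v = blend(v, -0.9999f, v - (-0.9999f));
    assert(v == -0.9999f);

    // Vector call with a scalar constant as the second operand.
    const Vec4 x(-1.5f, 0.25f, -0.9999f, 2.0f);
    const Vec4 mask(-0.5001f, 1.2499f, 0.0f, 2.9999f);
    const Vec4 r = blend(x, -0.9999f, mask);
    assert(r.lanes[0] == -0.9999f && r.lanes[1] == 0.25f);
}
```

With a mask of `v - c`, the call behaves like a lower clamp: it returns `v` when `v >= c` and `c` otherwise, which is presumably why the generated code takes this form.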
zkingston
approved these changes
Jun 22, 2026
Collaborator
Can you verify the formatting is correct and that all builds pass? Otherwise LGTM.
I want more math ops for fancier math. Main changes:
- `asin`, `acos` and `atan` inverse trigonometric functions in `avx.hh`. -- Uses `cephes` implementations from here. Note that I have used `_mm256_sqrt_ps` ops here instead of the custom-defined `sqrt()`, because its precision is pretty poor and `acos` shows differences as early as the 3rd decimal place.
- Updated `math.hh` to include these ops and other operations that are useful in general.

TODO --
Tested it using
PS: The custom implementation of `sqrt` is honestly pretty bad for most use cases. I think the default behavior should be `_mm256_sqrt_ps`, and we can expose a `sqrt_low_precision` for other use cases (e.g. cc). This is the 2nd time I've wasted a couple of hours tracking down precision issues in my implementation, only to realize `sqrt` is the culprit :(
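As a rough scalar illustration of why `sqrt` precision matters for `acos`, the sketch below evaluates `acos` through the half-angle identity acos(x) = 2 asin(sqrt((1 - x) / 2)), which to my understanding is the kind of reduction Cephes-style routines use for larger arguments. The `sqrt_low_precision` here is only a stand-in (the classic bit-trick reciprocal square root with one Newton-Raphson step), not the project's custom `sqrt`, and `acos_upper` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Stand-in low-precision sqrt, NOT the project's implementation:
// bit-trick estimate of 1/sqrt(x), one Newton-Raphson step, then multiply by x.
inline float sqrt_low_precision(float x)
{
    if (x <= 0.0f)
        return 0.0f;
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);   // reinterpret the float's bits safely
    bits = 0x5f3759dfu - (bits >> 1);      // initial guess for 1/sqrt(x)
    float y;
    std::memcpy(&y, &bits, sizeof y);
    y = y * (1.5f - 0.5f * x * y * y);     // one Newton-Raphson refinement
    return x * y;                          // sqrt(x) = x * (1/sqrt(x))
}

// Hypothetical helper: acos via the half-angle identity, with a pluggable sqrt.
template <typename Sqrt>
inline float acos_upper(float x, Sqrt sqrt_fn)
{
    return 2.0f * std::asin(sqrt_fn(0.5f * (1.0f - x)));
}

inline void acos_sensitivity_demo()
{
    const float x = 0.9f;
    const float reference = std::acos(x);
    const float with_exact = acos_upper(x, [](float v) { return std::sqrt(v); });
    const float with_approx = acos_upper(x, sqrt_low_precision);

    // The accurate sqrt reproduces std::acos closely; the approximate one
    // carries its relative error straight into the result.
    assert(std::fabs(with_exact - reference) < 1e-5f);
    assert(std::fabs(with_approx - reference) > std::fabs(with_exact - reference));
}
```

Because the `sqrt` result feeds directly into `asin`, any relative error in `sqrt` shows up nearly unattenuated in the final angle, which is consistent with the mismatch described above.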