Skip to content

Expanded math ops#114

Open
ShrutheeshIR wants to merge 10 commits into
KavrakiLab:mainfrom
CoMMALab:expanded_math_ops
Open

Expanded math ops#114
ShrutheeshIR wants to merge 10 commits into
KavrakiLab:mainfrom
CoMMALab:expanded_math_ops

Conversation

@ShrutheeshIR

@ShrutheeshIR ShrutheeshIR commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

I want more math ops for fancier math. Main changes

  1. Add asin, acos and atan inverse trigonometric functions in avx.hh. -- Use cephes implementations from here. Note that I have used _mm256_sqrt_ps ops here instead of the custom defined sqrt(), b/c the precision is pretty poor and acos has differences in the 3rd FP itself.
  2. Add them to the interface as well
  3. Expanded math.hh to include these ops and other operations that are useful in general

TODO --

  • Add them to neon
  • Add wasm

Tested it using

    size_t total_tests_failed = 0;
    std::cout << "Check atan2 implementation" << std::endl;
    for (auto i = -10; i <= 10; ++i)
    {
        for (auto j = -10; j <= 10; ++j)
        {
            if (i == 0 && j == 0)
            {
                continue;
            }
            float angle = std::atan2(static_cast<float>(i), static_cast<float>(j));

            // now convert it to a FloatVector<rake> and call atan
            vamp::FloatVector<rake> i_vec;
            i_vec[0] = static_cast<float>(i);
            vamp::FloatVector<rake> j_vec;
            j_vec[0] = static_cast<float>(j);
            auto angle_vec = atan2(i_vec, j_vec);

            if (std::abs(angle - angle_vec[{0, 0}]) > 1e-6)
            {
                std::cout << "Mismatch for (" << i << ", " << j << "): "
                          << "std::atan2 = " << angle << ", "
                          << "FloatVector<rake>::atan2 = " << angle_vec[{0, 0}] << std::endl;
                total_tests_failed++;
            }
        }
    }

    // check acos and asin   
    std::cout << "Check acos and asin implementation" << std::endl;
    for (auto i = -10; i <= 10; ++i)
    {
        float value = static_cast<float>(i) / 10.0f;  // values from -1.0 to 1.0
        if (value < -1.0f || value > 1.0f)
        {
            continue;  // skip out-of-range values
        }
        float angle = std::acos(value);
        float asin_angle = std::asin(value);

        vamp::FloatVector<rake> value_vec;
        value_vec[0] = value;
        auto angle_vec = acos(value_vec);
        auto asin_angle_vec = asin(value_vec);

        if (std::abs(angle - angle_vec[{0, 0}]) > 1e-6)
        {
            std::cout << "Mismatch for " << value << ": "
                      << "std::acos = " << angle << ", "
                      << "FloatVector<rake>::acos = " << angle_vec[{0, 0}] << std::endl;
            total_tests_failed++;
        }
        if (std::abs(asin_angle - asin_angle_vec[{0, 0}]) > 1e-6)
        {
            std::cout << "Mismatch for " << value << ": "
                      << "std::asin = " << asin_angle << ", "
                      << "FloatVector<rake>::asin = " << asin_angle_vec[{0, 0}] << std::endl;
            total_tests_failed++;
        }

    }
    std::cout << "Total tests failed : " << total_tests_failed << std::endl;

PS: The custom implementation of sqrt honestly is pretty bad for most use cases, I think the default behavior should be _mm256_sqrt_ps, and we can expose a sqrt_low_precision for other use cases (eg cc). This is the 2nd time I've wasted a couple of hours tracking down precision issues from my implementation only to realize sqrt is the culprit :(

@ShrutheeshIR ShrutheeshIR reopened this Jun 20, 2026
Comment thread src/impl/vamp/vector/isa/avx.hh
Comment thread src/impl/vamp/vector/math.hh Outdated
}

template <typename DataA, typename DataB, typename DataC>
inline constexpr auto blend(const DataA &a, const DataB &b, const DataC &mask ) -> DataC

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codegen can generate blend() ops, good to use it right

@zkingston zkingston left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall good, please add the NEON and WASM equivalents.

Comment on lines +58 to +72
template <typename DataA, typename DataB, typename DataC>
inline constexpr auto blend(const DataA &a, const DataB &b, const DataC &mask) -> DataC
{
if constexpr (std::is_arithmetic_v<DataC>)
{
return (mask >= 0) ? a : b;
}
else
{
DataC a_vec(a);
DataC b_vec(b);
return a_vec.blend(b_vec, mask);
}
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this extra blend instruction?

@ShrutheeshIR ShrutheeshIR Jun 21, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a unified expr for scalar and a vectorized conditional expr
cppad sometimes spits out a
v[14] = blend(v[14], -0.9999, (v[14]) - (-0.9999)); I want to be able to handle that, so I'll just pin it to the datatype of the first arg

@ShrutheeshIR ShrutheeshIR requested a review from zkingston June 21, 2026 23:02
@zkingston

Copy link
Copy Markdown
Collaborator

Can you verify the formatting is correct and that all builds pass? Otherwise LGTM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants