
I am trying to find the sum of all bytes in an __m128 register using SSE and SSE2.

So far what I have is

    __m128i sum = _mm_sad_epu8(bytes, _mm_setzero_si128());
    return _mm_cvtsi128_si32(sum) + _mm_extract_epi16(sum, 4);

where bytes is the __m128 value that contains the bytes that I want to find the sum of.

This works, but I am getting a lot of overflows, which gives me wrong values. Is there a way to do this without overflowing?

Alternatively, I was thinking about storing them to an array and summing them that way, but I haven't been able to find a store intrinsic for bytes.
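
For what it's worth, SSE2 has no byte-specific store, but `_mm_storeu_si128` writes all 128 bits regardless of element type, so the array fallback could be sketched roughly as below. This assumes the bytes are meant as unsigned and that the input is an `__m128i`; the helper name `sum_bytes_via_store` is made up for illustration.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: spill the 16 bytes to memory and add them in a
   32-bit accumulator. The total is at most 16 * 255 = 4080, so it cannot wrap. */
static uint32_t sum_bytes_via_store(__m128i bytes)
{
    uint8_t lanes[16];
    _mm_storeu_si128((__m128i *)lanes, bytes);  /* unaligned 128-bit store */

    uint32_t total = 0;
    for (int i = 0; i < 16; ++i)
        total += lanes[i];  /* each byte is widened before the add */
    return total;
}
```

Declaring `lanes` as `int8_t` instead would treat the bytes as signed, if that is what is needed.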

Unfortunately, I can only use SSE and SSE2 intrinsics.

Thank you for your help!

  • `_mm_sad_epu8` against zero sums unsigned bytes without overflow. [How to horizontally sum signed bytes in XMM](https://stackoverflow.com/q/70370454) shows how to adapt that for signed, if that's what you mean by "overflow". See [Fastest way to do horizontal SSE vector sum (or other reduction)](https://stackoverflow.com/q/6996764) in general for various techniques for different sizes of elements. Also, `__m128` is float, `__m128i` is integer. – Peter Cordes Jun 08 '23 at 23:18
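
    To make the comment concrete, here is a minimal SSE2-only sketch of both cases. It is one common way to do it, not necessarily the exact code from the linked answers, and the function names `hsum_epu8` and `hsum_epi8` are made up for illustration. The signed version assumes "overflow" means the bytes should be read as signed: it flips each sign bit (mapping x to x + 128), sums as unsigned, then subtracts the 16 * 128 bias.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Sum 16 unsigned bytes. _mm_sad_epu8 against zero leaves one partial sum
   (at most 8 * 255) in the low bits of each 64-bit half. */
static uint32_t hsum_epu8(__m128i v)
{
    __m128i sad = _mm_sad_epu8(v, _mm_setzero_si128());
    __m128i hi  = _mm_unpackhi_epi64(sad, sad);   /* bring the high half down */
    return (uint32_t)_mm_cvtsi128_si32(_mm_add_epi32(sad, hi));
}

/* Sum 16 signed bytes: XOR with 0x80 turns signed x into unsigned x + 128,
   so the bias of 16 * 128 is removed after the unsigned sum. */
static int32_t hsum_epi8(__m128i v)
{
    __m128i biased = _mm_xor_si128(v, _mm_set1_epi8(-128));
    return (int32_t)hsum_epu8(biased) - 16 * 128;
}
```

    With every byte set to 0xFF, `hsum_epu8` gives 4080 and `hsum_epi8` gives -16.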
