
What I am trying to implement is an efficient way to broadcast a 32-bit integer into a 256-bit YMM register in C using Intel intrinsics. The twist, however, is that I want each bit of the 32-bit integer to be translated into either a 0x00 or a 0xFF byte in the register, depending on whether that bit was 0 or 1 in my integer.

For instance, if I had a 4-bit integer with the bits 0011 and a 16-bit register, I would want the register to end up with the content: 0000 0000 1111 1111

If I use the usual Intel intrinsic functions for broadcasting, I would instead end up with a register of the form: 0011 0011 0011 0011.

Since the finest granularity the Intel shuffle intrinsics work at is bytes, I cannot shuffle the individual bits into place afterwards.

The only solution I have found is to test each bit with an if before touching the register, prepare the expanded data in memory, and then load it into the register. Like this pseudo-C code snippet:

uint8_t expanded_bit[32] = {0};   /* one byte per bit of some_int, all 0x00 to start */
if(some_int & 1) {
   expanded_bit[0] = 0xFF;
}
if(some_int & 2) {
   expanded_bit[1] = 0xFF;
}
if(some_int & 4) {
   expanded_bit[2] = 0xFF;
}
if(some_int & 8) {
   expanded_bit[3] = 0xFF;
}
/* ... and so on for the remaining bits ... */
some_register = _mm256_loadu_si256((const __m256i *)expanded_bit);

This does not really seem efficient though... (And one could say it maybe defeats the purpose of SIMD, if the overhead of preparing the data equals the time gained by using SIMD operations).
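For reference, the per-bit preparation described above can be written as a branch-free scalar loop. This is only a minimal sketch of the baseline being discussed, not a recommended solution; the helper name `expand_bits_scalar` and the convention that bit i maps to byte i are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical helper (name is an assumption): byte i of out becomes 0xFF
   if bit i of bits is set, and 0x00 otherwise. */
static void expand_bits_scalar(uint32_t bits, uint8_t out[32])
{
    for (int i = 0; i < 32; ++i) {
        /* (bits >> i) & 1 is 0 or 1; subtracting it from 0 gives 0 or
           all-ones, which is then truncated to a single byte. */
        out[i] = (uint8_t)(0u - ((bits >> i) & 1u));
    }
}
```

The resulting 32-byte buffer could then be loaded with `_mm256_loadu_si256`, but all the expansion work still happens in scalar code, which is exactly the overhead in question.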

oPolo
  • To clarify, you want the reverse of `VPMOVMSKB`? Something like http://stackoverflow.com/q/21622212/3185968? – EOF May 26 '16 at 14:23
  • Exactly! Or well, almost. The difference is that I want all 8 bits in each byte inside the YMM to have the same value as the corresponding bit in the int, not just the most significant bit of the byte. – oPolo May 26 '16 at 14:30
  • Hmmm, I just read through the question you sent... I think they actually gave a hint as to how it could be done efficiently using _mm256_blendv_epi8. I can have two registers, one all 1's and one all 0's, and then construct the register I want from them using my integer as a mask... Gonna try it right now! Thanks! If you post that as a short answer, I'll mark it as accepted, as I would not have come up with that without any hints. – oPolo May 26 '16 at 14:33
  • This answer http://stackoverflow.com/a/21625569/3185968 to the question I linked does what you need, I think. If it does, I'd recommend marking this question as duplicate. – EOF May 26 '16 at 14:33
  • Oops, I meant _mm256_blend_epi32 – oPolo May 26 '16 at 14:37
  • Since you want all-zero or all-one after broadcast, this isn't an *exact* duplicate. Use a `_mm256_cmpeq_epi8` against a vector of all-zeros after using Z Boson's shuffle/AND method. Actually, replace the AND with ANDN to invert each bit, because you'll invert again with a compare for `== 0`. – Peter Cordes May 26 '16 at 22:40

0 Answers