I'm using both SSEx and AVXx intrinsics. When I use Intel SSE2 or AVX2 to load a vector from memory and the data type is int, I have to write the following:
```c
_mm_load_si128( (__m128i *)&a[ i ][ j ]);
_mm256_load_si256( (__m256i *)&a[ i ][ j ]);
```
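For reference, here is a minimal self-contained sketch of the int case. The 4x8 array `a`, its 32-byte alignment, and the helper names `fill_a`, `sum4_epi32_sse2` and `sum8_epi32_avx2` are hypothetical. According to Intel's intrinsics guide, `_mm_load_si128` is declared as taking `__m128i const*` and `_mm256_load_si256` as taking `__m256i const*`, so an `int *` has to be cast. The AVX2 function uses the GCC/Clang `target` attribute so the file builds without `-mavx2`; it should only be called on a CPU that reports AVX2 support (e.g. via `__builtin_cpu_supports("avx2")`).

```c
#include <assert.h>
#include <immintrin.h> /* SSE2 (_mm_*) and AVX/AVX2 (_mm256_*) intrinsics */

/* Hypothetical 4x8 int matrix, 32-byte aligned so that &a[i][j] is
   16-byte aligned when j % 4 == 0 and 32-byte aligned when j % 8 == 0. */
static _Alignas(32) int a[4][8];

/* Fill a[i][j] with i * 8 + j so the sums are easy to check. */
void fill_a(void)
{
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 8; ++j)
            a[i][j] = i * 8 + j;
}

/* Sum a[i][j..j+3] using one aligned SSE2 load. */
int sum4_epi32_sse2(int i, int j)
{
    assert(j % 4 == 0); /* aligned load needs a 16-byte aligned address */
    /* The parameter type is __m128i const*, so the int* is cast. */
    __m128i v = _mm_load_si128((const __m128i *)&a[i][j]);
    int tmp[4];
    _mm_storeu_si128((__m128i *)tmp, v);
    return tmp[0] + tmp[1] + tmp[2] + tmp[3];
}

/* Sum a[i][j..j+7] using one aligned 256-bit load. Compiled for AVX2 via
   a GCC/Clang function attribute; call it only if the CPU has AVX2. */
__attribute__((target("avx2")))
int sum8_epi32_avx2(int i, int j)
{
    assert(j % 8 == 0); /* aligned load needs a 32-byte aligned address */
    __m256i v = _mm256_load_si256((const __m256i *)&a[i][j]);
    int tmp[8];
    _mm256_storeu_si256((__m256i *)tmp, v);
    int s = 0;
    for (int k = 0; k < 8; ++k)
        s += tmp[k];
    return s;
}
```

With `a[i][j] = i * 8 + j`, `sum4_epi32_sse2(1, 4)` adds 12 + 13 + 14 + 15.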
When the data type is float, I use the following instead, without a cast:
```c
_mm_load_ps(&a[ i ][ j ]);
_mm256_load_ps(&a[ i ][ j ]);
```
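The float case as a similar sketch, under the same hypothetical layout (array `b` and helpers `fill_b`, `sum4_ps_sse`, `sum8_ps_avx` are illustrative names). Here `_mm_load_ps` and `_mm256_load_ps` are declared as taking `float const*`, so `&b[i][j]` is passed as-is. `_mm256_load_ps` only needs AVX, so the attribute targets `avx` rather than `avx2`; as before, the caller is assumed to check CPU support first.

```c
#include <assert.h>
#include <immintrin.h> /* SSE (_mm_*_ps) and AVX (_mm256_*_ps) intrinsics */

/* Hypothetical 4x8 float matrix, 32-byte aligned like the int example. */
static _Alignas(32) float b[4][8];

/* Fill b[i][j] with i * 8 + j (small integers, so float sums are exact). */
void fill_b(void)
{
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 8; ++j)
            b[i][j] = (float)(i * 8 + j);
}

/* Sum b[i][j..j+3] using one aligned SSE load. */
float sum4_ps_sse(int i, int j)
{
    assert(j % 4 == 0); /* aligned load needs a 16-byte aligned address */
    /* _mm_load_ps takes float const*, so no cast is needed. */
    __m128 v = _mm_load_ps(&b[i][j]);
    float tmp[4];
    _mm_storeu_ps(tmp, v);
    return tmp[0] + tmp[1] + tmp[2] + tmp[3];
}

/* Sum b[i][j..j+7] using one aligned AVX load; call only if the CPU has AVX. */
__attribute__((target("avx")))
float sum8_ps_avx(int i, int j)
{
    assert(j % 8 == 0); /* aligned load needs a 32-byte aligned address */
    __m256 v = _mm256_load_ps(&b[i][j]);
    float tmp[8];
    _mm256_storeu_ps(tmp, v);
    float s = 0.0f;
    for (int k = 0; k < 8; ++k)
        s += tmp[k];
    return s;
}
```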
So my question is: what is the difference between loading float and int data from memory that makes one need a (type *) cast and the other not?