I've never written assembly code for SSE optimization, so sorry if this is a noob question. This article explains how to vectorize a for loop that contains a conditional statement. However, my code (taken from here) is of the form:
    for (int j=-halfHeight; j<=halfHeight; ++j)
    {
        for(int i=-halfWidth; i<=halfWidth; ++i)
        {
            const float rx = ofsx + j * a12;
            const float ry = ofsy + j * a22;
            float wx = rx + i * a11;
            float wy = ry + i * a21;
            const int x = (int) floor(wx);
            const int y = (int) floor(wy);
            if (x >= 0 && y >= 0 && x < width && y < height)
            {
                // compute weights
                wx -= x; wy -= y;
                // bilinear interpolation
                *out++ =
                    (1.0f - wy) * ((1.0f - wx) * im.at<float>(y,x)   + wx * im.at<float>(y,x+1)) +
                    (       wy) * ((1.0f - wx) * im.at<float>(y+1,x) + wx * im.at<float>(y+1,x+1));
            } else {
                *out++ = 0;
            }
        }
    }
So, from my understanding, there are several differences with the linked article:
- Here we have a nested `for`: I've always seen a one-level `for` in vectorization, never a nested loop.
- The if condition is based on scalar values (`x` and `y`) and not on the array: how can I adapt the linked example to this?
- The `out` index isn't based on `i` or `j` (so it's not `out[i]` or `out[j]`): how can I fill `out` in this way?
In particular, I'm confused because for-loop indexes are usually used as array indexes, whereas here they are used to compute other variables while the output vector is advanced by one element per iteration.
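To make the last point concrete, here is a minimal sketch (not taken from the linked article) of how I understand the `*out++` pattern: since exactly one element is written per `(j, i)` pair, the write position should be equivalent to a row-major index computed from `i` and `j`. The names `pixelValue`, `fillWithPointer`, `fillWithIndex` and `fillsAgree` are hypothetical and only for illustration; `pixelValue` is a stand-in for the interpolation, and it assumes `out` points to a buffer of `(2*halfWidth+1) * (2*halfHeight+1)` floats.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-pixel computation (the real loop does
// the bounds check and the bilinear interpolation here).
static float pixelValue(int i, int j) {
    return 100.0f * static_cast<float>(j) + static_cast<float>(i);
}

// Pattern from the question: sequential writes through a moving pointer.
std::vector<float> fillWithPointer(int halfWidth, int halfHeight) {
    const int rowLen = 2 * halfWidth + 1;
    const int rows = 2 * halfHeight + 1;
    std::vector<float> buf(static_cast<std::size_t>(rowLen) * rows);
    float* out = buf.data();
    for (int j = -halfHeight; j <= halfHeight; ++j)
        for (int i = -halfWidth; i <= halfWidth; ++i)
            *out++ = pixelValue(i, j);
    return buf;
}

// Same writes, but the position is derived from i and j (row-major order).
std::vector<float> fillWithIndex(int halfWidth, int halfHeight) {
    const int rowLen = 2 * halfWidth + 1;
    const int rows = 2 * halfHeight + 1;
    std::vector<float> buf(static_cast<std::size_t>(rowLen) * rows);
    for (int j = -halfHeight; j <= halfHeight; ++j) {
        for (int i = -halfWidth; i <= halfWidth; ++i) {
            // Shift both loop variables so they start at 0.
            const std::size_t idx =
                static_cast<std::size_t>(j + halfHeight) * rowLen +
                static_cast<std::size_t>(i + halfWidth);
            assert(idx < buf.size());
            buf[idx] = pixelValue(i, j);
        }
    }
    return buf;
}

// Returns true when both variants produce identical buffers.
bool fillsAgree(int halfWidth, int halfHeight) {
    return fillWithPointer(halfWidth, halfHeight) ==
           fillWithIndex(halfWidth, halfHeight);
}
```

The sketch only shows that the two addressing schemes write the same elements in the same order; it does not establish whether the indexed form is enough for the compiler to vectorize the real loop.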
I'm using icpc with -O3 -xCORE-AVX2 -qopt-report=5 and a bunch of other optimization flags. According to Intel Advisor, this loop is not vectorized, and adding #pragma omp simd produces warning #15552: loop was not vectorized with "simd".