
I have an old implementation that uses the _mm256_exp_ps() function. I used to be able to compile it with GCC, ICC, and Clang, but now I cannot compile the code anymore because the compiler does not find _mm256_exp_ps().

Here is the simplified version of my problem:

#include <stdio.h>
#include <x86intrin.h>

int main()
{
    __m256 vec1, vec2;
    vec2 = _mm256_exp_ps(vec1);

    return 0;
}

And the error is:

$ gcc -march=native  temp.c -o temp
temp.c: In function ‘main’:
temp.c:9:16: warning: implicit declaration of function ‘_mm256_exp_ps’; did you mean ‘_mm256_rcp_ps’? [-Wimplicit-function-declaration]
    9 |         vec2 = _mm256_exp_ps(vec1);
      |                ^~~~~~~~~~~~~
      |                _mm256_rcp_ps
temp.c:9:16: error: incompatible types when assigning to type ‘__m256’ from type ‘int’

Which means the compiler cannot find the intrinsic.

If I use another function, for example _mm256_add_ps(), there are no errors, which means the intrinsics header is accessible. The problem is specific to _mm256_exp_ps(), which might have changed when AVX-512 support was added to the compiler.

#include <stdio.h>
#include <x86intrin.h>

int main()
{
    __m256 vec1, vec2;
    vec2 = _mm256_add_ps(vec1, vec2);

    return 0;
}

Could you please help me solve the problem?

  • The SVML library is proprietary. Have you tried compiling with ICC? See stackoverflow.com/questions/36636159/… Commented Feb 3, 2023 at 21:27
  • I think this post about _mm256_pow_ps is relevant. I guess your _mm256_exp_ps is (similarly) not an actual intrinsic but part of the SVML library. Commented Feb 3, 2023 at 21:27

1 Answer

As a workaround that should let you compile and run your program, you could define a function with the same name yourself. If this is not a performance-critical part of your program, it might be an acceptable fix. Below are SSE and AVX versions of the function.

#include <stdio.h>
#include <math.h>
#include <immintrin.h>
#include <xmmintrin.h>

// SSE version only (as written):                   gcc Junk.c -o Junk.bin -msse4 -lm
// Once the AVX function below is uncommented:      gcc Junk.c -o Junk.bin -mavx -lm

__m128 _mm128_exp_ps(__m128 invec) {
  float *element = (float *)&invec;  
  return _mm_setr_ps(
    expf(element[0]),
    expf(element[1]),
    expf(element[2]),
    expf(element[3])
    );
}
/*
__m256 _mm256_exp_ps(__m256 invec) {
  float *element = (float *)&invec;
  return _mm256_setr_ps(
    expf(element[0]),
    expf(element[1]),
    expf(element[2]),
    expf(element[3]),
    expf(element[4]),
    expf(element[5]),
    expf(element[6]),
    expf(element[7])
    );
}
*/
int main()
{
  __m128 vec1, vec2;
  vec1 = _mm_setr_ps( 1.0, 1.1, 1.2, 1.3);
  vec2 = _mm128_exp_ps(vec1);
  float *element = (float *)&vec2;
  int i;
  for (i=0; i<4; i++) {
      printf("%f %f\n", element[i], expf(1.0f + i/10.0f));
  }

  return 0;
}

EDIT: After comments by Peter Cordes about possible undefined behaviour when pointing a float pointer at an __m128 or __m256 variable, I thought I'd add a suggestion for maximum safety and portability, taken from the links he provided. I don't know for sure that the above code has a problem due to alignment issues, but it appears that the more correct way to do this is to replace the line

  float *element = (float *)&invec;

with

  float element[4];
  _mm_storeu_ps(element, invec);

and

  float element[8];
  _mm256_storeu_ps(element, invec);

for the SSE and AVX functions respectively.

  • float *element = (float *)&invec; is strict-aliasing UB, I think. Possibly ok in GNU C, where __m128 is typedef float __m128 __attribute__((vector_size(16),may_alias)), since it's also a float pointer, but I'm not at all sure. Also, if the compiler doesn't call a vectorized exp function, you actually want it to spill to a temporary array and only reload the one float element, not tempt it into reloading the whole vector and shuffling between every call. The Q&A "print a __m128i variable" shows how to access all elements of a vector portably. Commented Feb 4, 2023 at 0:32
  • Anyway yes this works, but a manually vectorized exp function is not that hard; many implementations are floating around online with various speed vs. precision tradeoffs. (Especially if you don't need to handle NaNs, or maybe even ignoring subnormals). The Q&A "Fastest Implementation of Exponential Function Using AVX" has some good ones that are fast, and e.g. wim's answer has relative error of about +-4e-8. AVX-512 getexpps / getmantps are quite useful for implementing exp/log. Commented Feb 4, 2023 at 0:49
  • I did wonder about this when I wrote it, but I thought the invec variable would be aligned adequately. I don't really want to depend on alignas(). I'll add an edit with the suggestions from the links on how this should be done in the safest most portable way. Commented Feb 4, 2023 at 11:34
  • I said strict aliasing UB, not alignment. Those are two separate things. In your version using an array, yes, alignment becomes relevant because alignof(float) is less than alignof(__m128). So yes, an unaligned store or alignas(16) float element[4]; and _mm_store_ps. Some compilers will choose to align the array on their own to avoid possible cache-line splits for the store, not that it even matters since store-forwarding still works. Commented Feb 4, 2023 at 16:18
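To give an idea of what the "manually vectorized exp" mentioned in the comments can look like, here is a rough sketch. It is not any of the implementations from the linked answers: it is a plain range reduction followed by a degree-6 Taylor polynomial, written by way of illustration. The name exp4_ps_sketch is hypothetical. It is written for 4 lanes with SSE2 only so that it builds on any x86-64 compiler without extra flags; an 8-lane version would need the 256-bit integer operations from AVX2. It assumes the default round-to-nearest mode and makes no attempt to handle NaN, infinities, overflow, or subnormal results.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: approximate e^x for 4 floats at once.
   Assumes moderate inputs; NaN, infinities, overflow, and subnormal
   results are NOT handled. */
static inline __m128 exp4_ps_sketch(__m128 x)
{
    const __m128 log2e = _mm_set1_ps(1.44269504f);   /* 1 / ln(2) */
    const __m128 ln2   = _mm_set1_ps(0.693147181f);  /* ln(2)     */

    /* Range reduction: n = round(x / ln 2), so x = n*ln2 + r
       with |r| of roughly at most ln(2)/2. */
    __m128i n  = _mm_cvtps_epi32(_mm_mul_ps(x, log2e));
    __m128  nf = _mm_cvtepi32_ps(n);
    __m128  r  = _mm_sub_ps(x, _mm_mul_ps(nf, ln2));

    /* e^r by a degree-6 Taylor polynomial, evaluated with Horner's rule. */
    __m128 p = _mm_set1_ps(1.0f / 720.0f);
    p = _mm_add_ps(_mm_mul_ps(p, r), _mm_set1_ps(1.0f / 120.0f));
    p = _mm_add_ps(_mm_mul_ps(p, r), _mm_set1_ps(1.0f / 24.0f));
    p = _mm_add_ps(_mm_mul_ps(p, r), _mm_set1_ps(1.0f / 6.0f));
    p = _mm_add_ps(_mm_mul_ps(p, r), _mm_set1_ps(0.5f));
    p = _mm_add_ps(_mm_mul_ps(p, r), _mm_set1_ps(1.0f));
    p = _mm_add_ps(_mm_mul_ps(p, r), _mm_set1_ps(1.0f));

    /* Scale by 2^n by adding n to the exponent field of p. */
    __m128i bits = _mm_add_epi32(_mm_castps_si128(p), _mm_slli_epi32(n, 23));
    return _mm_castsi128_ps(bits);
}
```

The polynomial coefficients here are just the Taylor terms; the implementations referred to in the comments use tuned coefficients and deal with the edge cases this sketch ignores, so treat it as a starting point for understanding the structure rather than something to ship.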
