Using -march also gives you more options for using third-party closed-source code. You should be able to link -mcpu=cortex-r5 code with -march=armv7-r code; that is fine in one direction only, though, so the tools may complain.
What are the differences and tradeoffs between -march=haswell, -march=core-avx2, and -mavx2 for compiling AVX2 intrinsics? I know that -mavx2 is a single flag, while -march=haswell and -march=core-avx2 are architectures that each translate to a bunch of flags, so -mavx2 is a subset of the other two. But beyond that, how do I choose the right one for my application?
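One way to see what each option actually turns on is to ask the compiler through its predefined macros. The sketch below is only an illustration: the function names (compiled_simd_level, compiled_with_fma, print_build_report) are made up for this example, and it assumes GCC or Clang, which define __AVX2__, __AVX__, __SSE2__, and __FMA__ when the matching extension is enabled.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: names the widest x86 SIMD extension that was
 * enabled at compile time, judged from GCC/Clang predefined macros. */
const char *compiled_simd_level(void) {
#if defined(__AVX512F__)
    return "avx512f";
#elif defined(__AVX2__)
    return "avx2";
#elif defined(__AVX__)
    return "avx";
#elif defined(__SSE2__)
    return "sse2";
#else
    return "baseline";
#endif
}

/* Hypothetical helper: 1 if FMA was enabled at compile time, else 0. */
int compiled_with_fma(void) {
#ifdef __FMA__
    return 1;
#else
    return 0;
#endif
}

/* Prints a one-line summary of the compile-time settings. */
void print_build_report(void) {
    const char *level = compiled_simd_level();
    assert(level != NULL);
    printf("SIMD level: %s, FMA: %d\n", level, compiled_with_fma());
}
```

Calling print_build_report() from main and building the same file once with -mavx2 and once with -march=haswell should show the practical difference: both should report avx2, but only the -march build should also report FMA, because -march enables the whole feature set of that CPU (and sets -mtune to match), while -mavx2 enables AVX2 and the extensions it implies.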
As I understand it, -march=native will detect the ISA and extensions to use from cpuid (which includes model, family, and stepping information). -march=xxx will use a baseline ISA and a baseline set of extensions. There are a lot of possible combinations of extensions, so only the most relevant were chosen (e.g. skylake-avx512 was added to reflect an important extension found on some Skylake CPUs). -march ...
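The same kind of CPU query can be made from your own code at run time. This is a minimal sketch, not a description of how the compiler driver implements -march=native: it assumes GCC or Clang on an x86 target, where the __builtin_cpu_init and __builtin_cpu_supports builtins are available, and the function names host_has_avx2 and print_host_report are made up for the example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if the running CPU reports AVX2,
 * 0 if it does not, and -1 if the check is unavailable
 * (non-x86 target, or a compiler without these builtins). */
int host_has_avx2(void) {
#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();  /* populate the CPU feature data first */
    return __builtin_cpu_supports("avx2") ? 1 : 0;
#else
    return -1;
#endif
}

/* Prints the result of the run-time check. */
void print_host_report(void) {
    int r = host_has_avx2();
    assert(r >= -1 && r <= 1);
    if (r < 0)
        printf("AVX2 check not available on this target\n");
    else
        printf("Host CPU AVX2 support: %s\n", r ? "yes" : "no");
}
```

A check like this is useful when the binary is built for a baseline -march and has to decide at run time whether an AVX2 code path is safe to call.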
An internet search for "-march=armv8.2-a+i8mm" turns up almost nothing helpful. Either build_aar.sh is asking for an arch that doesn't make sense, or I need to plug in a version of clang that supports that arch.
-march: generate instructions for a specific machine type. Defaults to x86-64-v3 on AMD64 and armv8-a on AArch64. Use -march=compatibility for best compatibility, or -march=native for best performance if a native executable is deployed on the same machine or on a machine with the same CPU features. To list all available machine types, use ...
I'm compiling my C++ app using GCC 4.3. Instead of manually selecting the optimization flags I'm using -march=native, which in theory should add all optimization flags applicable to the hardware I'm
On x86 processors, just use -march=native. GCC will handle the rest by setting arch and tune to the same value. ARM is trickier, since GCC sometimes segfaults when using -march=native. You should also use a modern GCC, or maybe Clang. Clang generates better code than GCC for some SIMD source code. You will need to benchmark to determine which performs best for your code.
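For the benchmarking step, a tiny timing harness is often enough to compare builds. The sketch below is an assumption-laden illustration: saxpy stands in for your own hot loop, time_saxpy is a made-up helper, and clock() measures CPU time at coarse resolution, so use large inputs and many repetitions.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Stand-in kernel: y[i] += a * x[i], a loop compilers can usually vectorize. */
void saxpy(float a, const float *restrict x, float *restrict y, size_t n) {
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}

/* Hypothetical helper: runs saxpy reps times and returns elapsed CPU seconds. */
double time_saxpy(const float *restrict x, float *restrict y,
                  size_t n, int reps) {
    assert(reps > 0);
    volatile float sink = 0.0f;  /* keeps the result observably used */
    clock_t start = clock();
    for (int r = 0; r < reps; ++r) {
        saxpy(1.0f, x, y, n);
        sink = y[0];
    }
    clock_t end = clock();
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Build the same file with each compiler and flag set you want to compare (for example GCC and Clang, each with -O2 -march=native), run each binary several times, and compare the timings on the machine you actually deploy to.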