Finally, or perhaps at long last, OpenCV 3.0 RC has been released.
Quote: The new acceleration layer, OpenCV HAL, will help to accelerate OpenCV on various platforms. It will grow substantially during OpenCV 3.x lifetime; now it can be considered as technology preview thing. Yet, it can be already useful because of the so-called “universal intrinsics” that will let you to write a code that is optimized for both SSE and NEON.
So HAL has arrived. The ChangeLog says:
Quote: Preliminary version of OpenCV HAL, low-level acceleration API beneath OpenCV, has been introduced. Currently it includes just a few math functions, but will grow soon. It also includes so-called "universal intrinsics", inspired by NEON=>SSE conversion header by Victoria Zhislina from Intel: https://software.intel.com/en-us/blogs/2012/12/12/from-arm-neon-to-intel-mmxsse-automatic-porting-solution-tips-and-tricks. The idea is that one can use a single SIMD code branch that will compile to either SSE or NEON instructions depending on the target platform. For example,

    // a, b and c are floating-point arrays
    for( int i = 0; i < n; i+=4 )
        v_store(c + i, v_load(a+i) + v_load(b+i));

will be expanded to either

    for( int i = 0; i < n; i+=4 )
        _mm_storeu_ps(c + i, _mm_add_ps(_mm_loadu_ps(a+i), _mm_loadu_ps(b+i)));

or

    for( int i = 0; i < n; i+=4 )
        vst1q_f32(c + i, vaddq_f32(vld1q_f32(a+i), vld1q_f32(b+i)));

Using such intrinsics one can write accelerated code, debug it on desktop and then run it without any changes on ARM and get reasonable performance.
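To get a feel for how one source line can "expand" to either SSE or NEON, here is a minimal self-contained sketch of the idea. It is not OpenCV's actual HAL code: the `sketch` namespace, the `add_arrays` helper and the scalar fallback are my own assumptions for illustration, and the wrapper type name `v_float32x4` is assumed as well, since the quoted ChangeLog only shows `v_load`, `v_store` and `+`. The underlying SSE and NEON intrinsics are the ones named in the quote.

```cpp
#include <cassert>

// Pick a backend at compile time, the same way a "universal intrinsics"
// header would. SKETCH_SSE / SKETCH_NEON are hypothetical macro names.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
  #include <xmmintrin.h>
  #define SKETCH_SSE 1
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
  #include <arm_neon.h>
  #define SKETCH_NEON 1
#endif

namespace sketch {  // hypothetical namespace, not part of OpenCV

#if defined(SKETCH_SSE)
// SSE backend: four packed floats in an __m128 register.
struct v_float32x4 { __m128 val; };
inline v_float32x4 v_load(const float* p) { return { _mm_loadu_ps(p) }; }
inline void v_store(float* p, const v_float32x4& a) { _mm_storeu_ps(p, a.val); }
inline v_float32x4 operator+(const v_float32x4& a, const v_float32x4& b)
{ return { _mm_add_ps(a.val, b.val) }; }

#elif defined(SKETCH_NEON)
// NEON backend: four packed floats in a float32x4_t register.
struct v_float32x4 { float32x4_t val; };
inline v_float32x4 v_load(const float* p) { return { vld1q_f32(p) }; }
inline void v_store(float* p, const v_float32x4& a) { vst1q_f32(p, a.val); }
inline v_float32x4 operator+(const v_float32x4& a, const v_float32x4& b)
{ return { vaddq_f32(a.val, b.val) }; }

#else
// Scalar fallback (my addition) so the sketch builds on any target.
struct v_float32x4 { float val[4]; };
inline v_float32x4 v_load(const float* p)
{ v_float32x4 r; for (int k = 0; k < 4; ++k) r.val[k] = p[k]; return r; }
inline void v_store(float* p, const v_float32x4& a)
{ for (int k = 0; k < 4; ++k) p[k] = a.val[k]; }
inline v_float32x4 operator+(const v_float32x4& a, const v_float32x4& b)
{ v_float32x4 r; for (int k = 0; k < 4; ++k) r.val[k] = a.val[k] + b.val[k]; return r; }
#endif

// c[i] = a[i] + b[i] for i in [0, n). The vector loop is the single
// "SIMD code branch" from the quote; the scalar tail handles the
// elements left over when n is not a multiple of 4.
inline void add_arrays(const float* a, const float* b, float* c, int n)
{
    assert(n >= 0);
    int i = 0;
    for (; i + 4 <= n; i += 4)
        v_store(c + i, v_load(a + i) + v_load(b + i));
    for (; i < n; ++i)
        c[i] = a[i] + b[i];
}

}  // namespace sketch
```

Calling `sketch::add_arrays(a, b, c, n)` is the same on x86 and ARM; only the thin wrappers differ per platform. Note that the loop in the quote steps by 4 without a tail, so it presumably assumes n is a multiple of 4, which is why the sketch adds the scalar tail loop.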