OpenCV 3.0 RC - Vengineer's Delusions (Preparation Period)


At long last, or perhaps I should say finally, OpenCV 3.0 RC has been released.

Quote:
The new acceleration layer, OpenCV HAL, will help to accelerate OpenCV on various platforms. It will grow substantially during OpenCV 3.x lifetime; now it can be considered as technology preview thing. Yet, it can be already useful because of the so-called “universal intrinsics” that will let you to write a code that is optimized for both SSE and NEON.

So HAL has made its debut. The ChangeLog says:

Quote:
Preliminary version of OpenCV HAL, low-level acceleration API beneath OpenCV, has been introduced. Currently it includes just a few math functions, but will grow soon. It also includes so-called "universal intrinsics", inspired by NEON=>SSE conversion header by Victoria Zhislina from Intel: https://software.intel.com/en-us/blogs/2012/12/12/from-arm-neon-to-intel-mmxsse-automatic-porting-solution-tips-and-tricks. The idea is that one can use a single SIMD code branch that will compile to either SSE or NEON instructions depending on the target platform. For example,
// a, b and c are floating-point arrays

    for( int i = 0; i < n; i+=4 )
        v_store(c + i, v_load(a+i) + v_load(b+i));

will be expanded to either

    for( int i = 0; i < n; i+=4 )
        _mm_storeu_ps(c + i, _mm_add_ps(_mm_loadu_ps(a+i), _mm_loadu_ps(b+i)));

or

    for( int i = 0; i < n; i+=4 )
        vst1q_f32(c + i, vaddq_f32(vld1q_f32(a+i), vld1q_f32(b+i)));

Using such intrinsics one can write accelerated code, debug it on desktop and then run it without any changes on ARM and get reasonable performance.

So it supports both x86 SIMD and ARM NEON. Wonderful.
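
To get a feel for how a single code branch like the one in the ChangeLog could map to either instruction set, here is a minimal sketch. It is not OpenCV's actual HAL source: the `v_float32x4` struct and the `v_load` / `v_store` / `operator+` wrappers below are my own simplified stand-ins that only borrow the names from the quote, and `add_arrays`, with its scalar tail loop for lengths that are not a multiple of 4, is a hypothetical helper added for illustration. It assumes a C++11 compiler and picks SSE, NEON, or a plain scalar fallback at compile time.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for "universal intrinsics" (NOT OpenCV's real HAL code).
// One backend is selected at compile time; the calling code stays the same.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
  #include <xmmintrin.h>
  // x86 backend: wraps a 128-bit SSE register holding 4 floats.
  struct v_float32x4 { __m128 val; };
  inline v_float32x4 v_load(const float* p) { return { _mm_loadu_ps(p) }; }
  inline void v_store(float* p, v_float32x4 v) { _mm_storeu_ps(p, v.val); }
  inline v_float32x4 operator+(v_float32x4 a, v_float32x4 b) {
      return { _mm_add_ps(a.val, b.val) };
  }
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
  #include <arm_neon.h>
  // ARM backend: wraps a 128-bit NEON register holding 4 floats.
  struct v_float32x4 { float32x4_t val; };
  inline v_float32x4 v_load(const float* p) { return { vld1q_f32(p) }; }
  inline void v_store(float* p, v_float32x4 v) { vst1q_f32(p, v.val); }
  inline v_float32x4 operator+(v_float32x4 a, v_float32x4 b) {
      return { vaddq_f32(a.val, b.val) };
  }
#else
  // Scalar fallback so the same source still builds on other targets.
  struct v_float32x4 { float val[4]; };
  inline v_float32x4 v_load(const float* p) {
      return { { p[0], p[1], p[2], p[3] } };
  }
  inline void v_store(float* p, v_float32x4 v) {
      for (int k = 0; k < 4; ++k) p[k] = v.val[k];
  }
  inline v_float32x4 operator+(v_float32x4 a, v_float32x4 b) {
      v_float32x4 r;
      for (int k = 0; k < 4; ++k) r.val[k] = a.val[k] + b.val[k];
      return r;
  }
#endif

// Hypothetical helper: c[i] = a[i] + b[i] for every i in [0, n).
void add_arrays(const float* a, const float* b, float* c, std::size_t n) {
    assert(n == 0 || (a && b && c));
    std::size_t i = 0;
    // Vector part: 4 floats per iteration, same loop body as in the ChangeLog.
    for (; i + 4 <= n; i += 4)
        v_store(c + i, v_load(a + i) + v_load(b + i));
    // Scalar tail: the 0 to 3 elements left over when n is not a multiple of 4.
    for (; i < n; ++i)
        c[i] = a[i] + b[i];
}
```

Calling `add_arrays(a, b, c, n)` on three float arrays of length `n` fills `c` with the element-wise sums. The loop in the ChangeLog steps by 4 without a tail, so it implicitly assumes `n` is a multiple of 4; the extra scalar loop here is just one simple way to lift that assumption.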