PlaidML, Part 1 - Vengineerの妄想

@Vengineerの戯言 : Twitter
Welcome to the world of SystemVerilog. It all began with the release of SystemC v0.9.

I have covered PlaidML on this blog several times.
The following is quoted from the PlaidML documentation.

    PlaidML Architecture Overview

      At a High Level PlaidML Consists of:

      A core that exposes a C and C++ API:
        A HAL API and a library of backends that implement it (OpenCL/LLVM/etc)
        A runtime which takes tile code, optimizes it based on parameters from the HAL, 
        and a Platform that schedules operations and memory layout based on the type of Platform (Local / Remote)

      Python bindings built on top of the C API
        An operations library which is a generic library of tile code
        An API that can be called directly or used to develop other frontends

      Frontend adapters that utilize the op library and the API to implement support for that frontend
        ONNX
        Keras
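
As a rough illustration of the "frontend adapter" layer in the quote, the sketch below selects PlaidML as the Keras backend through the `KERAS_BACKEND` environment variable. The helper names `use_plaidml_backend` and `build_mobilenet` are hypothetical names chosen for this example. It assumes the `plaidml-keras` and `keras` packages are installed, and it is not what plaidbench itself does internally.

```python
import os


def use_plaidml_backend():
    """Point Keras at PlaidML. This must run before `import keras`."""
    os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"


def build_mobilenet():
    """Hypothetical helper: build an untrained MobileNet on the PlaidML backend.

    Assumes plaidml-keras and keras are installed; nothing is imported
    until this function is actually called.
    """
    use_plaidml_backend()
    from keras.applications.mobilenet import MobileNet
    return MobileNet(weights=None)


if __name__ == "__main__":
    use_plaidml_backend()
    print(os.environ["KERAS_BACKEND"])
```

Setting the variable before the first `import keras` matters because Keras reads its backend choice once, at import time.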

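Following the installation steps provided by PlaidML, I tried the installation and the benchmark on Windows Subsystem for Linux,
and plaidbench keras mobilenet ran successfully.
As the device name "llvm_preview_cpu.0" in the log shows, it is working hard on the CPU.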

$ plaidbench keras mobilenet
/mnt/c/Users/haray/home/src/plaidml/local/lib/python2.7/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Running 1024 examples with mobilenet, batch size 1
INFO:plaidml:Opening device "llvm_preview_cpu.0"
Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.6/mobilenet_1_0_224_tf.h5
17211392/17225924 [============================>.] - ETA: 0sModel loaded.
Compiling network...
Warming up ...
Main timing
Example finished, elapsed: 6.58216619492 (compile), 197.281311035 (execution), 0.192657530308 (execution per example)
Correctness: PASS, max_error: 2.04187854251e-05, max_abs_error: 1.58697366714e-06, fail_ratio: 0.0
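
To make the timing line above easier to read, here is a small sketch that parses it and checks that "execution per example" is simply the execution time divided by the 1024 examples. The function name `parse_elapsed` is a hypothetical helper for this example, and treating the numbers as seconds is an assumption, since the log does not state the unit.

```python
import re

# Copied verbatim from the plaidbench output above.
RESULT_LINE = (
    "Example finished, elapsed: 6.58216619492 (compile), "
    "197.281311035 (execution), 0.192657530308 (execution per example)"
)
EXAMPLES = 1024  # from "Running 1024 examples with mobilenet, batch size 1"


def parse_elapsed(line):
    """Return {label: value} for every 'number (label)' pair in the line."""
    pairs = re.findall(r"([0-9.]+(?:e[+-]?\d+)?) \(([^)]+)\)", line)
    return {label: float(value) for value, label in pairs}


timings = parse_elapsed(RESULT_LINE)
per_example = timings["execution"] / EXAMPLES  # time per example (assumed seconds)
throughput = EXAMPLES / timings["execution"]   # examples per unit time

print("per example:", per_example)
print("throughput :", throughput)
```

The computed per-example time agrees with the value plaidbench reports, which works out to roughly five MobileNet inferences per second on this CPU backend.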

$ plaidbench --batch-size 16 keras --train mobilenet
/mnt/c/Users/haray/home/src/plaidml/local/lib/python2.7/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Running 1024 examples with mobilenet, batch size 16
Loading CIFAR data
Downloading data from http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170401792/170498071 [============================>.] - ETA: 0sINFO:plaidml:Opening device "llvm_preview_cpu.0"
Model loaded.
Compiling network...
Epoch 1/1
INFO:plaidml:Analyzing Ops: 663 of 2387 operations complete
INFO:plaidml:Analyzing Ops: 1290 of 2387 operations complete
INFO:plaidml:Analyzing Ops: 1596 of 2387 operations complete
INFO:plaidml:Analyzing Ops: 1978 of 2387 operations complete
Illegal typecast: float to <16 x float>
Set --print-stacktraces to see the entire traceback

2017/11/2 (Thu): PlaidML

2018/4/5 (Thu): PlaidML now uses LLVM

2018/5/16 (Wed): PlaidML supports ONNX import

In May, the PlaidML blog was updated three days in a row.

5/17 : Fully Automatic Differentiation for Tensor Expressions

5/18 : Automatic Kernel Generation in PlaidML

5/19 : Tensor Compilers: Comparing PlaidML, Tensor Comprehensions, and TVM

The last entry in particular benchmarks execution time and compile time against Tensor Comprehensions and TVM.

Like TVM, PlaidML supports OpenCL.

PlaidML's distinguishing points are that:

1) its Autotuner is fast (under one second per kernel), and
2) it supports Automatic Operation Gradients.
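
For readers unfamiliar with automatic gradients, the following is a minimal, generic sketch of reverse-mode automatic differentiation in plain Python. It only illustrates the general idea of deriving gradients mechanically from the forward computation. The `Var` class and `backward` function are hypothetical names for this example and have nothing to do with PlaidML's actual implementation or its Tile language.

```python
class Var:
    """A scalar value that remembers how it was computed."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        # d(a + b)/da = 1, d(a + b)/db = 1
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        # d(a * b)/da = b, d(a * b)/db = a
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def backward(output):
    """Accumulate d(output)/d(node) into node.grad for every upstream node."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)  # parents are appended before their children

    visit(output)
    output.grad = 1.0
    for node in reversed(order):  # process children before their parents
        for parent, local in node.parents:
            parent.grad += node.grad * local


x, w, b = Var(3.0), Var(2.0), Var(1.0)
y = w * x + b
backward(y)
print(w.grad, x.grad, b.grad)  # dy/dw = x, dy/dx = w, dy/db = 1
```

The user writes only the forward expression `w * x + b`; the gradients follow from the recorded local derivatives and the chain rule.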