Glow, Part 2 - Vengineerの妄想


Following on from Glow, Part 1, I looked into a number of other things.

I am keeping a record here of what I tweeted on Twitter.

　・The name Glow is an abbreviation for Graph-Lowering.

　・Paper

　・Related projects: TensorFlow XLA, TVM/NNVM, DLVM, Tensor Comprehensions and nGraph
　　For what it's worth, I have been following all of them. DLVM is apparently being dropped in favor of a move to Swift for TensorFlow.

　・Supported backends: Interpreter, OpenCL, CPU

　・Example in C++ code (LeNet MNIST)
　　Roughly: you build the model (graph construction), and then EE.compile(CompilationMode::Infer, F); generates the machine code.

　・Supported model formats for import: Caffe2 and ONNX

Since ONNX is supported, a model built in any framework is OK for inference as long as it can be exported to ONNX!
So Glow can be used as the runtime of an inference engine.

　・The Interpreter executes sequentially (the code for each layer is hard-coded).
　　The CPU backend generates code with LLVM.
　　OpenCL executes sequentially (the kernels are hard-coded in a file named kernel.cl).
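To make "sequential execution with hard-coded per-layer code" concrete, here is a minimal sketch of an interpreter-style loop. It is not Glow's code: the `Op` kinds, the `Instr` struct and the `interpret` function are hypothetical names invented for illustration. The point is only that each instruction is dispatched, in order, to a kernel that is written by hand rather than generated.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical instruction kinds for illustration; not Glow's instruction set.
enum class Op { Relu, AddConst, Scale };

struct Instr {
  Op op;
  float arg; // ignored by Relu
};

// Walk the instruction list in order and dispatch each instruction to a
// hand-written kernel. No code is generated at any point.
std::vector<float> interpret(const std::vector<Instr> &program,
                             std::vector<float> data) {
  for (const Instr &instr : program) {
    switch (instr.op) {
    case Op::Relu:
      for (float &v : data)
        v = std::max(v, 0.0f);
      break;
    case Op::AddConst:
      for (float &v : data)
        v += instr.arg;
      break;
    case Op::Scale:
      for (float &v : data)
        v *= instr.arg;
      break;
    }
  }
  return data;
}
```

For example, running the program AddConst(-1), Relu, Scale(2) on the input {0.5, 3.0} yields {0.0, 4.0}. A code-generating backend would instead emit specialized machine code for the whole program once and then run that.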

　・There are two kinds of optimization.
　　1) The high-level intermediate representation allows the optimizer to perform
　　　　domain-specific optimizations.

　　　　a) Skip unnecessary transformation operations.
　　　　b) Fuse Conv layers with BN (batch normalization) layers.
　　　　After that, CPU optimizations are performed.

　　2) The lower-level instruction-based address-only intermediate representation
　　　　allows the compiler to perform memory-related optimizations.
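The Conv/BN fusion listed above can be illustrated with a little arithmetic. At inference time batch normalization is a fixed per-channel affine transform, y = gamma * (x - mean) / sqrt(var + eps) + beta, so it can be folded into the weights and bias of the preceding convolution. The sketch below shows this for a single output channel; the struct and function names are hypothetical and this is not Glow's implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-channel batch normalization parameters (inference time, so all fixed).
struct BatchNormParams {
  float gamma, beta, mean, var, eps;
};

// One output channel of a convolution: a flattened filter plus a bias.
struct ConvChannel {
  std::vector<float> weights;
  float bias;
};

// Convolution output for one input window (dot product plus bias).
float convOutput(const ConvChannel &c, const std::vector<float> &window) {
  assert(window.size() == c.weights.size());
  float acc = c.bias;
  for (std::size_t i = 0; i < window.size(); ++i)
    acc += c.weights[i] * window[i];
  return acc;
}

// Reference batch normalization applied to a single value.
float batchNorm(float x, const BatchNormParams &bn) {
  return bn.gamma * (x - bn.mean) / std::sqrt(bn.var + bn.eps) + bn.beta;
}

// Fold BN into the convolution: w' = w * s, b' = (b - mean) * s + beta,
// where s = gamma / sqrt(var + eps).
ConvChannel foldBatchNorm(const ConvChannel &c, const BatchNormParams &bn) {
  const float s = bn.gamma / std::sqrt(bn.var + bn.eps);
  ConvChannel fused;
  fused.weights.reserve(c.weights.size());
  for (float w : c.weights)
    fused.weights.push_back(w * s);
  fused.bias = (c.bias - bn.mean) * s + bn.beta;
  return fused;
}
```

After folding, `convOutput(foldBatchNorm(c, bn), window)` matches `batchNorm(convOutput(c, window), bn)` up to floating-point rounding, so the BN layer disappears from the graph at no runtime cost.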

　・Optimization code:
　　Graph optimization => glow::optimize (code)
　　Lowering => glow::lower (code)
　　IR optimization => glow::optimize (code)

　・Lower performs memory-related optimizations:
　　　instruction scheduling
　　　static memory allocation
　　　copy elimination

　　　It performs "node lowering".
　　　In this phase, the compiler converts high-level operator nodes into low-level linear algebra operator nodes.
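As a rough picture of what node lowering means, the sketch below rewrites a toy graph so that each high-level FullyConnected node becomes a matrix multiplication followed by a broadcast bias add. The `Kind` enum, the `Node` struct and `lowerNodes` are hypothetical simplifications (a flat node list with no edges), not glow::lower itself.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical node kinds; a real graph would also carry operands and types.
enum class Kind { FullyConnected, MatMul, BatchedAdd, Relu };

struct Node {
  Kind kind;
  std::string name;
};

// Replace every high-level FullyConnected node with two lower-level linear
// algebra nodes (MatMul, then BatchedAdd for the bias). Other nodes are
// copied through unchanged.
std::vector<Node> lowerNodes(const std::vector<Node> &graph) {
  std::vector<Node> lowered;
  for (const Node &n : graph) {
    if (n.kind == Kind::FullyConnected) {
      lowered.push_back({Kind::MatMul, n.name + ".matmul"});
      lowered.push_back({Kind::BatchedAdd, n.name + ".bias"});
    } else {
      lowered.push_back(n);
    }
  }
  return lowered;
}
```

The benefit of this design is that a backend only has to implement the small set of low-level nodes; every high-level operator that can be expressed in terms of them comes for free.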

　・Performance

Glow is up to 2.5x faster than TensorFlow.
This is due to the fact that TensorFlow calls into Eigen which implements convolution
using the classic im2col followed by matrix multiplication,
while Glow compiles direct convolution and thus avoids im2col overhead.
(The difference comes from the convolution operations.)
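To show what "im2col followed by matrix multiplication" versus "direct convolution" means, here is a small single-channel sketch (stride 1, no padding). It is a simplified illustration with hypothetical function names, not the code of Eigen or Glow. Both paths compute the same result; the im2col path first copies every kernel-sized window into a temporary matrix, which is the overhead the quote refers to.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

// Direct convolution: accumulate straight from the input, no temporary buffer.
Matrix convDirect(const Matrix &in, const Matrix &k) {
  assert(!in.empty() && !k.empty());
  assert(in.size() >= k.size() && in[0].size() >= k[0].size());
  const std::size_t kh = k.size(), kw = k[0].size();
  const std::size_t oh = in.size() - kh + 1, ow = in[0].size() - kw + 1;
  Matrix out(oh, std::vector<float>(ow, 0.0f));
  for (std::size_t y = 0; y < oh; ++y)
    for (std::size_t x = 0; x < ow; ++x)
      for (std::size_t i = 0; i < kh; ++i)
        for (std::size_t j = 0; j < kw; ++j)
          out[y][x] += in[y + i][x + j] * k[i][j];
  return out;
}

// im2col: copy each kh x kw input window into one row of a temporary matrix.
Matrix im2col(const Matrix &in, std::size_t kh, std::size_t kw) {
  const std::size_t oh = in.size() - kh + 1, ow = in[0].size() - kw + 1;
  Matrix cols;
  cols.reserve(oh * ow);
  for (std::size_t y = 0; y < oh; ++y)
    for (std::size_t x = 0; x < ow; ++x) {
      std::vector<float> row;
      row.reserve(kh * kw);
      for (std::size_t i = 0; i < kh; ++i)
        for (std::size_t j = 0; j < kw; ++j)
          row.push_back(in[y + i][x + j]);
      cols.push_back(row);
    }
  return cols;
}

// Convolution as (im2col matrix) x (flattened kernel), reshaped to 2-D.
Matrix convIm2col(const Matrix &in, const Matrix &k) {
  const std::size_t kh = k.size(), kw = k[0].size();
  const std::size_t ow = in[0].size() - kw + 1;
  std::vector<float> flatKernel;
  for (const auto &row : k)
    flatKernel.insert(flatKernel.end(), row.begin(), row.end());
  const Matrix cols = im2col(in, kh, kw);
  Matrix out(cols.size() / ow, std::vector<float>(ow, 0.0f));
  for (std::size_t r = 0; r < cols.size(); ++r) {
    float dot = 0.0f;
    for (std::size_t c = 0; c < flatKernel.size(); ++c)
      dot += cols[r][c] * flatKernel[c];
    out[r / ow][r % ow] = dot;
  }
  return out;
}
```

For an output of oh x ow positions and a kh x kw kernel, the temporary matrix holds oh * ow * kh * kw values, so each input element is copied up to kh * kw times before any arithmetic happens.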

　・Quantization
　　Glow uses profile-guided quantization, observing execution
　　during inference to estimate the possible numeric range for each stage of the neural network.
　　Training-based quantization is considered future work.
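The idea of profile-guided quantization can be sketched as follows: record the minimum and maximum value seen at one stage of the network during profiling runs, then derive int8 quantization parameters from that range. The code below assumes a common affine scheme (real value = scale * (quantized - offset)); the names `RangeProfile`, `QuantParams`, `chooseParams`, `quantize` and `dequantize` are hypothetical, and the exact formulas Glow uses may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>
#include <vector>

// Running min/max of the values observed at one stage of the network.
struct RangeProfile {
  float min = std::numeric_limits<float>::max();
  float max = std::numeric_limits<float>::lowest();

  void observe(const std::vector<float> &activations) {
    for (float v : activations) {
      min = std::min(min, v);
      max = std::max(max, v);
    }
  }
};

// Affine quantization parameters: real = scale * (quantized - offset).
struct QuantParams {
  float scale;
  std::int32_t offset;
};

// Map the profiled range (widened to include 0) onto the int8 range [-128, 127].
QuantParams chooseParams(const RangeProfile &p) {
  const float lo = std::min(p.min, 0.0f);
  const float hi = std::max(p.max, 0.0f);
  assert(hi > lo);
  const float scale = (hi - lo) / 255.0f;
  const std::int32_t offset =
      static_cast<std::int32_t>(std::lround(-128.0f - lo / scale));
  return {scale, offset};
}

// Quantize one value, clamping anything outside the profiled range.
std::int8_t quantize(float v, const QuantParams &q) {
  const long r = std::lround(v / q.scale) + q.offset;
  return static_cast<std::int8_t>(std::max<long>(-128, std::min<long>(127, r)));
}

float dequantize(std::int8_t v, const QuantParams &q) {
  return q.scale * static_cast<float>(static_cast<std::int32_t>(v) - q.offset);
}
```

With a profiled range of [-1, 2], for instance, -1 maps to -128 and 2 maps to 127, and any value outside the profiled range is clamped. This is why the quality of the profiling inputs matters: a range that is too narrow clips real activations, while one that is too wide wastes precision.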