AWS Neuron SDK: What Are Neuron Compiler and Neuron Runtime?

@Vengineer's Ramblings : Twitter
Welcome to the world of SystemVerilog. It all started with the release of SystemC v0.9.

The documentation for the AWS Neuron SDK is on GitHub.

There are separate documents for TensorFlow, MXNet, and PyTorch.

Following each of these documents, it looks like you can get all the way to deployment with the flow below.

Choose ML Framework
Build Model
Train Model
Compile Model using AWS Neuron
Deploy Model on Inf1 and AWS Inferentia

Today I will look at the part of this flow that is common to all frameworks: "Compile Model using AWS Neuron".

The compiler is Neuron Compiler (neuron-cc). According to this document, Neuron Compiler reads TensorFlow, MXNet, PyTorch, and ONNX models and works as an Ahead-of-Time (AOT) compiler. Neuron Compiler converts FP32 in the model to BF16. In the end it outputs a NEFF file (Neuron Executable File Format), and this file is used by Neuron Runtime.
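To get a feel for what the FP32-to-BF16 conversion costs in precision, here is a minimal sketch. BF16 keeps the sign bit, the 8 exponent bits, and the top 7 mantissa bits of FP32, so it is simply the upper 16 bits of the FP32 encoding. The function names are mine, and plain truncation is an assumption: the document does not say which rounding mode Neuron Compiler actually uses.

```python
import struct


def fp32_to_bf16_bits(value: float) -> int:
    """Return the 16-bit BF16 pattern obtained by truncating the FP32 encoding.

    Assumption: simple truncation. The rounding used by neuron-cc is not
    stated in the source and may differ.
    """
    (bits32,) = struct.unpack("<I", struct.pack("<f", value))
    return bits32 >> 16  # keep sign, 8 exponent bits, top 7 mantissa bits


def bf16_bits_to_fp32(bits16: int) -> float:
    """Widen a BF16 bit pattern back to a float by zero-filling the low 16 bits."""
    (value,) = struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))
    return value


def round_trip(value: float) -> float:
    """Convert FP32 -> BF16 -> FP32 to show what precision survives."""
    return bf16_bits_to_fp32(fp32_to_bf16_bits(value))


if __name__ == "__main__":
    for x in (1.0, 3.14159, 0.001):
        print(x, "->", round_trip(x))
```

The exponent range is unchanged, so large and small magnitudes survive, but only about two to three decimal digits of the mantissa do.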

Looking through the Command line reference page for Neuron Compiler, though, the supported frameworks are listed as TensorFlow, MXNet, and ONNX. For TensorFlow, it appears to support both Frozen GraphDef and SavedModel.

Neuron Runtime executes the code in the NEFF file on the Inferentia chips. It can also assign different models to different NeuronCore Groups and run them.

A NeuronCore Group is defined as follows:

A NeuronCore Group is a set of NeuronCores that are used to load and run a compiled model. At any point in time, only one model will be running in a NeuronCore Group. Within a NeuronCore Group, loaded models can be dynamically started and stopped, allowing for dynamic context switching from one model to another.

As this says, it seems to be a mechanism for running a single model on multiple NeuronCores. Connecting multiple NeuronCores so that they work together is what NeuronCore Pipeline does.
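The quoted definition can be pictured with a small toy model. This is only my own illustration of the semantics described in the quote (a set of cores, several models loaded, at most one running, start/stop as a context switch). The class and method names are hypothetical and are not the Neuron Runtime API.

```python
class NeuronCoreGroup:
    """Toy model of a NeuronCore Group (hypothetical, not the real runtime API)."""

    def __init__(self, core_ids):
        self.core_ids = tuple(core_ids)  # the NeuronCores that form this group
        self.loaded = set()              # models loaded into the group
        self.running = None              # at most one model runs at any time

    def load(self, model_name):
        """Load a compiled model into the group without starting it."""
        self.loaded.add(model_name)

    def start(self, model_name):
        """Start a loaded model; any model already running is switched out."""
        if model_name not in self.loaded:
            raise KeyError(f"{model_name} is not loaded in this group")
        self.running = model_name

    def stop(self):
        """Stop whatever model is currently running."""
        self.running = None


if __name__ == "__main__":
    group = NeuronCoreGroup(core_ids=[0, 1])
    group.load("model_a")
    group.load("model_b")
    group.start("model_a")
    group.start("model_b")  # context switch from model_a to model_b
    print(group.core_ids, group.running)
```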

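Each chip has four cores, so the idea is to connect the cores and process a model as a pipeline. I don't know why, but NeuronCore Pipeline Mode does not seem to be supported on inf1.xlarge and inf1.2xlarge. This is shown on this slide from the video.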

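To enable NeuronCore Pipeline, you need to pass --num-neuroncores with the number of cores as a parameter when running Neuron Compiler. The default is 1, which means no pipeline. So, since even a single chip has four cores, specifying --num-neuroncores 4 should give you a four-core pipeline, and yet for some reason...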

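...inf1.xlarge and inf1.2xlarge supposedly do not support Pipeline Mode. Sigh. But wait: when I checked the Japanese site, it says something different. The slide says "NeuronCore Pipeline Mode", but the site says "Inferentia chip-to-chip interconnect" (Inferentia チップ間相互接続). So what it really means is that Pipeline Mode is supported, but because these instances have only one chip, the chip-to-chip interconnect is not. In other words, it looks like Pipeline Mode can be used on inf1.xlarge and inf1.2xlarge after all.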
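Putting the conclusion into practice, a four-core pipeline on a single chip would be requested at compile time. The sketch below only assembles the command line and does not run it. The --num-neuroncores flag comes from the text above. The `compile` subcommand and the `--framework` and `--output` flags are my recollection of the Command line reference and should be checked against it. The helper name and the file names are hypothetical.

```python
from typing import List

# Frameworks listed on the Command line reference page, per the text above.
SUPPORTED_FRAMEWORKS = ("TENSORFLOW", "MXNET", "ONNX")


def build_neuron_cc_command(
    model_path: str,
    framework: str,
    num_neuroncores: int = 1,
    output: str = "model.neff",
) -> List[str]:
    """Assemble (but do not run) a neuron-cc command line.

    Assumption: the `compile` subcommand and the --framework / --output flags
    follow my reading of the Command line reference; verify before use.
    """
    if framework not in SUPPORTED_FRAMEWORKS:
        raise ValueError(f"unsupported framework: {framework}")
    if num_neuroncores < 1:
        raise ValueError("num_neuroncores must be at least 1")
    return [
        "neuron-cc", "compile", model_path,
        "--framework", framework,
        "--num-neuroncores", str(num_neuroncores),  # 1 (the default) means no pipeline
        "--output", output,
    ]


if __name__ == "__main__":
    # Hypothetical model file; 4 cores = one full Inferentia chip.
    print(" ".join(build_neuron_cc_command("model.pb", "TENSORFLOW", 4)))
```

On a machine with the Neuron SDK installed, the resulting list could be passed to `subprocess.run`.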