AWS Neuron SDK : TensorFlow - Neuron (neuron_op)

@Vengineer's Ramblings : Twitter
Welcome to the world of SystemVerilog. It all began with the release of SystemC v0.9.

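Continuing from yesterday's post.

tensorflow.neuron.saved_model.compile

Inside this function, the TensorFlow model is compiled with neuron-cc, the result is read back, and the bytes that were read are stored in the Node attribute named 'executable'.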

workdir_path = subgraph_compilers[node.name].workdir_path
executable_path = os.path.join(workdir_path, _neuron_executable_name)
with open(executable_path, 'rb') as f:
    node.attr['executable'].s = f.read()
if node.name in num_cores_tuple_map:
    ...  # the excerpt ends here
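To make the step above concrete, here is a minimal sketch of "read the compiler output and attach it to a node". It uses only the standard library. `ToyNode`, `attach_executable`, the file name `toy_executable.bin`, and the byte string are all hypothetical stand-ins, not part of the Neuron SDK or TensorFlow.

```python
import os
import tempfile
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical stand-in for a graph node with named attributes."""
    name: str
    op: str
    attr: dict = field(default_factory=dict)


def attach_executable(node, workdir_path, executable_name):
    """Read the compiler's output file and store its bytes on the node."""
    executable_path = os.path.join(workdir_path, executable_name)
    with open(executable_path, "rb") as f:
        node.attr["executable"] = f.read()
    return node


with tempfile.TemporaryDirectory() as workdir:
    # Pretend a compiler wrote its artifact here (the bytes are made up).
    with open(os.path.join(workdir, "toy_executable.bin"), "wb") as f:
        f.write(b"\x00compiled-bytes")
    node = attach_executable(
        ToyNode("neuron_op_0", "NeuronOp"), workdir, "toy_executable.bin"
    )
```

The point is only that the compiled artifact travels inside the graph as an opaque byte-string attribute, so no separate file has to be shipped alongside the model.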

The Op name of the Node that receives this attribute is neuron_op.

neuron_op looks like this.

@tf_export('neuron_op')
def neuron_op(input_tensors, graph_def, input_names, input_shapes, output_names,
              output_dtypes, output_shapes, executable="", input_batch_axis=[],
              output_batch_axis=[], model_config=[], name=None):
    r"""TODO: add doc.

    Args:
      input_tensors: A list of `Tensor` objects.
      graph_def: A `string`.
      input_names: A list of `strings`.
      input_shapes: A list of shapes (each a `tf.TensorShape` or list of `ints`).
      output_names: A list of `strings`.
      output_dtypes: A list of `tf.DTypes`.
      output_shapes: A list of shapes (each a `tf.TensorShape` or list of `ints`).
      executable: An optional `string`. Defaults to `""`.
      input_batch_axis: An optional list of `ints`. Defaults to `[]`.
      output_batch_axis: An optional list of `ints`. Defaults to `[]`.
      model_config: An optional list of `ints`. Defaults to `[]`.
      name: A name for the operation (optional).

    Returns:
      A list of `Tensor` objects of type `output_dtypes`.
    """

Notice the executable argument. That is where the compiled code goes.

In practice, the NeuronOp is executed through _pywrap_tensorflow.TFE_Py_FastPathExecute.

_result = _pywrap_tensorflow.TFE_Py_FastPathExecute(
    _ctx._context_handle, _ctx._thread_local_data.device_name, "NeuronOp",
    name, _ctx.post_execution_callbacks, input_tensors, "graph_def",
    graph_def, "input_names", input_names, "input_shapes", input_shapes,
    "output_names", output_names, "output_dtypes", output_dtypes,
    "output_shapes", output_shapes, "executable", executable,
    "input_batch_axis", input_batch_axis, "output_batch_axis",
    output_batch_axis, "model_config", model_config)
return _result
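The argument list in that call is flat: five leading fields, then the op input, then attribute names alternating with their values. The following sketch shows how such a tuple can be split back into its parts. `split_fastpath_args` is a hypothetical helper written for illustration, the layout is inferred only from the call shown here, and the sample values are made up.

```python
def split_fastpath_args(args, num_inputs):
    """Split a flat argument tuple into op name, inputs, and attributes.

    Assumed layout (read off the call above): five leading fields,
    then the op inputs, then alternating attribute names and values.
    """
    ctx_handle, device_name, op_name, name, callbacks = args[:5]
    inputs = list(args[5:5 + num_inputs])
    tail = args[5 + num_inputs:]
    if len(tail) % 2 != 0:
        raise ValueError("attributes must come in name/value pairs")
    attrs = dict(zip(tail[0::2], tail[1::2]))
    return op_name, inputs, attrs


flat = (
    None, "/device:CPU:0", "NeuronOp", None, [],  # five leading fields
    ["x_tensor"],                                 # one input: the list of input tensors
    "graph_def", b"",
    "input_names", ["x:0"],
    "executable", b"\x00compiled-bytes",
)
op_name, inputs, attrs = split_fastpath_args(flat, num_inputs=1)
```

Seen this way, executable is just one more named attribute handed to the op at execution time.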

TFE_Py_FastPathExecute simply forwards to TFE_Py_FastPathExecute_C, as shown below.

m.def("TFE_Py_FastPathExecute", [](const py::args args) {
  // TFE_Py_FastPathExecute requires error checking prior to returning.
  return tensorflow::pyo_or_throw(TFE_Py_FastPathExecute_C(args.ptr()));
});

TFE_Py_FastPathExecute_C runs the op with TFE_Execute.

Py_BEGIN_ALLOW_THREADS;
TFE_Execute(op, retvals.data(), &num_retvals, status);
Py_END_ALLOW_THREADS;

TFE_Execute is implemented as follows:

void TFE_Execute(TFE_Op* op, TFE_TensorHandle** retvals, int* num_retvals,
                 TF_Status* status) {
  absl::FixedArray<std::unique_ptr<AbstractTensorHandleInterface>> handles(
      *num_retvals);
  status->status = op->operation->Execute(&handles, num_retvals);
  if (!status->status.ok()) {
    return;
  }
  for (int i = 0; i < *num_retvals; ++i) {
    retvals[i] = new TFE_TensorHandle{std::move(handles[i])};
  }
}

That is, it calls Execute on op->operation. Execute in turn calls EagerExecute.

Status OperationInterface::Execute(
    absl::FixedArray<std::unique_ptr<AbstractTensorHandleInterface>>* retvals,
    int* num_retvals) {
  absl::FixedArray<tensorflow::TensorHandle*> handle_retvals(*num_retvals);
  TF_RETURN_IF_ERROR(
      EagerExecute(&operation_, handle_retvals.data(), num_retvals));
  for (int i = 0; i < *num_retvals; ++i) {
    retvals->at(i).reset(
        new tensorflow::TensorHandleInterface(handle_retvals[i]));
  }
  return Status::OK();
}

Inside EagerExecute, we find:

return EagerLocalExecute(op, retvals, num_retvals);

For NeuronOp, the Python side does its work in tensorflow_core/python/neuron/ops/gen_neuron_op.py. NeuronOp itself lives in tensorflow_core/python/neuron/python/ops/_neuron_op.so, so I cannot tell what it actually does. Since it is a TensorFlow OpKernel, I assume the work happens in its Compute method, but that is only a guess.

The same approach as TensorFlow XLA!

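Gathering the Ops that will run on the accelerator and packing them into a single Op is the same approach TensorFlow XLA takes. In other words, the TensorFlow model is compiled with neuron-cc, the compiled result is carried by a NeuronOp, and the work is handed to Inferentia at the moment that NeuronOp is executed.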
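The "pack many Ops into one" idea can be sketched in a few lines. This is a toy model under loud assumptions: the graph is a simple linear list of nodes, `ToyNode`, `toy_compile`, and `fuse_supported` are hypothetical names, and the "compiler" just joins op names into bytes. It is not how neuron-cc or XLA partition a real graph.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical stand-in for a graph node with named attributes."""
    name: str
    op: str
    attr: dict = field(default_factory=dict)


def toy_compile(nodes):
    """Hypothetical compiler: returns bytes describing the fused ops."""
    return "|".join(n.op for n in nodes).encode()


def fuse_supported(nodes, supported_ops, fused_op="NeuronOp"):
    """Replace each contiguous run of supported nodes with one fused node."""
    result, run = [], []

    def flush():
        # Turn the pending run (if any) into a single node that carries
        # the compiled bytes in its 'executable' attribute.
        if run:
            fused = ToyNode(f"{fused_op.lower()}_{len(result)}", fused_op)
            fused.attr["executable"] = toy_compile(run)
            result.append(fused)
            run.clear()

    for node in nodes:
        if node.op in supported_ops:
            run.append(node)
        else:
            flush()
            result.append(node)
    flush()
    return result


graph = [
    ToyNode("in", "Placeholder"),
    ToyNode("conv", "Conv2D"),
    ToyNode("relu", "Relu"),
    ToyNode("out", "Identity"),
]
fused = fuse_supported(graph, supported_ops={"Conv2D", "Relu"})
print([n.op for n in fused])  # ['Placeholder', 'NeuronOp', 'Identity']
```

After fusion the framework sees only one ordinary-looking Op in place of the accelerated subgraph, and everything device-specific is hidden behind that Op's executable attribute.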

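With a typical inference chip, a model trained in a framework is compiled by a dedicated compiler, converted to a proprietary format, and written to a file. Inference is then run from that file through a proprietary API. Vendors stay flexible by offering C, C++, and Python APIs, but compiling and then executing the model inside the framework itself is something only TensorFlow XLA does. In that sense, I think AWS Neuron SDK + TensorFlow - Neuron is very interesting.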
