Introduction

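This post looks at the GPSIMD Engine found in the NeuronCore-v2 of AWS Trainium / Inferentia2.

The diagram shows about three of them inside the core, but what exactly are they?

The figure below is taken from the AWS site and is quoted here for explanation.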

Digging through the AWS Neuron Documentation and searching for the keyword GPSIMD turned up the following passage:

In this tutorial, we will build on the small MLP model shown in Neuron Custom C++ Operators in MLP Training and demonstrate methods to optimize the performance of a custom C++ operator. We will be taking advantage of the TCM accessor as well as the usage of multiple GPSIMD cores to enhance performance.

So the tutorial explicitly mentions using multiple GPSIMD cores.

In the section "Extending the example to utilize multiple GPSIMD cores" of that document, the relu_forward function contains the code below. Judging from the word cpu in the names, the GPSIMD Engine appears to be, in substance, a set of CPUs.

    uint32_t cpu_id = get_cpu_id();
    uint32_t cpu_count = get_cpu_count();
    uint32_t partition = num_elem / cpu_count;
    if (cpu_id == cpu_count - 1) {
        partition = num_elem - partition * (cpu_count - 1);
    }

The section "Using multiple GPSIMD cores" describes the function as follows:

uint32_t get_cpu_count()
  Return the total number of available GPSIMD cores.

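In other words, get_cpu_count() returns the number of GPSIMD cores, so each "CPU" in the code is one GPSIMD core.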
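To see how that partitioning arithmetic divides the work, here is a minimal host-side sketch. It is not Neuron SDK code: `Slice`, `slice_for`, and `covered_elements` are hypothetical names made up for this illustration, and the core id and core count are passed in as plain arguments instead of calling get_cpu_id() and get_cpu_count(). The start offset (`cpu_id * base`) is also an assumption, since the quoted snippet only shows how the partition size is computed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type (not part of the Neuron SDK):
// the range of elements one core would process.
struct Slice {
    uint32_t start;  // index of the first element (assumed layout)
    uint32_t count;  // number of elements in this slice
};

// Same arithmetic as the tutorial snippet, but cpu_id and cpu_count are
// arguments so the function can run on an ordinary host machine.
Slice slice_for(uint32_t cpu_id, uint32_t cpu_count, uint32_t num_elem) {
    assert(cpu_count > 0 && cpu_id < cpu_count);
    uint32_t base = num_elem / cpu_count;  // even share per core
    uint32_t count = base;
    if (cpu_id == cpu_count - 1) {
        // The last core also takes the remainder of the division.
        count = num_elem - base * (cpu_count - 1);
    }
    return Slice{cpu_id * base, count};
}

// Sums the slice sizes over all cores; the result should equal num_elem.
uint32_t covered_elements(uint32_t cpu_count, uint32_t num_elem) {
    uint32_t total = 0;
    for (uint32_t id = 0; id < cpu_count; ++id) {
        total += slice_for(id, cpu_count, num_elem).count;
    }
    return total;
}
```

With 8 cores and 1003 elements (both values chosen only for illustration), cores 0 to 6 each get 125 elements, and the last core gets 128 starting at index 875, so every element is covered exactly once.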

Description of NeuronCore-v2

The page "NeuronCore-v2 Architecture — AWS Neuron Documentation" says:

NeuronCore-v2 also introduces a new engine, called GPSIMD-Engine. This engine consists of 8 fully programmable 512-bit wide general-purpose processors, which can execute straight-line C-code, and have direct access to the other NeuronCore-v2 engines, as well as the embedded on-chip SRAM memory. With these cores, customers can implement custom-operators and execute them directly on the NeuronCore engines.

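So the GPSIMD-Engine consists of 8 general-purpose processors, which is consistent with the CPU-style naming seen in the tutorial code.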