Introduction

It looks like the preview of Google TPU Trillium (v6) has started.

Google TPU Trillium preview

Google's blog had an entry titled:

Trillium: Ushering in a new era of TPU performance

cloud.google.com

Compared to TPU v5e, Trillium delivers:

Over 4x improvement in training performance
Up to 3x increase in inference throughput
A 67% increase in energy efficiency
An impressive 4.7x increase in peak compute performance per chip
Double the High Bandwidth Memory (HBM) capacity
Double the Interchip Interconnect (ICI) bandwidth
Trillium can scale up to 256 chips

The TPU v5e specs are:

BF16 : 197 TFLOPs
int8 : 393 TOPs
HBM : HBM2/16 GB, 819 GBps
ICI (interchip interconnect) BW : 1600 Gbps

Doubling the v5e values therefore gives:

HBM : 32GB
ICI : 3200Gbps

The remaining claims,

Over 4x improvement in training performance
Up to 3x increase in inference throughput
A 67% increase in energy efficiency
An impressive 4.7x increase in peak compute performance per chip

are not spec values, so they don't seem directly comparable.

In the benchmarks in that post, the largest model size is 70B.

TPU v6e

Ah, so Trillium is v6e.

cloud.google.com

v5e => v6e

BF16 : 197 TFLOPs => 918 TFLOPs
int8 : 393 TOPs => 1836 TOPs
HBM capacity : 16 GB => 32 GB
HBM bandwidth : 819 GBps => 1640 GBps
ICI bandwidth : 1600 Gbps => 3584 Gbps
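As a quick check, the v6e/v5e ratio for each row of the table can be computed directly. This is plain arithmetic on the numbers listed above; the dictionary layout and the `ratios` helper are my own illustration.

```python
# Per-chip specs from the v5e => v6e table: (v5e, v6e)
specs = {
    "BF16 (TFLOPs)": (197, 918),
    "int8 (TOPs)": (393, 1836),
    "HBM capacity (GB)": (16, 32),
    "HBM bandwidth (GBps)": (819, 1640),
    "ICI bandwidth (Gbps)": (1600, 3584),
}

def ratios(table):
    """Return the v6e / v5e ratio for each row."""
    return {name: v6e / v5e for name, (v5e, v6e) in table.items()}

for name, r in ratios(specs).items():
    print(f"{name:<22} x{r:.2f}")
```

This gives roughly 4.66x for BF16, 4.67x for int8, 2.00x for HBM capacity and bandwidth, and 2.24x for ICI, so the actual ICI bandwidth is a bit more than the 3200 Gbps you get by simply doubling v5e.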

At 918 TFLOPs in BF16, it's only slightly below the H100.
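To put "slightly below the H100" into numbers, here is a simple ratio. The H100 figure is my assumption, not something from the original post: NVIDIA's datasheet lists 1,979 TFLOPS BF16 for H100 SXM *with sparsity*, so half of that (about 989 TFLOPS) is taken as the dense peak.

```python
# v6e BF16 peak from the table above
v6e_bf16_tflops = 918

# Assumption (not from the post): H100 SXM dense BF16 peak.
# NVIDIA lists 1,979 TFLOPS with sparsity; dense is half of that.
h100_sxm_bf16_dense_tflops = 1979 / 2

ratio = v6e_bf16_tflops / h100_sxm_bf16_dense_tflops
print(f"v6e / H100 SXM (dense BF16): {ratio:.1%}")
```

Under that assumption v6e lands at roughly 93% of the H100 SXM peak. Against the lower-clocked H100 PCIe part, the comparison would come out differently.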

pricing

v6e : $2.7 per chip-hour

v5e : $1.2 per chip-hour

BF16/int8 performance goes up 4.66x while the price goes up 2.25x, so cost performance roughly doubles (4.66 / 2.25 ≈ 2.07).
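The cost-performance arithmetic can be written out as performance per dollar, which improves by (performance ratio) / (price ratio). This sketch uses the BF16 peaks and the two prices listed above, treated as comparable hourly per-chip rates; the `chips` dictionary and the `tflops_per_dollar` helper are only for illustration.

```python
# BF16 peak and hourly price per chip, as listed above
chips = {
    "v5e": {"bf16_tflops": 197, "price_usd": 1.2},
    "v6e": {"bf16_tflops": 918, "price_usd": 2.7},
}

def tflops_per_dollar(chip):
    """Peak BF16 TFLOPs obtained per dollar of hourly price."""
    return chip["bf16_tflops"] / chip["price_usd"]

perf_x = chips["v6e"]["bf16_tflops"] / chips["v5e"]["bf16_tflops"]
price_x = chips["v6e"]["price_usd"] / chips["v5e"]["price_usd"]
value_x = tflops_per_dollar(chips["v6e"]) / tflops_per_dollar(chips["v5e"])

print(f"performance x{perf_x:.2f}, price x{price_x:.2f}, perf per dollar x{value_x:.2f}")
```

This prints a performance ratio of 4.66, a price ratio of 2.25, and a performance-per-dollar ratio of 2.07. That is the "roughly 2x" above, based on peak numbers only.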

In closing

Each v6e chip contains one TensorCore. Each TensorCore has 4 matrix-multiply units (MXU), a vector unit, and a scalar unit.

So it's basically the same layout as v5e. Going from 128 x 128 to 256 x 256 MXUs gives 4x the compute per cycle, and since 4.66 / 4 = 1.165, they have presumably raised the clock frequency by about 1.165x.
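The MXU reasoning above can be turned into an implied clock estimate. This is a sketch under my own assumptions, not published specs: one TensorCore with 4 MXUs per chip (as in the quote), 128 x 128 MXUs on v5e and 256 x 256 on v6e (the size guessed above), and each MXU cell doing one multiply-accumulate (2 FLOPs) per cycle. The function name is hypothetical.

```python
# Assumptions (not confirmed specs): 1 TensorCore x 4 MXUs per chip,
# v5e MXU = 128x128, v6e MXU = 256x256, one MAC (2 FLOPs) per cell per cycle.
MXUS_PER_CHIP = 4
FLOPS_PER_MAC = 2

def implied_clock_ghz(peak_tflops, mxu_dim, mxus=MXUS_PER_CHIP):
    """Clock frequency implied by a peak TFLOPs figure for a given MXU size."""
    flops_per_cycle = mxus * mxu_dim * mxu_dim * FLOPS_PER_MAC
    return peak_tflops * 1e12 / flops_per_cycle / 1e9

v5e_ghz = implied_clock_ghz(197, 128)
v6e_ghz = implied_clock_ghz(918, 256)
print(f"v5e ~{v5e_ghz:.2f} GHz, v6e ~{v6e_ghz:.2f} GHz, ratio {v6e_ghz / v5e_ghz:.3f}")
```

Under these assumptions v5e comes out at about 1.50 GHz and v6e at about 1.75 GHz, a ratio of 918 / 788 ≈ 1.165. That is the same 4.66 / 4 estimate, only expressed as clocks.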

Trillium is v6e. Does that mean a v6p is coming as well?

I came across the following post of mine on X, and it looks like v6p may arrive in 2025.

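The Google part of Broadcom's slides

There are two chips for 2024 - 2025, but
at Google I/O 2024 it was announced as v6 Trillium,
so assuming it isn't v6e/v6p, maybe the left one is v6 and the right one is the next one, v7?

Is v6 on TSMC N3E?
N3E started volume production in Q4 2023
N3P is scheduled to start volume production in 2024 pic.twitter.com/VflvEcX3Eb
— Vengineerの妄想 (@Vengineer) May 15, 2024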

Vengineerの妄想

Google TPU Trillium (v6e): the preview has started!
