
So Google is using XLA with TPUs


There was a post titled "Status of XLA" on the Google Group.

I quote it here for the record.
Dear All,

What is the status of XLA in terms of maturity? Is there any work that explains its performance using some standard machine learning algorithms? What is the main shortcoming of this framework?


Thanks,
malik

In response, the following reply was posted (also quoted):
The relevant answer depends a bit on whether you are thinking of implementing a backend using XLA for your hardware or if you are looking to use XLA to do machine learning as a user.

We have major production services using XLA at Google.

A big benefit is that XLA will fuse nodes together automatically, instead of having to introduce manually fused versions of op combinations from your model. We will also generate optimized code specifically for your model, instead of statically template-generating a huge number of optimized special cases with Eigen that get shipped with TensorFlow and then possibly still missing the specific case in your model. I'm not aware of a published direct benchmark comparison, though in any case to know the impact on a specific model you'd have to try it and see.

XLA needs to know/infer the bounds of arrays statically, which is not ideal if the bounds vary with the data. This can be already worked around via padding and bucketing. This is a limitation that could be lifted, though it would be a lot of work, so there are no promises around that, and knowing the bounds does allow a performance boost. Knowing the bounds will likely also simplify your work if you are looking to implement a backend.
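The automatic fusion described in the reply is something a TensorFlow user can try simply by turning on the XLA JIT compiler. The following is a minimal sketch, assuming the TensorFlow 1.x session API that was current at the time; the model and the layer sizes are made up for illustration. With the global JIT level enabled, XLA can fuse the elementwise bias add and ReLU into a single kernel and generate code specialized to these exact shapes, instead of dispatching separate precompiled Eigen kernels.

import numpy as np
import tensorflow as tf

# Enable XLA JIT compilation for the whole session (TensorFlow 1.x API).
config = tf.ConfigProto()
config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1

# A tiny model: the bias add and ReLU are candidates for XLA fusion,
# and the generated code is specialized to these static shapes.
x = tf.placeholder(tf.float32, shape=[128, 256], name="x")
w = tf.Variable(tf.random_normal([256, 64]), name="w")
b = tf.Variable(tf.zeros([64]), name="b")
y = tf.nn.relu(tf.matmul(x, w) + b)

with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    out = sess.run(y, feed_dict={x: np.ones((128, 256), dtype=np.float32)})
    print(out.shape)  # (128, 64)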

In other words, Google is using XLA at production level.
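The static-shape limitation mentioned at the end of the reply is usually handled on the user side exactly as described there: variable-length inputs are padded up to one of a small, fixed set of bucket lengths, so the compiled computation only ever sees a handful of static shapes. Here is a minimal sketch of that workaround (the bucket boundaries are hypothetical, not from the original post):

import numpy as np

# Hypothetical fixed lengths that the compiled computation would accept.
BUCKET_SIZES = [16, 32, 64, 128]

def pad_to_bucket(seq, pad_value=0.0):
    # Pad a 1-D sequence up to the smallest bucket that fits it; sequences
    # longer than the largest bucket would need truncation or a new bucket.
    bucket = next(b for b in BUCKET_SIZES if b >= len(seq))
    padded = np.full(bucket, pad_value, dtype=np.float32)
    padded[:len(seq)] = seq
    return padded, bucket

for n in [5, 30, 70]:
    padded, bucket = pad_to_bucket(np.ones(n, dtype=np.float32))
    print("length %3d -> bucket %3d, shape %s" % (n, bucket, padded.shape))

Only len(BUCKET_SIZES) distinct shapes ever reach the compiled code, at the cost of some wasted compute on the padding.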

According to LinkedIn, Bjarke Roune, who posted the reply, appears to be the tech lead of the team developing XLA for TPUs at Google.

That backs up the claim in the title.

Update (2017.12.11):
Added Jeff Dean's presentation slides from NIPS17.