What are Graphcore's three cases?

@Vengineer's ramblings : Twitter
Welcome to the world of SystemVerilog. It all began with the release of SystemC v0.9.

From the Tweets below about "London Meetup: Deep Dive into TensorFlow #27",

I'll look at the ones tagged #TensorFlowLDN that concern Graphcore.

www.eventbrite.com

The final #TensorFlowLDN of the year 🎄 and David Lacey from @graphcoreai kicking us off @ThinkRiseLDN.

Subscribe to our YouTube channel for speaker content 📺 https://t.co/fSwMaj5Qs8 #machinelearning #datascience #TensorFlow pic.twitter.com/IUEKrfbyPR
— Seldon (@seldon_io) December 4, 2019

How limitations in processor speed drove the innovation of the Colossus IPU

David Lacey of @graphcoreai at #TensorFlowLDN @ThinkRiseLDN for @seldon_io pic.twitter.com/gqKhybcAKC
— Lee Baker (@BakerLJ) December 4, 2019

Managing the full pipeline on IPUs.

Awesome depth of thinking that David and the @graphcoreai team has put into #machinelearning on chip to create the next breakthroughs. #TensorFlowLDN pic.twitter.com/cQGS7KxxDg
— Seldon (@seldon_io) December 4, 2019

The three cases shown in that last photo are:

MODEL SHARDING
MODEL REPLICATION
MODEL PIPELINING

These also showed up in the GitHub examples I looked at yesterday,

inside the TensorFlow code_examples.

SHARDING
REPLICATION
PIPELINING => this one is PopART

According to the description here, sharding is an option for running a single model across multiple chips, as follows.

--shards : The number of IPUs to split the model over (default 1). If shards > 1 then the first part of the model will be run on one IPU with later parts run on other IPUs with data passed between them. This is essential if the model is too large to fit on a single IPU, but can also be used to increase the possible batch size. As an advanced usage, the sharding algorithm can be influenced using the --sharding-exclude-filter and --sharding-include-filter options. These specify sub-strings of edge names that can be used when looking for a cutting point in the graph.
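The idea in the quoted description (the first part of the model on one IPU, later parts on other IPUs, data passed between them) can be sketched in plain Python. This is only an illustration under stated assumptions: `split_into_shards` and `run_sharded` are hypothetical helpers, not part of the Graphcore API, and the even contiguous split is a stand-in for whatever cutting points the real sharding algorithm picks.

```python
from typing import Callable, List, Sequence

Layer = Callable[[float], float]


def split_into_shards(layers: Sequence[Layer], num_shards: int) -> List[List[Layer]]:
    """Cut a layer list into num_shards contiguous, roughly equal parts."""
    if not 1 <= num_shards <= len(layers):
        raise ValueError("num_shards must be between 1 and the number of layers")
    base, extra = divmod(len(layers), num_shards)
    shards: List[List[Layer]] = []
    start = 0
    for i in range(num_shards):
        # The first `extra` shards take one additional layer.
        size = base + (1 if i < extra else 0)
        shards.append(list(layers[start:start + size]))
        start += size
    return shards


def run_sharded(shards: List[List[Layer]], x: float) -> float:
    """Run shard 0 first, then hand its output to shard 1, and so on."""
    for shard in shards:  # each shard stands in for one IPU (assumption)
        for layer in shard:
            x = layer(x)
    return x


# A toy five-layer "model" made of scalar functions.
layers = [
    lambda v: v + 1,
    lambda v: v * 2,
    lambda v: v - 3,
    lambda v: v * v,
    lambda v: v + 10,
]
shards = split_into_shards(layers, num_shards=2)
print([len(s) for s in shards])  # number of layers placed on each shard
print(run_sharded(shards, 1.0))
```

The result is the same as running the unsplit model; sharding only changes where each part would live.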

This file serves as the example for --shards.

tensorflow.python.ipu.autoshard.ipu_autoshard() splits the model automatically.

tensorflow.python.ipu.scopes.ipu_shard(n) lets you split it manually.

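# Build an IPU configuration with profiling enabled.
cfg = utils.create_ipu_config(profiling=True)
# Select as many IPUs as there are shards.
cfg = utils.auto_select_ipus(cfg, NUM_SHARDS)

# Apply the configuration to the IPU system.
utils.configure_ipu_system(cfg)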

The configuration above specifies the split. After that, you just call run in a TensorFlow session as usual.

This file is the REPLICATION example.

# To use replication, we make as many feeds as there are replicated IPUs by passing in replication_factor
infeed = ipu_infeed_queue.IPUInfeedQueue(dataset, replication_factor=opts.replication_factor, feed_name='in')
outfeed = ipu_outfeed_queue.IPUOutfeedQueue(replication_factor=opts.replication_factor, feed_name='out')

It looks like all you need to do is specify replication_factor for IPUInfeedQueue and IPUOutfeedQueue.
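To make "as many feeds as there are replicated IPUs" concrete, here is a minimal plain-Python sketch. It does not use the Graphcore queues: `make_replica_feeds`, `model`, and `run_replicated` are hypothetical names, and dealing items round-robin is an assumption for illustration, since the source does not say how the real IPUInfeedQueue distributes data.

```python
from typing import Iterable, List


def make_replica_feeds(dataset: Iterable[int], replication_factor: int) -> List[List[int]]:
    """Deal dataset items round-robin into one feed per replica."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    feeds: List[List[int]] = [[] for _ in range(replication_factor)]
    for i, item in enumerate(dataset):
        feeds[i % replication_factor].append(item)
    return feeds


def model(x: int) -> int:
    """Hypothetical stand-in for the model that every replica runs."""
    return x * x


def run_replicated(dataset: Iterable[int], replication_factor: int) -> List[List[int]]:
    """Run the same model on every replica, each reading from its own feed."""
    feeds = make_replica_feeds(dataset, replication_factor)
    # One output list per replica, mirroring one outfeed per replica.
    return [[model(x) for x in feed] for feed in feeds]


outputs = run_replicated(range(6), replication_factor=2)
print(outputs)
```

Unlike sharding, every replica holds the whole model; only the input data is divided.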

As for PIPELINING, the Image Classification Inference example says:

Run inference using optimized data pipelining for image classification with pre-trained weights

In addition, this page describes the following options:

--pipeline-splits
--pipeline-depth

--pipeline-depth : When a model is run over multiple shards (see --shards) pipelining the data flow can improve throughput by utilising more than one IPU at a time. At present the splitting points for the pipelined model must be specified, with one less split than the number of shards used. Use --pipeline-splits to specifiy the splits - if omitted then the list of available splits will be output.
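The two points in that description (one less split than the number of shards, and more than one IPU busy at a time) can be sketched in plain Python. This is a rough illustration, not Graphcore's scheduler: `stages_from_splits` and `pipeline_schedule` are hypothetical helpers, splits are assumed to be layer indices, and `num_batches` is an illustrative parameter rather than the documented meaning of --pipeline-depth.

```python
from typing import List, Optional, Sequence


def stages_from_splits(num_layers: int, splits: Sequence[int]) -> List[range]:
    """Turn N-1 split points into N contiguous stages of layer indices."""
    bounds = [0, *splits, num_layers]
    if any(a >= b for a, b in zip(bounds, bounds[1:])):
        raise ValueError("splits must be strictly increasing and inside the model")
    return [range(a, b) for a, b in zip(bounds, bounds[1:])]


def pipeline_schedule(num_stages: int, num_batches: int) -> List[List[Optional[int]]]:
    """schedule[t][s] is the batch stage s works on at step t, or None if idle."""
    steps = num_stages + num_batches - 1
    return [
        [t - s if 0 <= t - s < num_batches else None for s in range(num_stages)]
        for t in range(steps)
    ]


# Two splits give three stages, i.e. three shards.
stages = stages_from_splits(num_layers=6, splits=[2, 4])
schedule = pipeline_schedule(len(stages), num_batches=4)
for t, row in enumerate(schedule):
    print(t, row)
```

In the middle steps every stage has a batch to work on, which is where the throughput gain over plain sharding comes from; the first and last steps show the fill and drain phases where some stages sit idle.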