Vengineerの妄想

Daydreaming about life.

Cerebras CS-1 presentation slides from ScaledML 2020


For the record.

Notes from Cerebras Systems' presentation slides at ScaledML 2020.

Training on GPUs:

  • Traditional: Layer-Sequential (Single GPU: Modest Data Parallel)

   Run one layer at once.

   Medium batch size

  • Traditional: Layer-Sequential (GPU Cluster: Extreme Data Parallel)

   Run layer replicas on parallel devices.

   Extremely large batch size

   Weight sync overhead
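The "weight sync overhead" in the cluster case comes from the data-parallel update step: every replica's gradient must be combined (an all-reduce across devices) before any replica can apply the update. A minimal NumPy sketch of that step, not from the slides; `data_parallel_step` and the values are illustrative:

```python
import numpy as np

def data_parallel_step(weights, replica_grads, lr=0.1):
    """One data-parallel update: average the per-replica gradients
    (the 'weight sync' step, an all-reduce in a real cluster whose
    cost grows with the number of devices), then apply the same
    update so every replica keeps an identical copy of the weights."""
    avg_grad = np.mean(replica_grads, axis=0)
    return weights - lr * avg_grad

# Toy example: 4 replicas, each holding a gradient from its own mini-batch.
w = np.zeros(3)
grads = [np.array([1.0, 2.0, 3.0]) for _ in range(4)]
w = data_parallel_step(w, grads)
print(w)
```

The effective batch size is the sum of the replicas' mini-batches, which is why the cluster variant is described as "extremely large batch size".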

 

 Training on the WSE:

  • WSE: Layer-Sequential (Replicas: Modest Data Parallel)

   Run layer replicas on fabric sections

   Medium batch size

   Low weight sync overhead

  • WSE: Layer-Sequential (Non-Replicas: Low Data Parallel)

   Replicas not constrained to device

   Run one layer at once on entire wafer

   Small batch size

   No weight sync overhead

  • WSE: Layer-Pipelined (Model Parallel, Modest Data Parallel)

   Run all layers on fabric sections

   Medium batch size

   No weight sync overhead

  • WSE: Layer-Pipelined (Model Parallel, Low Data Parallel)

   Pipelined Backpropagation

   Weight update without draining pipeline

   Arbitrarily small batch size

   No weight sync overhead

   Staleness compensation in deep networks
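The last bullet is the subtle one: updating weights without draining the pipeline means each gradient was computed against weights that are a few steps old, so some staleness compensation is needed in deep networks. A toy scalar sketch of SGD under a fixed gradient delay (this is not Cerebras's algorithm; `pipelined_sgd` and its quadratic loss are made up to illustrate the effect):

```python
from collections import deque

def pipelined_sgd(w0=5.0, lr=0.1, delay=3, steps=50):
    """Toy pipelined backprop on the loss L(w) = w**2 / 2 (so dL/dw = w).
    With a pipeline of depth `delay`, the gradient applied at step t was
    computed from the weights of step t - delay (staleness), but updates
    are applied every step without ever draining the pipeline."""
    w = w0
    stale = deque([w0] * delay, maxlen=delay)  # weight versions "in flight"
    for _ in range(steps):
        g = stale[0]       # gradient of the stale weights: dL/dw = w_old
        stale.append(w)    # current weights enter the pipeline
        w -= lr * g        # update immediately, no drain
    return w

print(pipelined_sgd())  # small learning rates keep the delayed update stable
```

For a small enough learning rate the iteration still converges toward the optimum at 0 despite the delay; as the delay (pipeline depth, i.e. network depth) grows, the stable learning-rate range shrinks, which is the motivation for the staleness compensation mentioned on the slide.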