@Vengineer's musings : Twitter
Welcome to the world of SystemVerilog; it all started with the release of SystemC v0.9
For the record.
Notes from Cerebras Systems' presentation slides at ScaledML 2020.
For training on GPUs:
- Traditional: Layer-Sequential (Single GPU: Modest Data Parallel)
Run one layer at a time.
Medium batch size
- Traditional: Layer-Sequential (GPU Cluster: Extreme Data Parallel)
Run layer replicas on parallel devices.
Extremely large batch size
Weight sync overhead
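The weight sync overhead in the GPU-cluster case comes from averaging gradients across all replicas (an allreduce) before any weight update can happen. A minimal sketch, simulating the allreduce with a NumPy mean; all names here are illustrative, not Cerebras' or any framework's API:

```python
import numpy as np

# Hedged sketch of data-parallel SGD across GPU replicas. Each replica
# computes gradients on its own shard of the batch; every step, those
# gradients must be averaged across all replicas before the shared
# weights can be updated. That per-step averaging (here a np.mean
# standing in for an allreduce) is the "weight sync overhead", and it
# grows with the number of replicas and the size of the weights.

def data_parallel_step(weights, shard_grads, lr=0.1):
    """Average per-replica gradients (simulated allreduce), then update."""
    synced = np.mean(shard_grads, axis=0)  # one sync per step, every step
    return weights - lr * synced

rng = np.random.default_rng(0)
w = np.zeros(4)
# 8 replicas -> 8 gradient shards that must be synchronized each step
grads = rng.normal(size=(8, 4))
w = data_parallel_step(w, grads)
print(w)
```

With extreme data parallelism, each replica also needs enough local work to amortize this sync, which is what pushes the batch size up.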
For training on the WSE:
- WSE: Layer-Sequential (Replicas: Modest Data Parallel)
Run layer replicas on fabric sections
Medium batch size
Low weight sync overhead
- WSE: Layer-Sequential (No Replicas: Low Data Parallel)
Replicas not constrained to device
Run one layer at a time on the entire wafer
Small batch size
No weight sync overhead
- WSE: Layer-Pipelined (Model Parallel, Modest Data Parallel)
Run all layers on fabric sections
Medium batch size
No weight sync overhead
- WSE: Layer-Pipelined (Model Parallel, Low Data Parallel)
Pipelined Backpropagation
Weight update without draining pipeline
Arbitrarily small batch size
No weight sync overhead
Staleness compensation in deep networks
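In pipelined backpropagation, micro-batches stream through the layers and weights are updated as soon as a gradient arrives, without draining the pipeline first, so the gradient applied at step t was computed against weights from a few steps earlier (staleness). A toy sketch of that delayed-gradient update, minimizing f(w) = 0.5*w², just to show the mechanism; this is an illustrative setup, not Cerebras' implementation, and it omits the staleness compensation the slide mentions:

```python
# Hedged sketch: SGD with stale (delayed) gradients, as in pipelined
# backpropagation. The weight is updated every step without waiting for
# the pipeline to drain, so each gradient is evaluated at the weight
# version from `delay` steps ago. For f(w) = 0.5*w^2 the gradient is
# simply w, which keeps the toy self-contained.

def pipelined_sgd(w0=1.0, lr=0.1, delay=2, steps=50):
    history = [w0] * (delay + 1)  # weight versions still "in flight"
    w = w0
    for _ in range(steps):
        stale_w = history[0]   # weights the gradient was computed against
        grad = stale_w         # d/dw of 0.5*w^2, at the stale weights
        w -= lr * grad         # update immediately, no pipeline drain
        history = history[1:] + [w]
    return w

final = pipelined_sgd()
print(final)
```

With a small enough learning rate the delayed updates still converge near the optimum; larger staleness or learning rates degrade this, which is why deep networks need the staleness compensation noted above.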