Cerebras CS-1 presentation slides from ScaledML 2020

Noted here for the record.

Training on GPUs:

　　　Run one layer at once.

　　　Medium batch size

　　　Run layer replicas on parallel devices.

　　　Extremely large batch size

　　　Weight sync overhead
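
The weight sync step in the replica case can be sketched as below. This is a minimal toy, not taken from the slides: the scalar model y = w * x and the names `grad_mse` and `data_parallel_step` are assumptions made for illustration. Each replica computes a gradient on its own shard, and the gradients are averaged before the update. That averaging is the sync cost, and because the global batch is the per-replica batch times the number of replicas, it also shows why the batch grows with the device count.

```python
# Toy data-parallel SGD step (illustrative assumption, not the slides' code).
# Every replica holds the same weight and sees a different shard of the batch.

def grad_mse(w, shard):
    """Gradient of mean squared error for the model y = w * x over one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)

def data_parallel_step(w, batch, num_replicas, lr):
    """One synchronous step: shard the batch, compute local grads, average, update."""
    shard_size = len(batch) // num_replicas
    shards = [batch[i * shard_size:(i + 1) * shard_size] for i in range(num_replicas)]
    local_grads = [grad_mse(w, s) for s in shards]  # would run in parallel on real devices
    synced_grad = sum(local_grads) / num_replicas   # the averaging (weight sync) step
    return w - lr * synced_grad

batch = [(float(x), 3.0 * x) for x in range(1, 9)]  # 8 samples of y = 3x
w_parallel = data_parallel_step(0.0, batch, num_replicas=4, lr=0.01)
w_single = 0.0 - 0.01 * grad_mse(0.0, batch)        # same step on one device
```

With equal-sized shards, the averaged gradient matches the full-batch gradient, so `w_parallel` and `w_single` agree up to floating-point rounding.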

Training on the WSE:

　　　Run layer replicas on fabric sections

　　　Medium batch size

　　　Low weight sync overhead

　　　Replicas not constrained to device

　　　Run one layer at once on entire wafer

　　　Small batch size

　　　No weight sync overhead

　　　Run all layers on fabric sections

　　　Medium batch size

　　　No weight sync overhead

　　　Pipelined backpropagation

　　　Weight update without draining pipeline

　　　Arbitrarily small batch size

　　　No weight sync overhead

　　　Staleness compensation in deep networks
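
Why updating weights without draining the pipeline calls for staleness compensation can be seen in a small toy. This is a sketch under assumptions, not the method from the slides: the objective f(w) = 0.5 * w**2, the function name `pipelined_sgd`, and the fixed delay of 4 are all made up for illustration. Each gradient is computed from the weight as it was `delay` updates earlier, which is roughly what samples still in flight experience. The notes above do not say how the compensation works, so none is implemented here.

```python
from collections import deque

def pipelined_sgd(w0, lr, delay, steps):
    """SGD on f(w) = 0.5 * w**2 where each gradient is computed from the
    weight as it was `delay` updates earlier (delay=0 is ordinary SGD)."""
    w = w0
    in_flight = deque([w0] * delay)    # weight versions seen by samples still in the pipeline
    history = [w]
    for _ in range(steps):
        in_flight.append(w)            # a new sample enters and sees the current weight
        stale_w = in_flight.popleft()  # the oldest sample finishes its backward pass
        grad = stale_w                 # df/dw evaluated at the stale weight
        w = w - lr * grad              # update immediately, without draining the pipeline
        history.append(w)
    return history

fresh = pipelined_sgd(w0=1.0, lr=0.5, delay=0, steps=20)
stale = pipelined_sgd(w0=1.0, lr=0.5, delay=4, steps=20)
```

In this toy, `fresh` shrinks toward zero at every step, while `stale` overshoots and oscillates at the same learning rate. The stale gradient keeps pushing in a direction that is no longer correct.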

Vengineerの妄想