Glow, Part 4 - Vengineerの妄想


Let's look at the simpleTestConv test in graphTest.cpp.
It shows how to use the dump functions for debugging, and what they print.

The test is quoted below with my comments added.
TEST(Graph, simpleTestConv) {
  // Declare a Module
  Module MD;
  // Create a Function named "F"
  Function *F = MD.createFunction("F");
  // Declare an IRFunction
  IRFunction M(F);
  // Create two nodes (Variables, held in K and S)
  Node *K = MD.createVariable(ElemKind::FloatTy, {4, 320, 200, 3}, "input");
  Node *S = MD.createVariable(ElemKind::IndexTy, {4, 1}, "select");

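  // Create a Conv node (Conv1) whose input is K (input);
  // per the dump: 16 filters, kernel 3, stride 2, pad 3, group 1
  K = F->createConv("Conv1", K, 16, 3, 2, 3, 1);
  // Create a RELU node (Relu) whose input is Conv1
  K = F->createRELU("Relu", K);
  // Create a SoftMax node (SoftMax) whose input is Relu
  K = F->createSoftMax("SoftMax", K, S);
  // Create a Save node (Save) whose input is SoftMax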
  F->createSave("Save", K);

  // Dump the graph
  F->dump();
  // Dump the DAG
  F->dumpDAG();

  // Lower
  lower(F, CompilationMode::Train, MockBackend());
  // Optimize
  ::optimize(F, CompilationMode::Train);
  // Generate the IR
  M.generateIR();
  // Dump the IR
  M.dump();

  EXPECT_GT(M.getInstrs().size(), 0);
}

Below is the output of F->dump():

Graph structure F:
Convolution
name : Conv1
Input : float<4 x 320 x 200 x 3>
Filter : float<16 x 3 x 3 x 3>
Bias : float<16>
Kernel : 3
Stride : 2
Pad : 3
Group : 1
users : 1
Result : float<4 x 162 x 102 x 16>

Relu
name : Relu
Input : float<4 x 162 x 102 x 16>
users : 1
Result : float<4 x 162 x 102 x 16>

SoftMax
name : SoftMax
Input : float<4 x 162 x 102 x 16>
Selected : index<4 x 1>
users : 1
Result : float<4 x 162 x 102 x 16>

Save
name : Save
Input : float<4 x 162 x 102 x 16>
Output : float<4 x 162 x 102 x 16>
users : 0
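
The Result shape of Conv1 (162 x 102) can be checked by hand. The following is a minimal sketch, assuming the usual convolution output formula floor((in + 2*pad - kernel) / stride) + 1, which agrees with the numbers in the dump above. convOutputDim is a hypothetical helper written for this post, not part of the Glow API.

```cpp
#include <cassert>
#include <cstddef>

// Output extent of one spatial dimension of a convolution:
//   floor((in + 2*pad - kernel) / stride) + 1
// Hypothetical helper for illustration only; not a Glow function.
std::size_t convOutputDim(std::size_t in, std::size_t kernel,
                          std::size_t stride, std::size_t pad) {
  // The padded input must be at least as large as the kernel.
  assert(stride > 0 && in + 2 * pad >= kernel);
  // Unsigned integer division performs the floor.
  return (in + 2 * pad - kernel) / stride + 1;
}
```

With the values from the test (kernel 3, stride 2, pad 3), convOutputDim(320, 3, 2, 3) gives 162 and convOutputDim(200, 3, 2, 3) gives 102, so the 4 x 320 x 200 x 3 input with 16 filters becomes 4 x 162 x 102 x 16.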

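Below is the output of F->dumpDAG(), which only reports the dotty (.dot) file it wrote. The function/declare block that follows it is the beginning of the IR dump. Apparently this is what is called Glow IR; see Figure 3 of the paper.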

Writing dotty graph for Function to: dotty_graph_dump_0x4848280.dot
function F
declare {
  %input = WeightVar float<4 x 320 x 200 x 3> mutable // size: 3072000 // Users: @in 1
  %select = WeightVar index<4 x 1> mutable // size: 32
  %filter = WeightVar float<16 x 3 x 3 x 3> mutable // size: 1728 // Users: @in 1
  %bias = WeightVar float<16> mutable // size: 64 // Users: @in 1
  %Save = WeightVar float<4 x 162 x 102 x 16> mutable // size: 4230144 // Users: @out 8

  ; size = 7303968 bytes
}
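
The size comments in the declare section are just element count times element size. Here is a small sketch that reproduces them, assuming 4 bytes per float and 8 bytes per index element (the latter is inferred from "index<4 x 1> ... size: 32" in the dump, not taken from the Glow sources). tensorBytes and declareSectionBytes are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Bytes occupied by a tensor with the given dimensions and element size.
// Hypothetical helper for illustration only; not a Glow function.
std::size_t tensorBytes(const std::vector<std::size_t> &dims,
                        std::size_t elemSize) {
  std::size_t bytes = elemSize;
  for (std::size_t d : dims) {
    bytes *= d;
  }
  return bytes;
}

// Sum over the five WeightVars listed in the declare section.
std::size_t declareSectionBytes() {
  const std::size_t floatSize = 4; // float
  const std::size_t indexSize = 8; // assumption inferred from the dump
  return tensorBytes({4, 320, 200, 3}, floatSize)    // %input
         + tensorBytes({4, 1}, indexSize)            // %select
         + tensorBytes({16, 3, 3, 3}, floatSize)     // %filter
         + tensorBytes({16}, floatSize)              // %bias
         + tensorBytes({4, 162, 102, 16}, floatSize); // %Save
}
```

The individual terms are 3072000, 32, 1728, 64 and 4230144 bytes, and their sum matches the "; size = 7303968 bytes" line.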

Below is the code section of the M.dump() output:

code {
  0 %Conv1.res = allocactivation  { Ty: float<4 x 162 x 102 x 16>} // size: 4230144 // Users: @out 9, @in 5, @out 1
  1 %Conv1 = convolution @out %Conv1.res, @in %input, @in %filter, @in %bias { Kernel: 3, Stride: 2, Pad: 3, Group: 1}
  2 %zero.res = allocactivation  { Ty: float<4 x 162 x 102 x 16>} // size: 4230144 // Users: @out 10, @in 5, @out 3
  3 %zero = splat @out %zero.res { Value: 0.000000e+00}
  4 %relu1.res = allocactivation  { Ty: float<4 x 162 x 102 x 16>} // size: 4230144 // Users: @out 11, @in 7, @out 5
  5 %relu1 = elementmax @out %relu1.res, @in %Conv1.res, @in %zero.res
  6 %SoftMax.res = allocactivation  { Ty: float<4 x 162 x 102 x 16>} // size: 4230144 // Users: @out 12, @in 8, @out 7
  7 %SoftMax = softmax @out %SoftMax.res, @in %relu1.res
  7 %SoftMax = softmax @out %SoftMax.res, @in %relu1.res
  8 %Save0 = copy @out %Save, @in %SoftMax.res
  9 %dealloc = deallocactivation @out %Conv1.res // size: 4230144
  10 %dealloc0 = deallocactivation @out %zero.res // size: 4230144
  11 %dealloc1 = deallocactivation @out %relu1.res // size: 4230144
  12 %dealloc2 = deallocactivation @out %SoftMax.res // size: 4230144
}
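
Note that the Relu node does not appear as a relu instruction in the code section. After lowering it shows up as a splat of 0.0 (instruction 3) followed by an elementmax (instruction 5), that is, relu(x) = max(x, 0). The sketch below mimics those two instructions on plain std::vector<float> buffers. The function names are hypothetical and the code is only an illustration of the idea, not Glow's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// splat: a buffer of the given size filled with a single value.
std::vector<float> splat(std::size_t size, float value) {
  return std::vector<float>(size, value);
}

// elementmax: element-wise maximum of two equally sized buffers.
std::vector<float> elementMax(const std::vector<float> &a,
                              const std::vector<float> &b) {
  assert(a.size() == b.size());
  std::vector<float> out(a.size());
  std::transform(a.begin(), a.end(), b.begin(), out.begin(),
                 [](float x, float y) { return std::max(x, y); });
  return out;
}

// Relu written the way the lowered IR expresses it:
// elementmax(input, splat(0.0)).
std::vector<float> loweredRelu(const std::vector<float> &in) {
  return elementMax(in, splat(in.size(), 0.0f));
}
```

For example, loweredRelu applied to {-1.0f, 0.0f, 2.5f} yields {0.0f, 0.0f, 2.5f}. This also explains why the dump allocates an extra %zero.res buffer of the same size as the activation.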