About the AI Engines in Xilinx ACAP

@Vengineer's Ramblings : Twitter
Welcome to the world of SystemVerilog. It all began with the release of SystemC v0.9.

Xilinx's presentation at C4ML, "Compiling Deep Neural Networks for ACAP Devices", describes the ACAP AI Engines on pages 23 through 34.

Page 26
　・AI Engines : Tile-Based Architecture

　　ISA-based Vector Processor
　　Local Memory with Data Mover
　　Interconnect (Non-Blocking Interconnect & Cascade Interface)

The ISA-based Vector Processor is programmable.

Page 27
　・1GHz+ Processor Core

　　Scalar Unit: 32-bit Scalar RISC Processor
　　　7+ operations/clock cycle ( 2 Vector Loads / 1 Mult / 1 Store / 2 Scalar Ops / Stream Access )
　　Vector Unit: Vector Processor (512-bit SIMD Datapath)
　　　Multiple vector lanes (Vector Datapath, 8 / 16 / 32-bit & SPFP operands )
　　Local, Shareable Memory (32KB Local, 128KB Addressable)

    128 MACs / Clock Cycle per Core (INT8)
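To get a feel for these figures, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted from the slide; the one assumption is a clock of exactly 1 GHz (the slide says "1GHz+"), so the throughput is a lower bound. The function names are my own and purely illustrative.

```python
# Back-of-the-envelope numbers for one AI Engine core, using only the
# figures quoted from page 27 of the slides.

CLOCK_HZ = 1_000_000_000     # assumption: exactly 1 GHz (the slide says "1GHz+")
SIMD_WIDTH_BITS = 512        # vector datapath width, from the slide
INT8_MACS_PER_CYCLE = 128    # per core, from the slide


def elements_per_vector(operand_bits: int) -> int:
    """Number of operands of the given width that fit in one 512-bit vector."""
    return SIMD_WIDTH_BITS // operand_bits


def peak_int8_macs_per_second(clock_hz: int = CLOCK_HZ) -> int:
    """Peak INT8 multiply-accumulates per second for a single core."""
    return INT8_MACS_PER_CYCLE * clock_hz


if __name__ == "__main__":
    for bits in (8, 16, 32):
        print(f"{bits:2d}-bit operands per 512-bit vector: {elements_per_vector(bits)}")
    gmacs = peak_int8_macs_per_second() / 1e9
    print(f"Peak INT8 throughput: {gmacs:.0f} GMAC/s per core")
```

At 1 GHz this works out to 128 GMAC/s per core for INT8; a faster clock scales it linearly.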

I wonder whether this processor is RISC-V, or a custom core?

Page 28
　・Array Architecture

　　Modular and scalable architecture
      More tiles = more compute
      Up to 400 per device
      Versal AI Core VC1902 device
      Tape-out Dec 2018
　　Array of AIEngines
　　　Increase in compute, memory and communication bandwidth

So the VC1902 device (taped out in December 2018) has up to 400 cores.
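Scaling the per-core figures up to the whole array gives a rough idea of what 400 tiles means. The sketch below is only arithmetic on numbers quoted from the slides (400 tiles, 128 INT8 MACs per cycle, 32KB local memory per tile); the 1 GHz clock is an assumption (the slides say "1GHz+"), and the helper names are hypothetical.

```python
# Rough array-level estimate from the slide figures.

TILES = 400                  # "Up to 400 per device", from the slide
CLOCK_HZ = 1_000_000_000     # assumption: exactly 1 GHz (the slide says "1GHz+")
INT8_MACS_PER_CYCLE = 128    # per core, from the slide
LOCAL_MEM_KB = 32            # local memory per tile, from the slide


def array_peak_macs_per_second(tiles: int = TILES) -> int:
    """Peak INT8 MAC/s if every tile runs at full rate (an ideal upper bound)."""
    return tiles * INT8_MACS_PER_CYCLE * CLOCK_HZ


def array_local_memory_kb(tiles: int = TILES) -> int:
    """Total local memory across all tiles, in KB."""
    return tiles * LOCAL_MEM_KB


if __name__ == "__main__":
    tmacs = array_peak_macs_per_second() / 1e12
    print(f"Peak INT8: {tmacs:.1f} TMAC/s across {TILES} tiles")
    print(f"Total local memory: {array_local_memory_kb()} KB")
```

Under these assumptions the array peaks at 51.2 TMAC/s with 12800 KB of tile-local memory in total; real workloads will land below the peak.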

Pages 29-30

　・Direct Memory Access

　　Direct access to the Local, Shareable Memory of a different AI Core => I wonder if this applies only to the Shareable Memory?

Page 31

　・Remote Memory Access

　　Using the Data Mover: Local, Shareable Memory <=> Local, Shareable Memory
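The difference between the two access styles can be sketched with a toy model in Python. This is not the Xilinx programming model; `Tile`, `direct_read`, and `data_mover_copy` are hypothetical names. The restriction of direct access to neighboring tiles is my own assumption (my reading of "32KB Local, 128KB Addressable"), not something the slides state outright.

```python
# Toy model: direct memory access vs. Data Mover copies between tiles.

LOCAL_MEM_BYTES = 32 * 1024  # 32KB local memory per tile, from the slide


class Tile:
    """Hypothetical tile holding a 32KB local memory and a neighbor list."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.local_mem = bytearray(LOCAL_MEM_BYTES)
        self.neighbors: list["Tile"] = []


def connect_neighbors(a: Tile, b: Tile) -> None:
    """Mark two tiles as adjacent in the array."""
    a.neighbors.append(b)
    b.neighbors.append(a)


def _check_range(offset: int, length: int) -> None:
    if offset < 0 or length < 0 or offset + length > LOCAL_MEM_BYTES:
        raise ValueError("access outside the 32KB local memory")


def direct_read(core: Tile, owner: Tile, offset: int, length: int) -> bytes:
    """Direct access: a core loads from its own or a neighbor's memory.

    Assumption: only adjacent tiles are directly addressable.
    """
    if owner is not core and owner not in core.neighbors:
        raise PermissionError(f"{core.name} cannot directly address {owner.name}")
    _check_range(offset, length)
    return bytes(owner.local_mem[offset:offset + length])


def data_mover_copy(src: Tile, src_off: int, dst: Tile, dst_off: int, length: int) -> None:
    """Remote access: the Data Mover copies a block between any two tiles."""
    _check_range(src_off, length)
    _check_range(dst_off, length)
    dst.local_mem[dst_off:dst_off + length] = src.local_mem[src_off:src_off + length]


if __name__ == "__main__":
    t0, t1, t2 = Tile("t0"), Tile("t1"), Tile("t2")
    connect_neighbors(t0, t1)            # t2 is not adjacent to t0
    t0.local_mem[0:4] = b"abcd"
    print(direct_read(t1, t0, 0, 4))     # neighbor reads in place
    data_mover_copy(t0, 0, t2, 8, 4)     # non-neighbor needs an explicit copy
    print(bytes(t2.local_mem[8:12]))
```

The point of the sketch is the trade-off: direct access needs no copy but (under my assumption) only reaches nearby memory, while the Data Mover reaches any tile at the cost of an explicit transfer.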

Page 32

　・Explicit Data Movement Architecture

    Memory Communication
      Dataflow Pipeline
      Dataflow Graph

    Streaming Communication
      Non-Neighbor
      Streaming Multicast
      Cascade Streaming

This page may be quite important... Typical processor-based AI engines don't offer this much!
Is it because Xilinx knows hardware for things like image processing?
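As a software analogy for the communication patterns listed on this page, here is a small Python sketch using generators. It is only an illustration of the ideas (pipeline, multicast, cascade), not how AI Engines are actually programmed, and every name in it is hypothetical. In particular, modeling the cascade as a running sum handed from core to core is my own simplification.

```python
# Toy dataflow model: pipeline stages, stream multicast, cascade accumulation.
from itertools import tee
from typing import Callable, Iterable, Iterator, Sequence


def kernel(fn: Callable[[int], int], stream: Iterable[int]) -> Iterator[int]:
    """One 'core': applies fn to every sample that arrives on its input stream."""
    for sample in stream:
        yield fn(sample)


def pipeline(stream: Iterable[int], stages: Sequence[Callable[[int], int]]) -> Iterator[int]:
    """Dataflow pipeline: the output of each stage feeds the next stage."""
    out: Iterable[int] = stream
    for fn in stages:
        out = kernel(fn, out)
    return iter(out)


def multicast(stream: Iterable[int], n: int) -> tuple:
    """Streaming multicast: one producer, n consumers seeing the same samples."""
    return tee(stream, n)


def cascade(partials: Sequence[Iterable[int]]) -> Iterator[int]:
    """Cascade streaming (simplified): each core adds its partial result to
    the accumulator handed over by the previous core."""
    for values in zip(*partials):
        acc = 0
        for v in values:
            acc += v
        yield acc


if __name__ == "__main__":
    staged = pipeline(range(4), [lambda x: x + 1, lambda x: x * 2])
    print(list(staged))

    a, b = multicast(iter([1, 2, 3]), 2)
    summed = cascade([kernel(lambda x: x * 10, a), kernel(lambda x: x + 1, b)])
    print(list(summed))
```

What the slides add on top of this picture is that the data movement is explicit and supported in hardware, rather than going through a shared cache hierarchy.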

Page 33

　・AI Engine Integration with Versal ACAP

    Dataflow connections between AI Cores and PL

      TB/s of Interface Bandwidth

    AI cores share external memory with PS & PL

The AI Engines and the PL can be connected via the NoC...

Page 34

　・AI Engine Scale Out

    Apparently you can scale out by connecting multiple Versal ACAPs

Impressive.
