!Playing with SeeDot
::Using the STM32F407 Discovery Kit with Arduino
* Install the Arduino IDE.
* Add the board manager URL https://raw.githubusercontent.com/stm32duino/BoardManagerFiles/master/STM32/package_stm_index.json
* Under Tools → Boards Manager, install the STM32 Core.
* Select Discovery as the board and STM32F407G-DISC1 as the Board part number.
* Check that the example 01.Basics → Blink works (LD4 blinks).
::Using SeeDot
Prepare a virtual environment and follow the steps described at https://github.com/microsoft/EdgeML/tree/master/Tools/SeeDot.
* Prepare a venv
 python3 -m venv EdgeML
 source ./EdgeML/bin/activate
* Clone from GitHub and set up the environment
 git clone https://github.com/Microsoft/EdgeML
 cd EdgeML/tf/
 pip install -r requirements-cpu.txt
 pip install -e .
* Train ProtoNN on usps10
 cd examples/ProtoNN
 python fetch_usps.py
 python process_usps.py
 mkdir usps10/output
 python protoNN_example.py --data-dir ./usps10 --projection-dim 25 --num-prototypes 55 --epochs 100 -sW 0.3 -o usps10/output
* Build for Arduino
 cd ../../../Tools/SeeDot
 mkdir arduino
 python SeeDot.py -a protonn --train ../../tf/examples/ProtoNN/usps10/train.npy --test ../../tf/examples/ProtoNN/usps10/test.npy --model ../../tf/examples/ProtoNN/usps10/output -o arduino
** It goes like this
 (EdgeML) miyo@tama:% python SeeDot.py -a protonn --train ../../tf/examples/ProtoNN/usps10/train.npy --test ../../tf/examples/ProtoNN/usps10/test.npy --model ../../tf/examples/ProtoNN/usps10/output -o arduino
 ================================
 Executing on protonn for Arduino
 --------------------------------
 Train file: ../../tf/examples/ProtoNN/usps10/train.npy
 Test file: ../../tf/examples/ProtoNN/usps10/test.npy
 Model directory: ../../tf/examples/ProtoNN/usps10/output
 ================================
 -----------------------
 Collecting profile data
 -----------------------
 Generating input files for float training dataset...done
 Build...success
 ...
 Accuracy is 89.985%
 ------------------------------
 Generating code for arduino...
 ------------------------------
 Generating input files for fixed testing dataset...done
 Generating code...completed
 Arduino sketch dumped in the folder arduino
** What was generated
 (EdgeML) miyo@tama:% ls arduino
 arduino.ino  config.h  input/  library.h  model.h  predict.cpp  predict.h
::Connecting to the board
* Use UART2: PA2 is TX, PA3 is RX.
 HardwareSerial Serial1(USART2);
** For the serial (UART) pin assignments on the F407, see
- ~/.arduino15/packages/STM32/hardware/stm32/1.5.0/variants/DISCO_F407VG/PeripheralPins.c
* Got some output:
 4596: Predicted label: 9; True label: 9; Correct prediction
 4597: Predicted label: 9; True label: 9; Correct prediction
 4598: Predicted label: 9; True label: 9; Correct prediction
 4599: Predicted label: 9; True label: 9; Correct prediction
 4600: Predicted label: 9; True label: 9; Correct prediction
 ------------------------
 Average prediction time: 1380.68
 ------------------------
 4601: Predicted label: 9; True label: 9; Correct prediction
 4602: Predicted label: 9; True label: 9; Correct prediction
 4603: Predicted label: 9; True label: 9; Correct prediction
* According to the source code, it seems to take the time measured with micros() and divide it by the number of iterations.
** The input data is the data stored in flash.
** So apparently inference takes 1380.68 µs per iteration.
* In Accuracy Mode it apparently accepts input over serial and runs inference on it.
::Notes
* usps
- https://www.kaggle.com/bistaumanga/usps-dataset
!Playing with TVM/AWS-F1 (failed)
Trying out https://github.com/dmlc/tvm/blob/master/docs/deploy/aws_fpga.md. It isn't working yet.
::What I tried
I logged in to the c4.4xlarge machine set up for building, and set the AWS-F1 environment variables with
 % source src/project_data/aws-fpga/sdaccel_setup.sh
 % source ${XILINX_SDX}/settings64.sh
Then I built TVM following https://docs.tvm.ai/install/from_source.html.
TVM seems to need LLVM, so I built LLVM 4.0.1. The CMake on the machine was too old for that, so I started with CMake.
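Before building LLVM, it helps to check whether the installed CMake is actually new enough. The helper below is a minimal, hypothetical sketch. It avoids Python-3-only syntax because python36 is only installed a few steps later. The minimum version `ASSUMED_MIN_CMAKE = (3, 4, 3)` is my assumption for LLVM 4.0.x, so check it against the `cmake_minimum_required` line in LLVM's top-level `CMakeLists.txt`.

```python
import re
import subprocess

# Assumption: minimum CMake for LLVM 4.0.x. Verify against llvm's CMakeLists.txt.
ASSUMED_MIN_CMAKE = (3, 4, 3)


def parse_cmake_version(text):
    """Turn the first line of `cmake --version` into a tuple like (3, 8, 2)."""
    m = re.search(r"cmake version (\d+(?:\.\d+)*)", text)
    if m is None:
        raise ValueError("no 'cmake version X.Y.Z' line in: %r" % text)
    return tuple(int(part) for part in m.group(1).split("."))


def installed_cmake_version(cmake="cmake"):
    """Run `cmake --version` and parse it (bytes on Python 3, str on Python 2)."""
    out = subprocess.check_output([cmake, "--version"])
    return parse_cmake_version(out.decode("utf-8", "replace"))


def cmake_is_new_enough(minimum=ASSUMED_MIN_CMAKE, cmake="cmake"):
    """Tuple comparison: (2, 8, 12, 2) < (3, 4, 3) <= (3, 8, 2)."""
    return installed_cmake_version(cmake) >= minimum
```

Calling `cmake_is_new_enough()` with the system CMake on the PATH tells you whether CMake has to be built first. After building CMake 3.8.2, calling it with `cmake="/path/to/new/cmake"` should confirm the new binary is picked up (the path is a placeholder).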
 % wget https://cmake.org/files/v3.8/cmake-3.8.2.tar.gz
 % wget http://releases.llvm.org/4.0.1/llvm-4.0.1.src.tar.xz
 % wget http://releases.llvm.org/4.0.1/cfe-4.0.1.src.tar.xz
 % tar xvf cfe-4.0.1.src.tar.xz
 % tar xvf llvm-4.0.1.src.tar.xz
 % mv cfe-4.0.1.src llvm-4.0.1.src/tools/clang
 % cd llvm-4.0.1.src
 % mkdir build; cd build
 % cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$HOME/llvm-4.0.1 ../
 % make -j8 && make install
 % export PATH=$HOME/llvm-4.0.1/bin:$PATH
 % sudo yum install python36 python36-devel python36-pip
 % sudo pip3 install numpy decorator
With that prepared:
 % git clone --recursive https://github.com/dmlc/tvm
 % cd tvm
 % git submodule init
 % git submodule update
 % mkdir build
 % cp cmake/config.cmake build
 % cd build
In config.cmake, change
 set(USE_LLVM OFF)
 set(USE_SDACCEL OFF)
 set(USE_OPENCL OFF)
to
 set(USE_LLVM ON)
 set(USE_SDACCEL ON)
 set(USE_OPENCL ON)
and then run
 % cmake ..
 % make -j8
Once that finishes,
 % export TVM_HOME=$HOME/tvm
 % export PYTHONPATH=$TVM_HOME/python:$TVM_HOME/topi/python:$TVM_HOME/nnvm/python:${PYTHONPATH}
and TVM is ready to use.
Set up the emulation environment with
 % emconfigutil --platform ${AWS_PLATFORM} --nd 1
 % sudo cp emconfig.json $(dirname $(which python))
Prepare build.py and run.py, then run
 % export XCL_EMULATION_MODE=1
 % export XCL_TARGET=sw_emu
 % python3 build.py
and it fails with
 TypeError: string argument without an encoding
In $TVM_HOME/python/tvm/contrib/sdaccel.py, change
 out_file.write(bytes(code))
to
 out_file.write(bytes(code, 'UTF-8'))
and run
 % python3 build.py
again. This produces myadd.so and related files, so run it with
 % export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH
 % python3 run.py
 [centos@ip-172-31-23-90 tvm-test]$ python3 run.py
 ERROR: xclProbe-scan failed at fpga_pci_get_all_slot_specs
 xclProbe found 0 FPGA slots with xocl driver running
 ERROR: [SDx-EM 08] Please set XCL_EMULATION_MODE to "hw_emu" to run hardware emulation.
 ERROR: [SDx-EM 09] Please set XCL_EMULATION_MODE to "sw_emu" to run software emulation.
 ERROR: No devices found
 [03:43:37] /home/centos/tvm/src/runtime/opencl/opencl_device_api.cc:263: No OpenCL platform matched given existing options ...
 [03:43:37] /home/centos/tvm/src/runtime/opencl/opencl_device_api.cc:263: No OpenCL platform matched given existing options ...
 Traceback (most recent call last):
   File "run.py", line 17, in <module>
     a = tvm.nd.array(np.random.uniform(size=n).astype("float32"), ctx)
   File "/home/centos/tvm/python/tvm/ndarray.py", line 214, in array
     return empty(arr.shape, arr.dtype, ctx).copyfrom(arr)
   File "/home/centos/tvm/python/tvm/_ffi/ndarray.py", line 132, in empty
     ctypes.byref(handle)))
   File "/home/centos/tvm/python/tvm/_ffi/base.py", line 314, in check_call
     raise get_last_ffi_error()
 tvm._ffi.base.TVMError: Traceback (most recent call last):
   [bt] (2) /home/centos/tvm/build/libtvm.so(TVMArrayAlloc+0x9c) [0x7f743f29588c]
   [bt] (1) /home/centos/tvm/build/libtvm.so(tvm::runtime::NDArray::Empty(std::vector >, DLDataType, DLContext)+0x1b8) [0x7f743f295758]
   [bt] (0) /home/centos/tvm/build/libtvm.so(+0xec74c6) [0x7f743f2e74c6]
   File "/home/centos/tvm/src/runtime/opencl/opencl_device_api.cc", line 123
 TVMError: Check failed: context != nullptr: No OpenCL device
That is the error I got. As the ERROR lines say, I set
 % export XCL_EMULATION_MODE=sw_emu
and ran it again.
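The first run failed because `XCL_EMULATION_MODE` was set to `1`, while the runtime messages ask for `sw_emu` or `hw_emu`. A small guard called before `run.py` does any work would catch that kind of mistake early. This is a hypothetical helper, not part of TVM or SDAccel. The allowed combinations are simply the ones used in these notes: both variables `sw_emu`, both `hw_emu`, or `XCL_EMULATION_MODE` unset with `XCL_TARGET=hw` for the real FPGA.

```python
import os

# (XCL_EMULATION_MODE, XCL_TARGET) pairs used in these notes; None means "unset".
# Assumption: this table reflects these notes only, not an official list.
KNOWN_SETUPS = {
    ("sw_emu", "sw_emu"): "software emulation",
    ("hw_emu", "hw_emu"): "hardware emulation",
    (None, "hw"): "real FPGA",
}


def check_xcl_env(env=None):
    """Describe the configured run, or raise ValueError for a bad combination."""
    env = os.environ if env is None else env
    mode = env.get("XCL_EMULATION_MODE")
    target = env.get("XCL_TARGET")
    # The [SDx-EM 08]/[SDx-EM 09] messages only mention these two values.
    if mode is not None and mode not in ("sw_emu", "hw_emu"):
        raise ValueError("XCL_EMULATION_MODE=%r: use 'sw_emu' or 'hw_emu', "
                         "or unset it for the real FPGA" % mode)
    setup = KNOWN_SETUPS.get((mode, target))
    if setup is None:
        raise ValueError("unexpected combination XCL_EMULATION_MODE=%r, "
                         "XCL_TARGET=%r" % (mode, target))
    return setup
```

Calling `check_xcl_env()` at the top of run.py would have stopped the `XCL_EMULATION_MODE=1` attempt with a readable message instead of an OpenCL traceback.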
 [centos@ip-172-31-23-90 tvm-test]$ python3 run.py
 ERROR: xclProbe-scan failed at fpga_pci_get_all_slot_specs
 xclProbe found 0 FPGA slots with xocl driver running
 ERROR: device::load_binary binary target=Bin, no Hw HAL handle
 Traceback (most recent call last):
   File "run.py", line 21, in <module>
     fadd(a, b, c)
   File "/home/centos/tvm/python/tvm/_ffi/function.py", line 153, in __call__
     return f(*args)
   File "/home/centos/tvm/python/tvm/_ffi/_ctypes/function.py", line 209, in __call__
     raise get_last_ffi_error()
 tvm._ffi.base.TVMError: Traceback (most recent call last):
   [bt] (4) /home/centos/tvm/build/libtvm.so(TVMFuncCall+0x46) [0x7f8d3998ac76]
   [bt] (3) /home/centos/tvm/build/libtvm.so(+0xed1998) [0x7f8d399fb998]
   [bt] (2) /home/centos/tvm/build/libtvm.so(+0xed159a) [0x7f8d399fb59a]
   [bt] (1) /home/centos/tvm/build/libtvm.so(+0xecd99f) [0x7f8d399f799f]
   [bt] (0) /home/centos/tvm/build/libtvm.so(+0x722392) [0x7f8d3924c392]
   File "/home/centos/tvm/src/runtime/opencl/opencl_module.cc", line 219
   File "/home/centos/tvm/src/runtime/module_util.cc", line 73
 TVMError: Check failed: ret == 0 (-1 vs. 0) : Check failed: err == CL_SUCCESS: OpenCL Error, code=-44: CL_INVALID_PROGRAM
 [centos@ip-172-31-23-90 tvm-test]$
It looks as if this flow is meant to work only on a machine with an FPGA installed. For now, I at least ran the synthesis:
 % unset XCL_EMULATION_MODE
 % export XCL_TARGET=hw
 % python3 build.py
The same kind of error still appears at the end, but the xclbin was synthesized. Once that finished, I kicked off creation of the AWS-F1 image with
 % $SDACCEL_DIR/tools/create_sdaccel_afi.sh \
     -xclbin=myadd.xclbin \
     -o=myadd \
     -s3_bucket=[bucket name] \
     -s3_dcp_key=[folder for the DCP] \
     -s3_logs_key=[folder for the logs]
Check the FpgaImageId with
 % cat *_afi_id.txt
and then run
 % aws ec2 describe-fpga-images --fpga-image-ids [FpgaImageId]
The image is ready once State changes from pending to available.
::Trying on an AWS-F1 instance
I launched an AWS-F1 instance, copied everything (tvm, llvm, and the rest) over from the c4 instance, and prepared the environment with
 % sudo -s
 % source $AWS_FPGA_REPO_DIR/sdaccel_setup.sh
 % export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH
 % export TVM_HOME=/home/centos/tvm
 % export PYTHONPATH=$TVM_HOME/python:$TVM_HOME/topi/python:$TVM_HOME/nnvm/python:${PYTHONPATH}
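Before anything can run on real F1 hardware, the AFI created above has to reach `available`. The wait can be scripted instead of rerunning `aws ec2 describe-fpga-images` by hand. In this sketch the describe call is passed in as a parameter, so the polling logic can be tested without AWS. It assumes the response has the `DescribeFpgaImages` shape (`FpgaImages[i]["FpgaImageId"]` and `FpgaImages[i]["State"]["Code"]`). `wait_for_afi` and `afi_state` are hypothetical helper names.

```python
import time


def afi_state(response, afi_id):
    """Return State.Code for afi_id from a DescribeFpgaImages-style response."""
    for image in response.get("FpgaImages", []):
        if image.get("FpgaImageId") == afi_id:
            return image["State"]["Code"]
    raise KeyError("%s not found in the response" % afi_id)


def wait_for_afi(describe, afi_id, interval=60.0, timeout=4 * 3600.0,
                 sleep=time.sleep, clock=time.time):
    """Poll until the AFI leaves 'pending' and return its final state code.

    describe(afi_id) must return a dict shaped like the DescribeFpgaImages
    response. sleep and clock are injectable so the loop can be tested.
    """
    deadline = clock() + timeout
    while True:
        state = afi_state(describe(afi_id), afi_id)
        if state != "pending":
            return state  # 'available' on success; otherwise check the S3 logs
        if clock() >= deadline:
            raise RuntimeError("%s still pending after %.0f s" % (afi_id, timeout))
        sleep(interval)
```

With boto3 installed, one option is `ec2 = boto3.client("ec2")` followed by `wait_for_afi(lambda i: ec2.describe_fpga_images(FpgaImageIds=[i]), afi_id)`, where `afi_id` is the FpgaImageId shown by `cat *_afi_id.txt`.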
 % export XCL_EMULATION_MODE=sw_emu
 % export XCL_TARGET=sw_emu
 % python3 build.py
 % python3 run.py
gives
 [root@ip-172-31-62-53 tvm-test]# python3 run.py
 xclProbe found 1 FPGA slots with xocl driver running
 [0.5068266  0.1325183  0.9167701  ... 0.46502367 0.02036605 0.5523464 ]
 [0.6834595  0.8389502  0.16160455 ... 0.9921764  0.5801108  0.86317337]
 [0. 0. 0. ... 0. 0. 0.]
 [1.1902862  0.9714685  1.0783746  ... 1.4572     0.60047686 1.4155197 ]
 [root@ip-172-31-62-53 tvm-test]#
so the computation worked (I added a few print calls to show the output).
 % export XCL_EMULATION_MODE=hw_emu
 % export XCL_TARGET=hw_emu
 % python3 build.py
 % python3 run.py
gives
 [root@ip-172-31-62-53 tvm-test]# python3 run.py
 xclProbe found 1 FPGA slots with xocl driver running
 [0.64612037 0.24518912 0.6705971  ... 0.75197536 0.02399846 0.12009709]
 [0.8558938  0.9514521  0.5152762  ... 0.3747665  0.5249482  0.58834535]
 [0. 0. 0. ... 0. 0. 0.]
 INFO: [SDx-EM 01] Hardware emulation runs simulation underneath. Using a large data set will result in long simulation times. It is recommended that a small dataset is used for faster execution. This flow does not use cycle accurate models and hence the performance data generated is approximate.
 [1.5020142  1.1966412  1.1858733  ... 1.1267419  0.5489466  0.70844245]
 INFO: [SDx-EM 22] [Wall clock time: 04:39, Emulation time: 0.0579385 ms] Data transfer between kernel(s) and global memory(s)
 myadd_kernel0_1:m_axi_gmem-DDR          RD = 8.000 KB               WR = 4.000 KB
so this seems to have computed correctly as well.
Then, on the actual FPGA, I tried
 % unset XCL_EMULATION_MODE
 % export XCL_TARGET=hw
 % python3 run.py
 [root@ip-172-31-62-53 tvm-test]# python3 run.py
 xclProbe found 1 FPGA slots with xocl driver running
 xclAllocBO ERROR: AllocBO IOCTL failed
 ERROR: std::bad_alloc
 ERROR: Operation failed due to earlier error 'std::bad_alloc'
 Traceback (most recent call last):
   File "run.py", line 18, in <module>
     b = tvm.nd.array(np.random.uniform(size=n).astype("float32"), ctx)
   File "/home/centos/tvm/python/tvm/ndarray.py", line 214, in array
     return empty(arr.shape, arr.dtype, ctx).copyfrom(arr)
   File "/home/centos/tvm/python/tvm/_ffi/ndarray.py", line 254, in copyfrom
     check_call(_LIB.TVMArrayCopyFromBytes(self.handle, data, nbytes))
   File "/home/centos/tvm/python/tvm/_ffi/base.py", line 314, in check_call
     raise get_last_ffi_error()
 tvm._ffi.base.TVMError: Traceback (most recent call last):
   [bt] (2) /home/centos/tvm/build/libtvm.so(TVMArrayCopyFromBytes+0x768) [0x7fe99b3b9368]
   [bt] (1) /home/centos/tvm/build/libtvm.so(+0xecb6f2) [0x7fe99b4106f2]
   [bt] (0) /home/centos/tvm/build/libtvm.so(+0x722392) [0x7fe99ac67392]
   File "/home/centos/tvm/src/runtime/opencl/opencl_device_api.cc", line 171
 TVMError: Check failed: e == CL_SUCCESS: OpenCL Error, code=-5: CL_OUT_OF_RESOURCES
and then it hung.
As a test, I also ran
 % python3 build.py
 % python3 run.py
on the F1 instance itself, but nothing changed. Too bad.
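The real-FPGA run failed, but the two emulation runs can still be sanity-checked from their logs. The script below is a minimal check that uses only the six elements of each array that the printout shows. It confirms that c ≈ a + b for both runs. It also redoes the arithmetic behind the hw_emu transfer report: reading 8 KB for the two inputs and writing 4 KB for the output works out to 1024 float32 elements per array, assuming KB here means 1024 bytes. The array length is not stated anywhere in these logs; it is inferred from that report.

```python
# Values copied from the run.py output above: the first three and last three
# elements of a, b and the result c ("..." in the printout hides the rest).
RUNS = {
    "sw_emu": {
        "a": [0.5068266, 0.1325183, 0.9167701, 0.46502367, 0.02036605, 0.5523464],
        "b": [0.6834595, 0.8389502, 0.16160455, 0.9921764, 0.5801108, 0.86317337],
        "c": [1.1902862, 0.9714685, 1.0783746, 1.4572, 0.60047686, 1.4155197],
    },
    "hw_emu": {
        "a": [0.64612037, 0.24518912, 0.6705971, 0.75197536, 0.02399846, 0.12009709],
        "b": [0.8558938, 0.9514521, 0.5152762, 0.3747665, 0.5249482, 0.58834535],
        "c": [1.5020142, 1.1966412, 1.1858733, 1.1267419, 0.5489466, 0.70844245],
    },
}


def max_abs_error(run):
    """Largest |a[i] + b[i] - c[i]| over the printed elements."""
    return max(abs(x + y - z) for x, y, z in zip(run["a"], run["b"], run["c"]))


def elements_per_array(kb, n_arrays, bytes_per_elem=4, bytes_per_kb=1024):
    """Infer array length from a transfer size (assumes float32 and 1 KB = 1024 B)."""
    total_bytes = int(round(kb * bytes_per_kb))
    return total_bytes // (n_arrays * bytes_per_elem)


for name, run in sorted(RUNS.items()):
    print("%s: max |a + b - c| = %.1e" % (name, max_abs_error(run)))
# hw_emu report: RD = 8.000 KB (a and b are read), WR = 4.000 KB (c is written)
print("elements read per input array:", elements_per_array(8.0, 2))   # → 1024
print("elements written to the output:", elements_per_array(4.0, 1))  # → 1024
```

Both error values come out well below float32 print precision, which is what you would expect if the kernel really computed the element-wise sum.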