- 追加された行はこのように表示されます。
- 削除された行は
このように表示されます。
!SeeDotで遊んでみる
::STM32F407 Discovery KitをArduinoで使う.
* Arduino IDEをインストール
* ボードマネージャのURLを追加 https://raw.githubusercontent.com/stm32duino/BoardManagerFiles/master/STM32/package_stm_index.json
* ツール → ボードマネージャ で,STM32 Coreをインストール
* ボードでDiscoveryを,Board part numberでSTM32F407G-DISC1を選択
* サンプルの 01.Basics→Blinkが動作するのを確認する(LD4がチカチカする)
::SeeDotを使う
仮想環境用意して,https://github.com/microsoft/EdgeML/tree/master/Tools/SeeDotに書いてある手順で実行する.
* venvを用意
python3 -m venv EdgeML
source ./EdgeML/bin/activate
* GitHubからcloneして環境準備
git clone https://github.com/Microsoft/EdgeML
cd EdgeML/tf/
pip install -r requirements-cpu.txt
pip install -e .
* usps10でProtoNNを学習
cd examples/ProtoNN
python fetch_usps.py
python process_usps.py
mkdir usps10/output
python protoNN_example.py --data-dir ./usps10 --projection-dim 25 --num-prototypes 55 --epochs 100 -sW 0.3 -o usps10/output
*Arudino向けにビルド
cd ../../../Tools/SeeDot
mkdir arduino
python SeeDot.py -a protonn --train ../../tf/examples/ProtoNN/usps10/train.npy --test ../../tf/examples/ProtoNN/usps10/test.npy --model ../../tf/examples/ProtoNN/usps10/output -o arduino
** こんな感じですすむ
(EdgeML) miyo@tama:% python SeeDot.py -a protonn --train ../../tf/examples/ProtoNN/usps10/train.npy --test ../../tf/examples/ProtoNN/usps10/test.npy --model ../../tf/examples/ProtoNN/usps10/output -o arduino
================================
Executing on protonn for Arduino
--------------------------------
Train file: ../../tf/examples/ProtoNN/usps10/train.npy
Test file: ../../tf/examples/ProtoNN/usps10/test.npy
Model directory: ../../tf/examples/ProtoNN/usps10/output
================================
-----------------------
Collecting profile data
-----------------------
Generating input files for float training dataset...done
Build...success
...
Accuracy is 89.985%
------------------------------
Generating code for arduino...
------------------------------
Generating input files for fixed testing dataset...done
Generating code...completed
Arduino sketch dumped in the folder arduino
** できたもの
(EdgeML) miyo@tama:% ls arduino
arduino.ino config.h input/ library.h model.h predict.cpp predict.h
::ボードとの接続
* UART2を使う.PA2がTX, PA3がRX
HardwareSerial Serial1(USART2);
** F407上のシリアル通信について - ~/.arduino15/packages/STM32/hardware/stm32/1.5.0/variants/DISCO_F407VG/PeripheralPins.c
* なんかでた
4596: Predicted label: 9; True label: 9; Correct prediction
4597: Predicted label: 9; True label: 9; Correct prediction
4598: Predicted label: 9; True label: 9; Correct prediction
4599: Predicted label: 9; True label: 9; Correct prediction
4600: Predicted label: 9; True label: 9; Correct prediction
------------------------
Average prediction time:
1380.68
------------------------
4601: Predicted label: 9; True label: 9; Correct prediction
4602: Predicted label: 9; True label: 9; Correct prediction
4603: Predicted label: 9; True label: 9; Correct prediction
* ソースコードによると,micros()を使って測定した値をイテレーション回数で割ってるみたい
** 入力データはflash上に用意されているデータ
** 1イテレーションあたり1380.68u秒で推論できてるよ,ってことみたい.
* Accuracy Modeにすると,シリアルから入力を受けつけて推論できるみたい
::メモ
* usps - https://www.kaggle.com/bistaumanga/usps-dataset
!TVM/AWS-F1で遊んでみた(失敗)
https://github.cfom/dmlc/tvm/blob/master/docs/deploy/aws_fpga.md をやってみる.
まだうまくいってない.
::やってみたこと
ビルド用にセットアップしたc4.4xlargeマシンにログインして,AWS-F1用の環境変数を
% source src/project_data/aws-fpga/sdaccel_setup.sh
% source ${XILINX_SDX}/settings64.sh
でセット.
TVMを,https://docs.tvm.ai/install/from_source.html を参考にビルドする.
LLVMがいるみたいなのでLLVM 4.0.1をビルド.CMakeが古いのでCMakeから...
% wget https://cmake.org/files/v3.8/cmake-3.8.2.tar.gz
% wget http://releases.llvm.org/4.0.1/llvm-4.0.1.src.tar.xz
% wget http://releases.llvm.org/4.0.1/cfe-4.0.1.src.tar.xz
% tar xvf cfe-4.0.1.src.tar.xz
% tar xvf llvm-4.0.1.src.tar.xz
% mv cfe-4.0.1.src llvm-4.0.1.src/tools/clang
% cd llvm-4.0.1.src
% mkdir build; cd build
% cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$HOME/llvm-4.0.1 ../
% make -j8 && make install
% export PATH=$HOME/llvm-4.0.1/bin:$PATH
% sudo yum install python36 python36-devel python36-pip
% sudo pip3 install numpy decorator
で準備してから
% git clone --recursive https://github.com/dmlc/tvm
% cd tvm
% git submodule init
% git submodule update
% mkdir build
% cp cmake/config.cmake build
% cd build
config.cmake の
set(USE_LLVM OFF)
set(USE_SDACCEL OFF)
set(USE_OPENCL OFF)
を
set(USE_LLVM ON)
set(USE_SDACCEL ON)
set(USE_OPENCL ON)
に変更して,
% cmake ..
% make -j8
おわったら,
% export TVM_HOME=$HOME/tvm
% export PYTHONPATH=$TVM_HOME/python:$TVM_HOME/topi/python:$TVM_HOME/nnvm/python:${PYTHONPATH}
で,利用の準備が完了.
エミュレーション環境の設定を
% emconfigutil --platform ${AWS_PLATFORM} --nd 1
% sudo cp emconfig.json $(dirname $(which python))
build.pyとrun.pyを用意して,
% export XCL_EMULATION_MODE=1
% export XCL_TARGET=sw_emu
% python3 build.py
と実行すると
TypeError: string argument without an encoding
とエラーが.
$TVM_HOME/python/tvm/contrib/sdaccel.py の
out_file.write(bytes(code))
を
out_file.write(bytes(code, 'UTF-8'))
に変更して,
python3 build.py
myadd.soとかができるので,
export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH
python3 run.py
で実行.
[centos@ip-172-31-23-90 tvm-test]$ python3 run.py
ERROR: xclProbe-scan failed at fpga_pci_get_all_slot_specs
xclProbe found 0 FPGA slots with xocl driver running
ERROR: [SDx-EM 08] Please set XCL_EMULATION_MODE to "hw_emu" to run hardware emulation.
ERROR: [SDx-EM 09] Please set XCL_EMULATION_MODE to "sw_emu" to run software emulation.
ERROR: No devices found
[03:43:37] /home/centos/tvm/src/runtime/opencl/opencl_device_api.cc:263: No OpenCL platform matched given existing options ...
[03:43:37] /home/centos/tvm/src/runtime/opencl/opencl_device_api.cc:263: No OpenCL platform matched given existing options ...
Traceback (most recent call last):
File "run.py", line 17, in <module>
a = tvm.nd.array(np.random.uniform(size=n).astype("float32"), ctx)
File "/home/centos/tvm/python/tvm/ndarray.py", line 214, in array
return empty(arr.shape, arr.dtype, ctx).copyfrom(arr)
File "/home/centos/tvm/python/tvm/_ffi/ndarray.py", line 132, in empty
ctypes.byref(handle)))
File "/home/centos/tvm/python/tvm/_ffi/base.py", line 314, in check_call
raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
[bt] (2) /home/centos/tvm/build/libtvm.so(TVMArrayAlloc+0x9c) [0x7f743f29588c]
[bt] (1) /home/centos/tvm/build/libtvm.so(tvm::runtime::NDArray::Empty(std::vector<long, std::allocator<long> >, DLDataType, DLContext)+0x1b8) [0x7f743f295758]
[bt] (0) /home/centos/tvm/build/libtvm.so(+0xec74c6) [0x7f743f2e74c6]
File "/home/centos/tvm/src/runtime/opencl/opencl_device_api.cc", line 123
TVMError: Check failed: context != nullptr: No OpenCL device
といわれる.ERRORの通り,
export XCL_EMULATION_MODE=sw_emu
として実行.
[centos@ip-172-31-23-90 tvm-test]$ python3 run.py
ERROR: xclProbe-scan failed at fpga_pci_get_all_slot_specs
xclProbe found 0 FPGA slots with xocl driver running
ERROR: device::load_binary binary target=Bin, no Hw HAL handle
Traceback (most recent call last):
File "run.py", line 21, in <module>
fadd(a, b, c)
File "/home/centos/tvm/python/tvm/_ffi/function.py", line 153, in __call__
return f(*args)
File "/home/centos/tvm/python/tvm/_ffi/_ctypes/function.py", line 209, in __call__
raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
[bt] (4) /home/centos/tvm/build/libtvm.so(TVMFuncCall+0x46) [0x7f8d3998ac76]
[bt] (3) /home/centos/tvm/build/libtvm.so(+0xed1998) [0x7f8d399fb998]
[bt] (2) /home/centos/tvm/build/libtvm.so(+0xed159a) [0x7f8d399fb59a]
[bt] (1) /home/centos/tvm/build/libtvm.so(+0xecd99f) [0x7f8d399f799f]
[bt] (0) /home/centos/tvm/build/libtvm.so(+0x722392) [0x7f8d3924c392]
File "/home/centos/tvm/src/runtime/opencl/opencl_module.cc", line 219
File "/home/centos/tvm/src/runtime/module_util.cc", line 73
TVMError: Check failed: ret == 0 (-1 vs. 0) : Check failed: err == CL_SUCCESS: OpenCL Error, code=-44: CL_INVALID_PROGRAM
[centos@ip-172-31-23-90 tvm-test]$
FPGAささってるマシンじゃないとだめな想定のようにみえる.とりあえず,合成だけでもしておく.
% unset XCL_EMULATION_MODE
% export XCL_TARGET=hw
% python3 build.py
とすると,最後に同様のエラーはでるけど,xclbinの合成はできた.おわったら
% $SDACCEL_DIR/tools/create_sdaccel_afi.sh \
-xclbin=myadd.xclbin \
-o=myadd \
-s3_bucket=[バケット名] \
-s3_dcp_key=[DCP保存フォルダ名] \
-s3_logs_key=[ログ保存フォルダ名]
で,AWS-F1用のイメージ作成処理をキック
% cat *_afi_id.txt
でFpgaImageIdを確認して,
% aws ec2 describe-fpga-images --fpga-image-ids [FpgaImageId]
で,Stateがpendingからavailableになったら完了.
::AWS-F1インスタンスでトライ
AWS-F1インスタンスを起動して,tvm,llvmなどの一切合切をc4インスタンスからコピーして
% sudo -s
% source $AWS_FPGA_REPO_DIR/sdaccel_setup.sh
% export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH
% export TVM_HOME=/home/centos/tvm
% export PYTHONPATH=$TVM_HOME/python:$TVM_HOME/topi/python:$TVM_HOME/nnvm/python:${PYTHONPATH}
と準備.
% export XCL_EMULATION_MODE=sw_emu
% export XCL_TARGET=sw_emu
% python3 build.py
% python3 run.py
とすると,
[root@ip-172-31-62-53 tvm-test]# python3 run.py
xclProbe found 1 FPGA slots with xocl driver running
[0.5068266 0.1325183 0.9167701 ... 0.46502367 0.02036605 0.5523464 ]
[0.6834595 0.8389502 0.16160455 ... 0.9921764 0.5801108 0.86317337]
[0. 0. 0. ... 0. 0. 0.]
[1.1902862 0.9714685 1.0783746 ... 1.4572 0.60047686 1.4155197 ]
[root@ip-172-31-62-53 tvm-test]#
と計算できた(出力用にprintを適当に追加した)
% export XCL_EMULATION_MODE=hw_emu
% export XCL_TARGET=hw_emu
% python3 build.py
% python3 run.py
では,
[root@ip-172-31-62-53 tvm-test]# python3 run.py
xclProbe found 1 FPGA slots with xocl driver running
[0.64612037 0.24518912 0.6705971 ... 0.75197536 0.02399846 0.12009709]
[0.8558938 0.9514521 0.5152762 ... 0.3747665 0.5249482 0.58834535]
[0. 0. 0. ... 0. 0. 0.]
INFO: [SDx-EM 01] Hardware emulation runs simulation underneath. Using a large data set will result in long simulation times. It is recommended that a small dataset is used for faster execution. This flow does not use cycle accurate models and hence the performance data generated is approximate.
[1.5020142 1.1966412 1.1858733 ... 1.1267419 0.5489466 0.70844245]
INFO: [SDx-EM 22] [Wall clock time: 04:39, Emulation time: 0.0579385 ms] Data transfer between kernel(s) and global memory(s)
myadd_kernel0_1:m_axi_gmem-DDR RD = 8.000 KB WR = 4.000 KB
と計算できたみたい.
FPGAでは,と,
% unset XCL_EMULATION_MODE
% export XCL_TARGET=hw
% python3 run.py
とやってみた.
[root@ip-172-31-62-53 tvm-test]# python3 run.py
xclProbe found 1 FPGA slots with xocl driver running
xclAllocBO ERROR: AllocBO IOCTL failed
ERROR: std::bad_alloc
ERROR: Operation failed due to earlier error 'std::bad_alloc'
Traceback (most recent call last):
File "run.py", line 18, in <module>
b = tvm.nd.array(np.random.uniform(size=n).astype("float32"), ctx)
File "/home/centos/tvm/python/tvm/ndarray.py", line 214, in array
return empty(arr.shape, arr.dtype, ctx).copyfrom(arr)
File "/home/centos/tvm/python/tvm/_ffi/ndarray.py", line 254, in copyfrom
check_call(_LIB.TVMArrayCopyFromBytes(self.handle, data, nbytes))
File "/home/centos/tvm/python/tvm/_ffi/base.py", line 314, in check_call
raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
[bt] (2) /home/centos/tvm/build/libtvm.so(TVMArrayCopyFromBytes+0x768) [0x7fe99b3b9368]
[bt] (1) /home/centos/tvm/build/libtvm.so(+0xecb6f2) [0x7fe99b4106f2]
[bt] (0) /home/centos/tvm/build/libtvm.so(+0x722392) [0x7fe99ac67392]
File "/home/centos/tvm/src/runtime/opencl/opencl_device_api.cc", line 171
TVMError: Check failed: e == CL_SUCCESS: OpenCL Error, code=-5: CL_OUT_OF_RESOURCES
で,固まってしまった.
ためしに,F1インスタンスでも
% python3 build.py
% python3 run.py
としてみたが,かわらず.残念.