Intel® Neural Compute Stick 2 (Intel® NCS2)

A plug-and-play development kit for AI inferencing

  • Build and scale on the Intel® Movidius™ Myriad™ X Vision Processing Unit (VPU) with excellent performance per watt and per dollar
  • Start developing quickly on Windows® 10, Ubuntu*, or macOS*
  • Develop with common frameworks and ready-to-use sample applications
  • Run with no dependency on cloud compute
  • Prototype on low-cost edge devices such as the Raspberry Pi* 3 and other ARM* host devices


Installing the NCS2 driver with OpenVINO

$ source /opt/intel/openvino_2022/setupvars.sh
$ /opt/intel/openvino_2022/install_dependencies/install_NCS_udev_rules.sh
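The `install_NCS_udev_rules.sh` script installs a udev rule that grants non-root access to the Myriad X USB device. A sketch of what such a rule looks like (the Movidius USB vendor ID `03e7` is real; the exact rules file shipped with your OpenVINO version may differ, so check your installation):

```
# /etc/udev/rules.d/97-myriad-usbboot.rules (illustrative; verify against your install)
SUBSYSTEM=="usb", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
```

After editing udev rules manually, reload them with `sudo udevadm control --reload-rules && sudo udevadm trigger`, then re-plug the stick.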

$ ./hello_query_device 
[ INFO ] OpenVINO Runtime version ......... 2022.1.0
[ INFO ] Build ........... 2022.1.0-7019-cdb9bec7210-releases/2022/1
[ INFO ] 
[ INFO ] Available devices: 
[ INFO ] CPU
[ INFO ]        SUPPORTED_PROPERTIES: 
[ INFO ]                Immutable: AVAILABLE_DEVICES : ""
[ INFO ]                Immutable: RANGE_FOR_ASYNC_INFER_REQUESTS : 1 1 1
[ INFO ]                Immutable: RANGE_FOR_STREAMS : 1 4
[ INFO ]                Immutable: FULL_DEVICE_NAME : Intel(R) Celeron(R) J6413 @ 1.80GHz
[ INFO ]                Immutable: OPTIMIZATION_CAPABILITIES : FP32 FP16 INT8 BIN EXPORT_IMPORT
[ INFO ]                Immutable: CACHE_DIR : ""
[ INFO ]                Mutable: NUM_STREAMS : 1
[ INFO ]                Mutable: AFFINITY : CORE
[ INFO ]                Mutable: INFERENCE_NUM_THREADS : 0
[ INFO ]                Mutable: PERF_COUNT : NO
[ INFO ]                Mutable: INFERENCE_PRECISION_HINT : f32
[ INFO ]                Mutable: PERFORMANCE_HINT : ""
[ INFO ]                Mutable: PERFORMANCE_HINT_NUM_REQUESTS : 0
[ INFO ] 
[ INFO ] GNA
[ INFO ]        SUPPORTED_PROPERTIES: 
[ INFO ]                Immutable: AVAILABLE_DEVICES : GNA_SW
[ INFO ]                Immutable: OPTIMAL_NUMBER_OF_INFER_REQUESTS : 1
[ INFO ]                Immutable: RANGE_FOR_ASYNC_INFER_REQUESTS : 1 1 1
[ INFO ]                Immutable: OPTIMIZATION_CAPABILITIES : INT16 INT8 EXPORT_IMPORT
[ INFO ]                Immutable: FULL_DEVICE_NAME : GNA_SW
[ INFO ]                Immutable: GNA_LIBRARY_FULL_VERSION : 3.0.0.1455
[ INFO ]                Mutable: GNA_SCALE_FACTOR_PER_INPUT : ""
[ INFO ]                Mutable: GNA_FIRMWARE_MODEL_IMAGE : ""
[ INFO ]                Mutable: GNA_DEVICE_MODE : GNA_SW_EXACT
[ INFO ]                Mutable: GNA_HW_EXECUTION_TARGET : UNDEFINED
[ INFO ]                Mutable: GNA_HW_COMPILE_TARGET : UNDEFINED
[ INFO ]                Mutable: GNA_PWL_DESIGN_ALGORITHM : UNDEFINED
[ INFO ]                Mutable: GNA_PWL_MAX_ERROR_PERCENT : 1.000000
[ INFO ]                Mutable: PERFORMANCE_HINT : ""
[ INFO ]                Mutable: INFERENCE_PRECISION_HINT : undefined
[ INFO ]                Mutable: PERFORMANCE_HINT_NUM_REQUESTS : 1
[ INFO ]                Mutable: LOG_LEVEL : LOG_NONE
[ INFO ] 
[ INFO ] GPU
[ INFO ]        SUPPORTED_PROPERTIES: 
[ INFO ]                Immutable: AVAILABLE_DEVICES : 0
[ INFO ]                Immutable: RANGE_FOR_ASYNC_INFER_REQUESTS : 1 2 1
[ INFO ]                Immutable: RANGE_FOR_STREAMS : 1 2
[ INFO ]                Immutable: OPTIMAL_BATCH_SIZE : 1
[ INFO ]                Immutable: MAX_BATCH_SIZE : 1
[ INFO ]                Immutable: FULL_DEVICE_NAME : Intel(R) UHD Graphics [0x4555] (iGPU)
[ INFO ]                Immutable: DEVICE_TYPE : integrated
[ INFO ]                Immutable: DEVICE_GOPS : f16 409.6 f32 204.8 i8 204.8 u8 204.8
[ INFO ]                Immutable: OPTIMIZATION_CAPABILITIES : FP32 BIN FP16
[ INFO ]                Immutable: GPU_DEVICE_TOTAL_MEM_SIZE : 3162546176
[ INFO ]                Immutable: GPU_UARCH_VERSION : 11.0.0
[ INFO ]                Immutable: GPU_EXECUTION_UNITS_COUNT : 16
[ INFO ]                Immutable: GPU_MEMORY_STATISTICS : ""
[ INFO ]                Mutable: PERF_COUNT : NO
[ INFO ]                Mutable: MODEL_PRIORITY : MEDIUM
[ INFO ]                Mutable: GPU_HOST_TASK_PRIORITY : MEDIUM
[ INFO ]                Mutable: GPU_QUEUE_PRIORITY : MEDIUM
[ INFO ]                Mutable: GPU_QUEUE_THROTTLE : MEDIUM
[ INFO ]                Mutable: GPU_ENABLE_LOOP_UNROLLING : YES
[ INFO ]                Mutable: CACHE_DIR : ""
[ INFO ]                Mutable: PERFORMANCE_HINT : ""
[ INFO ]                Mutable: COMPILATION_NUM_THREADS : 4
[ INFO ]                Mutable: NUM_STREAMS : 1
[ INFO ]                Mutable: PERFORMANCE_HINT_NUM_REQUESTS : 0
[ INFO ]                Mutable: DEVICE_ID : 0
[ INFO ] 
[ INFO ] MYRIAD
[ INFO ]        SUPPORTED_PROPERTIES: 
[ INFO ]                Mutable: AVAILABLE_DEVICES : 1.4-ma2480
[ INFO ]                Mutable: FULL_DEVICE_NAME : Intel Movidius Myriad X VPU
[ INFO ]                Mutable: OPTIMIZATION_CAPABILITIES : EXPORT_IMPORT FP16
[ INFO ]                Mutable: RANGE_FOR_ASYNC_INFER_REQUESTS : 3 6 1
[ INFO ]                Mutable: DEVICE_THERMAL : EMPTY VALUE
[ INFO ]                Mutable: DEVICE_ARCHITECTURE : MYRIAD
[ INFO ]                Mutable: NUM_STREAMS : AUTO
[ INFO ]                Mutable: PERFORMANCE_HINT : ""
[ INFO ]                Mutable: PERFORMANCE_HINT_NUM_REQUESTS : 0
[ INFO ]
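The MYRIAD entry in the listing above is the NCS2; its presence is the quickest sanity check that the driver works. If you want to check this in a script, the device names can be pulled out of the `hello_query_device` output mechanically — device lines are `[ INFO ] <NAME>` with nothing else, while property lines are indented further. A small sketch (the function name `parse_devices` is mine, not part of OpenVINO):

```python
import re

def parse_devices(log: str) -> list[str]:
    """Extract device names from hello_query_device output.

    A device line is exactly `[ INFO ] <NAME>`; property lines are
    indented past the single space, so they do not match.
    """
    devices = []
    seen_header = False
    for line in log.splitlines():
        if "Available devices" in line:
            seen_header = True
            continue
        if not seen_header:
            continue
        m = re.fullmatch(r"\[ INFO \] (\w+)\s*", line)
        if m:
            devices.append(m.group(1))
    return devices

sample = """\
[ INFO ] Available devices: 
[ INFO ] CPU
[ INFO ]        SUPPORTED_PROPERTIES: 
[ INFO ]                Immutable: AVAILABLE_DEVICES : ""
[ INFO ] 
[ INFO ] MYRIAD
[ INFO ]        SUPPORTED_PROPERTIES: 
"""
print(parse_devices(sample))  # → ['CPU', 'MYRIAD']
```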


Testing


Testing with my own yolov5s model produces an error

$ ./benchmark_app -m ~/model/model.xml -i ~/model/1.jpg -d "MULTI:MYRIAD,GPU,CPU"
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[ INFO ] For input  1 files were added. 
[Step 2/11] Loading Inference Engine
[ INFO ] OpenVINO: OpenVINO Runtime version ......... 2022.1.0
[ INFO ] Build ........... 2022.1.0-7019-cdb9bec7210-releases/2022/1
[ INFO ] 
[ INFO ] Device info: 
[ INFO ] CPU
[ INFO ] openvino_intel_cpu_plugin version ......... 2022.1.0
[ INFO ] Build ........... 2022.1.0-7019-cdb9bec7210-releases/2022/1
[ INFO ] 
[ INFO ] GPU
[ INFO ] Intel GPU plugin version ......... 2022.1.0
[ INFO ] Build ........... 2022.1.0-7019-cdb9bec7210-releases/2022/1
[ INFO ] 
[ INFO ] MULTI
[ INFO ] MultiDevicePlugin version ......... 2022.1.0
[ INFO ] Build ........... 2022.1.0-7019-cdb9bec7210-releases/2022/1
[ INFO ] 
[ INFO ] MYRIAD
[ INFO ] openvino_intel_myriad_plugin version ......... 2022.1.0
[ INFO ] Build ........... 2022.1.0-7019-cdb9bec7210-releases/2022/1
[ INFO ] 
[ INFO ] 
[Step 3/11] Setting device configuration
[ WARNING ] Performance hint was not explicitly specified in command line. Device(MYRIAD) performance hint will be set to THROUGHPUT.
[ WARNING ] Performance hint was not explicitly specified in command line. Device(GPU) performance hint will be set to THROUGHPUT.
[ WARNING ] GPU throttling is turned on. Multi-device execution with the CPU + GPU performs best with GPU throttling hint, which releases another CPU thread (that is otherwise used by the GPU driver for active polling).
[ WARNING ] Performance hint was not explicitly specified in command line. Device(CPU) performance hint will be set to THROUGHPUT.
[Step 4/11] Reading network files
[ INFO ] Loading network files
[ INFO ] Read network took 29.29 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Number of test configurations is calculated basing on number of input images
[ WARNING ] images: layout is not set explicitly, so it is defaulted to NCHW. It is STRONGLY recommended to set layout manually to avoid further issues.
[Step 6/11] Configuring input of the model
[ INFO ] Network batch size: 1
Network inputs:
    images (node: images) : u8 / [N,C,H,W]
Network outputs:
    output (node: output) : f32 / [...]
    712 (node: 712) : f32 / [...]
    732 (node: 732) : f32 / [...]
[Step 7/11] Loading the model to the device
W: [global] [    552256] [WatchdogThread] XLinkWriteDataWithTimeout:185 XLinkWriteDataWithTimeout is not fully supported yet. The XLinkWriteData method is called instead. Desired timeout = 12000
[ INFO ] Load network took 18661.76 ms
[Step 8/11] Setting optimal runtime parameters
[ INFO ] Device: MYRIAD
[ INFO ]   { NETWORK_NAME , torch-jit-export }
[ INFO ]   { OPTIMAL_NUMBER_OF_INFER_REQUESTS , 12 }
[ INFO ]   { MULTI_DEVICE_PRIORITIES , MYRIAD,GPU,CPU }
[ INFO ] Device: GPU
[ INFO ]   { NETWORK_NAME , torch-jit-export }
[ INFO ]   { OPTIMAL_NUMBER_OF_INFER_REQUESTS , 12 }
[ INFO ]   { MULTI_DEVICE_PRIORITIES , MYRIAD,GPU,CPU }
[ INFO ] Device: CPU
[ INFO ]   { NETWORK_NAME , torch-jit-export }
[ INFO ]   { OPTIMAL_NUMBER_OF_INFER_REQUESTS , 12 }
[ INFO ]   { MULTI_DEVICE_PRIORITIES , MYRIAD,GPU,CPU }
[Step 9/11] Creating infer requests and preparing input blobs with data
[ WARNING ] Some image input files will be duplicated: 12 files are required but only 1 are provided
[ WARNING ] Image is resized from (1280, 720) to (1280, 1280)
W: [global] [    553256] [WatchdogThread] XLinkWriteDataWithTimeout:185 XLinkWriteDataWithTimeout is not fully supported yet. The XLinkWriteData method is called instead. Desired timeout = 12000
...
[ INFO ] Test Config 0
[ INFO ] images  ([N,C,H,W], u8, {1, 3, 1280, 1280}, static):   /home/aibox/model/1.jpg
[Step 10/11] Measuring performance (Start inference asynchronously, 12 inference requests, limits: 60000 ms duration)
[ INFO ] BENCHMARK IS IN INFERENCE ONLY MODE.
[ INFO ] Input blobs will be filled once before performance measurements.
W: [global] [    558254] [WatchdogThread] XLinkWriteDataWithTimeout:185 XLinkWriteDataWithTimeout is not fully supported yet. The XLinkWriteData method is called instead. Desired timeout = 12000
[ INFO ] First inference took 856.61 ms
W: [global] [    559254] [WatchdogThread] XLinkWriteDataWithTimeout:185 XLinkWriteDataWithTimeout is not fully supported yet. The XLinkWriteData method is called instead. Desired timeout = 12000
...

The error: later verification suggests the problem lies in benchmark_app itself, not in the model.

Note: the NCS2 supports FP16 only.

Running on the NCS2 did not produce correct results.
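The FP16-only constraint matters for accuracy: a model exported in FP32 can overflow or lose precision when forced into half precision, which is one common source of wrong results on an FP16-only device. A quick illustration with NumPy (generic IEEE 754 half-precision behaviour, not NCS2-specific code):

```python
import numpy as np

# FP16 has a far narrower range and coarser precision than FP32:
# anything above ~65504 overflows to infinity, and values near 1000
# are only representable in steps of 0.5.
fp16 = np.finfo(np.float16)
print(fp16.max)              # largest finite FP16 value: 65504.0
print(np.float16(70000.0))   # overflows to inf
print(np.float16(1000.1))    # rounds to 1000.0
```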


Testing the examples from the official documentation

$ git clone --depth 1 https://github.com/openvinotoolkit/open_model_zoo
$ cd open_model_zoo/tools/model_tools
$ python3 -m pip install --upgrade pip
$ python3 -m pip install -r requirements.in
$ python3 downloader.py --name squeezenet1.1
$ python3 converter.py --name squeezenet1.1

$ ~/inference_engine_cpp_samples_build/intel64/Release/hello_classification ~/open_model_zoo/tools/model_tools/public/squeezenet1.1/FP32/squeezenet1.1.xml ~/model/1.jpg MYRIAD
[ INFO ] OpenVINO Runtime version ......... 2022.1.0
[ INFO ] Build ........... 2022.1.0-7019-cdb9bec7210-releases/2022/1
[ INFO ] 
[ INFO ] Loading model files: /home/aibox/open_model_zoo/tools/model_tools/public/squeezenet1.1/FP32/squeezenet1.1.xml
[ INFO ] model name: squeezenet1.1
[ INFO ]     inputs
[ INFO ]         input name: data
[ INFO ]         input type: f32
[ INFO ]         input shape: {1, 3, 227, 227}
[ INFO ]     outputs
[ INFO ]         output name: prob
[ INFO ]         output type: f32
[ INFO ]         output shape: {1, 1000, 1, 1}

Top 10 results:

Image /home/aibox/model/1.jpg

classid probability
------- -----------
619     0.2119141  
641     0.1959229  
922     0.1434326  
719     0.0561829  
415     0.0392151  
950     0.0392151  
692     0.0234222  
721     0.0191193  
988     0.0153503  
964     0.0146637
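The "Top 10 results" table printed by hello_classification is simply the ten largest entries of the 1000-element output probability vector. A minimal sketch of that post-processing step (the helper name `top_k` and the toy input are mine, for illustration only):

```python
import numpy as np

def top_k(probs: np.ndarray, k: int = 10) -> list[tuple[int, float]]:
    """Return (class_id, probability) pairs for the k largest entries,
    sorted in descending order of probability."""
    idx = np.argsort(probs)[::-1][:k]
    return [(int(i), float(probs[i])) for i in idx]

# Toy 1000-class probability vector (hypothetical, not real model output).
rng = np.random.default_rng(0)
logits = rng.normal(size=1000)
probs = np.exp(logits) / np.exp(logits).sum()  # softmax
for cid, p in top_k(probs, 3):
    print(f"{cid}\t{p:.7f}")
```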

$ python3 downloader.py --name mobilenet-v2
$ python3 converter.py --name mobilenet-v2

$ ~/inference_engine_cpp_samples_build/intel64/Release/hello_classification ~/open_model_zoo/tools/model_tools/public/mobilenet-v2/FP32/mobilenet-v2.xml ~/model/1.jpg MYRIAD
[ INFO ] OpenVINO Runtime version ......... 2022.1.0
[ INFO ] Build ........... 2022.1.0-7019-cdb9bec7210-releases/2022/1
[ INFO ] 
[ INFO ] Loading model files: /home/aibox/open_model_zoo/tools/model_tools/public/mobilenet-v2/FP32/mobilenet-v2.xml
[ INFO ] model name: MOBILENET_V2
[ INFO ]     inputs
[ INFO ]         input name: data
[ INFO ]         input type: f32
[ INFO ]         input shape: {1, 3, 224, 224}
[ INFO ]     outputs
[ INFO ]         output name: prob
[ INFO ]         output type: f32
[ INFO ]         output shape: {1, 1000, 1, 1}

Top 10 results:

Image /home/aibox/model/1.jpg

classid probability
------- -----------
725     0.0995483  
953     0.0653076  
954     0.0500793  
700     0.0444946  
729     0.0383606  
934     0.0344238  
926     0.0308380  
411     0.0287476  
618     0.0236359  
923     0.0223999


Reference: the official NCS2 documentation

Get Started with Intel® Neural Compute Stick 2

Note This article is based on the 2019 R1 release of the Intel® Distribution of OpenVINO™ toolkit.

The 2019 R1 release can no longer be downloaded; the material is outdated and of little practical reference value.

The openvinotoolkit download repository for the Raspberry Pi requires a proxy to reach; starting from release 2022.? a Raspbian build is no longer provided.