Visionworks openvx

[TOC]

Visionworks OpenVX

OpenVX

heterogeneous computation framework

Spec

OpenVX 1.2源碼解析 — 目錄結構

除了官方的參考實作外,下方是不同廠商的實作,有些有開放原始碼有些則是包裝程動態函式庫.

  1. Intel Computer Vision SDK
  2. AMD OVX : https://github.com/GPUOpen-ProfessionalCompute-Libraries/amdovx-core –>
  3. TI OVX:
  4. Nvidia Vision Works:

以上是有通過conformance test的廠商,另外ARM 也有類似的SDK(compute library)而且初期開發時在架構上也是參考OpenVX。

  1. ARM compute library:

雖然一開始OpenVX是針對電腦視覺運算設計的軟體框架,但由於類神經網路的編程模式(programming model)跟熱門程度讓Khronos OpenVX工作小組也特別訂定了Neural Network Extension使得OpenVX也加入了深度學習的戰場。

VisionWorks

NVIDIA VisionWorks toolkit is a software development package for computer vision (CV) and image processing. VisionWorks™ implements and extends the Khronos OpenVX standard, and it is optimized for CUDA-capable GPUs and SOCs enabling developers to realize CV applications on a scalable and flexible platform.

VisionWorks includes the following primitives:

IMAGE ARITHMETIC

FLOW & DEPTH

GEOMETRIC TRANSFORMS

FILTERS

FEATURES

ANALYSIS

OpenVX for us

Requirements

User defined processing

Yes. user node, base it on the Advanced Tiling Extensions (see the Intel’s Extensions to the OpenVX* API: Advanced Tiling chapter)

Support optimization of duplicate processing

ref:

optimization tips

Open source framework (if available)

OpenVino

Requirements

Software Requirements

A Windows build environment needs these components:

Get the Software

Your license includes the full version of the product. To access the toolkit:

  1. Make sure your system meets the minimum requirements listed on this page.
  2. Complete the registration form.
  3. Download the product.

Register & Download

AMD OpenVX

Features

Pre-requisites

Build Instructions

Build this project to generate AMD OpenVX library and RunVX executable.

Build using Visual Studio Professional 2013 on 64-bit Windows 10/8.1/7

Test

Download to C:\Users\test\Downloads

Build SW according to guidelines, especially

Demo

C:\Users\test\Downloads\amdovx-core-0.9-beta2\amdovx-core-0.9-beta2>runvx exa
mples\gdf\canny.gdf

***** VIDEOINPUT LIBRARY - 0.1995 - TFW07 *****

runvx.exe 0.9.7
OK: using AMD OpenVX 0.9.7
OK: enabled graph scheduling in separate threads
csv,HEADER ,STATUS, COUNT,cur-ms,avg-ms,min-ms,clenqueue-ms,clwait-ms,clwrite-ms
,clread-ms
OK: capturing 480x360 image(s) into 480x360 RGB image buffer
csv,OVERALL,  PASS,     1,      ,  8.60,  8.60,  0.00,  0.00,  0.00,  0.00 (medi
an 8.598)
> total elapsed time:   0.11 sec
Abort: Press any key to exit...

1571127470266

canny.gdf

# create input and output images
data input  = image:480,360,RGB2
data output = image:480,360,U008

# specify input source for input image and request for displaying input and output images
read input  examples/images/face1.jpg
view input  inputWindow
view output edgesWindow

# compute luma image channel from input RGB image
data yuv  = image-virtual:0,0,IYUV
data luma = image-virtual:0,0,U008
node org.khronos.openvx.color_convert input yuv
node org.khronos.openvx.channel_extract yuv !CHANNEL_Y luma

# compute edges in luma image using Canny edge detector
data hyst = threshold:RANGE,UINT8:INIT,80,100
data gradient_size = scalar:INT32,3
node org.khronos.openvx.canny_edge_detector luma hyst gradient_size !NORM_L1 output

graph TB
input --> |color_convert| yuv
yuv --> |channel_extract| luma
luma --> |merge| merged
hyst --> merged
gradient_size --> merged
merged --> |canny_edge_detector| output

runvx

usage

C:\Users\test\Downloads\amdovx-core-0.9-beta2\amdovx-core-0.9-beta2>runvx

***** VIDEOINPUT LIBRARY - 0.1995 - TFW07 *****

runvx.exe 0.9.7

Usage:
  runvx.exe [options] [file] <file.gdf> [argument(s)]
  runvx.exe [options] node <kernelName> [argument(s)]
  runvx.exe [options] shell [argument(s)]

The argument(s) are data objects created using <data-description> syntax.
These arguments can be accessed from inside GDF as $1, $2, etc.

The available command-line options are:
  -h
      Show full help.
  -v
      Turn on verbose logs.
  -root:<directory>
      Replace ~ in filenames with <directory> in the command-line and
      GDF file. The default value of '~' is current working directory.
  -frames:[<start>:]<end>|eof|live
      Run the graph/node for specified frames or until eof or just as live.
      Use live to indicate that input is live until aborted by user.
  -affinity:CPU|GPU[<device-index>]
      Set context affinity to CPU or GPU.
  -dump-profile
      Print performance profiling information after graph launch.
  -enable-profile
      use directive VX_DIRECTIVE_AMD_ENABLE_PROFILE_CAPTURE when graph is create
d
  -discard-compare-errors
      Continue graph processing even if compare mismatches occur.
  -disable-virtual
      Replace all virtual data types in GDF with non-virtual data types.
      Use of this flag (i.e. for debugging) can make a graph run slower.

dump profile

C:\Users\test\Downloads\amdovx-core-0.9-beta2\amdovx-core-0.9-beta2>runvx -du
mp-profile examples\gdf\canny.gdf

***** VIDEOINPUT LIBRARY - 0.1995 - TFW07 *****

runvx.exe 0.9.7
OK: using AMD OpenVX 0.9.7
OK: enabled graph scheduling in separate threads
csv,HEADER ,STATUS, COUNT,cur-ms,avg-ms,min-ms,clenqueue-ms,clwait-ms,clwrite-ms
,clread-ms
OK: capturing 480x360 image(s) into 480x360 RGB image buffer
csv,OVERALL,  PASS,     1,      ,  8.62,  8.62,  0.00,  0.00,  0.00,  0.00 (medi
an 8.621)
> total elapsed time:   0.07 sec
> graph profile:
 COUNT,tmp(ms),avg(ms),min(ms),max(ms),DEV,KERNEL
     1,  8.621,  8.621,  8.621,  8.621,CPU,GRAPH
     1,  1.196,  1.196,  1.196,  1.196,CPU,com.amd.openvx.ColorConvert_Y_RGB
     1,  4.905,  4.905,  4.905,  4.905,CPU,com.amd.openvx.CannySobel_U16_U8_3x3_
L1NORM
     1,  2.305,  2.305,  2.305,  2.305,CPU,com.amd.openvx.CannySuppThreshold_U8X
Y_U16_3x3
     1,  0.208,  0.208,  0.208,  0.208,CPU,com.amd.openvx.CannyEdgeTrace_U8_U8XY

Abort: Press any key to exit...

Test if CSE works

input

# create input and output images
data input  = image:480,360,RGB2
data output = image:480,360,U008
data output2 = image:480,360,U008

# specify input source for input image and request for displaying input and output images
read input  examples/images/face1.jpg
view input  inputWindow
view output edgesWindow

# compute luma image channel from input RGB image
data yuv  = image-virtual:0,0,IYUV
data yuv2  = image-virtual:0,0,IYUV
data luma = image-virtual:0,0,U008
data luma2 = image-virtual:0,0,U008
node org.khronos.openvx.color_convert input yuv
node org.khronos.openvx.color_convert input yuv2
node org.khronos.openvx.channel_extract yuv !CHANNEL_Y luma
node org.khronos.openvx.channel_extract yuv2 !CHANNEL_Y luma2

# compute edges in luma image using Canny edge detector
data hyst = threshold:RANGE,UINT8:INIT,80,100
data gradient_size = scalar:INT32,3
node org.khronos.openvx.canny_edge_detector luma hyst gradient_size !NORM_L1 output
node org.khronos.openvx.canny_edge_detector luma2 hyst gradient_size !NORM_L1 output2

Output

C:\Users\test\Downloads\amdovx-core-0.9-beta2\amdovx-core-0.9-beta2>runvx -du
mp-profile examples\gdf\canny.gdf

***** VIDEOINPUT LIBRARY - 0.1995 - TFW07 *****

runvx.exe 0.9.7
OK: using AMD OpenVX 0.9.7
OK: enabled graph scheduling in separate threads
csv,HEADER ,STATUS, COUNT,cur-ms,avg-ms,min-ms,clenqueue-ms,clwait-ms,clwrite-ms
,clread-ms
OK: capturing 480x360 image(s) into 480x360 RGB image buffer
csv,OVERALL,  PASS,     1,      , 17.13, 17.13,  0.00,  0.00,  0.00,  0.00 (medi
an 17.127)
> total elapsed time:   0.07 sec
> graph profile:
 COUNT,tmp(ms),avg(ms),min(ms),max(ms),DEV,KERNEL
     1, 17.127, 17.127, 17.127, 17.127,CPU,GRAPH
     1,  1.202,  1.202,  1.202,  1.202,CPU,com.amd.openvx.ColorConvert_Y_RGB
     1,  1.192,  1.192,  1.192,  1.192,CPU,com.amd.openvx.ColorConvert_Y_RGB
     1,  4.857,  4.857,  4.857,  4.857,CPU,com.amd.openvx.CannySobel_U16_U8_3x3_
L1NORM
     1,  4.838,  4.838,  4.838,  4.838,CPU,com.amd.openvx.CannySobel_U16_U8_3x3_
L1NORM
     1,  2.312,  2.312,  2.312,  2.312,CPU,com.amd.openvx.CannySuppThreshold_U8X
Y_U16_3x3
     1,  2.302,  2.302,  2.302,  2.302,CPU,com.amd.openvx.CannySuppThreshold_U8X
Y_U16_3x3
     1,  0.209,  0.209,  0.209,  0.209,CPU,com.amd.openvx.CannyEdgeTrace_U8_U8XY

     1,  0.207,  0.207,  0.207,  0.207,CPU,com.amd.openvx.CannyEdgeTrace_U8_U8XY

Abort: Press any key to exit...


Q: Why CSE not work?

TODO:

API

//vx_api.h
VX_API_ENTRY vx_graph VX_API_CALL vxCreateGraph(vx_context context);
VX_API_ENTRY vx_status VX_API_CALL vxVerifyGraph(vx_graph graph);
VX_API_ENTRY vx_status VX_API_CALL vxProcessGraph(vx_graph graph);
VX_API_ENTRY vx_image VX_API_CALL vxCreateVirtualImage(vx_graph graph, vx_uint32 width, vx_uint32 height, vx_df_image color);

//vx_node.h
VX_API_ENTRY vx_node VX_API_CALL vxColorConvertNode(vx_graph graph, vx_image input, vx_image output);

OpenCV G-API

Intro

G-API Intro

Features

API

//core.hpp
GAPI_EXPORTS GMat resize(const GMat& src, const Size& dsize, double fx = 0, double fy = 0, int interpolation = INTER_LINEAR);

//GComputation.hpp
class GComputation{
    ...
    GComputation(GProtoInputArgs &&ins,
                 GProtoOutputArgs &&outs);             // Arg-to-arg overload
	void apply(GRunArgs &&ins, GRunArgsP &&outs, GCompileArgs &&args = {});
...
}

implementation

of G-API apply function

GComputation -> GComputation2: apply
GComputation2 -> GCompiler: compile
GCompiler -> Graph: build graph
Graph --> GComputation2: return ade::Graph
GComputation2 -> Graph: exec the graph

ref:

Vision grab post processing

Study if OpenVINO or OpenCV supports

Lib CSE partially inputs
OpenVINO x x
AMDOVX x x
OpenCV G-API x x
Intel TBB x v
behavior: the ready nodes are called then exit
Code: C:\test\codes\lua\src\tbbtest\test_tbb_behavior.cpp
Tensorflow v  

TODO

Test if can be called multiples like following

while true
    modify input
    vxProcessGraph()

ref: http://projects.eees.dei.unibo.it/adrenaline/tutorial-02-execute-openvx-examples/

OpenVX讀書筆記

summary

  high level low level
ovx strong typed
eg VX_API_ENTRY vx_node VX_API_CALL vxColorConvertNode(vx_graph graph, vx_image input, vx_image output);
weak typed, eg
OpenVX.dll!agoCreateNode(_vx_graph * graph, int kernel_id)
tbb strong typed
make_edge(tbb::flow::output_port<1>(gpu_slm_split_n), tbb::flow::input_port<1>(gpu_slm_mat_mult_n))
tbb::flow::function_node< validation_args_type > mat_validation_n(g, tbb::flow::unlimited, {
// Get references to matrixes
const tbb::flow::gfx_buffer& GPU_SLM_MAT = std::get<0>(result);
const tbb::flow::gfx_buffer& CPU_SLM_MAT = std::get<1>(result);
const tbb::flow::gfx_buffer& CPU_NAIVE_MAT = std::get<2>(result);

// Verify results
// Check that slm algorithm produces correct results on CPU:
validate_mat("matrix multiply: 'SLM' CPU vs. CPU", SIZE_Y, SIZE_X, CPU_SLM_MAT.data(), CPU_NAIVE_MAT.data());
// Verify Gen results:
validate_mat("matrix multiply: SLM Gen vs. CPU", SIZE_Y, SIZE_X, GPU_SLM_MAT.data(), CPU_NAIVE_MAT.data());
});
Not sure
G-API strong typed TODO

// ovx: \vis_bep_12\C\Users\test\Downloads\amdovx-core-0.9-beta2\amdovx-core-0.9-beta2 // tbb: C:\Users\test\Downloads\tbb2017_20170604oss_win\tbb2017_20170604oss

How to register Kernel

Define a enum

VX_KERNEL_COLOR_CONVERT = VX_KERNEL_BASE(VX_ID_KHRONOS, VX_LIBRARY_KHR_BASE) + 0x1,

Registrtion

OVX_KERNEL_ENTRY( VX_KERNEL_COLOR_CONVERT         , ColorConvert, "color_convert",             AIN_AOUT,             ATYPE_II           , false ), 

the parameters meaning

#define OVX_KERNEL_ENTRY(kernel_id,name,kname,argCfg,argType,validRectReset) \
#define ATYPE_II                               { VX_TYPE_IMAGE, VX_TYPE_IMAGE }

Implement “DramaDivideNode” operation, it is used to select the best suited for this PC architecture

int agoDramaDivideNode(AgoNodeList * nodeList, AgoNode * anode)
{
	// save parameter list
	vx_uint32 paramCount = anode->paramCount;
	AgoData * paramList[AGO_MAX_PARAMS]; memcpy(paramList, anode->paramList, sizeof(paramList));
	// divide the node depending on the type
	int status = -1;
	switch (anode->akernel->id)
	{
		case VX_KERNEL_COLOR_CONVERT:
			status = agoDramaDivideColorConvertNode(nodeList, anode);
			break;

the function is called by optimize function

>	OpenVX.dll!agoCreateNode(_vx_graph * graph, int kernel_id) Line 2699	C++
 	OpenVX.dll!agoDramaDivideAppend(AgoNodeList * nodeList, _vx_node * anode, int new_kernel_id, _vx_reference * * paramList, unsigned int paramCount) Line 37	C++
 	OpenVX.dll!agoDramaDivideAppend(AgoNodeList * nodeList, _vx_node * anode, int new_kernel_id) Line 56	C++
 	OpenVX.dll!agoDramaDivideColorConvertNode(AgoNodeList * nodeList, _vx_node * anode) Line 244	C++
 	OpenVX.dll!agoDramaDivideNode(AgoNodeList * nodeList, _vx_node * anode) Line 1818	C++
 	OpenVX.dll!agoOptimizeDramaDivide(_vx_graph * agraph) Line 1962	C++
 	OpenVX.dll!agoOptimizeDrama(_vx_graph * agraph) Line 522	C++
 	OpenVX.dll!agoOptimizeGraph(_vx_graph * agraph) Line 209	C++
 	OpenVX.dll!vxVerifyGraph(_vx_graph * graph) Line 2450	C++
 	runvx.exe!CVxEngine::ProcessGraph(std::vector<char const *,std::allocator<char const *> > * graphNameList, unsigned __int64 beginIndex) Line 285	C++

How to schedule graph?

What optimization is done in optimize()?

Choose the best

Powered by Jekyll and Theme by solid

本站总访问量