CN110163116A

CN110163116A - Method for obtaining human pose by accelerating OpenPose inference

Info

Publication number: CN110163116A
Application number: CN201910347091.9A
Authority: CN
Inventors: 张德园; 王俊远; 石祥滨; 刘芳; 武卫东; 刘翠微; 李照奎; 吴杰宏; 毕静; 颜卓; 李浩文; 代海龙; 杨啸宇
Original assignee: Shenyang Aerospace University
Current assignee: Shenyang Tuwei Technology Co ltd
Priority date: 2019-04-26
Filing date: 2019-04-26
Publication date: 2019-08-23

Abstract

The invention discloses a method for obtaining human pose by accelerating OpenPose inference, comprising the following steps: S1: obtain, via OpenCV, an input video stream containing human pose information, and extract single-frame images; S2: process the single-frame image to obtain the input data for the optimized model; S3: reconstruct the model structure; S4: reduce the precision of the OpenPose model parameters; S5: obtain the optimization result; S6: obtain the output data; S7: obtain the human pose data. The method uses TensorRT to reconstruct the OpenPose network structure and to optimize the precision of the network parameters, yielding an optimized model with fast inference and accurate results. The optimized model can be used to obtain human pose data quickly, which lays a good foundation for deploying the model in practical applications.

Description

Method for obtaining human pose by accelerating OpenPose inference

Technical field

The present invention relates to the fields of computer science and deep learning, and specifically provides a method for obtaining human pose by accelerating OpenPose inference, which speeds up network inference by reconstructing the network structure of the deep model and reducing the precision of the model parameters.

Background art

Applications based on deep learning are currently growing explosively. Functions such as image recognition, speech recognition, natural language processing and image retrieval have become everyday tools, which in turn places higher demands on the inference efficiency and response speed of deep learning. Deep learning is divided into two parts, training and deployment. Training is generally performed offline and consumes a large amount of GPU resources; because its real-time requirements are relatively low, a comparatively large batch size can be used, typically 128 for a general training model, which makes full use of the GPU. Inference is different: it only requires a single forward pass, in which the input is passed through the neural network to obtain the prediction. Inference can also be deployed in many ways. It may be deployed in the cloud; for example, voice input on common mobile phones is currently still handled in the cloud: the speaker's voice is first sent to the cloud, and the cloud returns the data after processing. It may also be deployed on embedded devices, such as embedded cameras, unmanned aerial vehicles, robots or in-vehicle autonomous driving systems; such embedded and autonomous-driving applications are characterized by very high real-time requirements.

In the training stage, if the model is slow, a larger cluster and more machines can be used, with more data parallelism or even model parallelism. On the deployment side, however, the problem is more than cost. If the approach is not appropriate, even a very good GPU cannot meet the real-time requirements of inference: a model that is poorly built and not optimized may need two to three hundred milliseconds to complete a single inference, and therefore cannot be used on embedded devices with high real-time requirements.

Summary of the invention

In view of this, the object of the present invention is to provide a method for obtaining human pose by accelerating OpenPose inference, so as to solve the problem that the OpenPose model takes a long time for inference in actual deployment.

The method provided by the invention has low hardware requirements. By reconstructing the model, it speeds up OpenPose inference and provides a good foundation for applying OpenPose in real life.

The technical solution provided by the invention is a method for obtaining human pose by accelerating OpenPose inference, comprising the following steps:

S1: Obtain the video stream: obtain, via OpenCV, the input video stream containing human pose information, and extract a single-frame image, wherein the image is a 3-channel image in BGR format;

S2: Process the single-frame image: create a TensorRT data buffer, which is used to transfer data between the GPU and host memory; create an input array of size N × C × W × H, where N is the number of images fed to TensorRT at one time, and C, H and W are the number of channels, the image height and the image width of the input image, respectively; extract the data of each channel of the single-frame image and store the data of each channel in the input array in B, G, R order; pass the input array into the TensorRT data buffer as the input of the optimized model;

S3: Reconstruct the model structure: load the original OpenPose model to obtain its network structure, then use TensorRT to reconstruct the convolution, bias and activation layers in the network by merging them into a single layer;

S4: Reduce the data precision: use TensorRT to convert the single-precision (fp32) parameters of the OpenPose model into half-precision (fp16) parameters, obtaining the optimized OpenPose model;

S5: Obtain the optimization result: feed the buffered data obtained in S2 into the optimized OpenPose model obtained in S4; after inference by the optimized network, the optimization result is obtained, and the TensorRT data buffer is then updated with that result;

S6: Obtain the output data: copy the updated data in the data buffer from the GPU to host memory, create an array of the same size as the OpenPose network output as the output array, and store the data from the data buffer in the output array;

S7: Obtain the human pose data: process the output array using the human pose generation part of the OpenPose algorithm to obtain the human pose for subsequent human pose analysis.
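The channel reordering described in S2 can be sketched in plain Python as follows. This is a minimal illustration rather than the patent's implementation: the function name pack_bgr_planar is hypothetical, the frame is assumed to arrive as interleaved 8-bit BGR bytes in row-major order (the layout OpenCV uses for a continuous 3-channel image), and the planar buffer is assumed to be ordered batch, channel, row, column. The copy into the TensorRT data buffer itself is not shown.

```python
from array import array


def pack_bgr_planar(frames, height, width, channels=3):
    """Pack N interleaved BGR frames into one flat, channel-planar float32 buffer.

    Assumed output order: batch, channel (B, G, R), row, column.
    """
    plane = height * width
    out = array("f", [0.0]) * (len(frames) * channels * plane)
    for n, frame in enumerate(frames):
        if len(frame) != plane * channels:
            raise ValueError("frame size does not match height * width * channels")
        for c in range(channels):
            start = (n * channels + c) * plane
            # Every `channels`-th byte starting at offset c belongs to one colour plane.
            # list() is needed so the bytes are read as integers, not as raw floats.
            out[start:start + plane] = array("f", list(frame[c::channels]))
    return out


# Hypothetical 2 x 2 frame; each pixel is stored as (B, G, R).
frame = bytes([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
packed = pack_bgr_planar([frame], height=2, width=2)
```

After packing, all blue samples come first, then all green, then all red. With NumPy the same reordering of one frame is a transpose of the H × W × C array to C × H × W.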

The method provided by the invention speeds up network inference by changing the structure of the OpenPose network and reducing the precision of the network parameters, so that accurate human pose data are obtained faster. The input data are first converted into the TensorRT data format; the OpenPose model is then optimized with TensorRT; finally, the buffered data obtained after inference are processed with the method of the OpenPose algorithm to obtain the human pose data. The invention accelerates network inference through TensorRT. On the one hand, the optimized model occupies less space and is easier to deploy in real environments; on the other hand, the optimized model has lower hardware requirements, which can save considerable cost in large-scale deployment.

Detailed description of the embodiments

The present invention is further explained below with reference to specific embodiments, which do not limit the invention.

The present invention provides a method for obtaining human pose by accelerating OpenPose inference, comprising the following steps:

In ordinary deep learning, a convolution layer, a bias layer and an activation layer require three separate calls to the corresponding cuDNN interfaces, whereas TensorRT can merge certain network layers. Networks are becoming deeper on the one hand and wider on the other, and may perform several convolutions of the same size in parallel; these convolutions can in fact be merged and computed together. Take the Concat layer in OpenPose as an example: one branch of the network produces a matrix of size N × 38 × 45 × 80 and another branch produces a matrix of size N × 19 × 45 × 80, where N is the number of input images; the two are concatenated into one matrix of size N × 57 × 45 × 80. TensorRT can also merge these two layers directly, so that no concatenation operation needs to be defined in the network;
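The effect of merging convolution, bias and activation can be illustrated with a toy one-dimensional example. This is only a sketch of the idea, not TensorRT's implementation: all function names are hypothetical, the activation is assumed to be ReLU (the text says only "activation layer"), and the input values are arbitrary. The unfused version makes three passes and builds two intermediate lists; the fused version produces each output value in a single pass.

```python
def conv1d(signal, kernel):
    """Valid (no padding) 1-D cross-correlation, as used in convolution layers."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def add_bias(values, bias):
    return [v + bias for v in values]


def relu(values):
    return [max(0.0, v) for v in values]


def unfused(signal, kernel, bias):
    """Three separate layer calls, each materialising an intermediate result."""
    return relu(add_bias(conv1d(signal, kernel), bias))


def fused_conv_bias_relu(signal, kernel, bias):
    """One pass: accumulate, add the bias and apply ReLU per output element."""
    k = len(kernel)
    out = []
    for i in range(len(signal) - k + 1):
        acc = bias
        for j in range(k):
            acc += signal[i + j] * kernel[j]
        out.append(acc if acc > 0.0 else 0.0)
    return out


signal = [1.0, -2.0, 3.0, 0.5]
kernel = [2.0, -1.0]
bias = 0.5
```

Both versions compute the same values; the fused one simply avoids the intermediate buffers and extra passes, which is the kind of saving layer merging aims for.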

To guarantee the data precision of the model during training, single-precision fp32 data are used throughout network training. One drawback of high-precision data, however, is that inference then also requires a large amount of computation. Experiments show that inference at lower precision can still achieve good detection results. TensorRT is therefore used to convert the single-precision (fp32) parameters of the OpenPose model into half-precision (fp16) parameters, yielding the optimized OpenPose model. TensorRT can also invoke the Tensor Core units of the GPU to further speed up network inference;
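The storage and rounding effects of going from fp32 to fp16 can be seen with Python's standard struct module, whose "e" format is IEEE 754 half precision. This is a sketch only: the helper names and the weight values are hypothetical, and the code does not show how TensorRT performs the conversion.

```python
import struct


def to_fp32_bytes(weights):
    """Serialise weights as little-endian single precision (4 bytes each)."""
    return struct.pack(f"<{len(weights)}f", *weights)


def to_fp16_bytes(weights):
    """Serialise weights as little-endian half precision (2 bytes each).

    fp16 has a much smaller range than fp32: its largest finite value is 65504.
    """
    return struct.pack(f"<{len(weights)}e", *weights)


def from_fp16_bytes(blob):
    """Read half-precision values back as Python floats."""
    return list(struct.unpack(f"<{len(blob) // 2}e", blob))


weights = [0.5, -1.25, 0.1, 3.0]           # hypothetical parameter values
fp32_blob = to_fp32_bytes(weights)
fp16_blob = to_fp16_bytes(weights)
restored = from_fp16_bytes(fp16_blob)
max_rel_error = max(abs(r - w) / abs(w) for r, w in zip(restored, weights))
```

The half-precision blob is half the size of the single-precision one, and the relative rounding error of each value in fp16's normal range stays within 2^-11, which is consistent with the observation above that lower precision can still give good detection results.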

S6: Obtain the output data: copy the updated data in the data buffer from the GPU to host memory and create an array of the same size as the OpenPose network output as the output array; taking the example in S2, the array size here is N × 57 × 45 × 80; store the data from the data buffer in the output array;
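The size and indexing of the output array can be worked through for the N × 57 × 45 × 80 example with N = 1. The sketch below makes several assumptions: the buffer copied back from the GPU is a flat, row-major sequence, the last two dimensions are treated as height and width, and the helper names are hypothetical. A list of integers stands in for the real network output.

```python
OUT_SHAPE = (1, 57, 45, 80)      # N x C x H x W, from the example in the text
BRANCH_CHANNELS = (38, 19)       # channel counts of the two concatenated branches


def element_count(shape):
    """Number of values the output array must hold."""
    count = 1
    for dim in shape:
        count *= dim
    return count


def flat_index(shape, n, c, h, w):
    """Row-major offset of element (n, c, h, w) in the flat buffer."""
    _, channels, height, width = shape
    return ((n * channels + c) * height + h) * width + w


def channel_plane(buffer, shape, n, c):
    """Return the H * W values of one output channel of one image."""
    _, _, height, width = shape
    start = flat_index(shape, n, c, 0, 0)
    return buffer[start:start + height * width]


total = element_count(OUT_SHAPE)
device_output = list(range(total))          # stand-in for the copied-back data
output_array = list(device_output)          # host-side output array of equal size
third_channel = channel_plane(output_array, OUT_SHAPE, 0, 2)
```

For one image the output array holds 205,200 values, and each of the 57 channels is a contiguous block of 45 × 80 = 3,600 values, so S7 can read individual channel maps by offset.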

The method for obtaining human pose by accelerating OpenPose inference speeds up network inference by changing the structure of the OpenPose network and reducing the precision of the network parameters, so that accurate human pose data are obtained faster. The input data are first converted into the TensorRT data format; the OpenPose model is then optimized with TensorRT; finally, the buffered data obtained after inference are processed with the method of the OpenPose algorithm to obtain the human pose data. The invention accelerates network inference through TensorRT. On the one hand, the optimized model occupies less space and is easier to deploy in real environments; on the other hand, the optimized model has lower hardware requirements, which can save considerable cost in large-scale deployment.

The specific embodiments of the invention are described in a progressive manner; each embodiment emphasizes its differences from the others, and similar parts may be cross-referenced.

The embodiments of the invention have been described in detail above, but the invention is not limited to these embodiments; within the knowledge of a person of ordinary skill in the art, various changes may be made without departing from the purpose of the invention.

Claims

1. A method for obtaining human pose by accelerating OpenPose inference, characterized by comprising the following steps:

S1: Obtain the video stream: obtain, via OpenCV, the input video stream containing human pose information, and extract a single-frame image, wherein the image is a 3-channel image in BGR format;

S2: Process the single-frame image: create a TensorRT data buffer, which is used to transfer data between the GPU and host memory; create an input array of size N × C × W × H, where N is the number of images fed to TensorRT at one time, and C, H and W are the number of channels, the image height and the image width of the input image, respectively; extract the data of each channel of the single-frame image and store the data of each channel in the input array in B, G, R order; pass the input array into the TensorRT data buffer as the input of the optimized model;

S3: Reconstruct the model structure: load the original OpenPose model to obtain its network structure, then use TensorRT to reconstruct the convolution, bias and activation layers in the network by merging them into a single layer;

S4: Reduce the data precision: use TensorRT to convert the single-precision (fp32) parameters of the OpenPose model into half-precision (fp16) parameters, obtaining the optimized OpenPose model;

S5: Obtain the optimization result: feed the buffered data obtained in S2 into the optimized OpenPose model obtained in S4; after inference by the optimized network, the optimization result is obtained, and the TensorRT data buffer is then updated with that result;

S6: Obtain the output data: copy the updated data in the data buffer from the GPU to host memory, create an array of the same size as the OpenPose network output as the output array, and store the data from the data buffer in the output array;

S7: Obtain the human pose data: process the output array using the human pose generation part of the OpenPose algorithm to obtain the human pose for subsequent human pose analysis.