CN114612825B - Target detection method based on edge equipment - Google Patents

Target detection method based on edge equipment

Info

Publication number
CN114612825B
CN114612825B (granted publication of application CN202210230959.9A; earlier publication CN114612825A)
Authority
CN
China
Prior art keywords
deep learning
detection
target detection
model
learning frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210230959.9A
Other languages
Chinese (zh)
Other versions
CN114612825A (en)
Inventor
何臻力
索珈顺
王汝欣
何婧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunnan University YNU
Original Assignee
Yunnan University YNU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yunnan University YNU
Priority to CN202210230959.9A
Publication of CN114612825A
Application granted
Publication of CN114612825B
Legal status: Active
Anticipated expiration

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention discloses a target detection method based on edge devices. A target detection model is selected according to actual needs, the priorities of detection time, detection power consumption and detection precision are set, and an upper limit on the detection time is set. For the edge device requiring target detection, the deep learning frameworks supporting different model input resolutions are obtained to form a deep learning framework set, and the inference time and average power consumption of each framework are measured. The frameworks are then screened based on the detection time upper limit and the three performance priorities, the target detection model is optimally configured according to the screened framework, and after training the model is deployed on the edge device for target detection. By optimizing the deep learning framework of the target detection model on the edge device and comprehensively considering the detection time, detection power consumption and detection precision of the model, the invention improves the performance of target detection.

Description

Target detection method based on edge equipment
Technical Field
The invention belongs to the technical field of edge computing, and particularly relates to a target detection method based on edge devices.
Background
The scheme belongs to the field of edge computing. Currently, many AI algorithms are deployed on edge devices to reduce latency, save bandwidth, and protect privacy. Target detection algorithms are widely applied in fields such as industrial production, city monitoring, and pedestrian detection. A target detection method that performs well on low-cost edge devices is therefore of great value.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a target detection method based on edge devices, which improves the performance of target detection by optimizing the deep learning framework of the target detection model on the edge device.
In order to achieve the above object, the edge device-based object detection method of the present invention includes the steps of:
S1: selecting a target detection model according to actual requirements, and then setting the priorities of the three detection performance indexes of the target detection model, namely detection time, detection power consumption and detection precision, as well as the upper limit T of the detection time of the target detection model;
S2: for the edge device requiring target detection, first determining the deep learning frameworks supported by the device, and denoting the number of supported frameworks by N; for each framework, obtaining the model input resolutions supported by the target detection model when it runs on the edge device, and denoting the number of model input resolutions supported by the nth framework by M_n, n = 1, 2, …, N; the nth deep learning framework with the mth supported model input resolution is denoted f_{n,m}, m = 1, 2, …, M_n; all deep learning frameworks f_{n,m} form the deep learning framework set F; then obtaining, for each f_{n,m}, its inference time t_{n,m} and average power consumption w_{n,m} on the edge device;
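Step S2 requires measuring, for every framework/resolution pair f_{n,m}, an inference time t_{n,m} on the edge device. The patent does not specify a measurement procedure; the following Python sketch shows one common way to estimate the mean latency of any callable that performs a forward pass. The `infer` callable and the warm-up/run counts are illustrative assumptions, not part of the patent:

```python
import time
import statistics

def mean_inference_time(infer, sample, warmup=5, runs=30):
    """Estimate the average single-image inference time of one
    framework/resolution combination on the device it runs on.

    infer  -- any callable performing one forward pass (hypothetical stub)
    sample -- a pre-processed input at the chosen model input resolution
    """
    for _ in range(warmup):
        infer(sample)                      # discard warm-up runs (caches, lazy init)
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        infer(sample)
        times.append(time.perf_counter() - start)
    return statistics.mean(times)          # plays the role of t_{n,m}
```

The average power consumption w_{n,m} would be sampled from device-specific sensors (for example, the on-board power monitor of a Jetson board) while the same loop runs; that part is hardware-dependent and omitted here.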
S3: for the deep learning framework set F, deleting the frameworks whose inference time t_{n,m} exceeds the detection time upper limit T, obtaining the preliminarily screened framework set F′;
if the highest-priority detection performance index is the detection time, taking the framework with the smallest inference time in the set F′ as the framework to be used, and taking its corresponding model input resolution as the input resolution of the target detection model; if more than one such framework remains, continuing to screen by the priorities of detection precision and detection power consumption to obtain the optimal framework;
if the highest-priority detection performance index is the detection power consumption, taking the framework with the smallest average power consumption in the set F′ as the framework to be used, and taking its corresponding model input resolution as the input resolution of the target detection model; if more than one such framework remains, continuing to screen by the priorities of detection precision and detection time to obtain the optimal framework;
if the highest-priority detection performance index is the detection precision, taking the framework with the largest model input resolution in the set F′ as the framework to be used, and taking its corresponding model input resolution as the input resolution of the target detection model; if more than one such framework remains, continuing to screen by the priorities of detection time and detection power consumption to obtain the optimal framework;
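The screening procedure of step S3 can be sketched in a few lines of Python. This is an illustrative reading of the patent's rules, not code from the patent itself: each candidate framework f_{n,m} is a dict, the precision criterion is approximated by total input-pixel count (following the statement that a larger input resolution gives higher precision), and the function assumes at least one framework survives the time screen:

```python
def select_framework(frameworks, T, priority):
    """Screen frameworks per step S3.

    frameworks -- list of dicts with keys 'name', 'resolution' (w, h),
                  't' (inference time, s), 'w' (average power, W)
    T          -- detection-time upper limit
    priority   -- the three criteria ordered highest first,
                  e.g. ['time', 'power', 'accuracy']
    """
    # Preliminary screen: drop frameworks whose inference time exceeds T
    candidates = [f for f in frameworks if f['t'] <= T]
    key_fns = {
        'time':     lambda f: f['t'],   # smaller inference time is better
        'power':    lambda f: f['w'],   # smaller average power is better
        # precision proxy: larger input resolution -> higher precision
        'accuracy': lambda f: -f['resolution'][0] * f['resolution'][1],
    }
    for crit in priority:
        best = min(key_fns[crit](f) for f in candidates)
        candidates = [f for f in candidates if key_fns[crit](f) == best]
        if len(candidates) == 1:        # a unique optimum ends the screening
            break
    return candidates[0]
```

For example, with a 0.1 s time limit and priority order ['time', 'accuracy', 'power'], the framework with the smallest surviving inference time is chosen, and ties are broken by resolution and then power, exactly as the three cases in step S3 describe.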
S4: optimally configuring the target detection model with the deep learning framework and model input resolution obtained by screening in step S3, and then collecting training samples to train the target detection model;
S5: deploying the target detection model trained in step S4 on the edge device, and performing target detection on the video images acquired by the camera.
According to the target detection method based on edge devices of the present invention, a target detection model is selected according to actual requirements, and the priorities of detection time, detection power consumption and detection precision are set. For the edge device requiring target detection, the deep learning frameworks supporting different model input resolutions are obtained to form a deep learning framework set, and the inference time and average power consumption of each framework are measured. The frameworks are screened based on the detection time upper limit and the three performance priorities, the target detection model is optimally configured according to the screened framework, and after training the model is deployed on the edge device.
According to the invention, the deep learning framework of the target detection model on the edge equipment is optimized, and the detection time, the detection power consumption and the detection precision of the target detection model are comprehensively considered, so that the performance of target detection is improved.
Drawings
FIG. 1 is a flow chart of an embodiment of an edge device-based object detection method of the present invention;
FIG. 2 is a schematic diagram of a model input resolution preference supported by a deep learning framework.
Detailed Description
The following description of embodiments of the invention, taken in conjunction with the accompanying drawings, is provided so that those skilled in the art can better understand the invention. Note that in the description below, detailed descriptions of known functions and designs are omitted where they might obscure the substance of the invention.
Examples
Fig. 1 is a flowchart of an embodiment of an edge device-based object detection method of the present invention. As shown in fig. 1, the method for detecting an object based on an edge device of the present invention comprises the following specific steps:
S101: setting the target detection model:
selecting a target detection model according to actual requirements, and then setting the priority of three detection performance indexes of detection time, detection power consumption and detection precision of the target detection model and the upper limit T of the detection time of the target detection model;
S102: determining the deep learning frameworks supported by the edge device:
For the edge device requiring target detection, first determine the deep learning frameworks supported by the device, and denote the number of supported frameworks by N. For each framework, obtain the model input resolutions supported by the target detection model when it runs on the edge device, and denote the number of model input resolutions supported by the nth framework by M_n, n = 1, 2, …, N. The nth deep learning framework with the mth supported model input resolution is denoted f_{n,m}, m = 1, 2, …, M_n. All deep learning frameworks f_{n,m} form the deep learning framework set F. Then obtain, for each f_{n,m}, its inference time t_{n,m} and average power consumption w_{n,m} on the edge device.
The invention is directed at target detection methods based on deep learning, and before starting a deep learning project, selecting a suitable framework is very important, since a good choice can yield twice the result with half the effort. Therefore, before the edge device begins target detection, the invention first determines the deep learning frameworks supported by the device. Currently, the more popular deep learning frameworks are PaddlePaddle, TensorFlow, Caffe, Theano, MXNet, Torch, PyTorch, TensorRT 16-bit and TensorRT 32-bit.
When determining the model input resolutions supported by a deep learning framework, note that the resolution of the image acquired by the camera is generally greater than the model input resolution, so scaling is required. To make the scaled image retain as much effective information as possible, the model input resolutions supported by the framework can be primarily screened according to the resolution of the camera image: select the resolutions whose aspect ratio is closest to that of the camera image, that is, those for which the difference between the two aspect ratios is smaller than a preset threshold. FIG. 2 is a schematic diagram of the model input resolution preference supported by a deep learning framework. As shown in FIG. 2, assume the aspect ratio of the camera images is 4:3 and the model input resolutions supported by the framework are 416×416 (1:1), 416×320 (13:10) and 416×256 (13:8). When the model input resolution is 416×416, the scaled camera image is shown in FIG. 2(a): the image is padded above and below with solid color blocks, and its actual effective resolution is 416×312. When the model input resolution is 416×320, the scaled image is FIG. 2(b), and the actual effective resolution is again 416×312. When the model input resolution is 416×256, the scaled image is FIG. 2(c), and the effective resolution drops to 341×256. Assuming the target detection model is YOLOv3, the precision of the different inputs is measured on the VisDrone dataset. Table 1 gives the detection precision after scaling for the different model input resolutions in this embodiment.
Model input resolution   AP precision (%)   AP50 precision (%)   AP75 precision (%)
416×416                  8.00               18.39                6.02
416×320                  8.05               18.49                5.99
416×256                  7.65               17.59                5.63
Table 1
As shown in Table 1, there is little difference in precision between the 1:1 resolution 416×416 and the nearest-to-4:3 resolution 416×320, while precision drops when inferring at 416×256, whose aspect ratio is farther from 4:3. Moreover, the computation at resolution 416×320 is smaller than at 416×416. Therefore, choosing the model input resolution whose aspect ratio is closest to that of the camera image keeps the precision from dropping while reducing the model's computation, thereby speeding up inference.
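The aspect-ratio pre-screening and the effective-resolution bookkeeping described above can be reproduced in a few lines of Python. This is a sketch under the assumption of standard letterbox scaling (uniform scale plus solid-color padding); the 0.1 ratio threshold is an illustrative choice, since the patent only requires "a preset threshold":

```python
def effective_resolution(cam, model):
    """Effective pixels of the camera image after letterbox scaling into
    the model input; the remaining area is solid-color padding."""
    cw, ch = cam
    mw, mh = model
    scale = min(mw / cw, mh / ch)      # largest uniform scale that fits
    return (round(cw * scale), round(ch * scale))

def prescreen(cam, resolutions, threshold=0.1):
    """Keep the model input resolutions whose aspect ratio is within
    `threshold` of the camera's aspect ratio (the primary screen)."""
    cam_ratio = cam[0] / cam[1]
    return [r for r in resolutions
            if abs(r[0] / r[1] - cam_ratio) < threshold]
```

With a 4:3 camera (e.g. 1280×960), this reproduces the figures in the text: a 416×416 input leaves an effective 416×312 area, 416×320 also gives 416×312 with less padding and less computation, and 416×256 shrinks the effective image to 341×256.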
In this embodiment, the edge device is the GPU-based edge device Jetson Nano, and the supported deep learning frameworks include TensorFlow, PyTorch, TensorRT 16-bit and TensorRT 32-bit. Table 2 lists the evaluation metrics of the TensorFlow framework on the edge device Jetson Nano in this embodiment.
Table 2
Table 3 lists the evaluation metrics of the PyTorch framework on the edge device Jetson Nano in this embodiment.
Table 3
Table 4 lists the evaluation metrics of the TensorRT 16-bit framework on the edge device Jetson Nano in this embodiment.
Table 4
Table 5 lists the evaluation metrics of the TensorRT 32-bit framework on the edge device Jetson Nano in this embodiment.
Table 5
S103: determining an optimization mode of a target detection model:
In the invention, the target detection model is optimized in two respects, the inference framework and the input picture resolution, so that the model's performance comes as close as possible to expectations. The specific method is as follows:
For the deep learning framework set F, delete the frameworks whose inference time t_{n,m} exceeds the detection time upper limit T, obtaining the preliminarily screened framework set F′.
If the highest-priority detection performance index is the detection time, take the framework with the smallest inference time in the set F′ as the framework to be used, and take its corresponding model input resolution as the input resolution of the target detection model. If more than one such framework remains, continue screening by the priorities of detection precision and detection power consumption to obtain the optimal framework.
If the highest-priority detection performance index is the detection power consumption, take the framework with the smallest average power consumption in the set F′ as the framework to be used, and take its corresponding model input resolution as the input resolution of the target detection model. If more than one such framework remains, continue screening by the priorities of detection precision and detection time to obtain the optimal framework.
If the highest-priority detection performance index is the detection precision, take the framework with the largest model input resolution in the set F′ as the framework to be used, and take its corresponding model input resolution as the input resolution of the target detection model; this is because, in a neural network model, the greater the input resolution, the higher the detection precision. If more than one such framework remains, continue screening by the priorities of detection time and detection power consumption to obtain the optimal framework.
S104: optimizing the target detection model:
The target detection model is optimally configured according to the optimization mode determined in step S103, and training samples are then collected to train it.
S105: target detection:
The target detection model trained in step S104 is deployed on the edge device and performs target detection on the video images acquired by the camera.
Besides the edge device Jetson Nano used in this embodiment, experiments were also performed on the edge device Jetson Xavier NX; they show that the invention runs well on both edge devices.
While the foregoing describes illustrative embodiments of the present invention to facilitate understanding by those skilled in the art, the invention is not limited to the scope of these specific embodiments; various changes remain protected insofar as they fall within the spirit and scope of the invention as defined by the appended claims.

Claims (2)

1. An edge device-based target detection method is characterized by comprising the following steps:
S1: selecting a target detection model according to actual requirements, and then setting the priorities of the three detection performance indexes of the target detection model, namely detection time, detection power consumption and detection precision, as well as the upper limit T of the detection time of the target detection model;
S2: for the edge device requiring target detection, first determining the deep learning frameworks supported by the device, and denoting the number of supported frameworks by N; for each framework, obtaining the model input resolutions supported by the target detection model when it runs on the edge device, and denoting the number of model input resolutions supported by the nth framework by M_n, n = 1, 2, …, N; the nth deep learning framework with the mth supported model input resolution is denoted f_{n,m}, m = 1, 2, …, M_n; all deep learning frameworks f_{n,m} form the deep learning framework set F; then obtaining, for each f_{n,m}, its inference time t_{n,m} and average power consumption w_{n,m} on the edge device;
S3: for the deep learning framework set F, deleting the frameworks whose inference time t_{n,m} exceeds the detection time upper limit T, obtaining the preliminarily screened framework set F′;
if the highest-priority detection performance index is the detection time, taking the framework with the smallest inference time in the set F′ as the framework to be used, and taking its corresponding model input resolution as the input resolution of the target detection model; if more than one such framework remains, continuing to screen by the priorities of detection precision and detection power consumption to obtain the optimal framework;
if the highest-priority detection performance index is the detection power consumption, taking the framework with the smallest average power consumption in the set F′ as the framework to be used, and taking its corresponding model input resolution as the input resolution of the target detection model; if more than one such framework remains, continuing to screen by the priorities of detection precision and detection time to obtain the optimal framework;
if the highest-priority detection performance index is the detection precision, taking the framework with the largest model input resolution in the set F′ as the framework to be used, and taking its corresponding model input resolution as the input resolution of the target detection model; if more than one such framework remains, continuing to screen by the priorities of detection time and detection power consumption to obtain the optimal framework;
S4: optimally configuring the target detection model with the deep learning framework and model input resolution obtained by screening in step S3, and then collecting training samples to train the target detection model;
S5: deploying the target detection model trained in step S4 on the edge device, and performing target detection on the video images acquired by the camera.
2. The method according to claim 1, wherein the model input resolutions supported by the deep learning framework in step S1 are selected as those whose aspect ratio differs from the aspect ratio of the resolution of the camera-acquired image by less than a preset threshold.
CN202210230959.9A 2022-03-09 2022-03-09 Target detection method based on edge equipment Active CN114612825B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210230959.9A CN114612825B (en) 2022-03-09 2022-03-09 Target detection method based on edge equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210230959.9A CN114612825B (en) 2022-03-09 2022-03-09 Target detection method based on edge equipment

Publications (2)

Publication Number Publication Date
CN114612825A CN114612825A (en) 2022-06-10
CN114612825B true CN114612825B (en) 2024-03-19

Family

ID=81860641

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210230959.9A Active CN114612825B (en) 2022-03-09 2022-03-09 Target detection method based on edge equipment

Country Status (1)

Country Link
CN (1) CN114612825B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116055778B (en) * 2022-05-30 2023-11-21 荣耀终端有限公司 Video data processing method, electronic device and readable storage medium

Citations (7)

Publication number Priority date Publication date Assignee Title
CN112257500A (en) * 2020-09-16 2021-01-22 江苏方天电力技术有限公司 Intelligent image recognition system and method for power equipment based on cloud edge cooperation technology
AU2020103494A4 (en) * 2020-11-17 2021-01-28 China University Of Mining And Technology Handheld call detection method based on lightweight target detection network
CN113158794A (en) * 2021-03-16 2021-07-23 西安天和防务技术股份有限公司 Object detection method, edge device, and computer-readable storage medium
CN113159166A (en) * 2021-04-19 2021-07-23 国网山东省电力公司威海供电公司 Embedded image identification detection method, system, medium and equipment based on edge calculation
WO2021189507A1 (en) * 2020-03-24 2021-09-30 南京新一代人工智能研究院有限公司 Rotor unmanned aerial vehicle system for vehicle detection and tracking, and detection and tracking method
CN113610024A (en) * 2021-08-13 2021-11-05 天津大学 Multi-strategy deep learning remote sensing image small target detection method
WO2021238826A1 (en) * 2020-05-26 2021-12-02 苏宁易购集团股份有限公司 Method and apparatus for training instance segmentation model, and instance segmentation method

Non-Patent Citations (2)

Title
路艳巧; 孙翠英; 曹红卫; 闫红伟. Foreign-object detection method for power transmission equipment based on edge computing and deep learning. China Electric Power, (06), 31-37. *
陈佳林; 和青; 李云波; 潘志松. Park-area target detection method based on edge computing and deep learning. Electronic Technology & Software Engineering, 2020, (16), 152-154. *

Also Published As

Publication number Publication date
CN114612825A (en) 2022-06-10


Legal Events

Code: Description
PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant