CN116883691A - High-frame-rate multipath target detection method for edge equipment

High-frame-rate multipath target detection method for edge equipment

Info

Publication number
CN116883691A
CN116883691A
Authority
CN
China
Prior art keywords
data
component
target
target detection
bbox
Prior art date
Legal status
Granted
Application number
CN202311149598.6A
Other languages
Chinese (zh)
Other versions
CN116883691B (en)
Inventor
区英杰
梁柱
董万里
谭焯康
Current Assignee
Guangzhou Embedded Machine Tech Co ltd
Original Assignee
Guangzhou Embedded Machine Tech Co ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Embedded Machine Tech Co ltd
Priority to CN202311149598.6A
Publication of CN116883691A
Application granted
Publication of CN116883691B
Legal status: Active
Anticipated expiration

Classifications

    • G06V 10/56: Extraction of image or video features relating to colour
    • G06V 10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/82: Image or video recognition or understanding using neural networks
    • G06V 10/95: Hardware or software architectures specially adapted for image or video understanding, structured as a network, e.g. client-server architectures
    • G06V 20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 2201/07: Indexing scheme relating to image or video recognition or understanding; target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a high-frame-rate multipath target detection method for edge devices, comprising the following steps: performing a focus operation on the y component of scaled yuv420 image data; merging the focus-processed y component with the u and v components and feeding the result directly into a target detection model on the NPU, which outputs several tuples of data for each anchor; and post-processing the tuple data on the CPU to obtain the coordinate frames of detected targets. Because the method works directly on the yuv420 data produced by hardware encoding/decoding, the data volume to be accessed and the time consumed by hardware scaling are halved, and the computation and latency of color-space conversion are avoided, making the method easy to deploy on low-compute edge devices. Meanwhile, the focus operator is implemented with NEON instructions, overcoming the NPU's inefficiency on interleaved data and improving overall performance.

Description

High-frame-rate multipath target detection method for edge equipment
Technical Field
The invention relates to the field of pedestrian detection, and in particular to a high-frame-rate multipath target detection method for edge devices.
Background
Pedestrian detection has long been a research hotspot in intelligent video surveillance. It produces rectangular boxes around pedestrians in images and video frames and serves as the basis for many downstream applications, such as pedestrian tracking, crowd-flow counting and perimeter alarms; detection therefore generally needs to run fast enough that downstream applications have sufficient resources left to operate.
In general, pedestrian detection algorithms must be deployed on edge devices, whose computing power is limited while multiple video streams must be processed at once. The target detection algorithm therefore needs to complete detection quickly and efficiently, which traditional detection algorithms cannot achieve under such compute constraints.
For example, the technical scheme of publication number CN115147383A, titled "A lightweight YOLOv5 model-based insulator state rapid detection method", notes that because of the limited computing power and memory of mobile platforms, their GPU performance is far below that of a PC, at least ten times lower. To meet the requirements of mobile and embedded platforms, lightweight convolutional neural networks such as MobileNet and ShuffleNet have been proposed, and researchers have refined these lightweight models to strike a good balance between accuracy and speed. To lighten the model, the YOLOv5 structure must be adjusted accordingly: the C3 backbone of the original YOLOv5 has many parameters, a large memory footprint, high computational complexity and heavy demands on hardware. To address this, that patent replaces the YOLOv5 C3 backbone with a lightweight ShuffleNetV2-Stem network. ShuffleNetV2 is a lightweight network obtained by analyzing the defects of ShuffleNetV1 and MobileNetV2, offering both high accuracy and high speed. Integrating the lightweight ShuffleNetV2-Stem network into YOLOv5 yields a YOLOv5-ShuffleNetV2S model, completing the lightweight redesign. The YOLOv5-ShuffleNetV2S model reduces the parameter count and computation of the network while maintaining detection accuracy, improves detection speed, balances accuracy against speed, minimizes model size and lowers the model's hardware requirements.
This solution has the following drawbacks:
(1) Replacing the original CSP backbone of YOLOv5s with a ShuffleNetV2 backbone introduces interleaved data-rearrangement (channel shuffle) operations that many edge NPUs handle poorly, so the speed gains observed on a GPU do not translate into performance gains on an NPU.
(2) The solution still requires RGB input, so color-space conversion cannot be avoided; this adds processing time, and the scheme only guarantees real-time processing (30 fps) of a single video stream.
For another example, publication number CN115690642A, titled "A security detection method and system for a HiSilicon hardware platform", provides a security detection method and system in which direct image stretching is replaced by aspect-preserving scaling, improving the algorithm's detection rate; a Mosaic augmentation algorithm strengthens small targets, balances the samples, preserves more feature information in the data and enlarges the training sample set; and the Focus structure in YOLOv5 is modified so that model inference is faster, making YOLOv5 both less time-consuming and better at small-target detection than the YOLOv3 model. The YOLOv5 algorithm measured on a HiSilicon Hi3516DV300 takes about 30 ms per frame, meeting real-time requirements.
This solution has the following drawbacks:
The solution still requires RGB input, so color-space conversion cannot be avoided; this adds processing time, and the scheme only guarantees real-time processing (30 fps) of a single video stream.
Disclosure of Invention
The invention aims to overcome the defects and shortcomings of the prior art by providing a high-frame-rate multipath target detection method for edge devices, applicable to security scenarios and in particular to devices whose algorithms are deployed at the edge, such as the Huawei HiSilicon platform. Unlike other algorithms that require RGB data, this target detection algorithm uses only the yuv420 data produced by hardware encoding/decoding and never converts yuv420 to RGB, avoiding the computation and latency of color-space conversion and allowing deployment on low-compute edge devices. Because the algorithm uses only yuv data, the data volume to be accessed and the time consumed by hardware scaling are halved. Meanwhile, the algorithm implements the focus operator with NEON instructions, overcoming the NPU's inefficiency on interleaved data and improving overall performance.
The aim of the invention is achieved by the following technical scheme:
a high frame rate multipath target detection method for edge equipment comprises the following steps:
s1, edge equipment obtains h264 data streams transmitted by a plurality of network cameras and writes the h264 data streams into a memory;
s2, decoding the h264 data stream into original yuv420 image data through hardware coding and decoding resources on the edge equipment soc, and writing the original yuv420 image data into a memory;
s3, scaling the yuv420 image data to an input size conforming to a target detection model by using a hardware image scaling calculation unit on the edge equipment soc;
s4, splitting the scaled yuv420 image data into a y component representing brightness, a u component representing red chromaticity and a v component representing blue chromaticity;
s5, performing focus operation on the y component on the CPU to obtain a y component consistent with the height dimension and the width dimension of the u component and the v component; the u component and the v component remain unchanged;
since y and uv component dimensions of yuv420 are different, but the dimensions of each channel are required to be consistent in convolution operation, convolution operation in the target detection network cannot be directly performed, and normal convolution operation is required to be ensured by aligning the dimensions through focus operation.
And S6, merging the y component, the u component and the v component after the focus operation, directly inputting the merged y component, the u component and the v component into a target detection model on the NPU, outputting a plurality of tuple data corresponding to each anchor by the target detection model, and finally obtaining a coordinate frame of a detection target by post-processing the tuple data on the CPU.
The focus operation on the y component yields data of dimension (h/2, w/2, 4), where h is the image height and w is the image width; the u and v components remain unchanged, each with dimension (h/2, w/2, 1). Finally, y, u and v are concatenated in that order along the last dimension, so the data fed to the target detection model has dimension (h/2, w/2, 6); the merged data is passed to the NPU for neural-network computation.
The input head of the yolov5 target detection model is modified as follows: since the merged data after the focus operation has dimension (h/2, w/2, 6), the first convolution layer of the network must take 6 input channels instead of the original 3 (RGB).
The y component is subjected to focus operation, which is specifically as follows:
the input is the y component, the dimensions are (h, w, 1);
the output is:
y_focus=concat(y[0::2,0::2],y[1::2,0::2],
y[0::2,1::2],y[1::2,1::2], index=-1);
where concat(..., index=-1) denotes concatenation along the last dimension.
The focus operation on the scaled yuv420 image data is performed by a NEON computing unit.
The NEON computing unit performs the focus operation as follows: a vld2q_u8 instruction loads 256 bits of interleaved data into two 128-bit registers, say Q0 and Q1; the contents of Q0 and Q1 are then written back to the cache by vst1q_u8 instructions; repeating this operation de-interleaves the entire interleaved buffer and writes it into the corresponding memory. The interleaved data here is the y (luminance) component of the yuv420 image data.
In step S6, the post-processing uses NEON instructions to test in parallel whether the confidence in each tuple exceeds a threshold; only the tuples that pass are decoded into concrete box coordinates, which serve as the coordinate frames of the detected targets.
The post-processing loads the confidences of several tuples into a Q register with vld1q_u8 and then tests whether they exceed the threshold with a vcgtq_s8 instruction.
The target detection model is a yolov5 model whose output is as follows: for each anchor, 3 tuples [bbox_x, bbox_y, bbox_w, bbox_h, confidence, class0, class1, ...] of data type int8, where bbox_x is the abscissa of the target frame, bbox_y its ordinate, bbox_w its width, bbox_h its height, confidence its confidence score, and class0, class1, ... the target classes. Because the confidences of consecutive tuples are contiguous in memory, NEON can test the confidences of 16 tuples simultaneously, which significantly speeds up post-processing.
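For clarity, the logical contents of one such tuple can be sketched as a C++ struct (illustrative only; on the NPU the physical layout keeps the confidences of consecutive tuples contiguous, which is what the parallel NEON comparison exploits):

#include <cstdint>

// Logical view of one anchor tuple (field names from the description above).
struct AnchorTuple {
    int8_t bbox_x;          // abscissa (centre x) of the target frame
    int8_t bbox_y;          // ordinate (centre y) of the target frame
    int8_t bbox_w;          // width of the target frame
    int8_t bbox_h;          // height of the target frame
    int8_t confidence;      // objectness score, tested against the threshold
    int8_t class_scores[1]; // one score per class (a single pedestrian class here)
};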
The edge device obtains the h264 data streams transmitted by the network cameras via the RTSP protocol.
The NEON computing unit is built into the Cortex-A53 processor.
Meanwhile, the invention provides:
a server comprising a processor and a memory, the memory having stored therein at least one program loaded and executed by the processor to implement the high frame rate multi-way object detection method of an edge device described above.
A computer-readable storage medium having stored therein at least one program loaded and executed by a processor to implement the high frame rate multi-path target detection method of an edge device described above.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention modifies the input head of the yolov5 target detection model to support yuv420 natively; the input data volume is halved compared with RGB, so the MAC count of the corresponding first-layer convolution is halved.
2. Compared with RGB input, yuv420 input halves the data transferred (the uv components of yuv420 are downsampled to 1/4 each while the y component is not, giving 1 + 1/4 + 1/4 = 1.5 units of data versus 1 + 1 + 1 = 3 for undownsampled RGB), which reduces transfer latency; see the verification sketch after this list. Taking a HiSilicon SoC as an example, with a Cortex-A53 CPU and 64-bit-wide DDR4 memory, the maximum single-core memory-copy bandwidth is about 1 GB/s; for a 1080p image, using yuv420 cuts memory-copy time by about 1.5 ms.
3. Because the data volume is halved, the invention halves the time consumed by image scaling (resize) and additionally avoids one yuv-to-RGB color-space conversion.
4. The invention implements the focus operation with NEON instructions in the Cortex-A53, since NEON supports 128-bit-wide reads and writes of multiple interleaved data types; after NEON optimization, the total focus operation takes only 0.2 ms.
5. The invention parallelizes the comparison step (comparing each confidence against the threshold and testing whether it is larger) with NEON instructions, comparing 16 int8 confidences per instruction cycle, a 16x speedup over scalar comparison, which cuts post-processing time to about 0.2 ms.
6. The invention allocates the NPU, IVE and CPU resources sensibly, reducing waiting time in the pipeline and raising overall system throughput. In a traditional detection algorithm the NPU inference takes longest and is therefore the pipeline bottleneck (all pipeline units work concurrently); in the invention, inference is so fast that it takes even less time than image scaling, so the bottleneck shifts to hardware scaling, and switching to yuv420 shortens scaling time as well, improving pipeline efficiency (a pipeline is most efficient when all units take equal time and none has to wait).
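As a quick check of the data-volume arithmetic in advantage 2 above (the verification sketch referenced there; a 1080p frame is assumed):

#include <cstdio>

// Bytes per pixel: yuv420 = 1 (y) + 1/4 (u) + 1/4 (v) = 1.5; RGB = 3.
int main() {
    constexpr double pixels = 1920.0 * 1080.0;        // one 1080p frame
    constexpr double yuv420_mb = pixels * 1.5 / 1e6;  // about 3.1 MB
    constexpr double rgb_mb = pixels * 3.0 / 1e6;     // about 6.2 MB
    std::printf("yuv420: %.1f MB, rgb: %.1f MB, ratio: %.2f\n",
                yuv420_mb, rgb_mb, yuv420_mb / rgb_mb);  // ratio: 0.50
    return 0;
}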
Drawings
Fig. 1 is a flowchart of a method for detecting a high frame rate multipath target of an edge device according to the present invention.
Fig. 2 is a diagram of a focus operator data flow.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but embodiments of the present invention are not limited thereto.
Referring to fig. 1, a high-frame-rate multipath target detection method for edge devices comprises the following steps: the edge device obtains the h264 data stream of each network camera via the RTSP protocol and writes it into memory; the hardware encoding/decoding resources on the SoC decode the h264 stream into original yuv420 data, which is written into memory; the hardware image-scaling unit on the SoC then scales the yuv420 image data to the model's input size (e.g. 640x384); the NEON computing unit of the Cortex-A53 performs the yolov5 focus operation; the focus-processed data can be handed directly to the NPU for network computation; and post-processing finally yields the coordinate frames of the detected targets.
The implementation process of the specific technical scheme of the invention is as follows:
1. collecting training data
Video sequences of the target are acquired with the installed cameras, and the image data is stored on the edge device or a central server. The collected data is used mainly to train and test the target detection model. To guarantee model performance, data must be gathered under varied conditions, covering different times of day, lighting changes and weather.
2. Definition of detection model
The target detection algorithm is based on a convolutional neural network, specifically a yolov5s target detection model. CNN-based algorithms cope better with lighting changes, occlusion and similar conditions. The model's input and output must be defined in advance: the input defines the number and format of pictures fed to the model, and the output is the target bounding boxes and classes. The collected data is manually labeled according to the defined output scheme to produce training data, and the target detection model is then trained and its weights updated. In the invention the model is a yolov5 target detection model with a 640x384 input picture, balancing speed and accuracy, and it is quantized into an int8 model for the corresponding hardware platform. The model outputs 1 category, corresponding to detection targets of the pedestrian class.
3. Adaptation of yuv420 input format for model input header
The yolov5 input head is modified to support yuv420 input natively. Specifically, whereas the original model takes RGB image data of dimension (h, w, 3), the improved model splits the incoming yuv420 into its y, u and v components; the focus operation on the y component yields data of dimension (h/2, w/2, 4); the u and v components stay unchanged at dimension (h/2, w/2, 1) each; y, u and v are then concatenated in that order along the last dimension, so the model's input data becomes (h/2, w/2, 6), and the merged data is passed to the NPU for neural-network computation.
The focus operation is as follows (see fig. 2). The input is the y component, with dimensions (h, w, 1);
the output is:
y_focus=concat(y[0::2,0::2],y[1::2,0::2],
y[0::2,1::2],y[1::2,1::2], index=-1);
where concat(..., index=-1) denotes concatenation along the last dimension.
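Before the NEON optimization described below, the focus slicing and the y/u/v merge can be written as a plain scalar reference (a sketch only; the function name, buffer layout and channel-last output are assumptions, not the patent's code):

#include <cstdint>
#include <vector>

// Scalar reference for focus + merge: y is (h, w); u and v are (h/2, w/2).
// Output is a channel-last (h/2, w/2, 6) tensor whose channels are
// y[0::2,0::2], y[1::2,0::2], y[0::2,1::2], y[1::2,1::2], u, v.
std::vector<uint8_t> focus_and_merge(const uint8_t *y, const uint8_t *u,
                                     const uint8_t *v, int h, int w) {
    const int oh = h / 2, ow = w / 2;
    std::vector<uint8_t> out(static_cast<size_t>(oh) * ow * 6);
    for (int r = 0; r < oh; ++r) {
        for (int c = 0; c < ow; ++c) {
            uint8_t *px = &out[(static_cast<size_t>(r) * ow + c) * 6];
            px[0] = y[(2 * r) * w + 2 * c];          // even row, even column
            px[1] = y[(2 * r + 1) * w + 2 * c];      // odd row, even column
            px[2] = y[(2 * r) * w + 2 * c + 1];      // even row, odd column
            px[3] = y[(2 * r + 1) * w + 2 * c + 1];  // odd row, odd column
            px[4] = u[r * ow + c];                   // u plane, already (h/2, w/2)
            px[5] = v[r * ow + c];                   // v plane, already (h/2, w/2)
        }
    }
    return out;
}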
It is worth noting that when a conventional edge device performs target detection, the h264 stream obtained from the IP camera (IPC) is decoded into yuv420 data, the yuv420 data is converted into the RGB color space, and only then is the neural-network computation run on the NPU.
This improvement brings the following advantages; the timing figures are examples measured on the HiSilicon platform.
(1) Since the input data volume is halved, the MAC count of the corresponding convolution is halved, reducing the NPU's time spent on the model's first convolution layer.
(2) Compared with RGB, yuv420 halves the data transferred and the transfer latency. Taking a HiSilicon SoC as an example, with a Cortex-A53 CPU and 64-bit-wide DDR4 memory, the maximum single-core memory-copy bandwidth is about 1 GB/s; for a 1080p image, using yuv420 cuts memory-copy time by about 1.5 ms.
(3) Because the data volume is halved, the time consumed by image scaling (resize) is also halved, and one yuv-to-RGB color-space conversion is avoided.
(4) Since the yuv-to-RGB color-space conversion is no longer required, processing time drops by a further 2-3 ms.
4. Optimization of hardware pre-processing
On an edge NPU, each computing unit's instruction set is kept relatively compact to raise compute density and cut power consumption within a limited die area, so uncommon operations must be assembled from several instructions. For example, the HiSilicon NPU has no special support for interleaved data-rearrangement operations; they require multiple instruction combinations, so a focus implemented on the NPU runs slowly.
The invention therefore implements the focus operation with NEON instructions in the Cortex-A53, since NEON supports 128-bit-wide reads and writes of multiple interleaved data types. After NEON optimization, the total focus operation takes only 0.2 ms.
The specific implementation is as follows:
256 bits of interleaved data (i.e. y-component data) are loaded into two 128-bit registers, say Q0 and Q1, by a vld2q_u8 instruction; the contents of Q0 and Q1 are written back to the cache by vst1q_u8 instructions; repeating this operation de-interleaves the whole y component and writes it into the corresponding memory.
The pseudo code is as follows:
#include <arm_neon.h>
#include <cstddef>
#include <cstdint>

// De-interleave one row of `length` bytes: even-indexed bytes go to array1,
// odd-indexed bytes to array2. vld2q_u8 loads 32 interleaved bytes and splits
// them across two 128-bit registers in a single instruction.
template <size_t length>
__attribute__((always_inline)) inline void deinterleave_line(
        const uint8_t *interleave, uint8_t *array1, uint8_t *array2) {
    size_t vectors = length >> 5;  // 32 input bytes consumed per iteration
    while (vectors-- > 0) {
        const uint8x16x2_t src = vld2q_u8(interleave);
        vst1q_u8(array1, src.val[0]);  // even-indexed bytes
        vst1q_u8(array2, src.val[1]);  // odd-indexed bytes
        interleave += 32;
        array1 += 16;
        array2 += 16;
    }
}
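One possible way to drive this routine for the focus operation (a sketch under the 640x384 input assumption; the function and plane names are illustrative): each call splits one row into its even and odd columns, so even rows produce the y[0::2,0::2] and y[0::2,1::2] planes and odd rows the y[1::2,0::2] and y[1::2,1::2] planes.

// Sketch: build the four focus planes of a 640x384 luma image using
// deinterleave_line, two rows per iteration.
constexpr size_t W = 640, H = 384;

void focus_y_neon(const uint8_t *y,
                  uint8_t *p00, uint8_t *p10,   // even/odd rows, even columns
                  uint8_t *p01, uint8_t *p11) { // even/odd rows, odd columns
    for (size_t r = 0; r < H; r += 2) {
        const size_t o = (r / 2) * (W / 2);  // offset into the (h/2, w/2) planes
        deinterleave_line<W>(y + r * W, p00 + o, p01 + o);        // even row
        deinterleave_line<W>(y + (r + 1) * W, p10 + o, p11 + o);  // odd row
    }
}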
In image scaling, the 1080p yuv420 input must be scaled to 640x384 resolution; conventionally this is done on the CPU with an image-processing library (OpenCV). The invention instead uses the hardware scaling resources on the SoC, which markedly lowers CPU occupancy; dedicated hardware scaling is also faster than the software path.
5. NPU optimization of the edge model
When the edge device runs model inference, the NPU should fuse the conv-bn-relu operation combination: instead of shuttling data back and forth between the NPU cache and DDR4, the three operations are computed in cache and only the result is written back to DDR4, eliminating a large amount of data-movement time.
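As a toy 1-D illustration of why fusion saves memory traffic (not the patent's NPU code; a scalar, array-based sketch purely for exposition), the fused version keeps each value in a register across all three operations and performs a single store:

#include <cmath>
#include <vector>

// conv -> bn -> relu fused per element: no intermediate buffers are
// written back to memory between the three operations.
std::vector<float> fused_conv_bn_relu(const std::vector<float> &x,
                                      const std::vector<float> &k,
                                      float gamma, float beta,
                                      float mean, float var, float eps) {
    const size_t n = x.size(), m = k.size();
    std::vector<float> y(n >= m ? n - m + 1 : 0);
    const float scale = gamma / std::sqrt(var + eps);
    for (size_t i = 0; i + m <= n; ++i) {
        float acc = 0.0f;
        for (size_t j = 0; j < m; ++j) acc += x[i + j] * k[j];  // conv
        acc = (acc - mean) * scale + beta;                      // bn, in register
        y[i] = acc > 0.0f ? acc : 0.0f;                         // relu, one store
    }
    return y;
}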
In addition, the silu operation involves an exponential as well as multiply-add operations; on an edge NPU the exponential costs multiple clock cycles and is typically obtained by table lookup plus interpolation. The relu operation, by contrast, is trivial: inputs below 0 are output as 0 and all other inputs pass through unchanged, so it maps onto a very simple hardware circuit, and on most NPU platforms a relu completes in a single clock cycle.
For these two reasons, all silu operators in the model are replaced with relu operators, which reduces NPU computation time by 30% on the HiSilicon NPU.
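For reference, the two activations compare as follows (standard definitions, not code from the patent); the exponential inside silu is the part that costs an edge NPU multiple cycles:

#include <cmath>

// silu(x) = x * sigmoid(x): needs an exponential, which an edge NPU must
// approximate by table lookup plus interpolation over several cycles.
static inline float silu(float x) { return x / (1.0f + std::exp(-x)); }

// relu(x) = max(x, 0): a single compare-and-select, one clock cycle on most NPUs.
static inline float relu(float x) { return x > 0.0f ? x : 0.0f; }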
6. Optimization of post-processing
The largest share of post-processing time is spent testing a large number of confidences against the threshold: taking a 1-class yolov5 at 640x384 resolution as an example, the number of output confidences is (20x12 + 40x24 + 80x48) = 5040, so a great many comparisons are needed. The invention therefore parallelizes this comparison step with NEON instructions, comparing 16 int8 confidences per instruction cycle, a 16x speedup over scalar comparison.
Specifically, 16 int8 confidences are loaded into a Q register with vld1q_u8 and compared against the threshold with a vcgtq_s8 instruction; concrete box coordinates are computed only for the tuples whose confidence exceeds the threshold.
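A minimal sketch of this parallel filter (function and callback names are hypothetical; the load is written with the signed vld1q_s8 rather than the vld1q_u8 named above, since the data is int8, and vmaxvq_u8 is an AArch64 fast-skip not mentioned in the text):

#include <arm_neon.h>
#include <cstddef>
#include <cstdint>

// Test 16 int8 confidences per iteration; decode box coordinates only for
// lanes whose confidence exceeds the threshold.
void filter_confidences(const int8_t *conf, size_t count, int8_t threshold,
                        void (*decode_box)(size_t index)) {
    const int8x16_t vthr = vdupq_n_s8(threshold);
    for (size_t i = 0; i + 16 <= count; i += 16) {
        const int8x16_t c = vld1q_s8(conf + i);
        const uint8x16_t gt = vcgtq_s8(c, vthr);  // 0xFF where conf > threshold
        if (vmaxvq_u8(gt) == 0) continue;         // fast skip: no lane passed
        uint8_t mask[16];
        vst1q_u8(mask, gt);
        for (size_t lane = 0; lane < 16; ++lane)
            if (mask[lane]) decode_box(i + lane); // compute bbox only when needed
    }
}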
The high-frame-rate multipath target detection method for edge devices can detect pedestrians, vehicles and other static or moving objects.
The key points and the protection points of the invention are as follows:
(1) A key point of the invention is the adaptation to the YUV420 data format and the use of the focus operation to reduce the input data dimension, thereby reducing convolution operations and computation.
Compared with prior implementations, which convert YUV420 to the RGB color space before running the neural network in edge-side target detection, the invention eliminates the redundant color-space conversion, speeding up the whole system and reducing its resource consumption.
(2) A key technical point of the invention is the use of the NEON instruction set in the Cortex-A53 to de-interleave data 256 bits at a time, improving the system's computational efficiency and performance.
Compared with the prior art, the invention adopts a hardware-aware preprocessing optimization, namely the NEON-based 256-bit de-interleaving operation. This raises effective compute throughput and lowers power consumption, improving overall system performance; prior implementations are not tuned to the hardware's characteristics and therefore run slower.
(3) A key point of the invention is parallelizing the comparison operation with NEON instructions, accelerating the confidence test. The point to be protected is the improved post-processing speed and efficiency of the target detection model, with lower computation time and resource consumption.
Compared with the prior art, the invention adopts a more efficient computation method that performs better on large volumes of data; the advantage is most pronounced in target detection, where a great many confidences must be tested. The implementation is also relatively simple, integrates easily into existing target detection frameworks, and can be optimized for different hardware platforms, improving the model's applicability.
(4) The invention replaces all silu operators in the model with relu operators and has the NPU fuse the conv-bn-relu operation combination, eliminating a large amount of data-movement time.
The above examples are preferred embodiments of the present invention, but the embodiments are not limited to them; any change, modification, substitution, combination or simplification that does not depart from the spirit and principle of the invention is an equivalent replacement and falls within the protection scope of the invention.

Claims (12)

1. The high frame rate multipath target detection method for the edge equipment is characterized by comprising the following steps of:
s1, edge equipment obtains h264 data streams transmitted by a plurality of network cameras and writes the h264 data streams into a memory;
s2, decoding the h264 data stream into original yuv420 image data through hardware coding and decoding resources on the edge equipment soc, and writing the original yuv420 image data into a memory;
s3, scaling the yuv420 image data to an input size conforming to a target detection model by using a hardware image scaling calculation unit on the edge equipment soc;
s4, splitting the scaled yuv420 image data into a y component representing brightness, a u component representing red chromaticity and a v component representing blue chromaticity;
s5, performing focus operation on the y component on the CPU to obtain a y component consistent with the height dimension and the width dimension of the u component and the v component; the u component and the v component remain unchanged;
and S6, merging the y component, the u component and the v component after the focus operation, directly inputting the merged y component, the u component and the v component into a target detection model on the NPU, outputting a plurality of tuple data corresponding to each anchor by the target detection model, and finally obtaining a coordinate frame of a detection target by post-processing the tuple data on the CPU.
2. The high frame rate multipath target detection method of claim 1, wherein the focus operation on the y component yields a data dimension of (h/2, w/2, 4), h being the height of the image and w the width; the u and v components remain unchanged, with corresponding data dimension (h/2, w/2, 1); finally, a merging operation is performed on the last dimension in the order y, u, v, the data dimension input to the target detection model becoming (h/2, w/2, 6), and the merged data is input to the NPU for neural-network computation.
3. The method for detecting a high frame rate multipath target of an edge device according to claim 2, wherein the performing a focus operation on the y component is specifically as follows:
the input is the y component, the dimensions are (h, w, 1);
the output is:
y_focus=concat(y[0::2,0::2],y[1::2,0::2],
y[0::2,1::2],y[1::2,1::2], index=-1);
where concat(..., index=-1) denotes concatenation along the last dimension.
4. The high frame rate multi-path object detection method of an edge device according to claim 1, wherein the object detection model performs a focus operation on the scaled yuv420 image data using a neon calculation unit.
5. The method for detecting a high frame rate multipath target of an edge device according to claim 4, wherein the neon computing unit performs the focus operation as follows: a vld2q_u8 instruction loads 256 bits of interleaved data into two 128-bit registers, denoted Q0 and Q1; the data in registers Q0 and Q1 is written back to the cache by vst1q_u8 instructions; this operation is repeated until the whole interleaved buffer is de-interleaved and written into the corresponding memory; the interleaved data is the y component representing luminance in the yuv420 image data.
6. The method for high frame rate multipath object detection of an edge device according to claim 4, wherein said neon computing unit is built into a Cortex-a53 processor.
7. The method according to claim 1, wherein in step S6 the post-processing parallelizes, via neon instructions, the judgment of whether the confidence in the tuple data is greater than a threshold; if so, the tuple data is decoded into the coordinates of a concrete frame, which serves as the coordinate frame of the detection target.
8. The method of claim 7, wherein the post-processing loads the confidence level of the plurality of tuple data into the Q register by using vld1q_u8, and then determines whether the confidence level of the tuple data is greater than a threshold value by a vcgtq_s8 instruction.
9. The method for detecting a high frame rate multipath target of an edge device according to claim 1, wherein the target detection model is a yolov5 target detection model, and the output is specifically: for each anchor, 3 tuples [ bbox_x, bbox_y, bbox_w, bbox_h, confidence, class0, class1, ], data type int8, wherein bbox_x is the abscissa of the target frame, bbox_y is the ordinate of the target frame, bbox_w is the width of the target frame, bbox_h is the height of the target frame, confidence is the confidence of the target frame, class0, class1 is the class of the target.
10. The method for detecting a high frame rate multipath target of an edge device according to claim 1, wherein the edge device obtains h264 data streams transmitted by a plurality of network cameras through an Rtsp protocol.
11. A server comprising a processor and a memory, wherein the memory has stored therein at least one program that is loaded and executed by the processor to implement the high frame rate multi-way object detection method of an edge device of any one of claims 1 to 10.
12. A computer readable storage medium having stored therein at least one program loaded and executed by a processor to implement the high frame rate multi-way object detection method of an edge device of any one of claims 1 to 10.
CN202311149598.6A 2023-09-07 2023-09-07 High-frame-rate multipath target detection method for edge equipment Active CN116883691B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311149598.6A CN116883691B (en) 2023-09-07 2023-09-07 High-frame-rate multipath target detection method for edge equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311149598.6A CN116883691B (en) 2023-09-07 2023-09-07 High-frame-rate multipath target detection method for edge equipment

Publications (2)

Publication Number Publication Date
CN116883691A true CN116883691A (en) 2023-10-13
CN116883691B CN116883691B (en) 2023-11-07

Family

ID=88266687

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311149598.6A Active CN116883691B (en) 2023-09-07 2023-09-07 High-frame-rate multipath target detection method for edge equipment

Country Status (1)

Country Link
CN (1) CN116883691B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110753229A (en) * 2019-11-15 2020-02-04 天津光电通信技术有限公司 Video acquisition device and method based on H.265 coding
WO2023089230A2 (en) * 2021-11-16 2023-05-25 Nokia Technologies Oy An apparatus, a method and a computer program for video coding and decoding
CN115393774A (en) * 2022-09-09 2022-11-25 南京邮电大学 Lightweight fire smoke detection method, terminal equipment and storage medium
CN116229226A (en) * 2023-02-28 2023-06-06 南京理工大学 Dual-channel image fusion target detection method suitable for photoelectric pod
CN116258941A (en) * 2023-03-13 2023-06-13 西安电子科技大学 Yolox target detection lightweight improvement method based on Android platform

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
AOHUA SONG et al.: "Lightweight the Focus module in YOLOv5 by Dilated Convolution", 2022 3rd International Conference on Computer Vision, Image and Deep Learning & International Conference on Computer Engineering and Applications, pp. 111-114.
曾寰 et al. (Zeng Huan et al.): "Saliency detection algorithm coupling color-space conversion with feature maps", 《计算机工程与设计》 (Computer Engineering and Design), vol. 40, no. 06, pp. 1665-1670.

Also Published As

Publication number Publication date
CN116883691B (en) 2023-11-07

Similar Documents

Publication Publication Date Title
US20140153635A1 (en) Method, computer program product, and system for multi-threaded video encoding
US10964000B2 (en) Techniques for reducing noise in video
CN109936745B (en) Method and system for improving decompression of raw video data
CN107105266A (en) A kind of coding/decoding method, the apparatus and system of PNG images
CN110428382B (en) Efficient video enhancement method and device for mobile terminal and storage medium
US11470327B2 (en) Scene aware video content encoding
WO2023005140A1 (en) Video data processing method, apparatus, device, and storage medium
US20190349558A1 (en) Media processing systems
CN112235569B (en) Quick video classification method, system and device based on H264 compressed domain
CN107820095A (en) A kind of long term reference image-selecting method and device
Feng et al. A dual-network based super-resolution for compressed high definition video
CN110913225B (en) Image encoding method, image encoding device, electronic device, and computer-readable storage medium
WO2017162015A1 (en) Data processing method and apparatus, and storage medium
CN111432213A (en) Adaptive tile data size coding for video and image compression
US20140301641A1 (en) Tile-Based Compression and Decompression for Graphic Applications
CN116883691B (en) High-frame-rate multipath target detection method for edge equipment
US20210272327A1 (en) Decoding data arrays
WO2023142715A1 (en) Video coding method and apparatus, real-time communication method and apparatus, device, and storage medium
US20220385914A1 (en) Methods and apparatus for processing of high-resolution video content
He et al. FPGA-based high definition image processing system
Jilani et al. JPEG image compression using FPGA with Artificial Neural Networks
US11166035B1 (en) Method and device for transcoding video
TW201322774A (en) Multiple stream processing for video analytics and encoding
CN107480616B (en) Skin color detection unit analysis method and system based on image analysis
CN107172425A (en) Reduced graph generating method, device and terminal device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant