CN112257794B - YOLO-based lightweight target detection method - Google Patents

Info

Publication number
CN112257794B
CN112257794B
Authority
CN
China
Prior art keywords
module
channel
convolution
target detection
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011164112.2A
Other languages
Chinese (zh)
Other versions
CN112257794A (en)
Inventor
李晨
许虞俊
杜文娟
Current Assignee
Southeast University
Original Assignee
Southeast University
Priority date
Filing date
Publication date
Application filed by Southeast University
Priority to CN202011164112.2A
Publication of CN112257794A
Application granted
Publication of CN112257794B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a lightweight YOLO-based target detection method built from repeatedly stacked channel non-scaling convolution modules. By combining these repeated channel non-scaling convolution blocks with ordinary 1x1 and 3x3 convolutions, the model size is greatly reduced, while an ECA structure redistributes the weight of each channel, strengthening each channel's adaptive learning of different target classes. Following the YOLO family of frameworks, the network outputs three feature maps at different scales, each responsible for predicting objects of a corresponding size, so the model achieves high detection accuracy with an extremely small number of parameters.

Description

YOLO-based lightweight target detection method
Technical Field
The invention relates to the field of computer-vision target detection, and in particular to a lightweight multi-scale target detection method.
Background
Target detection has long been an important research area within computer vision, and with the development of deep learning, detection algorithms have shifted from traditional methods based on hand-crafted features to techniques based on deep convolutional neural networks. As accuracy requirements have risen and detection tasks have grown harder, increasingly complex large networks have been designed, such as SSD, R-CNN and Mask R-CNN. Their parameter counts often exceed 100M: Faster R-CNN reaches 132M parameters and AmoebaNet reaches 209M. Although a larger model and a deeper network allow more deep features to be extracted and thus improve accuracy, they introduce enormous parameter and computation costs, and such networks cannot be deployed in memory-limited application scenarios. Lightweight target detection networks have therefore long been a research field of great interest in industry.
When deploying a target detection network in mobile scenarios, one must consider not only the computational complexity and parameter count of the model but also its detection accuracy. Common methods such as network pruning and network parameter quantization optimize a network model that has already been designed.
To better suit mobile scenarios, a lightweight network tailored to them is needed, one that addresses the limited memory and computing power of such scenarios so that a lightweight target detection network can be readily deployed.
Disclosure of Invention
Purpose of the invention: to overcome the defects of the prior art, the invention provides a YOLO-based lightweight target detection method that effectively reduces the number of network parameters while improving detection accuracy.
Technical scheme: to achieve this purpose, the invention adopts the following technical scheme:
A YOLO-based lightweight target detection method comprises a feature extraction module, a multi-scale receptive field fusion module and a target detection module. The feature extraction module mainly comprises 1x1 convolutions, 3x3 convolutions and the channel non-scaling convolution block NEP; stacking these modules yields a lightweight backbone network with a large receptive field for feature extraction. Three branches are drawn from the extracted semantic features at different scales and sent to the multi-scale receptive field fusion module, and the three fused feature maps of different scales are finally used to predict objects of different sizes. The Loss function is computed with Distance-IoU Loss to improve the regression accuracy of the detection boxes and obtain the final target detection network. The method specifically comprises the following steps:
step 1, collecting detection pictures to form a training set.
Step 2, feature extraction: the training set is fed into the feature extraction module to extract semantic features; three branches are drawn from the extracted semantic features at different scales and sent to the multi-scale receptive field fusion module. The feature extraction module comprises a first 1x1 convolution, a first 3x3 convolution and the channel non-scaling convolution block NEP, connected in sequence. The NEP comprises a first-layer network, a second-layer network, an attention module ECA and a third-layer network connected in sequence, where the first-layer network is a first Ghost module, the second-layer network is a 3x3 depthwise separable convolution block, and the third-layer network is a second Ghost module. The first and second Ghost modules each comprise a second 1x1 convolution and a second 3x3 depthwise separable convolution connected in sequence, and they replace the commonly used 1x1 convolution block.
The Ghost module from the GhostNet network structure is used as the basic convolution module of the network; it realizes the function of a conventional 1x1 convolution through the combination of a 1x1 convolution and a 3x3 depthwise separable convolution. By introducing the 3x3 depthwise separable convolution, the invention ensures that the receptive field of the network keeps expanding during channel fusion, addressing the insufficient feature extraction caused by the shallow depth of a lightweight network. Because the Ghost module replaces the commonly used 1x1 convolution block, its internal 3x3 depthwise separable convolution further enlarges the receptive field of the lightweight network and effectively improves its feature extraction performance.
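The parameter saving of a Ghost module over an ordinary pointwise convolution can be sketched with a quick count (a rough illustration assuming the common GhostNet ratio s = 2, a 3x3 cheap kernel and no biases; the patent does not state its exact Ghost configuration):

```python
def conv1x1_params(c_in, c_out):
    # ordinary pointwise (1x1) convolution: one weight per (input, output) channel pair
    return c_in * c_out

def ghost_module_params(c_in, c_out, kernel=3, ratio=2):
    # primary 1x1 convolution produces c_out // ratio "intrinsic" feature maps;
    # a cheap depthwise kernel x kernel convolution generates the remaining "ghost" maps
    primary = c_in * (c_out // ratio)
    cheap = (c_out - c_out // ratio) * kernel * kernel
    return primary + cheap

plain = conv1x1_params(128, 128)        # 16384 weights
ghost = ghost_module_params(128, 128)   # 8192 + 576 = 8768 weights
print(plain, ghost)
```

For a 128-to-128 channel layer the Ghost variant needs roughly half the weights, which is where the model-size reduction claimed above comes from.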
The attention module ECA learns the channel weights of the channel non-scaling convolution block NEP through a weight-shared 1-dimensional convolution applied to the one-dimensional feature map obtained after global average pooling, where the size k×1 of the 1-dimensional convolution kernel represents the module's cross-channel information interaction rate and k is adjusted dynamically with the number of channels. The learned weight of each channel is then assigned to the corresponding feature channel of the NEP; the reweighted channels finally undergo weighted feature fusion, and the fused weighted features pass through the second Ghost module to produce the semantic features. By replacing the fully connected channel-scaling scheme with a 1-dimensional convolution, the ECA module avoids information loss and greatly reduces the parameter count of the network. The invention adopts this module to effectively balance the channel weights of the lightweight neural network, so that with a limited parameter budget the network pays more attention to learning the parameters of the important feature channels.
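The patent states only that k adapts to the channel count. As an assumption, the adaptive rule from the original ECA design (k derived from log2 of the channel count with γ = 2, b = 1, rounded to the nearest odd size) would look like this:

```python
import math

def eca_kernel_size(channels, gamma=2, b=1):
    # ECA-style adaptive 1-D kernel size: t = |(log2(C) + b) / gamma|, forced odd
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1

for c in (64, 128, 256, 512):
    print(c, eca_kernel_size(c))  # wider layers get a wider cross-channel interaction range
```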
Finally, under a YOLO-family network framework, the stacked channel non-scaling modules NEP are combined with ordinary 1x1 and 3x3 convolutions, features are fused in a manner similar to a feature pyramid network, and after fusion the network outputs three-dimensional tensors through several convolutions; tensors of different scales are responsible for predicting target detection boxes of different scales.
Within the NEP, a Ghost module adjusts the number of channels, a 3x3 depthwise separable convolution extracts features, the output is sent to the attention module ECA to compute per-channel weights, the computed weights are assigned to the feature channels, and finally a Ghost module fuses the reweighted channels to produce the complete network output.
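The channel and spatial bookkeeping of that NEP pipeline can be sketched at the shape level (NumPy stand-ins only: the Ghost and depthwise convolutions below produce correctly shaped dummy tensors rather than learned features, and the residual rule follows the stride conditions described elsewhere in this document):

```python
import numpy as np

def ghost(x, c_out):
    # stand-in Ghost module: adjusts only the channel count (1x1 conv + cheap depthwise)
    n, _, h, w = x.shape
    return np.zeros((n, c_out, h, w))

def dwconv3x3(x, stride):
    # stand-in 3x3 depthwise convolution with "same" padding; channel count unchanged
    n, c, h, w = x.shape
    return np.zeros((n, c, -(-h // stride), -(-w // stride)))

def eca(x):
    # squeeze each channel to one value (global average pool), then reweight channels
    w = 1.0 / (1.0 + np.exp(-x.mean(axis=(2, 3), keepdims=True)))  # sigmoid gate
    return x * w

def nep_block(x, c_out, stride=1):
    y = ghost(x, c_out)       # Ghost module adjusts the number of channels
    y = dwconv3x3(y, stride)  # depthwise conv extracts features / downsamples
    y = eca(y)                # ECA reallocates the channel weights
    y = ghost(y, c_out)       # Ghost module fuses the reweighted channels
    if stride == 1 and y.shape == x.shape:
        y = y + x             # residual connection only in the stride-1 case
    return y

x = np.random.rand(1, 32, 64, 64)
print(nep_block(x, 32, stride=1).shape)  # (1, 32, 64, 64)
print(nep_block(x, 64, stride=2).shape)  # (1, 64, 32, 32)
```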
Step 3, a multi-scale receptive field fusion module: and the multi-scale receptive field fusion module performs multi-scale fusion according to the semantic features of the three branches to obtain three fused feature maps with different scales.
Step 4, the target detection module: and respectively predicting objects with different sizes by using the three fused feature maps with different scales.
Step 5, a loss calculation module: and calculating a Loss function by adopting Distance-IoU Loss, and improving the regression precision of the detection frame to obtain a final target detection network.
Preferably: when the feature extraction module uses the channel non-scaling convolution block NEP to down-sample the current feature map, the number of channels is expanded, avoiding the loss of feature information caused by down-sampling.
Preferably: when the stride of the depthwise separable convolution in the channel non-scaling convolution block NEP is 2, no residual connection is used; when the stride is 1, a residual connection is added.
Preferably: the feature map used by the channel non-scaling convolution block NEP is not scaled in the channel dimension during computation, keeping the original number of channels unchanged; this avoids both the loss of feature information caused by reducing the channel dimension and the excess parameters caused by expanding the number of channels.
Preferably: in step 3, the multi-scale receptive field fusion module uses the outputs of the feature extraction module at different scales and, through 1x1 convolution, fuses the smaller-scale high-level semantic features into the larger-scale features with smaller receptive fields, improving the detection of small and medium targets.
Preferably: in step 5, the Distance-IoU Loss is:

L_DIoU = 1 - IoU + ρ²(b, b^gt) / c²

where L_DIoU denotes the position loss of the prediction box, B denotes the prediction box, B^gt denotes the ground-truth box, b and b^gt denote the center points of the prediction box and the ground-truth box, ρ(·) denotes the Euclidean distance, and c denotes the diagonal length of the smallest rectangle containing both the prediction box and the ground-truth box.
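For concreteness, the formula can be evaluated directly on corner-format boxes (a minimal sketch; the (x1, y1, x2, y2) box layout is an assumption, and no degenerate-box handling is included):

```python
def diou_loss(pred, gt):
    # boxes given as (x1, y1, x2, y2); returns L_DIoU = 1 - IoU + rho^2 / c^2
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt

    # intersection over union
    ix1, iy1 = max(px1, gx1), max(py1, gy1)
    ix2, iy2 = min(px2, gx2), min(py2, gy2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    iou = inter / union

    # squared distance between the two box centres (rho^2)
    rho2 = (((px1 + px2) - (gx1 + gx2)) ** 2
            + ((py1 + py2) - (gy1 + gy2)) ** 2) / 4.0

    # squared diagonal of the smallest rectangle enclosing both boxes (c^2)
    cx1, cy1 = min(px1, gx1), min(py1, gy1)
    cx2, cy2 = max(px2, gx2), max(py2, gy2)
    c2 = (cx2 - cx1) ** 2 + (cy2 - cy1) ** 2

    return 1.0 - iou + rho2 / c2

print(diou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # 0.0 for identical boxes
```

Unlike plain IoU, the penalty term keeps a useful gradient even when the boxes do not overlap, which is why it improves box regression.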
Preferably, the following components: and finally, training the data used by the network in an actual application scene in a centralized manner, and after the training is finished, quantizing the parameters of the final target detection network into 8 bits, so that the size of the model of the network is further reduced, and the model can be deployed in an application scene with limited memory. .
Compared with the prior art, the invention has the following beneficial effects:
The lightweight target detection method of the invention uses the Ghost module as the basic module for channel adjustment and channel feature fusion, introducing a 3x3 depthwise separable convolution on top of the ordinary 1x1 convolution to remedy the insufficient receptive field and insufficient semantic features of a lightweight detection network. Channel weights are redistributed by the ECA module, fully exploiting the available channel capacity of the lightweight convolution, and the NEP module performs no channel scaling during computation, reducing the loss of feature information and effectively improving detection accuracy. The proposed network structure thus overcomes the excessive parameter complexity of deep convolutional neural networks; its accuracy improves on current mainstream lightweight detection algorithms, and quantizing the parameters to 8 bits further reduces the model size while retaining high-accuracy detection.
Drawings
Fig. 1 is a diagram of a network architecture of the present invention.
Fig. 2 is the Ghost module.
Fig. 3 is an ECA module.
Fig. 4 shows the structure of the proposed channel non-scaling convolution block NEP, with convolution stride 2 on the left and stride 1 on the right.
Fig. 5 shows the structure designed in the embodiment on the basis of the channel non-scaling convolution block, following the network framework adopted by the invention. The number in parentheses after each module name is the tensor dimension of that module's output.
Detailed Description
The present invention is further illustrated in the accompanying drawings and the following detailed description. It should be understood that these examples are included solely for purposes of illustration and are not intended to limit the scope of the invention; various equivalent modifications will become apparent to those skilled in the art after reading this specification, and all such modifications falling within the scope of the appended claims are intended to be covered.
In a YOLO-based lightweight target detection method, a target detection algorithm identifies the target objects contained in an image and outputs their positions. This example uses the YOLO-based lightweight detection algorithm, trains it on a training set and verifies the detection effect of the model on a test set. The method is built from repeatedly stacked channel non-scaling convolution modules; the combination of repeated channel non-scaling convolution blocks with ordinary 1x1 and 3x3 convolutions greatly reduces the model size, while an ECA structure redistributes the channel weights and strengthens each channel's adaptive learning of different target classes. Following the YOLO family of frameworks, the network outputs three feature maps of different scales, each responsible for predicting objects of a corresponding size, so the model achieves high detection accuracy with an extremely small number of parameters. As shown in Figs. 1-5, the method comprises the following steps:
step 1, collecting detection pictures to form a training set.
Step 2, feature extraction: the training set is fed into the feature extraction module to extract semantic features; three branches are drawn from the extracted semantic features at different scales and sent to the multi-scale receptive field fusion module. The feature extraction module comprises a first 1x1 convolution, a first 3x3 convolution and the channel non-scaling convolution block NEP, connected in sequence. As shown in Fig. 4, the channel non-scaling convolution block NEP includes a first-layer network, a second-layer network, an attention module ECA and a third-layer network connected in sequence, where the first-layer network is a first Ghost module, the second-layer network is a 3x3 depthwise separable convolution block, and the third-layer network is a second Ghost module; the first and second Ghost modules each comprise a second 1x1 convolution and a second 3x3 depthwise separable convolution connected in sequence, and they replace the commonly used 1x1 convolution block.
As shown in Fig. 2, the lightweight detection algorithm used in this example employs the Ghost module from the GhostNet network structure as the basic convolution module; the Ghost module realizes the function of a conventional 1x1 convolution through the combination of a 1x1 convolution and a 3x3 depthwise separable convolution. By introducing the 3x3 depthwise separable convolution, the invention ensures that the receptive field of the network keeps expanding during channel fusion, addressing the insufficient feature extraction caused by the shallow depth of a lightweight network.
To further improve the detection effect of the lightweight network and make full use of its limited parameters, this example uses the channel attention module ECA shown in Fig. 3. On the one-dimensional feature map obtained after global average pooling, the weight of each channel is learned through a weight-shared 1-dimensional convolution, where the size k×1 of the 1-dimensional convolution kernel represents the module's cross-channel information interaction rate and k is adjusted dynamically with the number of channels. The module replaces the fully connected channel-scaling scheme with a 1-dimensional convolution, avoiding information loss and greatly reducing the parameter count of the network; the invention adopts it to effectively balance the lightweight network's channel weights.
The attention module ECA learns the channel weights of the channel non-scaling convolution block NEP through a weight-shared 1-dimensional convolution on the one-dimensional feature map obtained after global average pooling. The learned weight of each channel is then assigned to the corresponding feature channel of the NEP; the reweighted channels finally undergo weighted feature fusion, and the fused weighted features pass through the second Ghost module to produce the semantic features.
To make the model more compact, this example fuses the Ghost module and the channel attention module ECA into a new convolution module without channel-number scaling, the NEP. The NEP first adjusts the number of channels through a Ghost module, then extracts features with a 3x3 depthwise separable convolution, sends the output to the channel attention module ECA to compute per-channel weights, assigns the computed weights to the feature channels, and finally fuses the reweighted channels through another Ghost module to obtain the complete network output.
In this embodiment, the stacked channel non-scaling modules NEP are combined with ordinary 1x1 and 3x3 convolutions, features are fused in a manner similar to a feature pyramid network, and after the network features are fused by weight, three-dimensional tensors are output through several convolutions; tensors of different scales are responsible for predicting target detection boxes of different scales.
When the feature extraction module uses the channel non-scaling convolution block NEP to down-sample the current feature map, the number of channels is expanded, avoiding the loss of feature information caused by down-sampling.
As shown in Fig. 4, when the stride of the NEP's depthwise separable convolution is 2, no residual connection is used; when the stride is 1 and no downsampling occurs, a residual connection is added, alleviating the vanishing-gradient problem of shallow networks during training.
The feature map used by the NEP is not scaled in the channel dimension during computation, keeping the original number of channels unchanged; this avoids both the loss of feature information caused by reducing the channel dimension and the excess parameters caused by expanding it.
Step 3, a multi-scale receptive field fusion module: and the multi-scale receptive field fusion module performs multi-scale fusion according to the semantic features of the three branches to obtain three fused feature maps with different scales.
The multi-scale receptive field fusion module uses the outputs of the feature extraction module at different scales and, through 1x1 convolution, fuses the smaller-scale high-level semantic features into the larger-scale features with smaller receptive fields, improving the detection of small and medium targets.
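At the shape level, that fusion step can be sketched as follows (NumPy stand-ins with YOLO-style 13x13 and 26x26 grids as assumed example sizes; the 1x1 convolution is a dummy that only adjusts channels):

```python
import numpy as np

def upsample2x(x):
    # nearest-neighbour 2x upsampling of an NCHW feature map
    return x.repeat(2, axis=2).repeat(2, axis=3)

def conv1x1(x, c_out):
    # stand-in 1x1 convolution: only adjusts the channel count
    n, _, h, w = x.shape
    return np.zeros((n, c_out, h, w), dtype=x.dtype)

# the small, high-level 13x13 feature is fused into the larger 26x26 branch
p_small = np.zeros((1, 256, 13, 13), dtype=np.float32)
p_mid = np.zeros((1, 128, 26, 26), dtype=np.float32)
lateral = upsample2x(conv1x1(p_small, 128))
fused = np.concatenate([lateral, p_mid], axis=1)
print(fused.shape)  # (1, 256, 26, 26)
```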
Step 4, the target detection module: and respectively predicting objects with different sizes by using the three fused feature maps with different scales.
Step 5, a loss calculation module: and calculating a Loss function by adopting Distance-IoU Loss, and improving the regression precision of the detection frame to obtain a final target detection network.
The training Loss consists mainly of three parts: the classification loss Loss_class, the position loss Loss_location of the prediction boxes, and the confidence loss Loss_confidence on whether each box contains an object and how accurate the box is. The position loss Loss_location is computed with Distance-IoU, with the formula:

L_DIoU = 1 - IoU + ρ²(b, b^gt) / c²

Distance-IoU Loss adds a penalty term to the original IoU computation: b and b^gt denote the center points of the prediction box and the ground-truth box, ρ(·) denotes the Euclidean distance, and c denotes the diagonal length of the smallest rectangle containing both boxes. The added penalty term shortens the center distance between the predicted and ground-truth boxes and reflects the actual error between them.
In this embodiment, as shown in fig. 5, the training step of the model includes:
a network model is built by using a Keras framework, 1 Tesla V100 GPU video card is used for model training, and a Pascal VOC data set is used as a data set. Training by using a Train and val data set (5011 sheets in total) of VOC2007 and a Train and val data set (11540 sheets in total) of VOC2012, then testing by using Test data (4952 sheets in total) of VOC2007, fixing the pixels of an input picture to be 416x416, and using data enhancement including turning, scaling, random clipping and HSV enhancement on the input picture; using an Adam optimizer, an initial learning rate of 0.001, a batch size of 16, 250 epochs were trained. The decline of the learning rate is realized by monitoring the val _ loss, when the val _ loss keeps 10 epochs not to decline any more, the learning rate is reduced to 0.5 time of the original learning rate, and the network training uses a loss function Distance-IoU loss to guide the network optimization. The performance of the network is measured by the 20-class multi-class average accuracy of the VOC, as well as the model size.
The finally trained network model has only 1.54M parameters and reaches 72.1% mAP on the VOC2007 test set. A comparison with some mainstream lightweight target detection networks is as follows:
Model Name    Parameters  mAP (VOC2007)  FLOPs
Tiny YOLOv2   15.1M       57.1%          6.97B
Tiny YOLOv3   8.4M        58.4%          5.52B
YOLO Nano     4.0M        69.1%          4.67B
Ours          1.54M       72.1%          2.69B
These results show that the proposed network has the smallest parameter count, only 1.54M, close to 1/3 of YOLO Nano, and the lowest floating-point computation at only 2.69B, while achieving the highest detection accuracy of all the models compared. The available capacity of the lightweight network is fully utilized, so the network strikes a better balance between reducing model parameters and improving model expressiveness.
The above description covers only preferred embodiments of the invention. It should be noted that various modifications and adaptations can be made by those skilled in the art without departing from the principles of the invention, and these are also intended to fall within its scope.

Claims (7)

1. A lightweight target detection method based on YOLO is characterized by comprising the following steps:
step 1, collecting detection pictures to form a training set;
step 2, feature extraction: inputting the training set into a feature extraction module to extract semantic features, drawing three branches from the extracted semantic features at different scales, and sending the three branches into a multi-scale receptive field fusion module; the feature extraction module comprises a first 1x1 convolution, a first 3x3 convolution and a channel non-scaling convolution block NEP which are connected in sequence; the channel non-scaling convolution block NEP comprises a first layer network, a second layer network, an attention module ECA and a third layer network which are connected in sequence, wherein the first layer network is a first Ghost module, the second layer network is a 3x3 depthwise separable convolution block, and the third layer network is a second Ghost module, the first Ghost module and the second Ghost module each comprising a second 1x1 convolution and a second 3x3 depthwise separable convolution connected in sequence, the first Ghost module and the second Ghost module replacing a commonly used 1x1 convolution block; the attention module ECA learns each channel weight of the channel non-scaling convolution block NEP through a weight-shared 1-dimensional convolution on the one-dimensional feature map obtained after global average pooling, wherein the size k×1 of the 1-dimensional convolution kernel represents the cross-channel information interaction rate of the module and k is dynamically adjusted with the number of channels; the obtained weight of each channel is then assigned to each feature channel of the channel non-scaling convolution block NEP, the reweighted channels finally undergo weighted feature fusion, and the fused weighted features pass through the second Ghost module to obtain the semantic features;
step 3, a multi-scale receptive field fusion module: the multi-scale receptive field fusion module performs multi-scale fusion according to the semantic features of the three branches to obtain three fused feature maps with different scales;
step 4, the target detection module: respectively predicting objects with different sizes by using the three fused feature maps with different scales;
step 5, a loss calculation module: and calculating a Loss function by adopting Distance-IoU Loss, and improving the regression precision of the detection frame to obtain a final target detection network.
2. The YOLO-based lightweight target detection method of claim 1, wherein: when the feature extraction module uses the channel non-scaling convolution block NEP to down-sample the current feature map, the number of channels is expanded, avoiding the loss of feature information caused by down-sampling.
3. The YOLO-based lightweight target detection method of claim 2, wherein: when the depth separable convolution step length of the channel non-shrinkage and unwinding block NEP is 2, residual connection is not used; the depth separable convolution step length of the channel non-shrinkage volume block NEP is 1, and residual error connection is added.
4. The YOLO-based lightweight target detection method according to claim 3, wherein: the feature map used by the channel non-scaling convolution block NEP is not scaled in the channel dimension during computation, and the original number of channels is kept unchanged, which avoids both the feature information loss caused by reducing the channel dimension and the excess parameters caused by expanding the number of channels.
5. The YOLO-based lightweight target detection method according to claim 4, wherein: in the step 3, the multi-scale receptive field fusion module takes the outputs of the feature extraction module at different scales and, through 1x1 convolutions, fuses the smaller-scale high-level semantic features into the larger-scale features with smaller receptive fields, thereby improving the detection of small and medium targets.
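The fusion step in claim 5 can be sketched in NumPy: project the small-scale, semantically rich map with a 1x1 convolution, upsample it to the larger scale, and combine it with the larger-scale map. The concatenation and nearest-neighbour upsampling below are illustrative choices; the patent only specifies the 1x1 convolution.

```python
import numpy as np

def conv1x1(x, w):
    # 1x1 convolution as per-pixel channel mixing: (C_out, C_in) applied
    # to every spatial position of x with shape (C_in, H, W).
    return np.einsum("oc,chw->ohw", w, x)

def upsample2x(x):
    # Nearest-neighbour upsampling to match the larger-scale feature map.
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fuse_scales(high, low, w):
    # high: smaller-scale, semantically richer map (C_h, H, W);
    # low:  larger-scale map (C_l, 2H, 2W) with a smaller receptive field.
    # Project 'high' with a 1x1 convolution, upsample, and concatenate
    # along the channel axis.
    return np.concatenate([upsample2x(conv1x1(high, w)), low], axis=0)
```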
6. The YOLO-based lightweight target detection method according to claim 5, wherein: in the step 5, distance-IoU Loss is as follows:
L_DIoU = 1 - IoU + p^2(b, b_gt) / c^2
wherein L_DIoU represents the position loss of the prediction box; IoU is the intersection-over-union of the prediction box B and the real box B_gt; b and b_gt represent the center points of the prediction box and the real box, respectively; p(.) represents the Euclidean distance; and c represents the length of the diagonal of the smallest rectangle containing both the prediction box and the real box.
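The loss above can be computed directly from corner coordinates; a minimal sketch for a single pair of boxes given as (x1, y1, x2, y2):

```python
def diou_loss(box_p, box_g):
    # L_DIoU = 1 - IoU + rho^2(b, b_gt) / c^2, where rho is the distance
    # between the two box centers and c is the diagonal of the smallest
    # rectangle enclosing both boxes.
    x1, y1, x2, y2 = box_p
    g1, h1, g2, h2 = box_g
    inter_w = max(0.0, min(x2, g2) - max(x1, g1))
    inter_h = max(0.0, min(y2, h2) - max(y1, h1))
    inter = inter_w * inter_h
    union = (x2 - x1) * (y2 - y1) + (g2 - g1) * (h2 - h1) - inter
    iou = inter / union
    # squared Euclidean distance between the center points
    rho2 = ((x1 + x2 - g1 - g2) ** 2 + (y1 + y2 - h1 - h2) ** 2) / 4.0
    # squared diagonal of the smallest enclosing rectangle
    c2 = (max(x2, g2) - min(x1, g1)) ** 2 + (max(y2, h2) - min(y1, h1)) ** 2
    return 1.0 - iou + rho2 / c2
```

For identical boxes the loss is 0; for disjoint boxes the IoU term contributes 1 and the center-distance term adds a further penalty, which is what gives DIoU a useful gradient even without overlap.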
7. The YOLO-based lightweight target detection method of claim 6, wherein: after step 5, the parameters of the final target detection network are quantized to 8 bits.
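The claim does not specify the quantization scheme; a common choice for 8-bit parameter quantization is symmetric per-tensor scaling, sketched below as an assumption rather than the patented method.

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor 8-bit quantization: real weight w is
    # approximated as scale * q with integer q in [-127, 127].
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximate float tensor from the int8 codes.
    return q.astype(np.float32) * scale
```

With this scheme the round-trip error per weight is bounded by half the scale, i.e. by max|w| / 254.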
CN202011164112.2A 2020-10-27 2020-10-27 YOLO-based lightweight target detection method Active CN112257794B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011164112.2A CN112257794B (en) 2020-10-27 2020-10-27 YOLO-based lightweight target detection method


Publications (2)

Publication Number Publication Date
CN112257794A CN112257794A (en) 2021-01-22
CN112257794B true CN112257794B (en) 2022-10-28

Family

ID=74261337

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011164112.2A Active CN112257794B (en) 2020-10-27 2020-10-27 YOLO-based lightweight target detection method

Country Status (1)

Country Link
CN (1) CN112257794B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112819771A (en) * 2021-01-27 2021-05-18 东北林业大学 Wood defect detection method based on improved YOLOv3 model
CN112836751A (en) * 2021-02-03 2021-05-25 歌尔股份有限公司 Target detection method and device
CN112950565A (en) * 2021-02-25 2021-06-11 山东英信计算机技术有限公司 Method and device for detecting and positioning water leakage of data center and data center
CN113052812B (en) * 2021-03-22 2022-06-24 山西三友和智慧信息技术股份有限公司 AmoebaNet-based MRI prostate cancer detection method
CN113112456B (en) * 2021-03-25 2022-05-13 湖南工业大学 Thick food filling finished product defect detection method based on target detection algorithm
CN113011365A (en) * 2021-03-31 2021-06-22 中国科学院光电技术研究所 Target detection method combined with lightweight network
CN112926605B (en) * 2021-04-01 2022-07-08 天津商业大学 Multi-stage strawberry fruit rapid detection method in natural scene
CN113065558B (en) * 2021-04-21 2024-03-22 浙江工业大学 Lightweight small target detection method combined with attention mechanism
CN113421222B (en) * 2021-05-21 2023-06-23 西安科技大学 Lightweight coal gangue target detection method
CN113435269A (en) * 2021-06-10 2021-09-24 华东师范大学 Improved water surface floating object detection and identification method and system based on YOLOv3
CN113536963B (en) * 2021-06-25 2023-08-15 西安电子科技大学 SAR image airplane target detection method based on lightweight YOLO network
CN113688759A (en) * 2021-08-31 2021-11-23 重庆科技学院 Safety helmet identification method based on deep learning
CN114418064B (en) * 2021-12-27 2023-04-18 西安天和防务技术股份有限公司 Target detection method, terminal equipment and storage medium
CN114898171B (en) * 2022-04-07 2023-09-22 中国科学院光电技术研究所 Real-time target detection method suitable for embedded platform
CN115880564A (en) * 2022-11-29 2023-03-31 沈阳新松机器人自动化股份有限公司 Lightweight target detection method
CN115661614B (en) * 2022-12-09 2024-05-24 江苏稻源科技集团有限公司 Target detection method based on lightweight YOLO v1

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109522966B (en) * 2018-11-28 2022-09-27 中山大学 Target detection method based on dense connection convolutional neural network
CN110796037B (en) * 2019-10-15 2022-03-15 武汉大学 Satellite-borne optical remote sensing image ship target detection method based on lightweight receptive field pyramid
CN110941995A (en) * 2019-11-01 2020-03-31 中山大学 Real-time target detection and semantic segmentation multi-task learning method based on lightweight network
CN111507271B (en) * 2020-04-20 2021-01-12 北京理工大学 Airborne photoelectric video target intelligent detection and identification method

Also Published As

Publication number Publication date
CN112257794A (en) 2021-01-22

Similar Documents

Publication Publication Date Title
CN112257794B (en) YOLO-based lightweight target detection method
CN111626330B (en) Target detection method and system based on multi-scale characteristic diagram reconstruction and knowledge distillation
CN110188685B (en) Target counting method and system based on double-attention multi-scale cascade network
CN110175671B (en) Neural network construction method, image processing method and device
CN110414377B (en) Remote sensing image scene classification method based on scale attention network
CN113435590B (en) Edge calculation-oriented searching method for heavy parameter neural network architecture
CN111126472A (en) Improved target detection method based on SSD
CN110046550B (en) Pedestrian attribute identification system and method based on multilayer feature learning
CN112990116B (en) Behavior recognition device and method based on multi-attention mechanism fusion and storage medium
CN110309835B (en) Image local feature extraction method and device
CN111723915B (en) Target detection method based on deep convolutional neural network
CN114255361A (en) Neural network model training method, image processing method and device
CN110008853B (en) Pedestrian detection network and model training method, detection method, medium and equipment
CN113326930A (en) Data processing method, neural network training method, related device and equipment
CN111860683B (en) Target detection method based on feature fusion
CN115116054B (en) Multi-scale lightweight network-based pest and disease damage identification method
CN111079767B (en) Neural network model for segmenting image and image segmentation method thereof
CN117037215B (en) Human body posture estimation model training method, estimation device and electronic equipment
CN112308825B (en) SqueezeNet-based crop leaf disease identification method
CN115035418A (en) Remote sensing image semantic segmentation method and system based on improved deep LabV3+ network
CN112561028A (en) Method for training neural network model, and method and device for data processing
CN114821058A (en) Image semantic segmentation method and device, electronic equipment and storage medium
CN115601692A (en) Data processing method, training method and device of neural network model
CN116863194A (en) Foot ulcer image classification method, system, equipment and medium
CN112597919A (en) Real-time medicine box detection method based on YOLOv3 pruning network and embedded development board

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant