CN113869412B - Image target detection method combining lightweight attention mechanism and YOLOv3 network - Google Patents


Info

Publication number
CN113869412B
CN113869412B (application CN202111141568.1A)
Authority
CN
China
Prior art keywords
convolution
attention mechanism
network
yolov3
feature extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111141568.1A
Other languages
Chinese (zh)
Other versions
CN113869412A (en)
Inventor
段运生
檀怡
竺德
孙冬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University
CERNET Corp
Original Assignee
Anhui University
CERNET Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University, CERNET Corp filed Critical Anhui University
Priority to CN202111141568.1A
Publication of CN113869412A
Application granted
Publication of CN113869412B
Legal status: Active
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image target detection method combining a lightweight attention mechanism and a YOLOv3 network, comprising the training process of a target detection algorithm of the lightweight attention mechanism and the YOLOv3 network. The algorithm combines the lightweight attention mechanism with the YOLOv3 network to improve feature extraction capability; a depthwise separable convolution module is incorporated into the YOLOv3 network to improve the efficiency of the algorithm and further improve detection precision; and a multi-scale fusion method is applied to the traditional YOLOv3 network to improve the feature extraction capability, and hence the performance, of the model. By combining the lightweight attention mechanism, depthwise separable convolution, and the multi-scale fusion method into the YOLOv3 network, a target detection method with higher recognition accuracy is designed, which can effectively complete the task of target detection in images, automatically extract image features, and achieve higher detection precision while improving efficiency.

Description

Image target detection method combining lightweight attention mechanism and YOLOv3 network
Technical Field
The invention relates to the technical field of target detection algorithms in computer vision, and in particular to an image target detection method combining a lightweight attention mechanism and the YOLOv3 network.
Background
Target detection is an important branch of computer vision research and one of its most fundamental problems. Target detection means identifying the spatial location or extent of specified categories (such as people, dogs, zebras, elephants, and cars) in given images. It is also important in artificial intelligence and information technology research, especially in robot vision, face recognition, autonomous driving, and intelligent surveillance. The challenges in target detection today concern both accuracy and efficiency, and how to improve efficiency while ensuring accuracy is a major focus of current research. Existing target detection algorithms fall into two main classes: single-stage detection frameworks (one-stage detectors) and two-stage detection frameworks (two-stage detectors). One-stage target detection methods compute directly on the complete image to complete detection, whereas two-stage methods first preprocess the image to extract candidate boxes and then refine them to obtain the final detection result. By comparison, two-stage target detection is more accurate but slower. Commonly used two-stage methods include the region-based convolutional neural network (R-CNN), the Fast region-based convolutional neural network (Fast-R-CNN), spatial pyramid pooling in deep convolutional networks (SPP-Net), and the multi-region convolutional neural network (MR-CNN). Among them, R-CNN, as the cornerstone of two-stage target detection, is the most representative two-stage algorithm: in the preprocessing stage, a selective search algorithm selects candidate boxes of interest, and the spatial position of a target is then located through a convolutional neural network, a support vector machine, and a regression method. Common one-stage detectors include the Single Shot MultiBox Detector (SSD), YOLO, YOLOv2, and YOLOv3. Current research directions on target detection algorithms can be roughly divided into three types: improving two-stage algorithms, improving one-stage methods, and combining the one-stage and two-stage approaches. The above algorithms have been shown to perform well in target detection studies, but they can be further improved, so we propose a target detection method that combines a lightweight attention mechanism with the YOLOv3 network.
The R-CNN algorithm is a two-stage model and the earliest proposed deep learning method for target detection. First, a selective search algorithm measures the feature similarity of adjacent image blocks in the original image and scores similar blocks; candidate boxes covering the image regions of interest are then selected and input as samples into a trained CNN; after feature extraction, the features enter a fully connected layer, and an SVM classifier and a linear regression prediction model are trained to complete the final target detection task.
Although the R-CNN algorithm improves on conventional target detection methods, and the trained CNN performs well at image feature extraction, the generation of candidate regions in the first stage of R-CNN is conventional and increases the running time of the algorithm. When an image contains a large number of candidate regions, the forward propagation computation of the CNN multiplies, because feature extraction is performed once for each candidate region, greatly increasing the running time. These repeated operations of the R-CNN algorithm limit its performance.
The Fast-R-CNN algorithm is an improved target detection algorithm based on R-CNN, and its main purpose is to reduce the running time of R-CNN. Like R-CNN, Fast-R-CNN generates proposal regions, but the difference is that the candidate boxes are no longer each passed through the network; instead, the whole image is directly used as the input of the convolutional neural network to realize the feature extraction operation, and the candidate regions are fused with the extracted features in a pooling layer according to their correspondence. In summary, the most important improvements of Fast-R-CNN are the region-of-interest pooling layer and parallel multi-task training.
The Fast-R-CNN algorithm has several disadvantages:
1) Like R-CNN, the Fast-R-CNN algorithm must first select regions of interest before performing feature extraction; this selection process can only run on a CPU and wastes a great amount of time.
2) Because of this running-time limitation, the Fast-R-CNN algorithm cannot be used in real-time applications and does not truly realize end-to-end training and testing.
The SSD algorithm is a one-stage target detection algorithm whose feature extractor is a VGG-16 network. Given an input image, SSD first performs convolution operations with several convolution layers to obtain feature maps of different sizes, estimates local feature information in the feature maps with convolution kernels, and simultaneously computes the spatial position information and classification probability of the targets to be detected. In addition, because SSD acts on many position areas of the image and the sizes of the bounding boxes in the detection results are inconsistent, redundant boxes appear. To solve this problem, SSD adds a non-maximum suppression technique to merge bounding boxes with high overlap, and introduces hard negative mining to keep positive and negative samples balanced (a small sketch of non-maximum suppression follows).
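Non-maximum suppression is simple to state in code. The sketch below illustrates the general technique, not SSD's exact implementation: it repeatedly keeps the highest-scoring box and discards the remaining boxes whose IoU with it exceeds a threshold.

```python
import torch

def nms(boxes: torch.Tensor, scores: torch.Tensor, iou_thresh: float = 0.5):
    """boxes: (N, 4) tensor of (x1, y1, x2, y2); returns indices of kept boxes."""
    order = scores.argsort(descending=True)   # best-scoring boxes first
    keep = []
    while order.numel() > 0:
        i = order[0].item()
        keep.append(i)
        if order.numel() == 1:
            break
        rest = order[1:]
        # Intersection of the kept box with every remaining box.
        x1 = torch.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = torch.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = torch.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = torch.minimum(boxes[i, 3], boxes[rest, 3])
        inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]       # drop boxes that overlap too much
    return keep
```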
The SSD algorithm has the following disadvantages:
1) When the SSD algorithm performs feature extraction, the shallow feature maps used for detection carry limited semantic information, so it often handles low-resolution samples poorly.
2) Certain parameters in the SSD algorithm are set manually and cannot be obtained through training, so the debugging process depends heavily on experience, carries a certain randomness, and generalizes poorly.
The Faster-R-CNN algorithm is obtained by optimizing the Fast-R-CNN model and is therefore also a two-stage target detection method. It combines a region proposal generation module with the Fast-R-CNN module to complete the target detection task. The Fast-R-CNN module completes the feature mapping of the input image and extracts features on that basis. The region proposal generation module adopts a sliding-window strategy, generating candidate regions on the convolved feature map, which are finally passed through an ROI pooling layer to a fully connected layer for the final fusion operation. The Faster-R-CNN algorithm thus realizes end-to-end training and improves the detection efficiency of the model.
The Faster-R-CNN algorithm has the following disadvantages:
1) Because the Faster-R-CNN algorithm divides the training process into two phases, its efficiency cannot meet real-time requirements.
2) For small target detection, the Faster-R-CNN algorithm performs poorly, mainly because its final prediction uses a single deep feature map, resulting in poor generalization across scales.
Disclosure of Invention
The invention aims to provide an image target detection method combining a lightweight attention mechanism and the YOLOv3 network, which can effectively complete the task of target detection in images, automatically extract image features, and achieve higher detection precision while improving efficiency.
In order to achieve the above purpose, the present invention provides the following technical solution: an image target detection method combining a lightweight attention mechanism and a YOLOv3 network, characterized by comprising the training process of a target detection algorithm of the lightweight attention mechanism and the YOLOv3 network:
The training process of the target detection algorithm of the lightweight attention mechanism and the YOLOv3 network is divided into two stages. The first stage performs feature extraction on the input image at multiple scales, where the feature extraction comprises depthwise separable convolution and a residual structure with an attention mechanism; the second stage fuses the multi-scale features trained in the previous stage and finally outputs the predicted image. The specific training process is as follows (a rough sketch of the pipeline is given after the list):
Step 1: the network initializes the weights;
Step 2: multi-scale feature extraction is performed on the input image;
Step 3: at each scale, a downsampled feature map is obtained through the depthwise separable convolution layer and the residual module with the attention mechanism;
Step 4: the features of each scale are passed through a convolution layer to output predictions;
Step 5: the output predictions of the multi-scale features are fused to form the final prediction model.
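As a rough illustration of these five steps, the following sketch wires per-scale feature stages to per-scale prediction heads; the class name, channel widths, three scales, and the placeholder stage contents are assumptions for illustration, not details taken from the filing.

```python
import torch
import torch.nn as nn

class MultiScalePipeline(nn.Module):
    """Skeleton of the two-stage training pipeline: multi-scale feature
    extraction (Steps 2-3), a prediction convolution per scale (Step 4),
    and collection of the per-scale outputs for fusion (Step 5). Step 1
    (weight initialization) is left to PyTorch's defaults here."""
    def __init__(self, channels=(64, 128, 256), out_channels: int = 255):
        super().__init__()
        # Placeholder downsampling stages; the filing uses depthwise separable
        # convolution plus attention residual modules at this point.
        self.stages = nn.ModuleList(
            nn.Sequential(nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                          nn.ReLU(inplace=True))
            for c_in, c_out in zip((3,) + channels[:-1], channels)
        )
        self.heads = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in channels)

    def forward(self, x: torch.Tensor):
        preds = []
        for stage, head in zip(self.stages, self.heads):
            x = stage(x)            # Step 3: downsampled feature map per scale
            preds.append(head(x))   # Step 4: per-scale output prediction
        return preds                # Step 5: to be fused into the final model

model = MultiScalePipeline()
outputs = model(torch.randn(1, 3, 416, 416))  # three maps, strides 2, 4, 8
```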
Preferably, the method includes a depthwise separable convolution structure. The depthwise separable structure is part of the feature extraction function and the key module for realizing the lightweight design: in standard convolution, the spatial convolution and the combination of feature channels are performed simultaneously, while here the two parts are separated into a depthwise convolution process and a pointwise convolution process, and performing the convolution after this grouping greatly reduces the computation and the number of parameters, achieving the goal of a lightweight design. Specifically, for an input feature map of size $D_F \times D_F \times M$, the operation is decomposed into a depthwise convolution with kernels of size $D_K \times D_K \times 1 \times M$ and a pointwise convolution with kernels of size $1 \times 1 \times M \times N$; the computation is then
$$O_1 = D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F$$
For the traditional standard convolution process, the computation under the same input is
$$O_2 = D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F$$
Comparing the two gives the ratio
$$\frac{O_1}{O_2} = \frac{1}{N} + \frac{1}{D_K^2}$$
When the convolution kernel size is 3×3, the computation is reduced by nearly 9 times by adopting the depthwise separable convolution, effectively improving the efficiency of the model.
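A minimal PyTorch sketch of this decomposition follows; the class name and example channel counts are illustrative assumptions. The printed parameter counts reproduce the ratio $1/N + 1/D_K^2$ for $D_K = 3$, $M = 64$, $N = 128$.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """A DK x DK depthwise convolution (one filter per input channel,
    groups=M) followed by a 1x1 pointwise convolution that recombines
    the M channels into N output channels."""
    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size,
                                   padding=kernel_size // 2, groups=in_channels,
                                   bias=False)
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1,
                                   bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))

# Parameter comparison for M=64, N=128, DK=3: standard convolution needs
# 3*3*64*128 = 73,728 weights; the separable version needs
# 3*3*64 + 64*128 = 8,768 — roughly an 8.4x reduction, matching 1/N + 1/DK^2.
standard = nn.Conv2d(64, 128, 3, padding=1, bias=False)
separable = DepthwiseSeparableConv(64, 128)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard), count(separable))  # 73728 8768
```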
Preferably, the residual structure with the attention mechanism is the other part of the feature extraction process and is used to improve feature extraction performance in the backbone network. An input feature image U first undergoes a pointwise convolution operation, then a 3×3 depthwise convolution operation, giving a feature map F after feature extraction; F is then passed through the SE-Block attention module to obtain a new map F1; finally, F and F1 are summed to obtain the final output feature map V. The attention mechanism can optimize the connection between the channel domain and the spatial domain and can induce the feature extraction network to learn the regions of interest.
Wherein $F_{tr}(\cdot, \theta)$ denotes the convolution mapping operation, specifically:
$$F_{tr}: X \rightarrow U, \quad X \in \mathbb{R}^{H' \times W' \times C'}, \quad U \in \mathbb{R}^{H \times W \times C}$$
Wherein $F_{sq}(\cdot)$ denotes the Squeeze operation, i.e. the compression operation, specifically
$$z_c = F_{sq}(u_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)$$
Wherein $F_{ex}(\cdot, W)$ denotes the Excitation operation, specifically
$$F_{ex}(z, W) = \sigma\big(g(z, W)\big) = \sigma\big(W_2\,\mathrm{ReLU}(W_1 z)\big)$$
Wherein z is the output after the compression operation, the activation function σ is a Sigmoid, and r is a hyperparameter (the channel reduction ratio). The final output is written as $\tilde{X} = F_{scale}(u, s) = s \cdot u$, where u and s are the output of the convolution operation and the output of the excitation operation, respectively.
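The Squeeze-and-Excitation operations above translate almost line for line into code; the sketch below assumes a reduction ratio r = 16, a common default rather than a value taken from the filing.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: global average pooling compresses each channel
    to a scalar (F_sq), two fully connected layers with reduction ratio r
    produce per-channel weights s (F_ex), and the input is rescaled by s."""
    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)   # F_sq: (B,C,H,W) -> (B,C,1,1)
        self.excite = nn.Sequential(             # F_ex: sigma(W2 ReLU(W1 z))
            nn.Linear(channels, channels // r),
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),
            nn.Sigmoid(),
        )

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = u.shape
        z = self.squeeze(u).view(b, c)            # squeeze to channel statistics
        s = self.excite(z).view(b, c, 1, 1)       # excitation weights in (0, 1)
        return u * s                              # F_scale(u, s) = s . u
```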
Preferably, a common cross entropy function is adopted as the loss function of the prediction model; the difference between the predicted value and the true value is measured with the cross entropy, whose expression is as follows:
$$L = -\big[y \log y' + (1 - y) \log(1 - y')\big]$$
wherein y denotes the true label and y' the predicted probability that a sample belongs to a given class. To further balance the weight distribution of hard samples in actual detection, the overall loss function of the improved network reweights hard samples on top of this cross entropy.
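As an illustration, the sketch below implements the cross entropy above; the focal-style weighting shown alongside it is only an assumption about how the improved network might reweight hard samples, since the exact improved expression is not reproduced in this text.

```python
import torch

def binary_cross_entropy(y_pred: torch.Tensor, y_true: torch.Tensor) -> torch.Tensor:
    """L = -[y*log(y') + (1-y)*log(1-y')], averaged over the batch."""
    eps = 1e-7
    y_pred = y_pred.clamp(eps, 1 - eps)  # avoid log(0)
    return -(y_true * y_pred.log() + (1 - y_true) * (1 - y_pred).log()).mean()

def focal_style_loss(y_pred: torch.Tensor, y_true: torch.Tensor,
                     alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """Assumed hard-sample weighting in the focal-loss form: down-weights
    easy examples by (1 - p_t)^gamma so hard samples dominate the gradient."""
    eps = 1e-7
    y_pred = y_pred.clamp(eps, 1 - eps)
    p_t = torch.where(y_true == 1, y_pred, 1 - y_pred)
    alpha_t = torch.where(y_true == 1, torch.full_like(y_pred, alpha),
                          torch.full_like(y_pred, 1 - alpha))
    return (-alpha_t * (1 - p_t) ** gamma * p_t.log()).mean()
```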
Preferably, the SE-Block module calibrates the feature relationships in the network through the compression and excitation processes, increasing the weights of effective features and decreasing those of ineffective or weakly effective ones.
Preferably, the depthwise separable convolution corresponds to the conv2d operation in the operator stage.
Preferably, the residual module with the attention mechanism corresponds to the bneck operation in the operator stage.
Compared with the prior art, the invention has the following beneficial effects:
The method combines a lightweight attention mechanism with the YOLOv3 network to improve feature extraction capability; a depthwise separable convolution module is incorporated into the YOLOv3 network to improve the efficiency of the algorithm and further improve detection precision; and a multi-scale fusion method is applied to the traditional YOLOv3 network to improve the feature extraction capability, and hence the performance, of the model. By combining the lightweight attention mechanism, depthwise separable convolution, and the multi-scale fusion method into the YOLOv3 network, a target detection method with higher recognition accuracy is designed, which can effectively complete the task of target detection in images, automatically extract image features, and achieve higher detection precision while improving efficiency.
Drawings
FIG. 1 is the training process of the lightweight attention mechanism and the target detection algorithm of the YOLOv3 network of the present invention;
FIG. 2 is a sample of a face object detection image of the present invention;
FIG. 3 is a diagram of a depth separable convolution structure of the present invention;
FIG. 4 is a residual structure of the attention mechanism of the present invention;
FIG. 5 is a schematic view of the SE-Block structure;
FIG. 6 shows the variation curves of different models of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to FIGS. 1-6, an image target detection method combining a lightweight attention mechanism and the YOLOv3 network comprises the training process of the target detection algorithm of the lightweight attention mechanism and the YOLOv3 network:
The training process of the target detection algorithm of the lightweight attention mechanism and the YOLOv3 network is divided into two stages. The first stage performs feature extraction on the input image at multiple scales, where the feature extraction comprises depthwise separable convolution and a residual structure with an attention mechanism; the second stage fuses the multi-scale features trained in the previous stage and finally outputs the predicted image. The specific training process is as follows:
Step 1: the network initializes the weights;
Step 2: multi-scale feature extraction is performed on the input image;
Step 3: at each scale, a downsampled feature map is obtained through the depthwise separable convolution layer and the residual module with the attention mechanism;
Step 4: the features of each scale are passed through a convolution layer to output predictions;
Step 5: the output predictions of the multi-scale features are fused to form the final prediction model.
In this embodiment, the method includes a depthwise separable convolution structure. The depthwise separable structure is part of the feature extraction function and the key module for realizing the lightweight design: in standard convolution, the spatial convolution and the combination of feature channels are performed simultaneously, while here the two parts are separated into a depthwise convolution process and a pointwise convolution process, and performing the convolution after this grouping greatly reduces the computation and the number of parameters, achieving the goal of a lightweight design. Specifically, for an input feature map of size $D_F \times D_F \times M$, the operation is decomposed into a depthwise convolution with kernels of size $D_K \times D_K \times 1 \times M$ and a pointwise convolution with kernels of size $1 \times 1 \times M \times N$; the computation is then
$$O_1 = D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F$$
For the traditional standard convolution process, the computation under the same input is
$$O_2 = D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F$$
Comparing the two gives the ratio
$$\frac{O_1}{O_2} = \frac{1}{N} + \frac{1}{D_K^2}$$
When the convolution kernel size is 3×3, the computation is reduced by nearly 9 times by adopting the depthwise separable convolution, effectively improving the efficiency of the model.
In this embodiment, the residual structure with the attention mechanism is the other part of the feature extraction process and is used to improve feature extraction performance in the backbone network. An input feature image U first undergoes a pointwise convolution operation, then a 3×3 depthwise convolution operation, giving a feature map F after feature extraction; F is then passed through the SE-Block attention module to obtain a new map F1; finally, F and F1 are summed to obtain the final output feature map V (a sketch of this residual block is given below). The attention mechanism can optimize the connection between the channel domain and the spatial domain and can induce the feature extraction network to learn the regions of interest.
Wherein $F_{tr}(\cdot, \theta)$ denotes the convolution mapping operation, specifically:
$$F_{tr}: X \rightarrow U, \quad X \in \mathbb{R}^{H' \times W' \times C'}, \quad U \in \mathbb{R}^{H \times W \times C}$$
Wherein $F_{sq}(\cdot)$ denotes the Squeeze operation, i.e. the compression operation, specifically
$$z_c = F_{sq}(u_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)$$
Wherein $F_{ex}(\cdot, W)$ denotes the Excitation operation, specifically
$$F_{ex}(z, W) = \sigma\big(g(z, W)\big) = \sigma\big(W_2\,\mathrm{ReLU}(W_1 z)\big)$$
Wherein z is the output after the compression operation, the activation function σ is a Sigmoid, and r is a hyperparameter (the channel reduction ratio). The final output is written as $\tilde{X} = F_{scale}(u, s) = s \cdot u$, where u and s are the output of the convolution operation and the output of the excitation operation, respectively.
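Putting the pieces of this embodiment together, the following sketch realizes the block just described, reusing the SEBlock sketch given earlier; the layer names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AttentionResidualBlock(nn.Module):
    """Residual block as described: pointwise conv, then a 3x3 depthwise
    conv giving F, SE-Block channel attention giving F1, and V = F + F1."""
    def __init__(self, channels: int):
        super().__init__()
        self.point = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        self.depth = nn.Conv2d(channels, channels, kernel_size=3, padding=1,
                               groups=channels, bias=False)
        self.se = SEBlock(channels)    # the SEBlock module sketched above

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        f = self.depth(self.point(u))  # feature map F after point + depthwise conv
        f1 = self.se(f)                # F1: F reweighted by channel attention
        return f + f1                  # final output feature map V
```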
In this embodiment, a common cross entropy function is adopted as the loss function of the prediction model; the difference between the predicted value and the true value is measured with the cross entropy, whose expression is as follows:
$$L = -\big[y \log y' + (1 - y) \log(1 - y')\big]$$
wherein y denotes the true label and y' the predicted probability that a sample belongs to a given class. To further balance the weight distribution of hard samples in actual detection, the overall loss function of the improved network reweights hard samples on top of this cross entropy.
In this embodiment, the SE-Block module calibrates the feature relationships in the network through the compression and excitation processes, increasing the weights of effective features and decreasing those of ineffective or weakly effective ones.
In this embodiment, the depthwise separable convolution corresponds to the conv2d operation in the operator stage, and the residual module with the attention mechanism corresponds to the bneck operation in the operator stage.
The lightweight attention mechanism is combined with the YOLOv3 network to improve the feature extraction capability; the depthwise separable convolution module is incorporated into the YOLOv3 network to improve the efficiency of the algorithm and further improve detection precision; and the multi-scale fusion method is applied to the traditional YOLOv3 network to improve the feature extraction capability, and hence the performance, of the model. By combining the lightweight attention mechanism, depthwise separable convolution, and the multi-scale fusion method into the YOLOv3 network, a target detection method with higher recognition accuracy is designed, which can effectively complete the task of target detection in images, automatically extract image features, and achieve higher detection precision while improving efficiency.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (5)

1. An image target detection method combining a lightweight attention mechanism and a YOLOv3 network, characterized by comprising the training process of a target detection algorithm of the lightweight attention mechanism and the YOLOv3 network:
The training process of the target detection algorithm of the lightweight attention mechanism and the YOLOv3 network is divided into two stages: the first stage performs feature extraction on the input image at multiple scales, where the feature extraction comprises a depthwise separable convolution structure and a residual structure with an attention mechanism; the second stage fuses the multi-scale features trained in the previous stage and finally outputs the predicted image, wherein the specific training process is as follows:
Step 1: the network initializes the weights;
Step 2: multi-scale feature extraction is performed on the input image;
Step 3: at each scale, a downsampled feature map is obtained through the depthwise separable convolution layer and the residual module with the attention mechanism;
Step 4: the features of each scale are passed through a convolution layer to output predictions;
Step 5: the output predictions of the multi-scale features are fused to form the final prediction model,
wherein the depthwise separable convolution structure is part of the feature extraction function and the key module for realizing the lightweight design: in standard convolution, the spatial convolution and the combination of feature channels are performed simultaneously, while here the two parts are separated into a depthwise convolution process and a pointwise convolution process, and performing the convolution after this grouping greatly reduces the computation and the number of parameters, achieving the goal of a lightweight design; for an input feature map of size $D_F \times D_F \times M$, the operation is decomposed into a depthwise convolution with kernels of size $D_K \times D_K \times 1 \times M$ and a pointwise convolution with kernels of size $1 \times 1 \times M \times N$, i.e., the computation is
$$O_1 = D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F$$
wherein the residual structure with the attention mechanism is the other part of the feature extraction process and is used to improve feature extraction performance in the backbone network: an input feature image U first undergoes a pointwise convolution operation, then a 3×3 depthwise convolution operation, giving a feature map F after feature extraction; F is then passed through the SE-Block attention module to obtain a new map F1; finally, F and F1 are summed to obtain the final output feature map V; specifically, the attention mechanism can optimize the connection between the channel domain and the spatial domain and can induce the feature extraction network to learn the regions of interest,
wherein $F_{tr}(\cdot, \theta)$ denotes the convolution mapping operation, specifically:
$$F_{tr}: X \rightarrow U, \quad X \in \mathbb{R}^{H' \times W' \times C'}, \quad U \in \mathbb{R}^{H \times W \times C}$$
wherein $F_{sq}(\cdot)$ denotes the Squeeze operation, i.e. the compression operation, specifically
$$z_c = F_{sq}(u_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)$$
wherein $F_{ex}(\cdot, W)$ denotes the Excitation operation, specifically
$$F_{ex}(z, W) = \sigma\big(g(z, W)\big) = \sigma\big(W_2\,\mathrm{ReLU}(W_1 z)\big)$$
wherein z is the output after the compression operation, the activation function σ is a Sigmoid, and r is a hyperparameter; the final output is written as $\tilde{X} = F_{scale}(u, s) = s \cdot u$, where u and s are the output of the convolution operation and the output of the excitation operation, respectively.
2. The image target detection method combining a lightweight attention mechanism and a YOLOv3 network according to claim 1, characterized in that a common cross entropy function is adopted as the loss function of the prediction model, and the difference between the predicted value and the true value is measured with the cross entropy, expressed as follows:
$$L = -\big[y \log y' + (1 - y) \log(1 - y')\big]$$
wherein y denotes the true label and y' the predicted probability that a sample belongs to a given class; to further balance the weight distribution of hard samples in actual detection, the overall loss function of the improved network reweights hard samples on top of this cross entropy.
3. The method of claim 1, wherein the SE-Block module calibrates the feature relationships in the network through the compression and excitation processes, increasing the weights of effective features and decreasing those of ineffective or weakly effective ones.
4. The image target detection method combining a lightweight attention mechanism and a YOLOv3 network according to claim 1, wherein the depthwise separable convolution corresponds to the conv2d operation in the operator stage.
5. The image target detection method combining a lightweight attention mechanism and a YOLOv3 network according to claim 1, wherein the residual module with the attention mechanism corresponds to the bneck operation in the operator stage.
CN202111141568.1A 2021-09-28 2021-09-28 Image target detection method combining lightweight attention mechanism and YOLOv3 network Active CN113869412B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111141568.1A CN113869412B (en) Image target detection method combining lightweight attention mechanism and YOLOv3 network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111141568.1A CN113869412B (en) Image target detection method combining lightweight attention mechanism and YOLOv3 network

Publications (2)

Publication Number Publication Date
CN113869412A CN113869412A (en) 2021-12-31
CN113869412B (en) 2024-06-07

Family

ID=78991824

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111141568.1A Active CN113869412B (en) Image target detection method combining lightweight attention mechanism and YOLOv3 network

Country Status (1)

Country Link
CN (1) CN113869412B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115063478B (en) * 2022-05-30 2024-07-12 华南农业大学 Fruit positioning method, system, equipment and medium based on RGB-D camera and visual positioning
CN117809294B (en) * 2023-12-29 2024-07-19 天津大学 Text detection method based on feature correction and difference guiding attention


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021139069A1 (en) * 2020-01-09 2021-07-15 南京信息工程大学 General target detection method for adaptive attention guidance mechanism
CN112101434A (en) * 2020-09-04 2020-12-18 河南大学 Infrared image weak and small target detection method based on improved YOLO v3
CN112232214A (en) * 2020-10-16 2021-01-15 天津大学 Real-time target detection method based on depth feature fusion and attention mechanism
CN112396002A (en) * 2020-11-20 2021-02-23 重庆邮电大学 Lightweight remote sensing target detection method based on SE-YOLOv3

Also Published As

Publication number Publication date
CN113869412A (en) 2021-12-31


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant