CN113392960B - Target detection network and method based on a hybrid dilated convolution pyramid (Google Patents)


Info

Publication number: CN113392960B (granted publication; earlier application publication CN113392960A)
Application number: CN202110646653.7A
Authority: CN (China)
Original language: Chinese (zh)
Legal status: Active
Inventors: 殷光强, 殷康宁, 候少麒, 梁杰, 丁一寅, 曾宇昊
Original and current assignee: University of Electronic Science and Technology of China
Prior art keywords: layer, network, module, feature, pyramid

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
        • G06N 3/00 — Computing arrangements based on biological models
        • G06N 3/02 — Neural networks
        • G06N 3/04 — Architecture, e.g. interconnection topology
        • G06N 3/045 — Combinations of networks
        • G06N 3/08 — Learning methods
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
        • G06F 18/00 — Pattern recognition
        • G06F 18/20 — Analysing
        • G06F 18/25 — Fusion techniques
        • G06F 18/253 — Fusion techniques of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of digital image processing, and in particular to a target detection network and method based on a hybrid dilated convolution pyramid. The target detection network comprises a backbone network, a hybrid receptive field module, a low-level embedded feature pyramid module and a detection module. The backbone network extracts target picture features using a hierarchical cascade network structure; the hybrid receptive field module performs feature enhancement on the highest-level feature map output from the top of the backbone network; the low-level embedded feature pyramid module fuses high-level features downwards on the basis of the feature pyramid and generates the final feature maps to be detected in a low-level embedding manner; the detection module locates and classifies the feature maps to be detected and outputs the result. The target detection network and method can effectively alleviate the missed and false detections caused by scale variation and occlusion.

Description

Target detection network and method based on a hybrid dilated convolution pyramid
Technical Field
The invention relates to the technical field of digital image processing, and in particular to a target detection network and method based on a hybrid dilated convolution pyramid.
Background
Object detection is one of the most widespread applications of computer vision in real life; its task is to locate specific objects of interest in a picture. Conventional target detection methods can be divided into single-stage and two-stage methods. The core of the two-stage method is region proposal: the input image is selectively searched to generate region proposal boxes, a convolutional neural network then extracts features from each proposal box, and a classifier performs the classification. The single-stage method instead outputs the target detection result directly through a convolutional neural network.
Through a series of developments, the two families of methods gradually converged on a common point: a large number of anchor boxes must be generated in advance during detection, and such algorithms are collectively called Anchor-based target detection algorithms. Anchor boxes are a group of rectangular boxes obtained by running a clustering algorithm on the training set before training; they represent the dominant width and height distributions of the targets in the dataset. During inference, n candidate rectangular boxes are derived from the anchor boxes on the feature map, and these rectangular boxes are then further classified and regressed. As in the two-stage algorithm, the processing of the candidate boxes still passes through two steps: coarse foreground classification and fine multi-class classification.
The single-stage target detection algorithm lacks the fine processing of the two-stage algorithm and performs poorly when faced with problems such as multi-scale targets and occlusion. In addition, although Anchor-based algorithms alleviate to some extent the explosion in candidate-box computation caused by selective search, generating a large number of anchor boxes of different sizes in each grid cell still causes computational redundancy. Most importantly, anchor-box generation depends on a large number of hyper-parameter settings, and manual parameter tuning seriously affects the localization accuracy and classification performance.
In the prior art, the patent with publication number CN110222712A discloses "a multi-item target detection algorithm based on deep learning". The proposed algorithm obtains an augmented RoI set through a multi-scale sliding window and selective search, generating a dense RoI set exhaustively with the multi-scale sliding window, which is computationally expensive and inefficient.
The patent with publication number CN112115883A discloses "a non-maximum suppression method and apparatus based on an Anchor-free target detection algorithm". It proposes using a CenterNet network model to detect targets by predicting the top-left corner, bottom-right corner and center point of an object, and uses non-maximum suppression to avoid multiple detection boxes on the same target; however, comparatively complicated post-processing is required to group each pair of corners belonging to the same target, which is inefficient.
The patent with publication number CN112101153A discloses "a remote sensing target detection method based on a receptive field module and a multi-feature pyramid". It extracts features from a visible-light remote sensing image through a VGG network to obtain feature maps of different sizes, cascades and fuses these feature maps, obtains an optimized feature map through a strided-convolution feature pyramid, and then performs multi-scale output detection through receptive field information mining. The method utilizes feature maps of different sizes, but its feature-map fusion is redundant and the backbone network performs poorly, which degrades the final detection result.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a target detection network and method based on a hybrid dilated convolution pyramid, which can effectively alleviate the missed and false detections caused by scale variation and occlusion.
The invention is realized by adopting the following technical scheme:
a target detection network based on a hybrid void convolution pyramid is characterized in that: the system comprises a backbone network, a mixed reception field module, a low-level embedded characteristic pyramid module and a detection module; the backbone network extracts the characteristics of the target picture by using a hierarchical cascading network structure; the mixed receptive field module is used for carrying out feature enhancement on the highest layer feature map output from the topmost end of the backbone network; the low-layer embedded feature pyramid module is used for fusing high-layer features downwards on the basis of a feature pyramid and generating a final feature graph to be detected in a low-layer embedding mode; the detection module is used for positioning and classifying the characteristic diagram to be detected and outputting a result.
The low-level embedded feature pyramid module generates the final feature maps to be detected through the following specific steps:
a. the low-level embedded feature pyramid module fuses the current-level feature map with the higher-level feature map, after channel compression and upsampling, to form a composite feature map, completing the embedding of high-level semantic information;
b. the composite feature map is fused with the downsampled lower-level feature map to form a mixed feature map, completing the embedding of low-level detail information;
c. each mixed feature map passes through a composite convolutional layer to generate a final feature map to be detected.
The fusion in steps a and b is element-wise, channel-wise addition.
The composite convolutional layer in step c is formed by cascading a 3 × 3 convolutional layer, a BN layer and a LeakyReLU activation layer.
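As a hedged illustration, the 3 × 3 conv → BN → LeakyReLU cascade described above could be sketched in PyTorch as follows; the channel counts and the LeakyReLU negative slope are illustrative assumptions, not values taken from the patent.

```python
import torch
import torch.nn as nn

class CompositeConv(nn.Module):
    """Composite convolutional layer: 3x3 conv -> BN -> LeakyReLU cascade."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            # padding=1 preserves the spatial size of the feature map
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.1, inplace=True),  # slope 0.1 is an assumption
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)
```

A mixed feature map of shape (N, C, H, W) keeps its spatial size H × W after this layer; only the channel count changes.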
The hybrid receptive field module comprises four parallel branches: a 1 × 1 convolutional-layer branch and three 3 × 3 convolutional-layer branches with dilation rates of 1, 2 and 4 respectively. The hybrid receptive field module concatenates the feature maps obtained in parallel by the dilated convolutional layers with different dilation rates, then fuses the feature information with a 1 × 1 convolutional layer and reduces the channel dimension to a specified number.
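A minimal PyTorch sketch of such a four-branch module, under the assumption that every branch outputs the same channel count before the 1 × 1 fusion (the per-branch widths are not specified in the text):

```python
import torch
import torch.nn as nn

class HybridReceptiveField(nn.Module):
    """Four parallel branches (1x1 conv; 3x3 convs with dilation 1, 2, 4),
    concatenated and fused by a 1x1 conv that reduces channels to out_ch."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        # padding == dilation keeps the spatial size for 3x3 kernels
        self.branch2 = nn.Conv2d(in_ch, out_ch, 3, padding=1, dilation=1)
        self.branch3 = nn.Conv2d(in_ch, out_ch, 3, padding=2, dilation=2)
        self.branch4 = nn.Conv2d(in_ch, out_ch, 3, padding=4, dilation=4)
        self.fuse = nn.Conv2d(4 * out_ch, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [self.branch1(x), self.branch2(x),
                 self.branch3(x), self.branch4(x)]
        # concatenate along channels, then fuse and reduce with 1x1 conv
        return self.fuse(torch.cat(feats, dim=1))
```

Because padding equals dilation in every 3 × 3 branch, all four branch outputs share the input's spatial size and can be concatenated directly.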
The backbone network is a single-stage detection network based on the Res2Net50 network; the Anchor-free mechanism of FCOS is introduced for target prediction, performing pixel-by-pixel prediction, and a Center-ness branch network is added to the loss function part.
The feature maps output by the backbone network comprise C3, C4 and C5, with feature map sizes of 100 × 100, 50 × 50 and 25 × 25 respectively.
A target detection method based on a hybrid dilated convolution pyramid, characterized in that it comprises the following steps:
i. constructing a backbone network based on the Anchor-free mechanism, obtaining feature maps C3, C4 and C5 through the backbone network, and, after feature enhancement by the hybrid receptive field module, outputting the highest-level feature map C5 of the backbone network to the low-level embedded feature pyramid module;
ii. the low-level embedded feature pyramid module forms composite features with the feature maps C4 and C3 output by the backbone network through upsampling and downsampling operations; the composite features pass through the composite convolutional layer to generate the feature maps to be detected, which are delivered to the detection module for the target localization and classification tasks;
iii. training the network, testing the model of each round, saving the best model weights, testing the real-time performance of the hybrid receptive field module and the low-level embedded feature pyramid module with the corresponding test set, and obtaining the trained network model;
iv. detecting targets with the trained network model and outputting the detection results.
In the training of the network in step iii, the loss function is as follows:
$$L(\{p_{x,y}\},\{t_{x,y}\}) = \frac{1}{N}\sum_{x,y} L_{cls}\big(p_{x,y},\, c^{*}_{x,y}\big) + \frac{1}{N}\sum_{x,y} K_{\{c^{*}_{x,y}>0\}}\, L_{reg}\big(t_{x,y},\, t^{*}_{x,y}\big)$$

wherein $p_{x,y}$ represents the class prediction probability, $t_{x,y}$ represents the regression prediction coordinates, and $N$ represents the number of positive samples; $K$ is an indicator function equal to 1 if the current prediction is judged a positive sample and 0 otherwise;
$L_{cls}$ takes the specific form of the Focal Loss function:

$$L_{cls} = \begin{cases} -(1-y')^{\gamma}\,\log y', & y = 1 \\ -\,y'^{\gamma}\,\log(1-y'), & y = 0 \end{cases}$$

wherein $y$ is the sample label, $y'$ is the predicted probability that the sample is a positive case, and $\gamma$ is the focusing parameter;
$L_{reg}$ is the GIoU Loss function, calculated as follows:

$$IoU = \frac{|A \cap B|}{|A \cup B|}$$

$$GIoU = IoU - \frac{|C \setminus (A \cup B)|}{|C|}$$

$$L_{reg} = 1 - GIoU$$

where A and B denote the prediction box and the ground-truth box, and IoU is their intersection-over-union; the minimum convex set C, i.e. the smallest bounding box enclosing A and B, is computed first, and GIoU, and hence $L_{reg}$, is then computed from C.
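The GIoU loss above can be sketched for axis-aligned boxes; the (x1, y1, x2, y2) tensor layout is an illustrative assumption, not the patent's data format.

```python
import torch

def giou_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """GIoU loss for boxes given as (..., 4) tensors in (x1, y1, x2, y2) form:
    IoU = |A∩B|/|A∪B|; GIoU = IoU - |C \\ (A∪B)|/|C| with C the smallest
    enclosing box; L_reg = 1 - GIoU."""
    # intersection area
    lt = torch.max(pred[..., :2], target[..., :2])
    rb = torch.min(pred[..., 2:], target[..., 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    union = area_p + area_t - inter
    iou = inter / union
    # smallest enclosing (convex) box C
    lt_c = torch.min(pred[..., :2], target[..., :2])
    rb_c = torch.max(pred[..., 2:], target[..., 2:])
    wh_c = (rb_c - lt_c).clamp(min=0)
    area_c = wh_c[..., 0] * wh_c[..., 1]
    giou = iou - (area_c - union) / area_c
    return 1.0 - giou
```

Identical boxes give a loss of 0; disjoint boxes are still penalized (loss above 1), which is the point of GIoU over plain IoU.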
Compared with the prior art, the invention has the beneficial effects that:
1. The invention improves the structure of the feature pyramid and provides a low-level embedded feature pyramid module, which can effectively remedy the insufficient handling of multi-scale variation in target detection; it fuses shallow-level feature information with high-level feature information, and adds normalization and an activation function to the fused output to optimize model training.
The invention designs a hybrid receptive field module which, while controlling the model's parameter count, uses multi-scale dilated convolution combined with the multi-scale outputs of the feature pyramid to enlarge the receptive field and acquire more global feature detail, so as to address target occlusion.
The invention introduces the Anchor-free mechanism and combines the low-level embedded feature pyramid module with the hybrid receptive field module, which reduces the invalid computation caused by redundant candidate boxes, improves localization accuracy, and effectively alleviates problems such as missed detection.
2. The target detection network of the invention can handle the multi-scale and occlusion problems of target detection scenes and can be used in a plug-and-play manner. It introduces an Anchor-free algorithm and combines the low-level embedded feature pyramid module with the hybrid receptive field module, which reduces the invalid computation caused by redundant candidate boxes, improves localization accuracy, and addresses the large parameter counts, heavy redundant computation, low applicability, low efficiency and frequent misses that existing target detection methods exhibit in practical situations.
3. The backbone network introduces the Anchor-free mechanism of FCOS (Fully Convolutional One-Stage object detection) to perform pixel-by-pixel prediction, detecting targets without relying on predefined anchor boxes or proposal regions; this reduces the invalid computation caused by redundant candidate boxes, improves localization accuracy and effectively alleviates problems such as missed detection. The Center-ness mechanism is used to quickly filter negative samples, suppress low-quality prediction boxes far from the target center, and increase the weight of prediction boxes close to the target center, improving detection performance. The Res2Net50 network is introduced, replacing the single 3 × 3 convolutional layer used in ResNet50 with a hierarchically cascaded feature group inside the residual block, which is better optimized in terms of network width, depth and resolution.
4. Unlike other networks, which perform feature processing after the multi-level (C3, C4, C5) features have been fused, the hybrid receptive field module is embedded before feature fusion, between C5 of the backbone network and the feature pyramid level P5; it improves the characterization capability of the C5 features, and the final detection prediction is made after the low-level embedded feature pyramid module. Using convolutional layers with different dilation rates improves the model's adaptability to targets of different scales; after the feature maps are concatenated, a 1 × 1 convolutional layer fuses the feature information and reduces the channel dimension to a specified number, improving the flexibility of the hybrid receptive field module.
5. Compared with the plain feature pyramid, the features output by the low-level embedded feature pyramid module of the invention contain not only rich semantic information but also concrete detail information, achieving a double improvement in multi-scale target detection performance and localization accuracy.
Drawings
The invention will be described in further detail below with reference to the accompanying drawings and the detailed description, in which:
FIG. 1 is a schematic diagram of the overall structure of a target detection network according to the present invention;
FIG. 2 is a schematic flow chart of a target detection method according to the present invention;
FIG. 3 is a schematic diagram of a hybrid receptive field module according to the present invention;
FIG. 4 is a schematic diagram of a low-level embedded feature pyramid module according to the present invention;
FIG. 5 is a schematic view of a composite convolution layer according to the present invention.
Detailed Description
Example 1
As a basic embodiment of the invention, a target detection network based on a hybrid dilated convolution pyramid comprises a backbone network, a hybrid receptive field module, a low-level embedded feature pyramid module and a detection module. The backbone network extracts target picture features using a hierarchical cascade network structure; the hybrid receptive field module performs feature enhancement on the highest-level feature map output from the top of the backbone network; the low-level embedded feature pyramid module fuses high-level features downwards on the basis of the feature pyramid and generates the final feature maps to be detected in a low-level embedding manner; the detection module locates and classifies the feature maps to be detected and outputs the result.
The backbone network can be a single-stage detection network based on the Res2Net50 network, with stronger feature extraction capability at no extra computational cost; for target prediction it introduces the Anchor-free mechanism of FCOS to predict per pixel, and a Center-ness branch network is added to the loss function part to suppress low-quality detection boxes and improve detection performance.
A target detection method based on a hybrid dilated convolution pyramid comprises the following steps:
i. constructing a backbone network based on the Anchor-free mechanism, obtaining feature maps C3, C4 and C5 through the backbone network, and, after feature enhancement by the hybrid receptive field module, outputting the highest-level feature map C5 of the backbone network to the low-level embedded feature pyramid module;
ii. the low-level embedded feature pyramid module forms composite features with the feature maps C4 and C3 output by the backbone network through upsampling and downsampling operations; the composite features pass through the composite convolutional layer to generate the feature maps to be detected, which are delivered to the detection module for the target localization and classification tasks;
iii. training the network, testing the model of each round, saving the best model weights, testing the real-time performance of the hybrid receptive field module and the low-level embedded feature pyramid module with the corresponding test set, and obtaining the trained network model;
iv. detecting targets with the trained network model and outputting the detection results.
Example 2
As a preferred embodiment of the invention, and with reference to FIG. 1 of the description, the target detection network based on a hybrid dilated convolution pyramid comprises a backbone network, a hybrid receptive field module, a low-level embedded feature pyramid module and a detection module.
The backbone network adopts a single-stage detection network structure and introduces the Anchor-free mechanism of FCOS to perform pixel-by-pixel prediction, detecting targets without relying on predefined anchor boxes or proposal regions; this reduces the invalid computation caused by redundant candidate boxes, improves localization accuracy and effectively alleviates problems such as missed detection. The Center-ness mechanism is used to quickly filter negative samples, suppress low-quality prediction boxes far from the target center, and increase the weight of prediction boxes close to the target center, improving detection performance. The expression of Center-ness is shown in formula (1), where l*, r*, t*, b* represent the distances from the pixel point to the left, right, top and bottom edges of the prediction box; the value lies between 0 and 1, so that the Center-ness value is larger the closer the point is to the true target center, and smaller the farther away it is.
$$centerness^{*} = \sqrt{\frac{\min(l^{*},\,r^{*})}{\max(l^{*},\,r^{*})} \times \frac{\min(t^{*},\,b^{*})}{\max(t^{*},\,b^{*})}} \qquad (1)$$
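Formula (1), the Center-ness expression, can be sketched as a small function; the tensor-based signature is an illustrative assumption:

```python
import torch

def centerness_target(l, r, t, b):
    """Center-ness target of formula (1): the square root of the product of
    the min/max ratios of the left/right and top/bottom distances."""
    return torch.sqrt(
        (torch.min(l, r) / torch.max(l, r))
        * (torch.min(t, b) / torch.max(t, b))
    )
```

A pixel at the exact box center (l = r, t = b) yields 1; the value shrinks toward 0 as the pixel moves away from the center, which is what lets the branch down-weight off-center prediction boxes.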
The backbone network introduces the Res2Net50 network, which replaces the single 3 × 3 convolutional layer used in ResNet50 with a hierarchically cascaded feature group inside the residual block and is better optimized in terms of network width, depth and resolution. At C3, C4 and C5, the feature map sizes are 100 × 100, 50 × 50 and 25 × 25 respectively.
The hybrid receptive field module concatenates the feature maps obtained in parallel by dilated convolutional layers with different dilation rates, improving the network's ability to capture global features and compensating for the gridding effect caused by a single dilated convolution. The hybrid receptive field module of the present application uses dilated convolutional layers throughout to effectively address the target occlusion problem.
Referring to FIG. 3 of the description, in order to exploit the hybrid receptive field module fully, the module differs from other networks, which process features only after the multi-level (C3, C4, C5) features have been fused: it is instead embedded before feature fusion, between C5 of the backbone network and the feature pyramid level P5, to improve the characterization capability of the C5 features, with the final detection prediction made after the low-level embedded feature pyramid module. The hybrid receptive field module of the invention consists of four parallel branches: one 1 × 1 convolutional-layer branch, and three 3 × 3 convolutional-layer branches with dilation rates of 1, 2 and 4 respectively. The 3 × 3 dilated convolution with dilation rate 4 can capture more global context feature detail, enhancing inference capability and addressing target occlusion; using convolutional layers with different dilation rates improves the model's adaptability to targets of different scales.
The high-level features output by C5 carry rich semantic information. Unlike the conventionally adopted cascaded feature combination, the parallel feature combination adopted by the invention can train network parameters better suited to the current dataset. Parallel branch 1, with its 1 × 1 convolutional layer, preserves as much image detail as possible without changing the feature map size, while also controlling the number of feature map channels to reduce subsequent computation. The 3 × 3 convolution kernels have few parameters, processing the feature information while keeping network computation low. Dilated convolution captures more global feature detail and enhances inference capability, so that occluded targets are better recognized; setting different dilation rates eliminates the gridding effect while improving the model's adaptability to multi-scale targets. Parallel branch 2 is a 3 × 3 convolution with dilation rate 1, suited to detecting small and medium targets; parallel branch 3 is a 3 × 3 convolution with dilation rate 2, suited to medium targets; and parallel branch 4 is a 3 × 3 convolution with dilation rate 4, suited to medium and large targets.
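The small-to-large ordering of the branches follows from the effective receptive field of a dilated kernel: for a k × k kernel with dilation rate d, the effective size is k + (k − 1)(d − 1). A quick check:

```python
def effective_kernel_size(k: int, d: int) -> int:
    """Effective receptive-field size of a kxk convolution with dilation d."""
    return k + (k - 1) * (d - 1)

# The three 3x3 branches with dilation rates 1, 2 and 4:
sizes = [effective_kernel_size(3, d) for d in (1, 2, 4)]  # -> [3, 5, 9]
```

So the dilation-4 branch covers a 9 × 9 neighborhood with only 3 × 3 = 9 weights, which is how the module enlarges the receptive field without growing the parameter count.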
After the feature maps are concatenated, a 1 × 1 convolutional layer fuses the feature information and reduces the channel dimension to a specified number, improving the flexibility of the hybrid receptive field module.
The feature pyramid fuses high-level features downwards so that the feature map of every level carries strong semantic information and can be predicted from separately. Compared with the plain feature pyramid, the features output by the low-level embedded feature pyramid module of the present application contain not only rich semantic information but also concrete detail information, achieving a double improvement in multi-scale target detection performance and localization accuracy.
Referring to FIG. 4 of the description, C5' is the feature map after the low-level embedded feature pyramid module; referring to FIG. 5, the composite convolutional layer (formed by cascading a 3 × 3 convolutional layer, a BN layer and a LeakyReLU activation layer) processes the fused features, optimizing model training and improving the nonlinear expressive power of the features.
The low-level embedded feature pyramid module first fuses the current-level feature map with the higher-level feature map, after channel compression and upsampling, by element-wise, channel-wise addition to form a composite feature map, completing the embedding of high-level semantic information; it then fuses the composite feature map with the downsampled lower-level feature map to form a mixed feature map, completing the embedding of low-level detail information; finally, each mixed feature map passes through the designed composite convolutional layer to generate a final feature map to be detected, which enters the next module.
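One fusion step of this procedure might be sketched as below; the `compress_high`, `down_low` and `composite_conv` operators (channel compression, downsampling and the composite convolution) are passed in as modules because the text does not pin down their exact form, so the concrete choices in the usage are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def embed_fuse(cur, high, low, compress_high, down_low, composite_conv):
    """Low-level embedded fusion for one pyramid level: add the compressed,
    upsampled higher-level map (high-level semantic embedding), then the
    downsampled lower-level map (low-level detail embedding); both fusions
    are element-wise, channel-wise additions."""
    high_up = F.interpolate(compress_high(high), size=cur.shape[-2:],
                            mode="nearest")
    composite = cur + high_up          # composite feature map
    mixed = composite + down_low(low)  # mixed feature map
    return composite_conv(mixed)       # final feature map to be detected
```

For example, with a 1 × 1 convolution as the channel compressor and a stride-2 convolution as the downsampler, the output keeps the current level's shape.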
A target detection method based on a hybrid dilated convolution pyramid, with reference to FIG. 1 of the description, comprises the following steps:
i. building a backbone network based on the Anchor-free mechanism, obtaining feature maps C3, C4 and C5 through the backbone network, and, after feature enhancement by the hybrid receptive field module, outputting the highest-level feature map C5 of the backbone network to the low-level embedded feature pyramid module;
ii. the low-level embedded feature pyramid module forms composite features with the feature maps C4 and C3 output by the backbone network through upsampling and downsampling operations; the composite features pass through the composite convolutional layer to generate the feature maps to be detected, which are delivered to the detection module for the target localization and classification tasks;
iii. training the network, testing the model of each round, saving the best model weights, testing the real-time performance of the hybrid receptive field module and the low-level embedded feature pyramid module with the corresponding test set, and obtaining the trained network model;
iv. detecting targets with the trained network model and outputting the detection results.
Wherein, in the process of training the network, the loss function is as follows:
$$L(\{p_{x,y}\},\{t_{x,y}\}) = \frac{1}{N}\sum_{x,y} L_{cls}\big(p_{x,y},\, c^{*}_{x,y}\big) + \frac{1}{N}\sum_{x,y} K_{\{c^{*}_{x,y}>0\}}\, L_{reg}\big(t_{x,y},\, t^{*}_{x,y}\big)$$

wherein $p_{x,y}$ represents the class prediction probability, $t_{x,y}$ represents the regression prediction coordinates, and $N$ represents the number of positive samples; $K$ is an indicator function equal to 1 if the current prediction is judged a positive sample and 0 otherwise;
$L_{cls}$ takes the specific form of the Focal Loss function:

$$L_{cls} = \begin{cases} -(1-y')^{\gamma}\,\log y', & y = 1 \\ -\,y'^{\gamma}\,\log(1-y'), & y = 0 \end{cases}$$

wherein $y$ is the sample label, $y'$ is the predicted probability that the sample is a positive case, and $\gamma$ is the focusing parameter. Compared with the ordinary cross-entropy loss function, Focal Loss adds the $\gamma$ factor; by controlling the value of $\gamma$, the influence of easy samples is reduced so that training focuses more on hard samples.
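As a hedged sketch, this piecewise Focal Loss can be written per-sample (the α balancing factor is omitted, as in the text; γ = 2.0 is an illustrative default):

```python
import torch

def focal_loss(y: torch.Tensor, y_pred: torch.Tensor,
               gamma: float = 2.0) -> torch.Tensor:
    """Focal Loss: -(1-y')^gamma * log(y') for positives (y = 1) and
    -y'^gamma * log(1-y') for negatives (y = 0); gamma is the focusing
    parameter."""
    pos = -((1.0 - y_pred) ** gamma) * torch.log(y_pred)
    neg = -(y_pred ** gamma) * torch.log(1.0 - y_pred)
    return torch.where(y == 1, pos, neg)
```

With γ = 0 this reduces to the ordinary cross-entropy; raising γ shrinks the loss contribution of well-classified (easy) samples, which is the focusing behavior the text describes.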
$L_{reg}$ is the GIoU Loss function, calculated as follows:

$$IoU = \frac{|A \cap B|}{|A \cup B|}$$

$$GIoU = IoU - \frac{|C \setminus (A \cup B)|}{|C|}$$

$$L_{reg} = 1 - GIoU$$

where A and B denote the prediction box and the ground-truth box, and IoU is their intersection-over-union; the minimum convex set C, i.e. the smallest bounding box enclosing A and B, is computed first, and GIoU, and hence $L_{reg}$, is then computed from C.
In summary, after reading the present disclosure, those skilled in the art may make various changes and modifications according to the technical solutions and concepts of the present disclosure without creative effort, and all such modifications fall within the protection scope of the present disclosure.

Claims (5)

1. A target detection network based on a mixed hole convolution pyramid, characterized in that: the network comprises a backbone network, a mixed receptive field module, a low-level embedded feature pyramid module and a detection module; the backbone network extracts target picture features using a layered cascade network structure; the mixed receptive field module performs feature enhancement on the highest-level feature map output from the top of the backbone network; the low-level embedded feature pyramid module fuses high-level features downwards on the basis of a feature pyramid and generates the final feature maps to be detected by low-level embedding; the detection module locates and classifies the feature maps to be detected and outputs the result; the backbone network is a single-stage detection network based on the Res2Net50 network, the Anchor-free mechanism of FCOS is introduced for target prediction, prediction is performed pixel by pixel, and a Center-ness branch network is added to the loss function part; the mixed receptive field module comprises four parallel branches: a 1×1 convolution branch and three 3×3 convolution branches with hole (dilation) rates of 1, 2 and 4 respectively; the mixed receptive field module concatenates the feature maps obtained by the parallel hole convolution layers of different hole rates, performs feature information fusion with a 1×1 convolution layer, and reduces the channel dimension to a specified number;
the low-level embedded feature pyramid module generates the final feature maps to be detected as follows:
a. the low-level embedded feature pyramid module fuses the current-level feature map with the higher-level feature map after channel compression and upsampling to form a composite feature map, completing the embedding of high-level semantic information;
b. the composite feature map is fused with the downsampled lower-level feature map to form a mixed feature map, completing the embedding of low-level detail information;
c. each mixed feature map passes through a composite convolution layer to generate a final feature map to be detected; the composite convolution layer consists of a 3×3 convolution layer, a BN layer and a LeakyReLU activation layer connected in sequence.
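The point of combining hole rates 1, 2 and 4 in claim 1 can be checked with simple arithmetic: a single k×k convolution with dilation rate d covers d·(k−1)+1 input positions per side, so the parallel branches observe 1×1, 3×3, 5×5 and 9×9 windows of the input. A small sketch (the branch list mirrors the claim; the formula is the standard dilated-convolution receptive field):

```python
def dilated_rf(k, d):
    # Effective receptive field (one side) of a single k x k convolution
    # with dilation (hole) rate d: d * (k - 1) + 1.
    return d * (k - 1) + 1

# The four parallel branches of the mixed receptive field module:
# a 1x1 conv plus three 3x3 convs with hole rates 1, 2 and 4.
branches = [(1, 1), (3, 1), (3, 2), (3, 4)]  # (kernel, dilation)
for k, d in branches:
    print(f"{k}x{k} conv, rate {d}: receptive field {dilated_rf(k, d)}")
```

Concatenating the four branch outputs thus mixes four receptive field sizes at the same spatial resolution, which is what gives the module its multi-scale feature enhancement without any downsampling.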
2. The mixed hole convolution pyramid-based target detection network of claim 1, wherein the fusion in step a and step b is element-wise, channel-wise addition.
3. The mixed hole convolution pyramid-based target detection network of claim 1, wherein the feature maps output by the backbone network comprise C3, C4 and C5, with sizes of 100×100, 50×50 and 25×25 respectively.
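The 100/50/25 pyramid of claim 3 is consistent with an 800×800 input and the usual C3/C4/C5 strides of 8, 16 and 32; the input resolution is an assumption here, since the claim only fixes the output sizes:

```python
# Hypothetical input resolution; the claim itself states only the map sizes.
input_size = 800
strides = {"C3": 8, "C4": 16, "C5": 32}
sizes = {name: input_size // s for name, s in strides.items()}
print(sizes)  # each level halves the previous one: 100 -> 50 -> 25
```

The factor-of-two relation between adjacent levels is what lets the pyramid module of claim 1 align maps with a single 2× up- or down-sampling step.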
4. A target detection method based on a mixed hole convolution pyramid, characterized by comprising the following steps: i. constructing a backbone network based on the Anchor-free mechanism, obtaining feature maps C3, C4 and C5 through the backbone network, and outputting the highest-level feature map C5, after feature enhancement by the mixed receptive field module, to the low-level embedded feature pyramid module; the backbone network is a single-stage detection network based on the Res2Net50 network, the Anchor-free mechanism of FCOS is introduced for target prediction, prediction is performed pixel by pixel, and a Center-ness branch network is added to the loss function part; the mixed receptive field module consists of four parallel branches: a 1×1 convolution branch and three 3×3 convolution branches with hole (dilation) rates of 1, 2 and 4 respectively; the mixed receptive field module concatenates the feature maps obtained by the parallel hole convolution layers of different hole rates, performs feature information fusion with a 1×1 convolution layer, and reduces the channel dimension to a specified number;
ii. the low-level embedded feature pyramid module forms composite features from the feature maps C4 and C3 output by the backbone network through up-sampling and down-sampling operations; the composite features pass through a composite convolution layer to generate the feature maps to be detected, which are delivered to the detection module for the target localization and classification tasks; the composite convolution layer consists of a 3×3 convolution layer, a BN layer and a LeakyReLU activation layer connected in sequence;
iii. training the network, testing the model after each round, saving the best model weights, testing the real-time performance of the mixed receptive field module and the low-level embedded feature pyramid module with the corresponding test set, and obtaining the trained network model;
iv. detecting targets with the trained network model and outputting the detection results.
5. The mixed hole convolution pyramid-based target detection method of claim 4, wherein in the process of training the network in step iii, the loss function is as follows:
$$L(\{p_{x,y}\},\{t_{x,y}\}) = \frac{1}{N}\sum_{x,y} L_{cls}\left(p_{x,y}, c^{*}_{x,y}\right) + \frac{1}{N}\sum_{x,y} \mathbb{1}_{\{c^{*}_{x,y}>0\}}\, L_{reg}\left(t_{x,y}, t^{*}_{x,y}\right)$$
where p_{x,y} denotes the class prediction probability, t_{x,y} the regression prediction coordinates, and N the number of positive samples; the indicator function 𝟙 is 1 if the current prediction is determined to be a positive sample and 0 otherwise;
L_{cls} takes the specific form of the Focal Loss function:
$$L_{cls} = \begin{cases} -(1-y')^{\gamma}\log(y'), & y = 1 \\ -\,y'^{\gamma}\log(1-y'), & y = 0 \end{cases}$$
where y is the sample label, y' is the predicted probability that the sample is positive, and γ is the focusing parameter;
L_{reg} is the GIoU Loss function, computed as follows:
$$IoU = \frac{|A \cap B|}{|A \cup B|}$$
$$GIoU = IoU - \frac{|C \setminus (A \cup B)|}{|C|}$$
$$L_{reg} = 1 - GIoU$$
where A and B denote the predicted box and the ground-truth box, and IoU is their intersection-over-union. The minimum convex set C of A and B, i.e. the smallest bounding box enclosing both A and B, is computed first; GIoU is then calculated from C, and L_{reg} is obtained in turn.
CN202110646653.7A 2021-06-10 2021-06-10 Target detection network and method based on mixed hole convolution pyramid Active CN113392960B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110646653.7A CN113392960B (en) 2021-06-10 2021-06-10 Target detection network and method based on mixed hole convolution pyramid


Publications (2)

Publication Number Publication Date
CN113392960A CN113392960A (en) 2021-09-14
CN113392960B true CN113392960B (en) 2022-08-30

Family

ID=77620186


Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113947774B (en) * 2021-10-08 2024-05-14 东北大学 Lightweight vehicle target detection system
CN113887455B (en) * 2021-10-11 2024-05-28 东北大学 Face mask detection system and method based on improved FCOS
CN113963177A (en) * 2021-11-11 2022-01-21 电子科技大学 CNN-based building mask contour vectorization method
CN113989498B (en) * 2021-12-27 2022-07-12 北京文安智能技术股份有限公司 Training method of target detection model for multi-class garbage scene recognition
CN114339049A (en) * 2021-12-31 2022-04-12 深圳市商汤科技有限公司 Video processing method and device, computer equipment and storage medium
CN114283488B (en) * 2022-03-08 2022-06-14 北京万里红科技有限公司 Method for generating detection model and method for detecting eye state by using detection model
CN114693939B (en) * 2022-03-16 2024-04-30 中南大学 Method for extracting depth features of transparent object detection under complex environment
CN115984105B (en) * 2022-12-07 2023-08-01 深圳大学 Hole convolution optimization method and device, computer equipment and storage medium
CN115861855B (en) * 2022-12-15 2023-10-24 福建亿山能源管理有限公司 Operation and maintenance monitoring method and system for photovoltaic power station
CN117132761A (en) * 2023-08-25 2023-11-28 京东方科技集团股份有限公司 Target detection method and device, storage medium and electronic equipment

Citations (5)

Publication number Priority date Publication date Assignee Title
CN108985269A (en) * 2018-08-16 2018-12-11 东南大学 Converged network driving environment sensor model based on convolution sum cavity convolutional coding structure
CN109543672A (en) * 2018-10-15 2019-03-29 天津大学 Object detecting method based on dense characteristic pyramid network
CN111260630A (en) * 2020-01-16 2020-06-09 高新兴科技集团股份有限公司 Improved lightweight small target detection method
CN112365501A (en) * 2021-01-13 2021-02-12 南京理工大学 Weldment contour detection algorithm based on convolutional neural network
CN112819748A (en) * 2020-12-16 2021-05-18 机科发展科技股份有限公司 Training method and device for strip steel surface defect recognition model

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
CN112070729B (en) * 2020-08-26 2023-07-07 西安交通大学 Anchor-free remote sensing image target detection method and system based on scene enhancement
CN112183649A (en) * 2020-09-30 2021-01-05 佛山市南海区广工大数控装备协同创新研究院 Algorithm for predicting pyramid feature map
CN112419237B (en) * 2020-11-03 2023-06-30 中国计量大学 Deep learning-based automobile clutch master cylinder groove surface defect detection method
CN112446327B (en) * 2020-11-27 2022-06-07 中国地质大学(武汉) Remote sensing image target detection method based on non-anchor frame
CN112651351B (en) * 2020-12-29 2022-01-04 珠海大横琴科技发展有限公司 Data processing method and device
CN112801117B (en) * 2021-02-03 2022-07-12 四川中烟工业有限责任公司 Multi-channel receptive field guided characteristic pyramid small target detection network and detection method



Similar Documents

Publication Publication Date Title
CN113392960B (en) Target detection network and method based on mixed hole convolution pyramid
CN111967305B (en) Real-time multi-scale target detection method based on lightweight convolutional neural network
CN114627360B (en) Substation equipment defect identification method based on cascade detection model
CN111461083A (en) Rapid vehicle detection method based on deep learning
CN114049584A (en) Model training and scene recognition method, device, equipment and medium
CN111768388B (en) Product surface defect detection method and system based on positive sample reference
CN111126472A (en) Improved target detection method based on SSD
CN113052834B (en) Pipeline defect detection method based on convolution neural network multi-scale features
CN110796009A (en) Method and system for detecting marine vessel based on multi-scale convolution neural network model
CN114943963A (en) Remote sensing image cloud and cloud shadow segmentation method based on double-branch fusion network
CN112906718A (en) Multi-target detection method based on convolutional neural network
CN111429466A (en) Space-based crowd counting and density estimation method based on multi-scale information fusion network
CN115035295B (en) Remote sensing image semantic segmentation method based on shared convolution kernel and boundary loss function
CN113850324B (en) Multispectral target detection method based on Yolov4
CN116503318A (en) Aerial insulator multi-defect detection method, system and equipment integrating CAT-BiFPN and attention mechanism
CN112528904A (en) Image segmentation method for sand particle size detection system
CN112183649A (en) Algorithm for predicting pyramid feature map
CN115223009A (en) Small target detection method and device based on improved YOLOv5
CN117079163A (en) Aerial image small target detection method based on improved YOLOX-S
CN112507849A (en) Dynamic-to-static scene conversion method for generating countermeasure network based on conditions
CN113901928A (en) Target detection method based on dynamic super-resolution, and power transmission line component detection method and system
CN114782298A (en) Infrared and visible light image fusion method with regional attention
CN114170526A (en) Remote sensing image multi-scale target detection and identification method based on lightweight network
CN112700450A (en) Image segmentation method and system based on ensemble learning
CN112132207A (en) Target detection neural network construction method based on multi-branch feature mapping

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant