CN113111736A - Multi-stage characteristic pyramid target detection method based on depth separable convolution and fusion PAN - Google Patents

Multi-stage characteristic pyramid target detection method based on depth separable convolution and fusion PAN

Info

Publication number
CN113111736A
Authority
CN
China
Prior art keywords
output
feature
layer
convolution
pyramid
Prior art date
Legal status
Pending
Application number
CN202110325504.0A
Other languages
Chinese (zh)
Inventor
包晓安
马铉钧
包梓群
邵一鸣
马云龙
许铭洋
张娜
Current Assignee
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202110325504.0A priority Critical patent/CN113111736A/en
Publication of CN113111736A publication Critical patent/CN113111736A/en
Pending legal-status Critical Current

Classifications

    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06F18/253 Fusion techniques of extracted features
    • G06N3/045 Combinations of networks
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V2201/07 Target detection

Abstract

The invention discloses a multi-level feature pyramid target detection method based on depthwise separable convolution and a fused PAN structure, belonging to the field of target detection. The method comprises the following steps: 1) data acquisition: acquire video of the target to be detected and slice it, converting the continuous video into a sequence of images; 2) preprocess the images; 3) perform target detection on the preprocessed images, obtaining a multi-scale fused feature map of each image using a multi-level feature pyramid network built from depthwise separable convolutions and a fused PAN structure; 4) predefine detection boxes of multiple aspect ratios and scales according to the receptive-field sizes of the multi-scale fused feature map, and use these boxes to locate and classify targets, achieving high-precision detection of multi-scale targets. The invention improves the feature pyramid: the network is made deeper while the parameter count and computation are reduced, and multi-scale fused features are obtained, improving both the accuracy and the efficiency of target detection.

Description

Multi-stage characteristic pyramid target detection method based on depth separable convolution and fusion PAN
Technical Field
The invention belongs to the field of target detection, and particularly relates to a multi-level feature pyramid target detection method based on depthwise separable convolution and a fused PAN structure.
Background
With the growth of computing power, in particular the adoption of graphics processors and the development of deep learning, convolutional neural networks have advanced rapidly in the field of target detection. Since the beginning of the 21st century, video image processing, which depends heavily on computing power, has likewise developed greatly. However, enormous computation and parameter counts weigh heavily on image processing, and traditional convolutional neural networks and feature pyramid networks struggle with images containing multiple target objects. For images with severe occlusion, blur, or targets of widely varying size due to viewing distance, a traditional pyramid network not only requires many parameters and much computation, but its detections are not necessarily accurate.
To address the problem of target detection in video images, the multi-level feature pyramid detection method based on depthwise separable convolution and a fused PAN structure proposed by the invention can greatly reduce the computation and parameter counts while also improving the performance of detecting target objects.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a multi-level feature pyramid target detection method based on depthwise separable convolution and a fused PAN structure for handling target detection in videos and images.
To achieve this purpose, the invention adopts the following technical scheme:
A multi-level feature pyramid target detection method based on depthwise separable convolution and a fused PAN structure comprises the following steps:
1) data acquisition: acquire video of the target to be detected and slice it, converting the continuous video into a sequence of images;
2) preprocess the images;
3) perform target detection on the preprocessed images, obtaining a multi-scale fused feature map of each image using a multi-level feature pyramid network with depthwise separable convolution and a fused PAN structure;
the multi-level feature pyramid network with depthwise separable convolution and fused PAN comprises a backbone network and a multi-level FPN with a PAN structure; the backbone downsamples the input image to obtain feature maps of different sizes, using depthwise separable convolution in each downsampling step; the feature maps are fused via upsampling into a fused feature map containing features of different depths, which is sent to the multi-level FPN with the PAN structure;
the multi-level FPN with the PAN structure is formed by connecting several structurally identical feature pyramids in series; the downsampling layers of each pyramid consist of depthwise separable convolutions, and the upsampling layers consist of depthwise separable convolutions plus upsampling convolutions; the input of the first pyramid is the fused feature map output by the backbone, and for each subsequent pyramid this fused feature map is concatenated along the channel dimension with the output of the last upsampling layer of the preceding pyramid to form its input; different pyramids extract features of different depths, and the outputs of all pyramid levels are concatenated along the channel dimension to obtain the multi-scale fused feature map;
4) predefine detection boxes of multiple aspect ratios and scales according to the receptive-field sizes of the multi-scale fused feature map, and use them to locate and classify targets, achieving high-precision detection of multi-scale targets.
Compared with the prior art, the invention has the beneficial effects that:
the invention discloses a method for detecting a target by using a multi-stage feature pyramid fused with a PAN structure based on depth separable convolution. And extracting a feature map by using downsampling, detecting a target by using a multi-stage feature pyramid network, and adding a PAN structure behind the FPN of each stage. The network structure of the deep separable convolution replaces the original convolution network structure, so that the network depth can be deepened, and the parameter quantity and the calculated quantity are reduced. The multi-stage feature pyramid network is composed of a plurality of feature pyramids which are identical in structure and use depth separable convolution, the feature pyramids are connected in series, features with the same size and obtained by different pyramids are fused, and detection is carried out by utilizing the fused feature pyramids. The PAN structure is characterized in that a bottom-up characteristic pyramid is added behind the FPN layer, so that the accuracy and efficiency of target detection can be improved.
Drawings
FIG. 1 is a schematic diagram of the depthwise separable convolution structure employed in the present invention;
FIG. 2 is a schematic diagram of the multi-level feature pyramid structure employed in the present invention;
FIG. 3 is a schematic diagram of target detection with the multi-level feature pyramid of depthwise separable convolution and fused PAN in the present invention.
Detailed Description
The invention is further explained below with reference to the drawings.
The invention comprises two main parts: a backbone network and an improved multi-level feature pyramid network. Backbone network: the input image is downsampled to obtain feature maps of different sizes, each downsampling step using a depthwise separable convolution; the features are then fused via upsampling into feature maps containing features of different depths, and the fused feature maps are fed into the improved multi-level pyramid. Improved multi-level pyramid network: this part consists of several structurally identical feature pyramids, each of which outputs three features of different sizes; the features of the different pyramids are fused and the target object is detected.
Depthwise separable convolution:
A depthwise separable convolution proceeds in two steps: a channel-by-channel (depthwise) convolution followed by a point-by-point (pointwise) convolution. Consider an M × M, three-channel color input image. The first step is the channel-by-channel convolution: the number of convolution kernels equals the number of input channels, so with 3 × 3 kernels this step produces three feature maps. The second step is the point-by-point convolution, whose input is the output of the first step; its kernels have size 1 × 1 × 3, where 3 is the number of input channels of this step, and the number of output channels is L (the pointwise part has exactly as many kernels as output channels). The parameter count of the depthwise separable convolution is therefore 3 × 3 × 3 + 1 × 1 × 3 × L = 27 + 3L, whereas an ordinary convolution with K × K kernels would require K × K × 3 × L parameters (27L for K = 3). As the number of channels and the kernel size grow, the savings become increasingly significant. See FIG. 1.
Improved multi-level feature pyramid network:
The main role of the multi-level feature pyramid is to fuse multiple processed feature maps, enhancing detection performance and reducing the miss rate. This part is formed by connecting several structurally identical feature pyramid networks in series; the specific serial connection is shown in FIG. 2. A bottom-up pyramid structure (the PAN structure) is then appended behind each feature pyramid to further process the feature maps. Except for the first feature pyramid, the input feature map of each pyramid is obtained by concatenating, along the channel dimension, the last layer of the preceding FPN-structured pyramid with the output of the backbone network; different pyramids extract features of different depths, and every pyramid is built from depthwise separable convolutions. Each convolution layer is followed by batch normalization and a rectified linear unit (ReLU) as the activation function.
The invention provides a multi-level feature pyramid target detection method based on depthwise separable convolution and a fused PAN structure, comprising the following steps:
1) data acquisition: acquire video of the target to be detected and slice it, converting the continuous video into a sequence of images;
2) preprocess the images;
3) perform target detection on the preprocessed images, obtaining a multi-scale fused feature map of each image using a multi-level feature pyramid network with depthwise separable convolution and a fused PAN structure;
the multi-level feature pyramid network with depthwise separable convolution and fused PAN comprises a backbone network and a multi-level FPN with a PAN structure; the backbone downsamples the input image to obtain feature maps of different sizes, using depthwise separable convolution in each downsampling step; the feature maps are fused via upsampling into a fused feature map containing features of different depths, which is sent to the multi-level FPN with the PAN structure;
the multi-level FPN with the PAN structure is formed by connecting several structurally identical feature pyramids in series; the downsampling layers of each pyramid consist of depthwise separable convolutions, and the upsampling layers consist of depthwise separable convolutions plus upsampling convolutions; the input of the first pyramid is the fused feature map output by the backbone, and for each subsequent pyramid this fused feature map is concatenated along the channel dimension with the output of the last upsampling layer of the preceding pyramid to form its input; different pyramids extract features of different depths, and the outputs of all pyramid levels are concatenated along the channel dimension to obtain the multi-scale fused feature map;
4) predefine detection boxes of multiple aspect ratios and scales according to the receptive-field sizes of the multi-scale fused feature map, and use them to locate and classify targets, achieving high-precision detection of multi-scale targets.
In one embodiment of the present invention, the preprocessing in step 2) is a filtering method (median filtering): a square region is taken centered on each pixel of the picture, the gray values of the pixels in the region are sorted, the middle value of the sorted sequence becomes the new gray value of the center pixel, and the image is traversed with this sliding window.
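A minimal sketch of this preprocessing step in NumPy. The 3 × 3 window size and the replicate padding at the borders are assumptions; the patent does not fix the size of the square region or the border handling.

```python
import numpy as np

def median_filter(img: np.ndarray, size: int = 3) -> np.ndarray:
    """Slide a size x size window over img; each center pixel becomes the
    sorted middle (median) gray value of its square neighbourhood."""
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")   # replicate-pad the borders
    out = np.empty_like(img)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            window = padded[y:y + size, x:x + size]
            out[y, x] = np.median(window)    # sorted middle value of the region
    return out

noisy = np.array([[10, 10, 10],
                  [10, 255, 10],    # a single impulse-noise pixel
                  [10, 10, 10]], dtype=np.uint8)
clean = median_filter(noisy)
print(clean[1, 1])   # the outlier is replaced by the neighbourhood median
```

The double Python loop is for clarity; a production version would use `scipy.ndimage.median_filter` or a vectorized sliding-window view.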
In one embodiment of the present invention, the depthwise separable convolution comprises an input layer, a channel-by-channel convolution layer, a point-by-point convolution layer, and an output layer. The input is a three-channel image. First the channel-by-channel convolution is applied: three kernels convolve the three channels separately, producing three feature maps. Then the point-by-point convolution applies three-dimensional 1 × 1 kernels to these three feature maps, combining them into one output feature map per pointwise kernel.
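The block described above maps naturally onto PyTorch's `groups=` mechanism for the channel-by-channel step and a 1 × 1 convolution for the point-by-point step. This is a sketch, not the patent's exact layer: the placement of the batch normalization and ReLU after each convolution follows the earlier description, and the stride and channel counts are illustrative.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, c_in: int, c_out: int, stride: int = 1):
        super().__init__()
        self.block = nn.Sequential(
            # channel-by-channel: one 3x3 kernel per input channel (groups=c_in)
            nn.Conv2d(c_in, c_in, 3, stride=stride, padding=1,
                      groups=c_in, bias=False),
            nn.BatchNorm2d(c_in),
            nn.ReLU(inplace=True),
            # point-by-point: 1x1 kernels mixing the channels into c_out maps
            nn.Conv2d(c_in, c_out, 1, bias=False),
            nn.BatchNorm2d(c_out),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)

x = torch.randn(1, 3, 64, 64)            # a three-channel input image
y = DepthwiseSeparableConv(3, 64)(x)
print(y.shape)                           # spatial size preserved, 64 channels out
```

The convolution weights here total 3·3·3 + 3·64 = 219 parameters, matching the count derived in the description.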
In one embodiment of the present invention, the backbone network consists of four convolution layers and two upsampling layers. The preprocessed image passes through the four convolution layers in sequence; the output of the fourth convolution layer feeds the first upsampling layer; the output of the first upsampling layer is concatenated along the channel dimension with the output of the third convolution layer and used as the input of the second upsampling layer; and the output of the second upsampling layer is concatenated along the channel dimension with the output of the third convolution layer to form the output of the backbone network.
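A shape-level sketch of this backbone. The text names "the third convolution layer" for both concatenations; for the spatial sizes to match after upsampling, the second concatenation is taken here to use the second convolution layer instead, which is an assumption. Plain `Conv2d`/`Upsample` layers and the channel width are stand-ins for the depthwise separable blocks of the patent.

```python
import torch
import torch.nn as nn

class Backbone(nn.Module):
    def __init__(self, c: int = 32):
        super().__init__()
        # four stride-2 convolution layers: 1/2, 1/4, 1/8, 1/16 resolution
        self.convs = nn.ModuleList(
            nn.Conv2d(3 if i == 0 else c, c, 3, stride=2, padding=1)
            for i in range(4)
        )
        self.up = nn.Upsample(scale_factor=2, mode="nearest")

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        outs = []
        for conv in self.convs:
            x = conv(x)
            outs.append(x)
        u1 = self.up(outs[3])                            # 1/16 -> 1/8
        u2 = self.up(torch.cat([u1, outs[2]], dim=1))    # concat at 1/8, up to 1/4
        # assumed: second concat uses conv layer 2 so spatial sizes agree
        return torch.cat([u2, outs[1]], dim=1)           # fused map at 1/4

y = Backbone()(torch.randn(1, 3, 64, 64))
print(y.shape)   # channels accumulate: 32 + 32 from u1/out3, + 32 from out2
```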
In one specific implementation of the present invention, the multi-level FPN with the PAN structure is formed by connecting several structurally identical feature pyramids in series, each comprising an input layer, four downsampling layers, and two upsampling layers;
after the input layer of a feature pyramid receives an image, it is processed by the first and second downsampling layers in sequence; the output of the second downsampling layer is the input of the first upsampling layer; the output of the first upsampling layer is added element-wise to the output of the first downsampling layer, and the sum is the input of the second upsampling layer; the output of the second upsampling layer is added element-wise to the image received by the input layer, and the sum is both output as the first feature map and used as the input of the third downsampling layer; the output of the third downsampling layer is added element-wise to the input of the second upsampling layer, and the sum is both output as the second feature map and used as the input of the fourth downsampling layer; the output of the fourth downsampling layer is added element-wise to the output of the second downsampling layer, and the sum is output as the third feature map;
the output of the last upsampling layer of one pyramid is concatenated along the channel dimension with the output of the backbone network and used as the input image of the next pyramid's input layer; each pyramid outputs three feature maps of different sizes, and the maps of corresponding size are concatenated along the channel dimension to obtain the final multi-scale fused feature map.
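The wiring of one pyramid stage described above (a top-down path followed by a bottom-up PAN path with element-wise additions) can be sketched at the shape level as follows. A constant channel width is assumed so the additions are valid, and plain convolutions stand in for the depthwise separable blocks of the patent.

```python
import torch
import torch.nn as nn

class PyramidStage(nn.Module):
    """One feature pyramid: 2 top-down downsampling layers, 2 upsampling
    layers, then a bottom-up (PAN) path of 2 more downsampling layers."""
    def __init__(self, c: int):
        super().__init__()
        def down():
            return nn.Conv2d(c, c, 3, stride=2, padding=1)
        self.d1, self.d2, self.d3, self.d4 = down(), down(), down(), down()
        self.u1 = nn.Upsample(scale_factor=2)
        self.u2 = nn.Upsample(scale_factor=2)

    def forward(self, x):
        d1 = self.d1(x)          # 1/2 resolution
        d2 = self.d2(d1)         # 1/4 resolution
        s = self.u1(d2) + d1     # top-down fusion at 1/2 (input of 2nd upsample)
        f1 = self.u2(s) + x      # first output feature map, full resolution
        f2 = self.d3(f1) + s     # PAN path: second output at 1/2
        f3 = self.d4(f2) + d2    # third output at 1/4
        return f1, f2, f3

f1, f2, f3 = PyramidStage(16)(torch.randn(1, 16, 32, 32))
print(f1.shape, f2.shape, f3.shape)   # three scales: 1/1, 1/2, 1/4
```

Each addition pairs tensors of matching resolution, which is why the PAN downsampling layers halve the size exactly as the top-down layers did.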
In one embodiment of the present invention, the detection boxes in step 4) are handled by a target detector such as Mask R-CNN or RetinaNet.
In one embodiment of the invention, the loss function is optimized during training with the stochastic gradient descent (SGD) algorithm.
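A minimal illustration of SGD optimization in PyTorch, on a stand-in linear model with an MSE loss; the actual network, loss function, and hyperparameters (learning rate, number of steps) are not fixed by the patent and are assumptions here.

```python
import torch

torch.manual_seed(0)                      # make the toy run deterministic
model = torch.nn.Linear(4, 2)             # stand-in for the detection network
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x, target = torch.randn(8, 4), torch.randn(8, 2)
losses = []
for _ in range(20):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), target)
    loss.backward()                       # gradients of the loss w.r.t. parameters
    opt.step()                            # one stochastic gradient descent update
    losses.append(loss.item())
print(losses[0], losses[-1])              # the loss decreases over the run
```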
In one embodiment of the present invention, as shown in fig. 3, the implementation process is as follows:
(1) The input image is downsampled by factors of 8, 16, and 32 to obtain feature maps, each downsampling step using a depthwise separable convolution network; the figure takes these three scales as an example.
(2) The features are fused via upsampling to obtain a feature map containing features of different depths.
(3) The fused feature map is fed into the improved multi-level pyramid.
(4) The first feature pyramid extracts features from the feature map and outputs three feature maps of different sizes; the output of the last layer of its FPN structure is concatenated along the channel dimension with the output of the backbone to form the input of the next feature pyramid. Every feature pyramid is built from depthwise separable convolutions.
(5) When the serially connected feature pyramids have finished feature extraction, each pyramid has output three feature maps of different sizes. The maps of corresponding size are concatenated along the channel dimension, and the channel-wise concatenated and fused features are:
Xi = Concat(Xi1, Xi2, Xi3, ..., Xin), i = 1, 2, 3
where Xin denotes the ith feature map output by the nth feature pyramid (so Xi1 comes from the first pyramid), Xi is the ith fused feature map, and n is the number of pyramids in the multi-level feature pyramid. In this way the features of all pyramids are fused, improving the performance of detecting the target object.
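The channel-direction fusion Xi = Concat(Xi1, ..., Xin) is a single `torch.cat` per scale. Below, n = 2 pyramids each produce i = 3 feature maps; the channel width and spatial sizes are illustrative only.

```python
import torch

sizes = [(32, 32), (16, 16), (8, 8)]                       # three output scales
pyramid1 = [torch.randn(1, 16, h, w) for h, w in sizes]    # X11, X21, X31
pyramid2 = [torch.randn(1, 16, h, w) for h, w in sizes]    # X12, X22, X32

# Xi = Concat(Xi1, Xi2) along the channel dimension, for i = 1, 2, 3
fused = [torch.cat([a, b], dim=1) for a, b in zip(pyramid1, pyramid2)]
print([tuple(t.shape) for t in fused])   # channels add: 16 + 16 = 32 per scale
```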
The foregoing merely illustrates specific embodiments of the invention. Obviously, the invention is not limited to these embodiments, and many variations are possible. Any modification that a person skilled in the art can derive or infer directly from the disclosure of the present invention is considered within the scope of the invention.

Claims (7)

1. A multi-level feature pyramid target detection method based on depthwise separable convolution and a fused PAN structure, characterized by comprising the following steps:
1) data acquisition: acquiring video of the target to be detected and slicing it, converting the continuous video into a sequence of images;
2) preprocessing the images;
3) performing target detection on the preprocessed images, obtaining a multi-scale fused feature map of each image using a multi-level feature pyramid network with depthwise separable convolution and a fused PAN structure;
the multi-level feature pyramid network with depthwise separable convolution and fused PAN comprising a backbone network and a multi-level FPN with a PAN structure; the backbone downsampling the input image to obtain feature maps of different sizes, using depthwise separable convolution in each downsampling step; the feature maps being fused via upsampling into a fused feature map containing features of different depths, which is sent to the multi-level FPN with the PAN structure;
the multi-level FPN with the PAN structure being formed by connecting several structurally identical feature pyramids in series; the downsampling layers of each pyramid consisting of depthwise separable convolutions, and the upsampling layers consisting of depthwise separable convolutions plus upsampling convolutions; the input of the first pyramid being the fused feature map output by the backbone, and for each subsequent pyramid this fused feature map being concatenated along the channel dimension with the output of the last upsampling layer of the preceding pyramid to form its input; different pyramids extracting features of different depths, and the outputs of all pyramid levels being concatenated along the channel dimension to obtain the multi-scale fused feature map;
4) predefining detection boxes of multiple aspect ratios and scales according to the receptive-field sizes of the multi-scale fused feature map, and using them to locate and classify targets, achieving high-precision detection of multi-scale targets.
2. The multi-level feature pyramid target detection method based on depthwise separable convolution and fused PAN according to claim 1, wherein the preprocessing in step 2) is a filtering method: a square region is taken centered on each pixel of the picture, the gray values of the pixels in the region are sorted, the middle value of the sorted sequence becomes the new gray value of the center pixel, and the image is traversed with this sliding window.
3. The multi-level feature pyramid target detection method based on depthwise separable convolution and fused PAN according to claim 1, wherein the depthwise separable convolution comprises an input layer, a channel-by-channel convolution layer, a point-by-point convolution layer, and an output layer; the input of the input layer is a three-channel image; first the channel-by-channel convolution is applied, three kernels convolving the three channels separately and producing three feature maps; then the point-by-point convolution applies three-dimensional 1 × 1 kernels to the three feature maps, combining them into one output feature map per pointwise kernel.
4. The multi-level feature pyramid target detection method based on depthwise separable convolution and fused PAN according to claim 1, wherein the backbone network consists of four convolution layers and two upsampling layers; the preprocessed image passes through the four convolution layers in sequence; the output of the fourth convolution layer feeds the first upsampling layer; the output of the first upsampling layer is concatenated along the channel dimension with the output of the third convolution layer and used as the input of the second upsampling layer; and the output of the second upsampling layer is concatenated along the channel dimension with the output of the third convolution layer to form the output of the backbone network.
5. The multi-level feature pyramid target detection method based on depthwise separable convolution and fused PAN according to claim 1, wherein the multi-level FPN with the PAN structure is formed by connecting several structurally identical feature pyramids in series, each comprising an input layer, four downsampling layers, and two upsampling layers;
after the input layer of a feature pyramid receives an image, it is processed by the first and second downsampling layers in sequence; the output of the second downsampling layer is the input of the first upsampling layer; the output of the first upsampling layer is added element-wise to the output of the first downsampling layer, and the sum is the input of the second upsampling layer; the output of the second upsampling layer is added element-wise to the image received by the input layer, and the sum is both output as the first feature map and used as the input of the third downsampling layer; the output of the third downsampling layer is added element-wise to the input of the second upsampling layer, and the sum is both output as the second feature map and used as the input of the fourth downsampling layer; the output of the fourth downsampling layer is added element-wise to the output of the second downsampling layer, and the sum is output as the third feature map;
the output of the last upsampling layer of one pyramid is concatenated along the channel dimension with the output of the backbone network and used as the input image of the next pyramid's input layer; each pyramid outputs three feature maps of different sizes, and the maps of corresponding size are concatenated along the channel dimension to obtain the final multi-scale fused feature map:
Xi = Concat(Xi1, Xi2, Xi3, ..., Xin), i = 1, 2, 3
where Xin denotes the ith feature map output by the nth feature pyramid, Xi is the ith fused feature map, and n is the number of pyramids in the multi-level feature pyramid.
6. The multi-level feature pyramid target detection method based on depthwise separable convolution and fused PAN according to claim 1, wherein the detection boxes in step 4) are handled by a target detector, the target detector being Mask R-CNN or RetinaNet.
7. The multi-level feature pyramid target detection method based on depthwise separable convolution and fused PAN according to claim 1, wherein the loss function is optimized during training with the stochastic gradient descent algorithm.
CN202110325504.0A 2021-03-26 2021-03-26 Multi-stage characteristic pyramid target detection method based on depth separable convolution and fusion PAN Pending CN113111736A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110325504.0A CN113111736A (en) 2021-03-26 2021-03-26 Multi-stage characteristic pyramid target detection method based on depth separable convolution and fusion PAN

Publications (1)

Publication Number Publication Date
CN113111736A true CN113111736A (en) 2021-07-13

Family

ID=76712337

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110325504.0A Pending CN113111736A (en) 2021-03-26 2021-03-26 Multi-stage characteristic pyramid target detection method based on depth separable convolution and fusion PAN

Country Status (1)

Country Link
CN (1) CN113111736A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114511515A (en) * 2022-01-17 2022-05-17 山东高速路桥国际工程有限公司 Bolt corrosion detection system and detection method based on BoltCorrDetNet network
CN114694003A (en) * 2022-03-24 2022-07-01 湖北工业大学 Multi-scale feature fusion method based on target detection

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109472298A (en) * 2018-10-19 2019-03-15 天津大学 Depth binary feature pyramid for the detection of small scaled target enhances network
KR102160682B1 (en) * 2019-11-05 2020-09-29 인천대학교 산학협력단 Method and apparatus for processing remote images using multispectral images
CN112102241A (en) * 2020-08-11 2020-12-18 中山大学 Single-stage remote sensing image target detection algorithm
CN112487862A (en) * 2020-10-28 2021-03-12 南京云牛智能科技有限公司 Garage pedestrian detection method based on improved EfficientDet model

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
MJIANSUN: "[YOLOv4] The FPN+PAN Structure", CSDN, page 1 *
SHU LIU: "Path Aggregation Network for Instance Segmentation", arXiv:1803.01534v4 [cs.CV], pages 1-11 *
JIANG YICHENG et al.: "Pedestrian Detection Based on Depthwise Separable Convolution and a Multi-level Feature Pyramid Network", Journal of Automotive Safety and Energy, pages 1-8 *
JIANG DABAI: "How to Evaluate the Newly Released YOLOv4?", Zhihu, pages 1-4 *

Similar Documents

Publication Publication Date Title
CN112287940B (en) Semantic segmentation method of attention mechanism based on deep learning
CN112329800B (en) Salient object detection method based on global information guiding residual attention
CN110136063B (en) Single image super-resolution reconstruction method based on condition generation countermeasure network
CN109035149B (en) License plate image motion blur removing method based on deep learning
CN113052210B (en) Rapid low-light target detection method based on convolutional neural network
CN112861729B (en) Real-time depth completion method based on pseudo-depth map guidance
CN111899168B (en) Remote sensing image super-resolution reconstruction method and system based on feature enhancement
CN110070574B (en) Binocular vision stereo matching method based on improved PSMNet
CN113888547A (en) Non-supervision domain self-adaptive remote sensing road semantic segmentation method based on GAN network
CN113076957A (en) RGB-D image saliency target detection method based on cross-modal feature fusion
CN113111736A (en) Multi-stage characteristic pyramid target detection method based on depth separable convolution and fusion PAN
CN112365511B (en) Point cloud segmentation method based on overlapped region retrieval and alignment
CN110532959B (en) Real-time violent behavior detection system based on two-channel three-dimensional convolutional neural network
CN112529908B (en) Digital pathological image segmentation method based on cascade convolution network and model thereof
CN114627290A (en) Mechanical part image segmentation algorithm based on improved DeepLabV3+ network
CN112149526B (en) Lane line detection method and system based on long-distance information fusion
CN116486074A (en) Medical image segmentation method based on local and global context information coding
CN117392496A (en) Target detection method and system based on infrared and visible light image fusion
CN115293986A (en) Multi-temporal remote sensing image cloud region reconstruction method
CN117576402B (en) Deep learning-based multi-scale aggregation Transformer remote sensing image semantic segmentation method
CN114926734A (en) Solid waste detection device and method based on feature aggregation and attention fusion
CN113344110B (en) Fuzzy image classification method based on super-resolution reconstruction
CN113888505A (en) Natural scene text detection method based on semantic segmentation
CN107358625B (en) SAR image change detection method based on SPP Net and region-of-interest detection
CN117593187A (en) Remote sensing image super-resolution reconstruction method based on meta-learning and Transformer

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination