CN111814814B - Single-stage target detection method based on image super-resolution network

Single-stage target detection method based on image super-resolution network

Info

Publication number
CN111814814B
CN111814814B
Authority
CN
China
Prior art keywords
network
target
convolution
targets
image
Prior art date
Legal status
Active
Application number
CN201910286446.8A
Other languages
Chinese (zh)
Other versions
CN111814814A (en)
Inventor
刘怡光
畅青
薛凯
史雪蕾
杨艳
Current Assignee
Sichuan University
Original Assignee
Sichuan University
Priority date
Filing date
Publication date
Application filed by Sichuan University filed Critical Sichuan University
Priority to CN201910286446.8A
Publication of CN111814814A publication Critical patent/CN111814814A/en
Application granted granted Critical
Publication of CN111814814B publication Critical patent/CN111814814B/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention innovatively introduces image super-resolution reconstruction technology into a target detection network and provides a novel, robust single-stage target detection method. First, a convolutional neural network performs super-resolution reconstruction on the original picture to generate a clear, high-resolution reconstructed picture; a target detection network is then built on top of the super-resolution reconstruction network; finally, large and small targets are detected on the original picture and the reconstructed picture respectively: medium and large targets, which occupy sufficient pixels, are still recognized and detected on the original picture by the neural network, while small targets are detected on the reconstructed picture and the detection results are then mapped back to the original picture. Experiments show that the method significantly improves detection precision and recall for small, blurred, and occluded targets while preserving the detection performance on large and medium targets.

Description

Single-stage target detection method based on image super-resolution network
Technical Field
The invention relates to a single-stage target detection method based on an image super-resolution network, which improves the recognition efficiency and localization accuracy of a target detection model for targets in a picture, particularly tiny targets. The invention belongs to the field of computer vision.
Background
Target detection, as a fundamental task of computer vision, has important research value in fields such as pedestrian detection, license plate recognition, and autonomous driving, and has therefore long received wide attention. At present, top-performing target detection methods almost all adopt deep convolutional network architectures and fall into two main categories. One is the two-stage target detection method, represented by Faster R-CNN, based on the candidate-region paradigm: such detectors first generate candidate regions (region proposals) and then perform object classification and position refinement on those regions. The other is the end-to-end single-stage target detection method, represented by RetinaNet, SSD, and the like, which requires no region proposal stage and instead directly predicts the class probabilities and position coordinates of targets. Whether single-stage or two-stage, these methods are developed and improved in pursuit of higher detection precision and faster detection speed.
Although target detection models and methods have developed rapidly and detection precision and recall have improved greatly, small, blurred, and occluded targets occupy too few pixels in the picture, so their information is scarce to begin with and is further eroded by the successive convolution and pooling operations of the neural network. As a result, detection precision and recall for small targets have failed to improve, which has become an important factor restricting the performance of target detection frameworks.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: use a convolutional neural network to enrich the pixel information of small, blurred, and occluded targets, and thereby greatly improve the precision and recall of target detection.
The solution of the invention is to innovatively introduce image super-resolution reconstruction technology into a target detection network. First, super-resolution reconstruction is performed on the original image with a convolutional neural network to generate a clear, high-resolution reconstructed image. Then, large and small targets are detected on the original image and the reconstructed image respectively: medium and large targets, which occupy sufficient pixels, are still recognized and detected on the original image with the neural network, while small targets are detected on the reconstructed image, and finally the results are mapped back to the original image.
The invention realizes the above solution through the following steps:
1. Build and train a super-resolution reconstruction network. After passing through this network, the original image yields a reconstructed image with richer pixel information.
2. Build a target detection skeleton network module on the feature map generated by the super-resolution reconstruction network to further extract features of the original image.
3. Perform feature extraction on the reconstructed image with a convolutional neural network to generate a feature map of the reconstructed image.
4. Build a feature pyramid module on the skeleton network and the feature map of the reconstructed image, and fuse features of different scales with the feature pyramid.
5. Build a target classification and coordinate regression network on the fused feature maps.
6. Train the network with a multi-task loss function, keeping the parameters of the image super-resolution reconstruction network fixed during training; a sketch of one possible loss follows this list.
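The patent does not specify the exact form of the multi-task loss, so the following is a minimal PyTorch sketch assuming a RetinaNet-style combination of a focal classification loss and a smooth-L1 box-regression loss; the function name `multitask_loss` and all hyperparameter values are illustrative assumptions, not taken from the patent.

```python
# Hypothetical sketch of the step-6 multi-task loss; the focal/smooth-L1
# combination is an assumption, as the patent only names "a multitask loss".
import torch
import torch.nn.functional as F

def multitask_loss(cls_logits, box_deltas, cls_targets, box_targets,
                   alpha=0.25, gamma=2.0):
    """cls_logits: (N, K*A) raw scores; cls_targets: (N, K*A) 0/1 labels;
    box_deltas/box_targets: (N, 4*A) offsets for matched default boxes."""
    p = torch.sigmoid(cls_logits)
    # Focal loss: down-weights easy negatives so rare small-object positives
    # dominate the classification gradient.
    pt = torch.where(cls_targets == 1, p, 1 - p)
    w = alpha * cls_targets + (1 - alpha) * (1 - cls_targets)
    cls_loss = (-w * (1 - pt) ** gamma * torch.log(pt.clamp(min=1e-6))).sum()
    # Smooth-L1 regression of the predicted offsets toward the target offsets.
    reg_loss = F.smooth_l1_loss(box_deltas, box_targets, reduction="sum")
    num_pos = cls_targets.sum().clamp(min=1)     # normalize by positive count
    return (cls_loss + reg_loss) / num_pos
```

Under the same assumptions, the frozen super-resolution branch of step 6 could be realized with `for p in sr_net.parameters(): p.requires_grad = False`, where `sr_net` is a hypothetical handle to the reconstruction network.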
Drawings
FIG. 1 is a detailed architecture diagram of the image super-resolution reconstruction network.
FIG. 2 is the overall network architecture diagram of the invention.
Detailed description of the preferred embodiments
The method is described in further detail below with reference to the accompanying drawings:
1. Referring to FIG. 1, the method first builds and trains an image super-resolution reconstruction network. The original image first passes through 56 5×5 convolution kernels and 64 3×3 convolution kernels for feature extraction, generating a feature map B0. Next, 12 1×1 convolution kernels compress B0 (the feature map is reduced from 64 channels to 12) to reduce network parameters. The feature mapping stage follows: 3 successive convolution operations are applied to the compressed feature map using 12 convolution kernels of size 3×3. To provide richer information for the reconstruction stage, 56 1×1 convolution kernels then expand the features (the feature map grows from 12 channels back to 56). Finally, a deconvolution kernel performs image reconstruction to generate the reconstructed image.
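For illustration, here is a minimal PyTorch sketch of this reconstruction network following the stated layer sizes; the class name `SRNet`, the PReLU activations, the 9×9 deconvolution kernel, and the 2× upscaling factor are assumptions not fixed by the patent.

```python
# Sketch of the step-1 super-resolution network, under the assumptions above.
import torch.nn as nn

class SRNet(nn.Module):
    def __init__(self, scale=2):
        super().__init__()
        self.extract = nn.Sequential(          # 56 5x5 then 64 3x3 kernels -> B0
            nn.Conv2d(3, 56, 5, padding=2), nn.PReLU(),
            nn.Conv2d(56, 64, 3, padding=1), nn.PReLU())
        self.shrink = nn.Sequential(           # 12 1x1 kernels: 64 -> 12 channels
            nn.Conv2d(64, 12, 1), nn.PReLU())
        self.map = nn.Sequential(*[            # 3 x (12 3x3 kernels) mapping stage
            layer for _ in range(3)
            for layer in (nn.Conv2d(12, 12, 3, padding=1), nn.PReLU())])
        self.expand = nn.Sequential(           # 56 1x1 kernels: 12 -> 56 channels
            nn.Conv2d(12, 56, 1), nn.PReLU())
        self.deconv = nn.ConvTranspose2d(      # deconvolution reconstructs the image
            56, 3, 9, stride=scale, padding=4, output_padding=scale - 1)

    def forward(self, x):
        b0 = self.extract(x)                   # B0 also feeds the detection backbone
        sr = self.deconv(self.expand(self.map(self.shrink(b0))))
        return sr, b0
```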
2. On B0, a residual network is built as the skeleton module of the overall detection method. The residual network (ResNet) uses skip connections, allowing the network to be deeper yet easier to optimize. Successive convolution and pooling operations in the residual blocks generate a 4-level skeleton feature map: {B1, B2, B3, B4}.
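A sketch of this backbone under stated assumptions: the patent only requires "a residual network" producing {B1, B2, B3, B4}, so torchvision's ResNet-50 stages and the stride-2 adapter convolution used here are illustrative choices.

```python
# Sketch of the step-2 residual backbone; ResNet-50 stages are an assumption.
import torch.nn as nn
from torchvision.models import resnet50

class Backbone(nn.Module):
    def __init__(self):
        super().__init__()
        r = resnet50(weights=None)
        # Adapter from the 64-channel B0 to the first residual stage; the
        # stride-2 downsampling here is an assumption, not stated in the patent.
        self.adapt = nn.Conv2d(64, 64, 3, stride=2, padding=1)
        self.stages = nn.ModuleList([r.layer1, r.layer2, r.layer3, r.layer4])

    def forward(self, b0):
        x, feats = self.adapt(b0), []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)                    # collected as [B1, B2, B3, B4]
        return feats
```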
3. Referring to FIG. 2, the feature pyramid module is generated by top-down connections to the residual network. P4 is formed from B4 by a 1×1 convolution. P3 is formed by upsampling P4, adding it element-wise to B3, and applying a 3×3 convolution; P2 and P1 are generated from P3 and P2 in the same way. For P0, features are first extracted from the reconstructed picture with 64 7×7 convolution kernels, followed by a 3×3 convolution and a 2×2 pooling operation to generate S0; P1 is then upsampled, added element-wise to S0, and passed through 256 3×3 convolution kernels.
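The following sketch assembles this pyramid. The patent describes element-wise addition directly with B3, B2, B1; since channel counts must match for that addition, standard FPN-style 1×1 lateral convolutions are inserted here as an implementation assumption, along with ResNet-50 channel widths (256/512/1024/2048) and a 256-channel pyramid.

```python
# Sketch of the step-3 feature pyramid, with lateral 1x1 convs assumed.
import torch.nn as nn
import torch.nn.functional as F

class Pyramid(nn.Module):
    def __init__(self, in_chs=(256, 512, 1024, 2048), out_ch=256):
        super().__init__()
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_ch, 1) for c in in_chs])
        self.smooth = nn.ModuleList([nn.Conv2d(out_ch, out_ch, 3, padding=1)
                                     for _ in range(3)])
        # S0 branch on the reconstructed image: 64 7x7 kernels, a 3x3 conv
        # (assumed to output 256 channels so the addition is valid), 2x2 pool.
        self.s0 = nn.Sequential(
            nn.Conv2d(3, 64, 7, padding=3), nn.ReLU(),
            nn.Conv2d(64, out_ch, 3, padding=1), nn.MaxPool2d(2))
        self.p0_conv = nn.Conv2d(out_ch, out_ch, 3, padding=1)  # 256 3x3 kernels

    def forward(self, feats, sr_img):
        b1, b2, b3, b4 = feats
        p4 = self.lateral[3](b4)                                  # P4 = 1x1 conv(B4)
        p3 = self.smooth[2](self.lateral[2](b3) + F.interpolate(p4, scale_factor=2))
        p2 = self.smooth[1](self.lateral[1](b2) + F.interpolate(p3, scale_factor=2))
        p1 = self.smooth[0](self.lateral[0](b1) + F.interpolate(p2, scale_factor=2))
        s0 = self.s0(sr_img)
        p0 = self.p0_conv(s0 + F.interpolate(p1, scale_factor=2))  # fuse S0, up(P1)
        return [p0, p1, p2, p3, p4]
```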
4. To make target localization more accurate, 9 types of default boxes are set at every position of the feature maps {P0, P1, P2, P3, P4}, corresponding to 3 different scales {2^0, 2^(1/3), 2^(2/3)} and 3 different aspect ratios {1:1, 1:2, 2:1}. Across the five levels the default boxes cover areas of {32², 64², 128², 256², 512²}. Localization of a target is in fact achieved by predicting its offset relative to the default box coordinates.
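A sketch of this default-box generation: 9 boxes per feature-map position (3 scales × 3 aspect ratios), with base areas 32² through 512² on P0 through P4. The per-level image strides are an assumption; the patent does not state them.

```python
# Sketch of the step-4 default boxes; strides 4..64 for P0..P4 are assumed.
import itertools
import torch

def default_boxes(level, feat_h, feat_w):
    base = 32 * 2 ** level                  # base size: 32, 64, ..., 512 for P0..P4
    stride = 4 * 2 ** level                 # assumed image stride of each level
    boxes = []
    for y, x in itertools.product(range(feat_h), range(feat_w)):
        cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
        for s in (2 ** 0, 2 ** (1 / 3), 2 ** (2 / 3)):   # 3 scales
            for ar in (1.0, 0.5, 2.0):                   # ratios 1:1, 1:2, 2:1
                w = base * s * ar ** 0.5                 # w*h = (base*s)^2
                h = base * s / ar ** 0.5
                boxes.append([cx, cy, w, h])
    return torch.tensor(boxes)              # (feat_h * feat_w * 9, 4) as cx,cy,w,h
```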
5. A fully convolutional network predicts the target's class and its coordinate offsets relative to the default boxes. Class prediction: 256 3×3 convolution kernels further extract features from {P0, P1, P2, P3, P4}, a convolution with K×A 3×3 kernels follows, and a sigmoid activation yields the final class scores, where A = 9 is the number of default box types at each level and K is the number of target classes. Position prediction: 256 3×3 convolution kernels further extract features from {P0, P1, P2, P3, P4}, and a convolution with 4×A kernels yields the coordinate offsets of the target relative to each default box. Adding these offsets to the corresponding default box coordinates gives the network's predicted boxes for targets in the image; note that prediction boxes generated on feature map P0 must be mapped back to the original image.
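A sketch of these prediction heads follows. Sharing the head weights across pyramid levels is the usual single-stage convention and, like the example class count K = 80, is an assumption rather than something the patent specifies.

```python
# Sketch of the step-5 classification and regression heads.
import torch
import torch.nn as nn

class Heads(nn.Module):
    def __init__(self, in_ch=256, num_classes=80, A=9):  # K = num_classes (assumed)
        super().__init__()
        self.cls = nn.Sequential(              # 256 3x3 kernels, then K*A 3x3 kernels
            nn.Conv2d(in_ch, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, num_classes * A, 3, padding=1))
        self.reg = nn.Sequential(              # 256 3x3 kernels, then 4*A kernels
            nn.Conv2d(in_ch, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 4 * A, 3, padding=1))

    def forward(self, pyramid):
        scores, deltas = [], []
        for p in pyramid:                      # same heads applied to P0..P4
            scores.append(torch.sigmoid(self.cls(p)))  # sigmoid class scores
            deltas.append(self.reg(p))                  # offsets vs. default boxes
        return scores, deltas
```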
6. Result filtering. The prediction boxes generated by the convolutional network include a large number of invalid boxes. The class-prediction scores are therefore sorted, only the 300 highest-scoring prediction boxes are kept, and non-maximum suppression with a threshold of 0.5 is applied to produce the final prediction result.
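A sketch of this filtering step, keeping the 300 highest-scoring boxes and applying non-maximum suppression at IoU 0.5; torchvision's `nms` is used here for brevity.

```python
# Sketch of the step-6 filtering: top-300 selection followed by NMS at 0.5.
import torch
from torchvision.ops import nms

def filter_predictions(boxes, scores, top_k=300, iou_thresh=0.5):
    """boxes: (N, 4) in x1,y1,x2,y2 format; scores: (N,) class scores."""
    scores, order = scores.sort(descending=True)
    keep_top = order[:top_k]                  # only the 300 best candidates
    boxes, scores = boxes[keep_top], scores[:top_k]
    keep = nms(boxes, scores, iou_thresh)     # suppress overlapping duplicates
    return boxes[keep], scores[keep]
```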

Claims (1)

1. A single-stage target detection method based on an image super-resolution network, characterized by comprising the following steps: first, super-resolution reconstruction is performed on the original image with a convolutional neural network to generate a clear, high-resolution reconstructed image; then, large and small targets are detected on the original image and the reconstructed image respectively: medium and large targets, which occupy sufficient pixels, are still recognized and detected on the original image with the neural network, small targets are detected on the reconstructed image, and finally the results are mapped back to the original image; the specific steps are as follows:
(1) build and train an image super-resolution reconstruction network: extract features from the original image with 56 5×5 convolution kernels and 64 3×3 convolution kernels to generate a feature map B0; next, compress B0 with 12 1×1 convolution kernels to reduce network parameters; then enter the feature mapping stage: apply 3 convolution operations to the compressed feature map with 12 convolution kernels of size 3×3; expand the features with 56 1×1 convolution kernels; finally, perform image reconstruction with a deconvolution kernel to generate the reconstructed image;
(2) build a residual network on the feature map generated by the super-resolution reconstruction network, generating a 4-level skeleton feature map: {B1, B2, B3, B4};
(3) extract features from the reconstructed image to generate a feature map S0; through top-down and lateral connections, build a feature pyramid structure {P0, P1, P2, P3, P4} on {S0, B1, B2, B3, B4}, wherein: P4 is formed from B4 by a 1×1 convolution; P3 is formed by upsampling P4, adding it element-wise to B3, and applying a 3×3 convolution; P2 and P1 are generated in the same way; for P0, first extract features from the reconstructed picture with 64 7×7 convolution kernels, then apply a 3×3 convolution and a 2×2 pooling operation to generate S0, then upsample P1, add it element-wise to S0, and apply 256 3×3 convolution kernels;
(4) set 9 types of default boxes at every position of the feature maps {P0, P1, P2, P3, P4}, corresponding to 3 different scales {2^0, 2^(1/3), 2^(2/3)} and 3 different aspect ratios {1:1, 1:2, 2:1}, the default boxes covering areas of {32², 64², 128², 256², 512²}; predict the target's class and its coordinate offsets relative to the default boxes with a fully convolutional network; class prediction: further extract features from {P0, P1, P2, P3, P4} with 256 3×3 convolution kernels, convolve with K×A 3×3 kernels, and obtain the final class scores with a sigmoid activation, where A = 9 is the number of default box types at each level and K is the number of target classes; position prediction: further extract features from {P0, P1, P2, P3, P4} with 256 3×3 convolution kernels, then convolve with 4×A kernels to obtain the coordinate offsets of the target relative to each default box; add the offsets to the corresponding default box coordinates to obtain the network's predicted boxes for targets in the image, and map the prediction boxes generated on feature map P0 back to the original image;
(5) sort the class-prediction scores, keep only the 300 highest-scoring prediction boxes, and then apply non-maximum suppression with a threshold of 0.5 to generate the final prediction result;
combining steps (1), (2), (3), (4), and (5) completes the construction of the whole method.
CN201910286446.8A 2019-04-10 2019-04-10 Single-stage target detection method based on image super-resolution network Active CN111814814B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910286446.8A CN111814814B (en) 2019-04-10 2019-04-10 Single-stage target detection method based on image super-resolution network


Publications (2)

Publication Number Publication Date
CN111814814A CN111814814A (en) 2020-10-23
CN111814814B (en) 2022-04-12

Family

ID=72844221

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910286446.8A Active CN111814814B (en) 2019-04-10 2019-04-10 Single-stage target detection method based on image super-resolution network

Country Status (1)

Country Link
CN (1) CN111814814B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1804657A (en) * 2006-01-23 2006-07-19 武汉大学 Small target super resolution reconstruction method for remote sensing image
CN105139339A (en) * 2015-07-27 2015-12-09 中国人民解放军陆军军官学院 Polarization image super-resolution reconstruction method based on multi-level filtering and sample matching
CN105389797A (en) * 2015-10-16 2016-03-09 西安电子科技大学 Unmanned aerial vehicle video small-object detecting method based on super-resolution reconstruction
CN108171656A (en) * 2018-01-12 2018-06-15 西安电子科技大学 Adaptive Global Dictionary remote sensing images ultra-resolution method based on rarefaction representation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070083114A1 (en) * 2005-08-26 2007-04-12 The University Of Connecticut Systems and methods for image resolution enhancement


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Scale-Transferrable Object Detection; Peng Zhou et al.; 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2018-12-17; pp. 528-537 *
Development of Deep Convolutional Neural Networks and Their Applications in Computer Vision; Zhang Shun et al.; Chinese Journal of Computers; 2017-09-18; vol. 42, no. 3; pp. 453-482 *
Research on Enhancement and Super-Resolution Reconstruction of Spatial Motion Image Sequences; Cao Shouxin; China Master's Theses Full-text Database, Information Science and Technology; 2015-08-15; no. 8; I138-1144 *

Also Published As

Publication number Publication date
CN111814814A (en) 2020-10-23

Similar Documents

Publication Publication Date Title
CN108647585B (en) Traffic identifier detection method based on multi-scale circulation attention network
CN114202672A (en) Small target detection method based on attention mechanism
CN112270280B (en) Open-pit mine detection method in remote sensing image based on deep learning
CN111461083A (en) Rapid vehicle detection method based on deep learning
CN111626176B (en) Remote sensing target rapid detection method and system based on dynamic attention mechanism
WO2021218786A1 (en) Data processing system, object detection method and apparatus thereof
CN110991444B (en) License plate recognition method and device for complex scene
CN112069868A (en) Unmanned aerial vehicle real-time vehicle detection method based on convolutional neural network
CN111738344A (en) Rapid target detection method based on multi-scale fusion
Li et al. Automatic bridge crack identification from concrete surface using ResNeXt with postprocessing
CN111797846B (en) Feedback type target detection method based on characteristic pyramid network
Wang et al. Dual transfer learning for event-based end-task prediction via pluggable event to image translation
US20220366682A1 (en) Computer-implemented arrangements for processing image having article of interest
CN113111875A (en) Seamless steel rail weld defect identification device and method based on deep learning
CN111932511A (en) Electronic component quality detection method and system based on deep learning
CN110852199A (en) Foreground extraction method based on double-frame coding and decoding model
CN116994047A (en) Small sample image defect target detection method based on self-supervision pre-training
CN115294563A (en) 3D point cloud analysis method and device based on Transformer and capable of enhancing local semantic learning ability
CN116152226A (en) Method for detecting defects of image on inner side of commutator based on fusible feature pyramid
CN114972780A (en) Lightweight target detection network based on improved YOLOv5
CN111814814B (en) Single-stage target detection method based on image super-resolution network
CN113673478B (en) Port large-scale equipment detection and identification method based on deep learning panoramic stitching
CN112001479B (en) Processing method and system based on deep learning model and electronic equipment
Bao et al. YED-YOLO: An object detection algorithm for automatic driving
Sun et al. CGCANet: Context-Guided Cost Aggregation Network for Robust Stereo Matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant