CN113221925A - Target detection method and device based on multi-scale image - Google Patents

Target detection method and device based on multi-scale image

Info

Publication number
CN113221925A
CN113221925A (application CN202110679907.5A)
Authority
CN
China
Prior art keywords
image
feature map
resolution
target detection
pixel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110679907.5A
Other languages
Chinese (zh)
Other versions
CN113221925B (en)
Inventor
单纯
王曦
宫英慧
周彦哲
李金泽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202110679907.5A priority Critical patent/CN113221925B/en
Publication of CN113221925A publication Critical patent/CN113221925A/en
Application granted granted Critical
Publication of CN113221925B publication Critical patent/CN113221925B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a target detection method and device based on multi-scale images. The method comprises the steps of: inputting an original image to obtain candidate regions; acquiring the original feature map of each candidate region; comparing the resolution of the original feature map of each candidate region with a preset resolution, and inputting feature maps below the preset resolution into an image reconstruction network model for image enhancement; and inputting the image features after enhancement, together with the original feature maps of the candidate regions, into YOLOv3 for target detection and classification. In the scheme of the invention, the output of a trained image reconstruction network is used to strengthen the detection performance of the target detection network on low-resolution images, with an emphasis on small targets, yielding good detection results.

Description

Target detection method and device based on multi-scale image
Technical Field
The invention relates to the technical field of computer vision, and in particular to a target detection method and device based on multi-scale images.
Background
Matching and detecting objects in images has always been an important problem in the field of computer vision. Target detection has a wide range of applications, so developing effective, accurate, and broadly applicable detection algorithms is particularly important. Four classical error types are usually encountered in target detection: (a) class misidentification; (b) localization error, where only part of the object is localized; (c) missed detections caused by occlusion; and (d) small-object errors, i.e., classification errors arising because the object occupies too small an area and its features are not effectively identified.
In recent years, image target detection algorithms based on deep learning have made breakthrough progress: detection with convolutional neural networks has greatly improved accuracy. For the errors above, many excellent algorithms have been proposed to optimize target detection and improve its accuracy and speed. Improvements to target detection algorithms mainly address the following aspects: (1) the model infrastructure of the algorithm, i.e., improving the structure of the deep network, for example by deepening the backbone network; (2) the features, where adding contextual information and multi-scale information to the features is currently a popular way to improve the detection of small targets; (3) the data augmentation method, the simplest and most effective way to improve model robustness and reduce overfitting. In addition, target detection algorithms are mainly improved at the following stages: (1) image processing; (2) detection; (3) classification.
Target detection algorithms based on deep learning fall into two main categories according to their structure: regression-based detection algorithms and region-proposal-based detection algorithms. Regression-based algorithms, such as YOLO, SSD, RetinaNet, and RefineDet, obtain results by performing a single pass of regression and multi-class classification on the features extracted by the backbone network. Region-proposal-based algorithms, such as R-CNN, SPP-Net, Fast R-CNN, R-FCN, and FPN, detect in two stages: the first stage performs coarse regression and classification of initial anchor boxes on the features extracted from the image to obtain proposal boxes; the second stage performs further regression and classification on the proposals obtained in the first stage to get the result. All results produced by the network then undergo post-processing operations such as non-maximum suppression and boundary clipping, and finally all resulting detection boxes are drawn on the original image to complete detection.
However, for the problem of target scale variation, both categories of algorithms depend entirely on the scale settings of the anchors, and thus cannot adequately handle scale variation in target detection, especially the detection of small targets.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a target detection method and device based on multi-scale images, which address the technical problem that target detection in the prior art cannot adequately handle scale variation, especially small-target detection.
According to a first aspect of the present invention, there is provided a method for multi-scale image-based object detection, the method comprising the steps of:
step S101: inputting an original image to obtain a candidate region;
step S102: acquiring an original feature map of the candidate region;
step S103: comparing the original feature map of the candidate region with a preset resolution, inputting feature maps below the preset resolution into an image reconstruction network model, and performing image enhancement;
step S104: inputting the image features after image enhancement, together with the original feature map of the candidate region, into YOLOv3 for target detection and classification.
According to a second aspect of the present invention, there is provided a multi-scale image-based object detection apparatus, the apparatus comprising:
a candidate region acquisition module, configured to input an original image and obtain a candidate region;
an original feature map acquisition module, configured to acquire the original feature map of the candidate region;
an image enhancement module, configured to compare the original feature map of the candidate region with a preset resolution, input feature maps below the preset resolution into an image reconstruction network model, and perform image enhancement;
a target detection module, configured to input the image features after image enhancement, together with the original feature map of the candidate region, into YOLOv3 for target detection and classification.
According to a third aspect of the present invention, there is provided a multi-scale image-based object detection system, comprising:
a processor for executing a plurality of instructions;
a memory to store a plurality of instructions;
wherein the instructions are stored by the memory and loaded and executed by the processor to perform the multi-scale image-based object detection method as described above.
According to a fourth aspect of the present invention, there is provided a computer readable storage medium having a plurality of instructions stored therein; the instructions are used for loading and executing the multi-scale image-based target detection method by the processor.
According to the scheme of the invention, improved algorithms are proposed from the perspective of multi-scale features: multi-scale feature expression is realized through methods such as feature fusion and feature enhancement, and a new end-to-end network structure is proposed for this purpose, in which objects in low-resolution images are detected through the cooperative learning of two deep neural networks, an image reconstruction network (IRN) and a target detection network. First, the target detection network is trained; second, the image reconstruction network, assisted by the target detection network, enhances low-resolution images into high-resolution images; finally, the output of the trained image reconstruction network is used to strengthen the target detection performance of the target detection network on low-resolution images. The scheme of the invention focuses on the detection of small targets, and low-resolution pictures can be detected using the IRN.
The foregoing description is only an overview of the technical solutions of the present invention, and in order to make the technical solutions of the present invention more clearly understood and to implement them in accordance with the contents of the description, the following detailed description is given with reference to the preferred embodiments of the present invention and the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention. In the drawings:
FIG. 1 is a flowchart of a multi-scale image-based target detection method according to an embodiment of the present invention;
FIG. 2 is a schematic overall structure diagram of a multi-scale image-based target detection method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an overall structure of an image reconstruction network model according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of image reconstruction for an image reconstruction network according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an upsampling process according to an embodiment of the present invention;
FIG. 6 is a schematic view of a downsampling according to one embodiment of the present invention;
fig. 7 is a block diagram of a multi-scale image-based object detection apparatus according to an embodiment of the present invention.
Detailed Description
First, a flow of a multi-scale image-based target detection method according to an embodiment of the present invention is described with reference to fig. 1. As shown in fig. 1-2, the method comprises the steps of:
step S101: inputting an original image to obtain a candidate region;
step S102: acquiring an original feature map of the candidate region;
step S103: comparing the original feature map of the candidate region with a preset resolution, inputting feature maps below the preset resolution into an image reconstruction network model, and performing image enhancement;
step S104: inputting the image features after image enhancement, together with the original feature map of the candidate region, into YOLOv3 for target detection and classification.
The step S101: an original image is input to obtain a candidate region; in this embodiment, the candidate region is obtained by a region proposal network (RPN).
The step S102: the original feature map of the candidate region is obtained; in this embodiment, RoIPooling is used to obtain the original feature map of the candidate region.
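As a rough illustration of how a candidate region's feature map can be extracted, the following sketch implements a minimal RoI max-pooling over a single-channel 2-D feature map. The function name, the integer sub-window scheme, and the output grid size are illustrative assumptions, not details taken from the patent:

```python
def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool the RoI (x0, y0, x1, y1) of a 2-D feature map
    into an out_h x out_w grid (a simplified RoIPooling)."""
    x0, y0, x1, y1 = roi
    h, w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # integer sub-window boundaries inside the RoI
            ys = y0 + i * h // out_h
            ye = y0 + (i + 1) * h // out_h
            xs = x0 + j * w // out_w
            xe = x0 + (j + 1) * w // out_w
            row.append(max(feature_map[y][x]
                           for y in range(ys, max(ye, ys + 1))
                           for x in range(xs, max(xe, xs + 1))))
        pooled.append(row)
    return pooled
```

For example, pooling the full extent of a 4x4 map into a 2x2 grid keeps the maximum of each quadrant.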
The step S103: the original feature map of the candidate region is compared with a preset resolution, and feature maps below the preset resolution are input into an image reconstruction network model for image enhancement, wherein:
No processing is applied to original feature maps of candidate regions at or above the preset resolution.
In this embodiment, as shown in figs. 3 to 4, the image reconstruction network model comprises an image reconstruction network (IRN) and a target detection network. The input of the image reconstruction network is an image whose resolution is below the preset resolution, and its output is a reconstructed image RLR whose pixel size is the same as that of the HR image output by the image reconstruction network. With the reconstructed image RLR as input to the target detection network, a loss is computed from the reconstructed image RLR and the feature map HR obtained via the upsampling operation of the image reconstruction network, and the parameters of the image reconstruction network are adjusted accordingly.
The image reconstruction network comprises several convolution layers and several branches at different levels. An input original feature map below the preset resolution passes through the convolution operations of these layers, and the resulting feature vector is fed into the lowest-level branch. Each branch contains several sampling blocks, each consisting of an upsampling block and a downsampling block; through these sampling blocks, the features transmitted along each branch are enhanced by a fixed ratio during forward propagation. For each of the branches: each sampling block transmits the branch's upsampled features to the corresponding sampling block in the next-higher-level branch, and transmits the branch's downsampled features to the corresponding sampling block in the next-lower-level branch.
In this embodiment, the upsampling operation of the upsampling block and the downsampling operation of the downsampling block may be performed concurrently.
The image reconstruction network adopting the structure has the advantages that:
(1) The overall architecture starts from the low-resolution feature map as the first stage, gradually adding low-to-high-resolution operations to form more stages, and connecting subnetworks of different resolutions in parallel.
(2) Multi-scale fusion is performed: the high-resolution representation is boosted with the help of low-resolution representations of the same depth and similar level; that is, each subnetwork repeatedly receives information from the other parallel subnetworks.
For the 4x expansion operation, this embodiment provides three branches in total, and the size of the feature map remains unchanged during the forward propagation of each branch. The three branches differ, but information is communicated between them. For example, in the forward pass, the lowest branch in the figure, branch 1, expands its feature map with an upsampling block comprising 3 units (as shown in fig. 5) and passes it to branches 2 and 3, while branch 2 also sends its feature map, reduced by a downsampling block (as shown in fig. 6), to branch 1. In this embodiment, the upsampling operation of the upsampling block and the downsampling operation of the downsampling block may be performed at the same stage.
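For the 4x setting with three branches, the relative resolutions can be tabulated with simple bookkeeping. The 1x/2x/4x scale assignment and the x2 step per level are assumptions consistent with a 4x overall expansion, not stated explicitly in the patent:

```python
def branch_scales(num_branches=3, step=2):
    """Scale factor of each branch relative to the lowest-resolution
    branch (branch 1).  With 3 branches and a x2 step per level this
    gives 1x, 2x, 4x -- a 4x overall expansion."""
    return {b: step ** (b - 1) for b in range(1, num_branches + 1)}

def exchanged_size(size, src, dst, step=2):
    """Spatial size after sending a feature map from branch src to
    branch dst: upsampled when dst is higher, downsampled when lower."""
    if dst >= src:
        return size * step ** (dst - src)   # via upsampling block(s)
    return size // step ** (src - dst)      # via downsampling block(s)
```

So a 32-pixel-wide map sent from branch 1 to branch 3 arrives 128 pixels wide, while branch 2's map sent down to branch 1 is halved.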
In this embodiment, feature maps below the threshold are input into the network. Through image reconstruction of the feature map, each of the three branches acquires target features at a different pixel scale via parallel upsampling and downsampling, while convolution provides automatic denoising and feature enhancement; through communication between the branches, feature fusion finally yields a single fused feature map. The loss is computed using the enhanced low-resolution image (RLR) and the target detection results of the feature map (HR) obtained by upsampling.
As shown in fig. 5, the upsampling block is composed of a first sub-pixel convolution unit, a first convolution unit, and a second sub-pixel convolution unit. The low-resolution feature map L0 passes through the sub-pixel convolution of the first sub-pixel convolution unit to generate a high-resolution feature map H0; H0 is converted into a low-resolution feature map L1 through the convolution operation of the first convolution unit; L0 and L1 are subtracted pixel by pixel to obtain the difference between the low-resolution feature maps; the low-resolution feature map L1 then passes through the sub-pixel convolution of the second sub-pixel convolution unit to generate a high-resolution feature map H1, and the two high-resolution feature maps H0 and H1 are added pixel by pixel to output the high-resolution feature map HR.
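The sub-pixel convolution units above rearrange groups of channels into spatial positions. A minimal pure-Python pixel shuffle for a single 2x upscale illustrates this rearrangement; it is an illustrative sketch of the standard sub-pixel operation, and the learned convolutions that precede it in the unit are omitted:

```python
def pixel_shuffle(channels, r=2):
    """Rearrange r*r channels of size h x w into one (h*r) x (w*r)
    map -- the core rearrangement of sub-pixel convolution."""
    h, w = len(channels[0]), len(channels[0][0])
    out = [[0] * (w * r) for _ in range(h * r)]
    for c, ch in enumerate(channels):        # channel index c = dy*r + dx
        dy, dx = divmod(c, r)
        for y in range(h):
            for x in range(w):
                out[y * r + dy][x * r + dx] = ch[y][x]
    return out
```

Four 1x1 channels thus become one 2x2 map, which is how L0 grows into H0 without interpolation.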
As shown in fig. 6, the downsampling block is composed of a first convolution unit, a first sub-pixel convolution unit, and a second convolution unit. The high-resolution feature map H0' is convolved by the first convolution unit to generate a low-resolution feature map L0'; L0' is converted into a high-resolution feature map H1' through the convolution operation of the first sub-pixel convolution unit; H0' and H1' are fused by pixel-by-pixel addition; H1' then passes through the convolution of the second convolution unit to generate a low-resolution feature map L1', and L0' and L1' are subtracted pixel by pixel to output the low-resolution feature map LR.
The loss function of the image reconstruction network (IRN) is defined as follows.
The task of the IRN is to reconstruct high-resolution images from low-resolution images, and designing an appropriate loss function is important for obtaining the desired enhancement effect. Since the ultimate goal is to improve target detection accuracy, the reconstruction should focus on information related to the target. On top of the typical reconstruction loss used in super-resolution, three auxiliary loss functions are added that play a supporting role in reconstructing the image:
1. RLoss: the error between the RLR image generated by the image reconstruction network and the HR image output by the upsampling module.
2. ELoss: edges are extracted from RLR and HR using the classical Sobel operator, and the mean of the pixel differences is then computed.
3. PLoss: perceptual features are extracted from the frozen layers of the target recognition network for RLR and HR respectively, and the perceptual loss is computed as the Euclidean distance between the two extracted feature vectors.
4. Total loss of the image reconstruction network:
Total Loss = w1·RLoss + w2·ELoss + w3·PLoss
where w1, w2, and w3 are weight coefficients set empirically through experiments.
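A minimal sketch of ELoss and the weighted total, using pure Python on small 2-D grids. The function names and default weight values are illustrative assumptions, and RLoss and PLoss are passed in precomputed:

```python
def sobel_edges(img):
    """Edge magnitude (|Gx| + |Gy|) of a 2-D grid via the Sobel
    operator, valid region only (no border padding)."""
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
    h, w = len(img), len(img[0])
    out = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            gx = sum(kx[i][j] * img[y - 1 + i][x - 1 + j]
                     for i in range(3) for j in range(3))
            gy = sum(ky[i][j] * img[y - 1 + i][x - 1 + j]
                     for i in range(3) for j in range(3))
            row.append(abs(gx) + abs(gy))
        out.append(row)
    return out

def eloss(rlr, hr):
    """Mean absolute difference between Sobel edge maps of RLR and HR."""
    er, eh = sobel_edges(rlr), sobel_edges(hr)
    diffs = [abs(a - b) for ra, rb in zip(er, eh) for a, b in zip(ra, rb)]
    return sum(diffs) / len(diffs)

def total_loss(rloss, el, ploss, w1=1.0, w2=0.1, w3=0.1):
    """Total Loss = w1*RLoss + w2*ELoss + w3*PLoss."""
    return w1 * rloss + w2 * el + w3 * ploss
```

Identical RLR and HR grids give ELoss of zero, and the total reduces to the weighted sum of the remaining terms.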
As shown in fig. 3, the training process of the image reconstruction network model includes:
step S301: training the target detection network with the HR images output by the upsampling module, while keeping the parameters of certain layers of YOLOv3 unchanged;
In this embodiment, the parameters of some layers of YOLOv3 are kept unchanged to preserve the general feature-extraction capability of the target detection network, which is then used to guide the image reconstruction network (IRN).
Step S302: fixing the parameters of the target detection network, and training the image reconstruction network (IRN) in a supervised manner using the training samples and the target recognition network;
In this embodiment, the training samples are images whose resolution is below the preset threshold, treated as low-resolution (LR) images; the output of the image reconstruction network is a reconstructed image (RLR) with the same pixel size as HR. The reconstruction loss and edge loss are computed from the difference between the RLR and HR images, and the total detection loss is computed by feeding the RLR image into the target detection network. By using the reconstruction loss, the image reconstruction network (IRN) focuses on information useful for target detection when reconstructing the image. The total loss of the image reconstruction network is Total Loss = w1·RLoss + w2·ELoss + w3·PLoss.
Step S303: training the target detection network using the reconstructed images (RLR) generated by the image reconstruction network (IRN); at this stage, the parameters of the IRN are fixed and the parameters of the target detection network are trained.
In this embodiment, none of the layers of the target detection network are frozen, so as to enhance its detection capability. After training is complete, the whole pipeline can be applied to new LR images: an LR image is input to the IRN to generate a reconstructed image, which is then passed to the target detection network, and the final result is predicted by the target detection network.
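The three-stage alternating schedule of steps S301 to S303 can be summarized as simple freeze/unfreeze bookkeeping. The function and tuple layout are illustrative assumptions; the note on partially frozen YOLOv3 layers in S301 follows the description above:

```python
def training_schedule():
    """Which network is trainable at each of the three stages
    (S301-S303) of the alternating training procedure.
    Tuple: (stage, train IRN?, train detector?, detector input).
    In S301 some YOLOv3 layers stay frozen; in S303 none are frozen."""
    return [
        ("S301", False, True,  "HR images from the upsampling module"),
        ("S302", True,  False, "RLR images reconstructed by the IRN"),
        ("S303", False, True,  "RLR images reconstructed by the IRN"),
    ]
```

Exactly one of the two networks is trained at each stage, which is what makes the cooperative learning alternate rather than joint.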
An embodiment of the present invention further provides a target detection apparatus based on a multi-scale image, as shown in fig. 7, the apparatus includes:
a candidate region acquisition module, configured to input an original image and obtain a candidate region;
an original feature map acquisition module, configured to acquire the original feature map of the candidate region;
an image enhancement module, configured to compare the original feature map of the candidate region with a preset resolution, input feature maps below the preset resolution into an image reconstruction network model, and perform image enhancement;
a target detection module, configured to input the image features after image enhancement, together with the original feature map of the candidate region, into YOLOv3 for target detection and classification.
The embodiment of the invention further provides a target detection system based on multi-scale images, which comprises:
a processor for executing a plurality of instructions;
a memory to store a plurality of instructions;
wherein the instructions are stored by the memory and loaded and executed by the processor to perform the multi-scale image-based object detection method as described above.
The embodiment of the invention further provides a computer readable storage medium, wherein a plurality of instructions are stored in the storage medium; the instructions are used for loading and executing the multi-scale image-based target detection method by the processor.
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions in actual implementation, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a physical machine server, or a network cloud server, etc., and needs to install a Ubuntu operating system) to perform some steps of the method according to various embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and any simple modification, equivalent change and modification made to the above embodiment according to the technical spirit of the present invention are still within the scope of the technical solution of the present invention.

Claims (9)

1. A target detection method based on multi-scale images is characterized by comprising the following steps:
step S101: inputting an original image to obtain a candidate region;
step S102: acquiring an original feature map of the candidate region;
step S103: comparing the original feature map of the candidate region with a preset resolution, inputting feature maps below the preset resolution into an image reconstruction network model, and performing image enhancement;
step S104: inputting the image features after image enhancement, together with the original feature map of the candidate region, into YOLOv3 for target detection and classification.
2. The multi-scale-image-based target detection method according to claim 1, wherein the image reconstruction network model comprises an image reconstruction network and a target detection network; the input of the image reconstruction network is an image whose resolution is below a preset resolution, and its output is a reconstructed image RLR whose pixel size is the same as that of the HR image output by the image reconstruction network; with the reconstructed image RLR as input to the target detection network, a loss is computed from the reconstructed image RLR and the feature map HR obtained through the upsampling operation of the image reconstruction network, so as to adjust the parameters of the image reconstruction network;
the image reconstruction network comprises several convolution layers and several branches at different levels; an input original feature map below the preset resolution passes through the convolution operations of the convolution layers, and the resulting feature vector is fed into the lowest-level branch; each branch comprises several sampling blocks, each consisting of an upsampling block and a downsampling block, through which the features transmitted along each branch are enhanced by a fixed ratio during forward propagation; for each of the branches: each sampling block transmits the branch's upsampled features to the corresponding sampling block in the next-higher-level branch, and transmits the branch's downsampled features to the corresponding sampling block in the next-lower-level branch.
3. The multi-scale image-based target detection method according to claim 2, wherein the up-sampling block consists of a first sub-pixel convolution unit, a first convolution unit, and a second sub-pixel convolution unit; a low-resolution feature map L0 is passed through the sub-pixel convolution of the first sub-pixel convolution unit to generate a high-resolution feature map H0; the high-resolution feature map H0 is converted into a low-resolution feature map L1 by the convolution operation of the first convolution unit; L0 and L1 are subtracted pixel by pixel to obtain the difference between the two low-resolution feature maps; the low-resolution feature map L1 is passed through the sub-pixel convolution of the second sub-pixel convolution unit to generate a high-resolution feature map H1; and the two high-resolution feature maps H0 and H1 are added pixel by pixel to output a feature map HR.
4. The multi-scale image-based target detection method according to claim 3, wherein the down-sampling block consists of a first convolution unit, a first sub-pixel convolution unit, and a second convolution unit; a high-resolution feature map H0' is convolved by the first convolution unit to generate a low-resolution feature map L0'; the low-resolution feature map L0' is converted into a high-resolution feature map H1' by the sub-pixel convolution operation of the first sub-pixel convolution unit; H0' and H1' are fused by pixel-by-pixel addition; the high-resolution feature map H1' is convolved by the second convolution unit to generate a low-resolution feature map L1'; and the low-resolution feature maps L0' and L1' are subtracted pixel by pixel to output a feature map LR.
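The data flow of the sampling blocks in claims 3 and 4 can be illustrated with a simplified NumPy sketch. This is only an illustration of the wiring described in the claims: a real sub-pixel convolution learns its channel-expanding convolution, whereas here a channel repeat and a strided slice are hypothetical stand-ins for the sub-pixel convolution and convolution units.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange (C*r^2, H, W) -> (C, H*r, W*r): the core of sub-pixel convolution."""
    c, h, w = x.shape
    x = x.reshape(c // (r * r), r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)          # -> (C, H, r, W, r)
    return x.reshape(c // (r * r), h * r, w * r)

def subpixel_up(L, r):
    """Stand-in sub-pixel convolution unit: expand channels, then shuffle pixels."""
    return pixel_shuffle(np.repeat(L, r * r, axis=0), r)

def conv_down(H, r):
    """Stand-in convolution unit that reduces the spatial resolution by r."""
    return H[:, ::r, ::r]

def up_sampling_block(L0, r=2):
    """Claim 3: L0 -> H0 -> L1; pixel-wise LR difference; L1 -> H1; HR = H0 + H1."""
    H0 = subpixel_up(L0, r)          # first sub-pixel convolution unit
    L1 = conv_down(H0, r)            # first convolution unit
    _lr_diff = L0 - L1               # pixel-wise difference of the LR maps (per claim 3)
    H1 = subpixel_up(L1, r)          # second sub-pixel convolution unit
    return H0 + H1                   # pixel-wise sum -> output feature map HR

def down_sampling_block(H0, r=2):
    """Claim 4: H0' -> L0' -> H1'; pixel-wise HR fusion; H1' -> L1'; LR = L0' - L1'."""
    L0 = conv_down(H0, r)            # first convolution unit
    H1 = subpixel_up(L0, r)          # first sub-pixel convolution unit
    _hr_fused = H0 + H1              # pixel-wise fusion of the HR maps (per claim 4)
    L1 = conv_down(H1, r)            # second convolution unit
    return L0 - L1                   # pixel-wise difference -> output feature map LR
```

The two blocks are mirror images: the up-sampling block refines a low-to-high projection with a second sub-pixel convolution, while the down-sampling block refines a high-to-low projection with a second convolution.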
5. The multi-scale image-based target detection method according to claim 4, wherein
the loss function of the image reconstruction network (IRN) is:
Total Loss = w1·RLoss + w2·ELoss + w3·PLoss
where w1, w2, and w3 are weight coefficients; RLoss is the error between the RLR image generated by the image reconstruction network and the HR image; ELoss is obtained by extracting the edges of RLR and HR separately with the Sobel operator and then calculating the average pixel difference between them; and PLoss is the perceptual loss, calculated by extracting perceptual features from the frozen layers of the target recognition network for each image and then taking the Euclidean distance between the two extracted feature vectors.
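A minimal sketch of this composite loss follows. Two assumptions are made that the claim does not fix: RLoss is taken as mean-squared error (the claim only says "error"), and flat feature vectors stand in for the frozen-layer perceptual features.

```python
import numpy as np

def sobel_edges(img: np.ndarray) -> np.ndarray:
    """Edge magnitude via the Sobel operator (valid convolution, single-channel image)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = img[i:i + 3, j:j + 3]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    return np.hypot(gx, gy)

def total_loss(rlr, hr, feat_rlr, feat_hr, w=(1.0, 1.0, 1.0)):
    """Total Loss = w1*RLoss + w2*ELoss + w3*PLoss (claim 5)."""
    r_loss = np.mean((rlr - hr) ** 2)                             # RLoss: reconstruction error (MSE assumed)
    e_loss = np.mean(np.abs(sobel_edges(rlr) - sobel_edges(hr)))  # ELoss: mean Sobel-edge difference
    p_loss = np.linalg.norm(feat_rlr - feat_hr)                   # PLoss: Euclidean distance of features
    return w[0] * r_loss + w[1] * e_loss + w[2] * p_loss
```

In the patent, the perceptual features would come from the frozen layers of the target recognition network; here they are passed in as precomputed vectors.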
6. The multi-scale image-based target detection method of claim 5, wherein the training process of the image reconstruction network model comprises:
step S301: training the target detection network with the HR images output by the up-sampling module, while keeping the parameters of certain layers of YOLOv3 unchanged;
step S302: fixing the parameters of the target detection network, and training the image reconstruction network in a supervised manner using the training samples and the target recognition network;
step S303: training the target detection network with the reconstructed images RLR generated by the image reconstruction network; at this stage, the parameters of the image reconstruction network are fixed and the parameters of the target detection network are trained.
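The alternating freeze/train schedule of steps S301-S303 can be sketched as follows. `ThreeStageTrainer` and its flags are hypothetical names; the sketch only shows which parameter set is trainable in each stage, not the actual gradient updates.

```python
class ThreeStageTrainer:
    """Toggle which network's parameters are trainable across the three stages."""

    def __init__(self):
        self.irn_trainable = True   # image reconstruction network (IRN) parameters
        self.det_trainable = True   # target detection network (YOLOv3) parameters
        self.log = []

    def stage1_pretrain_detector(self):
        # S301: train the detector on HR images (selected YOLOv3 layers stay frozen)
        self.irn_trainable = False
        self.det_trainable = True
        self.log.append("S301: detector on HR")

    def stage2_train_irn(self):
        # S302: fix detector parameters; train the IRN under the detector's supervision
        self.det_trainable = False
        self.irn_trainable = True
        self.log.append("S302: IRN supervised by frozen detector")

    def stage3_finetune_detector(self):
        # S303: fix IRN parameters; fine-tune the detector on reconstructed RLR images
        self.irn_trainable = False
        self.det_trainable = True
        self.log.append("S303: detector on RLR")
```

In a framework such as PyTorch, the trainable flags would correspond to setting `requires_grad` on each network's parameters before each stage's optimization loop.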
7. An apparatus for target detection based on multi-scale images, the apparatus comprising:
a candidate region acquisition module, configured to obtain candidate regions from an input original image;
an original feature map acquisition module, configured to obtain the original feature map of the candidate region;
an image enhancement module, configured to compare the resolution of the original feature map of the candidate region with a preset resolution, and to input any original feature map lower than the preset resolution into an image reconstruction network model for image enhancement;
a target detection module, configured to input the image features after image enhancement, together with the original feature map of the candidate region, into YOLOv3 for target detection and classification.
8. A multi-scale image based object detection system, comprising:
a processor for executing a plurality of instructions;
a memory for storing a plurality of instructions;
wherein the plurality of instructions are stored in the memory and loaded and executed by the processor to perform the multi-scale image-based target detection method of any one of claims 1-6.
9. A computer-readable storage medium having a plurality of instructions stored therein, the plurality of instructions being loaded and executed by a processor to perform the multi-scale image-based target detection method according to any one of claims 1-6.
CN202110679907.5A 2021-06-18 2021-06-18 Target detection method and device based on multi-scale image Active CN113221925B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110679907.5A CN113221925B (en) 2021-06-18 2021-06-18 Target detection method and device based on multi-scale image


Publications (2)

Publication Number Publication Date
CN113221925A true CN113221925A (en) 2021-08-06
CN113221925B CN113221925B (en) 2022-11-11

Family

ID=77080572


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114140427A (en) * 2021-11-30 2022-03-04 深圳集智数字科技有限公司 Object detection method and device
TWI779784B (en) * 2021-08-19 2022-10-01 中華電信股份有限公司 Feature analysis system, method and computer readable medium thereof
CN115601357A (en) * 2022-11-29 2023-01-13 南京航空航天大学(Cn) Stamping part surface defect detection method based on small sample
CN115937794A (en) * 2023-03-08 2023-04-07 北京龙智数科科技服务有限公司 Small target object detection method and device, electronic equipment and storage medium
CN117197756A (en) * 2023-11-03 2023-12-08 深圳金三立视频科技股份有限公司 Hidden danger area intrusion detection method, device, equipment and storage medium
CN117745595A (en) * 2024-02-18 2024-03-22 珠海金山办公软件有限公司 Image processing method, device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110443172A (en) * 2019-07-25 2019-11-12 北京科技大学 A kind of object detection method and system based on super-resolution and model compression
CN111626208A (en) * 2020-05-27 2020-09-04 北京百度网讯科技有限公司 Method and apparatus for detecting small targets
US20210065337A1 (en) * 2019-09-03 2021-03-04 Novatek Microelectronics Corp. Method and image processing device for image super resolution, image enhancement, and convolutional neural network model training
CN112597887A (en) * 2020-12-22 2021-04-02 深圳集智数字科技有限公司 Target identification method and device
US20210166350A1 (en) * 2018-07-17 2021-06-03 Xi'an Jiaotong University Fusion network-based method for image super-resolution and non-uniform motion deblurring


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
HAIPENG ZHAO: "Mixed YOLOv3-LITE: A Lightweight Real-Time Object Detection Method", Sensors *
OMKAR MASUREKAR et al.: "Real Time Object Detection Using YOLOv3", International Research Journal of Engineering and Technology (IRJET) *
ZHOU Hui et al.: "Ship target detection in high-resolution remote sensing images based on a feature pyramid model", Journal of Dalian Maritime University *
CUI Yanpeng et al.: "An improved YOLOv3 method for dynamic small-target detection", Journal of Xidian University *




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant