CN113420840B - Target detection method and system based on low-resolution image - Google Patents

Target detection method and system based on low-resolution image

Info

Publication number
CN113420840B
CN202110964770.8A · CN113420840B
Authority
CN
China
Prior art keywords
low
image
stage
detection frame
resolution image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110964770.8A
Other languages
Chinese (zh)
Other versions
CN113420840A (en)
Inventor
魏文憬
郭骏
潘正颐
侯大为
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changzhou Weiyizhi Technology Co Ltd
Original Assignee
Changzhou Weiyizhi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changzhou Weiyizhi Technology Co Ltd
Priority to CN202110964770.8A
Publication of CN113420840A
Application granted
Publication of CN113420840B
Active legal status (current)
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a target detection method and system based on a low-resolution image, relating to the technical field of target detection. The method comprises the following steps: acquiring a low-resolution image of a target to be detected; performing feature extraction on the low-resolution image to obtain a corresponding feature map; processing the feature map to obtain a corresponding detection frame; processing the detection frame to obtain the coordinates and classification probability score of the detection frame; and detecting and classifying the target to be detected according to the coordinates and classification probability score of the detection frame. The invention can extract a feature map with deep semantics without enlarging the low-resolution image, thereby effectively reducing detection time without losing detection accuracy.

Description

Target detection method and system based on low-resolution image
Technical Field
The invention relates to the technical field of target detection, and in particular to a target detection method and a target detection system based on a low-resolution image.
Background
In recent years, low-resolution object recognition technology based on deep learning has continued to receive a great deal of attention from both academia and industry. Handwritten digit string recognition, for example, intuitively amounts to finding the digits in a picture and then recognizing them, which is very similar to target detection. However, existing target-detection-based methods enlarge the image to a certain size before detection, which causes two problems: first, enlarging a low-resolution image and then extracting features from it wastes time unnecessarily; second, if the low-resolution image is fed into the network directly without enlargement, only a small feature map is produced during feature extraction, so a correct detection frame cannot be generated.
Disclosure of Invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art. Therefore, an object of the present invention is to provide a target detection method based on a low-resolution image, which can extract a feature map with deep semantics without enlarging the low-resolution image, thereby effectively reducing detection time without losing detection accuracy.
A second object of the present invention is to provide a low resolution image based object detection system.
In order to achieve the above object, an embodiment of a first aspect of the present invention provides a target detection method based on a low-resolution image, including the following steps: acquiring a low-resolution image of a target to be detected; performing feature extraction on the low-resolution image to obtain a corresponding feature map; processing the feature map to obtain a corresponding detection frame; processing the detection frame to obtain the coordinates and classification probability score of the detection frame; and detecting and classifying the target to be detected according to the coordinates and classification probability score of the detection frame.
According to the target detection method based on a low-resolution image provided by the embodiment of the present invention, feature extraction is performed on the low-resolution image to obtain a corresponding feature map, the feature map is processed to obtain a corresponding detection frame, the detection frame is processed to obtain the coordinates and classification probability score of the detection frame, and finally the target to be detected is detected and classified according to the coordinates and classification probability score of the detection frame. In this way, a feature map with deep semantics can be extracted without enlarging the low-resolution image, so that detection time is effectively reduced without losing detection accuracy.
In addition, the target detection method based on the low-resolution image proposed according to the above embodiment of the present invention may further have the following additional technical features:
According to an embodiment of the present invention, performing feature extraction on the low-resolution image to obtain a corresponding feature map includes the following step: performing feature extraction on the low-resolution image by using a low-resolution applicable convolutional neural network to obtain the corresponding feature map.
According to an embodiment of the present invention, the low-resolution applicable convolutional neural network includes the first stage, the second stage, and the third stage of a residual network, and further includes: a first upsampling connected to the first stage of the residual network; a first convolution layer connected to the first upsampling; a second upsampling connected to the second stage of the residual network; a second convolution layer connected to the second upsampling; and a third upsampling connected to the third stage of the residual network.
According to an embodiment of the present invention, the feature map is formed by element-by-element addition of the outputs of the first convolution layer, the second convolution layer, and the third upsampling, and the length and width of the feature map are half of the length and width of the low-resolution image, respectively.
According to an embodiment of the present invention, processing the feature map to obtain a corresponding detection frame includes the following steps: processing the feature map by using a region candidate network to obtain candidate frames of different categories; and processing the feature map and the candidate frames of different categories by using a region of interest pooling layer to obtain corresponding detection frames.
According to an embodiment of the present invention, a first non-maximum suppression algorithm and a second non-maximum suppression algorithm are provided in the region of interest pooling layer. The first non-maximum suppression algorithm is a non-classified non-maximum suppression (NC-NMS) algorithm, which suppresses the candidate frames of different categories of the target to be detected as if they were of the same category, so as to keep only the candidate frame with the highest prediction score among the candidate frames of different categories. The second non-maximum suppression algorithm is a top-N score selection non-maximum suppression algorithm, which suppresses the candidate frames of different categories according to the length of the target to be detected.
According to an embodiment of the present invention, processing the detection frame to obtain the coordinates and classification probability score of the detection frame includes the following steps: processing the detection frame with a first fully-connected layer to obtain the coordinates of the detection frame; and processing the detection frame with a second fully-connected layer to obtain the classification probability score of the detection frame; wherein the first fully-connected layer and the second fully-connected layer are arranged in parallel.
In order to achieve the above object, an embodiment of a second aspect of the present invention provides a target detection system based on a low-resolution image, including: an acquisition module configured to acquire a low-resolution image of a target to be detected; a feature extraction module configured to perform feature extraction on the low-resolution image to obtain a corresponding feature map; a first processing module configured to process the feature map to obtain a corresponding detection frame; a second processing module configured to process the detection frame to obtain the coordinates and classification probability score of the detection frame; and a detection module configured to detect and classify the target to be detected according to the coordinates and classification probability score of the detection frame.
According to the target detection system based on a low-resolution image provided by the embodiment of the present invention, the feature extraction module performs feature extraction on the low-resolution image to obtain a corresponding feature map, the first processing module processes the feature map to obtain a corresponding detection frame, the second processing module processes the detection frame to obtain the coordinates and classification probability score of the detection frame, and finally the detection module detects and classifies the target to be detected according to the coordinates and classification probability score of the detection frame. In this way, a feature map with deep semantics can be extracted without enlarging the low-resolution image, so that detection time is effectively reduced without losing detection accuracy.
To achieve the above object, an embodiment of a third aspect of the present invention provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the target detection method based on a low-resolution image according to any one of the embodiments of the first aspect.
According to the computer device of the embodiment of the present invention, when the processor executes the computer program, the target detection method based on a low-resolution image according to any one of the embodiments of the first aspect is implemented, so that a feature map with deep semantics can be extracted without enlarging the low-resolution image, and detection time can be effectively reduced without losing detection accuracy.
To achieve the above object, an embodiment of a fourth aspect of the present invention provides a non-transitory computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the target detection method based on a low-resolution image according to any one of the embodiments of the first aspect.
According to the non-transitory computer-readable storage medium of the embodiment of the present invention, when the stored computer program is executed by a processor, the target detection method based on a low-resolution image according to any one of the embodiments of the first aspect is implemented, so that a feature map with deep semantics can be extracted without enlarging the low-resolution image, and detection time can be effectively reduced without losing detection accuracy.
Drawings
FIG. 1 is a flow chart of a method for detecting a target based on a low resolution image according to an embodiment of the present invention;
FIG. 2 is a block diagram of a low resolution adaptive convolutional neural network in accordance with one embodiment of the present invention;
FIG. 3 is a flow chart of the first non-maximum suppression algorithm in accordance with one embodiment of the present invention;
FIG. 4 is a flow chart of the second non-maximum suppression algorithm in accordance with one embodiment of the present invention;
FIG. 5(a) is a low-resolution image of the digit string "2154" according to one embodiment of the invention;
FIG. 5(b) is a diagram of the processing result of a prior-art non-maximum suppression algorithm on the low-resolution image of the digit string "2154";
FIG. 5(c) is a diagram of the processing result of the first non-maximum suppression algorithm on the low-resolution image of the digit string "2154" according to one embodiment of the present invention;
FIG. 6(a) is a low-resolution image of the digit string "70889" according to one embodiment of the present invention;
FIG. 6(b) is a diagram of the processing result of a prior-art non-maximum suppression algorithm on the low-resolution image of the digit string "70889";
FIG. 6(c) is a diagram of the processing result of the second non-maximum suppression algorithm on the low-resolution image of the digit string "70889" according to one embodiment of the present invention;
FIG. 7 is a flowchart of a method for low resolution image based object detection according to an embodiment of the present invention;
FIG. 8 is a block diagram of a low resolution image based object detection system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a target detection method based on a low-resolution image according to an embodiment of the present invention.
As shown in fig. 1, the target detection method based on low resolution images of the embodiment of the present invention includes the following steps:
and S1, acquiring a low-resolution image of the object to be detected.
Specifically, a camera may be used to acquire image information of the target to be detected, and the image information may include a low-resolution image of the target to be detected.
And S2, extracting the features of the low-resolution image to obtain a corresponding feature map.
Specifically, a low-resolution applicable convolutional neural network may be used to perform feature extraction on the low-resolution image to obtain the corresponding feature map. As shown in fig. 2, the low-resolution applicable convolutional neural network includes the first stage, the second stage, and the third stage of a residual network. For example, a ResNet-50 residual network may be used, with the fourth and fifth stages (stage 4 and stage 5) removed and the first stage s1, second stage s2, and third stage s3 (stage 1, stage 2, and stage 3) retained as the network body of the low-resolution applicable convolutional neural network. By removing the fourth and fifth stages of the residual network, the detection speed can be improved while the detection accuracy is maintained.
In addition, as shown in fig. 2, the low-resolution applicable convolutional neural network further includes: a first upsampling u1 (up-Sampling 1) connected to the first stage s1 (stage 1) of the residual network; a first convolution layer c1 (conv 1) connected to the first upsampling u1; a second upsampling u2 (up-Sampling 2) connected to the second stage s2 (stage 2) of the residual network; a second convolution layer c2 (conv 2) connected to the second upsampling u2; and a third upsampling u3 (up-Sampling 3) connected to the third stage s3 (stage 3) of the residual network. The feature map p4 is formed by element-by-element addition of the outputs of the first convolution layer c1, the second convolution layer c2, and the third upsampling u3. Specifically, a highway mechanism may be used to add the outputs of the first convolution layer c1 and the second convolution layer c2 element by element, and the result is then added to the output of the third upsampling u3; the length and width of the resulting feature map are half of the length and width of the original low-resolution image, respectively. Through the upsampling and highway mechanisms, a feature map with deep semantics can be obtained without enlarging the low-resolution image.
Based on the above structure, the low-resolution applicable convolutional neural network according to the embodiment of the present invention can be formed. Its operation process is described below with reference to fig. 2.
As shown in fig. 2, an input low-resolution image, for example a 160 × 32 × 1 low-resolution image, may first pass through the first stage s1 (stage 1) to obtain a 40 × 8 × 64 image. The 40 × 8 × 64 image may then be input into the first upsampling u1 (up-Sampling 1), for example 2 × 2 upsampling, to obtain an 80 × 16 × 64 image (x1). The image x1 may be input into the first convolution layer c1 (conv 1), for example a 3 × 3 convolution layer, to obtain an 80 × 16 × 512 first feature map p1. The image x1 may also be input into the second stage s2 (stage 2), which yields a 40 × 8 × 256 image; this in turn may be input into the second upsampling u2 (up-Sampling 2), for example 2 × 2 upsampling, to obtain an 80 × 16 × 256 image (x2). The image x2 may be input into the second convolution layer c2 (conv 2), for example a 3 × 3 convolution layer, to obtain an 80 × 16 × 512 second feature map p2. The image x2 may also be input into the third stage s3 (stage 3), which yields a 40 × 8 × 512 image; this in turn may be input into the third upsampling u3 (up-Sampling 3), for example 2 × 2 upsampling, to obtain an 80 × 16 × 512 image (x3).
Further, as shown in fig. 2, the output of the first convolution layer, namely the 80 × 16 × 512 first feature map p1, and the output of the second convolution layer, namely the 80 × 16 × 512 second feature map p2, may be added element by element using the highway mechanism to form an 80 × 16 × 512 element-wise sum map p3. The map p3 may then be added element by element to the output of the third upsampling, namely the 80 × 16 × 512 image x3, to form the 80 × 16 × 512 feature map p4. The length and width of the final feature map are half of the length and width of the input low-resolution image, respectively, so a feature map with deep semantics can be extracted without enlarging the low-resolution image.
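To make the data flow above concrete, the following is a minimal PyTorch sketch of the half-resolution feature extractor, assuming simplified stand-in stage blocks with the channel/stride pattern quoted in the text (the real network would use ResNet-50 bottleneck stages); the class and function names are illustrative, not from the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def _stage(in_ch: int, out_ch: int, stride: int) -> nn.Sequential:
    # Stand-in for one residual-network stage; a real implementation would use
    # ResNet bottleneck blocks, but the channel/stride pattern is what matters here.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class LowResBackbone(nn.Module):
    """Stages 1-3 of a residual network plus three 2x2 upsamplings, two 3x3
    convolutions, and highway-style element-wise addition (fig. 2)."""

    def __init__(self):
        super().__init__()
        self.stage1 = _stage(1, 64, stride=4)     # 160x32x1  -> 40x8x64
        self.stage2 = _stage(64, 256, stride=2)   # 80x16x64  -> 40x8x256
        self.stage3 = _stage(256, 512, stride=2)  # 80x16x256 -> 40x8x512
        self.conv1 = nn.Conv2d(64, 512, kernel_size=3, padding=1)   # conv 1 -> p1
        self.conv2 = nn.Conv2d(256, 512, kernel_size=3, padding=1)  # conv 2 -> p2

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        x1 = F.interpolate(self.stage1(img), scale_factor=2.0)  # up-Sampling 1: 80x16x64
        p1 = self.conv1(x1)                                      # 80x16x512
        x2 = F.interpolate(self.stage2(x1), scale_factor=2.0)   # up-Sampling 2: 80x16x256
        p2 = self.conv2(x2)                                      # 80x16x512
        x3 = F.interpolate(self.stage3(x2), scale_factor=2.0)   # up-Sampling 3: 80x16x512
        p3 = p1 + p2                                             # highway-style addition
        p4 = p3 + x3                                             # final feature map
        return p4                                                # half the input height and width


# Shape check with the 160 x 32 x 1 example from the text (N, C, H, W layout):
# LowResBackbone()(torch.zeros(1, 1, 32, 160)).shape == torch.Size([1, 512, 16, 80])
```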
And S3, processing the feature map to obtain a corresponding detection frame.
Specifically, the feature map may be processed by a region candidate network to obtain candidate frames of different categories, and the feature map and the candidate frames of different categories may be processed by a region of interest pooling layer to obtain corresponding detection frames. A first non-maximum suppression algorithm and a second non-maximum suppression algorithm may be provided in the region of interest pooling layer.
Specifically, as shown in fig. 3, the first non-maximum suppression algorithm may be a non-classified non-maximum suppression (NC-NMS) algorithm. The NC-NMS algorithm may be used to suppress candidate frames of different categories of the target to be detected as if they were of the same category; for example, each category candidate frame in the category set C to which the target to be detected belongs may be suppressed as the same category, so that only the candidate frame with the highest prediction score among the candidate frames of different categories is retained.
Specifically, as shown in fig. 4, the second non-maximum suppression algorithm may be a top-N score selection non-maximum suppression algorithm, which suppresses candidate frames of different categories according to the length of the target to be detected; for example, the candidate frames of different categories may be suppressed according to the known text length N of the target to be detected.
The effect of the first non-maximum suppression algorithm (the NC-NMS algorithm) and the second non-maximum suppression algorithm (the top-N score selection algorithm) according to the embodiment of the present invention is described below, taking the low-resolution handwritten images shown in fig. 5(a) and fig. 6(a) as examples.
In practice, strokes produced by different writing styles may be connected. In the digit string "2154" shown in fig. 5(a), for example, a stroke resembling "1" exists within the digit "4". When the prior-art non-maximum suppression algorithm is used, the result shown in fig. 5(b) is obtained: redundant detection frames appear, for example the detection frames "2, 1, 5, 4, 1", where the last "1" is redundant. When the first non-maximum suppression algorithm of the embodiment of the present invention, namely the NC-NMS algorithm, is used, the result shown in fig. 5(c) is obtained, and the detection frames of the corresponding digits, namely "2, 1, 5, 4", are obtained accurately. Comparing fig. 5(b) and fig. 5(c), the NC-NMS algorithm effectively suppresses the redundant detection frame of category "1" and provides accurate detection frames for subsequent detection, so the detection accuracy can be improved.
In addition, in practice the length of the target to be detected, for example a handwritten digit string, is often known, so the length can be introduced into the second non-maximum suppression algorithm, namely the top-N score selection algorithm, to avoid over-detection. When the prior-art non-maximum suppression algorithm is used to process the digit string "70889" shown in fig. 6(a), the result shown in fig. 6(b) is obtained: three detection frames for the digit "8" are detected, one of which is redundant. When the second non-maximum suppression algorithm of the embodiment of the present invention, namely the top-N score selection algorithm, is used, the result shown in fig. 6(c) is obtained. Comparing fig. 6(b) and fig. 6(c), the top-N score selection algorithm keeps only the N highest-scoring detection frames when N is known, skipping the usual traversal, and thus prevents over-detection.
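As a rough illustration of the two suppression steps discussed above, the sketch below builds them on torchvision's generic NMS; the function names, the IoU threshold, and the use of torchvision are assumptions for illustration, and figures 3 and 4 define the actual procedures.

```python
import torch
from torchvision.ops import nms


def nc_nms(boxes: torch.Tensor, scores: torch.Tensor, iou_thresh: float = 0.5) -> torch.Tensor:
    # Non-classified NMS: class labels are deliberately ignored, so overlapping boxes of
    # different digit classes (e.g. the "1" inside a "4") suppress each other and only
    # the highest-scoring one is kept. Returns the indices of the retained boxes.
    return nms(boxes, scores, iou_thresh)


def top_n_select(scores: torch.Tensor, n: int) -> torch.Tensor:
    # Top-N score selection: when the length n of the digit string is known, simply keep
    # the n highest-scoring candidates instead of traversing a score threshold, which
    # avoids over-detection such as the extra "8" in "70889".
    return scores.argsort(descending=True)[:n]
```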
And S4, processing the detection frame to obtain the coordinate and the classification probability score of the detection frame.
Specifically, a first fully-connected layer may be used to process the detection frame to obtain the coordinates of the detection frame, and a second fully-connected layer may be used to process the detection frame to obtain the classification probability score of the detection frame.
More specifically, as shown in fig. 7, the first fully-connected layer and the second fully-connected layer are arranged in parallel, and a Softmax function is provided after the second fully-connected layer to obtain the corresponding classification probability scores of the detection frames.
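A minimal sketch of the two parallel heads, assuming a 7 × 7 RoI with 512 channels and ten digit classes as in the worked example; the layer widths and names are illustrative assumptions.

```python
import torch
import torch.nn as nn


class DetectionHead(nn.Module):
    """Two parallel fully-connected heads over a flattened 7x7 RoI feature:
    one regresses the detection-frame coordinates, the other feeds a Softmax
    that yields the classification probability scores."""

    def __init__(self, in_channels: int = 512, roi_size: int = 7, num_classes: int = 10):
        super().__init__()
        flat = in_channels * roi_size * roi_size
        self.fc_coord = nn.Linear(flat, 4)          # first fully-connected layer -> box coordinates
        self.fc_cls = nn.Linear(flat, num_classes)  # second fully-connected layer -> class logits

    def forward(self, roi_feats: torch.Tensor):
        x = roi_feats.flatten(start_dim=1)
        coords = self.fc_coord(x)
        probs = torch.softmax(self.fc_cls(x), dim=1)  # classification probability scores
        return coords, probs
```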
And S5, detecting and classifying the target to be detected according to the coordinates and classification probability score of the detection frame.
The following will specifically describe the flow of the target detection method based on the low-resolution image according to the embodiment of the present invention, taking the low-resolution image of the handwritten digit string as an example.
As shown in fig. 7, a low-resolution image of the target to be detected, for example an image of "46982", may be input into the low-resolution target detection model, namely the low-resolution applicable convolutional neural network, to obtain a convolutional feature map of the low-resolution image (the convolutional feature map is half the size of the low-resolution image). The convolutional feature map may then be input into the region candidate network to obtain digit candidate frames of arbitrary shape. Further, the convolutional feature map and the digit candidate frames may be input into the region of interest pooling layer, which converts the digit candidate frames into fixed-size detection frames, for example 7 × 7 detection frames. Finally, the detection frames pass through the parallel first fully-connected layer and second fully-connected layer to obtain the coordinates and classification probability scores of the detection frames, so that the target to be detected can be detected and classified.
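The short sketch below wires the hypothetical modules from the earlier sketches together in the order of fig. 7, with torchvision's RoI pooling standing in for the region of interest pooling layer; the region candidate network is stubbed out with a fixed proposal list, so everything named here is for illustration only.

```python
import torch
from torchvision.ops import roi_pool

backbone = LowResBackbone()   # sketch defined after the fig. 2 walkthrough
head = DetectionHead()        # sketch defined after the fully-connected-layer step

image = torch.randn(1, 1, 32, 160)   # low-resolution input, e.g. the "46982" image
feat = backbone(image)               # 1 x 512 x 16 x 80 convolutional feature map

# Stand-in for the region candidate network: boxes given as (batch_idx, x1, y1, x2, y2)
# in input-image coordinates; spatial_scale=0.5 maps them onto the half-size feature map.
proposals = torch.tensor([[0.0, 10.0, 2.0, 40.0, 30.0],
                          [0.0, 45.0, 2.0, 75.0, 30.0]])
rois = roi_pool(feat, proposals, output_size=(7, 7), spatial_scale=0.5)  # fixed 7x7 "detection frames"

coords, probs = head(rois)   # coordinates and classification probability scores per frame
```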
According to the target detection method based on a low-resolution image provided by the embodiment of the present invention, feature extraction is performed on the low-resolution image to obtain a corresponding feature map, the feature map is processed to obtain a corresponding detection frame, the detection frame is processed to obtain the coordinates and classification probability score of the detection frame, and finally the target to be detected is detected and classified according to the coordinates and classification probability score of the detection frame. In this way, a feature map with deep semantics can be extracted without enlarging the low-resolution image, so that detection time is effectively reduced without losing detection accuracy.
Corresponding to the embodiment, the invention further provides a target detection system based on the low-resolution image.
As shown in fig. 8, the target detection system based on a low-resolution image according to the embodiment of the present invention includes an acquisition module 10, a feature extraction module 20, a first processing module 30, a second processing module 40, and a detection module 50. The acquisition module 10 is configured to acquire a low-resolution image of a target to be detected; the feature extraction module 20 is configured to perform feature extraction on the low-resolution image to obtain a corresponding feature map; the first processing module 30 is configured to process the feature map to obtain a corresponding detection frame; the second processing module 40 is configured to process the detection frame to obtain the coordinates and classification probability score of the detection frame; and the detection module 50 is configured to detect and classify the target to be detected according to the coordinates and classification probability score of the detection frame.
In an embodiment of the present invention, the acquisition module 10 may be a camera, which may be configured to acquire image information of the target to be detected, and the image information may include a low-resolution image of the target to be detected.
In one embodiment of the present invention, the feature extraction module 20 may be a low-resolution applicable convolutional neural network. Specifically, as shown in fig. 2, the feature extraction module 20, i.e. the low-resolution applicable convolutional neural network, may include the first stage, the second stage, and the third stage of a residual network. For example, a ResNet-50 residual network may be used, with the fourth and fifth stages (stage 4 and stage 5) removed and the first stage s1, second stage s2, and third stage s3 (stage 1, stage 2, and stage 3) retained as the network body. By removing the fourth and fifth stages of the residual network, the detection speed can be improved while the detection accuracy is maintained.
In addition, as shown in fig. 2, the feature extraction module 20, i.e. the low-resolution applicable convolutional neural network, further includes: a first upsampling u1 (up-Sampling 1) connected to the first stage s1 (stage 1) of the residual network; a first convolution layer c1 (conv 1) connected to the first upsampling u1; a second upsampling u2 (up-Sampling 2) connected to the second stage s2 (stage 2) of the residual network; a second convolution layer c2 (conv 2) connected to the second upsampling u2; and a third upsampling u3 (up-Sampling 3) connected to the third stage s3 (stage 3) of the residual network. The feature map p4 is formed by element-by-element addition of the outputs of the first convolution layer c1, the second convolution layer c2, and the third upsampling u3. Specifically, a highway mechanism may be used to add the outputs of the first convolution layer c1 and the second convolution layer c2 element by element, and the result is then added to the output of the third upsampling u3; the length and width of the resulting feature map are half of the length and width of the original low-resolution image, respectively. Through the upsampling and highway mechanisms, a feature map with deep semantics can be obtained without enlarging the low-resolution image.
Based on the above structure, the low-resolution applicable convolutional neural network according to the embodiment of the present invention can be formed. Its operation process is described below with reference to fig. 2.
As shown in fig. 2, the input low-resolution image, for example a 160 × 32 × 1 low-resolution image, may first pass through the first stage s1 (stage 1) to obtain a 40 × 8 × 64 image. The 40 × 8 × 64 image may then be input into the first upsampling u1 (up-Sampling 1), for example 2 × 2 upsampling, to obtain an 80 × 16 × 64 image (x1). The image x1 may be input into the first convolution layer c1 (conv 1), for example a 3 × 3 convolution layer, to obtain an 80 × 16 × 512 first feature map p1. The image x1 may also be input into the second stage s2 (stage 2), which yields a 40 × 8 × 256 image; this in turn may be input into the second upsampling u2 (up-Sampling 2), for example 2 × 2 upsampling, to obtain an 80 × 16 × 256 image (x2). The image x2 may be input into the second convolution layer c2 (conv 2), for example a 3 × 3 convolution layer, to obtain an 80 × 16 × 512 second feature map p2. The image x2 may also be input into the third stage s3 (stage 3), which yields a 40 × 8 × 512 image; this in turn may be input into the third upsampling u3 (up-Sampling 3), for example 2 × 2 upsampling, to obtain an 80 × 16 × 512 image (x3).
Further, as shown in fig. 2, the output of the first convolution layer, namely the 80 × 16 × 512 first feature map p1, and the output of the second convolution layer, namely the 80 × 16 × 512 second feature map p2, may be added element by element using the highway mechanism to form an 80 × 16 × 512 element-wise sum map p3. The map p3 may then be added element by element to the output of the third upsampling, namely the 80 × 16 × 512 image x3, to form the 80 × 16 × 512 feature map p4. The length and width of the final feature map are half of the length and width of the input low-resolution image, respectively, so a feature map with deep semantics can be extracted without enlarging the low-resolution image.
In one embodiment of the present invention, the first processing module 30 may include a region candidate network and a region of interest pooling layer. Specifically, the first processing module 30 may process the feature map with the region candidate network to obtain candidate frames of different categories, and may process the feature map and the candidate frames of different categories with the region of interest pooling layer to obtain corresponding detection frames. A first non-maximum suppression algorithm and a second non-maximum suppression algorithm may be provided in the region of interest pooling layer.
Specifically, as shown in fig. 3, the first non-maximum suppression algorithm is a non-classified non-maximum suppression (NC-NMS) algorithm. The NC-NMS algorithm may be used to suppress candidate frames of different categories of the target to be detected as if they were of the same category; for example, each category candidate frame in the category set C to which the target to be detected belongs may be suppressed as the same category, so that only the candidate frame with the highest prediction score among the candidate frames of different categories is retained.
Specifically, as shown in fig. 4, the second non-maximum suppression algorithm is a top-N score selection non-maximum suppression algorithm, which suppresses candidate frames of different categories according to the length of the target to be detected; for example, the candidate frames of different categories may be suppressed according to the known text length N of the target to be detected.
The effect of the first non-maximum suppression algorithm (the NC-NMS algorithm) and the second non-maximum suppression algorithm (the top-N score selection algorithm) according to the embodiment of the present invention is described below, taking the low-resolution handwritten images shown in fig. 5(a) and fig. 6(a) as examples.
In practice, strokes produced by different writing styles may be connected. In the digit string "2154" shown in fig. 5(a), for example, a stroke resembling "1" exists within the digit "4". When the prior-art non-maximum suppression algorithm is used, the result shown in fig. 5(b) is obtained: redundant detection frames appear, for example the detection frames "2, 1, 5, 4, 1", where the last "1" is redundant. When the first non-maximum suppression algorithm of the embodiment of the present invention, namely the NC-NMS algorithm, is used, the result shown in fig. 5(c) is obtained, and the detection frames of the corresponding digits, namely "2, 1, 5, 4", are obtained accurately. Comparing fig. 5(b) and fig. 5(c), the NC-NMS algorithm effectively suppresses the redundant detection frame of category "1" and provides accurate detection frames for subsequent detection, so the detection accuracy can be improved.
In addition, in practice the length of the target to be detected, for example a handwritten digit string, is often known, so the length can be introduced into the second non-maximum suppression algorithm, namely the top-N score selection algorithm, to avoid over-detection. When the prior-art non-maximum suppression algorithm is used to process the digit string "70889" shown in fig. 6(a), the result shown in fig. 6(b) is obtained: three detection frames for the digit "8" are detected, one of which is redundant. When the second non-maximum suppression algorithm of the embodiment of the present invention, namely the top-N score selection algorithm, is used, the result shown in fig. 6(c) is obtained. Comparing fig. 6(b) and fig. 6(c), the top-N score selection algorithm keeps only the N highest-scoring detection frames when N is known, skipping the usual traversal, and thus prevents over-detection.
In one embodiment of the present invention, the second processing module 40 may include a first fully-connected layer and a second fully-connected layer. Specifically, the second processing module 40 may process the detection frame with the first fully-connected layer to obtain the coordinates of the detection frame, and may process the detection frame with the second fully-connected layer to obtain the classification probability score of the detection frame.
More specifically, as shown in fig. 7, the first fully-connected layer and the second fully-connected layer may be arranged in parallel, and a Softmax function is provided after the second fully-connected layer to obtain the corresponding classification probability scores of the detection frames.
The working process of the low-resolution image-based target detection system according to the embodiment of the present invention will be specifically described with reference to fig. 7 by taking the low-resolution image of the handwritten digit string as an example.
As shown in fig. 7, a low-resolution image of the target to be detected, for example an image of "46982", may be input into the low-resolution target detection model, namely the low-resolution applicable convolutional neural network, to obtain a convolutional feature map of the low-resolution image (the convolutional feature map is half the size of the low-resolution image). The convolutional feature map may then be input into the region candidate network to obtain digit candidate frames of arbitrary shape. Further, the convolutional feature map and the digit candidate frames may be input into the region of interest pooling layer, which converts the digit candidate frames into fixed-size detection frames, for example 7 × 7 detection frames. Finally, the detection frames pass through the parallel first fully-connected layer and second fully-connected layer to obtain the coordinates and classification probability scores of the detection frames, so that the target to be detected can be detected and classified.
According to the target detection system based on a low-resolution image provided by the embodiment of the present invention, the feature extraction module performs feature extraction on the low-resolution image to obtain a corresponding feature map, the first processing module processes the feature map to obtain a corresponding detection frame, the second processing module processes the detection frame to obtain the coordinates and classification probability score of the detection frame, and finally the detection module detects and classifies the target to be detected according to the coordinates and classification probability score of the detection frame. In this way, a feature map with deep semantics can be extracted without enlarging the low-resolution image, so that detection time is effectively reduced without losing detection accuracy.
The invention further provides a computer device corresponding to the embodiment.
The computer device of the embodiment of the invention comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, and when the processor executes the computer program, the target detection method based on the low-resolution image according to any one of the embodiments is realized.
According to the computer device provided by the embodiment of the invention, when the computer program is executed by the processor, the target detection method based on the low-resolution image in any one of the embodiments is realized, so that the feature map with deep semantics can be extracted without amplifying the low-resolution image, and the detection time can be effectively reduced on the premise of not losing the detection precision.
The invention also provides a non-transitory computer readable storage medium corresponding to the above embodiment.
A non-transitory computer-readable storage medium of an embodiment of the present invention has stored thereon a computer program that, when executed by a processor, implements a low resolution image-based object detection method according to any one of the above-described embodiments.
According to the non-transitory computer-readable storage medium of the embodiment of the present invention, a computer program is stored thereon, and when the computer program is executed by a processor, the method for detecting a target based on a low resolution image according to any of the above embodiments is implemented, so that a feature map having deep semantics can be extracted without enlarging the low resolution image, and thus the detection time can be effectively reduced without losing the detection accuracy.
In the description of the present invention, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. The meaning of "plurality" is two or more unless specifically limited otherwise.
In the present invention, unless otherwise expressly stated or limited, the terms "mounted," "connected," "secured," and the like are to be construed broadly and can, for example, be fixedly connected, detachably connected, or integrally formed; can be mechanically or electrically connected; either directly or indirectly through intervening media, either internally or in any other relationship. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
In the present invention, unless otherwise expressly stated or limited, the first feature "on" or "under" the second feature may be directly contacting the first and second features or indirectly contacting the first and second features through an intermediate. Also, a first feature "on," "over," and "above" a second feature may be directly or diagonally above the second feature, or may simply indicate that the first feature is at a higher level than the second feature. A first feature being "under," "below," and "beneath" a second feature may be directly under or obliquely under the first feature, or may simply mean that the first feature is at a lesser elevation than the second feature.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (7)

1. A target detection method based on a low-resolution image is characterized by comprising the following steps:
acquiring a low-resolution image of a target to be detected;
performing feature extraction on the low-resolution image to obtain a corresponding feature map, specifically performing feature extraction on the low-resolution image by using a low-resolution applicable convolutional neural network to obtain the corresponding feature map, where the low-resolution applicable convolutional neural network includes a first stage, a second stage, and a third stage of a residual network; in addition, the low-resolution applicable convolutional neural network further includes:
a first upsampling, wherein an image input end of the first upsampling is connected to an image output end of the first stage of the residual network, and an image output end of the first upsampling is connected to an image input end of the second stage of the residual network; a first convolution layer, wherein an image input end of the first convolution layer is connected to the image output end of the first upsampling; a second upsampling, wherein an image input end of the second upsampling is connected to an image output end of the second stage of the residual network, and an image output end of the second upsampling is connected to an image input end of the third stage of the residual network; a second convolution layer, wherein an image input end of the second convolution layer is connected to the image output end of the second upsampling; and a third upsampling, wherein an image input end of the third upsampling is connected to an image output end of the third stage of the residual network;
wherein the feature map is formed by element-by-element addition of outputs of the first convolution layer, the second convolution layer, and the third upsampling, and a length and a width of the feature map are half of a length and a width of the low-resolution image, respectively;
processing the feature map to obtain a corresponding detection frame;
processing the detection frame to obtain coordinates and a classification probability score of the detection frame;
and detecting and classifying the target to be detected according to the coordinates and the classification probability score of the detection frame.
2. The method for detecting the target based on the low-resolution image according to claim 1, wherein processing the feature map to obtain the corresponding detection frame comprises the following steps:
processing the feature map by using a region candidate network to obtain different types of candidate frames;
and processing the feature map and the different types of candidate frames by using a region of interest pooling layer to obtain corresponding detection frames.
3. The low resolution image based object detection method of claim 2, wherein a first non-maximum suppression algorithm and a second non-maximum suppression algorithm are provided within the region of interest pooling layer, wherein,
the first non-maximum suppression algorithm is an NC-NMS non-maximum suppression algorithm, and the NC-NMS non-maximum suppression algorithm is used for suppressing different types of candidate frames of the target to be detected as the same type so as to reserve the candidate frame with the largest prediction score in the different types of candidate frames;
the second non-maximum suppression algorithm is a top-N score selection non-maximum suppression algorithm which is used for suppressing the candidate frames of different categories according to the length of the target to be detected.
4. The method for detecting the target based on the low-resolution image as claimed in claim 3, wherein processing the detection frame to obtain the coordinates and the classification probability score of the detection frame comprises the following steps:
processing the detection frame by using a first fully-connected layer to obtain the coordinates of the detection frame;
processing the detection frame by using a second fully-connected layer to obtain the classification probability score of the detection frame;
wherein the first fully-connected layer and the second fully-connected layer are arranged in parallel.
5. A low resolution image based object detection system, comprising:
the acquisition module is used for acquiring a low-resolution image of a target to be detected;
a feature extraction module, configured to perform feature extraction on the low-resolution image to obtain a corresponding feature map, where the feature extraction module specifically performs feature extraction on the low-resolution image by using a low-resolution applicable convolutional neural network to obtain a corresponding feature map, where the low-resolution applicable convolutional neural network includes a first stage, a second stage, and a third stage of a residual error network, and in addition, the low-resolution applicable convolutional neural network further includes:
a first upsampling layer, wherein an image input end of the first upsampling layer is connected with an image output end of the first stage of the residual network, and an image output end of the first upsampling layer is connected with an image input end of the second stage of the residual network; a first convolution layer, wherein an image input end of the first convolution layer is connected with the image output end of the first upsampling layer; a second upsampling layer, wherein an image input end of the second upsampling layer is connected with an image output end of the second stage of the residual network, and an image output end of the second upsampling layer is connected with an image input end of the third stage of the residual network; a second convolution layer, wherein an image input end of the second convolution layer is connected with the image output end of the second upsampling layer; and a third upsampling layer, wherein an image input end of the third upsampling layer is connected with an image output end of the third stage of the residual network;
wherein the feature map is formed by element-wise addition of the outputs of the first convolution layer, the second convolution layer and the third upsampling layer, and the length and width of the feature map are respectively half of the length and width of the low-resolution image;
the first processing module is used for processing the feature map to obtain a corresponding detection frame;
the second processing module is used for processing the detection frame to obtain the coordinates and the classification probability score of the detection frame;
and the detection module is used for detecting and classifying the target to be detected according to the coordinates and the classification probability score of the detection frame.
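To show how the five modules of claim 5 could fit together, here is a minimal composition sketch; the class name LowResDetectionSystem and the callable interfaces are hypothetical and only mirror the data flow recited in the claim.

class LowResDetectionSystem:
    # Each attribute is a callable stand-in for the corresponding module.
    def __init__(self, acquire, extract_features, propose_frames, score_frames, decide):
        self.acquire = acquire                    # acquisition module
        self.extract_features = extract_features  # feature extraction module
        self.propose_frames = propose_frames      # first processing module
        self.score_frames = score_frames          # second processing module
        self.decide = decide                      # detection module

    def run(self, source):
        image = self.acquire(source)                # low-resolution image
        feature_map = self.extract_features(image)  # half-resolution feature map
        frame = self.propose_frames(feature_map)    # detection frame
        coords, score = self.score_frames(frame)    # coordinates and class score
        return self.decide(coords, score)           # detection and classification result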
6. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the target detection method based on a low-resolution image according to any one of claims 1 to 4.
7. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the target detection method based on a low-resolution image according to any one of claims 1 to 4.
CN202110964770.8A 2021-08-23 2021-08-23 Target detection method and system based on low-resolution image Active CN113420840B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110964770.8A CN113420840B (en) 2021-08-23 2021-08-23 Target detection method and system based on low-resolution image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110964770.8A CN113420840B (en) 2021-08-23 2021-08-23 Target detection method and system based on low-resolution image

Publications (2)

Publication Number Publication Date
CN113420840A CN113420840A (en) 2021-09-21
CN113420840B true CN113420840B (en) 2021-12-21

Family

ID=77719078

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110964770.8A Active CN113420840B (en) 2021-08-23 2021-08-23 Target detection method and system based on low-resolution image

Country Status (1)

Country Link
CN (1) CN113420840B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107169421A (en) * 2017-04-20 2017-09-15 华南理工大学 A kind of car steering scene objects detection method based on depth convolutional neural networks
CN107918776A (en) * 2017-11-01 2018-04-17 中国科学院深圳先进技术研究院 A kind of plan for land method, system and electronic equipment based on machine vision

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107657586A (en) * 2017-10-13 2018-02-02 深圳市唯特视科技有限公司 A kind of single photo super-resolution Enhancement Method based on depth residual error network
US10671855B2 (en) * 2018-04-10 2020-06-02 Adobe Inc. Video object segmentation by reference-guided mask propagation
CN111127493A (en) * 2019-11-12 2020-05-08 中国矿业大学 Remote sensing image semantic segmentation method based on attention multi-scale feature fusion
CN112215179B (en) * 2020-10-19 2024-04-19 平安国际智慧城市科技股份有限公司 In-vehicle face recognition method, device, apparatus and storage medium


Also Published As

Publication number Publication date
CN113420840A (en) 2021-09-21


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant