CN113011415A - Improved target detection method and system based on Grid R-CNN model - Google Patents

Improved target detection method and system based on Grid R-CNN model

Info

Publication number
CN113011415A
CN113011415A (application CN202011343605.2A)
Authority
CN
China
Prior art keywords
anchor frame
grid
anchor
image
cnn model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011343605.2A
Other languages
Chinese (zh)
Inventor
刘嵩
周梓涵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qilu University of Technology
Original Assignee
Qilu University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qilu University of Technology filed Critical Qilu University of Technology
Priority to CN202011343605.2A priority Critical patent/CN113011415A/en
Publication of CN113011415A publication Critical patent/CN113011415A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/24Aligning, centring, orientation detection or correction of the image
    • G06V10/245Aligning, centring, orientation detection or correction of the image by locating a pattern; Special marks for positioning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an improved target detection method and system based on the Grid R-CNN model, comprising the following steps: acquiring a target image to be detected, computing the complexity of the image, and then dispatching the image to different feature extraction networks to extract feature maps; for the extracted feature map, determining the position of the anchor frame using grid-guided localization; dynamically determining the shape of the anchor frame by combining shape prediction with the anchor frame position, the position and shape together defining the anchor frame of the anchored branch; processing the anchored branch and the FSAF branch in parallel to obtain a more accurate anchor frame; and, based on the obtained anchor frame, classifying and localizing the target with a cascade detector. The beneficial effects of the invention are: by measuring the complexity of the image and choosing the number of layers of the feature extraction network according to the threshold range into which the complexity falls, the accuracy of feature extraction is improved; position prediction is combined with shape prediction, and the accuracy of the anchor frame is further improved by the continuously increasing thresholds of the cascaded detector.

Description

Improved target detection method and system based on Grid R-CNN model
Technical Field
The invention relates to the technical field of target detection, in particular to a Grid R-CNN (Grid area convolutional neural network) model-based improved target detection method and system.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
With the increase in the computing power of computers, artificial intelligence continues to produce new results in many fields. Deep learning, one of the most important artificial intelligence techniques, is among the latest trends in machine learning and artificial intelligence research.
Object detection is one of the basic tasks in the field of computer vision; its purpose is to determine the position of a target object in an image or video and the category of that target. With advances in machine learning theory and technology, the detection of specific classes of targets has achieved good results over the past decade. However, real environments are complex and varied, and many problems arise when detecting targets in them. With the introduction of deep learning, research on target detection is no longer limited to a single class of targets but has extended to multi-class detection.
With the proposal of deep neural networks, target detection has developed further, and the methods fall broadly into two kinds: anchor-based and anchor-free target detection. Anchor-based methods preset a large number of anchors on the image for prediction and then refine them to obtain an accurate detection result. Anchor-free methods use the center point or a region of the object to determine the location of the target.
As pictures grow more complex and carry more information, target detection becomes more and more difficult, and accurate localization and classification of the target is the core of the task.
Existing target detection methods achieve end-to-end training and improve detection performance. However, they generally generate a large number of anchor frames, and because the size of an anchor frame is fixed while the size of the target object is not, objects of extreme size cannot be detected accurately.
Grid R-CNN is one of the better target detection algorithms at present. Although Grid R-CNN makes some improvement in determining the shape of the anchor frame, the anchor frames it generates still cannot mark the target object accurately. Moreover, Grid R-CNN detects with a single IoU threshold, so the anchor frame cannot be localized more precisely.
Disclosure of Invention
In order to solve the above problems, the invention provides an improved target detection method and system based on the Grid R-CNN model. In the method, the shape prediction of GA-RPN replaces the anchor frame prediction of Grid R-CNN, so a more accurate anchor frame shape can be obtained. Meanwhile, in order to solve the problem of selecting among overlapping anchor frames, an FSAF branch is adopted and processed in parallel with the anchored branch of Grid R-CNN, with suppression applied so that a suitable anchor frame is selected.
In the implementation mode of the method, the following technical scheme is adopted:
a Grid R-CNN model-based improved target detection method comprises the following steps:
acquiring a target image to be detected, performing image complexity processing on the image, and then distributing the image to different feature extraction networks to extract feature maps;
for the extracted feature map, determining the position of an anchor frame by using grid-guided positioning;
dynamically determining the shape of the anchor frame by combining shape prediction based on the position of the anchor frame, and determining the anchor frame with the anchor branch by the position and the shape;
processing the anchor frame with the anchor branches and the FSAF branches in parallel to obtain a more accurate anchor frame;
and based on the obtained anchor frame, classifying and positioning the target by utilizing a cascade detector.
In other embodiments, the following technical solutions are adopted:
an improved target detection system based on a Grid R-CNN model comprises:
a module for obtaining a target image to be detected, processing the image complexity of the image, and then distributing the image to different feature extraction networks for extracting feature maps;
means for determining a location of an anchor box using grid-guided positioning for the extracted feature map;
means for dynamically determining an anchor frame with anchor branches in conjunction with shape prediction based on anchor frame position;
a module for processing the anchor frame with the anchor branch and the FSAF branch in parallel to obtain a more accurate anchor frame;
and a module for classifying and positioning the target by using the cascade detector based on the obtained anchor frame.
In other embodiments, the following technical solutions are adopted:
a terminal device comprising a processor and a computer-readable storage medium, the processor being configured to implement instructions; the computer-readable storage medium is configured to store a plurality of instructions adapted to be loaded by the processor and to perform the above-described improved target detection method based on the Grid R-CNN model.
In other embodiments, the following technical solutions are adopted:
a computer-readable storage medium, wherein a plurality of instructions are stored, the instructions being adapted to be loaded by a processor of a terminal device and to execute the above-mentioned Grid R-CNN model-based improved object detection method.
Compared with the prior art, the invention has the beneficial effects that:
(1) By measuring the complexity of the image and choosing the number of layers of the feature extraction network according to the threshold range into which the complexity falls, the accuracy of feature extraction can be improved.
(2) The GA-RPN shape prediction method replaces the Grid R-CNN anchor frame prediction. GA-RPN shape prediction does not change the position of the anchor frame and preserves the accuracy of the anchor frame position to the greatest extent; an anchor frame with dynamic size can better predict extremely tall or wide objects, so a more accurate anchor frame shape is obtained.
(3) The invention encodes instances in the anchor-free manner of the FSAF branch and processes its output prediction in parallel with the anchored branch; the two branches work together to select the best feature automatically.
(4) The Cascade R-CNN cascade detection method is used: the output of the previous stage serves as the input of the next stage, and a better detection result is obtained through continuously increasing thresholds.
Additional features and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
FIG. 1 is a flow chart of an improved target detection method based on a Grid R-CNN model in the embodiment of the present invention.
FIG. 2 is a diagram of a cascaded detector method in an embodiment of the invention.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
Example one
In one or more embodiments, an improved target detection method based on a Grid R-CNN model is disclosed, as shown in fig. 1, including the following steps:
(1) acquiring a target image to be detected, performing image complexity processing on the image, and then distributing the image to different feature extraction networks to extract feature maps;
specifically, firstly, the image complexity of the picture to be detected is calculated, and the feature map complexity calculation method is as follows:
Figure BDA0002799257850000051
where C represents the complexity of the image texture, z represents the gray scale, p (z)i) A histogram representing z, L representing the number of different grey levels, m representing the mean of z, is i representing the number of all individuals occupied.
Feature extraction gathers, from the input image, all information that may indicate the presence of a target object. Because images differ in complexity, images of different complexity require feature extraction networks of different scales.
In this embodiment, the number of layers of the feature extraction network is determined by the threshold range into which the image complexity falls: pictures with higher complexity enter a network with more layers, and pictures with lower complexity enter a network with fewer layers.
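As an illustration of this routing step, a minimal Python sketch follows. The complexity formula appears only as an image in the original filing, so the histogram-variance measure, the threshold values and the backbone names used here are assumptions for illustration, not values from the patent.

```python
import numpy as np

def image_complexity(gray, levels=256):
    # Assumed histogram-based texture statistic: normalized variance of the
    # gray-level histogram p(z_i) around its mean m (a stand-in for the
    # patent's own formula, which is not reproduced in the text).
    hist, _ = np.histogram(gray, bins=levels, range=(0, levels))
    p = hist / max(hist.sum(), 1)          # p(z_i): normalized histogram
    z = np.arange(levels)                  # gray levels z_i
    m = float(np.sum(z * p))               # mean gray level m
    var = float(np.sum(((z - m) ** 2) * p))
    return var / ((levels - 1) ** 2)       # scaled roughly into [0, 1]

def select_backbone(complexity, thresholds=(0.02, 0.08)):
    # Route more complex images to deeper feature extraction networks.
    # Threshold values and backbone depths are illustrative only.
    if complexity < thresholds[0]:
        return "resnet18"
    if complexity < thresholds[1]:
        return "resnet50"
    return "resnet101"
```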
(2) The location of the anchor box is determined using grid-guided positioning based on the extracted feature map.
In this embodiment, the anchor frame must be marked as accurately as possible within the extracted candidate regions. An anchor frame is a bounding box generated with each pixel as its center, in various aspect ratios and sizes; it is determined by the coordinate of a pixel point together with a width and a height.
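The conventional anchor frame construction described above can be pictured with the short sketch below; the scale and aspect-ratio values are arbitrary example values, not parameters taken from the patent.

```python
def anchors_at(cx, cy, scales=(32, 64, 128), ratios=(0.5, 1.0, 2.0)):
    # One (x1, y1, x2, y2) box per scale/aspect-ratio pair, centred at (cx, cy).
    boxes = []
    for s in scales:
        for r in ratios:
            w = s * (r ** 0.5)   # width grows with the ratio
            h = s / (r ** 0.5)   # height shrinks so that w * h == s * s
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes
```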
First, we use the Grid-guided positioning proposed in Grid R-CNN to determine the location of the anchor frame; the method specifically comprises the following steps:
I_x = P_x + (H_x / w_o) · w_p
I_y = P_y + (H_y / h_o) · h_p
where (H_x, H_y) is the pixel coordinate on the heatmap output by the grid prediction branch, (I_x, I_y) is the corresponding coordinate on the original image, (P_x, P_y) is the coordinate of the upper-left corner of the candidate region in the input image, w_p and h_p are the width and height of the candidate region, and w_o and h_o are the width and height of the output heatmap.
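A minimal sketch of this coordinate mapping, using the symbols defined above (heatmap point (H_x, H_y), candidate-region corner (P_x, P_y), candidate-region size w_p × h_p, heatmap size w_o × h_o):

```python
def heatmap_to_image(hx, hy, px, py, wp, hp, wo, ho):
    # Project a grid point predicted on the heatmap back onto the input image.
    ix = px + hx * wp / wo
    iy = py + hy * hp / ho
    return ix, iy
```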
(3) Dynamically determining the shape of the anchor frame by combining shape prediction based on the position of the anchor frame, and determining the anchor frame with the anchor branch by the position and the shape;
The shape of the anchor frame is dynamically determined using the GA-RPN shape prediction method. The shape is not a predefined, fixed-size anchor frame but is dynamically variable, giving a dynamic anchor frame at the determined position. Anchor frames with dynamic sizes yield better predictions. The shape at each location is determined with the shape prediction method of GA-RPN.
The GA-RPN anchor frame shape prediction method does not change the position of the anchor frame, and can ensure the accuracy of the position of the anchor frame to the maximum extent.
In the embodiment, the GA-RPN shape prediction method is used for replacing the Grid R-CNN anchor frame prediction, so that more accurate anchor frame shapes can be obtained.
Since the method of determining the anchor frame of Grid R-CNN is not sufficient to accurately locate the boundary of the target object, the present embodiment proposes to determine the shape of the anchor frame using shape prediction of GA-RPN. The shape prediction is not a predefined fixed-size anchor frame, but is dynamically variable, with a dynamic anchor frame at a determined position. An anchor frame with dynamic dimensions may better predict objects that are extremely high or wide.
The GA-RPN anchor frame shape prediction does not change the position of the anchor frame and preserves the accuracy of the anchor frame position to the greatest extent, which differs from traditional bounding-box regression. Shape prediction predicts the height h and width w of the anchor frame; because h and w span a large numerical range, the two values are difficult to predict directly, so this embodiment transforms h and w. The calculation formula is as follows:
w = σ · s · e^(dw),    h = σ · s · e^(dh)
where s is the stride, σ is an empirical scale factor, and dw and dh are the mappings of (w, h). Through this formula a value range of roughly [0, 1000] is compressed into [-1, 1], which shrinks the output space and makes the learning target easier.
In the concrete design, a sub-network N_s performs the shape prediction; it is a 1 × 1 convolutional layer with two channels that outputs the dw and dh values.
Because the size is dynamic, shape prediction achieves a higher recall rate, and extremely wide or extremely tall objects can be predicted better.
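The following PyTorch-style sketch illustrates the two pieces described above: the sub-network N_s (a two-channel 1 × 1 convolution predicting dw and dh at each location) and the decoding w = σ·s·e^(dw), h = σ·s·e^(dh). The input channel count and the value σ = 8 are assumptions made here for illustration.

```python
import math
import torch
import torch.nn as nn

class ShapeBranch(nn.Module):
    """Sketch of the N_s shape sub-network: a 1x1 convolution with two
    output channels giving (dw, dh) at every feature-map location."""
    def __init__(self, in_channels=256):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 2, kernel_size=1)

    def forward(self, feat):
        return self.conv(feat)  # (N, 2, H, W): channel 0 -> dw, channel 1 -> dh

def decode_shape(dw, dh, stride, sigma=8.0):
    # w = sigma * s * e^dw, h = sigma * s * e^dh (sigma = 8 assumed here)
    return sigma * stride * math.exp(dw), sigma * stride * math.exp(dh)
```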
(4) The anchor frame of the anchored branch determined above and the FSAF branch are processed in parallel, and a more accurate anchor frame is selected;
the FSAF anchorless branch and the anchored branch guided and positioned by the anchor frame are processed in parallel, and two anchorless convolution layers (conv) are additionally added on the basis of the FSAF anchorless branch and the branch based on the anchor frame, so that the anchorless branch and the anchored branch work jointly, and the optimal anchor frame is selected.
(5) And classifying and positioning the target by utilizing the cascade detector according to the determined more accurate anchor frame.
Using the Cascade R-CNN cascade method, the detector is trained as a cascade of several detector modules with IoU thresholds that increase over three stages; the anchor frame output by the detector at the previous stage serves as the pooling input of the next stage, and a better detection result is obtained through the continuously increasing thresholds.
As shown in FIG. 2, the first stage is the backbone network processing, after which an extraction head network (N_0) generates a preliminary prediction. The second stage is the candidate region extraction sub-network (N_1), which processes the prediction result (Bbox_0) generated by the head network (N_0). The third and fourth stages repeat the second stage but replace the region-of-interest sub-network with N_2 and N_3, respectively. Equation (1) is the loss function of the cascaded detector. The main purpose of the cascade is to find, by adjusting the bounding box, positive samples with a higher IoU for training the next stage. The performance of the detector improves with the ever-increasing IoU threshold.
L(x^t, g) = L_cls(h_t(x^t), y^t) + λ[y^t ≥ 1] · L_loc(f_t(x^t, b^t), g)    (1)
where b^t = f_{t-1}(x^{t-1}, b^{t-1}), g is the ground-truth object of x^t, λ is a trade-off coefficient, [·] is the indicator function, and y^t is the label of x^t.
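The cascade can be pictured with the sketch below: each stage re-pools features with the boxes produced by the previous stage and refines them with a head trained at a higher IoU threshold. The per-stage head interface and the threshold schedule (0.5, 0.6, 0.7) are assumptions for illustration; the patent only specifies three stages with increasing thresholds.

```python
def run_cascade(features, proposals, heads, iou_thresholds=(0.5, 0.6, 0.7)):
    # heads: one detector head per stage; each refines the boxes it receives,
    # so that b_t = f_{t-1}(x_{t-1}, b_{t-1}) as in Equation (1).
    boxes = proposals
    scores = None
    for head, thr in zip(heads, iou_thresholds):
        scores, boxes = head(features, boxes, thr)  # stage trained with IoU >= thr
        # the refined boxes become the pooling input of the next stage
    return scores, boxes  # final classification and localization
```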
Example two
In one or more embodiments, disclosed is an improved object detection system based on a Grid R-CNN model, comprising:
a module for obtaining a target image to be detected, processing the image complexity of the image, and then distributing the image to different feature extraction networks for extracting feature maps;
means for determining a location of an anchor box using grid-guided positioning for the extracted feature map;
means for dynamically determining an anchor frame with anchor branches in conjunction with shape prediction based on anchor frame position;
a module for processing the anchor frame with the anchor branch and the FSAF branch in parallel to obtain a more accurate anchor frame;
and a module for classifying and positioning the target by using the cascade detector based on the obtained anchor frame.
It should be noted that the specific working process of the above module is the same as the method disclosed in the first embodiment, and is not described again.
EXAMPLE III
In one or more embodiments, a terminal device is disclosed. It includes a server comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, it implements the improved target detection method based on the Grid R-CNN model of Embodiment One. For brevity, details are not repeated here.
It should be understood that in this embodiment, the processor may be a central processing unit CPU, and the processor may also be other general purpose processors, digital signal processors DSP, application specific integrated circuits ASIC, off-the-shelf programmable gate arrays FPGA or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and so on. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory may include both read-only memory and random access memory, and may provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or instructions in the form of software.
The improved target detection method based on the Grid R-CNN model in the first embodiment may be implemented directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software modules may reside in RAM, flash memory, ROM, PROM or EPROM, registers, or other storage media well known in the art. The storage medium is located in the memory; the processor reads the information in the memory and completes the steps of the above method in combination with its hardware. To avoid repetition, details are not described here.
Those of ordinary skill in the art will appreciate that the various illustrative elements, i.e., algorithm steps, described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Example four
In one or more embodiments, a computer-readable storage medium is disclosed, in which a plurality of instructions are stored, the instructions being adapted to be loaded by a processor of a terminal device and implementing the Grid R-CNN model-based improved object detection method described in the first embodiment.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, they are not intended to limit the scope of the present invention. Those skilled in the art should understand that modifications and variations made on the basis of the technical solution of the present invention without inventive effort remain within its scope.

Claims (9)

1. A Grid R-CNN model-based improved target detection method is characterized by comprising the following steps:
acquiring a target image to be detected, performing image complexity processing on the image, and then distributing the image to different feature extraction networks to extract feature maps;
for the extracted feature map, determining the position of an anchor frame by using grid-guided positioning;
dynamically determining the shape of the anchor frame by combining shape prediction based on the position of the anchor frame, and determining the anchor frame with the anchor branch by the position and the shape;
processing the anchor frame with the anchor branches and the FSAF branches in parallel to obtain a more accurate anchor frame;
and based on the obtained anchor frame, classifying and positioning the target by utilizing a cascade detector.
2. The Grid R-CNN model-based improved target detection method as claimed in claim 1, wherein the image complexity processing is performed on the image, and the specific process comprises:
calculating the complexity of the pictures to be detected, determining the number of network layers for feature extraction according to the threshold range to which the complexity of the pictures belongs, and inputting the pictures with different complexities into feature extraction networks with different numbers of layers.
3. The improved object detection method based on the Grid R-CNN model as claimed in claim 1, wherein the Grid-guided localization is used to determine the position of the anchor frame by the following specific procedures:
and predicting the positions of the predefined grid points by using the full convolution network, and selecting one point with the maximum probability value from the generated grid points as the position location of the prediction anchor frame.
4. The improved target detection method based on the Grid R-CNN model as claimed in claim 1, wherein the anchor frame with anchor branches is dynamically determined based on the anchor frame position in combination with shape prediction, specifically comprising:
dynamically determining the shape of an anchor frame by using a GA-RPN shape prediction method; the size of the anchor frame is dynamically variable.
5. The improved target detection method based on the Grid R-CNN model as claimed in claim 1, wherein processing the anchor frame with the anchor branch and the FSAF branch in parallel specifically comprises:
and additionally adding two non-anchored convolution layers on the basis of the anchored branch, wherein the two non-anchored convolution layers are respectively used for classifying the branch and predicting regression.
6. The improved target detection method based on the Grid R-CNN model as claimed in claim 1, wherein based on the obtained anchor frame, the classification and positioning of the target are performed by using a cascade detector, the specific process is as follows:
by utilizing a Cascade R-CNN Cascade method, each detector model is trained by cascading a plurality of detector modules, IOU thresholds which are continuously improved in three stages are set, an anchor frame output by a detector in the previous stage is used as input of pooling in the next stage, and a better detection effect is obtained through the continuously improved thresholds.
7. An improved target detection system based on a Grid R-CNN model is characterized by comprising:
a module for obtaining a target image to be detected, processing the image complexity of the image, and then distributing the image to different feature extraction networks for extracting feature maps;
means for determining a location of an anchor box using grid-guided positioning for the extracted feature map;
means for dynamically determining an anchor frame with anchor branches in conjunction with shape prediction based on anchor frame position;
a module for processing the anchor frame with the anchor branch and the FSAF branch in parallel to obtain a more accurate anchor frame;
and a module for classifying and positioning the target by using the cascade detector based on the obtained anchor frame.
8. A terminal device comprising a processor and a computer-readable storage medium, the processor being configured to implement instructions; the computer-readable storage medium storing a plurality of instructions adapted to be loaded by the processor and to perform the improved target detection method based on the Grid R-CNN model according to any one of claims 1 to 6.
9. A computer-readable storage medium having stored therein a plurality of instructions, wherein the instructions are adapted to be loaded by a processor of a terminal device and to perform the Grid R-CNN model-based improved object detection method according to any one of claims 1 to 6.
CN202011343605.2A 2020-11-25 2020-11-25 Improved target detection method and system based on Grid R-CNN model Pending CN113011415A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011343605.2A CN113011415A (en) 2020-11-25 2020-11-25 Improved target detection method and system based on Grid R-CNN model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011343605.2A CN113011415A (en) 2020-11-25 2020-11-25 Improved target detection method and system based on Grid R-CNN model

Publications (1)

Publication Number Publication Date
CN113011415A true CN113011415A (en) 2021-06-22

Family

ID=76383213

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011343605.2A Pending CN113011415A (en) 2020-11-25 2020-11-25 Improved target detection method and system based on Grid R-CNN model

Country Status (1)

Country Link
CN (1) CN113011415A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113435302A (en) * 2021-06-23 2021-09-24 中国农业大学 GridR-CNN-based hydroponic lettuce seedling state detection method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200151564A1 (en) * 2018-11-12 2020-05-14 Honda Motor Co., Ltd. System and method for multi-agent reinforcement learning with periodic parameter sharing
CN111178196A (en) * 2019-12-19 2020-05-19 东软集团股份有限公司 Method, device and equipment for cell classification
CN111950612A (en) * 2020-07-30 2020-11-17 中国科学院大学 FPN-based weak and small target detection method for fusion factor

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200151564A1 (en) * 2018-11-12 2020-05-14 Honda Motor Co., Ltd. System and method for multi-agent reinforcement learning with periodic parameter sharing
CN111178196A (en) * 2019-12-19 2020-05-19 东软集团股份有限公司 Method, device and equipment for cell classification
CN111950612A (en) * 2020-07-30 2020-11-17 中国科学院大学 FPN-based weak and small target detection method for fusion factor

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Chenchen Zhu et al.: "Feature Selective Anchor-Free Module for Single-Shot Object Detection", arXiv:1903.00621v1 [cs.CV] *
Jiaqi Wang et al.: "Region Proposal by Guided Anchoring", arXiv:1901.03278v1 [cs.CV] *
Xin Lu et al.: "Grid R-CNN", arXiv:1811.12030v1 [cs.CV] *
Zhaowei Cai et al.: "Cascade R-CNN: Delving into High Quality Object Detection", arXiv:1712.00726v1 [cs.CV] *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113435302A (en) * 2021-06-23 2021-09-24 中国农业大学 GridR-CNN-based hydroponic lettuce seedling state detection method
CN113435302B (en) * 2021-06-23 2023-10-17 中国农业大学 Hydroponic lettuce seedling state detection method based on GridR-CNN

Similar Documents

Publication Publication Date Title
CN110400332B (en) Target detection tracking method and device and computer equipment
CN110796048B (en) Ship target real-time detection method based on deep neural network
CN110781756A (en) Urban road extraction method and device based on remote sensing image
CN111753682B (en) Hoisting area dynamic monitoring method based on target detection algorithm
CN111627050B (en) Training method and device for target tracking model
WO2023193401A1 (en) Point cloud detection model training method and apparatus, electronic device, and storage medium
CN111640089A (en) Defect detection method and device based on feature map center point
CN114897779A (en) Cervical cytology image abnormal area positioning method and device based on fusion attention
CN112419202B (en) Automatic wild animal image recognition system based on big data and deep learning
WO2023116632A1 (en) Video instance segmentation method and apparatus based on spatio-temporal memory information
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN113807361B (en) Neural network, target detection method, neural network training method and related products
CN113095418A (en) Target detection method and system
CN113592060A (en) Neural network optimization method and device
CN115187530A (en) Method, device, terminal and medium for identifying ultrasonic automatic breast full-volume image
CN112348116A (en) Target detection method and device using spatial context and computer equipment
CN111523463A (en) Target tracking method and training method based on matching-regression network
CN115578616A (en) Training method, segmentation method and device of multi-scale object instance segmentation model
CN116486288A (en) Aerial target counting and detecting method based on lightweight density estimation network
CN115187786A (en) Rotation-based CenterNet2 target detection method
CN113554656B (en) Optical remote sensing image example segmentation method and device based on graph neural network
CN113763412A (en) Image processing method and device, electronic equipment and computer readable storage medium
CN113011415A (en) Improved target detection method and system based on Grid R-CNN model
CN113256683A (en) Target tracking method and related equipment
CN111914809A (en) Target object positioning method, image processing method, device and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination