CN111950551A - Target detection method based on convolutional neural network - Google Patents
- Publication number
- CN111950551A CN111950551A CN202010816397.7A CN202010816397A CN111950551A CN 111950551 A CN111950551 A CN 111950551A CN 202010816397 A CN202010816397 A CN 202010816397A CN 111950551 A CN111950551 A CN 111950551A
- Authority
- CN
- China
- Prior art keywords
- feature map
- convolution
- region
- neural network
- convolutional neural
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
Abstract
The invention relates to a target detection method based on a convolutional neural network, comprising the following steps: extracting features with a residual convolutional neural network to obtain layer-by-layer basic feature maps; sequentially fusing the basic feature maps from shallow to deep to obtain a fused feature map; extracting candidate boxes from the fused feature map with a region proposal network to obtain a candidate target region feature map; obtaining a region-of-interest feature map according to the fused feature map and the candidate target region feature map; and obtaining classification scores and bounding box regression from the region-of-interest feature map through full convolution layers. The invention achieves higher detection accuracy for small targets and occluded targets.
Description
Technical Field
The invention relates to the technical field of image information processing, in particular to a target detection method based on a convolutional neural network.
Background
With increasing road traffic pressure, intelligent management and control of road vehicles through computer technology has become a popular research topic. Detecting vehicle targets with road monitoring equipment makes it possible to grasp the vehicle data and driving trajectories of a road network, a prerequisite for optimizing traffic and relieving traffic pressure; vehicle target detection is also the research basis for unmanned driving, vehicle tracking, and vehicle feature recognition.
At present, convolutional neural networks are widely applied to vehicle target detection and are generally divided into single-stage and two-stage detection algorithms: single-stage algorithms are regression-based, while two-stage algorithms first generate candidate regions and then perform classification and refinement. Owing to this structural difference, two-stage algorithms achieve higher detection accuracy but run slower than single-stage algorithms, making them suitable for scenarios with stricter accuracy requirements.
The existing two-stage target detection algorithms have the following problem: occluded targets and small targets present few features, and existing algorithms make insufficient use of shallow position information and context information, so their detection accuracy for small and occluded targets is low.
Disclosure of Invention
The invention aims to provide a target detection method based on a convolutional neural network, which has higher detection precision for small targets and occluded targets.
In order to achieve the purpose, the invention provides the following scheme:
a target detection method based on a convolutional neural network, comprising the following steps:
extracting features with a residual convolutional neural network to obtain layer-by-layer basic feature maps;
sequentially fusing the basic feature maps from shallow to deep to obtain a fused feature map;
extracting candidate boxes from the fused feature map with a region proposal network to obtain a candidate target region feature map;
obtaining a region-of-interest feature map according to the fused feature map and the candidate target region feature map;
and obtaining classification scores and bounding box regression from the region-of-interest feature map through full convolution layers.
Preferably, the basic feature map includes a first feature map, a second feature map, a third feature map, and a fourth feature map.
Preferably, sequentially fusing the basic feature maps from shallow to deep to obtain the fused feature map comprises:
down-sampling the first feature map to obtain a down-sampled feature map;
applying convolutional dimensionality reduction to the second feature map to obtain a reduced feature map, the number of channels of the reduced feature map being the same as that of the down-sampled feature map;
fusing the down-sampled feature map and the reduced feature map to obtain an initial fused feature map; the final fused feature map is obtained in the same way.
Preferably, down-sampling the first feature map to obtain the down-sampled feature map comprises:
down-sampling the first feature map with n parallel dilated-convolution branches, n being a positive integer greater than 1;
and fusing the down-sampled outputs of the dilated-convolution branches to obtain the down-sampled feature map.
Preferably, n is 3, and the dilation rates of the 3 branches are 1, 2 and 3 respectively.
Preferably, extracting candidate boxes from the fused feature map with the region proposal network to obtain the candidate target region feature map comprises:
convolving the fused feature map with a first set convolution kernel to obtain a first convolution feature map;
convolving the first convolution feature map with a second set convolution kernel to obtain a second convolution feature map;
convolving the second convolution feature map with the second set convolution kernel to obtain a third convolution feature map;
and feeding the second convolution feature map and the third convolution feature map into two parallel fully-connected layers respectively, processed with a set anchor box, to obtain the candidate target region feature map.
Preferably, obtaining classification scores and bounding box regression from the region-of-interest feature map through full convolution layers comprises:
obtaining an initial classification score and an initial bounding box regression from the region-of-interest feature map through full convolution layers;
replacing the set anchor box with the initial bounding box regression, executing the subsequent steps in order, and repeating the process m times with m set thresholds to obtain the classification scores and bounding box regression; m is a positive integer greater than or equal to 1.
Preferably, the first set convolution kernel is 3 × 3; the second set convolution kernel is 1 × 1.
Preferably, obtaining the region-of-interest feature map from the fused feature map and the candidate target region feature map comprises:
fusing the fused feature map and the candidate target region feature map with ROI Align to obtain an initial region-of-interest feature map;
enlarging the initial region-of-interest feature map by a set multiple to obtain an enlarged region-of-interest feature map;
extracting global context from the initial region-of-interest feature map based on the enlarged region-of-interest feature map to obtain context information;
and fusing the initial region-of-interest feature map with the context information via ROI Align to obtain the region-of-interest feature map.
Preferably, the residual convolutional neural network is a ResNet-101 network.
According to the specific embodiments provided by the invention, the invention discloses the following technical effects:
The target detection method based on a convolutional neural network comprises: extracting features with a residual convolutional neural network to obtain layer-by-layer basic feature maps; sequentially fusing the basic feature maps from shallow to deep to obtain a fused feature map; extracting candidate boxes from the fused feature map with a region proposal network to obtain a candidate target region feature map; obtaining a region-of-interest feature map according to the fused feature map and the candidate target region feature map; and obtaining classification scores and bounding box regression from the region-of-interest feature map through full convolution layers. The shallow-to-deep fusion and the context extraction make fuller use of shallow position information and context information, so the invention achieves higher detection accuracy for small targets and occluded targets.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. Obviously, the drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a flowchart of a target detection method based on a convolutional neural network according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments derived by those skilled in the art from these embodiments without creative effort fall within the protection scope of the present invention.
The invention aims to provide a target detection method based on a convolutional neural network, which has higher detection precision for small targets and occluded targets.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Fig. 1 is a flowchart of the target detection method based on a convolutional neural network. As shown in Fig. 1, the present invention provides a target detection method based on a convolutional neural network, comprising:
Step S1: extract features with the residual convolutional neural network ResNet-101 to obtain layer-by-layer basic feature maps, specifically a first, a second, a third and a fourth feature map. In this embodiment, the configuration of each convolution layer of ResNet-101 is shown in Table 1.
Table 1: Convolution layers of ResNet-101
Where w is the width of the region of interest and h is the height of the region of interest.
Step S2: sequentially fuse the basic feature maps from shallow to deep to obtain the fused feature map.
Taking the fusion of the first feature map and the second feature map as an example, the specific process is as follows:
Down-sample the first feature map with n parallel dilated-convolution branches, n being a positive integer greater than 1. In this embodiment, n is 3, the convolution kernel size is 3 × 3, the convolution stride is 2, and the dilation rates of the 3 branches are 1, 2 and 3 respectively.
Fuse the down-sampled outputs of the dilated-convolution branches to obtain the down-sampled feature map, computed as:
F = H3,1(x) + H3,2(x) + H3,3(x)
where F denotes the fused down-sampled feature map, Hk,r(x) denotes a dilated convolution with kernel size k and dilation rate r, and x is the first feature map.
Apply a 1 × 1 convolution to the second feature map for dimensionality reduction, obtaining a reduced feature map with the same number of channels as the down-sampled feature map.
Fuse the down-sampled feature map and the reduced feature map to obtain an initial fused feature map.
Fusing layer by layer in this way yields the final fused feature map.
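The three-branch dilated down-sampling and one fusion step above can be sketched in PyTorch as follows. The channel counts (256 and 512) and the element-wise-sum fusion of the two maps are assumptions for illustration, not taken from the patent.

```python
import torch
import torch.nn as nn

class DilatedDownsample(nn.Module):
    """Three-branch dilated-convolution down-sampling:
    F = H3,1(x) + H3,2(x) + H3,3(x), each branch a 3x3 conv with
    stride 2 and dilation rate 1, 2 or 3, summed element-wise."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2,
                      padding=r, dilation=r)  # padding=r keeps sizes aligned
            for r in (1, 2, 3)
        ])

    def forward(self, x):
        return sum(b(x) for b in self.branches)

# Fuse: down-sample the shallow map, 1x1-reduce the deeper map, then add.
down = DilatedDownsample(256, 512)          # assumed channel counts
reduce_ = nn.Conv2d(512, 512, kernel_size=1)  # channel-matching 1x1 conv

shallow = torch.randn(1, 256, 56, 56)  # first feature map (dummy)
deep = torch.randn(1, 512, 28, 28)     # second feature map (dummy)
fused = down(shallow) + reduce_(deep)  # initial fused feature map
print(tuple(fused.shape))
```

With `padding=r` each dilated branch produces the same 28 × 28 output despite the different dilation rates, so the three branches and the reduced deeper map can all be summed directly.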
Step S3: extract candidate boxes from the fused feature map with the region proposal network to obtain the candidate target region feature map.
As an alternative embodiment, step S3 of the present invention includes:
step S31, performing convolution processing on the fusion feature map based on the first set convolution kernel to obtain a first convolution feature map. In this embodiment, the first set convolution kernel size is 3 × 3.
Step S32, performing convolution processing on the first convolution feature map based on a second set convolution kernel to obtain a second convolution feature map. In this embodiment, the second set convolution kernel size is 1 × 1.
Step S33, performing convolution processing on the second convolution feature map based on the second set convolution kernel to obtain a third convolution feature map.
And step S34, inputting the second convolution feature map and the third convolution feature map into two parallel full-connection layers respectively, and processing based on a set anchor frame to obtain the candidate target area feature map.
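A hedged sketch of the S31-S34 convolution stack, with 1 × 1 convolution heads standing in for the two parallel fully-connected layers; the anchor count k = 9 and the channel width 512 are assumptions not stated in the patent.

```python
import torch
import torch.nn as nn

k = 9  # assumed anchors per spatial position
conv3x3 = nn.Conv2d(512, 512, kernel_size=3, padding=1)  # first set kernel
conv1x1_a = nn.Conv2d(512, 512, kernel_size=1)           # second set kernel
conv1x1_b = nn.Conv2d(512, 512, kernel_size=1)           # second set kernel again
cls_head = nn.Conv2d(512, 2 * k, kernel_size=1)  # object/background scores
reg_head = nn.Conv2d(512, 4 * k, kernel_size=1)  # box deltas per anchor

fmap = torch.randn(1, 512, 28, 28)  # fused feature map (dummy)
first = conv3x3(fmap)               # S31: first convolution feature map
second = conv1x1_a(first)           # S32: second convolution feature map
third = conv1x1_b(second)           # S33: third convolution feature map
scores = cls_head(second)           # S34: parallel branch on the second map
deltas = reg_head(third)            # S34: parallel branch on the third map
print(tuple(scores.shape), tuple(deltas.shape))
```

Each spatial position of the fused map then carries k anchor scores and box offsets, from which the candidate target regions are selected.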
Step S4: obtain the region-of-interest feature map from the fused feature map and the candidate target region feature map.
Specifically, step S4 comprises:
Step S41: fuse the fused feature map and the candidate target region feature map with ROI Align to obtain an initial region-of-interest feature map.
Step S42: enlarge the initial region-of-interest feature map by a set multiple to obtain an enlarged region-of-interest feature map. In this embodiment, the set multiple is 1.5.
Step S43: extract global context in four directions (up, down, left and right) from the initial region-of-interest feature map based on the enlarged region-of-interest feature map to obtain context information.
Step S44: map the initial region-of-interest feature map and the context information into rectangular boxes of the same size with ROI Align and fuse them to obtain the region-of-interest feature map.
Step S5: obtain classification scores and bounding box regression from the region-of-interest feature map through full convolution layers.
Specifically, an initial classification score and an initial bounding box regression are obtained from the region-of-interest feature map through full convolution layers.
The set anchor boxes are then replaced with the initial bounding box regression, the subsequent steps are executed in order, and the process is repeated m times with m set thresholds to obtain the classification scores and bounding box regression, m being a positive integer greater than or equal to 1. In this embodiment, m is 3, and the three thresholds are 0.5, 0.6 and 0.7.
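The cascade refinement of step S5 can be sketched in plain Python. `detect_head` is a hypothetical stand-in for the full-convolution classification/regression head, and the box-tightening it performs is purely illustrative; only the loop structure (boxes from one round replace the anchors for the next, under thresholds 0.5, 0.6, 0.7) reflects the text above.

```python
def detect_head(feature_map, anchors, iou_threshold):
    """Hypothetical stand-in: pretend each round tightens the boxes
    by 1 px per side and reports the round's threshold as the score."""
    refined = [(x1 + 1, y1 + 1, x2 - 1, y2 - 1) for x1, y1, x2, y2 in anchors]
    scores = [iou_threshold] * len(refined)
    return scores, refined

def cascade_detect(feature_map, anchors, thresholds=(0.5, 0.6, 0.7)):
    """Run m = len(thresholds) refinement rounds; each round's regressed
    boxes replace the anchors for the next round."""
    boxes = anchors
    for t in thresholds:
        scores, boxes = detect_head(feature_map, boxes, t)
    return scores, boxes

scores, boxes = cascade_detect(None, [(0, 0, 32, 32)])
print(scores, boxes)
```

Raising the threshold round by round means each stage trains and runs on progressively better-localized boxes, which is the rationale for the 0.5 → 0.6 → 0.7 schedule.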
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.
Claims (10)
1. A target detection method based on a convolutional neural network is characterized by comprising the following steps:
extracting features with a residual convolutional neural network to obtain layer-by-layer basic feature maps;
sequentially fusing the basic feature maps from shallow to deep to obtain a fused feature map;
extracting candidate boxes from the fused feature map with a region proposal network to obtain a candidate target region feature map;
obtaining a region-of-interest feature map according to the fused feature map and the candidate target region feature map;
and obtaining classification scores and bounding box regression from the region-of-interest feature map through full convolution layers.
2. The convolutional neural network-based target detection method of claim 1, wherein the basic feature map comprises a first feature map, a second feature map, a third feature map and a fourth feature map.
3. The target detection method based on the convolutional neural network as claimed in claim 2, wherein sequentially fusing the basic feature maps from shallow to deep to obtain the fused feature map comprises:
down-sampling the first feature map to obtain a down-sampled feature map;
applying convolutional dimensionality reduction to the second feature map to obtain a reduced feature map, the number of channels of the reduced feature map being the same as that of the down-sampled feature map;
fusing the down-sampled feature map and the reduced feature map to obtain an initial fused feature map; the final fused feature map is obtained in the same way.
4. The convolutional neural network-based target detection method according to claim 3, wherein down-sampling the first feature map to obtain the down-sampled feature map comprises:
down-sampling the first feature map with n parallel dilated-convolution branches, n being a positive integer greater than 1;
and fusing the down-sampled outputs of the dilated-convolution branches to obtain the down-sampled feature map.
5. The convolutional neural network-based target detection method as claimed in claim 4, wherein n is 3, and the dilation rates of the 3 branches are 1, 2 and 3 respectively.
6. The convolutional neural network-based target detection method according to claim 1, wherein extracting candidate boxes from the fused feature map with the region proposal network to obtain the candidate target region feature map comprises:
convolving the fused feature map with a first set convolution kernel to obtain a first convolution feature map;
convolving the first convolution feature map with a second set convolution kernel to obtain a second convolution feature map;
convolving the second convolution feature map with the second set convolution kernel to obtain a third convolution feature map;
and feeding the second convolution feature map and the third convolution feature map into two parallel fully-connected layers respectively, processed with a set anchor box, to obtain the candidate target region feature map.
7. The convolutional neural network-based target detection method of claim 6, wherein obtaining classification scores and bounding box regression from the region-of-interest feature map through full convolution layers comprises:
obtaining an initial classification score and an initial bounding box regression from the region-of-interest feature map through full convolution layers;
replacing the set anchor box with the initial bounding box regression, executing the subsequent steps in order, and repeating the process m times with m set thresholds to obtain the classification scores and bounding box regression; m is a positive integer greater than or equal to 1.
8. The convolutional neural network-based object detection method of claim 6, wherein the first set convolution kernel is 3 x 3; the second set convolution kernel is 1 × 1.
9. The convolutional neural network-based target detection method as claimed in claim 1, wherein obtaining the region-of-interest feature map according to the fused feature map and the candidate target region feature map comprises:
fusing the fused feature map and the candidate target region feature map with ROI Align to obtain an initial region-of-interest feature map;
enlarging the initial region-of-interest feature map by a set multiple to obtain an enlarged region-of-interest feature map;
extracting global context from the initial region-of-interest feature map based on the enlarged region-of-interest feature map to obtain context information;
and fusing the initial region-of-interest feature map with the context information via ROI Align to obtain the region-of-interest feature map.
10. The convolutional neural network-based object detection method of claim 1, wherein the residual convolutional neural network is a ResNet-101 network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010816397.7A CN111950551B (en) | 2020-08-14 | 2020-08-14 | Target detection method based on convolutional neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010816397.7A CN111950551B (en) | 2020-08-14 | 2020-08-14 | Target detection method based on convolutional neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111950551A true CN111950551A (en) | 2020-11-17 |
CN111950551B CN111950551B (en) | 2024-03-08 |
Family
ID=73342163
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010816397.7A Active CN111950551B (en) | 2020-08-14 | 2020-08-14 | Target detection method based on convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111950551B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112419292A (en) * | 2020-11-30 | 2021-02-26 | 深圳云天励飞技术股份有限公司 | Pathological image processing method and device, electronic equipment and storage medium |
CN114782676A (en) * | 2022-04-02 | 2022-07-22 | 北京广播电视台 | Method and system for extracting region of interest of video |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180068198A1 (en) * | 2016-09-06 | 2018-03-08 | Carnegie Mellon University | Methods and Software for Detecting Objects in an Image Using Contextual Multiscale Fast Region-Based Convolutional Neural Network |
CN109165644A (en) * | 2018-07-13 | 2019-01-08 | 北京市商汤科技开发有限公司 | Object detection method and device, electronic equipment, storage medium, program product |
CN110348384A (en) * | 2019-07-12 | 2019-10-18 | 沈阳理工大学 | A kind of Small object vehicle attribute recognition methods based on Fusion Features |
CN111461145A (en) * | 2020-03-31 | 2020-07-28 | 中国科学院计算技术研究所 | Method for detecting target based on convolutional neural network |
CN111507998A (en) * | 2020-04-20 | 2020-08-07 | 南京航空航天大学 | Depth cascade-based multi-scale excitation mechanism tunnel surface defect segmentation method |
Non-Patent Citations (3)
Title |
---|
CHANG TANG等: "DeFusionNET: Defocus Blur Detection via Recurrently Fusing and Refining Multi-Scale Deep Features", PROCEEDINGS OF THE IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 31 December 2019 (2019-12-31), pages 2700 - 2709 * |
LYU Junqi; QIU Weigen; ZHANG Lichen; LI Xuewu: "Pedestrian detection with multi-layer convolutional feature fusion", Computer Engineering and Design, no. 11 *
PEI Wei; XU Yanming; ZHU Yongying; WANG Pengqian; LU Mingyu; LI Fei: "Improved SSD method for aerial object detection", Journal of Software, no. 003, 31 December 2019 (2019-12-31), pages 738 - 758 *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112419292A (en) * | 2020-11-30 | 2021-02-26 | 深圳云天励飞技术股份有限公司 | Pathological image processing method and device, electronic equipment and storage medium |
CN112419292B (en) * | 2020-11-30 | 2024-03-26 | 深圳云天励飞技术股份有限公司 | Pathological image processing method and device, electronic equipment and storage medium |
CN114782676A (en) * | 2022-04-02 | 2022-07-22 | 北京广播电视台 | Method and system for extracting region of interest of video |
CN114782676B (en) * | 2022-04-02 | 2023-01-06 | 北京广播电视台 | Method and system for extracting region of interest of video |
Also Published As
Publication number | Publication date |
---|---|
CN111950551B (en) | 2024-03-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111460926B (en) | Video pedestrian detection method fusing multi-target tracking clues | |
US11763485B1 (en) | Deep learning based robot target recognition and motion detection method, storage medium and apparatus | |
US11062123B2 (en) | Method, terminal, and storage medium for tracking facial critical area | |
CN112396027B (en) | Vehicle re-identification method based on graph convolution neural network | |
CN101996410B (en) | Method and system of detecting moving object under dynamic background | |
CN109461172A (en) | Manually with the united correlation filtering video adaptive tracking method of depth characteristic | |
CN106557579B (en) | Vehicle model retrieval system and method based on convolutional neural network | |
CN106203423B (en) | Weak structure perception visual target tracking method fusing context detection | |
CN110738690A (en) | unmanned aerial vehicle video middle vehicle speed correction method based on multi-target tracking framework | |
CN101216941A (en) | Motion estimation method under violent illumination variation based on corner matching and optic flow method | |
CN111950551A (en) | Target detection method based on convolutional neural network | |
CN109087337B (en) | Long-time target tracking method and system based on hierarchical convolution characteristics | |
CN112487915B (en) | Pedestrian detection method based on Embedded YOLO algorithm | |
Zhang et al. | Coarse-to-fine object detection in unmanned aerial vehicle imagery using lightweight convolutional neural network and deep motion saliency | |
CN115240130A (en) | Pedestrian multi-target tracking method and device and computer readable storage medium | |
Jia et al. | Real-time obstacle detection with motion features using monocular vision | |
CN110992424B (en) | Positioning method and system based on binocular vision | |
CN114627441A (en) | Unstructured road recognition network training method, application method and storage medium | |
CN113850136A (en) | Yolov5 and BCNN-based vehicle orientation identification method and system | |
CN114757977A (en) | Moving object track extraction method fusing improved optical flow and target detection network | |
Zhu et al. | Boosting RGB-D salient object detection with adaptively cooperative dynamic fusion network | |
Gong et al. | Multi-target trajectory tracking in multi-frame video images of basketball sports based on deep learning | |
Fang et al. | A visual tracking algorithm via confidence-based multi-feature correlation filtering | |
CN112818771A (en) | Multi-target tracking algorithm based on feature aggregation | |
CN116129386A (en) | Method, system and computer readable medium for detecting a travelable region |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication | 
 | SE01 | Entry into force of request for substantive examination | 
 | GR01 | Patent grant | 