CN113177511A - Rotating frame intelligent perception target detection method based on multiple data streams - Google Patents


Info

Publication number
CN113177511A
CN113177511A
Authority
CN
China
Prior art keywords
target detection
network
rotating frame
multiplied
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110549535.4A
Other languages
Chinese (zh)
Inventor
张智超
尹晓晴
李卫丽
陈晖
邓劲生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202110549535.4A priority Critical patent/CN113177511A/en
Publication of CN113177511A publication Critical patent/CN113177511A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a rotating frame intelligent perception target detection method based on multiple data streams, comprising the following steps: constructing a target detection data set that includes a training set and a test set; constructing a multi-data-stream rotating frame intelligent perception target detection network model comprising a feature extraction sub-network, an attention intelligent identification module, a multi-scale feature fusion sub-network and a target detection sub-network; training the network model with the training set; and performing target detection on the test set pictures with the trained model. The method detects an input picture and quickly and accurately generates a rotating frame that fits closely around each target object in the picture, giving the precise position of the detected target. Because intelligent identification and multi-scale feature fusion are introduced, the efficiency of searching the background picture and the accuracy and speed of recognizing targets of various sizes are greatly improved.

Description

Rotating frame intelligent perception target detection method based on multiple data streams
Technical Field
The invention relates to the technology of image recognition and target type estimation, in particular to a rotating frame intelligent perception target detection method based on multiple data streams.
Background
Object detection is the general term for identifying object attributes, categories and probabilities in images and videos, and is one of the core problems of computer vision. Recognizing target objects is an important stage of image analysis and the core foundation of target tracking, target segmentation and other vision applications. Detected objects are usually marked either with a selected bounding box or with a center point that represents the target object.
The main shortcomings of existing target detection methods are: (1) target objects can only be selected with horizontal and vertical candidate boxes, which do not fit the actual target closely enough; (2) target objects are mostly labeled manually, with no specific search strategy to help quickly perceive how strongly surrounding objects are associated with the core detection target; (3) detection works only for target objects of a fixed size, which is inflexible and limits performance.
Disclosure of Invention
The invention aims to overcome the above problems and provides a rotating frame intelligent perception target detection method based on multiple data streams, which uses a large amount of labeled image data to automatically learn the features and structural information of specific object types and to predict the positions and types of related objects, thereby addressing insufficient modeling accuracy, inaccurate frame selection and a low degree of automation in target detection. Because the invention introduces intelligent identification and multi-scale feature fusion, the efficiency of searching the background picture and the accuracy and speed of recognizing targets of various sizes are greatly improved.
A rotating frame intelligent perception target detection method based on multiple data streams comprises the following steps:
Step one: constructing a target detection data set, wherein the target detection data set comprises a training set and a test set, the training set comprises pictures and text files, the text files are labeled with the specific position, type and frame selection size information of targets in the corresponding pictures, and the test set comprises pictures of targets to be detected;
Step two: constructing a rotating frame intelligent perception target detection network model based on multiple data streams, wherein the target detection network model comprises a feature extraction sub-network, an attention intelligent identification module, a multi-scale feature fusion sub-network and a target detection sub-network; the input of the target detection network model is an ordinary image, which is rapidly screened and identified at multiple scales by the attention intelligent identification module, and the output is an image containing frame selection information and category probabilities;
step three: training a rotating frame intelligent perception target detection network model based on multiple data streams by using the training set in the step one to obtain a trained target detection network model;
step four: and carrying out target detection on the test set picture by using the trained target detection network model.
The feature extraction sub-network extracts a number of feature maps of different sizes from a picture; the attention intelligent identification module associates similar positions and types in the image background and obtains candidate object type and position information through repeated frame selection; the multi-scale feature fusion sub-network fuses the multi-size feature maps from coarse to fine; the target detection sub-network screens the fused object types and position candidate frames and outputs them on the picture. The size of the pictures in the data set is H × W, where H and W respectively represent the height and width of the image, and the output of the target detection network model is an H × W prediction picture containing rotating frames, predicted target types and probabilities.
Specifically, the function of the target detection sub-network in step two is to predict and output the final target detection result. Its input is the fused and refined target detection feature information, and its output is the final frame selection with rotation: a quadrilateral target frame enclosed by four point coordinates, together with the object type and the probability of belonging to that type. The sizes and numbers of the output feature maps processed by the network layers are H/4 × W/4 × 128, H/8 × W/8 × 256 and H/16 × W/16 × 512.
Specifically, the target detection result in step two is obtained through an intelligent similarity detection algorithm, which compares and matches the target key information obtained by feature extraction with surrounding image information; when similar key target information is found, targets of the same type in that region are inferred. The rotation boundary is obtained through a rotation candidate frame conversion algorithm; the rotation candidate frame is designed for target detection at specific shapes and angles, which facilitates the identification of dense and tiny objects.
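The patent does not publish the rotation candidate frame conversion algorithm itself. As a hedged sketch, a rotating frame is commonly parameterized by its center, width, height and angle; the function below (an illustrative implementation, not the patented algorithm) converts that parameterization into the four corner coordinates of the quadrilateral target frame described above:

```python
import math

def rotated_box_to_corners(cx, cy, w, h, theta):
    """Convert a rotated box (center, size, angle in radians) into the
    four corner coordinates of the enclosing quadrilateral."""
    dx, dy = w / 2.0, h / 2.0
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    corners = []
    for sx, sy in ((-dx, -dy), (dx, -dy), (dx, dy), (-dx, dy)):
        # rotate each corner offset about the center, then translate
        corners.append((cx + sx * cos_t - sy * sin_t,
                        cy + sx * sin_t + sy * cos_t))
    return corners
```

With `theta = 0` this reduces to an ordinary axis-aligned box, which is a convenient sanity check.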
Furthermore, the feature extraction sub-network in step two includes 5 down-sampling neural network layers, whose input is an image containing the labeled position and type information and whose output is feature maps of various sizes; the sizes and numbers of the output feature maps processed by the network layers are H × W × 32, H/2 × W/2 × 64, H/4 × W/4 × 128, H/8 × W/8 × 256 and H/16 × W/16 × 512.
Specifically, the multi-scale feature fusion sub-network in step two fuses and summarizes information of various sizes; its input is information at multiple scales and its output is the fused and refined target detection feature information. The multi-scale feature fusion sub-network specifically comprises 3 convolutional neural network layers, and the sizes and numbers of the images processed by the output network layers are H/4 × W/4 × 128, H/8 × W/8 × 256 and H/16 × W/16 × 512.
Preferably, the training process of the multi-data-stream-based rotating frame intelligent perception target detection network model described in step three uses the PyTorch or TensorFlow deep learning framework.
The invention has the beneficial effects that:
compared with the target detection method in the prior art, the method provided by the invention has the advantages that the method utilizes the multi-data-stream-based rotating frame intelligent perception target detection method to automatically learn the specific types, positions and probabilities of objects more accurately and more rapidly, the intelligent automation refinement degree is higher, the speed is high, the effect is good, and the identification types are multiple. The method has the advantages that the data stream features of multiple sizes are fully utilized for extraction, further refinement, purification and fusion are achieved, the most advanced attention perception module is combined for positioning and predicting the types of the target objects, the training times are multiple, the training types are comprehensive, and therefore stability is stronger. The input is an image containing a target object to be detected, and the output is an image containing a frame selection target of the target object. Due to the introduction of intelligent recognition and multi-scale feature fusion technology, the background picture searching efficiency and the recognition accuracy and speed of targets with various sizes are greatly improved.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of a sample of a target detection data set;
FIG. 3 is a sample two schematic view of a target detection dataset;
FIG. 4 is a sample three schematic view of a target detection dataset;
FIG. 5 is a sample four schematic view of a target detection dataset;
FIG. 6 is a schematic diagram of the attention intelligent rotating target detection neural network based on a multi-data-stream network according to the present invention, including the feature extraction sub-network;
FIG. 7 is a schematic view of an attention module;
FIG. 8 is a schematic diagram of a portion of the similarity algorithm;
FIG. 9 is another schematic diagram of a portion of the similarity algorithm;
FIG. 10 is a graph of target detection results;
FIG. 11 is another graph of target detection results.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, the method for detecting a target through intelligent sensing by using a rotating frame based on multiple data streams provided by this embodiment includes the following steps:
the method comprises the following steps: and constructing a target detection data set, wherein the target detection data set comprises a training set and a testing set, and images in the data set comprise various abundant images of daily life and multi-scale aerial images. The training set includes images and text files, wherein the text files are labeled with information of specific positions, types, frame selection sizes and the like of target objects in corresponding images, and the test set includes only images of targets to be detected, as shown in fig. 2, 3, 4 and 5. Through data set augmentation processing, the sample capacity of the generated data set is 22000, and the number ratio of the samples in the training set to the samples in the testing set is 10: 1.
Step two: construct the rotating frame intelligent perception target detection network model based on multiple data streams. As shown in fig. 6, the model comprises a feature extraction sub-network, an attention intelligent recognition module, a multi-scale feature fusion sub-network and a target detection sub-network. Its initial input is an image with the type and position of the objects calibrated, and its output predicts the type, position, frame selection information and probability of the objects contained in a new image.
The feature extraction sub-network in step two is mainly used to extract the features of the target image. Its input is the image labeled with target position and type information, and its output is feature maps of various sizes. The feature extraction sub-network comprises 5 down-sampling neural network layers, and the sizes of the output network layer feature maps are, in order, H × W × 32, H/2 × W/2 × 64, H/4 × W/4 × 128, H/8 × W/8 × 256 and H/16 × W/16 × 512. Each network layer is optimized with residual links and regularization excitation, which better preserves the transmission efficiency of the feature information and keeps the information of different scales from interfering with each other.
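The embodiment specifies the five output scales but not the layer internals. The sketch below reproduces only the stated shapes with plain stride-2 convolutions; `BatchNorm2d` stands in for the unspecified "regularization excitation", and the residual links are omitted for brevity, so this is an assumption-laden illustration rather than the patented backbone:

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Backbone sketch: one stride-1 stage plus four stride-2 stages,
    returning the five multi-scale maps named in the patent
    (HxWx32, H/2xW/2x64, ..., H/16xW/16x512)."""
    def __init__(self, in_ch=3):
        super().__init__()
        stages, prev = [], in_ch
        for i, ch in enumerate([32, 64, 128, 256, 512]):
            stride = 1 if i == 0 else 2        # first stage keeps resolution
            stages.append(nn.Sequential(
                nn.Conv2d(prev, ch, 3, stride=stride, padding=1),
                nn.BatchNorm2d(ch),            # stand-in for "regularization excitation"
                nn.ReLU(inplace=True)))
            prev = ch
        self.stages = nn.ModuleList(stages)

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)                    # keep every scale for later fusion
        return feats
```

Feeding a 64 × 64 image through the network yields maps at full, 1/2, 1/4, 1/8 and 1/16 resolution with the channel counts listed above.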
The attention intelligent recognition module performs a correlation query on objects with similar positions and types, as shown in fig. 7, and obtains candidate object types and position information through repeated frame selection. Compared with the pure brute-force search of the original method, the position and color information of each pixel is made more prominent, and replacing the original search with the similarity formula and algorithm shown in figs. 8 and 9 improves search efficiency.
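The similarity formula itself appears only in figs. 8 and 9 and is not reproduced in the text, so the sketch below is purely illustrative: it combines a normalized spatial distance and a normalized color distance into one score in [0, 1], with an assumed 50/50 weighting that is not taken from the patent:

```python
import math

def similarity(pixel_a, pixel_b, pos_weight=0.5):
    """Hypothetical similarity score mixing colour and position cues.
    Each pixel is (x, y, (r, g, b)) with x, y already normalised to
    [0, 1]; the weighting is an assumption, not the patented formula."""
    (xa, ya, ca), (xb, yb, cb) = pixel_a, pixel_b
    # colour term: Euclidean RGB distance scaled to [0, 1]
    color_d = math.dist(ca, cb) / math.dist((0, 0, 0), (255, 255, 255))
    # spatial term: distance between normalised coordinates scaled to [0, 1]
    pos_d = math.dist((xa, ya), (xb, yb)) / math.sqrt(2.0)
    d = pos_weight * pos_d + (1 - pos_weight) * color_d
    return 1.0 - d      # 1 = identical, 0 = maximally different
```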
The multi-scale feature fusion sub-network performs coarse-to-fine feature fusion on the multi-size feature maps. Its input is image information at multiple sizes, and its output is the fused and refined target detection feature information. It is divided into 3 convolutional neural network layers, and the sizes of the images processed by the output network layers are, in order, H/4 × W/4 × 128, H/8 × W/8 × 256 and H/16 × W/16 × 512.
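The wiring between the three convolutional layers is not detailed in the text. One common coarse-to-fine pattern consistent with the stated input and output shapes is FPN-style top-down fusion, sketched here under that assumption (the 1×1 lateral projections are an addition of this sketch, not taken from the patent):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusion(nn.Module):
    """Coarse-to-fine fusion sketch over the /4, /8 and /16 maps:
    each coarser map is projected, upsampled and added to the next
    finer one before a 3x3 convolution refines the sum."""
    def __init__(self, chans=(128, 256, 512)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(c, c, 3, padding=1) for c in chans)
        # 1x1 projections so a coarser map can be added to a finer one
        self.lateral = nn.ModuleList(
            nn.Conv2d(chans[i + 1], chans[i], 1) for i in range(len(chans) - 1))

    def forward(self, feats):
        f4, f8, f16 = feats
        out16 = self.convs[2](f16)
        f8 = f8 + F.interpolate(self.lateral[1](out16), size=f8.shape[-2:])
        out8 = self.convs[1](f8)
        f4 = f4 + F.interpolate(self.lateral[0](out8), size=f4.shape[-2:])
        out4 = self.convs[0](f4)
        return out4, out8, out16
```

The three outputs keep the H/4 × W/4 × 128, H/8 × W/8 × 256 and H/16 × W/16 × 512 shapes named in the embodiment.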
Finally, the target detection sub-network marks the finally fused object types and position candidate frames on the picture and outputs the final target detection result. Its input is the fused and refined target detection feature information, and its output is the final frame selection with rotation, together with the calibrated object type and its probability. The output image sizes processed by the network layers are, in order, H/4 × W/4 × 128, H/8 × W/8 × 256 and H/16 × W/16 × 512.
The size of the initial image is H × W, where H and W represent the height and width of the image respectively; specifically, the size of the input picture may be 512 × 512, 1024 × 1024 or any other size that meets the conditions.
In the second step, the initial target detection result can be obtained through an intelligent similarity detection algorithm, and the rotation boundary can be obtained through a rotation candidate frame conversion algorithm.
Step three: and (4) training the attention intelligent rotating target detection neural network based on the multi-data flow network by using the training set in the step one to obtain a trained network model.
The training process of the multi-data-stream rotating frame intelligent perception target detection neural network in step three can use the PyTorch or TensorFlow deep learning framework.
In a specific implementation, the data set is used to train the deep neural network to obtain the trained deep learning model. The deep learning model is trained in a PyTorch environment installed on an Ubuntu system using the AdaGrad optimization algorithm, with an initial learning rate of 0.001 and 600000 training iterations; at iterations 300000, 400000 and 500000 the learning rate is divided by 10 in turn.
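The stated schedule (initial rate 0.001, divided by 10 at iterations 300000, 400000 and 500000) can be written down directly; the helper below is a sketch of that piecewise-constant schedule, not code from the patent:

```python
def learning_rate(step, base_lr=1e-3, milestones=(300000, 400000, 500000)):
    """Step schedule from the embodiment: start at 1e-3 and divide
    the rate by 10 at each milestone iteration."""
    lr = base_lr
    for m in milestones:
        if step >= m:
            lr /= 10.0
    return lr
```

In PyTorch the same effect is usually obtained by pairing `torch.optim.Adagrad` with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[300000, 400000, 500000], gamma=0.1)`.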
Step four: the trained rotating frame intelligent perception target detection neural network based on multiple data streams is utilized to predict the images of the test set, the images containing the positions of the rotating frames of the targets and the possible category probabilities are quickly and accurately generated, and the final result is shown in fig. 10 and 11.
The above description is only a preferred embodiment of the present invention, and certainly should not be taken as limiting the scope of the invention, which is defined by the claims and their equivalents.

Claims (6)

1. A rotating frame intelligent perception target detection method based on multiple data streams, characterized by comprising the following steps:
step one, constructing a target detection data set, wherein the target detection data set comprises a training set and a test set, the training set comprises pictures and text files, the text files are marked with specific position, type and frame selection size information of targets in corresponding pictures, and the test set comprises pictures of the targets to be detected;
step two, constructing a rotating frame intelligent perception target detection network model based on multiple data streams, wherein the target detection network model comprises a feature extraction sub-network, an attention intelligent identification module, a multi-scale feature fusion sub-network and a target detection sub-network, the input of the target detection network model is a common image, the common image is rapidly screened and identified on multiple scales through the attention intelligent identification module, and the image is output as an image containing framing information and category probability;
step three, training a rotating frame intelligent perception target detection network model based on multiple data streams by using the training set in the step one to obtain a trained target detection network model;
step four, carrying out target detection on the test set picture by using the trained target detection network model;
the feature extraction sub-network extracts a number of feature maps of different sizes from a picture; the attention intelligent identification module associates similar positions and types in the image background and obtains candidate object type and position information through repeated frame selection; the multi-scale feature fusion sub-network fuses the multi-size feature maps from coarse to fine; the target detection sub-network screens the fused object types and position candidate frames and outputs them on the picture; the size of the pictures in the data set is H × W, where H and W respectively represent the height and width of the image, and the output of the target detection network model is an H × W prediction picture containing rotating frames, predicted target types and probabilities.
2. The multiple-data-stream-based rotating frame intelligent perception target detection method according to claim 1, wherein: the function of the target detection sub-network in step two is to predict and output the final target detection result; its input is the fused and refined target detection feature information, and its output is the final frame selection with rotation, a quadrilateral target frame enclosed by four point coordinates, representing the calibrated object type and the probability of belonging to that type; the sizes and numbers of the output feature maps processed by the network layers are H/4 × W/4 × 128, H/8 × W/8 × 256 and H/16 × W/16 × 512.
3. The multiple data stream-based rotating frame intelligent perception target detection method according to claim 2, wherein: the target detection result in the step two is obtained through an intelligent similarity detection algorithm, the intelligent similarity detection algorithm is used for comparing and matching target key information obtained through feature extraction with peripheral image information, and when similar same key target information is found, the same kind of targets in the region are deduced; the rotation boundary is obtained through a rotation candidate frame conversion algorithm, and the rotation candidate frame is designed for target detection of a specific shape and a specific angle, so that the identification of dense and tiny objects is facilitated.
4. The multiple-data-stream-based rotating frame intelligent perception target detection method according to claim 1, wherein: the feature extraction sub-network in step two comprises 5 down-sampling neural network layers, whose input is an image containing labeled position and type information and whose output is feature maps of various sizes; the sizes and numbers of the feature maps processed by the output network layers are H × W × 32, H/2 × W/2 × 64, H/4 × W/4 × 128, H/8 × W/8 × 256 and H/16 × W/16 × 512.
5. The multiple-data-stream-based rotating frame intelligent perception target detection method according to claim 1, wherein: the multi-scale feature fusion sub-network fuses and summarizes information of various sizes; its input is information at multiple scales and its output is the fused and refined target detection feature information; the multi-scale feature fusion sub-network specifically comprises 3 convolutional neural network layers, and the sizes and numbers of the images processed by the output network layers are H/4 × W/4 × 128, H/8 × W/8 × 256 and H/16 × W/16 × 512.
6. The multiple-data-stream-based rotating frame intelligent perception target detection method according to claim 1, wherein: the training process of the multi-data-stream-based rotating frame intelligent perception target detection network model in step three uses the PyTorch or TensorFlow deep learning framework.
CN202110549535.4A 2021-05-20 2021-05-20 Rotating frame intelligent perception target detection method based on multiple data streams Pending CN113177511A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110549535.4A CN113177511A (en) 2021-05-20 2021-05-20 Rotating frame intelligent perception target detection method based on multiple data streams

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110549535.4A CN113177511A (en) 2021-05-20 2021-05-20 Rotating frame intelligent perception target detection method based on multiple data streams

Publications (1)

Publication Number Publication Date
CN113177511A true CN113177511A (en) 2021-07-27

Family

ID=76929365

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110549535.4A Pending CN113177511A (en) 2021-05-20 2021-05-20 Rotating frame intelligent perception target detection method based on multiple data streams

Country Status (1)

Country Link
CN (1) CN113177511A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019192397A1 (en) * 2018-04-04 2019-10-10 华中科技大学 End-to-end recognition method for scene text in any shape
CN111275082A (en) * 2020-01-14 2020-06-12 中国地质大学(武汉) Indoor object target detection method based on improved end-to-end neural network
CN111275688A (en) * 2020-01-19 2020-06-12 合肥工业大学 Small target detection method based on context feature fusion screening of attention mechanism
CN112541532A (en) * 2020-12-07 2021-03-23 长沙理工大学 Target detection method based on dense connection structure

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019192397A1 (en) * 2018-04-04 2019-10-10 华中科技大学 End-to-end recognition method for scene text in any shape
CN111275082A (en) * 2020-01-14 2020-06-12 中国地质大学(武汉) Indoor object target detection method based on improved end-to-end neural network
CN111275688A (en) * 2020-01-19 2020-06-12 合肥工业大学 Small target detection method based on context feature fusion screening of attention mechanism
CN112541532A (en) * 2020-12-07 2021-03-23 长沙理工大学 Target detection method based on dense connection structure

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHICHAO ZHANG 等: "Rotated YOLOv4 with Attention-wise Object Detectors in Aerial Images", 《ICRSA 2021: 2021 4TH INTERNATIONAL CONFERENCE ON ROBOT SYSTEMS AND APPLICATIONS》 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210727