CN112183558A

CN112183558A - Target detection and feature extraction integrated network based on YOLOv3

Info

Publication number: CN112183558A
Application number: CN202011066312.4A
Authority: CN
Inventors: 李利华; 韩勇强; 刘泳庆; 张路成; 魏晨晨; 余清鲜
Original assignee: Beijing Institute of Technology BIT
Current assignee: Beijing Institute of Technology BIT
Priority date: 2020-09-30
Filing date: 2020-09-30
Publication date: 2021-01-05

Abstract

The invention discloses a YOLOv3-based integrated network for target detection and feature extraction. A target detection module runs the YOLOv3 algorithm and outputs detection results on 3 branches, and the data of the 3 branches are integrated by a non-maximum suppression algorithm to obtain the output result of target detection. The target detection result and the original image information are input into a decoding module, which extracts the image of each target area and sends it to a feature extraction module. The feature extraction module runs a convolutional neural network to extract feature information from the image and projects the feature information of each target onto a hypersphere. The target information output by the target detection module corresponds one to one with the feature information output by the feature extraction module, and the two outputs are combined to obtain the final output of the network. The integrated network provides target detection information and, at the same time, feature information of each target for the tracking network, which can improve the performance of the tracking algorithm.

Description

Target detection and feature extraction integrated network based on YOLOv3

Technical Field

The invention relates to the technical field of computer vision, and in particular to a target detection and feature extraction integrated network based on YOLOv3.

Background

Vision-based target detection and tracking is an important research topic in computer vision, with research and practical value in fields such as video surveillance, virtual reality, human-computer interaction and autonomous navigation. The target detection task provides the category, position and size of a target, and when targets are tracked through a sequence of consecutive frames, the performance of target detection directly affects the performance of the subsequent tracking task.

Efficient target tracking calls for richer and more accurate information about a target, including its category, position and size as well as appearance features such as color, texture and edges. Rich target features are the key to robust target tracking.

Current target detection algorithms output the category, position and size of a target but not its appearance features, and this limited output constrains the performance of the tracking algorithm.

Disclosure of Invention

In order to solve the limitations and defects of the prior art, the invention provides a target detection and feature extraction integrated network based on YOLOv3, which comprises a target detection module, a decoding module and a feature extraction module, wherein the target detection module runs a YOLOv3 algorithm, outputs 3 branch detection results, and integrates 3 branch data by using a non-maximum suppression algorithm to obtain an output result of target detection;

the decoding module receives the output result of the target detection and the original image information, extracts the image of the target area through decoding, and sends the image of the target area to the feature extraction module;

the feature extraction module runs a convolutional neural network to extract feature information from the target area image and projects the image features onto a hypersphere, the coordinates on the hypersphere being the feature information of the target, and the feature information output by the feature extraction module corresponds one to one with the target information output by the target detection module;

and the feature information output by the feature extraction module is combined with the target information output by the target detection module to obtain the final output of the target detection and feature extraction integrated network, wherein the final output comprises the category, the position, the size and the image feature information of each target.
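The integration of the 3 branch results by non-maximum suppression described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the detection tuple layout, the function names `iou` and `merge_branches`, the per-class greedy strategy and the 0.5 overlap threshold are all assumptions, since the text does not specify them.

```python
from typing import List, Tuple

# Assumed layout of one detection: (x1, y1, x2, y2, score, class_id).
Detection = Tuple[float, float, float, float, float, int]


def iou(a: Detection, b: Detection) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_branches(branches: List[List[Detection]],
                   iou_threshold: float = 0.5) -> List[Detection]:
    """Pool the detections of all branches and apply greedy per-class NMS."""
    pooled = sorted((d for branch in branches for d in branch),
                    key=lambda d: d[4], reverse=True)
    kept: List[Detection] = []
    for det in pooled:
        # Keep a box only if no higher-scoring kept box of the same class
        # overlaps it by more than the threshold.
        if all(det[5] != k[5] or iou(det, k) <= iou_threshold for k in kept):
            kept.append(det)
    return kept


# Hypothetical outputs of the three branches (D1, D2, D3 in the text).
d1 = [(10, 10, 50, 50, 0.90, 0)]
d2 = [(12, 12, 52, 52, 0.80, 0), (100, 100, 140, 160, 0.70, 1)]
d3 = [(11, 9, 49, 51, 0.60, 0)]
detections = merge_branches([d1, d2, d3])
```

In this example the three overlapping class-0 boxes collapse into the highest-scoring one, so `detections` holds one entry per distinct target, and its length plays the role of the target count M.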

Optionally, the method further includes:

the target detection module runs the YOLOv3 algorithm and outputs 3 branch results which are D1, D2 and D3 respectively, wherein

Integrating the 3 branch data through a non-maximum suppression algorithm to obtain an output result D of target detection, wherein

The maximum value M is the number of targets in the image;

the decoding module extracts an image of a target area and sends the image to the feature extraction module, the feature extraction module operates a convolutional neural network to extract feature information of the image to obtain a feature map, and a calculation formula is as follows:

the extracted feature map is converted by a fully connected layer into a feature vector f of dimension (M × 10), and the calculation formula is as follows:

an operation is then applied to the feature vector f to project the features of each target onto a hypersphere and obtain the final output F of the feature extraction network, and the calculation formula is as follows:

wherein

combining the results of the target detection module and the feature extraction module to obtain a final output, wherein the final output contains the category, position, size and image feature information of each target, and the calculation formula is as follows:

wherein

Optionally, the hypersphere is a 10-dimensional hypersphere.
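The projection and combination steps can be illustrated with a short sketch. The formulas themselves are not reproduced in this text, so the code below assumes one common reading: each 10-dimensional row of f is divided by its Euclidean norm, which places it on the unit hypersphere in 10-dimensional space. The unit radius, the function names `project_to_hypersphere` and `combine`, and the detection row layout `[class_id, cx, cy, w, h]` are assumptions for illustration.

```python
import math
from typing import List


def project_to_hypersphere(f: List[List[float]],
                           eps: float = 1e-12) -> List[List[float]]:
    """L2-normalize each row of f so that it lies on the unit hypersphere.

    Assumption: the projection is plain L2 normalization. An all-zero row
    is left as zeros instead of raising a division error.
    """
    projected = []
    for row in f:
        norm = math.sqrt(sum(v * v for v in row))
        projected.append([v / max(norm, eps) for v in row])
    return projected


def combine(detections: List[List[float]],
            features: List[List[float]]) -> List[List[float]]:
    """Concatenate row i of the detections with row i of the features."""
    if len(detections) != len(features):
        raise ValueError("detections and features must have the same length M")
    return [list(d) + list(feat) for d, feat in zip(detections, features)]


# Hypothetical feature vector f for M = 2 targets, 10 values per target.
f = [[3.0, 4.0] + [0.0] * 8, [1.0] * 10]
F = project_to_hypersphere(f)

# Hypothetical detection rows: [class_id, cx, cy, w, h].
detections = [[0, 30.0, 30.0, 40.0, 40.0], [1, 120.0, 130.0, 40.0, 60.0]]
output = combine(detections, F)
```

Each row of `output` then carries the category, position and size of one target followed by its 10 feature coordinates, which matches the one-to-one correspondence the text requires.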

The invention has the following beneficial effects:

The invention provides a YOLOv3-based integrated network for target detection and feature extraction. A target detection module runs the YOLOv3 algorithm and outputs detection results on 3 branches, and the data of the 3 branches are integrated by a non-maximum suppression algorithm to obtain the output result of target detection. The target detection result and the original image information are input into a decoding module, and the image of each target area extracted by the decoding module is sent to a feature extraction module. The feature extraction module runs a convolutional neural network to extract feature information from the image and projects the feature information of each target onto a hypersphere, the coordinates on the hypersphere being the feature information of the target. The target information output by the target detection module corresponds one to one with the feature information output by the feature extraction module, and the output of target detection and the output of feature extraction are combined to obtain the final output of the network.

The integrated network provided by the invention supplies target detection information and, at the same time, feature information of each target for the tracking network, which can improve the performance of the tracking algorithm. Its output includes both the target detection result and the image feature information of each target, and this richer output lays a foundation for higher-level tasks. Feature extraction depends on the detection result and is performed only on valid targets, which reduces the amount of computation and improves the efficiency of the algorithm. The network extracts target features with a convolutional neural network, which can capture more image detail and is an advantage over traditional methods. It maps the feature information onto a 10-dimensional hypersphere, which makes it convenient to associate targets. Its modular design facilitates deployment and implementation, and the original YOLOv3 network data can be used for transfer learning, which reduces the difficulty and cost of training.
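To illustrate why features on a hypersphere are convenient for associating targets: for unit-length vectors the dot product equals the cosine similarity, so targets in consecutive frames can be compared with a single inner product. The sketch below is an assumption-laden example, not a method stated in the text; the greedy matching strategy, the function names and the 0.5 similarity threshold are all hypothetical.

```python
import math
from typing import List, Tuple


def unit(v: List[float]) -> List[float]:
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def similarity(u: List[float], v: List[float]) -> float:
    """Dot product; equals cosine similarity when u and v are unit vectors."""
    return sum(a * b for a, b in zip(u, v))


def associate(prev_feats: List[List[float]],
              curr_feats: List[List[float]],
              min_similarity: float = 0.5) -> List[Tuple[int, int]]:
    """Greedily pair previous-frame and current-frame targets by similarity."""
    pairs = sorted(((similarity(p, c), i, j)
                    for i, p in enumerate(prev_feats)
                    for j, c in enumerate(curr_feats)), reverse=True)
    used_prev, used_curr, matches = set(), set(), []
    for sim, i, j in pairs:
        if sim < min_similarity:
            break  # remaining pairs are even less similar
        if i in used_prev or j in used_curr:
            continue  # each target is matched at most once
        used_prev.add(i)
        used_curr.add(j)
        matches.append((i, j))
    return sorted(matches)


# Hypothetical 10-dimensional features of two targets in two frames.
prev_feats = [unit([1.0] + [0.0] * 9), unit([0.0, 1.0] + [0.0] * 8)]
curr_feats = [unit([0.1, 1.0] + [0.0] * 8), unit([1.0, 0.1] + [0.0] * 8)]
matches = associate(prev_feats, curr_feats)
```

Here target 0 of the previous frame is paired with target 1 of the current frame and vice versa, because those pairs have nearly identical feature directions.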

Drawings

Fig. 1 is a schematic structural diagram of an integrated network for target detection and feature extraction based on YOLOv3 according to an embodiment of the present invention.

Detailed Description

In order to make those skilled in the art better understand the technical solution of the present invention, the target detection and feature extraction integrated network based on YOLOv3 provided in the present invention is described in detail below with reference to the accompanying drawings.

Example one

Fig. 1 is a schematic structural diagram of an integrated network for target detection and feature extraction based on YOLOv3 according to an embodiment of the present invention. As shown in Fig. 1, the YOLOv3-based target detection and feature extraction integrated network provided in this embodiment mainly includes three functional modules: a target detection module, a decoding module and a feature extraction module.

In operation, the target detection module runs the YOLOv3 algorithm and outputs detection results on 3 branches, and the data of the 3 branches are then integrated by a Non-Maximum Suppression (NMS) algorithm to obtain the output result of target detection. The target detection result and the original image information are input into the decoding module, which extracts the image of each target area and sends it to the feature extraction module. The feature extraction module runs a convolutional neural network to extract feature information from the image and projects the feature information of each target onto a 10-dimensional hypersphere, the coordinates on the hypersphere being the feature information of the target. Because of the decoding module, the target information output by the target detection module corresponds one to one with the feature information output by the feature extraction module. Finally, the target detection output and the feature extraction output are combined to obtain the final output of the network, which contains the category, position, size and image feature information of each target.
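The role of the decoding module, cutting one target area out of the original image per detection while preserving the detection order, can be sketched as follows. The image representation (a grayscale image as nested lists), the box layout `(x1, y1, x2, y2)` with exclusive upper bounds, and the function names `crop_target` and `decode` are assumptions made for this illustration.

```python
from typing import List, Tuple

Image = List[List[int]]          # assumed: grayscale image as rows of pixels
Box = Tuple[int, int, int, int]  # assumed: (x1, y1, x2, y2), x2/y2 exclusive


def crop_target(image: Image, box: Box) -> Image:
    """Cut the target area out of the image, clamping the box to the image."""
    height, width = len(image), len(image[0])
    x1, y1, x2, y2 = box
    x1, x2 = max(0, x1), min(width, x2)
    y1, y2 = max(0, y1), min(height, y2)
    return [row[x1:x2] for row in image[y1:y2]]


def decode(image: Image, boxes: List[Box]) -> List[Image]:
    """Return one crop per box, in the same order as the boxes.

    Keeping the order is what makes crop i (and hence feature i)
    correspond to detection i.
    """
    return [crop_target(image, box) for box in boxes]


# Hypothetical 4x6 image whose pixel value encodes its position (10*row + col).
image = [[10 * r + c for c in range(6)] for r in range(4)]
boxes = [(1, 1, 3, 3), (4, 0, 8, 2)]  # the second box extends past the edge
crops = decode(image, boxes)
```

Each crop would then be passed to the feature extraction network, so the i-th feature vector belongs to the i-th detection without any extra bookkeeping.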

The integrated network provided by this embodiment supplies target detection information and, at the same time, feature information of each target for the tracking network, which can improve the performance of the tracking algorithm. Its output includes both the target detection result and the image feature information of each target, and this richer output lays a foundation for higher-level tasks. Feature extraction depends on the detection result and is performed only on valid targets, which reduces the amount of computation and improves the efficiency of the algorithm. The network extracts target features with a convolutional neural network, which can capture more image detail and is an advantage over traditional methods. It maps the feature information onto a 10-dimensional hypersphere, which makes it convenient to associate targets. Its modular design facilitates deployment and implementation, and the original YOLOv3 network data can be used for transfer learning, which reduces the difficulty and cost of training.

This embodiment provides a YOLOv3-based integrated network for target detection and feature extraction. A target detection module runs the YOLOv3 algorithm and outputs detection results on 3 branches, and the data of the 3 branches are integrated by a non-maximum suppression algorithm to obtain the output result of target detection. The target detection result and the original image information are input into a decoding module, and the image of each target area extracted by the decoding module is sent to a feature extraction module. The feature extraction module runs a convolutional neural network to extract feature information from the image and projects the feature information of each target onto a hypersphere, the coordinates on the hypersphere being the feature information of the target. The target information output by the target detection module corresponds one to one with the feature information output by the feature extraction module, and the output of target detection and the output of feature extraction are combined to obtain the final output of the network. Compared with the prior art, the integrated network provided by this embodiment combines target detection and feature extraction in a new network structure. The network output includes both the target detection result and the image feature information of each target, which enriches the output information and lays a foundation for higher-level tasks. The integrated network uses a convolutional neural network to extract the feature information of the target, which can capture more image detail and is an advantage over traditional methods.

It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.

Claims

1. A target detection and feature extraction integrated network based on YOLOv3 is characterized by comprising a target detection module, a decoding module and a feature extraction module, wherein the target detection module runs a YOLOv3 algorithm, outputs 3 branch detection results, and integrates 3 branch data by using a non-maximum suppression algorithm to obtain an output result of target detection;

2. The YOLOv3-based target detection and feature extraction integrated network as claimed in claim 1, further comprising:

The maximum value M is the number of targets in the image;

wherein

wherein

3. The YOLOv3-based target detection and feature extraction integrated network as claimed in claim 1, wherein the hypersphere is a 10-dimensional hypersphere.