CN111160115B - Video pedestrian re-identification method based on twin double-flow 3D convolutional neural network - Google Patents
- Publication number: CN111160115B
- Application number: CN201911260938.6A
- Authority
- CN
- China
- Prior art keywords
- pedestrian
- convolutional neural
- neural network
- video
- characteristic diagram
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to a video pedestrian re-identification method based on a twin double-flow 3D convolutional neural network. Each frame of picture of a pedestrian video is extracted, through the hardwired layer of the network, into an optical flow-x characteristic diagram, an optical flow-y characteristic diagram, a gray characteristic diagram, a horizontal coordinate gradient characteristic diagram and a vertical coordinate gradient characteristic diagram. The optical flow characteristic diagrams serve as the input of the motion branch to extract pedestrian motion information, and the remaining characteristic diagrams serve as the input of the appearance branch to extract pedestrian appearance information. The motion information is fused into the extracted appearance information; metric contrast learning is performed on the fused motion and appearance information; network parameters are updated and a new convolutional neural network is trained; finally the target pedestrian image is associated with the pedestrian image to be identified that ranks first in similarity. Compared with the prior art, the method has the advantage of being closer to real-world scenarios.
Description
Technical Field
The invention relates to the field of machine vision based on image processing, in particular to a video pedestrian re-identification method based on a twin double-flow 3D convolutional neural network.
Background
Pedestrian re-identification is the problem of matching persons across non-overlapping cameras, and it has received increasing attention in recent years owing to its importance in automated monitoring systems. Video-based pedestrian re-identification is closer to real-world scenarios: it helps realize urban intelligence, supports safe person search in large public places such as airports, enables automatic search for lost elderly people and children through cameras, and assists public security organs in automatically identifying and tracking criminal suspects.
In many applications, such as cross-camera tracking and pedestrian search, it is desirable to identify a person within a group of people based on appearance information. However, because of low resolution, motion blur, and variations in viewpoint and in the illumination of individual appearance, constructing a discriminative representation that adapts to different camera conditions is very challenging, and matching across non-overlapping camera views in multi-camera systems has therefore attracted increasing interest. For example, when a person moving through a large public space covered by multiple non-overlapping cameras disappears from one view, the target should be re-identified among the same group of people in another view. Although computer vision researchers have devoted great effort to it over the last decade, the person re-identification problem remains largely unsolved. In particular, in busy environments monitored by distant cameras, authenticating a person by facial or gait biometric features is unreliable.
Disclosure of Invention
The invention aims to overcome the defect of low recognition accuracy in busy environments in the prior art, and provides a video pedestrian re-identification method based on a twin double-flow 3D convolutional neural network.
The aim of the invention can be achieved by the following technical scheme:
a video pedestrian re-identification method based on a twin double-flow 3D convolutional neural network comprises the following steps:
step S1: extracting each frame of picture of the pedestrian video into an optical flow-x characteristic diagram, an optical flow-y characteristic diagram, a gray characteristic diagram, a horizontal coordinate gradient characteristic diagram and a vertical coordinate gradient characteristic diagram through the hardwired layer of the twin double-flow 3D convolutional neural network;
step S2: extracting the motion information of the pedestrian by taking the optical flow-x characteristic diagram and the optical flow-y characteristic diagram extracted in the step S1 as the input of the motion branch, and extracting the appearance information of the pedestrian by taking the gray characteristic diagram, the horizontal coordinate gradient characteristic diagram and the vertical coordinate gradient characteristic diagram as the input of the appearance branch;
step S3: fusing the pedestrian motion information extracted in step S2 into the extracted pedestrian appearance information at the second layer of the twin double-flow 3D convolutional neural network;
step S4: performing metric contrast learning on the fused motion information and appearance information at the sixth layer of the twin double-flow 3D convolutional neural network;
step S5: updating network parameters through improved video triplet loss, and training a new convolutional neural network according to the updated network parameters;
step S6: merging the convolutional neural networks trained in step S5 at a fully-connected layer, identifying the target pedestrian image by comparing metric distances, sorting by similarity, and associating the target pedestrian image with the pedestrian image to be identified that ranks first in similarity.
In the step S2, the optical flow-x feature map and the optical flow-y feature map are subjected to optical flow calculation to obtain the change of the motion information of the adjacent 2 frames.
Weight sharing is applied among the convolution kernels of the 3D convolutional neural network, which adds temporal information in the time dimension on the basis of 2D convolution and 2D pooling.
The second layer of the twin double-flow 3D convolutional neural network is the P3 pooling layer.
The sixth layer of the twin double-flow 3D convolutional neural network is the Conv6 convolution layer.
Preferably, the motion information and the appearance information are fused by a Conv method.
The improved video triplet loss comprises not only the inter-video and intra-video pedestrian metric losses, but also the intra-video and inter-video motion metric loss and appearance metric loss of pedestrians, specifically:
where L_VideoTriplet is the video triplet loss; w_ijt^(x,l) is the value of the unit at position (i, j, t) of the x-th feature map in layer l of the positive video sample W of the current video sample V; v_ijt^(x,l) is the value of the unit at position (i, j, t) of the x-th feature map in layer l of the current video sample V; u_ijt^(x,l) is the value of the unit at position (i, j, t) of the x-th feature map in layer l of the negative video sample U; L_action is the total loss of the motion information; L_appearance is the total loss of the appearance information; and δ is the sensitivity of each layer.
The updating of the network parameters specifically comprises the following steps:
ω_new = ω_old - η∇J(ω)
b_new = b_old - η∇J(b)
where ω_new is the updated weight, ω_old is the weight before updating, η is the learning rate (a fixed value, typically 0.05), ∇J(ω) is the residual with respect to the weight, b_new is the updated bias term, b_old is the bias term before updating, and ∇J(b) is the residual with respect to the bias term.
Compared with the prior art, the invention has the following beneficial effects:
1. Video-based pedestrian re-identification is closer to real-world scenarios: it helps realize urban intelligence, supports safe person search in large public places such as airports, enables automatic search for lost elderly people and children through cameras, and assists public security organs in automatically identifying and tracking criminal suspects.
2. The video pedestrian re-identification method based on the twin double-flow 3D convolutional neural network learns motion information and appearance information through the two double-flow 3D convolutional neural networks to complete video pedestrian re-identification with higher efficiency.
3. Through the two fusion stages (first-stage and second-stage fusion), the appearance information and the motion information are combined more thoroughly, so the trained convolutional neural network extracts the motion and appearance information of the pedestrian to be identified more accurately, giving higher identification accuracy.
Drawings
FIG. 1 is a schematic flow chart of the present invention;
FIG. 2 is a diagram illustrating the difference between the 3D convolution and the 2D convolution according to the present invention;
FIG. 3 is a schematic diagram showing the difference between 3D pooling and 2D pooling according to the present invention;
FIG. 4 is a flow chart of the present invention for improving triplet loss.
Detailed Description
The invention will now be described in detail with reference to the drawings and specific examples. The present embodiment is implemented on the premise of the technical scheme of the present invention, and a detailed implementation manner and a specific operation process are given, but the protection scope of the present invention is not limited to the following examples.
As shown in fig. 1, a video pedestrian re-identification method based on a twin double-flow 3D convolutional neural network includes:
step S1: extracting each frame of picture of the pedestrian video 1 and the pedestrian video 2 into an optical flow-x characteristic diagram, an optical flow-y characteristic diagram, a gray characteristic diagram, a horizontal coordinate gradient characteristic diagram and a vertical coordinate gradient characteristic diagram through the hardwired layer of the twin double-flow 3D convolutional neural network;
step S2: extracting the motion information of the pedestrian by taking the optical flow-x characteristic diagram and the optical flow-y characteristic diagram extracted in the step S1 as the input of the motion branch, and extracting the appearance information of the pedestrian by taking the gray characteristic diagram, the horizontal coordinate gradient characteristic diagram and the vertical coordinate gradient characteristic diagram as the input of the appearance branch;
step S3: fusing the pedestrian motion information extracted in step S2 into the extracted pedestrian appearance information at the second layer of the twin double-flow 3D convolutional neural network;
step S4: performing metric contrast learning on the fused motion information and appearance information at the sixth layer of the twin double-flow 3D convolutional neural network;
step S5: updating network parameters through improved video triplet loss, and training a new convolutional neural network according to the updated network parameters;
step S6: merging the convolutional neural networks trained in step S5 at a fully-connected layer, identifying the target pedestrian image by comparing metric distances, sorting by similarity, and associating the target pedestrian image with the pedestrian image to be identified that ranks first in similarity.
Pedestrian video 1 and pedestrian video 2 are the same segment
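The association in step S6 amounts to a nearest-neighbour ranking in the learned feature space. A minimal sketch, assuming the trained network has already produced fixed-length feature vectors (`rank_gallery` is a hypothetical helper, not part of the patent):

```python
import numpy as np

def rank_gallery(query_feat, gallery_feats):
    """Rank gallery pedestrian features by Euclidean metric distance to the
    query; the first index in the returned order is the rank-1 match that
    step S6 associates with the target pedestrian."""
    dists = np.linalg.norm(gallery_feats - query_feat, axis=1)
    order = np.argsort(dists)  # ascending distance = descending similarity
    return order, dists

# Toy 4-dim features: gallery entry 2 is nearly identical to the query.
query = np.array([1.0, 0.0, 0.0, 0.0])
gallery = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],   # near-duplicate of the query
])
order, dists = rank_gallery(query, gallery)
print(order[0])  # → 2 (the rank-1 candidate)
```

In a deployment the gallery would hold one feature vector per candidate pedestrian video, produced by the fused fully-connected layer.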
In step S2, the optical flow-x feature map and the optical flow-y feature map are subjected to optical flow calculation to obtain the change of the motion information of the adjacent 2 frames.
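The five hardwired channels of step S1 can be sketched as below. One hedge: the patent computes genuine optical flow-x/-y maps, whereas this sketch substitutes the temporal difference between adjacent frames as a crude stand-in for the motion change of 2 adjacent frames; `hardwired_channels` is a hypothetical name.

```python
import numpy as np

def hardwired_channels(frames):
    """Split a grayscale clip of shape (T, H, W) into hardwired channels.

    gray + x/y spatial gradients feed the appearance branch; the motion
    branch would receive optical-flow-x/-y maps, approximated here by the
    frame-to-frame difference (one map per adjacent frame pair)."""
    gray = frames
    grad_y, grad_x = np.gradient(frames, axis=(1, 2))  # vertical, horizontal gradients
    flow_proxy = np.diff(frames, axis=0)               # (T-1, H, W) motion change
    return gray, grad_x, grad_y, flow_proxy

clip = np.random.rand(8, 32, 16)            # 8 frames of 32x16 pixels
gray, gx, gy, flow = hardwired_channels(clip)
print(flow.shape)  # (7, 32, 16): one motion map per adjacent frame pair
```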
Weight sharing is carried out between convolution kernels of the 3D convolution neural network, and time information in a time dimension is increased on the basis of 2D convolution and 2D pooling, as shown in fig. 2 and 3.
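The extra temporal dimension can be illustrated with a naive for-loop 3D convolution; `conv3d_valid` is an illustrative helper, not the patent's implementation:

```python
import numpy as np

def conv3d_valid(clip, kernel):
    """Naive 'valid' 3D convolution: the kernel extends over time as well as
    space, so each output unit pools kt consecutive frames -- the temporal
    information that plain 2D convolution lacks."""
    kt, kh, kw = kernel.shape
    T, H, W = clip.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[t, i, j] = np.sum(clip[t:t+kt, i:i+kh, j:j+kw] * kernel)
    return out

clip = np.random.rand(8, 16, 16)
k3d = np.ones((3, 3, 3)) / 27.0   # one shared (weight-tied) averaging kernel
out = conv3d_valid(clip, k3d)
print(out.shape)  # (6, 14, 14): the temporal extent shrinks like the spatial ones
```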
The second layer of the twin double-flow 3D convolutional neural network is the P3 pooling layer.
The sixth layer of the twin double-flow 3D convolutional neural network is the Conv6 convolution layer.
The fusion method in step S3 and step S4 is the Conv method.
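The patent does not spell out the Conv fusion; a minimal sketch, under the assumption that the two streams' feature maps are stacked channel-wise and mixed by a 1x1x1 convolution (`conv_fuse` and the channel counts are illustrative):

```python
import numpy as np

def conv_fuse(appearance, motion, weights):
    """Conv-style fusion: concatenate the appearance and motion feature maps
    along the channel axis, then mix with a 1x1x1 convolution, i.e. a
    per-unit weighted sum across channels. `weights` has shape (C_out, C_in)."""
    stacked = np.concatenate([appearance, motion], axis=0)   # (C_in, T, H, W)
    # a 1x1x1 convolution is a matrix multiply over the channel axis
    return np.tensordot(weights, stacked, axes=([1], [0]))   # (C_out, T, H, W)

app = np.random.rand(4, 5, 8, 8)   # 4 appearance channels, 5 frames, 8x8 maps
mot = np.random.rand(2, 5, 8, 8)   # 2 motion channels
w = np.random.rand(3, 6)           # fuse 6 input channels down to 3
fused = conv_fuse(app, mot, w)
print(fused.shape)  # (3, 5, 8, 8)
```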
As shown in fig. 4, the improved video triplet loss comprises not only the inter-video and intra-video pedestrian metric losses, but also the intra-video and inter-video motion metric loss and appearance metric loss of pedestrians, specifically:
where L_VideoTriplet is the video triplet loss; w_ijt^(x,l) is the value of the unit at position (i, j, t) of the x-th feature map in layer l of the positive video sample W of the current video sample V; v_ijt^(x,l) is the value of the unit at position (i, j, t) of the x-th feature map in layer l of the current video sample V; u_ijt^(x,l) is the value of the unit at position (i, j, t) of the x-th feature map in layer l of the negative video sample U; L_action is the total loss of the motion information; L_appearance is the total loss of the appearance information; and δ is the sensitivity of each layer.
The updating of the network parameters is specifically as follows:
ω_new = ω_old - η∇J(ω)
b_new = b_old - η∇J(b)
where ω_new is the updated weight, ω_old is the weight before updating, η is the learning rate (a fixed value, typically 0.05), ∇J(ω) is the residual with respect to the weight, b_new is the updated bias term, b_old is the bias term before updating, and ∇J(b) is the residual with respect to the bias term.
Positive samples with smaller metric distance are pulled toward the anchor sample and negative samples with larger metric distance are pushed away from it, updating the weights and bias terms each time; weights are shared among samples, and the updated anchor samples are retrained to obtain the new neural network model.
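The pull/push training step above, together with the ω/b update rule, can be sketched as follows; the margin value 0.3 and the function names are illustrative assumptions, not values from the patent:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge triplet loss on video-level features: pull the positive toward
    the anchor and push the negative at least `margin` farther away."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def sgd_step(w, grad, lr=0.05):
    """Parameter update from the text: w_new = w_old - eta * gradJ(w)."""
    return w - lr * grad

a = np.array([1.0, 0.0])   # anchor features
p = np.array([0.9, 0.1])   # positive: already close, so the hinge is inactive
n = np.array([0.0, 1.0])   # negative: already far
loss = triplet_loss(a, p, n)
w = sgd_step(np.array([1.0, 1.0]), np.array([2.0, -2.0]))
print(loss, w)  # loss is 0.0 here; w becomes [0.9, 1.1]
```

In training, a nonzero loss produces gradients that `sgd_step` applies to both weights and bias terms, with the weights shared between the twin branches.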
Claims (8)
1. A video pedestrian re-identification method based on a twin double-flow 3D convolutional neural network is characterized by comprising the following steps:
step S1: extracting each frame of picture of the pedestrian video into an optical flow-x characteristic diagram, an optical flow-y characteristic diagram, a gray characteristic diagram, a horizontal coordinate gradient characteristic diagram and a vertical coordinate gradient characteristic diagram through the hardwired layer of the twin double-flow 3D convolutional neural network;
step S2: extracting the motion information of the pedestrian by taking the optical flow-x characteristic diagram and the optical flow-y characteristic diagram extracted in the step S1 as the input of the motion branch, and extracting the appearance information of the pedestrian by taking the gray characteristic diagram, the horizontal coordinate gradient characteristic diagram and the vertical coordinate gradient characteristic diagram as the input of the appearance branch;
step S3: fusing the pedestrian action information extracted in the step S2 into the extracted pedestrian appearance information in the twin double-flow 3D convolutional neural network;
step S4: the twin double-flow 3D convolutional neural network carries out measurement contrast learning on the action information and the appearance information through fusion;
step S5: updating network parameters through improved video triplet loss, and training a new convolutional neural network according to the updated network parameters;
step S6: and (5) merging the trained convolutional neural networks in the step (S5) at a full-connection layer, identifying target pedestrian images by comparing the measured distances, sorting the similarity, and associating the target pedestrian images with the pedestrian images to be identified with the first similarity rank.
2. The method for re-identifying the video pedestrians based on the twin double-flow 3D convolutional neural network according to claim 1, wherein in the step S2, the optical flow-x characteristic diagram and the optical flow-y characteristic diagram are subjected to optical flow calculation to obtain the change of the motion information of the adjacent 2 frames.
3. The method for re-identifying video pedestrians based on the twin double-flow 3D convolutional neural network according to claim 1, wherein weight sharing is carried out among convolution kernels of the 3D convolutional neural network.
4. The method for re-identifying video pedestrians based on the twin double-flow 3D convolutional neural network according to claim 1, wherein said step S3 is performed at the P3 pooling layer.
5. The method for re-identifying video pedestrians based on the twin double-flow 3D convolutional neural network according to claim 1, wherein said step S4 is performed at the Conv6 convolution layer.
6. The method for re-identifying the video pedestrians based on the twin double-flow 3D convolutional neural network according to claim 1, wherein the action information and the appearance information are fused by a Conv method.
7. The method for re-identifying video pedestrians based on the twin double-flow 3D convolutional neural network according to claim 1, wherein the improved video triplet loss comprises not only the inter-video and intra-video pedestrian metric losses, but also the intra-video and inter-video motion metric loss and appearance metric loss of pedestrians, specifically:
where L_VideoTriplet is the video triplet loss; w_ijt^(x,l) is the value of the unit at position (i, j, t) of the x-th feature map in layer l of the positive video sample W of the current video sample V; v_ijt^(x,l) is the value of the unit at position (i, j, t) of the x-th feature map in layer l of the current video sample V; u_ijt^(x,l) is the value of the unit at position (i, j, t) of the x-th feature map in layer l of the negative video sample U; L_action is the total loss of the motion information; L_appearance is the total loss of the appearance information; and δ is the sensitivity of each layer.
8. The method for re-identifying the video pedestrians based on the twin double-flow 3D convolutional neural network according to claim 1, wherein the updating of the network parameters is specifically as follows:
ω_new = ω_old - η∇J(ω)
b_new = b_old - η∇J(b)
where ω_new is the updated weight, ω_old is the weight before updating, η is the learning rate (a fixed value), ∇J(ω) is the residual with respect to the weight, b_new is the updated bias term, b_old is the bias term before updating, and ∇J(b) is the residual with respect to the bias term.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911260938.6A CN111160115B (en) | 2019-12-10 | 2019-12-10 | Video pedestrian re-identification method based on twin double-flow 3D convolutional neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111160115A CN111160115A (en) | 2020-05-15 |
CN111160115B true CN111160115B (en) | 2023-05-02 |
Family
ID=70556702
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113792594B (en) * | 2021-08-10 | 2024-04-12 | 南京大学 | Method and device for locating language fragments in video based on contrast learning |
CN114998995A (en) * | 2022-06-13 | 2022-09-02 | 西安电子科技大学 | Cross-view-angle gait recognition method based on metric learning and space-time double-flow network |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107146248A (en) * | 2017-04-27 | 2017-09-08 | 杭州电子科技大学 | A kind of solid matching method based on double-current convolutional neural networks |
CN108416266A (en) * | 2018-01-30 | 2018-08-17 | 同济大学 | A kind of video behavior method for quickly identifying extracting moving target using light stream |
CN109241834A (en) * | 2018-07-27 | 2019-01-18 | 中山大学 | A kind of group behavior recognition methods of the insertion based on hidden variable |
WO2019144575A1 (en) * | 2018-01-24 | 2019-08-01 | 中山大学 | Fast pedestrian detection method and device |
CN110084228A (en) * | 2019-06-25 | 2019-08-02 | 江苏德劭信息科技有限公司 | A kind of hazardous act automatic identifying method based on double-current convolutional neural networks |
Non-Patent Citations (1)
Title |
---|
孙鹏; 于彤; 冯鹏定; 蒋庄浩; 魏丹妮; 单大国. Video object color correction method under changing scene conditions. Journal of Criminal Investigation Police University of China, 2019, No. 02, full text. *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||