CN111160115B - Video pedestrian re-identification method based on twin double-flow 3D convolutional neural network - Google Patents
- Publication number: CN111160115B
- Application number: CN201911260938.6A
- Authority
- CN
- China
- Prior art keywords
- pedestrian
- convolutional neural
- neural network
- video
- characteristic diagram
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to a video pedestrian re-identification method based on a twin double-flow 3D convolutional neural network. Each frame of picture of a pedestrian video is extracted, through the hardwired layer of the network, into an optical flow-x characteristic diagram, an optical flow-y characteristic diagram, a gray characteristic diagram, a horizontal coordinate gradient characteristic diagram and a vertical coordinate gradient characteristic diagram. The optical flow characteristic diagrams serve as the input of the motion branch to extract pedestrian motion information, and the remaining characteristic diagrams serve as the input of the appearance branch to extract pedestrian appearance information. The motion information is fused into the extracted appearance information; metric contrast learning is performed on the fused motion and appearance information; network parameters are updated and a new convolutional neural network is trained; finally the target pedestrian image is associated with the pedestrian image to be identified that ranks first in similarity. Compared with the prior art, the method has the advantage of being closer to real-world scenarios.
Description
Technical Field
The invention relates to the field of machine vision based on image processing, in particular to a video pedestrian re-identification method based on a twin double-flow 3D convolutional neural network.
Background
Pedestrian re-identification is the problem of matching persons across non-overlapping cameras, and it has received increasing attention in recent years owing to its importance in automated monitoring systems. Video-based pedestrian re-identification is closer to real-world scenarios: it helps realize urban intelligence, supports safe person search in large public places such as airports, enables automatic search for lost elderly people and children through cameras, and assists public security organs in automatically identifying and tracking criminal suspects.
In many applications, such as cross-camera tracking and pedestrian search, it is desirable to identify a person within a group of people based on appearance information. However, because of low resolution, motion blur, and variations in viewpoint and in the illumination of individual appearance, constructing a discriminative representation that adapts to different camera conditions is very challenging, and matching across non-overlapping camera views in multi-camera systems has therefore attracted increasing interest. For example, when a person moving through a large public space covered by multiple non-overlapping cameras disappears from one view, the target should be re-identified among the same group of people in another view. Although computer vision researchers have devoted great effort to it over the last decade, the person re-identification problem remains largely unsolved. In particular, in busy environments monitored by distant cameras, authenticating a person by facial or gait biometric features is unreliable.
Disclosure of Invention
The invention aims to overcome the defect of low recognition accuracy in busy environments in the prior art, and provides a video pedestrian re-identification method based on a twin double-flow 3D convolutional neural network.
The aim of the invention can be achieved by the following technical scheme:
a video pedestrian re-identification method based on a twin double-flow 3D convolutional neural network comprises the following steps:
step S1: extracting each frame of picture of the pedestrian video into an optical flow-x characteristic diagram, an optical flow-y characteristic diagram, a gray characteristic diagram, a horizontal coordinate gradient characteristic diagram and a vertical coordinate gradient characteristic diagram through the hardwired layer of the twin double-flow 3D convolutional neural network;
step S2: extracting the motion information of the pedestrian by taking the optical flow-x characteristic diagram and the optical flow-y characteristic diagram extracted in the step S1 as the input of the motion branch, and extracting the appearance information of the pedestrian by taking the gray characteristic diagram, the horizontal coordinate gradient characteristic diagram and the vertical coordinate gradient characteristic diagram as the input of the appearance branch;
step S3: fusing the pedestrian motion information extracted in step S2 into the extracted pedestrian appearance information at the second layer of the twin double-flow 3D convolutional neural network;
step S4: performing metric contrast learning on the fused motion information and appearance information at the sixth layer of the twin double-flow 3D convolutional neural network;
step S5: updating network parameters through improved video triplet loss, and training a new convolutional neural network according to the updated network parameters;
step S6: merging the convolutional neural networks trained in step S5 at a fully-connected layer, identifying the target pedestrian image by comparing metric distances, sorting by similarity, and associating the target pedestrian image with the pedestrian image to be identified that ranks first in similarity.
In the step S2, the optical flow-x feature map and the optical flow-y feature map are subjected to optical flow calculation to obtain the change of the motion information of the adjacent 2 frames.
Weight sharing is applied among the convolution kernels of the 3D convolutional neural network, which adds temporal information in the time dimension on the basis of 2D convolution and 2D pooling.
The second layer of the twin double-flow 3D convolutional neural network is the P3 pooling layer.
The sixth layer of the twin double-flow 3D convolutional neural network is the Conv6 convolution layer.
Preferably, the motion information and the appearance information are fused by a Conv method.
The improved video triplet loss comprises not only the inter-video and intra-video pedestrian metric losses, but also the intra-video and inter-video motion metric loss and appearance metric loss of pedestrians, specifically:
where L_VideoTriplet is the video triplet loss; w_ijt^(x,l) is the value of the unit at position (i, j, t) of the x-th feature map in layer l of the positive video sample W of the current video sample V; v_ijt^(x,l) is the value of the unit at position (i, j, t) of the x-th feature map in layer l of the current video sample V; u_ijt^(x,l) is the value of the unit at position (i, j, t) of the x-th feature map in layer l of the negative video sample U; L_action is the total loss of the motion information; L_appearance is the total loss of the appearance information; and δ is the sensitivity of each layer.
The updating of the network parameters specifically comprises the following steps:
ω_new = ω_old - η∇J(ω)
b_new = b_old - η∇J(b)
where ω_new is the updated weight, ω_old is the weight before updating, η is the learning rate (a fixed value, typically 0.05), ∇J(ω) is the residual with respect to the weight, b_new is the updated bias term, b_old is the bias term before updating, and ∇J(b) is the residual with respect to the bias term.
Compared with the prior art, the invention has the following beneficial effects:
1. Video-based pedestrian re-identification is closer to real-world scenarios: it helps realize urban intelligence, supports safe person search in large public places such as airports, enables automatic search for lost elderly people and children through cameras, and assists public security organs in automatically identifying and tracking criminal suspects.
2. The video pedestrian re-identification method based on the twin double-flow 3D convolutional neural network learns motion information and appearance information through the two double-flow 3D convolutional neural networks to complete video pedestrian re-identification with higher efficiency.
3. Through the two fusion stages (first-stage and second-stage fusion), the appearance information and the motion information are combined more thoroughly, so the trained convolutional neural network extracts the motion and appearance information of the pedestrian to be identified more accurately, giving higher identification accuracy.
Drawings
FIG. 1 is a schematic flow chart of the present invention;
FIG. 2 is a diagram illustrating the difference between the 3D convolution and the 2D convolution according to the present invention;
FIG. 3 is a schematic diagram showing the difference between 3D pooling and 2D pooling according to the present invention;
FIG. 4 is a flow chart of the present invention for improving triplet loss.
Detailed Description
The invention will now be described in detail with reference to the drawings and specific examples. The present embodiment is implemented on the premise of the technical scheme of the present invention, and a detailed implementation manner and a specific operation process are given, but the protection scope of the present invention is not limited to the following examples.
As shown in fig. 1, a video pedestrian re-identification method based on a twin double-flow 3D convolutional neural network includes:
step S1: extracting each frame of picture of the pedestrian video 1 and the pedestrian video 2 into an optical flow-x characteristic diagram, an optical flow-y characteristic diagram, a gray characteristic diagram, a horizontal coordinate gradient characteristic diagram and a vertical coordinate gradient characteristic diagram through the hardwired layer of the twin double-flow 3D convolutional neural network;
step S2: extracting the motion information of the pedestrian by taking the optical flow-x characteristic diagram and the optical flow-y characteristic diagram extracted in the step S1 as the input of the motion branch, and extracting the appearance information of the pedestrian by taking the gray characteristic diagram, the horizontal coordinate gradient characteristic diagram and the vertical coordinate gradient characteristic diagram as the input of the appearance branch;
step S3: fusing the pedestrian motion information extracted in step S2 into the extracted pedestrian appearance information at the second layer of the twin double-flow 3D convolutional neural network;
step S4: performing metric contrast learning on the fused motion information and appearance information at the sixth layer of the twin double-flow 3D convolutional neural network;
step S5: updating network parameters through improved video triplet loss, and training a new convolutional neural network according to the updated network parameters;
step S6: merging the convolutional neural networks trained in step S5 at a fully-connected layer, identifying the target pedestrian image by comparing metric distances, sorting by similarity, and associating the target pedestrian image with the pedestrian image to be identified that ranks first in similarity.
Pedestrian video 1 and pedestrian video 2 are the same segment
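The association in step S6 amounts to a nearest-neighbour ranking in the learned feature space. A minimal sketch, assuming the trained network has already produced fixed-length feature vectors (`rank_gallery` is a hypothetical helper, not part of the patent):

```python
import numpy as np

def rank_gallery(query_feat, gallery_feats):
    """Rank gallery pedestrian features by Euclidean metric distance to the
    query; the first index in the returned order is the rank-1 match that
    step S6 associates with the target pedestrian."""
    dists = np.linalg.norm(gallery_feats - query_feat, axis=1)
    order = np.argsort(dists)  # ascending distance = descending similarity
    return order, dists

# Toy 4-dim features: gallery entry 2 is nearly identical to the query.
query = np.array([1.0, 0.0, 0.0, 0.0])
gallery = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],   # near-duplicate of the query
])
order, dists = rank_gallery(query, gallery)
print(order[0])  # → 2 (the rank-1 candidate)
```

In a deployment the gallery would hold one feature vector per candidate pedestrian video, produced by the fused fully-connected layer.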
In step S2, the optical flow-x feature map and the optical flow-y feature map are subjected to optical flow calculation to obtain the change of the motion information of the adjacent 2 frames.
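The five hardwired channels of step S1 can be sketched as below. One hedge: the patent computes genuine optical flow-x/-y maps, whereas this sketch substitutes the temporal difference between adjacent frames as a crude stand-in for the motion change of 2 adjacent frames; `hardwired_channels` is a hypothetical name.

```python
import numpy as np

def hardwired_channels(frames):
    """Split a grayscale clip of shape (T, H, W) into hardwired channels.

    gray + x/y spatial gradients feed the appearance branch; the motion
    branch would receive optical-flow-x/-y maps, approximated here by the
    frame-to-frame difference (one map per adjacent frame pair)."""
    gray = frames
    grad_y, grad_x = np.gradient(frames, axis=(1, 2))  # vertical, horizontal gradients
    flow_proxy = np.diff(frames, axis=0)               # (T-1, H, W) motion change
    return gray, grad_x, grad_y, flow_proxy

clip = np.random.rand(8, 32, 16)            # 8 frames of 32x16 pixels
gray, gx, gy, flow = hardwired_channels(clip)
print(flow.shape)  # (7, 32, 16): one motion map per adjacent frame pair
```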
Weight sharing is carried out between convolution kernels of the 3D convolution neural network, and time information in a time dimension is increased on the basis of 2D convolution and 2D pooling, as shown in fig. 2 and 3.
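The extra temporal dimension can be illustrated with a naive for-loop 3D convolution; `conv3d_valid` is an illustrative helper, not the patent's implementation:

```python
import numpy as np

def conv3d_valid(clip, kernel):
    """Naive 'valid' 3D convolution: the kernel extends over time as well as
    space, so each output unit pools kt consecutive frames -- the temporal
    information that plain 2D convolution lacks."""
    kt, kh, kw = kernel.shape
    T, H, W = clip.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[t, i, j] = np.sum(clip[t:t+kt, i:i+kh, j:j+kw] * kernel)
    return out

clip = np.random.rand(8, 16, 16)
k3d = np.ones((3, 3, 3)) / 27.0   # one shared (weight-tied) averaging kernel
out = conv3d_valid(clip, k3d)
print(out.shape)  # (6, 14, 14): the temporal extent shrinks like the spatial ones
```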
The second layer of the twin double-flow 3D convolutional neural network is the P3 pooling layer.
The sixth layer of the twin double-flow 3D convolutional neural network is the Conv6 convolution layer.
The fusion method in step S3 and step S4 is the Conv method.
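The patent does not spell out the Conv fusion; a minimal sketch, under the assumption that the two streams' feature maps are stacked channel-wise and mixed by a 1x1x1 convolution (`conv_fuse` and the channel counts are illustrative):

```python
import numpy as np

def conv_fuse(appearance, motion, weights):
    """Conv-style fusion: concatenate the appearance and motion feature maps
    along the channel axis, then mix with a 1x1x1 convolution, i.e. a
    per-unit weighted sum across channels. `weights` has shape (C_out, C_in)."""
    stacked = np.concatenate([appearance, motion], axis=0)   # (C_in, T, H, W)
    # a 1x1x1 convolution is a matrix multiply over the channel axis
    return np.tensordot(weights, stacked, axes=([1], [0]))   # (C_out, T, H, W)

app = np.random.rand(4, 5, 8, 8)   # 4 appearance channels, 5 frames, 8x8 maps
mot = np.random.rand(2, 5, 8, 8)   # 2 motion channels
w = np.random.rand(3, 6)           # fuse 6 input channels down to 3
fused = conv_fuse(app, mot, w)
print(fused.shape)  # (3, 5, 8, 8)
```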
As shown in fig. 4, the improved video triplet loss comprises not only the inter-video and intra-video pedestrian metric losses, but also the intra-video and inter-video motion metric loss and appearance metric loss of pedestrians, specifically:
where L_VideoTriplet is the video triplet loss; w_ijt^(x,l) is the value of the unit at position (i, j, t) of the x-th feature map in layer l of the positive video sample W of the current video sample V; v_ijt^(x,l) is the value of the unit at position (i, j, t) of the x-th feature map in layer l of the current video sample V; u_ijt^(x,l) is the value of the unit at position (i, j, t) of the x-th feature map in layer l of the negative video sample U; L_action is the total loss of the motion information; L_appearance is the total loss of the appearance information; and δ is the sensitivity of each layer.
The updating of the network parameters is specifically as follows:
ω_new = ω_old - η∇J(ω)
b_new = b_old - η∇J(b)
where ω_new is the updated weight, ω_old is the weight before updating, η is the learning rate (a fixed value, typically 0.05), ∇J(ω) is the residual with respect to the weight, b_new is the updated bias term, b_old is the bias term before updating, and ∇J(b) is the residual with respect to the bias term.
Positive samples with smaller metric distance are pulled toward the anchor sample and negative samples with larger metric distance are pushed away from it, updating the weights and bias terms each time; weights are shared among samples, and the updated anchor samples are retrained to obtain the new neural network model.
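The pull/push training step above, together with the ω/b update rule, can be sketched as follows; the margin value 0.3 and the function names are illustrative assumptions, not values from the patent:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge triplet loss on video-level features: pull the positive toward
    the anchor and push the negative at least `margin` farther away."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def sgd_step(w, grad, lr=0.05):
    """Parameter update from the text: w_new = w_old - eta * gradJ(w)."""
    return w - lr * grad

a = np.array([1.0, 0.0])   # anchor features
p = np.array([0.9, 0.1])   # positive: already close, so the hinge is inactive
n = np.array([0.0, 1.0])   # negative: already far
loss = triplet_loss(a, p, n)
w = sgd_step(np.array([1.0, 1.0]), np.array([2.0, -2.0]))
print(loss, w)  # loss is 0.0 here; w becomes [0.9, 1.1]
```

In training, a nonzero loss produces gradients that `sgd_step` applies to both weights and bias terms, with the weights shared between the twin branches.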
Claims (8)
1. A video pedestrian re-identification method based on a twin double-flow 3D convolutional neural network is characterized by comprising the following steps:
step S1: extracting each frame of picture of the pedestrian video into an optical flow-x characteristic diagram, an optical flow-y characteristic diagram, a gray characteristic diagram, a horizontal coordinate gradient characteristic diagram and a vertical coordinate gradient characteristic diagram through the hardwired layer of the twin double-flow 3D convolutional neural network;
step S2: extracting the motion information of the pedestrian by taking the optical flow-x characteristic diagram and the optical flow-y characteristic diagram extracted in the step S1 as the input of the motion branch, and extracting the appearance information of the pedestrian by taking the gray characteristic diagram, the horizontal coordinate gradient characteristic diagram and the vertical coordinate gradient characteristic diagram as the input of the appearance branch;
step S3: fusing the pedestrian action information extracted in the step S2 into the extracted pedestrian appearance information in the twin double-flow 3D convolutional neural network;
step S4: the twin double-flow 3D convolutional neural network carries out measurement contrast learning on the action information and the appearance information through fusion;
step S5: updating network parameters through improved video triplet loss, and training a new convolutional neural network according to the updated network parameters;
step S6: and (5) merging the trained convolutional neural networks in the step (S5) at a full-connection layer, identifying target pedestrian images by comparing the measured distances, sorting the similarity, and associating the target pedestrian images with the pedestrian images to be identified with the first similarity rank.
2. The method for re-identifying the video pedestrians based on the twin double-flow 3D convolutional neural network according to claim 1, wherein in the step S2, the optical flow-x characteristic diagram and the optical flow-y characteristic diagram are subjected to optical flow calculation to obtain the change of the motion information of the adjacent 2 frames.
3. The method for re-identifying video pedestrians based on the twin double-flow 3D convolutional neural network according to claim 1, wherein weight sharing is carried out among convolution kernels of the 3D convolutional neural network.
4. The method for re-identifying video pedestrians based on the twin double-flow 3D convolutional neural network according to claim 1, wherein said step S3 is performed at the P3 pooling layer.
5. The method for re-identifying video pedestrians based on the twin double-flow 3D convolutional neural network according to claim 1, wherein said step S4 is performed at the Conv6 convolution layer.
6. The method for re-identifying the video pedestrians based on the twin double-flow 3D convolutional neural network according to claim 1, wherein the action information and the appearance information are fused by a Conv method.
7. The method for re-identifying video pedestrians based on the twin double-flow 3D convolutional neural network according to claim 1, wherein the improved video triplet loss comprises not only the inter-video and intra-video pedestrian metric losses, but also the intra-video and inter-video motion metric loss and appearance metric loss of pedestrians, specifically:
where L_VideoTriplet is the video triplet loss; w_ijt^(x,l) is the value of the unit at position (i, j, t) of the x-th feature map in layer l of the positive video sample W of the current video sample V; v_ijt^(x,l) is the value of the unit at position (i, j, t) of the x-th feature map in layer l of the current video sample V; u_ijt^(x,l) is the value of the unit at position (i, j, t) of the x-th feature map in layer l of the negative video sample U; L_action is the total loss of the motion information; L_appearance is the total loss of the appearance information; and δ is the sensitivity of each layer.
8. The method for re-identifying the video pedestrians based on the twin double-flow 3D convolutional neural network according to claim 1, wherein the updating of the network parameters is specifically as follows:
ω_new = ω_old - η∇J(ω)
b_new = b_old - η∇J(b)
where ω_new is the updated weight, ω_old is the weight before updating, η is the learning rate (a fixed value), ∇J(ω) is the residual with respect to the weight, b_new is the updated bias term, b_old is the bias term before updating, and ∇J(b) is the residual with respect to the bias term.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911260938.6A CN111160115B (en) | 2019-12-10 | 2019-12-10 | Video pedestrian re-identification method based on twin double-flow 3D convolutional neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111160115A CN111160115A (en) | 2020-05-15 |
CN111160115B true CN111160115B (en) | 2023-05-02 |
Family
ID=70556702
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113792594B (en) * | 2021-08-10 | 2024-04-12 | 南京大学 | Method and device for locating language fragments in video based on contrast learning |
CN114998995A (en) * | 2022-06-13 | 2022-09-02 | 西安电子科技大学 | Cross-view-angle gait recognition method based on metric learning and space-time double-flow network |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107146248A (en) * | 2017-04-27 | 2017-09-08 | 杭州电子科技大学 | A kind of solid matching method based on double-current convolutional neural networks |
CN108416266A (en) * | 2018-01-30 | 2018-08-17 | 同济大学 | A kind of video behavior method for quickly identifying extracting moving target using light stream |
CN109241834A (en) * | 2018-07-27 | 2019-01-18 | 中山大学 | A kind of group behavior recognition methods of the insertion based on hidden variable |
WO2019144575A1 (en) * | 2018-01-24 | 2019-08-01 | 中山大学 | Fast pedestrian detection method and device |
CN110084228A (en) * | 2019-06-25 | 2019-08-02 | 江苏德劭信息科技有限公司 | A kind of hazardous act automatic identifying method based on double-current convolutional neural networks |
Non-Patent Citations (1)
Title |
---|
孙鹏; 于彤; 冯鹏定; 蒋庄浩; 魏丹妮; 单大国. Video object color correction method under changing scene conditions. Journal of Criminal Investigation Police University of China, 2019, No. 02, full text. *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||