CN116597514A - Pedestrian gait recognition method and system based on Triplet network

Info

Publication number
CN116597514A
CN116597514A CN202310563902.5A CN202310563902A CN116597514A CN 116597514 A CN116597514 A CN 116597514A CN 202310563902 A CN202310563902 A CN 202310563902A CN 116597514 A CN116597514 A CN 116597514A
Authority
CN
China
Prior art keywords
gait
data
pedestrian
triplet
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310563902.5A
Other languages
Chinese (zh)
Inventor
刘寿强
蒋明月
苏琳莹
曾熙茵
张路明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China Normal University
Original Assignee
South China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/23Recognition of whole body movements, e.g. for sport training
    • G06V40/25Recognition of walking or running movements, e.g. gait recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The application discloses a pedestrian gait recognition method and system based on a Triplet network. The method comprises the following steps: acquiring the CASIA-B gait contour data set, preprocessing the data, and obtaining a gait energy diagram; training a metric-learning-based Triplet network on the training set, and outputting the trained metric-learning Triplet network; testing the trained metric-learning Triplet network on the test set, and calculating the feature distance from the test result to obtain a calculation result; and sorting the calculation result in ascending order to obtain the pedestrian gait recognition result. The application generates gait energy diagrams from the CASIA-B gait data set, adopts a VGG network as the feature extraction network, and uses the Triplet network so that samples are better separated. The method and system can be widely applied in the technical field of pedestrian gait feature recognition.

Description

Pedestrian gait recognition method and system based on Triplet network
Technical Field
The application relates to the technical field of pedestrian gait feature recognition, in particular to a pedestrian gait recognition method and system based on a Triplet network.
Background
Gait recognition classifies and identifies people by their walking posture, and has applications in identity recognition, abnormal behavior detection, medical treatment and other fields. In recent years the field has developed rapidly, and more and more researchers use deep neural networks for gait recognition. Current deep-learning-based gait recognition methods fall into two main categories: discriminative methods and generative methods. There are two types of discriminative methods; in both, let x be a gait sequence sample or a gait template. The first type is feature learning: a feature learning network (based on a deep neural network) models a projection f that yields, in a low-dimensional Euclidean space, a feature representation z = f(x) that is independent of the covariates of x, where z is generated from the acquired input x. With feature learning, the low-dimensional features of the verification set can be stored conveniently, which reduces the computation needed for feature matching and makes the method suitable for clustering or retrieval tasks. The second type learns a similarity function C(x_p, x_g) between samples, where x_p is the enrollment sample and x_g is the verification sample.
Compared with the former, this method must use a network to combine the features of each pair of samples, so it has higher computational complexity at evaluation time; its advantage is that the learned similarity function can be used directly for one-to-one identity verification. Generative gait recognition methods first acquire the input gait features, then convert them into another state, and then perform feature extraction or matching. For example, for multi-view gait recognition, a generative method first uses an encoder to encode the gait features under various views, then uses a feature transformation network to convert the encoded features to the verification-set view or a conventional view, and finally uses a decoder to transform the features back;
Besides the gait recognition algorithm, the database used to train the model is an important factor affecting model performance. Even with the same algorithm, training on different databases does not necessarily give the same results; in general, a database with broad coverage, large scale and varied types trains a better model. Early gait recognition databases were all small. Internationally, the UCSD gait recognition database, created in 1998 by the University of California, San Diego (UCSD), contains 6 people with 7 video sequences each. The earliest gait database in China is CASIA Dataset A, created in 2001 by the Institute of Automation, Chinese Academy of Sciences; it contains 20 people, each with 12 sequences shot outdoors from three angles (0°, 45° and 90°), for 240 sequences in total. The Institute of Automation subsequently created the CASIA-B multi-view database and the CASIA-C infrared database. CASIA-B was collected in January 2005 and contains 124 people; for each person there are gait video sequences from 11 views (0°, 18°, 36°, …, 180°) under three conditions: normal walking, wearing a coat, and carrying a bag. Many researchers use the CASIA-B database for gait recognition research and applications. Other databases usable for gait recognition research include OU-ISIR LP, USF, HID-UMD 1 and 2, OU-ISIR Treadmill, CMU MoBo, UMST, AVAMVG, and the like;
However, the gait data handled by existing gait recognition methods are complex: they result from interactions among many factors present in the data, such as occlusion, camera view, individual appearance, sequence order, body part movement and light sources. These factors can interact in complex ways and complicate the recognition task. Recently, more and more methods in other research fields, such as face recognition, action recognition, emotion recognition and pose estimation, have focused on learning decomposed (disentangled) features by extracting representations of the various explanatory factors in the high-dimensional data space. Most existing deep gait recognition methods, however, have not explored decomposition, and therefore cannot explicitly separate the underlying structure of gait data into meaningful disjoint variables. Although some gait recognition methods have recently made progress using decomposition, there is still room for improvement;
Existing databases also have shortcomings. Most contain few samples while gait features are high-dimensional, which easily causes overfitting and degrades model performance. Most consist of pictures or videos shot horizontally, with no depression angle, whereas in practice imaging equipment usually captures pedestrian gait from a height and therefore at a certain depression angle. Real environments also differ greatly from laboratory environments: obstacles, illumination, clothing, carried objects, background, other pedestrians and so on all have an influence, whereas the data in existing databases were collected in clear environments without interfering objects, and with little variation in the pedestrians' carried objects and clothing.
Disclosure of Invention
In order to solve the technical problems, the application aims to provide a pedestrian gait recognition method and system based on a Triplet network, which can generate a gait energy diagram by utilizing a CASIA-B gait data set, then adopt a VGG network as a feature extraction network, and simultaneously use the Triplet network to enable a sample to achieve better differentiation.
The first technical scheme adopted by the application is as follows: a pedestrian gait recognition method based on a Triplet network comprises the following steps:
acquiring a CASIA-B gait outline data set, performing data preprocessing, and acquiring a gait energy diagram;
training a metric-learning-based Triplet network on the training set in the gait energy diagram until the network meets a preset condition, and outputting the trained metric-learning Triplet network;
testing the trained metric learning Triplet network based on a test set in the gait energy diagram, and calculating the characteristic distance based on a test result to obtain a calculation result;
and performing ascending sort processing on the calculation result to obtain the pedestrian gait recognition result.
Further, the step of acquiring a CASIA-B gait contour data set and preprocessing the data to acquire a gait energy map specifically includes:
acquiring a CASIA-B gait contour data set and storing the CASIA-B gait contour data set into a perdata file, wherein the pedestrian state of the CASIA-B gait contour data set comprises normal walking, knapsack and wearing overcoat, and the CASIA-B gait contour data set comprises a training set and a testing set;
normalizing the CASIA-B gait contour data set by a min-max normalization method to obtain a normalized pedestrian gait image;
acquiring the pedestrian gait cycle of the normalized pedestrian gait image by a gait cycle detection method, wherein the pedestrian gait cycle is the time interval between one heel strike of the pedestrian's left (or right) foot and the next heel strike of the same foot;
synthesizing the normalized gait images of the pedestrians according to the gait cycle of the pedestrians to obtain a preliminary gait energy diagram;
and assigning data labels to the preliminary gait energy diagram through a sampler to obtain the gait energy diagram.
Further, the step of obtaining the gait energy map by applying a data tag to the preliminary gait energy map through the sampler specifically includes:
based on the data set processing interface torch.utils.data.DataLoader in PyTorch, sampling the gait energy diagram with a torch.utils.data.Sampler to acquire preset data;
and passing the preset data to collate_fn, and performing secondary processing on the training set and the test set through collate_fn_for_train and collate_fn_for_test respectively to obtain the gait energy diagram.
Further, the step of training the metric-learning-based Triplet network on the training set in the gait energy diagram until the network meets the preset condition, and outputting the trained metric-learning Triplet network, specifically includes:
performing feature extraction processing on a training set in the gait energy diagram through a VGG convolutional neural network GaitSetNet to obtain gait energy diagram feature data;
the VGG convolutional neural network GaitSetNet comprises four basic convolutional layer structures of set_layer1, set_layer2, set_layer3, set_layer4 and two downsampling layer structures of set_layer1_down and set_layer2_down;
based on a Ranger optimizer, carrying out RAdam algorithm optimization processing on gait energy diagram feature data, calculating gradient, optimizing variance by using an exponential weighted average method, improving k parameters by using a reverse feedback verification method, improving learning rate by using a backward fitting method, and carrying out Lookahead algorithm optimization on a data set to obtain an optimized gait energy diagram;
defining an optimized gait energy diagram to be divided into a reference sample, a positive sample and a negative sample, wherein the positive sample and the reference sample do not belong to the same sample but belong to the same pedestrian, and the negative sample and the reference sample do not belong to the same pedestrian;
and performing ternary loss optimization processing on the reference sample, the positive sample and the negative sample through the metric-learning-based Triplet network until the preset condition is met, and outputting the trained metric-learning Triplet network.
Further, the expression of the preset condition is:
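$$D(x_i^a, x_i^p) + m < D(x_i^a, x_i^n), \quad \forall\, (x_i^a, x_i^p, x_i^n) \in \tau$$
in the above formula, D(·,·) denotes the feature distance; m is a selectable threshold representing the minimum gap between the distance from $x_i^a$ to $x_i^p$ and the distance from $x_i^a$ to $x_i^n$; $x_i^a$ represents the training (reference) sample, $x_i^p$ represents the sample predicted as positive, $x_i^n$ represents the sample predicted as negative, and $\tau$ represents the set of all triplets.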
Further, the step of performing ternary loss function optimization processing on the reference sample, the positive sample and the negative sample through the Triplet network based on metric learning specifically comprises the following steps:
defining a ternary loss class, and constructing an input sample based on a reference sample, a positive sample and a negative sample;
based on the ternary loss class, inputting labels in a sample, and performing free combination of every two labels to generate a label matrix;
obtaining a mask of a positive sample pair and a negative sample pair from a label matrix, and performing free combination on every two groups of features in an input sample to generate a feature matrix;
calculating Euclidean distance of the feature matrix, and extracting the feature matrix with the distance according to the mask of the positive and negative sample pairs to obtain the distance between the positive and negative labels;
taking the difference between the positive-pair distance and the negative-pair distance and applying the interval (margin) value to obtain the ternary loss.
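The steps above (label matrix, positive and negative masks, pairwise Euclidean distances, masked difference with a margin) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the application's code: the function name `batch_all_triplet_loss`, the hinge form `max(D(a, p) - D(a, n) + margin, 0)`, and the choice to average over only the non-zero terms are all assumptions.

```python
import numpy as np

def batch_all_triplet_loss(features, labels, margin=0.2):
    """Mean hinge loss over all valid (anchor, positive, negative) triplets in a batch."""
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)
    # Label matrix: entry (i, j) is True when samples i and j share a label.
    same = labels[:, None] == labels[None, :]
    pos_mask = same & ~np.eye(len(labels), dtype=bool)  # positive pairs, i != j
    neg_mask = ~same                                    # negative pairs
    # Pairwise Euclidean distance matrix between all feature vectors.
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # loss[a, p, n] = D(a, p) - D(a, n) + margin for every index combination.
    loss = dist[:, :, None] - dist[:, None, :] + margin
    valid = pos_mask[:, :, None] & neg_mask[:, None, :]
    loss = np.maximum(loss, 0.0) * valid
    # Assumption: average over triplets that still violate the margin.
    active = np.count_nonzero(loss > 0)
    return loss.sum() / active if active else 0.0

feats = [[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [1.1, 0.0]]
separated = batch_all_triplet_loss(feats, [0, 0, 1, 1])  # identities well separated
mixed = batch_all_triplet_loss(feats, [0, 1, 0, 1])      # identities interleaved
```

In the first call every positive pair is much closer than every negative pair, so no triplet violates the margin; in the second the labels are interleaved and every triplet contributes.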
Further, the ternary loss class distinguishes simple triplets, general triplets and difficult triplets, wherein:
constructing a positive pair <a, p> and a negative pair <a, n>;
when the distance of the positive pair is far smaller than that of the negative pair, selecting a simple triplet for optimization training;
when the difference between the distance of the positive pair and the distance of the negative pair is lower than a preset threshold, selecting a general triplet for optimization training;
and when the difference between the distance of the positive pair and the distance of the negative pair is larger than a preset threshold, selecting a difficult triplet for optimization training.
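One common way to make these three categories concrete is the easy / semi-hard / hard convention used in triplet-loss literature. The sketch below follows that convention and assumes the preset threshold is the margin m; the function name `triplet_category` is hypothetical, and the application's own thresholds may be defined differently.

```python
def triplet_category(d_ap: float, d_an: float, margin: float) -> str:
    """Classify a triplet from its positive-pair and negative-pair distances."""
    if d_an > d_ap + margin:
        return "simple"     # margin already satisfied, so the hinge loss is zero
    if d_an > d_ap:
        return "general"    # correct ordering, but the negative is inside the margin
    return "difficult"      # the negative is at least as close as the positive

examples = [
    triplet_category(0.2, 1.0, margin=0.3),
    triplet_category(0.5, 0.6, margin=0.3),
    triplet_category(0.8, 0.5, margin=0.3),
]
```

Under this reading, only general and difficult triplets produce a non-zero loss and therefore drive the optimization.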
Further, the expression of the ternary loss function is:
in the above equation, N represents the size of the training data set.
Further, the step of testing the trained metric learning Triplet network based on the test set in the gait energy diagram specifically includes:
loading a model file, designing a function for measuring Euclidean distance and calculating multi-angle accuracy, randomly selecting partial data from a training set and a testing set as acquisition data and comparison data, and acquiring acquisition data characteristics and comparison data characteristics;
calculating the distance between the acquired data features and the comparison data features;
sorting the distances in ascending order, and returning the indices of the 5 smallest distances;
taking the first 5 labels from the comparison data according to the indices and comparing them with the labels in the acquired data;
and accumulating the number of correct comparison results, so that each sample corresponds to 5 records, which respectively represent the number of correct identifications within the first 1, 2, 3, 4 and 5 results.
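The test procedure above can be sketched in NumPy as follows. The names `cumulative_match_counts`, `probe_*` (the acquired data) and `gallery_*` (the comparison data) are assumptions made for illustration; features are taken as already extracted by the trained network.

```python
import numpy as np

def cumulative_match_counts(probe_feats, probe_labels, gallery_feats, gallery_labels, top_k=5):
    """For each probe, count correct labels among its 1..top_k nearest gallery samples."""
    probe_feats = np.asarray(probe_feats, dtype=np.float64)
    gallery_feats = np.asarray(gallery_feats, dtype=np.float64)
    probe_labels = np.asarray(probe_labels)
    gallery_labels = np.asarray(gallery_labels)
    # Euclidean distance from every probe feature to every gallery feature.
    diff = probe_feats[:, None, :] - gallery_feats[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Ascending sort; keep the indices of the top_k smallest distances.
    nearest = np.argsort(dist, axis=1)[:, :top_k]
    hits = gallery_labels[nearest] == probe_labels[:, None]
    # Column r holds the number of correct labels within the first r + 1 results.
    return np.cumsum(hits, axis=1)

gallery = [[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]]
gallery_ids = [0, 0, 1, 2, 2, 2]
probes = [[0.1, 0.0], [5.1, 5.0]]
probe_ids = [0, 2]
counts = cumulative_match_counts(probes, probe_ids, gallery, gallery_ids)
rank1_accuracy = float((counts[:, 0] > 0).mean())
```

Each row of `counts` is the 5-record list for one sample; a rank-k accuracy follows by checking whether column k - 1 is non-zero and averaging over samples.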
The second technical scheme adopted by the application is as follows: a Triplet network-based pedestrian gait recognition system, comprising:
the acquisition module is used for acquiring the CASIA-B gait outline data set and preprocessing data to acquire a gait energy diagram;
the training module is used for training the metric-learning-based Triplet network on the training set in the gait energy diagram until the network meets the preset condition, and outputting the trained metric-learning Triplet network;
the test module is used for testing the trained metric learning Triplet network based on a test set in the gait energy diagram, and calculating the characteristic distance based on a test result to obtain a calculation result;
and the recognition module is used for carrying out ascending sort processing on the calculation result to obtain a pedestrian gait recognition result.
The method and the system have the following beneficial effects: the application addresses the gait recognition problem with a metric-learning-based Triplet network and optimizes the model with the Ranger optimizer, which performs better than traditional optimizers, so that samples are better separated. That is, a VGG feature extraction network, the Ranger optimizer and the Triplet ternary loss network are trained on the CASIA-B data set to obtain a gait model that achieves better accuracy under different occlusion conditions and under multi-view conditions.
Drawings
FIG. 1 is a flow chart of steps of a pedestrian gait recognition method based on a Triplet network of the present application;
FIG. 2 is a block diagram of a pedestrian gait recognition system based on a Triplet network of the present application;
FIG. 3 is a schematic representation of a CASIA-B gait profile dataset acquired by the present application;
FIG. 4 is an image of a pedestrian gait prior to normalization in accordance with the present application;
FIG. 5 is a normalized pedestrian gait image of the application;
FIG. 6 is a schematic representation of one gait cycle of the application;
FIG. 7 is a graph of gait energy obtained by the present application;
FIG. 8 is a flow chart of the data tag assigned by the sampler of the present application;
FIG. 9 is a schematic diagram of a Triplet network architecture based on metric learning of the present application;
FIG. 10 is a conceptual diagram of a Triplet loss of the present application;
FIG. 11 is a flow chart of the steps for performing ternary loss function calculation according to the present application;
FIG. 12 is a test flow diagram of the present application for learning a Triplet network based on trained metrics;
FIG. 13 is a graph comparing the accuracy of the method of the present application with that of the four methods SPAE, GaitGAN, PTSN and AE under the bag-carrying condition;
FIG. 14 is a graph comparing the accuracy of the method of the present application with that of the four methods SPAE, GaitGAN, PTSN and AE under the coat-wearing condition;
FIG. 15 is a graph showing the comparison of recognition rates and change laws of the model of the present application and the classical model of gait recognition under normal walking;
FIG. 16 is a schematic diagram of the recognition rate comparison and change rule of the model of the present application and the classical model of gait recognition under the condition of carrying a parcel;
FIG. 17 is a schematic diagram of the recognition rate comparison and change rule of the model of the present application and the classical model of gait recognition under wearing a coat.
Detailed Description
The application will now be described in further detail with reference to the drawings and to specific examples. The step numbers in the following embodiments are set for convenience of illustration only, and the order between the steps is not limited in any way, and the execution order of the steps in the embodiments may be adaptively adjusted according to the understanding of those skilled in the art.
The platform and software used for application development are:
development platform: AutoDL;
operating system: Windows 10;
development language: Python 3.8;
deep learning framework: PyTorch 1.11.0;
GPU model: RTX A5000;
data set: CASIA-B;
In the application, because gait recognition differs greatly from other classification problems, a common network model does not achieve a good recognition rate, and distance metric learning is often used for gait recognition. A metric-learning-based Triplet network is therefore adopted, and the Ranger optimizer, which performs better than traditional optimizers, is used to optimize the model so that samples are better separated. Finally, the resulting model is compared with other classical gait recognition models to demonstrate the effectiveness of the gait recognition method.
Referring to fig. 1, the application provides a pedestrian gait recognition method based on a Triplet network, which comprises the following steps:
S1, acquiring a data set;
Specifically, referring to fig. 3, the data set used in the application is the CASIA-B data set, which is available in two forms, video and contour; the application trains directly on the contour data. The data are divided into three categories by walking state: normal walking (nm), carrying a bag (bg) and wearing a coat (cl). The data set contains 124 people, each captured from eleven shooting angles (0°, 18°, 36°, …, 180°). Because CASIA-B does not itself define training, verification and test splits, the first 50 of the 124 pedestrians are used as the training set and the last 70 as the test set. In addition, the training set and the test set both contain equal numbers of pictures from the three categories, so as to achieve the intended training and test effects. The data set is stored in the perdata file in the format shown in the figure, which facilitates subsequent reading.
S2, preprocessing data;
S21, normalization processing;
Specifically, referring to fig. 4 and 5, because the camera positions are fixed during gait information acquisition, the size of the pedestrian in the video frames differs with view angle and position, and the pedestrian's location within the image also differs, so the images must be normalized. The application adopts min-max normalization: max and min are set by traversing every pixel in the image matrix, and the data are normalized with the formula:
$$x^{*} = \frac{x - \min}{\max - \min}$$
The function linearly maps the original data into the range [0, 1], where $x^{*}$ is the normalized data and x is the original data.
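As a minimal sketch of min-max normalization on an image matrix, the NumPy snippet below rescales pixel values to [0, 1]. The function name `min_max_normalize` and the handling of a constant image (returning zeros) are assumptions added for illustration.

```python
import numpy as np

def min_max_normalize(image: np.ndarray) -> np.ndarray:
    """Linearly rescale pixel values to [0, 1] using the image's own min and max."""
    image = image.astype(np.float64)
    lo, hi = image.min(), image.max()
    if hi == lo:
        # Assumption: a constant image has no dynamic range, so return zeros
        # rather than dividing by zero.
        return np.zeros_like(image)
    return (image - lo) / (hi - lo)

silhouette = np.array([[0, 128], [64, 255]], dtype=np.uint8)
normalized = min_max_normalize(silhouette)
flat = min_max_normalize(np.full((2, 2), 7))
```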
S22, gait energy diagram;
Specifically, referring to fig. 6 and 7, a video sequence generally has to be converted into a gait energy diagram (GEI) before it can be fed to a convolutional neural network for training. Normalization yields a series of roughly aligned contour sequences, which are resized to 64x64. The processed images are then synthesized: the images of one gait cycle are combined into one gait energy diagram. A gait cycle is the time interval between one heel strike of the left (or right) foot and the next heel strike of the same foot. Many gait cycle detection methods exist, all of which rely on the periodic variation of the contour information while a person walks; the application determines the gait cycle by manual specification, after which the gait energy diagram can be generated. The gait energy diagram is calculated as:
$$G(x, y) = \frac{1}{N} \sum_{t=1}^{N} I_t(x, y)$$
where N is the gait cycle (the number of frames in one cycle), $I_t(x, y)$ is the gray value of pixel (x, y) at time t, and G(x, y) is the gait energy diagram. High-energy parts of the GEI correspond to body regions with small movement amplitude during walking and carry static information about the human body, such as the head and upper body; low-energy parts correspond to regions with large movement amplitude and carry dynamic information, such as the arms and legs. The GEI therefore expresses the temporal information of a walking person well and is well suited as network input;
the above data processing and gait energy diagram generation are implemented as classes and functions stored in the file gaitset_dataloader.py; passing in the file path is enough to load and process the data set directly and proceed to training.
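A gait energy diagram is the per-pixel average of the aligned contour frames in one gait cycle. The NumPy sketch below illustrates this on synthetic data; the function name `gait_energy_image` and the random test frames are assumptions, and it is not the content of gaitset_dataloader.py.

```python
import numpy as np

def gait_energy_image(frames: np.ndarray) -> np.ndarray:
    """Average N aligned silhouette frames of shape (N, H, W) into one (H, W) image."""
    if frames.ndim != 3 or frames.shape[0] == 0:
        raise ValueError("expected a non-empty stack of frames with shape (N, H, W)")
    return frames.astype(np.float64).mean(axis=0)

# Synthetic cycle: 30 binary 64x64 frames with one region that never moves.
rng = np.random.default_rng(0)
cycle = (rng.random((30, 64, 64)) > 0.5).astype(np.float64)
cycle[:, :16, 24:40] = 1.0
gei = gait_energy_image(cycle)
```

The static region keeps the maximum energy of 1.0, while pixels that flicker between frames average to intermediate values, which matches the static versus dynamic interpretation of high-energy and low-energy parts.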
S23, a sampler;
Specifically, referring to fig. 8, the gait recognition model used in the application is trained with the ternary loss in the subsequent process. The ternary loss guides feature extraction so that features with the same label move closer together and features with different labels move farther apart. Because the ternary loss requires each input batch to contain different labels (so that positive and negative samples can be selected in matrix form), the data set needs additional processing, and a custom sampler is used to draw batches containing data with different labels. torch.utils.data.DataLoader is the data set processing interface in PyTorch: it takes the specified data from the data source according to the indices produced by a torch.utils.data.Sampler, passes them to collate_fn for secondary processing, and finally returns the required batch data. A custom sampler class, triplesampler, is implemented to select and return the indices of different labels from the data set, and two collate_fn functions, collate_fn_for_train and collate_fn_for_test, perform the secondary processing of the training data and the test data respectively.
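The index-selection logic of such a sampler can be sketched in plain Python. The function name `sample_triplet_batch` and the parameters `num_ids` and `per_id` are hypothetical; the application's triplesampler class is not reproduced here. In PyTorch, a generator of such index lists would typically be wrapped in a `torch.utils.data.Sampler` subclass and passed to `DataLoader` through its `batch_sampler` argument.

```python
import random
from collections import defaultdict

def sample_triplet_batch(labels, num_ids, per_id, seed=None):
    """Return dataset indices for a batch with `num_ids` identities, `per_id` samples each."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)
    # Pick distinct identities so every batch contains negatives for each anchor.
    chosen_ids = rng.sample(sorted(by_label), num_ids)
    batch = []
    for label in chosen_ids:
        # Sample with replacement so identities with few sequences still fill their quota.
        batch.extend(rng.choices(by_label[label], k=per_id))
    return batch

labels = [0, 0, 0, 1, 1, 2, 2, 2, 3, 3]
batch = sample_triplet_batch(labels, num_ids=3, per_id=2, seed=0)
batch_labels = [labels[i] for i in batch]
```

Taking at least two samples per identity guarantees that every anchor in the batch has a positive as well as negatives.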
S3, training based on the training set;
S31, VGG convolutional neural network;
The application uses a VGG convolutional neural network for feature extraction; the specific architecture of the network is described in the second part, and the implementation is described below. The basic building block of a classical convolutional neural network is the following sequence: a convolutional layer with padding to maintain resolution; a nonlinear activation function such as ReLU; and a pooling layer such as a max-pooling layer. A VGG block is similar: it consists of a series of convolutional layers followed by a max-pooling layer for spatial downsampling. Different VGG models are defined by the number of convolutional layers and the number of output channels in each block;
The model obtained after changing the parameters is named GaitSetNet. GaitSetNet contains multiple convolutional layers and batch normalization layers organized into structures such as set_layer1 and set_layer2. The model is wrapped in DataParallel so that it can run in parallel on multiple GPUs, which speeds up training. A custom GaitSetNet neural network class and a basic Conv2d convolution layer class are defined, the latter comprising a convolutional layer and a batch normalization layer. set_layer1, set_layer2, set_layer3 and set_layer4 serve as the four basic convolutional layer structures of GaitSetNet, and two downsampling layer structures, set_layer1_down and set_layer2_down, are added to reduce the feature map size, as shown in table 1:
table 1 convolutional neural network display
S32, a Ranger optimizer;
Specifically, the Ranger optimizer combines two newer optimizers (RAdam and LookAhead) into a single optimizer. The advance of RAdam is that it dynamically turns the adaptive learning rate on or off according to the divergence of its variance, providing a learning-rate warm-up that requires no tunable parameters. It has the advantages of both Adam and SGD: it ensures fast convergence, is not prone to falling into a local optimum, and can even achieve better accuracy than SGD at larger learning rates. LookAhead, on the other hand, is inspired by advances in understanding the loss surfaces of deep neural networks, and stabilizes deep learning training and convergence. LookAhead reduces the number of hyperparameters that need to be tuned and can achieve faster convergence on different deep learning tasks with minimal computational cost. The Ranger optimizer, as a combination of the two, has the advantages of both and therefore achieves higher accuracy.
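The LookAhead half of this combination can be illustrated with a small sketch. It is only a toy version under stated assumptions: plain gradient descent is used as the inner optimizer instead of RAdam to keep the example short, and the function name lookahead_minimize, the hyperparameter values (inner_lr, k, alpha, outer_steps) and the quadratic test function are all hypothetical.

```python
def lookahead_minimize(grad_fn, start, inner_lr=0.1, k=5, alpha=0.5,
                       outer_steps=20):
    """Toy LookAhead loop around plain gradient descent (not RAdam)."""
    slow = list(start)
    for _ in range(outer_steps):
        # The fast weights start from the slow weights and take k inner steps.
        fast = list(slow)
        for _ in range(k):
            grad = grad_fn(fast)
            fast = [w - inner_lr * g for w, g in zip(fast, grad)]
        # The slow weights move a fraction alpha towards the fast weights.
        slow = [s + alpha * (f - s) for s, f in zip(slow, fast)]
    return slow


def quadratic_grad(w):
    """Gradient of f(w) = (w0 - 3)^2 + (w1 + 1)^2, minimized at (3, -1)."""
    return [2 * (w[0] - 3), 2 * (w[1] + 1)]


result = lookahead_minimize(quadratic_grad, [0.0, 0.0])
```

The interpolation step is what damps oscillations of the inner optimizer; in Ranger the inner steps would be RAdam updates rather than the plain gradient steps used here.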
S33, a Triplet network;
S331, a Triplet network structure;
Specifically, the Triplet network consists of three networks that share weights and is a network based on metric learning. The input of the Triplet network is three gait energy diagrams: one energy diagram is selected as the reference sample (anchor); the positive sample (positive) is a different sample from the anchor but belongs to the same pedestrian; the negative sample (negative) does not belong to the same pedestrian as the anchor. A triplet (Triplet) is the set of three gait energy diagrams input each time;
Referring to fig. 9, when the Triplet network is used, the aforementioned network is still selected as the feature extraction network with shared weights. The network inputs are the anchor, positive and negative samples. After the three samples pass through the feature extraction network, the distance D(a, p) between the reference sample and the positive sample and the distance D(a, n) between the reference sample and the negative sample are calculated, with the Euclidean distance selected as the distance measure. The ternary loss function (Triplet loss) is then used to learn from the two distances, the aim being that the distance between the reference sample and the positive sample becomes smaller and smaller while the distance between the reference sample and the negative sample becomes larger and larger.
S332, network principle;
Specifically, referring to fig. 10, the ternary loss function (Triplet loss) is defined on a triplet <a, p, n>, where a (anchor) denotes a training sample, p (positive) denotes a sample predicted as positive, and n (negative) denotes a sample predicted as negative. The function of the ternary loss function is to reduce the distance between the positive sample and the anchor and to enlarge the distance between the negative sample and the anchor;
The Triplet loss makes the distance between the feature expressions of a and p as small as possible, while making the distance between the feature expressions of a and n as large as possible. Based on the triplet, a positive pair <a, p> and a negative pair <a, n> can be constructed. The purpose of the Triplet loss is to separate the positive pair and the negative pair by a distance (margin), so it is desirable that D(a, p) < D(a, n), and further that this holds with a margin m: D(a, p) + m < D(a, n);
For every triplet of samples passed through the network, the following formula should be satisfied:
D(a, p) + m < D(a, n), for all (a, p, n) ∈ τ
where m is a selectable margin, representing the minimum gap between the distance from a to p and the distance from a to n, and τ is the set of all triplets. During training, three cases may occur:
(1) A simple triplet;
D(a,p)+margin<D(a,n)
At this time, loss = 0: the distance of the positive pair is far smaller than that of the negative pair, i.e., the intra-class distance is very small and the inter-class distance is very large, so no optimization is needed in this case;
(2) A general triplet;
D(a,p)<D(a,n)<D(a,p)+margin
The distance of the positive pair and the distance of the negative pair are relatively close, i.e., <a, n> and <a, p> are very close but still within the margin, so this case is relatively easy to optimize;
(3) A difficult triplet;
D(a,n)<D(a,p)
The distance of the positive pair is larger than that of the negative pair, i.e., the intra-class distance is larger than the inter-class distance. These are the most difficult samples to train on, and this case is harder to optimize;
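In the case of a general triplet, the condition for a loss to be generated is:
D(a,p)+m-D(a,n)>0
resulting in the loss function to be optimized:
L = Σ_{i=1}^{N} max(D(a_i, p_i) + m - D(a_i, n_i), 0)
where N is the size of the training data set.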
From the above analysis, since the loss of a simple triplet is 0, no features can be learned if simple triplets are selected for training. General triplets are more suitable for earlier training because <a, p> and <a, n> lie within one margin. Difficult triplets can be used for later training because they carry the most features that can be extracted. Triplet mining is therefore particularly important in this process.
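The three cases and the resulting loss can be made concrete with a small sketch. The function names (euclidean, triplet_loss, triplet_kind) and the two-dimensional example points are hypothetical; the thresholds follow the inequalities given above, and the boundary cases with exact equality, which the text does not assign, are placed in the harder category here as an assumption.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin):
    """Hinge form: max(D(a, p) + m - D(a, n), 0)."""
    d_ap = euclidean(anchor, positive)
    d_an = euclidean(anchor, negative)
    return max(d_ap + margin - d_an, 0.0)


def triplet_kind(anchor, positive, negative, margin):
    """Classify a triplet as simple, general or difficult."""
    d_ap = euclidean(anchor, positive)
    d_an = euclidean(anchor, negative)
    if d_ap + margin < d_an:
        return "simple"      # loss is 0, nothing to learn
    if d_ap < d_an:
        return "general"     # negative is farther, but inside the margin
    return "difficult"       # negative is at least as close as the positive


anchor = [0.0, 0.0]
positive = [1.0, 0.0]          # D(a, p) = 1
margin = 0.5
far_negative = [3.0, 0.0]      # D(a, n) = 3
near_negative = [1.2, 0.0]     # D(a, n) = 1.2
closer_negative = [0.5, 0.0]   # D(a, n) = 0.5

kinds = [triplet_kind(anchor, positive, n, margin)
         for n in (far_negative, near_negative, closer_negative)]
losses = [triplet_loss(anchor, positive, n, margin)
          for n in (far_negative, near_negative, closer_negative)]
```

With these points the simple triplet contributes no loss, while the general and difficult triplets contribute a loss of about 0.3 and 1.0 respectively, which is why only the latter two drive training.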
S333, concrete implementation;
Specifically, referring to fig. 11, a ternary loss (TripletLoss) class is defined to implement the ternary loss calculation. In this class, the labels in the input samples are combined pairwise to generate a label matrix, and the masks of the positive and negative sample pairs are obtained from the label matrix. The features in the input samples are likewise combined pairwise to generate a feature matrix, and the Euclidean distances of the feature matrix are calculated. The distance matrix is then indexed with the masks of the positive/negative sample pairs to obtain the positive-pair and negative-pair distances. Finally, the difference between the positive-pair and negative-pair distances is taken and offset by the interval value (margin) to obtain the ternary loss. In the calculation, the features are treated as n parts; for each part, the ternary loss over m samples with d-dimensional features is calculated in matrix form, and the results of the n parts are averaged. The training process consists of calculating the ternary loss repeatedly over the iterations: in each iteration, the gradients are cleared and the ternary loss is calculated. If the final loss value reaches the average loss value, the model is saved to the checkpoint folder so that it is available for the subsequent test; if the loss value does not reach the standard, the model is not saved.
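The mask-based calculation described above can be sketched for a single part in plain Python. This is an illustrative assumption rather than the TripletLoss class of the application: the function names are hypothetical, nested lists stand in for tensors, and the loss is averaged over all valid triplets, a choice the text does not specify.

```python
import math


def pairwise_distances(features):
    """Distance matrix: entry [i][j] is the Euclidean distance i -> j."""
    return [[math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
             for v in features] for u in features]


def batch_triplet_loss(features, labels, margin):
    """Average hinge loss over every (anchor, positive, negative) triplet."""
    n = len(labels)
    dist = pairwise_distances(features)
    # Label matrix: same[i][j] is True when samples i and j share a label.
    same = [[labels[i] == labels[j] for j in range(n)] for i in range(n)]
    losses = []
    for a in range(n):
        # Positive mask: same label, excluding the anchor itself.
        pos = [dist[a][p] for p in range(n) if same[a][p] and p != a]
        # Negative mask: different label.
        neg = [dist[a][q] for q in range(n) if not same[a][q]]
        for d_ap in pos:
            for d_an in neg:
                losses.append(max(d_ap + margin - d_an, 0.0))
    return sum(losses) / len(losses) if losses else 0.0


features = [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [6.0, 0.0]]
labels = [0, 0, 1, 1]
loss_small_margin = batch_triplet_loss(features, labels, margin=2.0)
loss_large_margin = batch_triplet_loss(features, labels, margin=4.0)
```

With a margin of 2 every triplet in this toy batch is already separated, so the loss is zero; raising the margin to 4 makes two of the eight triplets violate the margin and the averaged loss becomes 0.25.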
S4, testing the trained metric learning Triplet network;
S41, calculating a distance;
Specifically, in the process of testing the accuracy of the model, the distance between features needs to be calculated. The Euclidean distance can be interpreted as the length of the line segment connecting two points. For example, for two vectors x and y in n-dimensional space with components x_i and y_i, the Euclidean distance is calculated from the Cartesian coordinates of the points using the Pythagorean theorem, as follows:
d(x, y) = sqrt( Σ_{i=1}^{n} (x_i - y_i)^2 )
Another common distance measure is cosine similarity (Cosine Similarity), which is equal to the cosine of the angle between two vectors; if the vectors are normalized to length 1, it equals their dot product. The formula is as follows:
cos(θ) = (x · y) / (‖x‖ ‖y‖)
The application selects the Euclidean distance as the distance measure, because cosine similarity focuses more on differences in direction than the Euclidean distance does, whereas in the gait recognition problem individual differences are mostly measured by absolute distance, so the Euclidean distance is the better distance measure for this problem.
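The difference between the two measures can be seen on a pair of vectors that point in the same direction but have different lengths. The sketch below is only an illustration; the function names and the example vectors are hypothetical.

```python
import math


def euclidean_distance(x, y):
    """Length of the line segment connecting x and y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_similarity(x, y):
    """Cosine of the angle between x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


x = [3.0, 4.0]
y = [6.0, 8.0]   # same direction as x, twice the length

similarity = cosine_similarity(x, y)   # direction only: the vectors agree
distance = euclidean_distance(x, y)    # absolute difference remains visible
```

Cosine similarity treats the two vectors as identical, while the Euclidean distance still separates them, which matches the reasoning given above for preferring the Euclidean distance.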
S42, a testing process;
Specifically, referring to fig. 12, the model file is first loaded, and functions for measuring the Euclidean distance and for calculating the multi-angle accuracy are designed. The walking condition of the collected data and the walking condition of the comparison data are then defined. The collected data are compared with the comparison data in sequence: the sample features for the specified conditions are obtained, the 5 comparison samples closest to each collected sample are taken, the corresponding labels are matched according to the distance between the collected data features and the comparison data features, and the accuracy is calculated. The specific steps are as follows:
(1) Calculating the distance between the acquired data features and the comparison data features;
(2) Sorting the distances, and returning the first 5 minimum sorting indexes;
(3) The first 5 labels are taken out from the comparison data according to the index and compared with the labels in the acquired data;
(4) The comparison results are accumulated, so that each sample corresponds to 5 records, which represent the number of correct identifications within the first 1 to 5 results. For example, for [True, True, True, False, False] the accumulated result is [1, 2, 3, 3, 3], which indicates that 3 correct results are identified among the first 3 sample features closest to the collected data, and 3 correct results among the first 5 sample features;
(5) Comparing the accumulated result with 0, and counting, for each rank, the number of samples whose accumulated result is greater than 0;
(6) Dividing the correct identification number of the ranks 1-5 by the number of collected samples, and multiplying by 100 to obtain the accuracy of each rank;
After the test function is designed, the data set is traversed, and the feature results and sample labels are stored to finally obtain the test result.
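Steps (1) to (6) can be sketched as a single function in plain Python. The function name rank_k_accuracy, its argument names and the toy data are hypothetical, and k is set to 2 in the example only to keep it small; the text uses the first 5 results.

```python
import math


def rank_k_accuracy(collected_features, collected_labels,
                    comparison_features, comparison_labels, k=5):
    """Percentage of collected samples with a correct match in the top 1..k."""
    hits_at_rank = [0] * k
    for feature, label in zip(collected_features, collected_labels):
        # (1) Distance from this collected sample to every comparison sample.
        distances = [math.sqrt(sum((a - b) ** 2 for a, b in zip(feature, g)))
                     for g in comparison_features]
        # (2) Indexes of the k smallest distances.
        nearest = sorted(range(len(distances)),
                         key=distances.__getitem__)[:k]
        # (3) Compare the retrieved labels with the collected label.
        matches = [comparison_labels[i] == label for i in nearest]
        # (4) Accumulate the matches, (5) count ranks with at least one hit.
        running = 0
        for rank, match in enumerate(matches):
            running += int(match)
            if running > 0:
                hits_at_rank[rank] += 1
    # (6) Divide by the number of collected samples and multiply by 100.
    return [100.0 * hits / len(collected_labels) for hits in hits_at_rank]


comparison_features = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]]
comparison_labels = ["A", "B", "B", "A"]
collected_features = [[0.1, 0.0], [10.2, 0.0]]
collected_labels = ["A", "A"]

accuracy = rank_k_accuracy(collected_features, collected_labels,
                           comparison_features, comparison_labels, k=2)
```

In this toy data the first collected sample is matched at rank 1 and the second only at rank 2, so the rank-1 accuracy is 50% and the rank-2 accuracy is 100%.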
S5, simulation experiments and result analysis;
Specifically, in the experiment, 74 subjects of the data set are used to train the network model and the remaining 50 subjects are used to test the performance of the network. The results are compared with the accuracy of four methods, SPAE, GaitGAN, PTSN and AE; the comparisons under the BG (carrying a bag) and CL (wearing a coat) conditions are shown in fig. 13 and fig. 14. Under the BG condition, the accuracy of the method is relatively high and exceeds that of SPAE, GaitGAN and AE. Under the CL condition, the accuracy of the method reaches 75.9%, higher than that of all four methods. Under the NM (normal) condition, the recognition accuracy of the model, 84.4%, is lower than that of the other models. The inventors consider that, without an occluding object, the gait features extracted by the network are limited and cannot effectively distinguish samples of different classes, and the view angle changes little, which leads to this result. With occluding objects (BG and CL), the triplet loss function and the metric learning method separate the samples more fully, so a better recognition effect is obtained, which demonstrates the effectiveness of the model. After comparing the recognition accuracy under occlusion with an unchanged view angle, the average accuracy under different view angles is compared, i.e., view angles within the range of 0-180 degrees are tested respectively, and the results are shown in table 2:
Table 2 Average recognition accuracy at different viewing angles
Furthermore, line charts are used to show more intuitively the recognition rates and trends of the model and of classical gait recognition models under the three conditions NM, BG and CL, as shown in fig. 15, 16 and 17. First, under every condition, the gait recognition accuracy of the models first rises and then falls. This is because a picture taken at 90° shows the pedestrian's gait completely, so a good recognition effect can be obtained, whereas at 0° or 180° the pedestrian in the picture is seen entirely from the front or the back, so the recognition accuracy is lower. Second, for the model proposed by the application, the accuracy as a whole rises and then falls as the view angle increases, but fluctuates considerably at certain view angles, probably because different features are not well separated at those view angles, so that the coupling among the features is high. Finally, the different models are compared. Under the NM condition, the recognition accuracy of the proposed model is lower than that of the other two comparison models and lags behind them at most angles, which is consistent with the results for the unchanged view angle. Under the BG and CL conditions, the recognition accuracy of the proposed model is higher than that of the other two comparison models at many view angles and lower at only a few, which indicates that the proposed model still handles occluding objects well when the view angle changes.
Referring to fig. 2, a pedestrian gait recognition system based on a Triplet network includes:
the acquisition module is used for acquiring the CASIA-B gait outline data set and preprocessing data to acquire a gait energy diagram;
the training module is used for training the metric-learning-based Triplet network on the training set in the gait energy diagram until the metric-learning-based Triplet network meets the preset condition, and outputting the trained metric learning Triplet network;
the test module is used for testing the trained metric learning Triplet network based on a test set in the gait energy diagram, and calculating the characteristic distance based on a test result to obtain a calculation result;
and the recognition module is used for carrying out ascending sort processing on the calculation result to obtain a pedestrian gait recognition result.
The content in the method embodiment is applicable to the system embodiment, the functions specifically realized by the system embodiment are the same as those of the method embodiment, and the achieved beneficial effects are the same as those of the method embodiment.
While the preferred embodiment of the present application has been described in detail, the application is not limited to the embodiment, and various equivalent modifications and substitutions can be made by those skilled in the art without departing from the spirit of the application, and these equivalent modifications and substitutions are intended to be included in the scope of the present application as defined in the appended claims.

Claims (10)

1. The pedestrian gait recognition method based on the Triplet network is characterized by comprising the following steps of:
acquiring a CASIA-B gait outline data set, performing data preprocessing, and acquiring a gait energy diagram;
training the metric-learning-based Triplet network on the training set in the gait energy diagram until the metric-learning-based Triplet network meets a preset condition, and outputting the trained metric learning Triplet network;
testing the trained metric learning Triplet network based on a test set in the gait energy diagram, and calculating the characteristic distance based on a test result to obtain a calculation result;
and performing ascending sort processing on the calculation result to obtain the pedestrian gait recognition result.
2. The pedestrian gait recognition method based on the Triplet network according to claim 1, wherein the step of acquiring the CASIA-B gait contour data set and performing data preprocessing to acquire a gait energy diagram specifically comprises:
acquiring a CASIA-B gait contour data set and storing the CASIA-B gait contour data set into a perdata file, wherein the pedestrian state of the CASIA-B gait contour data set comprises normal walking, knapsack and wearing overcoat, and the CASIA-B gait contour data set comprises a training set and a testing set;
normalizing the CASIA-B gait contour data set by a min-max normalization method to obtain a normalized pedestrian gait image;
acquiring the pedestrian gait cycle of the normalized pedestrian gait image by a gait cycle detection method, wherein the pedestrian gait cycle is the time interval from one heel strike of the left/right foot to the next heel strike of the same foot;
synthesizing the normalized gait images of the pedestrians according to the gait cycle of the pedestrians to obtain a preliminary gait energy diagram;
and assigning data labels to the preliminary gait energy diagram through a sampler to obtain the gait energy diagram.
3. The pedestrian gait recognition method based on the Triplet network according to claim 2, wherein the step of assigning data labels to the preliminary gait energy diagram through the sampler to obtain the gait energy diagram specifically comprises:
based on the data set processing interface torch.utils.data.DataLoader in PyTorch, sampling the gait energy diagram with a torch.utils.data.Sampler to acquire preset data;
and passing the preset data to collate_fn, and performing secondary processing on the training set and the test set through collate_fn_for_train and collate_fn_for_test, respectively, to obtain the gait energy diagram.
4. The pedestrian gait recognition method based on the Triplet network according to claim 3, wherein the step of training the metric-learning-based Triplet network on the training set in the gait energy diagram until the metric-learning-based Triplet network meets a preset condition, and outputting the trained metric learning Triplet network, specifically comprises:
performing feature extraction processing on a training set in the gait energy diagram through a VGG convolutional neural network GaitSetNet to obtain gait energy diagram feature data;
the VGG convolutional neural network GaitSetNet comprises four basic convolutional layer structures of set_layer1, set_layer2, set_layer3, set_layer4 and two downsampling layer structures of set_layer1_down and set_layer2_down;
based on a Ranger optimizer, carrying out RAdam algorithm optimization processing on gait energy diagram feature data, calculating gradient, optimizing variance by using an exponential weighted average method, improving k parameters by using a reverse feedback verification method, improving learning rate by using a backward fitting method, and carrying out Lookahead algorithm optimization on a data set to obtain an optimized gait energy diagram;
defining an optimized gait energy diagram to be divided into a reference sample, a positive sample and a negative sample, wherein the positive sample and the reference sample do not belong to the same sample but belong to the same pedestrian, and the negative sample and the reference sample do not belong to the same pedestrian;
and performing ternary loss optimization processing on the reference sample, the positive sample and the negative sample through the triple network based on measurement learning until a preset condition is met, and outputting the trained measurement learning triple network.
5. The pedestrian gait recognition method based on the Triplet network of claim 4, wherein the expression of the preset condition is:
D(a, p) + m < D(a, n), for all (a, p, n) ∈ τ
in the above formula, m is a selectable margin representing the minimum gap between the distance from a to p and the distance from a to n, a represents the training sample, p represents a sample predicted as positive, n represents a sample predicted as negative, and τ represents the set of all triplets.
6. The pedestrian gait recognition method based on the Triplet network according to claim 5, wherein the step of performing ternary loss function optimization processing on the reference sample, the positive sample and the negative sample through the Triplet network based on metric learning specifically comprises:
defining a ternary loss class, and constructing an input sample based on a reference sample, a positive sample and a negative sample;
based on the ternary loss class, inputting labels in a sample, and performing free combination of every two labels to generate a label matrix;
obtaining a mask of a positive sample pair and a negative sample pair from a label matrix, and performing free combination on every two groups of features in an input sample to generate a feature matrix;
calculating Euclidean distance of the feature matrix, and extracting the feature matrix with the distance according to the mask of the positive and negative sample pairs to obtain the distance between the positive and negative labels;
and subtracting the distance between the positive label and the negative label and subtracting the interval value to obtain the ternary loss.
7. The pedestrian gait recognition method based on the Triplet network of claim 6, wherein the Triplet loss class includes a simple Triplet, a general Triplet, and a difficult Triplet, wherein:
constructing a positive pair < a, p > and a negative pair < a, n >;
when the distance of the positive pair is far smaller than that of the negative pair, selecting a simple triplet for optimization training;
when the distance of the positive pair and the distance of the negative pair are lower than a preset threshold, selecting a general triplet for optimization training;
and when the distance of the positive pair and the distance of the negative pair are larger than a preset threshold, selecting a difficult triplet for optimization training.
8. The pedestrian gait recognition method based on the Triplet network of claim 7, wherein the expression of the ternary loss function is:
L = Σ_{i=1}^{N} max(D(a_i, p_i) + m - D(a_i, n_i), 0)
in the above equation, N represents the size of the training data set.
9. The pedestrian gait recognition method based on the Triplet network according to claim 8, wherein the step of testing the trained metric learning Triplet network based on the test set in the gait energy diagram specifically comprises:
loading a model file, designing a function for measuring Euclidean distance and calculating multi-angle accuracy, randomly selecting partial data from a training set and a testing set as acquisition data and comparison data, and acquiring acquisition data characteristics and comparison data characteristics;
calculating the distance between the acquired data features and the comparison data features;
sorting the distances, and returning the first 5 minimum sorting indexes;
the first 5 labels are taken out from the comparison data according to the index and compared with the labels in the acquired data;
and accumulating the comparison results, so that each sample corresponds to 5 records, which represent the number of correct identifications within the first 1 to 5 results.
10. A pedestrian gait recognition system based on the Triplet network, characterized by comprising the following modules:
the acquisition module is used for acquiring the CASIA-B gait outline data set and preprocessing data to acquire a gait energy diagram;
the training module is used for training the metric-learning-based Triplet network on the training set in the gait energy diagram until the metric-learning-based Triplet network meets the preset condition, and outputting the trained metric learning Triplet network;
the test module is used for testing the trained metric learning Triplet network based on a test set in the gait energy diagram, and calculating the characteristic distance based on a test result to obtain a calculation result;
and the recognition module is used for carrying out ascending sort processing on the calculation result to obtain a pedestrian gait recognition result.
CN202310563902.5A 2023-05-18 2023-05-18 Pedestrian gait recognition method and system based on triple network Pending CN116597514A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310563902.5A CN116597514A (en) 2023-05-18 2023-05-18 Pedestrian gait recognition method and system based on triple network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310563902.5A CN116597514A (en) 2023-05-18 2023-05-18 Pedestrian gait recognition method and system based on triple network

Publications (1)

Publication Number Publication Date
CN116597514A true CN116597514A (en) 2023-08-15

Family

ID=87604151

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310563902.5A Pending CN116597514A (en) 2023-05-18 2023-05-18 Pedestrian gait recognition method and system based on triple network

Country Status (1)

Country Link
CN (1) CN116597514A (en)

Similar Documents

Publication Publication Date Title
Alani et al. Hand gesture recognition using an adapted convolutional neural network with data augmentation
CN110532920B (en) Face recognition method for small-quantity data set based on FaceNet method
CN109684969B (en) Gaze position estimation method, computer device, and storage medium
CN107169117B (en) Hand-drawn human motion retrieval method based on automatic encoder and DTW
CN112784782B (en) Three-dimensional object identification method based on multi-view double-attention network
CN113610046B (en) Behavior recognition method based on depth video linkage characteristics
CN114821640A (en) Skeleton action identification method based on multi-stream multi-scale expansion space-time diagram convolution network
CN109558814A (en) A kind of three-dimensional correction and weighting similarity measurement study without constraint face verification method
Liu et al. Target recognition of sport athletes based on deep learning and convolutional neural network
CN111626152B (en) Space-time line-of-sight direction estimation prototype design method based on Few-shot
CN113158861A (en) Motion analysis method based on prototype comparison learning
CN112906520A (en) Gesture coding-based action recognition method and device
Arnaud et al. Tree-gated deep mixture-of-experts for pose-robust face alignment
CN114492634A (en) Fine-grained equipment image classification and identification method and system
CN111695507B (en) Static gesture recognition method based on improved VGGNet network and PCA
Özbay et al. 3D Human Activity Classification with 3D Zernike Moment Based Convolutional, LSTM-Deep Neural Networks.
Martı́nez Carrillo et al. A compact and recursive Riemannian motion descriptor for untrimmed activity recognition
CN113591797B (en) Depth video behavior recognition method
Xie et al. ResNet with attention mechanism and deformable convolution for facial expression recognition
Chaturvedi et al. Landmark calibration for facial expressions and fish classification
CN116597514A (en) Pedestrian gait recognition method and system based on triple network
CN110210336B (en) Low-resolution single-sample face recognition method
Zhou et al. Motion balance ability detection based on video analysis in virtual reality environment
Nallapu et al. Intelligent video analytics & facial emotion recognition using artificial intelligence
CN113762082B (en) Unsupervised skeleton action recognition method based on cyclic graph convolution automatic encoder

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination