CN111639719B - Footprint image retrieval method based on space-time motion and feature fusion - Google Patents

Footprint image retrieval method based on space-time motion and feature fusion

Info

Publication number
CN111639719B
CN111639719B
Authority
CN
China
Prior art keywords
footprint
layer
feature
frame
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010511912.0A
Other languages
Chinese (zh)
Other versions
CN111639719A (en)
Inventor
唐俊
鹿新
王年
朱明
樊旭晨
吴洛天
李双双
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University
Original Assignee
Anhui University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University filed Critical Anhui University
Priority to CN202010511912.0A priority Critical patent/CN111639719B/en
Publication of CN111639719A publication Critical patent/CN111639719A/en
Application granted granted Critical
Publication of CN111639719B publication Critical patent/CN111639719B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    All classifications fall under section G (Physics), class G06 (Computing; Calculating or Counting), in subclasses G06F (Electric digital data processing), G06N (Computing arrangements based on specific computational models) and G06V (Image or video recognition or understanding), plus the cross-sectional tag Y02D (Climate change mitigation technologies in information and communication technologies):
    • G06F18/253 Pattern recognition; Analysing; Fusion techniques of extracted features
    • G06F16/583 Information retrieval of still image data; Retrieval characterised by using metadata automatically derived from the content
    • G06F18/214 Design or setup of recognition systems; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N3/045 Neural networks; Architecture; Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N3/084 Learning methods; Backpropagation, e.g. using gradient descent
    • G06V10/30 Image preprocessing; Noise filtering
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Library & Information Science (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a footprint image retrieval method based on space-time motion and feature fusion, which comprises the following steps: 1. preparing a one-pass footprint image data set; 2. establishing a one-pass footprint image preprocessing module; 3. establishing a preprocessing layer for multi-scale one-pass footprint images and whole-set normalization; 4. initializing the weights; 5. establishing a spatial feature extraction module; 6. establishing a temporal feature extraction module; 7. training, testing and optimizing the network. By extracting both the spatial feature information and the temporal feature information of the one-pass footprint images and combining them with a dedicated feature fusion module, the invention obtains richer spatio-temporal information from the one-pass footprint images and clusters the discriminative feature information of different people, thereby greatly improving the accuracy of one-pass footprint image retrieval.

Description

Footprint image retrieval method based on space-time motion and feature fusion
Technical Field
The invention relates to the fields of image processing and deep learning, and in particular to a one-pass footprint image retrieval method based on space-time motion and feature fusion.
Background
Owing to factors such as bone structure and acquired living habits, footprint images are difficult to disguise and are even more distinctive than other traces such as palm prints and fingerprints. Research on footprint images is therefore not only of scientific interest but can also be applied in fields such as commerce, security and criminal investigation.
In recent years, the rapid rise of deep learning has brought new breakthroughs to one-pass footprint image retrieval, since neural networks have strong learning ability. Applying deep learning to one-pass footprint images reduces the manpower and material resources needed to analyse and process the data, and greatly improves the efficiency and accuracy of one-pass footprint image retrieval. Traditional one-pass footprint image retrieval, by contrast, relies on expert experience or simple comparison algorithms; such methods have low accuracy and also consume a great deal of time, manpower and material resources.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art and provides a footprint image retrieval method based on space-time motion and feature fusion, so as to obtain richer spatio-temporal information from one-pass footprint images and to cluster the discriminative feature information of different people, thereby improving the accuracy and efficiency of one-pass footprint image retrieval.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention relates to a footprint image retrieval method based on space-time motion and feature fusion, which is characterized by comprising the following steps of:
step 1: constructing a training set and a test set;
step 1.1: collecting a continuous one-pass footprint image of any test subject at a given walking speed;
step 1.2: carrying out pseudo-colorization and denoising on the one-pass footprint image to obtain a processed one-pass footprint image sample;
step 1.3: sequentially dividing the one-pass footprint image sample into single footprint images in frame order to obtain a footprint sequence sample set X = {x_k | k = 1, 2, 3, ..., K}; x_k denotes the k-th frame footprint sequence sample; 1 ≤ k ≤ K; K is the total number of footprints in the one-pass footprint image sample;
step 1.4: respectively defining a label for each footprint sequence sample in the footprint sequence sample set, wherein the label comprises ID information and serial number information;
step 1.5: repeating steps 1.1-1.4 to collect and correspondingly process continuous one-pass footprint images of a plurality of test subjects at different walking speeds, thereby forming a footprint image data set;
step 1.6: dividing the footprint image data set into a training set and a test set, and subdividing the test set into a test query set and a test base library set;
step 2: establishing a one-pass footprint image retrieval model with space-time motion and feature fusion, the model consisting of a preprocessing layer, a spatial feature extraction module, a feature fusion module and a temporal feature extraction module;
step 2.1: the preprocessing layer resizes the footprint sequence sample set X to obtain a footprint sequence sample set X' containing multi-scale features;
step 2.2: the preprocessing layer normalizes the footprint sequence sample set X' containing multi-scale features using formula (1), obtaining a normalized footprint sequence sample set X'':
Image(k'') = (Image(k') − Mean) / Std    (1)
In formula (1), Image(k') denotes the k-th frame footprint sequence sample containing multi-scale features; Mean denotes the mean of the footprint sequence sample set X' containing multi-scale features; Std denotes the variance of the same set; Image(k'') denotes the normalized k-th frame footprint sequence sample;
step 2.3: establishing a spatial feature extraction module consisting of a convolutional neural network with M layers of small convolution kernels, where each m-th small-kernel layer comprises, in order, its convolution layer and activation layer; the 2nd small-kernel layer additionally has a batch normalization layer between its convolution layer and activation layer; M takes values in [5,10];
step 2.3.1: initializing the weights of all convolution layers in the spatial feature extraction module with the Xavier method;
step 2.3.2: obtaining the output Z_m of the m-th convolution layer with formula (2):
Z_m = W_m * X_m + B_m    (2)
In formula (2), X_m is the portion of the input image being convolved under the stride S_m of the m-th convolution layer; B_m is the bias under stride S_m of the m-th convolution layer; W_m is the shared weight under stride S_m of the m-th convolution layer;
step 2.3.3: obtaining the output size Y_m of the m-th convolution layer with formula (3):
Y_m = C_m × ((H_m − K_m + 2P_m)/S_m + 1) × ((R_m − K_m + 2P_m)/S_m + 1)    (3)
In formula (3), S_m is the stride of the m-th convolution layer, K_m is its convolution kernel size, P_m is its number of padding pixels, C_m is its number of output channels, H_m is its height and R_m is its width;
step 2.3.4: the spatial feature extraction module processes the normalized footprint sequence sample set X'' and outputs a spatial feature sequence {F_1, F_2, ..., F_k, ..., F_K}, where F_k is the k-th frame footprint feature map obtained after the normalized footprint sequence sample Image(k'') has been processed by the spatial feature extraction module;
step 2.4: constructing a feature fusion module consisting of K feature-mask operation layers and K per-frame fully connected layers;
step 2.4.1: the k-th feature-mask operation layer superimposes the k-th frame footprint feature map F_k over the C_M output channels of the M-th convolution layer, obtaining a superimposed k-th frame feature map F'_k;
step 2.4.2: the k-th feature-mask operation layer sums the pixel values of the superimposed k-th frame feature map F'_k and takes their average as an erasure threshold; the pixels of F'_k that are larger than the erasure threshold are erased, giving the k-th frame feature-mask map F''_k;
step 2.4.3: the k-th feature-mask operation layer superimposes the superimposed k-th frame feature map F'_k and the k-th frame feature-mask map F''_k, obtaining the k-th frame feature fusion map F̃_k;
step 2.4.4: reducing the dimensionality of the k-th frame feature fusion map F̃_k to obtain its corresponding fully connected layer vector Vector_k;
step 2.4.5: cutting the k-th frame fully connected layer vector Vector_k evenly into I slices, obtaining I feature vectors, the i-th of which is vector_ik; I takes values in [4,8];
step 2.4.6: assigning a weight w_ik to the i-th feature vector vector_ik of the k-th frame fully connected layer and obtaining the feature-fused k-th frame fully connected layer LastVector_k with formula (4), i.e. splicing the weighted slices back together:
LastVector_k = [w_1k·vector_1k, w_2k·vector_2k, ..., w_Ik·vector_Ik]    (4)
step 2.4.7: repeating steps 2.4.1-2.4.6 to obtain the K feature-fused per-frame fully connected layers {LastVector_k | k = 1, 2, ..., K};
Step 2.5: constructing a time sequence characteristic extraction module consisting of a ConvLSTM convolution long-term and short-term memory network and a full connection layer;
step 2.5.1: for the K frame full connection layer { LastVector after the feature fusion k I K =1,2, \8230, K } performs dimensionality increasing operation to obtain a network input vector after dimensionality increasing;
step 2.5.2: initializing a weight value of the ConvLSTM convolution long-term and short-term memory network by using Gaussian distribution;
step 2.5.3: extracting sequence feature information of the network input vector after the dimension is increased by using the initialized ConvLSTM convolution long-short term memory network, thereby obtaining a time sequence feature map F';
step 2.5.4: performing dimensionality reduction on the time sequence feature map F' to obtain a full connection layer vector;
step 2.5.5: connecting a fully connected output layer vector' which is the same as the dimension A of the number of the types of the ID information in all the labels in the footprint image dataset behind the fully connected output layer vector;
step 2.5.6: connecting the fully-connected output layer vector' with a SoftMax function so as to form the fully-connected layer and correspondingly output a probability set { p 0 ,p 1 ,…,p a ,…,p A-1 }; selecting a maximum value p from the probability set max The corresponding subscript max is used as a label identified by the normalized kth frame footprint sequence sample Image (k');
step 2.5.7: and reversely propagating the probability set into the step-by-step footprint image retrieval model, and matching with an adaptive variable learning Rate L _ Rate and cross entropy loss Cross Entry to update the shared weight W m Weight w ik And bias term B m And obtaining an optimal footprint image retrieval model for realizing retrieval results of the ID information corresponding to different footprint images.
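The following is a hedged, illustrative sketch only (not the patented implementation) of how the four parts described in steps 2.1-2.5 could be wired together in PyTorch. All layer sizes, the choice of M = 6 layers, the hidden size, the class count and the use of a plain nn.LSTM in place of the ConvLSTM (a ConvLSTM cell is sketched separately in the detailed description) are assumptions made for this example.

```python
import torch
import torch.nn as nn

class SpatialCNN(nn.Module):
    """Small-kernel CNN (step 2.3); here M = 6 layers, 3x3 kernels, BN only in layer 2."""
    def __init__(self):
        super().__init__()
        chans = [3, 16, 32, 32, 64, 64, 64]
        layers = []
        for m in range(6):
            layers.append(nn.Conv2d(chans[m], chans[m + 1], kernel_size=3, padding=1))
            if m == 1:                               # batch normalization only in the 2nd layer
                layers.append(nn.BatchNorm2d(chans[m + 1]))
            layers.append(nn.ReLU(inplace=True))
            layers.append(nn.MaxPool2d(2))
        self.net = nn.Sequential(*layers)
        for mod in self.net.modules():               # Xavier initialization (step 2.3.1)
            if isinstance(mod, nn.Conv2d):
                nn.init.xavier_uniform_(mod.weight)

    def forward(self, x):                            # x: (B*K, 3, H, W)
        return self.net(x)

class FootprintRetrievalNet(nn.Module):
    def __init__(self, num_ids=100, slices=4, hidden=128):
        super().__init__()
        self.cnn = SpatialCNN()
        self.slices = slices
        self.slice_w = nn.Parameter(torch.ones(slices))          # w_ik (shared over k here)
        self.lstm = nn.LSTM(input_size=64, hidden_size=hidden, batch_first=True)
        self.fc_out = nn.Linear(hidden, num_ids)                  # dimension A = num_ids

    def forward(self, frames):                       # frames: (B, K, 3, H, W)
        B, K = frames.shape[:2]
        f = self.cnn(frames.flatten(0, 1))                        # per-frame feature maps F_k
        # feature-mask fusion (step 2.4): erase values above the per-frame mean, add back
        mean = f.mean(dim=(1, 2, 3), keepdim=True)
        fused = f + f * (f <= mean).float()
        v = fused.mean(dim=(2, 3))                                # global pooling -> Vector_k
        v = torch.cat([w * s for w, s in zip(self.slice_w, v.chunk(self.slices, dim=1))], dim=1)
        v = v.view(B, K, -1)                                      # LastVector_k sequence
        out, _ = self.lstm(v)                                     # temporal module (step 2.5)
        return self.fc_out(out[:, -1])                            # logits over the A identities
```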
Compared with the prior art, the invention has the beneficial effects that:
1. The invention combines image processing, deep learning and one-pass footprint image retrieval into a complete retrieval framework. On the image-processing side, a full set of preprocessing methods is provided for the footprint images: the one-pass footprint image samples are optimized and converted into the footprint sequence sample set required for training the retrieval model. On the network side, the one-pass footprint image retrieval model is composed of a preprocessing layer, a spatial feature extraction module, a feature fusion module and a temporal feature extraction module.
2. The image-processing part strengthens the pressure feature information of the one-pass footprint images through pseudo-colorization, overcoming the weak pressure contrast of grey-scale footprint images; removing background noise cleans the one-pass footprint image samples while preserving the original footprint information as fully as possible; and cutting divides each one-pass sample into single footprints in frame order, forming the input footprint sequence sample set of the retrieval model, so that the network can better aggregate feature information across the footprints in the sequence.
3. By resizing the footprint images in the footprint sequence sample set, the preprocessing layer obtains multi-scale spatial information of the one-pass footprint images; fusing this multi-scale spatial information alleviates the loss of spatial feature information caused by the pooling layers. In addition, unlike conventional per-image normalization, the method normalizes the footprint sequence sample set as a whole, taking the global information of the complete one-pass footprint images into account.
4. The spatial feature extraction module uses smaller convolution kernels to improve the network's extraction of fine-detail feature information and, given the sparsity of footprint images, uses a comparatively shallow stack of convolution layers; this extracts the spatial feature information in the one-pass footprint images more effectively and strengthens the expressive power of the convolutional features.
5. The feature fusion module obtains a feature mask by feature erasure, strengthening the detailed feature information of the footprint-edge texture regions, and then superimposes it on the complete footprint features so that global footprint information is also taken into account; cutting, weighting and splicing the fully connected layer strengthens the robustness of the final pre-trained model and gives it strong generalization ability.
6. The temporal feature extraction module, built from a convolutional long short-term memory network and a fully connected layer, extracts the temporal information of the footprint sequence sample set, clusters the discriminative feature information between footprint images and balances global and local feature information; compared with traditional footprint retrieval and comparison methods, the retrieval accuracy for one-pass footprint images is greatly improved.
Drawings
FIG. 1 is an overall flow chart of one-pass footprint image retrieval according to the present invention;
FIG. 2 is a diagram of a spatiotemporal motion and feature fusion network architecture in accordance with the present invention;
Detailed Description
In this embodiment, the footprint image retrieval method based on space-time motion and feature fusion extracts the spatio-temporal features of a one-pass footprint sequence with a convolutional neural network and a convolutional long short-term memory network, and improves the performance of the network model through feature fusion. The data set used by the invention contains more than 3,600 one-pass footprint images, which after preprocessing yield more than 36,000 single footprint images, covering more than 100 people with at least 36 one-pass footprint images per person; the data include barefoot prints, shoe prints with different sole patterns and three different walking speeds, and every image carries a person ID label. As shown in FIG. 1, the whole process is divided into the following steps:
Step 1: take a continuous one-pass footprint image of any test subject at a given walking speed and apply the preprocessing operations of pseudo-colorization and denoising to obtain a one-pass footprint image sample. During data acquisition, different pressures produce different pressure information; converting the single-channel grey-scale data into pseudo-colour images according to a fixed proportional mapping enhances these pressure features. A new denoising method is designed for the characteristics of footprint images: statistics of the image pixel values show that noise points generally have pixel values of the form (255, X, Y), and two denoising rules are used depending on whether the noise lies on or outside the footprint. If the noise is outside the footprint, a column of the image is scanned; let h1 be the number of black pixels in the column, h2 the number of (255, X, Y) pixels and h the image height. Since (255, X, Y) pixels can also be valid footprint information, the (255, X, Y) pixels of the column are removed as noise only when h1 + h2 = h and h2 > h/10. If the noise lies on the footprint, the (255, X, Y) pixels are removed when h2 > h/5. This algorithm removes the background noise of the image well while largely preserving the integrity of the original footprint information. A minimal sketch of this rule is given below.
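The following is a hedged sketch of the column-wise denoising rule described in step 1. It is an illustration only: the exact colour mapping, the interpretation of "(255, X, Y)" pixels and the black-pixel test are assumptions made for this example.

```python
import numpy as np

def denoise_columns(img, on_footprint=False):
    """img: H x W x 3 pseudo-colour image (uint8). Returns a cleaned copy."""
    out = img.copy()
    h = img.shape[0]
    for col in range(img.shape[1]):
        column = img[:, col, :]
        is_black = np.all(column == 0, axis=1)                # background pixels
        is_noise_like = column[:, 0] == 255                   # pixels of the form (255, X, Y)
        h1, h2 = int(is_black.sum()), int(is_noise_like.sum())
        if on_footprint:
            remove = h2 > h / 5                               # rule for noise on the footprint
        else:
            remove = (h1 + h2 == h) and (h2 > h / 10)         # rule for noise outside the footprint
        if remove:
            out[is_noise_like, col, :] = 0                    # erase the (255, X, Y) noise points
    return out
```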
Step 2: cut the one-pass footprint image samples to obtain a footprint sequence sample set. An algorithm is designed that scans the one-pass image sample column by column and accumulates per-column pixel statistics to determine where one footprint in the sequence ends and the next begins: columns whose average pixel value exceeds five are treated as footprint content, and the cut is placed at the average of the two adjacent boundary columns. This algorithm divides a one-pass footprint image sample into a continuous footprint sequence sample set, as sketched below.
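The following is a hedged sketch of the column-scan segmentation in step 2. The threshold handling and the exact boundary rule are assumptions; the description only states that columns with an average pixel value above five count as footprint content and that the cut is placed between adjacent footprints.

```python
import numpy as np

def split_footprints(one_pass_img, col_threshold=5.0):
    """one_pass_img: H x W (grey) or H x W x 3 array. Returns a list of single-footprint crops."""
    gray = one_pass_img if one_pass_img.ndim == 2 else one_pass_img.mean(axis=2)
    col_mean = gray.mean(axis=0)                       # per-column average pixel value
    is_content = col_mean > col_threshold              # columns that belong to a footprint
    crops, start = [], None
    for x, flag in enumerate(is_content):
        if flag and start is None:
            start = x                                  # a footprint begins
        elif not flag and start is not None:
            crops.append(one_pass_img[:, start:x])     # footprint ends; cut between prints
            start = None
    if start is not None:                              # last footprint runs to the image edge
        crops.append(one_pass_img[:, start:])
    return crops
```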
Step 3: define a label for each footprint sequence sample in the footprint sequence sample set; the label comprises ID information and serial number information.
Step 4: repeat steps 1 to 3 to collect and correspondingly process continuous one-pass footprint images of multiple test subjects at different walking speeds, forming a footprint image data set.
Step 5: divide the data set into three parts. The first part is the training set; the second part is the test base library (gallery) set, containing two groups each of fast, slow and normal walking; the third part is the test query set, containing one group each of fast, slow and normal walking. The training set shares no subjects with the test base library set or the test query set, while the test base library set and the query set contain different data of the same subjects, allocated in a two-to-one proportion (two groups against one group per walking speed). A subject-disjoint split of this kind is sketched below.
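The following is a hedged sketch of the subject-disjoint split described in step 5. The train/test person ratio used here (0.9) and the exact group assignment are assumptions for illustration; the text only requires that training subjects do not reappear in the test sets and that the base library and query sets contain different samples of the same subjects.

```python
import random
from collections import defaultdict

def split_dataset(samples, train_ratio=0.9, seed=0):
    """samples: list of dicts like {'person_id': ..., 'speed': ..., 'group': ..., 'path': ...}."""
    by_person = defaultdict(list)
    for s in samples:
        by_person[s['person_id']].append(s)
    persons = sorted(by_person)
    random.Random(seed).shuffle(persons)
    n_train = int(len(persons) * train_ratio)
    train_ids = set(persons[:n_train])                 # training subjects never appear in the test sets
    train, gallery, query = [], [], []
    for pid, items in by_person.items():
        if pid in train_ids:
            train.extend(items)
        else:
            for s in items:                            # two groups to the base library, one to the query set
                (gallery if s['group'] in (1, 2) else query).append(s)
    return train, gallery, query
```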
Step 6: feed the footprint image data set into the footprint image retrieval model with space-time motion and feature fusion for training. Through the preprocessing layer, spatial feature extraction module, feature fusion module and temporal feature extraction module of the network, the discriminative feature information of different people's footprint images is aggregated and classified, yielding a pre-trained footprint image retrieval model with space-time motion and feature fusion. As shown in FIG. 2, the one-pass footprint image retrieval model consists of a preprocessing layer, a spatial feature extraction module, a feature fusion module and a temporal feature extraction module:
6.1. The preprocessing layer resizes the footprint sequence sample set X to obtain a footprint sequence sample set X' containing multi-scale features; fusing the multi-scale spatial information of the footprint images in the sample set alleviates the partial loss of spatial feature information caused by the pooling layers.
6.2. The footprint sequence sample set X' containing multi-scale features is normalized with formula (1), giving the normalized footprint sequence sample set X''; this takes the global information of the one-pass footprint images into account. A small sketch of this set-level normalization follows below.
Image(k'') = (Image(k') − Mean) / Std    (1)
In formula (1), Image(k') denotes the k-th frame footprint sequence sample containing multi-scale features; Mean denotes the mean of the footprint sequence sample set X' containing multi-scale features; Std denotes the variance of the same set; Image(k'') denotes the normalized k-th frame footprint sequence sample.
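The following is a hedged sketch of the whole-set normalization of formula (1): unlike per-image normalization, Mean and Std are computed over the entire footprint sequence sample set. Treating Std as the standard deviation of the set is an assumption of this sketch.

```python
import numpy as np

def normalize_sample_set(x_prime):
    """x_prime: array of shape (K, H, W, C) holding the multi-scale footprint samples."""
    mean = x_prime.mean()                    # single mean over the whole sample set
    std = x_prime.std()                      # single spread value over the whole sample set
    return (x_prime - mean) / (std + 1e-8)   # Image(k'') for every frame k at once
```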
6.3. The spatial feature extraction module consists of a convolutional neural network with M layers of small convolution kernels, where each m-th small-kernel layer comprises, in order, its convolution layer and activation layer; the 2nd small-kernel layer additionally has a batch normalization layer between its convolution layer and activation layer; M takes values in [5,10].
The weights of all convolution layers in the spatial feature extraction module are initialized with the Xavier method.
The output Z_m of the m-th convolution layer is obtained with formula (2):
Z_m = W_m * X_m + B_m    (2)
In formula (2), X_m is the portion of the input image being convolved under the stride S_m of the m-th convolution layer; B_m is the bias under stride S_m; W_m is the shared weight under stride S_m. The output size Y_m of the m-th convolution layer is obtained with formula (3):
Y_m = C_m × ((H_m − K_m + 2P_m)/S_m + 1) × ((R_m − K_m + 2P_m)/S_m + 1)    (3)
In formula (3), S_m is the stride of the m-th convolution layer, K_m its convolution kernel size, P_m its number of padding pixels, C_m its number of output channels, H_m its height and R_m its width. The spatial feature extraction module processes the normalized footprint sequence sample set X'' and outputs a spatial feature sequence {F_1, F_2, ..., F_k, ..., F_K}, where F_k is the k-th frame footprint feature map obtained after the normalized k-th frame footprint sequence sample Image(k'') has been processed by the spatial feature extraction module. All convolution kernels of the spatial feature extraction module are small kernels, which improves the network's extraction of fine-detail feature information; since footprint images are sparse, a comparatively shallow stack of 5 to 10 convolution layers extracts the spatial feature information of the footprint images more effectively and strengthens the expressive power of the convolutional features. A small helper illustrating formula (3) appears below.
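The following is a hedged helper illustrating formula (3): given the m-th layer's stride, kernel size, padding, output channels and height/width, it computes the output size Y_m. The floor division mirrors how convolution output sizes are normally computed; interpreting H_m and R_m as the layer's input height and width is an assumption.

```python
def conv_output_size(c_m, h_m, r_m, k_m, s_m, p_m):
    out_h = (h_m - k_m + 2 * p_m) // s_m + 1
    out_w = (r_m - k_m + 2 * p_m) // s_m + 1
    return c_m, out_h, out_w          # Y_m = C_m x out_h x out_w

# Example: a 3x3 small kernel with stride 1 and padding 1 keeps the spatial size.
print(conv_output_size(c_m=32, h_m=128, r_m=64, k_m=3, s_m=1, p_m=1))  # (32, 128, 64)
```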
6.4. The feature fusion module comprises the feature-mask operation layers and the per-frame fully connected layers.
The feature-mask operation layer first superimposes the k-th frame footprint feature map F_k over the C_M output channels of the M-th convolution layer, obtaining the superimposed k-th frame feature map F'_k; it then sums the pixel values of F'_k and takes their average as an erasure threshold, erasing the pixels of F'_k that exceed this threshold to obtain the k-th frame feature-mask map F''_k; finally it superimposes F'_k and F''_k to obtain the k-th frame feature fusion map F̃_k.
The feature fusion module thus obtains a feature mask by feature erasure, strengthening the detailed feature information of the footprint-edge texture regions, and then superimposes it on the complete footprint feature information so that global footprint information is also taken into account; a small sketch of this erasure step is given below.
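The following is a hedged sketch of the feature-erasure step (steps 2.4.1-2.4.3): pixels of the superimposed feature map that exceed its mean are erased to form the mask map, which is then added back onto the superimposed map. Treating the "superposition" over the C_M output channels as a channel-wise sum is an assumption of this sketch.

```python
import torch

def feature_mask_fusion(feature_map):
    """feature_map: tensor of shape (C_M, H, W) holding one frame's feature map F_k."""
    f_prime = feature_map.sum(dim=0)                    # F'_k: superposition over output channels
    threshold = f_prime.mean()                          # erasure threshold = average pixel value
    f_mask = f_prime * (f_prime <= threshold).float()   # F''_k: values above the threshold erased
    return f_prime + f_mask                             # feature fusion map for this frame
```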
after the processing of the characteristic mask operation layer, the feature fusion map of the kth frame is processed
Figure SMS_10
Performing dimension reduction treatment to obtain the k frame characteristic fusion map->
Figure SMS_11
Corresponding full connection layer vector k Then for the full connection layer vector of the k frame k Averagely cutting the image into I pieces to obtain I feature vectors, wherein the ith feature vector ik The value range of I is [4,8 ]]Then the ith feature vector for the fully-connected layer of the kth frame ik Given a weight w ik And obtaining a k frame full link layer LastVector after feature fusion by using the formula (4) k
Figure SMS_12
Through the method of cutting, weighting and splicing the full connection layer, the robustness of the final pre-training model is effectively enhanced, the final pre-training model has strong generalization capability, and the operations of the pre-processing layer, the spatial information extraction module and the feature extraction module are repeated, so that the K frame full connection layer { LastVector after feature fusion is obtained k L K =1,2, \8230 |, K }; the preprocessing layer can obtain the multi-scale spatial information of the track images in the track sequence sample set by resetting the size of the track images in the track sequence sample set, and can well reduce the loss of partial spatial characteristic information caused by the pooling layer by fusing the multi-scale spatial information of the track images in the track sequence sample set; meanwhile, different from the traditional image normalization operation, the method uniformly performs the normalization operation on the footprint sequence sample set, and gives consideration to the overall information of the completed footprint images;
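The following is a hedged sketch of formula (4): the fully connected vector of frame k is cut into I equal slices, each slice is scaled by a weight w_ik, and the weighted slices are spliced back together. Reading "splicing" as concatenation, and the example weights, are assumptions of this sketch.

```python
import torch

def cut_weight_splice(vector_k, weights):
    """vector_k: 1-D tensor; weights: 1-D tensor of length I (the w_ik of this frame)."""
    slices = vector_k.chunk(len(weights))                     # I equal slices vector_ik
    weighted = [w * s for w, s in zip(weights, slices)]       # apply w_ik to each slice
    return torch.cat(weighted)                                # LastVector_k

# Example with I = 4 assumed weights:
v = torch.randn(256)
print(cut_weight_splice(v, torch.tensor([1.0, 0.8, 0.8, 1.0])).shape)  # torch.Size([256])
```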
and 6.5, the time sequence feature extraction module consists of a ConvLSTM convolution long-term and short-term memory network and a full connection layer.
Firstly, a K frame full connection layer { LastVector after feature fusion k I K =1,2, \8230, K carries out dimension raising operation to obtain a network input vector after dimension raising, then carries out initialization weight on the ConvLSTM convolution long-short term memory network by using Gaussian distribution, and then carries out extraction of sequence characteristic information on the network input vector after dimension raising by using the initialized ConvLSTM convolution long-short term memory network to obtain a time sequence characteristic map F'; the convolution long-short term memory network can perfect the extraction of the time series information of the footprint series sample set and take the global characteristic information and the local characteristic information into consideration;
after the processing of the convolution long-term and short-term memory network, performing dimensionality reduction on the sequence feature map F 'to obtain a full-connection layer vector, and then connecting a full-connection output layer vector' which has the same dimensionality A as the variety number of the ID information in all the labels in the footprint image dataset behind the full-connection layer vector; then, connecting the vector' of the fully-connected output layer with the SoftMax function, thereby forming the fully-connected layer and correspondingly outputting a probability set { p 0 ,p 1 ,…,p a ,…,p A-1 }; then selecting the maximum value p from the probability set max The corresponding subscript max serves as a label identified by the normalized kth frame footprint sequence sample Image (k'), and finally the probability set is propagated reversely to the step-by-step footprint Image retrieval model and matched with the self-adaptive variable learning Rate L _ Rate and the cross entropy loss Cross Engine, so that the shared weight W is updated m Weight w ik And bias term B m And obtaining an optimal track image retrieval model for realizing retrieval results of the ID information corresponding to different track images. The method has the advantages that the method is more effective in division, the distinctive feature information among the footprint images is clustered, and compared with the traditional footprint retrieval comparison method, the accurate value of the track-in-track image retrieval is greatly improved.
The convolutional long short-term memory network has three gates at each sequence index position: an input gate, an output gate and a forget gate. As a variant of recurrent neural networks (RNNs), ConvLSTM can learn long-term dependency information and balances global and local feature information well. The input gate (InputGate) has two parts: a Sigmoid layer, which decides which values are to be updated, and a Tanh layer, which creates a candidate value vector C̃_t. The candidate value vector C̃_t is added into the cell state, giving the new cell state C_t; the output gate (OutputGate) determines how much of the final C_t is passed on to the hidden state h_t; and the forget gate (ForgetGate) controls, with a certain probability, how much of the previous hidden-state information is retained in ConvLSTM. The input, output and forget gates all use the Sigmoid function, which controls the proportion of information each gate lets through.
The input gate is given by formula (5):
i_t = σ(W_i·[h_{t−1}, x_t] + b_i),  C̃_t = tanh(W_c·[h_{t−1}, x_t] + b_c)    (5)
In formula (5), σ is the Sigmoid function, h_{t−1} is the previous hidden state, x_t is the current input, i_t is the final output of the input gate, W_i is the input-gate weight parameter and b_i is the input-gate bias parameter; C̃_t determines how much of the current input information enters the cell state, with W_c the corresponding weight parameter and b_c the corresponding bias parameter.
The cell state C_t is given by formula (6):
C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t    (6)
In formula (6), f_t is the final output of the forget gate, i_t is the final output of the input gate, C_{t−1} is the cell state at the previous time step, and C̃_t determines how much of the current input information enters the cell state.
The output gate is given by formula (7):
o_t = σ(W_o·[h_{t−1}, x_t] + b_o),  h_t = o_t ⊙ tanh(C_t)    (7)
In formula (7), σ is the Sigmoid function, o_t is the final output of the output gate, W_o is the output-gate weight parameter, b_o is the output-gate bias parameter, C_t is the cell state and h_t is the hidden state at the current time step.
The forget gate is given by formula (8):
f_t = σ(W_f·[h_{t−1}, x_t] + b_f)    (8)
In formula (8), σ is the Sigmoid function, h_{t−1} is the previous hidden state, x_t is the current input, f_t is the final output of the forget gate, W_f is the forget-gate weight parameter and b_f is the forget-gate bias parameter.
The Sigmoid function is given by formula (9):
σ(x) = 1 / (1 + e^{−x})    (9)
A minimal ConvLSTM cell implementing these gates is sketched below.
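The following is a hedged sketch of a ConvLSTM cell implementing gates (5)-(8), with the matrix products replaced by convolutions as in ConvLSTM. The channel counts and kernel size are assumptions; this is an illustration, not the patented implementation.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, in_ch, hid_ch, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        # one convolution produces the pre-activations of all four gates at once
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, kernel_size, padding=pad)
        self.hid_ch = hid_ch

    def forward(self, x_t, state):
        h_prev, c_prev = state                           # h_{t-1}, C_{t-1}
        z = self.gates(torch.cat([h_prev, x_t], dim=1))  # convolution over [h_{t-1}, x_t]
        i, f, o, g = z.chunk(4, dim=1)
        i_t = torch.sigmoid(i)                           # input gate, formula (5)
        f_t = torch.sigmoid(f)                           # forget gate, formula (8)
        o_t = torch.sigmoid(o)                           # output gate, formula (7)
        c_tilde = torch.tanh(g)                          # candidate values, formula (5)
        c_t = f_t * c_prev + i_t * c_tilde               # cell state, formula (6)
        h_t = o_t * torch.tanh(c_t)                      # hidden state, formula (7)
        return h_t, c_t

# Example: one step over an assumed 8x8 feature map with 64 input and 32 hidden channels.
cell = ConvLSTMCell(64, 32)
x = torch.randn(1, 64, 8, 8)
h = c = torch.zeros(1, 32, 8, 8)
h, c = cell(x, (h, c))
```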
The preprocessed test query set and test base library set are fed into the pre-trained footprint image retrieval model with space-time motion and feature fusion to extract features; the footprint feature information extracted from the test query set is compared with that extracted from the test base library set, the difference between the two is measured with cosine similarity to obtain the retrieval mAP and Rank values, and the network parameters are adjusted according to the test results until the network performs best. A small sketch of this retrieval step is given below.
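The following is a hedged sketch of the retrieval/evaluation step: query features are compared with base-library (gallery) features by cosine similarity and a Rank-1 score is computed. The mAP computation is omitted, and the feature shapes are assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

def rank1_accuracy(query_feats, query_ids, gallery_feats, gallery_ids):
    """query_feats: (Nq, D); gallery_feats: (Ng, D); ids: 1-D integer tensors."""
    q = F.normalize(query_feats, dim=1)
    g = F.normalize(gallery_feats, dim=1)
    sim = q @ g.t()                                   # cosine similarity matrix (Nq, Ng)
    best = sim.argmax(dim=1)                          # most similar gallery sample per query
    return (gallery_ids[best] == query_ids).float().mean().item()
```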

Claims (1)

1. A footprint image retrieval method based on space-time motion and feature fusion is characterized by comprising the following steps:
step 1: constructing a training set and a test set;
step 1.1: collecting a continuous one-pass footprint image of any test subject at a given walking speed;
step 1.2: carrying out pseudo-colorization and denoising on the one-pass footprint image to obtain a processed one-pass footprint image sample;
step 1.3: sequentially dividing the one-pass footprint image sample into single footprint images in frame order to obtain a footprint sequence sample set X = {x_k | k = 1, 2, 3, ..., K}; x_k denotes the k-th frame footprint sequence sample; 1 ≤ k ≤ K; K is the total number of footprints in the one-pass footprint image sample;
step 1.4: respectively defining a label for each footprint sequence sample in the footprint sequence sample set, wherein the label comprises ID information and serial number information;
step 1.5: repeating steps 1.1-1.4 to collect and correspondingly process continuous one-pass footprint images of a plurality of test subjects at different walking speeds, thereby forming a footprint image data set;
step 1.6: dividing a footprint image data set into a test set and a training set, and subdividing the test set into a test query set and a test base library set;
step 2: establishing a one-pass footprint image retrieval model with space-time motion and feature fusion, the model consisting of a preprocessing layer, a spatial feature extraction module, a feature fusion module and a temporal feature extraction module;
step 2.1: the preprocessing layer resizes the footprint sequence sample set X to obtain a footprint sequence sample set X' containing multi-scale features;
step 2.2: the preprocessing layer normalizes the footprint sequence sample set X' containing multi-scale features using formula (1), obtaining a normalized footprint sequence sample set X'':
Image(k'') = (Image(k') − Mean) / Std    (1)
In formula (1), Image(k') denotes the k-th frame footprint sequence sample containing multi-scale features; Mean denotes the mean of the footprint sequence sample set X' containing multi-scale features; Std denotes the variance of the same set; Image(k'') denotes the normalized k-th frame footprint sequence sample;
step 2.3: establishing a spatial feature extraction module consisting of a convolutional neural network with M layers of small convolution kernels, where each m-th small-kernel layer comprises, in order, its convolution layer and activation layer; the 2nd small-kernel layer additionally has a batch normalization layer between its convolution layer and activation layer; M takes values in [5,10];
step 2.3.1: initializing weights of all convolution layers in the spatial feature extraction module by using an Xavier method;
step 2.3.2: obtaining the output Z_m of the m-th convolution layer with formula (2):
Z_m = W_m * X_m + B_m    (2)
In formula (2), X_m is the portion of the input image being convolved under the stride S_m of the m-th convolution layer; B_m is the bias under stride S_m of the m-th convolution layer; W_m is the shared weight under stride S_m of the m-th convolution layer;
step 2.3.3: obtaining the output size Y_m of the m-th convolution layer with formula (3):
Y_m = C_m × ((H_m − K_m + 2P_m)/S_m + 1) × ((R_m − K_m + 2P_m)/S_m + 1)    (3)
In formula (3), S_m is the stride of the m-th convolution layer, K_m is its convolution kernel size, P_m is its number of padding pixels, C_m is its number of output channels, H_m is its height and R_m is its width;
step 2.3.4: the spatial feature extraction module processes the normalized footprint sequence sample set X'' and outputs a spatial feature sequence {F_1, F_2, ..., F_k, ..., F_K}, where F_k is the k-th frame footprint feature map obtained after the normalized k-th frame footprint sequence sample Image(k'') has been processed by the spatial feature extraction module;
step 2.4: constructing a feature fusion module consisting of K feature-mask operation layers and K per-frame fully connected layers;
step 2.4.1: the k-th feature-mask operation layer superimposes the k-th frame footprint feature map F_k over the C_M output channels of the M-th convolution layer, obtaining a superimposed k-th frame feature map F'_k;
step 2.4.2: the k-th feature-mask operation layer sums the pixel values of the superimposed k-th frame feature map F'_k and takes their average as an erasure threshold; the pixels of F'_k that are larger than the erasure threshold are erased, giving the k-th frame feature-mask map F''_k;
step 2.4.3: the k-th feature-mask operation layer superimposes the superimposed k-th frame feature map F'_k and the k-th frame feature-mask map F''_k, obtaining the k-th frame feature fusion map F̃_k;
step 2.4.4: reducing the dimensionality of the k-th frame feature fusion map F̃_k to obtain its corresponding fully connected layer vector Vector_k;
step 2.4.5: cutting the k-th frame fully connected layer vector Vector_k evenly into I slices, obtaining I feature vectors, the i-th of which is vector_ik; I takes values in [4,8];
step 2.4.6: assigning a weight w_ik to the i-th feature vector vector_ik of the k-th frame fully connected layer and obtaining the feature-fused k-th frame fully connected layer LastVector_k with formula (4), i.e. splicing the weighted slices back together:
LastVector_k = [w_1k·vector_1k, w_2k·vector_2k, ..., w_Ik·vector_Ik]    (4)
step 2.4.7: repeating steps 2.4.1-2.4.6 to obtain the K feature-fused per-frame fully connected layers {LastVector_k | k = 1, 2, ..., K};
Step 2.5: constructing a time sequence characteristic extraction module consisting of a ConvLSTM convolution long-term and short-term memory network and a full connection layer;
step 2.5.1: for the K frame full connection layer { LastVector after the feature fusion k I K =1,2, \8230, K } performs dimensionality increasing operation to obtain a network input vector after dimensionality increasing;
step 2.5.2: initializing a weight value of the ConvLSTM convolution long-term and short-term memory network by using Gaussian distribution;
step 2.5.3: extracting sequence feature information of the network input vector after the dimension is increased by using the initialized ConvLSTM convolution long-short term memory network, thereby obtaining a time sequence feature map F';
step 2.5.4: performing dimensionality reduction on the time sequence feature map F' to obtain a full connection layer vector;
step 2.5.5: connecting a fully connected output layer vector' which is the same as the dimension A of the number of the types of the ID information in all the labels in the footprint image dataset behind the fully connected output layer vector;
step (ii) of2.5.6: connecting the fully-connected output layer vector' with a SoftMax function so as to form the fully-connected layer and correspondingly output a probability set { p 0 ,p 1 ,…,p a ,…,p A-1 }; selecting a maximum value p from the probability set max The corresponding subscript max is used as a label identified by the normalized kth frame footprint sequence sample Image (k');
step 2.5.7: and reversely propagating the probability set into the step-by-step footprint image retrieval model, and matching with an adaptive variable learning Rate L _ Rate and Cross Entropy loss Cross Engine, thereby updating the shared weight W m Weight w ik And bias term B m And obtaining an optimal track image retrieval model for realizing retrieval results of the ID information corresponding to different track images.
CN202010511912.0A 2020-06-08 2020-06-08 Footprint image retrieval method based on space-time motion and feature fusion Active CN111639719B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010511912.0A CN111639719B (en) 2020-06-08 2020-06-08 Footprint image retrieval method based on space-time motion and feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010511912.0A CN111639719B (en) 2020-06-08 2020-06-08 Footprint image retrieval method based on space-time motion and feature fusion

Publications (2)

Publication Number Publication Date
CN111639719A CN111639719A (en) 2020-09-08
CN111639719B (en) 2023-04-07

Family

ID=72329676

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010511912.0A Active CN111639719B (en) 2020-06-08 2020-06-08 Footprint image retrieval method based on space-time motion and feature fusion

Country Status (1)

Country Link
CN (1) CN111639719B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112100429B (en) * 2020-09-27 2022-09-13 安徽大学 Footprint pressure image retrieval method
CN112464746B (en) * 2020-11-10 2023-09-12 清华苏州环境创新研究院 Water quality monitoring method and system for satellite image and machine learning
CN112257662A (en) * 2020-11-12 2021-01-22 安徽大学 Pressure footprint image retrieval system based on deep learning
CN112580577B (en) * 2020-12-28 2023-06-30 出门问问(苏州)信息科技有限公司 Training method and device for generating speaker image based on facial key points
CN113220926B (en) * 2021-05-06 2022-09-09 安徽大学 Footprint image retrieval method based on multi-scale local attention enhancement network
CN113191443B (en) * 2021-05-14 2023-06-13 清华大学深圳国际研究生院 Clothing classification and attribute identification method based on feature enhancement
CN112990171B (en) * 2021-05-20 2021-08-06 腾讯科技(深圳)有限公司 Image processing method, image processing device, computer equipment and storage medium
CN113536989B (en) * 2021-06-29 2024-06-18 广州博通信息技术有限公司 Refrigerator frosting monitoring method and system based on frame-by-frame analysis of camera video
CN113656623A (en) * 2021-08-17 2021-11-16 安徽大学 Time sequence shift and multi-branch space-time enhancement network-based stepping footprint image retrieval method
CN114840700B (en) * 2022-05-30 2023-01-13 来也科技(北京)有限公司 Image retrieval method and device for realizing IA by combining RPA and AI and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229338A (en) * 2017-12-14 2018-06-29 华南理工大学 A kind of video behavior recognition methods based on depth convolution feature
CN109325546A (en) * 2018-10-19 2019-02-12 大连海事大学 A kind of combination footwork feature at time footprint recognition method
CN110378288A (en) * 2019-07-19 2019-10-25 合肥工业大学 A kind of multistage spatiotemporal motion object detection method based on deep learning
CN110956111A (en) * 2019-11-22 2020-04-03 苏州闪驰数控***集成有限公司 Artificial intelligence CNN, LSTM neural network gait recognition system
CN111177446A (en) * 2019-12-12 2020-05-19 苏州科技大学 Method for searching footprint image

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10776628B2 (en) * 2017-10-06 2020-09-15 Qualcomm Incorporated Video action localization from proposal-attention
US10957053B2 (en) * 2018-10-18 2021-03-23 Deepnorth Inc. Multi-object tracking using online metric learning with long short-term memory

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229338A (en) * 2017-12-14 2018-06-29 华南理工大学 A kind of video behavior recognition methods based on depth convolution feature
CN109325546A (en) * 2018-10-19 2019-02-12 大连海事大学 A kind of combination footwork feature at time footprint recognition method
CN110378288A (en) * 2019-07-19 2019-10-25 合肥工业大学 A kind of multistage spatiotemporal motion object detection method based on deep learning
CN110956111A (en) * 2019-11-22 2020-04-03 苏州闪驰数控***集成有限公司 Artificial intelligence CNN, LSTM neural network gait recognition system
CN111177446A (en) * 2019-12-12 2020-05-19 苏州科技大学 Method for searching footprint image

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Abnormal Gait Recognition Algorithm Based; JING GAO et al.; IEEE Access; 2019-10-29; entire document *
Gait-Based Human Identification by Combining Shallow Convolutional Neural Network-Stacked; GANBAYAR BATCHULUUN et al.; IEEE Access; 2018-10-22; entire document *
A CNN-based footprint image retrieval and matching method; 陈扬 et al.; Journal of Nanjing Normal University (Engineering and Technology Edition); 2018-09-30; vol. 18, no. 3; entire document *
Research on gait recognition methods based on deep learning; 冯雨欣; China Master's Theses Full-text Database (Electronic Journal), Information Science and Technology; 2020-02-15; vol. 2020, no. 02; entire document *
Research on footprint pattern image retrieval algorithms based on sparse representation; 张浩; China Master's Theses Full-text Database (Electronic Journal), Information Science and Technology; 2017-07-15; vol. 2017, no. 07; entire document *

Also Published As

Publication number Publication date
CN111639719A (en) 2020-09-08

Similar Documents

Publication Publication Date Title
CN111639719B (en) Footprint image retrieval method based on space-time motion and feature fusion
CN111860612B (en) Unsupervised hyperspectral image hidden low-rank projection learning feature extraction method
CN107341452B (en) Human behavior identification method based on quaternion space-time convolution neural network
CN111738124B (en) Remote sensing image cloud detection method based on Gabor transformation and attention
CN109948693B (en) Hyperspectral image classification method based on superpixel sample expansion and generation countermeasure network
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN109345508B (en) Bone age evaluation method based on two-stage neural network
CN112836773B (en) Hyperspectral image classification method based on global attention residual error network
CN109145992B (en) Hyperspectral image classification method for cooperatively generating countermeasure network and spatial spectrum combination
CN108648191B (en) Pest image recognition method based on Bayesian width residual error neural network
Lin et al. Hyperspectral image denoising via matrix factorization and deep prior regularization
CN104462494B (en) A kind of remote sensing image retrieval method and system based on unsupervised feature learning
CN111914611B (en) Urban green space high-resolution remote sensing monitoring method and system
CN111696101A (en) Light-weight solanaceae disease identification method based on SE-Inception
CN106407986A (en) Synthetic aperture radar image target identification method based on depth model
CN110458192B (en) Hyperspectral remote sensing image classification method and system based on visual saliency
CN111090764B (en) Image classification method and device based on multitask learning and graph convolution neural network
CN113705580B (en) Hyperspectral image classification method based on deep migration learning
CN113344045B (en) Method for improving SAR ship classification precision by combining HOG characteristics
Hu et al. Classification of PolSAR images based on adaptive nonlocal stacked sparse autoencoder
CN112633386A (en) SACVAEGAN-based hyperspectral image classification method
CN114692732B (en) Method, system, device and storage medium for updating online label
Lin et al. Determination of the varieties of rice kernels based on machine vision and deep learning technology
CN112950780A (en) Intelligent network map generation method and system based on remote sensing image
CN114937173A (en) Hyperspectral image rapid classification method based on dynamic graph convolution network

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant