CN111639719B - Footprint image retrieval method based on space-time motion and feature fusion - Google Patents

Footprint image retrieval method based on space-time motion and feature fusion

Info

Publication number
CN111639719B
CN111639719B
Authority
CN
China
Prior art keywords
footprint
layer
feature
frame
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010511912.0A
Other languages
Chinese (zh)
Other versions
CN111639719A (en)
Inventor
唐俊
鹿新
王年
朱明
樊旭晨
吴洛天
李双双
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University
Original Assignee
Anhui University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University filed Critical Anhui University
Priority to CN202010511912.0A priority Critical patent/CN111639719B/en
Publication of CN111639719A publication Critical patent/CN111639719A/en
Application granted granted Critical
Publication of CN111639719B publication Critical patent/CN111639719B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    All classifications fall under section G (Physics), class G06 (Computing; Calculating or Counting), in subclasses G06F (Electric digital data processing), G06N (Computing arrangements based on specific computational models) and G06V (Image or video recognition or understanding), plus the cross-sectional tag Y02D (Climate change mitigation technologies in information and communication technologies):
    • G06F18/253 Pattern recognition; Analysing; Fusion techniques of extracted features
    • G06F16/583 Information retrieval of still image data; Retrieval characterised by using metadata automatically derived from the content
    • G06F18/214 Design or setup of recognition systems; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N3/045 Neural networks; Architecture; Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N3/084 Learning methods; Backpropagation, e.g. using gradient descent
    • G06V10/30 Image preprocessing; Noise filtering
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Library & Information Science (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a footprint image retrieval method based on space-time motion and feature fusion, which comprises the following steps: 1. preparing a one-pass footprint image data set; 2. establishing a one-pass footprint image preprocessing module; 3. establishing a preprocessing layer for multi-scale one-pass footprint images and whole-set normalization; 4. initializing the weights; 5. establishing a spatial feature extraction module; 6. establishing a temporal feature extraction module; 7. training, testing and optimizing the network. By extracting both the spatial feature information and the temporal feature information of the one-pass footprint images and combining them with a dedicated feature fusion module, the invention obtains richer spatio-temporal information from the one-pass footprint images and clusters the discriminative feature information of different people, thereby greatly improving the accuracy of one-pass footprint image retrieval.

Description

Footprint image retrieval method based on space-time motion and feature fusion
Technical Field
The invention relates to the fields of image processing and deep learning, and in particular to a one-pass footprint image retrieval method based on space-time motion and feature fusion.
Background
Owing to factors such as bone structure and acquired living habits, footprint images are difficult to disguise and are even more distinctive than other traces such as palm prints and fingerprints. Research on footprint images is therefore not only of scientific interest but can also be applied in fields such as commerce, security and criminal investigation.
In recent years, the rapid rise of deep learning has brought new breakthroughs to one-pass footprint image retrieval, since neural networks have strong learning ability. Applying deep learning to one-pass footprint images reduces the manpower and material resources needed to analyse and process the data, and greatly improves the efficiency and accuracy of one-pass footprint image retrieval. Traditional one-pass footprint image retrieval, by contrast, relies on expert experience or simple comparison algorithms; such methods have low accuracy and also consume a great deal of time, manpower and material resources.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art and provides a footprint image retrieval method based on space-time motion and feature fusion, so as to obtain richer spatio-temporal information from one-pass footprint images and to cluster the discriminative feature information of different people, thereby improving the accuracy and efficiency of one-pass footprint image retrieval.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention relates to a footprint image retrieval method based on space-time motion and feature fusion, which is characterized by comprising the following steps of:
step 1: constructing a training set and a test set;
step 1.1: collecting a continuous one-pass footprint image of any test subject at a given walking speed;
step 1.2: carrying out pseudo-colorization and denoising on the one-pass footprint image to obtain a processed one-pass footprint image sample;
step 1.3: sequentially dividing the one-pass footprint image sample into single footprint images in frame order to obtain a footprint sequence sample set X = {x_k | k = 1, 2, 3, ..., K}; x_k denotes the k-th frame footprint sequence sample; 1 ≤ k ≤ K; K is the total number of footprints in the one-pass footprint image sample;
step 1.4: respectively defining a label for each footprint sequence sample in the footprint sequence sample set, wherein the label comprises ID information and serial number information;
step 1.5: repeating steps 1.1-1.4 to collect and correspondingly process continuous one-pass footprint images of a plurality of test subjects at different walking speeds, thereby forming a footprint image data set;
step 1.6: dividing the footprint image data set into a training set and a test set, and subdividing the test set into a test query set and a test base library set;
step 2: establishing a one-pass footprint image retrieval model with space-time motion and feature fusion, the model consisting of a preprocessing layer, a spatial feature extraction module, a feature fusion module and a temporal feature extraction module;
step 2.1: the preprocessing layer resizes the footprint sequence sample set X to obtain a footprint sequence sample set X' containing multi-scale features;
step 2.2: the preprocessing layer normalizes the footprint sequence sample set X' containing multi-scale features using formula (1), obtaining a normalized footprint sequence sample set X'':
Image(k'') = (Image(k') − Mean) / Std    (1)
In formula (1), Image(k') denotes the k-th frame footprint sequence sample containing multi-scale features; Mean denotes the mean of the footprint sequence sample set X' containing multi-scale features; Std denotes the variance of the same set; Image(k'') denotes the normalized k-th frame footprint sequence sample;
step 2.3: establishing a spatial feature extraction module consisting of a convolutional neural network with M layers of small convolution kernels, where each m-th small-kernel layer comprises, in order, its convolution layer and activation layer; the 2nd small-kernel layer additionally has a batch normalization layer between its convolution layer and activation layer; M takes values in [5,10];
step 2.3.1: initializing the weights of all convolution layers in the spatial feature extraction module with the Xavier method;
step 2.3.2: obtaining the output Z_m of the m-th convolution layer with formula (2):
Z_m = W_m * X_m + B_m    (2)
In formula (2), X_m is the portion of the input image being convolved under the stride S_m of the m-th convolution layer; B_m is the bias under stride S_m of the m-th convolution layer; W_m is the shared weight under stride S_m of the m-th convolution layer;
step 2.3.3: obtaining the output size Y_m of the m-th convolution layer with formula (3):
Y_m = C_m × ((H_m − K_m + 2P_m)/S_m + 1) × ((R_m − K_m + 2P_m)/S_m + 1)    (3)
In formula (3), S_m is the stride of the m-th convolution layer, K_m is its convolution kernel size, P_m is its number of padding pixels, C_m is its number of output channels, H_m is its height and R_m is its width;
step 2.3.4: the spatial feature extraction module processes the normalized footprint sequence sample set X'' and outputs a spatial feature sequence {F_1, F_2, ..., F_k, ..., F_K}, where F_k is the k-th frame footprint feature map obtained after the normalized footprint sequence sample Image(k'') has been processed by the spatial feature extraction module;
step 2.4: constructing a feature fusion module consisting of K feature-mask operation layers and K per-frame fully connected layers;
step 2.4.1: the k-th feature-mask operation layer superimposes the k-th frame footprint feature map F_k over the C_M output channels of the M-th convolution layer, obtaining a superimposed k-th frame feature map F'_k;
step 2.4.2: the k-th feature-mask operation layer sums the pixel values of the superimposed k-th frame feature map F'_k and takes their average as an erasure threshold; the pixels of F'_k that are larger than the erasure threshold are erased, giving the k-th frame feature-mask map F''_k;
step 2.4.3: the k-th feature-mask operation layer superimposes the superimposed k-th frame feature map F'_k and the k-th frame feature-mask map F''_k, obtaining the k-th frame feature fusion map F̃_k;
step 2.4.4: reducing the dimensionality of the k-th frame feature fusion map F̃_k to obtain its corresponding fully connected layer vector Vector_k;
step 2.4.5: cutting the k-th frame fully connected layer vector Vector_k evenly into I slices, obtaining I feature vectors, the i-th of which is vector_ik; I takes values in [4,8];
step 2.4.6: assigning a weight w_ik to the i-th feature vector vector_ik of the k-th frame fully connected layer and obtaining the feature-fused k-th frame fully connected layer LastVector_k with formula (4), i.e. splicing the weighted slices back together:
LastVector_k = [w_1k·vector_1k, w_2k·vector_2k, ..., w_Ik·vector_Ik]    (4)
step 2.4.7: repeating steps 2.4.1-2.4.6 to obtain the K feature-fused per-frame fully connected layers {LastVector_k | k = 1, 2, ..., K};
Step 2.5: constructing a time sequence characteristic extraction module consisting of a ConvLSTM convolution long-term and short-term memory network and a full connection layer;
step 2.5.1: for the K frame full connection layer { LastVector after the feature fusion k I K =1,2, \8230, K } performs dimensionality increasing operation to obtain a network input vector after dimensionality increasing;
step 2.5.2: initializing a weight value of the ConvLSTM convolution long-term and short-term memory network by using Gaussian distribution;
step 2.5.3: extracting sequence feature information of the network input vector after the dimension is increased by using the initialized ConvLSTM convolution long-short term memory network, thereby obtaining a time sequence feature map F';
step 2.5.4: performing dimensionality reduction on the time sequence feature map F' to obtain a full connection layer vector;
step 2.5.5: connecting a fully connected output layer vector' which is the same as the dimension A of the number of the types of the ID information in all the labels in the footprint image dataset behind the fully connected output layer vector;
step 2.5.6: connecting the fully-connected output layer vector' with a SoftMax function so as to form the fully-connected layer and correspondingly output a probability set { p 0 ,p 1 ,…,p a ,…,p A-1 }; selecting a maximum value p from the probability set max The corresponding subscript max is used as a label identified by the normalized kth frame footprint sequence sample Image (k');
step 2.5.7: and reversely propagating the probability set into the step-by-step footprint image retrieval model, and matching with an adaptive variable learning Rate L _ Rate and cross entropy loss Cross Entry to update the shared weight W m Weight w ik And bias term B m And obtaining an optimal footprint image retrieval model for realizing retrieval results of the ID information corresponding to different footprint images.
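The following is a hedged, illustrative sketch only (not the patented implementation) of how the four parts described in steps 2.1-2.5 could be wired together in PyTorch. All layer sizes, the choice of M = 6 layers, the hidden size, the class count and the use of a plain nn.LSTM in place of the ConvLSTM (a ConvLSTM cell is sketched separately in the detailed description) are assumptions made for this example.

```python
import torch
import torch.nn as nn

class SpatialCNN(nn.Module):
    """Small-kernel CNN (step 2.3); here M = 6 layers, 3x3 kernels, BN only in layer 2."""
    def __init__(self):
        super().__init__()
        chans = [3, 16, 32, 32, 64, 64, 64]
        layers = []
        for m in range(6):
            layers.append(nn.Conv2d(chans[m], chans[m + 1], kernel_size=3, padding=1))
            if m == 1:                               # batch normalization only in the 2nd layer
                layers.append(nn.BatchNorm2d(chans[m + 1]))
            layers.append(nn.ReLU(inplace=True))
            layers.append(nn.MaxPool2d(2))
        self.net = nn.Sequential(*layers)
        for mod in self.net.modules():               # Xavier initialization (step 2.3.1)
            if isinstance(mod, nn.Conv2d):
                nn.init.xavier_uniform_(mod.weight)

    def forward(self, x):                            # x: (B*K, 3, H, W)
        return self.net(x)

class FootprintRetrievalNet(nn.Module):
    def __init__(self, num_ids=100, slices=4, hidden=128):
        super().__init__()
        self.cnn = SpatialCNN()
        self.slices = slices
        self.slice_w = nn.Parameter(torch.ones(slices))          # w_ik (shared over k here)
        self.lstm = nn.LSTM(input_size=64, hidden_size=hidden, batch_first=True)
        self.fc_out = nn.Linear(hidden, num_ids)                  # dimension A = num_ids

    def forward(self, frames):                       # frames: (B, K, 3, H, W)
        B, K = frames.shape[:2]
        f = self.cnn(frames.flatten(0, 1))                        # per-frame feature maps F_k
        # feature-mask fusion (step 2.4): erase values above the per-frame mean, add back
        mean = f.mean(dim=(1, 2, 3), keepdim=True)
        fused = f + f * (f <= mean).float()
        v = fused.mean(dim=(2, 3))                                # global pooling -> Vector_k
        v = torch.cat([w * s for w, s in zip(self.slice_w, v.chunk(self.slices, dim=1))], dim=1)
        v = v.view(B, K, -1)                                      # LastVector_k sequence
        out, _ = self.lstm(v)                                     # temporal module (step 2.5)
        return self.fc_out(out[:, -1])                            # logits over the A identities
```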
Compared with the prior art, the invention has the beneficial effects that:
1. The invention combines image processing, deep learning and one-pass footprint image retrieval into a complete retrieval framework. On the image-processing side, a full set of preprocessing methods is provided for the footprint images: the one-pass footprint image samples are optimized and converted into the footprint sequence sample set required for training the retrieval model. On the network side, the one-pass footprint image retrieval model is composed of a preprocessing layer, a spatial feature extraction module, a feature fusion module and a temporal feature extraction module.
2. The image-processing part strengthens the pressure feature information of the one-pass footprint images through pseudo-colorization, overcoming the weak pressure contrast of grey-scale footprint images; removing background noise cleans the one-pass footprint image samples while preserving the original footprint information as fully as possible; and cutting divides each one-pass sample into single footprints in frame order, forming the input footprint sequence sample set of the retrieval model, so that the network can better aggregate feature information across the footprints in the sequence.
3. By resizing the footprint images in the footprint sequence sample set, the preprocessing layer obtains multi-scale spatial information of the one-pass footprint images; fusing this multi-scale spatial information alleviates the loss of spatial feature information caused by the pooling layers. In addition, unlike conventional per-image normalization, the method normalizes the footprint sequence sample set as a whole, taking the global information of the complete one-pass footprint images into account.
4. The spatial feature extraction module uses smaller convolution kernels to improve the network's extraction of fine-detail feature information and, given the sparsity of footprint images, uses a comparatively shallow stack of convolution layers; this extracts the spatial feature information in the one-pass footprint images more effectively and strengthens the expressive power of the convolutional features.
5. The feature fusion module obtains a feature mask by feature erasure, strengthening the detailed feature information of the footprint-edge texture regions, and then superimposes it on the complete footprint features so that global footprint information is also taken into account; cutting, weighting and splicing the fully connected layer strengthens the robustness of the final pre-trained model and gives it strong generalization ability.
6. The temporal feature extraction module, built from a convolutional long short-term memory network and a fully connected layer, extracts the temporal information of the footprint sequence sample set, clusters the discriminative feature information between footprint images and balances global and local feature information; compared with traditional footprint retrieval and comparison methods, the retrieval accuracy for one-pass footprint images is greatly improved.
Drawings
FIG. 1 is an overall flow chart of one-pass footprint image retrieval according to the present invention;
FIG. 2 is a diagram of a spatiotemporal motion and feature fusion network architecture in accordance with the present invention;
Detailed Description
In this embodiment, the footprint image retrieval method based on space-time motion and feature fusion extracts the spatio-temporal features of a one-pass footprint sequence with a convolutional neural network and a convolutional long short-term memory network, and improves the performance of the network model through feature fusion. The data set used by the invention contains more than 3,600 one-pass footprint images, which after preprocessing yield more than 36,000 single footprint images, covering more than 100 people with at least 36 one-pass footprint images per person; the data include barefoot prints, shoe prints with different sole patterns and three different walking speeds, and every image carries a person ID label. As shown in FIG. 1, the whole process is divided into the following steps:
Step 1: take a continuous one-pass footprint image of any test subject at a given walking speed and apply the preprocessing operations of pseudo-colorization and denoising to obtain a one-pass footprint image sample. During data acquisition, different pressures produce different pressure information; converting the single-channel grey-scale data into pseudo-colour images according to a fixed proportional mapping enhances these pressure features. A new denoising method is designed for the characteristics of footprint images: statistics of the image pixel values show that noise points generally have pixel values of the form (255, X, Y), and two denoising rules are used depending on whether the noise lies on or outside the footprint. If the noise is outside the footprint, a column of the image is scanned; let h1 be the number of black pixels in the column, h2 the number of (255, X, Y) pixels and h the image height. Since (255, X, Y) pixels can also be valid footprint information, the (255, X, Y) pixels of the column are removed as noise only when h1 + h2 = h and h2 > h/10. If the noise lies on the footprint, the (255, X, Y) pixels are removed when h2 > h/5. This algorithm removes the background noise of the image well while largely preserving the integrity of the original footprint information. A minimal sketch of this rule is given below.
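The following is a hedged sketch of the column-wise denoising rule described in step 1. It is an illustration only: the exact colour mapping, the interpretation of "(255, X, Y)" pixels and the black-pixel test are assumptions made for this example.

```python
import numpy as np

def denoise_columns(img, on_footprint=False):
    """img: H x W x 3 pseudo-colour image (uint8). Returns a cleaned copy."""
    out = img.copy()
    h = img.shape[0]
    for col in range(img.shape[1]):
        column = img[:, col, :]
        is_black = np.all(column == 0, axis=1)                # background pixels
        is_noise_like = column[:, 0] == 255                   # pixels of the form (255, X, Y)
        h1, h2 = int(is_black.sum()), int(is_noise_like.sum())
        if on_footprint:
            remove = h2 > h / 5                               # rule for noise on the footprint
        else:
            remove = (h1 + h2 == h) and (h2 > h / 10)         # rule for noise outside the footprint
        if remove:
            out[is_noise_like, col, :] = 0                    # erase the (255, X, Y) noise points
    return out
```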
Step 2: cut the one-pass footprint image samples to obtain a footprint sequence sample set. An algorithm is designed that scans the one-pass image sample column by column and accumulates per-column pixel statistics to determine where one footprint in the sequence ends and the next begins: columns whose average pixel value exceeds five are treated as footprint content, and the cut is placed at the average of the two adjacent boundary columns. This algorithm divides a one-pass footprint image sample into a continuous footprint sequence sample set, as sketched below.
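The following is a hedged sketch of the column-scan segmentation in step 2. The threshold handling and the exact boundary rule are assumptions; the description only states that columns with an average pixel value above five count as footprint content and that the cut is placed between adjacent footprints.

```python
import numpy as np

def split_footprints(one_pass_img, col_threshold=5.0):
    """one_pass_img: H x W (grey) or H x W x 3 array. Returns a list of single-footprint crops."""
    gray = one_pass_img if one_pass_img.ndim == 2 else one_pass_img.mean(axis=2)
    col_mean = gray.mean(axis=0)                       # per-column average pixel value
    is_content = col_mean > col_threshold              # columns that belong to a footprint
    crops, start = [], None
    for x, flag in enumerate(is_content):
        if flag and start is None:
            start = x                                  # a footprint begins
        elif not flag and start is not None:
            crops.append(one_pass_img[:, start:x])     # footprint ends; cut between prints
            start = None
    if start is not None:                              # last footprint runs to the image edge
        crops.append(one_pass_img[:, start:])
    return crops
```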
Step 3: define a label for each footprint sequence sample in the footprint sequence sample set; the label comprises ID information and serial number information.
Step 4: repeat steps 1 to 3 to collect and correspondingly process continuous one-pass footprint images of multiple test subjects at different walking speeds, forming a footprint image data set.
Step 5: divide the data set into three parts. The first part is the training set; the second part is the test base library (gallery) set, containing two groups each of fast, slow and normal walking; the third part is the test query set, containing one group each of fast, slow and normal walking. The training set shares no subjects with the test base library set or the test query set, while the test base library set and the query set contain different data of the same subjects, allocated in a two-to-one proportion (two groups against one group per walking speed). A subject-disjoint split of this kind is sketched below.
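The following is a hedged sketch of the subject-disjoint split described in step 5. The train/test person ratio used here (0.9) and the exact group assignment are assumptions for illustration; the text only requires that training subjects do not reappear in the test sets and that the base library and query sets contain different samples of the same subjects.

```python
import random
from collections import defaultdict

def split_dataset(samples, train_ratio=0.9, seed=0):
    """samples: list of dicts like {'person_id': ..., 'speed': ..., 'group': ..., 'path': ...}."""
    by_person = defaultdict(list)
    for s in samples:
        by_person[s['person_id']].append(s)
    persons = sorted(by_person)
    random.Random(seed).shuffle(persons)
    n_train = int(len(persons) * train_ratio)
    train_ids = set(persons[:n_train])                 # training subjects never appear in the test sets
    train, gallery, query = [], [], []
    for pid, items in by_person.items():
        if pid in train_ids:
            train.extend(items)
        else:
            for s in items:                            # two groups to the base library, one to the query set
                (gallery if s['group'] in (1, 2) else query).append(s)
    return train, gallery, query
```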
Step 6: feed the footprint image data set into the footprint image retrieval model with space-time motion and feature fusion for training. Through the preprocessing layer, spatial feature extraction module, feature fusion module and temporal feature extraction module of the network, the discriminative feature information of different people's footprint images is aggregated and classified, yielding a pre-trained footprint image retrieval model with space-time motion and feature fusion. As shown in FIG. 2, the one-pass footprint image retrieval model consists of a preprocessing layer, a spatial feature extraction module, a feature fusion module and a temporal feature extraction module:
6.1. The preprocessing layer resizes the footprint sequence sample set X to obtain a footprint sequence sample set X' containing multi-scale features; fusing the multi-scale spatial information of the footprint images in the sample set alleviates the partial loss of spatial feature information caused by the pooling layers.
6.2. The footprint sequence sample set X' containing multi-scale features is normalized with formula (1), giving the normalized footprint sequence sample set X''; this takes the global information of the one-pass footprint images into account. A small sketch of this set-level normalization follows below.
Image(k'') = (Image(k') − Mean) / Std    (1)
In formula (1), Image(k') denotes the k-th frame footprint sequence sample containing multi-scale features; Mean denotes the mean of the footprint sequence sample set X' containing multi-scale features; Std denotes the variance of the same set; Image(k'') denotes the normalized k-th frame footprint sequence sample.
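The following is a hedged sketch of the whole-set normalization of formula (1): unlike per-image normalization, Mean and Std are computed over the entire footprint sequence sample set. Treating Std as the standard deviation of the set is an assumption of this sketch.

```python
import numpy as np

def normalize_sample_set(x_prime):
    """x_prime: array of shape (K, H, W, C) holding the multi-scale footprint samples."""
    mean = x_prime.mean()                    # single mean over the whole sample set
    std = x_prime.std()                      # single spread value over the whole sample set
    return (x_prime - mean) / (std + 1e-8)   # Image(k'') for every frame k at once
```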
6.3. The spatial feature extraction module consists of a convolutional neural network with M layers of small convolution kernels, where each m-th small-kernel layer comprises, in order, its convolution layer and activation layer; the 2nd small-kernel layer additionally has a batch normalization layer between its convolution layer and activation layer; M takes values in [5,10].
The weights of all convolution layers in the spatial feature extraction module are initialized with the Xavier method.
The output Z_m of the m-th convolution layer is obtained with formula (2):
Z_m = W_m * X_m + B_m    (2)
In formula (2), X_m is the portion of the input image being convolved under the stride S_m of the m-th convolution layer; B_m is the bias under stride S_m; W_m is the shared weight under stride S_m. The output size Y_m of the m-th convolution layer is obtained with formula (3):
Y_m = C_m × ((H_m − K_m + 2P_m)/S_m + 1) × ((R_m − K_m + 2P_m)/S_m + 1)    (3)
In formula (3), S_m is the stride of the m-th convolution layer, K_m its convolution kernel size, P_m its number of padding pixels, C_m its number of output channels, H_m its height and R_m its width. The spatial feature extraction module processes the normalized footprint sequence sample set X'' and outputs a spatial feature sequence {F_1, F_2, ..., F_k, ..., F_K}, where F_k is the k-th frame footprint feature map obtained after the normalized k-th frame footprint sequence sample Image(k'') has been processed by the spatial feature extraction module. All convolution kernels of the spatial feature extraction module are small kernels, which improves the network's extraction of fine-detail feature information; since footprint images are sparse, a comparatively shallow stack of 5 to 10 convolution layers extracts the spatial feature information of the footprint images more effectively and strengthens the expressive power of the convolutional features. A small helper illustrating formula (3) appears below.
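The following is a hedged helper illustrating formula (3): given the m-th layer's stride, kernel size, padding, output channels and height/width, it computes the output size Y_m. The floor division mirrors how convolution output sizes are normally computed; interpreting H_m and R_m as the layer's input height and width is an assumption.

```python
def conv_output_size(c_m, h_m, r_m, k_m, s_m, p_m):
    out_h = (h_m - k_m + 2 * p_m) // s_m + 1
    out_w = (r_m - k_m + 2 * p_m) // s_m + 1
    return c_m, out_h, out_w          # Y_m = C_m x out_h x out_w

# Example: a 3x3 small kernel with stride 1 and padding 1 keeps the spatial size.
print(conv_output_size(c_m=32, h_m=128, r_m=64, k_m=3, s_m=1, p_m=1))  # (32, 128, 64)
```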
6.4. The feature fusion module comprises the feature-mask operation layers and the per-frame fully connected layers.
The feature-mask operation layer first superimposes the k-th frame footprint feature map F_k over the C_M output channels of the M-th convolution layer, obtaining the superimposed k-th frame feature map F'_k; it then sums the pixel values of F'_k and takes their average as an erasure threshold, erasing the pixels of F'_k that exceed this threshold to obtain the k-th frame feature-mask map F''_k; finally it superimposes F'_k and F''_k to obtain the k-th frame feature fusion map F̃_k.
The feature fusion module thus obtains a feature mask by feature erasure, strengthening the detailed feature information of the footprint-edge texture regions, and then superimposes it on the complete footprint feature information so that global footprint information is also taken into account; a small sketch of this erasure step is given below.
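The following is a hedged sketch of the feature-erasure step (steps 2.4.1-2.4.3): pixels of the superimposed feature map that exceed its mean are erased to form the mask map, which is then added back onto the superimposed map. Treating the "superposition" over the C_M output channels as a channel-wise sum is an assumption of this sketch.

```python
import torch

def feature_mask_fusion(feature_map):
    """feature_map: tensor of shape (C_M, H, W) holding one frame's feature map F_k."""
    f_prime = feature_map.sum(dim=0)                    # F'_k: superposition over output channels
    threshold = f_prime.mean()                          # erasure threshold = average pixel value
    f_mask = f_prime * (f_prime <= threshold).float()   # F''_k: values above the threshold erased
    return f_prime + f_mask                             # feature fusion map for this frame
```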
after the processing of the characteristic mask operation layer, the feature fusion map of the kth frame is processed
Figure SMS_10
Performing dimension reduction treatment to obtain the k frame characteristic fusion map->
Figure SMS_11
Corresponding full connection layer vector k Then for the full connection layer vector of the k frame k Averagely cutting the image into I pieces to obtain I feature vectors, wherein the ith feature vector ik The value range of I is [4,8 ]]Then the ith feature vector for the fully-connected layer of the kth frame ik Given a weight w ik And obtaining a k frame full link layer LastVector after feature fusion by using the formula (4) k
Figure SMS_12
Through the method of cutting, weighting and splicing the full connection layer, the robustness of the final pre-training model is effectively enhanced, the final pre-training model has strong generalization capability, and the operations of the pre-processing layer, the spatial information extraction module and the feature extraction module are repeated, so that the K frame full connection layer { LastVector after feature fusion is obtained k L K =1,2, \8230 |, K }; the preprocessing layer can obtain the multi-scale spatial information of the track images in the track sequence sample set by resetting the size of the track images in the track sequence sample set, and can well reduce the loss of partial spatial characteristic information caused by the pooling layer by fusing the multi-scale spatial information of the track images in the track sequence sample set; meanwhile, different from the traditional image normalization operation, the method uniformly performs the normalization operation on the footprint sequence sample set, and gives consideration to the overall information of the completed footprint images;
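The following is a hedged sketch of formula (4): the fully connected vector of frame k is cut into I equal slices, each slice is scaled by a weight w_ik, and the weighted slices are spliced back together. Reading "splicing" as concatenation, and the example weights, are assumptions of this sketch.

```python
import torch

def cut_weight_splice(vector_k, weights):
    """vector_k: 1-D tensor; weights: 1-D tensor of length I (the w_ik of this frame)."""
    slices = vector_k.chunk(len(weights))                     # I equal slices vector_ik
    weighted = [w * s for w, s in zip(weights, slices)]       # apply w_ik to each slice
    return torch.cat(weighted)                                # LastVector_k

# Example with I = 4 assumed weights:
v = torch.randn(256)
print(cut_weight_splice(v, torch.tensor([1.0, 0.8, 0.8, 1.0])).shape)  # torch.Size([256])
```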
and 6.5, the time sequence feature extraction module consists of a ConvLSTM convolution long-term and short-term memory network and a full connection layer.
Firstly, a K frame full connection layer { LastVector after feature fusion k I K =1,2, \8230, K carries out dimension raising operation to obtain a network input vector after dimension raising, then carries out initialization weight on the ConvLSTM convolution long-short term memory network by using Gaussian distribution, and then carries out extraction of sequence characteristic information on the network input vector after dimension raising by using the initialized ConvLSTM convolution long-short term memory network to obtain a time sequence characteristic map F'; the convolution long-short term memory network can perfect the extraction of the time series information of the footprint series sample set and take the global characteristic information and the local characteristic information into consideration;
after the processing of the convolution long-term and short-term memory network, performing dimensionality reduction on the sequence feature map F 'to obtain a full-connection layer vector, and then connecting a full-connection output layer vector' which has the same dimensionality A as the variety number of the ID information in all the labels in the footprint image dataset behind the full-connection layer vector; then, connecting the vector' of the fully-connected output layer with the SoftMax function, thereby forming the fully-connected layer and correspondingly outputting a probability set { p 0 ,p 1 ,…,p a ,…,p A-1 }; then selecting the maximum value p from the probability set max The corresponding subscript max serves as a label identified by the normalized kth frame footprint sequence sample Image (k'), and finally the probability set is propagated reversely to the step-by-step footprint Image retrieval model and matched with the self-adaptive variable learning Rate L _ Rate and the cross entropy loss Cross Engine, so that the shared weight W is updated m Weight w ik And bias term B m And obtaining an optimal track image retrieval model for realizing retrieval results of the ID information corresponding to different track images. The method has the advantages that the method is more effective in division, the distinctive feature information among the footprint images is clustered, and compared with the traditional footprint retrieval comparison method, the accurate value of the track-in-track image retrieval is greatly improved.
The convolutional long short-term memory network has three gates at each sequence index position: an input gate, an output gate and a forget gate. As a variant of recurrent neural networks (RNNs), ConvLSTM can learn long-term dependency information and balances global and local feature information well. The input gate (InputGate) has two parts: a Sigmoid layer, which decides which values are to be updated, and a Tanh layer, which creates a candidate value vector C̃_t. The candidate value vector C̃_t is added into the cell state, giving the new cell state C_t; the output gate (OutputGate) determines how much of the final C_t is passed on to the hidden state h_t; and the forget gate (ForgetGate) controls, with a certain probability, how much of the previous hidden-state information is retained in ConvLSTM. The input, output and forget gates all use the Sigmoid function, which controls the proportion of information each gate lets through.
The input gate is given by formula (5):
i_t = σ(W_i·[h_{t−1}, x_t] + b_i),  C̃_t = tanh(W_c·[h_{t−1}, x_t] + b_c)    (5)
In formula (5), σ is the Sigmoid function, h_{t−1} is the previous hidden state, x_t is the current input, i_t is the final output of the input gate, W_i is the input-gate weight parameter and b_i is the input-gate bias parameter; C̃_t determines how much of the current input information enters the cell state, with W_c the corresponding weight parameter and b_c the corresponding bias parameter.
The cell state C_t is given by formula (6):
C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t    (6)
In formula (6), f_t is the final output of the forget gate, i_t is the final output of the input gate, C_{t−1} is the cell state at the previous time step, and C̃_t determines how much of the current input information enters the cell state.
The output gate is given by formula (7):
o_t = σ(W_o·[h_{t−1}, x_t] + b_o),  h_t = o_t ⊙ tanh(C_t)    (7)
In formula (7), σ is the Sigmoid function, o_t is the final output of the output gate, W_o is the output-gate weight parameter, b_o is the output-gate bias parameter, C_t is the cell state and h_t is the hidden state at the current time step.
The forget gate is given by formula (8):
f_t = σ(W_f·[h_{t−1}, x_t] + b_f)    (8)
In formula (8), σ is the Sigmoid function, h_{t−1} is the previous hidden state, x_t is the current input, f_t is the final output of the forget gate, W_f is the forget-gate weight parameter and b_f is the forget-gate bias parameter.
The Sigmoid function is given by formula (9):
σ(x) = 1 / (1 + e^{−x})    (9)
A minimal ConvLSTM cell implementing these gates is sketched below.
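The following is a hedged sketch of a ConvLSTM cell implementing gates (5)-(8), with the matrix products replaced by convolutions as in ConvLSTM. The channel counts and kernel size are assumptions; this is an illustration, not the patented implementation.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, in_ch, hid_ch, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        # one convolution produces the pre-activations of all four gates at once
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, kernel_size, padding=pad)
        self.hid_ch = hid_ch

    def forward(self, x_t, state):
        h_prev, c_prev = state                           # h_{t-1}, C_{t-1}
        z = self.gates(torch.cat([h_prev, x_t], dim=1))  # convolution over [h_{t-1}, x_t]
        i, f, o, g = z.chunk(4, dim=1)
        i_t = torch.sigmoid(i)                           # input gate, formula (5)
        f_t = torch.sigmoid(f)                           # forget gate, formula (8)
        o_t = torch.sigmoid(o)                           # output gate, formula (7)
        c_tilde = torch.tanh(g)                          # candidate values, formula (5)
        c_t = f_t * c_prev + i_t * c_tilde               # cell state, formula (6)
        h_t = o_t * torch.tanh(c_t)                      # hidden state, formula (7)
        return h_t, c_t

# Example: one step over an assumed 8x8 feature map with 64 input and 32 hidden channels.
cell = ConvLSTMCell(64, 32)
x = torch.randn(1, 64, 8, 8)
h = c = torch.zeros(1, 32, 8, 8)
h, c = cell(x, (h, c))
```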
The preprocessed test query set and test base library set are fed into the pre-trained footprint image retrieval model with space-time motion and feature fusion to extract features; the footprint feature information extracted from the test query set is compared with that extracted from the test base library set, the difference between the two is measured with cosine similarity to obtain the retrieval mAP and Rank values, and the network parameters are adjusted according to the test results until the network performs best. A small sketch of this retrieval step is given below.
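The following is a hedged sketch of the retrieval/evaluation step: query features are compared with base-library (gallery) features by cosine similarity and a Rank-1 score is computed. The mAP computation is omitted, and the feature shapes are assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

def rank1_accuracy(query_feats, query_ids, gallery_feats, gallery_ids):
    """query_feats: (Nq, D); gallery_feats: (Ng, D); ids: 1-D integer tensors."""
    q = F.normalize(query_feats, dim=1)
    g = F.normalize(gallery_feats, dim=1)
    sim = q @ g.t()                                   # cosine similarity matrix (Nq, Ng)
    best = sim.argmax(dim=1)                          # most similar gallery sample per query
    return (gallery_ids[best] == query_ids).float().mean().item()
```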

Claims (1)

1. A footprint image retrieval method based on space-time motion and feature fusion is characterized by comprising the following steps:
step 1: constructing a training set and a test set;
step 1.1: collecting a continuous one-pass footprint image of any test subject at a given walking speed;
step 1.2: carrying out pseudo-colorization and denoising on the one-pass footprint image to obtain a processed one-pass footprint image sample;
step 1.3: sequentially dividing the one-pass footprint image sample into single footprint images in frame order to obtain a footprint sequence sample set X = {x_k | k = 1, 2, 3, ..., K}; x_k denotes the k-th frame footprint sequence sample; 1 ≤ k ≤ K; K is the total number of footprints in the one-pass footprint image sample;
step 1.4: respectively defining a label for each footprint sequence sample in the footprint sequence sample set, wherein the label comprises ID information and serial number information;
step 1.5: repeating steps 1.1-1.4 to collect and correspondingly process continuous one-pass footprint images of a plurality of test subjects at different walking speeds, thereby forming a footprint image data set;
step 1.6: dividing a footprint image data set into a test set and a training set, and subdividing the test set into a test query set and a test base library set;
step 2: establishing a one-pass footprint image retrieval model with space-time motion and feature fusion, the model consisting of a preprocessing layer, a spatial feature extraction module, a feature fusion module and a temporal feature extraction module;
step 2.1: the preprocessing layer resizes the footprint sequence sample set X to obtain a footprint sequence sample set X' containing multi-scale features;
step 2.2: the preprocessing layer normalizes the footprint sequence sample set X' containing multi-scale features using formula (1), obtaining a normalized footprint sequence sample set X'':
Image(k'') = (Image(k') − Mean) / Std    (1)
In formula (1), Image(k') denotes the k-th frame footprint sequence sample containing multi-scale features; Mean denotes the mean of the footprint sequence sample set X' containing multi-scale features; Std denotes the variance of the same set; Image(k'') denotes the normalized k-th frame footprint sequence sample;
step 2.3: establishing a spatial feature extraction module consisting of a convolutional neural network with M layers of small convolution kernels, where each m-th small-kernel layer comprises, in order, its convolution layer and activation layer; the 2nd small-kernel layer additionally has a batch normalization layer between its convolution layer and activation layer; M takes values in [5,10];
step 2.3.1: initializing weights of all convolution layers in the spatial feature extraction module by using an Xavier method;
step 2.3.2: obtaining the output Z_m of the m-th convolution layer with formula (2):
Z_m = W_m * X_m + B_m    (2)
In formula (2), X_m is the portion of the input image being convolved under the stride S_m of the m-th convolution layer; B_m is the bias under stride S_m of the m-th convolution layer; W_m is the shared weight under stride S_m of the m-th convolution layer;
step 2.3.3: obtaining the output size Y_m of the m-th convolution layer with formula (3):
Y_m = C_m × ((H_m − K_m + 2P_m)/S_m + 1) × ((R_m − K_m + 2P_m)/S_m + 1)    (3)
In formula (3), S_m is the stride of the m-th convolution layer, K_m is its convolution kernel size, P_m is its number of padding pixels, C_m is its number of output channels, H_m is its height and R_m is its width;
step 2.3.4: the spatial feature extraction module processes the normalized footprint sequence sample set X'' and outputs a spatial feature sequence {F_1, F_2, ..., F_k, ..., F_K}, where F_k is the k-th frame footprint feature map obtained after the normalized k-th frame footprint sequence sample Image(k'') has been processed by the spatial feature extraction module;
step 2.4: constructing a feature fusion module consisting of K feature-mask operation layers and K per-frame fully connected layers;
step 2.4.1: the k-th feature-mask operation layer superimposes the k-th frame footprint feature map F_k over the C_M output channels of the M-th convolution layer, obtaining a superimposed k-th frame feature map F'_k;
step 2.4.2: the k-th feature-mask operation layer sums the pixel values of the superimposed k-th frame feature map F'_k and takes their average as an erasure threshold; the pixels of F'_k that are larger than the erasure threshold are erased, giving the k-th frame feature-mask map F''_k;
step 2.4.3: the k-th feature-mask operation layer superimposes the superimposed k-th frame feature map F'_k and the k-th frame feature-mask map F''_k, obtaining the k-th frame feature fusion map F̃_k;
step 2.4.4: reducing the dimensionality of the k-th frame feature fusion map F̃_k to obtain its corresponding fully connected layer vector Vector_k;
step 2.4.5: cutting the k-th frame fully connected layer vector Vector_k evenly into I slices, obtaining I feature vectors, the i-th of which is vector_ik; I takes values in [4,8];
step 2.4.6: assigning a weight w_ik to the i-th feature vector vector_ik of the k-th frame fully connected layer and obtaining the feature-fused k-th frame fully connected layer LastVector_k with formula (4), i.e. splicing the weighted slices back together:
LastVector_k = [w_1k·vector_1k, w_2k·vector_2k, ..., w_Ik·vector_Ik]    (4)
step 2.4.7: repeating steps 2.4.1-2.4.6 to obtain the K feature-fused per-frame fully connected layers {LastVector_k | k = 1, 2, ..., K};
Step 2.5: constructing a time sequence characteristic extraction module consisting of a ConvLSTM convolution long-term and short-term memory network and a full connection layer;
step 2.5.1: for the K frame full connection layer { LastVector after the feature fusion k I K =1,2, \8230, K } performs dimensionality increasing operation to obtain a network input vector after dimensionality increasing;
step 2.5.2: initializing a weight value of the ConvLSTM convolution long-term and short-term memory network by using Gaussian distribution;
step 2.5.3: extracting sequence feature information of the network input vector after the dimension is increased by using the initialized ConvLSTM convolution long-short term memory network, thereby obtaining a time sequence feature map F';
step 2.5.4: performing dimensionality reduction on the time sequence feature map F' to obtain a full connection layer vector;
step 2.5.5: connecting a fully connected output layer vector' which is the same as the dimension A of the number of the types of the ID information in all the labels in the footprint image dataset behind the fully connected output layer vector;
step (ii) of2.5.6: connecting the fully-connected output layer vector' with a SoftMax function so as to form the fully-connected layer and correspondingly output a probability set { p 0 ,p 1 ,…,p a ,…,p A-1 }; selecting a maximum value p from the probability set max The corresponding subscript max is used as a label identified by the normalized kth frame footprint sequence sample Image (k');
step 2.5.7: and reversely propagating the probability set into the step-by-step footprint image retrieval model, and matching with an adaptive variable learning Rate L _ Rate and Cross Entropy loss Cross Engine, thereby updating the shared weight W m Weight w ik And bias term B m And obtaining an optimal track image retrieval model for realizing retrieval results of the ID information corresponding to different track images.
CN202010511912.0A 2020-06-08 2020-06-08 Footprint image retrieval method based on space-time motion and feature fusion Active CN111639719B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010511912.0A CN111639719B (en) 2020-06-08 2020-06-08 Footprint image retrieval method based on space-time motion and feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010511912.0A CN111639719B (en) 2020-06-08 2020-06-08 Footprint image retrieval method based on space-time motion and feature fusion

Publications (2)

Publication Number Publication Date
CN111639719A CN111639719A (en) 2020-09-08
CN111639719B (en) 2023-04-07

Family

ID=72329676

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010511912.0A Active CN111639719B (en) 2020-06-08 2020-06-08 Footprint image retrieval method based on space-time motion and feature fusion

Country Status (1)

Country Link
CN (1) CN111639719B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112100429B (en) * 2020-09-27 2022-09-13 安徽大学 Footprint pressure image retrieval method
CN112464746B (en) * 2020-11-10 2023-09-12 清华苏州环境创新研究院 Water quality monitoring method and system for satellite image and machine learning
CN112257662A (en) * 2020-11-12 2021-01-22 安徽大学 Pressure footprint image retrieval system based on deep learning
CN112580577B (en) * 2020-12-28 2023-06-30 出门问问(苏州)信息科技有限公司 Training method and device for generating speaker image based on facial key points
CN113220926B (en) * 2021-05-06 2022-09-09 安徽大学 Footprint image retrieval method based on multi-scale local attention enhancement network
CN113191443B (en) * 2021-05-14 2023-06-13 清华大学深圳国际研究生院 Clothing classification and attribute identification method based on feature enhancement
CN112990171B (en) * 2021-05-20 2021-08-06 腾讯科技(深圳)有限公司 Image processing method, image processing device, computer equipment and storage medium
CN113536989B (en) * 2021-06-29 2024-06-18 广州博通信息技术有限公司 Refrigerator frosting monitoring method and system based on frame-by-frame analysis of camera video
CN113656623A (en) * 2021-08-17 2021-11-16 安徽大学 Time sequence shift and multi-branch space-time enhancement network-based stepping footprint image retrieval method
CN114840700B (en) * 2022-05-30 2023-01-13 来也科技(北京)有限公司 Image retrieval method and device for realizing IA by combining RPA and AI and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229338A (en) * 2017-12-14 2018-06-29 华南理工大学 A kind of video behavior recognition methods based on depth convolution feature
CN109325546A (en) * 2018-10-19 2019-02-12 大连海事大学 A kind of combination footwork feature at time footprint recognition method
CN110378288A (en) * 2019-07-19 2019-10-25 合肥工业大学 A kind of multistage spatiotemporal motion object detection method based on deep learning
CN110956111A (en) * 2019-11-22 2020-04-03 苏州闪驰数控***集成有限公司 Artificial intelligence CNN, LSTM neural network gait recognition system
CN111177446A (en) * 2019-12-12 2020-05-19 苏州科技大学 Method for searching footprint image

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10776628B2 (en) * 2017-10-06 2020-09-15 Qualcomm Incorporated Video action localization from proposal-attention
US10957053B2 (en) * 2018-10-18 2021-03-23 Deepnorth Inc. Multi-object tracking using online metric learning with long short-term memory

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229338A (en) * 2017-12-14 2018-06-29 华南理工大学 A kind of video behavior recognition methods based on depth convolution feature
CN109325546A (en) * 2018-10-19 2019-02-12 大连海事大学 A kind of combination footwork feature at time footprint recognition method
CN110378288A (en) * 2019-07-19 2019-10-25 合肥工业大学 A kind of multistage spatiotemporal motion object detection method based on deep learning
CN110956111A (en) * 2019-11-22 2020-04-03 苏州闪驰数控***集成有限公司 Artificial intelligence CNN, LSTM neural network gait recognition system
CN111177446A (en) * 2019-12-12 2020-05-19 苏州科技大学 Method for searching footprint image

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Abnormal Gait Recognition Algorithm Based; JING GAO et al.; IEEE Access; 2019-10-29; entire document *
Gait-Based Human Identification by Combining Shallow Convolutional Neural Network-Stacked; GANBAYAR BATCHULUUN et al.; IEEE Access; 2018-10-22; entire document *
A CNN-based footprint image retrieval and matching method; 陈扬 et al.; Journal of Nanjing Normal University (Engineering and Technology Edition); 2018-09-30; vol. 18, no. 3; entire document *
Research on gait recognition methods based on deep learning; 冯雨欣; China Master's Theses Full-text Database (Electronic Journal), Information Science and Technology; 2020-02-15; vol. 2020, no. 02; entire document *
Research on footprint pattern image retrieval algorithms based on sparse representation; 张浩; China Master's Theses Full-text Database (Electronic Journal), Information Science and Technology; 2017-07-15; vol. 2017, no. 07; entire document *

Also Published As

Publication number Publication date
CN111639719A (en) 2020-09-08

Similar Documents

Publication Publication Date Title
CN111639719B (en) Footprint image retrieval method based on space-time motion and feature fusion
CN111860612B (en) Unsupervised hyperspectral image hidden low-rank projection learning feature extraction method
CN107341452B (en) Human behavior identification method based on quaternion space-time convolution neural network
CN111738124B (en) Remote sensing image cloud detection method based on Gabor transformation and attention
CN109948693B (en) Hyperspectral image classification method based on superpixel sample expansion and generation countermeasure network
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN109345508B (en) Bone age evaluation method based on two-stage neural network
CN112836773B (en) Hyperspectral image classification method based on global attention residual error network
CN109145992B (en) Hyperspectral image classification method for cooperatively generating countermeasure network and spatial spectrum combination
CN108648191B (en) Pest image recognition method based on Bayesian width residual error neural network
Lin et al. Hyperspectral image denoising via matrix factorization and deep prior regularization
CN104462494B (en) A kind of remote sensing image retrieval method and system based on unsupervised feature learning
CN111914611B (en) Urban green space high-resolution remote sensing monitoring method and system
CN111696101A (en) Light-weight solanaceae disease identification method based on SE-Inception
CN106407986A (en) Synthetic aperture radar image target identification method based on depth model
CN110458192B (en) Hyperspectral remote sensing image classification method and system based on visual saliency
CN111090764B (en) Image classification method and device based on multitask learning and graph convolution neural network
CN113705580B (en) Hyperspectral image classification method based on deep migration learning
CN113344045B (en) Method for improving SAR ship classification precision by combining HOG characteristics
Hu et al. Classification of PolSAR images based on adaptive nonlocal stacked sparse autoencoder
CN112633386A (en) SACVAEGAN-based hyperspectral image classification method
CN114692732B (en) Method, system, device and storage medium for updating online label
Lin et al. Determination of the varieties of rice kernels based on machine vision and deep learning technology
CN112950780A (en) Intelligent network map generation method and system based on remote sensing image
CN114937173A (en) Hyperspectral image rapid classification method based on dynamic graph convolution network

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant