CN111339988B - Video face recognition method based on dynamic interval loss function and probability characteristic - Google Patents

Video face recognition method based on dynamic interval loss function and probability characteristic

Info

Publication number
CN111339988B
CN111339988B CN202010166807.8A CN202010166807A
Authority
CN
China
Prior art keywords
face
feature
uncertainty
face recognition
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010166807.8A
Other languages
Chinese (zh)
Other versions
CN111339988A (en)
Inventor
柯逍
郑毅腾
朱敏琛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN202010166807.8A priority Critical patent/CN111339988B/en
Publication of CN111339988A publication Critical patent/CN111339988A/en
Application granted granted Critical
Publication of CN111339988B publication Critical patent/CN111339988B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Biomedical Technology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a video face recognition method based on a dynamic interval loss function and probabilistic features, which comprises the following steps: step S1: training a recognition network through a face recognition training set; step S2: adopting the trained recognition network as a feature extraction module, and training an uncertainty module through the same training set; step S3: aggregating the feature set of an input video by using the learned uncertainty as the importance of each feature to obtain an aggregated feature; step S4: comparing the aggregated features by using the mutual likelihood score to complete the final recognition. The method can effectively recognize faces in video.

Description

Video face recognition method based on dynamic interval loss function and probability characteristic
Technical Field
The invention relates to the fields of pattern recognition and computer vision, and in particular to a video face recognition method based on a dynamic interval (i.e., dynamic margin) loss function and probabilistic features.
Background
In recent years, deep convolutional neural networks have been highly successful in computer vision, and deep-learning-based face recognition exploits their strength in feature extraction, continuously setting new records on public data sets. A growing number of researchers publish face recognition papers at the major computer vision conferences. Because face recognition has wide applications and great commercial value, both academia and industry continue to explore new face recognition techniques; with the breakthroughs of deep learning and convolutional neural networks in computer vision, face recognition algorithms keep refreshing the records on public benchmark data sets and have produced many mature products in industry.
Although face recognition technology has made great progress, it still faces many challenges in real environments, where factors such as illumination, pose, occlusion and age all affect recognition performance.
Disclosure of Invention
The invention aims to provide a video face recognition method based on a dynamic interval loss function and probabilistic features, which can effectively recognize faces in video.
To achieve this aim, the invention adopts the following technical scheme: a video face recognition method based on a dynamic interval loss function and probabilistic features, comprising the following steps:
Step S1: training a recognition network through a face recognition training set;
Step S2: adopting the trained recognition network as a feature extraction module, and training an uncertainty module through the same training set;
Step S3: aggregating the feature set of the input video by using the learned uncertainty as the importance of each feature to obtain an aggregated feature;
Step S4: comparing the aggregated features by using the mutual likelihood score to complete the final recognition.
Further, the step S1 specifically includes the following steps:
step S11: acquiring a public face recognition training set from a network, and acquiring related labels of training data;
Step S12: for the face images in the face recognition training set, outputting the face bounding box and facial key-point positions with a pre-trained RetinaFace detection model, aligning the faces by similarity transformation, then subtracting the mean from the pixel values of all input face images and normalizing them;
Step S13: adopting an 18-layer ResNet as the network model for extracting face depth features, replacing the first 7 × 7 convolution kernel with a 3 × 3 convolution kernel; meanwhile, setting the stride of the first convolutional layer to 1 so that the output size of the last feature map remains 7 × 7; in addition, setting the path of the identity mapping to an average pooling with stride 2 followed by a 1 × 1 convolution with stride 1 to prevent information loss; finally, replacing the average pooling layer with a 7 × 7 convolution layer and outputting the final face feature x_i;
Step S14: letting D = {d_1, d_2, ..., d_N} be the face images in the test set, d_i the i-th face image, E(·) the deep convolutional neural network model for extracting depth features, and x_i = E(d_i) the feature corresponding to the i-th face image; taking the dot product of the depth feature x_i with the j-th column of the last fully connected layer W to obtain the score z_{i,j} of the j-th category, and feeding it into a Softmax activation function to obtain the classification probability P_{i,j}, calculated as follows:

P_{i,j} = \frac{e^{z_{i,j}}}{\sum_{k=1}^{C} e^{z_{i,k}}}

wherein C is the total number of categories and k is the index over categories;
Step S15: letting y_i be the label corresponding to the i-th sample, and considering the angle between the depth feature x_i and the corresponding class weight vector (the y_i-th column of W); the point of maximum rate of change on the curve of the classification probability P_{i,y_i} as a function of this angle is taken as a reference point and related to the dynamic interval parameter, i.e., once the dynamic interval parameter of the i-th sample is set, the curve of P_{i,y_i} with respect to the angle attains its maximum absolute derivative at θ_m, where θ_m is the reference point at which the derivative of the curve is largest; the dynamic interval parameter is computed by the following formula (reproduced in the original only as an image):

[Formula image BDA00024077302500000211]

where v is the corresponding scaling parameter used to keep the classification probability within the desired range, and the remaining term is the total score of all categories other than the sample's own category;
Step S16: after obtaining the classification probability P_{i,j} and the dynamic interval parameter, using the cross-entropy loss function to calculate the difference between the predicted classification probability P_{i,j} and the true probability Q_{i,j} and obtain the loss value L_{CE}(x_i), calculated as follows:

L_{CE}(x_i) = -\sum_{j=1}^{C} Q_{i,j} \log P_{i,j}

where Q_{i,j} equals 1 when j = y_i and 0 otherwise; and then updating the network parameters by gradient descent and back-propagation.
Further, the step S2 specifically includes the following steps:
Step S21: taking the face recognition model trained in step S1 as the feature extraction model, extracting the depth feature x_i of each face image from the same training data set, and outputting the corresponding last feature map as the input of the uncertainty module;
Step S22: the uncertainty module is a shallow neural network model comprising two fully connected layers with ReLU as the activation function; a batch normalization layer is inserted between each fully connected layer and the activation function to normalize its input, and an exponential function is finally used as the output activation to produce the uncertainty σ_i corresponding to each face image, which has the same dimension as the depth feature x_i and represents the variance of the corresponding feature in the feature space;
Step S23: calculating the mutual likelihood score s(x_i, x_j) between any two samples as follows:

s(x_i, x_j) = -\frac{1}{2} \sum_{l=1}^{h} \left( \frac{\left(\mu_i^{(l)} - \mu_j^{(l)}\right)^2}{\sigma_i^{2(l)} + \sigma_j^{2(l)}} + \log\left(\sigma_i^{2(l)} + \sigma_j^{2(l)}\right) \right) - \frac{h}{2} \log 2\pi

where \mu_i^{(l)} and \sigma_i^{2(l)} respectively denote the values of the feature mean μ and the feature variance σ in the l-th dimension, and h is the dimension of the face feature;
Step S24: calculating the final loss L by adopting the following function according to the distribution condition of the face images in one batch pair
Figure BDA0002407730250000035
Where R is the set of face pairs of all the same person and s (-) is a computation function of mutual likelihood scores used to compute the mutual likelihood scores between two face pairs, the goal of the loss function being to maximize the mutual likelihood score values between all the face pairs of the same person.
Further, the specific method of step S3 is:
The deep face feature x_i output by the feature extraction network reflects the most likely feature representation of the input face image, while the output σ_i of the uncertainty module represents the uncertainty of that feature in each dimension; σ_i varies with image quality and reflects the importance of the corresponding depth feature within the whole set of input video images, so it is used as the weight with which the depth features x_i are fused, the fused feature a_i being computed by the following formula (reproduced in the original only as an image):

[Formula image BDA0002407730250000041]

wherein M is the number of samples in a batch;
the uncertainties corresponding to the features are fused by a minimum-uncertainty method, i.e., over all uncertainty vectors in the set, the minimum value in each dimension is taken to form the final vector.
Further, in step S4, the input features x_i and the corresponding uncertainties σ_i are compared using the mutual likelihood score, which specifically includes the following steps:
Step S41: performing ten-fold cross-validation of the trained model on a validation set to obtain the final average accuracy, traversing the candidate thresholds on each fold, and taking the threshold that yields the highest final accuracy as the comparison threshold t;
Step S42: letting G = {g_1, g_2, ..., g_M} be the face images in the database, comparing the feature x_i of a test face image with the face image feature x_j of each person in G, with the nearest neighbor and threshold method as the decision criterion; for the face images in the database G and the test set D, extracting the corresponding depth features x_i and uncertainties σ_i with the trained feature extraction model and uncertainty module and calculating the mutual likelihood score; if the score is greater than the comparison threshold t, the two are regarded as the same person, otherwise as different persons; traversing each image in the database yields the final recognition result.
Compared with the prior art, the invention has the following beneficial effects:
1. The method can effectively recognize faces in video, improves the accuracy of face recognition, and reduces the influence of image quality on face recognition.
2. The constraint can be gradually strengthened during model training, improving the generalization of the features.
3. Aiming at the difficulty of selecting the interval parameter in traditional interval-based (margin-based) loss functions, a loss function based on a dynamic interval is proposed. This loss function needs no tuning of the interval parameter and can adaptively adjust the interval size for different data sets and different network structures, controlling the gradient magnitude of each sample at a fine granularity. In addition, the constraint strength increases gradually as the model converges during training, so the model keeps receiving effective gradients and parameter updates, which improves the discriminability of the final features.
4. The method uses a pre-trained network to learn the uncertainty of the features, fuses the features of a set with this uncertainty, and finally compares the fused features with the mutual likelihood score, effectively improving face recognition in unconstrained scenarios.
Drawings
FIG. 1 is a flow chart of a method implementation of an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present application. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
As shown in fig. 1, the present invention provides a video face recognition method based on dynamic interval loss function and probability feature, comprising the following steps:
Step S1: training the recognition network through a face recognition training set. This specifically includes the following steps:
Step S11: acquiring a public face recognition training set from the network, together with the corresponding labels of the training data.
Step S12: for the face images in the face recognition training set, outputting the face bounding box and facial key-point positions with a pre-trained RetinaFace detection model, aligning the faces by similarity transformation, then subtracting the mean value 127.5 from the pixel values of all input face images and dividing by 128 for normalization.
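As an illustration of the preprocessing in step S12, the following sketch normalizes aligned face crops exactly as described (subtract the mean 127.5, divide by 128); the alignment step is shown with an assumed 5-point 112 × 112 reference template, since the patent only states that a pre-trained RetinaFace detector supplies the key points and that a similarity transformation is applied.

```python
# Illustrative sketch of step S12 (not the patent's exact code). Assumes 5
# facial landmarks from a detector such as RetinaFace; the reference template
# below is a common 112x112 alignment template and is an assumption.
import numpy as np
import cv2
from skimage.transform import SimilarityTransform

REFERENCE_5PTS = np.array([          # assumed canonical landmark positions
    [38.2946, 51.6963], [73.5318, 51.5014], [56.0252, 71.7366],
    [41.5493, 92.3655], [70.7299, 92.2041]], dtype=np.float32)

def align_face(image: np.ndarray, landmarks: np.ndarray,
               size: int = 112) -> np.ndarray:
    """Warp the face so its 5 landmarks match the reference template."""
    tform = SimilarityTransform()
    tform.estimate(landmarks, REFERENCE_5PTS)
    matrix = tform.params[:2, :]                 # 2x3 similarity/affine matrix
    return cv2.warpAffine(image, matrix, (size, size))

def normalize_face(aligned: np.ndarray) -> np.ndarray:
    """Subtract the mean 127.5 and divide by 128, as in step S12."""
    return (aligned.astype(np.float32) - 127.5) / 128.0
```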
Step S13: adopting an 18-layer ResNet as the network model for extracting face depth features, replacing the first 7 × 7 convolution kernel with a 3 × 3 convolution kernel; meanwhile, changing the stride of the first convolutional layer from 2 to 1 so that the output size of the last feature map remains 7 × 7; in addition, changing the path of the identity mapping into an average pooling with stride 2 followed by a 1 × 1 convolution with stride 1 to prevent information loss; finally, replacing the average pooling layer with a 7 × 7 convolution layer and outputting the final face feature x_i.
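A minimal PyTorch sketch of the backbone changes described in step S13, starting from torchvision's ResNet-18: the first 7 × 7 convolution becomes a 3 × 3 convolution with stride 1, each identity-mapping downsample path becomes average pooling with stride 2 followed by a stride-1 1 × 1 convolution, and the global average pooling is replaced by a 7 × 7 convolution producing the feature vector. The 112 × 112 input size and 512-dimensional output are assumptions, not stated in the patent.

```python
# Illustrative sketch of the step S13 backbone changes applied to torchvision's
# ResNet-18. Input size 112x112 and output dimension 512 are assumptions; the
# patent only fixes the 7x7 final feature map.
import torch
import torch.nn as nn
from torchvision.models import resnet18

def make_face_backbone(feat_dim: int = 512) -> nn.Module:
    net = resnet18()  # randomly initialised ResNet-18

    # Replace the first 7x7 / stride-2 convolution with a 3x3 / stride-1 one.
    net.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)

    # Replace each identity-mapping downsample path with stride-2 average
    # pooling followed by a stride-1 1x1 convolution (plus batch norm).
    for layer in (net.layer2, net.layer3, net.layer4):
        block = layer[0]
        in_ch, out_ch = block.conv1.in_channels, block.bn2.num_features
        block.downsample = nn.Sequential(
            nn.AvgPool2d(kernel_size=2, stride=2),
            nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    # Replace global average pooling + fc with a 7x7 convolution that turns the
    # final 7x7 feature map into the face feature x_i.
    body = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool,
                         net.layer1, net.layer2, net.layer3, net.layer4)
    head = nn.Sequential(nn.Conv2d(512, feat_dim, kernel_size=7), nn.Flatten())
    return nn.Sequential(body, head)

# features = make_face_backbone()(torch.randn(2, 3, 112, 112))  # shape (2, 512)
```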
Step S14: letting D = {d_1, d_2, ..., d_N} be the face images in the test set, d_i the i-th face image, E(·) the deep convolutional neural network model for extracting depth features, and x_i = E(d_i) the feature corresponding to the i-th face image; taking the dot product of the depth feature x_i with the j-th column of the last fully connected layer W to obtain the score z_{i,j} of the j-th category, and feeding it into a Softmax activation function to generate the classification probability P_{i,j}, calculated as follows:

P_{i,j} = \frac{e^{z_{i,j}}}{\sum_{k=1}^{C} e^{z_{i,k}}}

where C is the total number of categories and k is the index over categories.
Step S15: letting y_i be the label corresponding to the i-th sample, and considering the angle between the depth feature x_i and the corresponding class weight vector (the y_i-th column of W); the point of maximum rate of change on the curve of the classification probability P_{i,y_i} as a function of this angle is taken as a reference point and related to the dynamic interval parameter, i.e., once the dynamic interval parameter of the i-th sample is set, the curve of P_{i,y_i} with respect to the angle attains its maximum absolute derivative at θ_m, where θ_m is the reference point at which the derivative of the curve is largest and P(θ_m) is close to 0.5. In the early stage of training this angle is relatively large, so to provide a suitable constraint on the optimization of the network, the reference point θ_m is limited to be less than π/4. The dynamic interval parameter is computed by the following formula (reproduced in the original only as an image):

[Formula image BDA00024077302500000612]

where v is the corresponding scaling parameter used to keep the classification probability within the desired range, and the remaining term is the total score of all categories other than the sample's own category; in general it can be approximated by the total number of categories minus one.
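Because the closed-form expression for the dynamic interval parameter appears in the original only as an image, the following sketch is a reconstruction under the stated conditions rather than the patent's formula: with an ArcFace-style target logit v·cos(θ + m) and the competing-class score approximated by C - 1, the steepest point of the probability curve (where P is close to 0.5) lies where v·cos(θ_m + m) = ln(C - 1), giving m = arccos(ln(C - 1)/v) - θ_m with θ_m capped at π/4. The name m_i, the default value of v, and the use of the sample's current angle as the reference point are assumptions.

```python
# Hedged reconstruction of the step S15 dynamic interval (margin). The patent's
# exact formula is only given as an image; this follows the stated conditions
# (steepest point of the probability curve at theta_m, P close to 0.5,
# competing-class score approximated by C - 1, theta_m capped at pi/4).
import math
import torch

def dynamic_margin(theta: torch.Tensor, num_classes: int, v: float = 30.0,
                   theta_cap: float = math.pi / 4) -> torch.Tensor:
    """Per-sample margin m_i for the angles theta between x_i and its class weight."""
    theta_m = torch.clamp(theta, max=theta_cap)   # reference point, capped at pi/4 (assumed)
    # Angle where the scaled target logit equals log(C - 1), i.e. P is about 0.5.
    target = math.acos(min(max(math.log(num_classes - 1) / v, -1.0), 1.0))
    return (target - theta_m).clamp(min=0.0)      # margin placing theta_m at the steepest point

def margin_logits(cos_theta: torch.Tensor, labels: torch.Tensor,
                  num_classes: int, v: float = 30.0) -> torch.Tensor:
    """Apply the per-sample margin to the target-class logit, ArcFace style."""
    theta = torch.acos(cos_theta.clamp(-1 + 1e-7, 1 - 1e-7))
    m = dynamic_margin(theta.gather(1, labels[:, None]).squeeze(1), num_classes, v)
    theta = theta.scatter_add(1, labels[:, None], m[:, None])
    return v * torch.cos(theta)
```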
Step S16: after obtaining the classification probability P_{i,j} and the dynamic interval parameter, using the cross-entropy loss function to calculate the difference between the predicted classification probability P_{i,j} and the true probability Q_{i,j} and obtain the loss value L_{CE}(x_i), calculated as follows:

L_{CE}(x_i) = -\sum_{j=1}^{C} Q_{i,j} \log P_{i,j}

where Q_{i,j} equals 1 when j = y_i and 0 otherwise; the network parameters are then updated by gradient descent and back-propagation.
Step S2: adopting the trained recognition network as the feature extraction module, and training the uncertainty module through the same training set. This specifically includes the following steps:
Step S21: taking the face recognition model trained in step S1 as the feature extraction model, extracting the depth feature x_i of each face image from the same training data set, and outputting the corresponding last feature map as the input of the uncertainty module.
Step S22: the uncertainty module is a shallow neural network model comprising two fully connected layers with ReLU as the activation function; a batch normalization layer is inserted between each fully connected layer and the activation function to normalize its input, and an exponential function is finally used as the output activation to produce the uncertainty σ_i corresponding to each face image, which has the same dimension as the depth feature x_i and represents the variance of the corresponding feature in the feature space.
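A minimal sketch of the uncertainty module of step S22: two fully connected layers, batch normalization between each linear layer and its activation, ReLU in the middle, and an exponential output so that the predicted per-dimension variance stays positive. The input size (a flattened 7 × 7 × 512 feature map) and the hidden width are assumptions.

```python
# Sketch of the step S22 uncertainty module. Input and hidden sizes are assumed.
import torch
import torch.nn as nn

class UncertaintyModule(nn.Module):
    def __init__(self, in_dim: int = 512 * 7 * 7, feat_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_dim, feat_dim),
            nn.BatchNorm1d(feat_dim),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim, feat_dim),
            nn.BatchNorm1d(feat_dim),
        )

    def forward(self, feature_map: torch.Tensor) -> torch.Tensor:
        # exp(.) keeps sigma_i strictly positive, one value per feature dimension.
        return torch.exp(self.net(feature_map))

# sigma = UncertaintyModule()(torch.randn(4, 512, 7, 7))   # shape (4, 512)
```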
Step S23: calculating the mutual likelihood score s(x_i, x_j) between any two samples as follows:

s(x_i, x_j) = -\frac{1}{2} \sum_{l=1}^{h} \left( \frac{\left(\mu_i^{(l)} - \mu_j^{(l)}\right)^2}{\sigma_i^{2(l)} + \sigma_j^{2(l)}} + \log\left(\sigma_i^{2(l)} + \sigma_j^{2(l)}\right) \right) - \frac{h}{2} \log 2\pi

where \mu_i^{(l)} and \sigma_i^{2(l)} respectively denote the values of the feature mean μ and the feature variance σ in the l-th dimension, and h is the dimension of the face feature. From the formula it can be seen that if the depth features x_i and x_j carry large uncertainty, the mutual likelihood score will be low regardless of the distance between the features; the score is high only when both inputs have little uncertainty and the corresponding means are very close.
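The mutual likelihood score of step S23 follows directly from the formula above; in the sketch below sigma holds the per-dimension variances, and the constant term (h/2)·log 2π is kept even though it does not affect comparisons.

```python
# Mutual likelihood score between two probabilistic embeddings (mean mu,
# per-dimension variance sigma), following the step S23 formula.
import math
import torch

def mutual_likelihood_score(mu_i: torch.Tensor, sigma_i: torch.Tensor,
                            mu_j: torch.Tensor, sigma_j: torch.Tensor) -> torch.Tensor:
    """Higher is better; large variances or distant means lower the score."""
    var_sum = sigma_i + sigma_j                       # sigma holds variances
    score = -0.5 * ((mu_i - mu_j) ** 2 / var_sum + torch.log(var_sum)).sum(dim=-1)
    return score - 0.5 * mu_i.shape[-1] * math.log(2 * math.pi)
```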
Step S24: calculating the final loss L_{pair} according to the distribution of face images within a batch, as follows:

L_{pair} = -\frac{1}{|R|} \sum_{(i,j) \in R} s(x_i, x_j)

where R is the set of all face pairs belonging to the same person and s(·,·) is the mutual likelihood score function used to compute the score between the two faces of a pair; the goal of the loss function is to maximize the mutual likelihood score between all face pairs of the same person.
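For step S24, a sketch of the pair loss: every pair of samples in a batch that shares a label contributes its negative mutual likelihood score, so minimizing the loss maximizes the score over same-person pairs. Averaging over the pairs is an assumption, and mutual_likelihood_score refers to the sketch above.

```python
# Sketch of the step S24 loss: maximize the mutual likelihood score over all
# same-person pairs in a batch. Averaging over pairs is an assumption.
import torch

def pair_loss(mu: torch.Tensor, sigma: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    same = labels[:, None] == labels[None, :]                     # genuine-pair mask
    idx_i, idx_j = torch.triu_indices(len(labels), len(labels), offset=1)
    keep = same[idx_i, idx_j]                                     # keep only same-person pairs
    scores = mutual_likelihood_score(mu[idx_i[keep]], sigma[idx_i[keep]],
                                     mu[idx_j[keep]], sigma[idx_j[keep]])
    return -scores.mean()
```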
Step S3: aggregating the input video feature set by using the learned uncertainty as the importance of the features to obtain the aggregated feature.
The deep face feature x_i output by the feature extraction network reflects the most likely feature representation of the input face image, while the output σ_i of the uncertainty module represents the uncertainty of that feature in each dimension; σ_i varies with image quality and reflects the importance of the corresponding depth feature within the whole set of input video images, and is therefore used as the weight with which the depth features x_i are fused, the fused feature a_i being computed by the following formula (reproduced in the original only as an image):

[Formula image BDA0002407730250000081]

wherein M is the number of samples in a batch;
in order to compare the aggregated features in the testing stage, the uncertainties corresponding to the features are fused by a minimum-uncertainty method, i.e., over all uncertainty vectors in the set, the minimum value in each dimension is taken to form the final vector.
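The aggregation formula of step S3 is likewise only available as an image; the sketch below therefore uses inverse-variance weighting, a common choice for fusing Gaussian embeddings, as an assumed reading of "uncertainty as importance weight", while the minimum-uncertainty fusion follows the text directly.

```python
# Sketch of the step S3 set aggregation. The exact weighting is only given as
# an image in the original; inverse-variance weighting is an assumed
# interpretation. Minimum-uncertainty fusion follows the text directly.
import torch

def aggregate_set(mu: torch.Tensor, sigma: torch.Tensor):
    """mu, sigma: (M, h) features and per-dimension variances of one video set."""
    weights = 1.0 / sigma                                   # assumed importance weights
    fused_mu = (weights * mu).sum(dim=0) / weights.sum(dim=0)
    fused_sigma = sigma.min(dim=0).values                   # minimum-uncertainty fusion
    return fused_mu, fused_sigma
```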
Step S4: comparing the aggregated features with the mutual likelihood score instead of the cosine similarity to complete the final recognition.
In the testing phase, for the input features x_i and the corresponding uncertainties σ_i, the mutual likelihood score is used instead of the cosine similarity for comparison; because the mutual likelihood score also takes the quality of the input images into account, it suppresses the influence of poor image quality on the final recognition result more effectively. This specifically includes the following steps:
Step S41: compared with the cosine similarity, the mutual likelihood score has a wider value range, which makes the comparison threshold harder to select. To select it effectively, the trained model undergoes ten-fold cross-validation on a validation set to obtain the final average accuracy; the candidate thresholds are traversed on each fold, and the threshold that yields the highest final accuracy is taken as the comparison threshold t.
Step S42: let G = { G 1 ,g 2 ,...,g M The feature x of a tested face image is taken as the face image in the database i And the face image characteristics x of each person in G j Comparing, and adopting a nearest neighbor method and a threshold value method as a judgment basis; for the face images in the database G and the test set D, extracting corresponding depth features x by using a trained feature extraction model and an uncertainty module i And the corresponding uncertainty σ i Calculating a mutual likelihood score, if the score is larger than a comparison threshold t,the users are considered to be the same person, otherwise, the users are considered to be different persons; and traversing each image in the database to obtain a final identification result.
The above are preferred embodiments of the present invention; all changes producing equivalent functional effects that are made according to the technical solution of the present invention, insofar as they do not exceed the scope of that technical solution, belong to the protection scope of the present invention.

Claims (3)

1. A video face recognition method based on dynamic interval loss function and probability characteristic is characterized by comprising the following steps:
step S1: training a face recognition network model through a face recognition training set;
step S2: adopting a trained face recognition network model as a feature extraction model, and training an uncertainty module through the same training set;
and step S3: aggregating the input video image set by using the learned uncertainty as the importance degree of the features to obtain the aggregated features;
and step S4: comparing the aggregated features by adopting a mutual likelihood score to complete final recognition;
the step S1 specifically includes the steps of:
step S11: acquiring a public face recognition training set from a network, and acquiring related labels of training data;
step S12: for the face images in the face recognition training set, outputting the face bounding box and facial key-point positions with a pre-trained RetinaFace detection model, aligning the faces by similarity transformation, then subtracting the mean from the pixel values of all input face images and normalizing them;
step S13: adopting an 18-layer ResNet as the face recognition network model for face depth feature extraction, replacing the first 7 × 7 convolution kernel with a 3 × 3 convolution kernel; meanwhile, setting the stride of the first convolutional layer to 1 so that the output size of the last feature map remains 7 × 7; setting the path of the identity mapping to an average pooling with stride 2 followed by a 1 × 1 convolution with stride 1 to prevent information loss; and finally replacing the average pooling layer with a 7 × 7 convolution layer and outputting the final face depth feature x_i;
step S14: letting D = {d_1, d_2, ..., d_N} be the face images in the training set, d_i the i-th face image, E(·) the face recognition network model for extracting depth features, and x_i = E(d_i) the depth feature corresponding to the i-th face image; taking the dot product of the depth feature x_i with the j-th column of the last fully connected layer W to obtain the score z_{i,j} of the j-th category, and feeding it into a Softmax activation function to generate the classification probability P_{i,j}, calculated as follows:

P_{i,j} = \frac{e^{z_{i,j}}}{\sum_{k=1}^{C} e^{z_{i,k}}}

wherein C is the total number of categories and k is the index over categories;
step S15: letting y_i be the label corresponding to the i-th face image, and considering the angle between the depth feature x_i and the corresponding class weight vector (the y_i-th column of W); the point of maximum rate of change on the curve of the classification probability P_{i,y_i} as a function of this angle is taken as a reference point and related to the dynamic interval parameter, i.e., once the dynamic interval parameter of the i-th face image is set, the curve of P_{i,y_i} with respect to the angle attains its maximum absolute derivative at θ_m, where θ_m is the reference point at which the derivative of the curve is largest; the dynamic interval parameter is computed by the following formula (reproduced in the original only as an image):

[Formula image FDA0004036320450000026]

where v is the corresponding scaling parameter used to keep the classification probability within the desired range, and the remaining term is the total score of all categories other than the sample's own category;
step S16: after obtaining the classification probability P_{i,j} and the dynamic interval parameter, using the cross-entropy loss function to calculate the difference between the predicted classification probability P_{i,j} and the true probability Q_{i,j} and obtain the loss value L_{CE}(x_i), calculated as follows:

L_{CE}(x_i) = -\sum_{j=1}^{C} Q_{i,j} \log P_{i,j}
then updating network parameters by using a gradient descent and back propagation algorithm;
the step S2 specifically includes the following steps:
step S21: taking the face recognition network model trained in step S1 as the feature extraction model, extracting the depth feature of each face image on the same training set, and outputting the last feature map corresponding to the depth feature of each face image as the input of the uncertainty module;
step S22: the uncertainty module is a shallow neural network model comprising two fully connected layers with ReLU as the activation function; a batch normalization layer is inserted between each fully connected layer and the activation function to normalize its input, and an exponential function is finally used as the output activation to produce the uncertainty corresponding to each face image, which has the same dimension as the depth feature and represents the variance of the corresponding feature in the feature space;
step S23: calculating the mutual likelihood score s(x_i, x_j) between any two samples as follows:

s(x_i, x_j) = -\frac{1}{2} \sum_{l=1}^{h} \left( \frac{\left(\mu_i^{(l)} - \mu_j^{(l)}\right)^2}{\sigma_i^{2(l)} + \sigma_j^{2(l)}} + \log\left(\sigma_i^{2(l)} + \sigma_j^{2(l)}\right) \right) - \frac{h}{2} \log 2\pi

wherein \mu_i^{(l)} and \sigma_i^{2(l)} respectively represent the values of the depth feature x_i and the uncertainty σ_i in the l-th dimension, and h is the dimension of the face feature;
step S24: calculating the final loss L_{pair} according to the distribution of face images within a batch, as follows:

L_{pair} = -\frac{1}{|R|} \sum_{(i,j) \in R} s(x_i, x_j)

where R is the set of all face pairs belonging to the same person and s(·,·) is the mutual likelihood score function used to compute the score between the two faces of a pair; the goal of the loss function is to maximize the mutual likelihood score between all face pairs of the same person.
2. The video face recognition method based on the dynamic interval loss function and the probability feature of claim 1, wherein the specific method of the step S3 is as follows:
the depth feature x_k output by the face recognition network model reflects the most likely feature representation of the input face image, while the output σ_k of the uncertainty module represents the uncertainty of that feature in each dimension; σ_k varies with image quality and reflects the importance of the corresponding depth feature within the whole set of input video images, and is used as the weight with which the depth features x_k are aggregated, the aggregated feature a_f being computed by the following formula (reproduced in the original only as an image):

[Formula image FDA0004036320450000032]

wherein M is the number of samples in a batch;
and the uncertainties corresponding to the features are aggregated by a minimum-uncertainty method, i.e., over all uncertainty vectors in the set, the minimum value in each dimension is taken to form the final vector.
3. The video face recognition method based on the dynamic interval loss function and the probability feature of claim 2, wherein the step S4 specifically comprises the following steps:
step S41: performing ten-fold cross validation on the trained model on a validation set to obtain final average accuracy, traversing possible thresholds on each fold, and taking the threshold which enables the final accuracy to be highest as a comparison threshold t;
step S42: letting G = {g_1, g_2, ..., g_Z} be the face images in the database, a tested face depth feature is compared with the face depth features of each person in G, with the nearest neighbor and threshold method as the decision criterion; for the face images in the database G and the test set D, the corresponding depth features and uncertainties are extracted with the trained feature extraction model and uncertainty module, and the mutual likelihood score of the aggregated features is calculated; if the score is greater than the comparison threshold t, the two are regarded as the same person, otherwise as different persons; and traversing each image in the database gives the final recognition result.
CN202010166807.8A 2020-03-11 2020-03-11 Video face recognition method based on dynamic interval loss function and probability characteristic Active CN111339988B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010166807.8A CN111339988B (en) 2020-03-11 2020-03-11 Video face recognition method based on dynamic interval loss function and probability characteristic

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010166807.8A CN111339988B (en) 2020-03-11 2020-03-11 Video face recognition method based on dynamic interval loss function and probability characteristic

Publications (2)

Publication Number Publication Date
CN111339988A CN111339988A (en) 2020-06-26
CN111339988B true CN111339988B (en) 2023-04-07

Family

ID=71182200

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010166807.8A Active CN111339988B (en) 2020-03-11 2020-03-11 Video face recognition method based on dynamic interval loss function and probability characteristic

Country Status (1)

Country Link
CN (1) CN111339988B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112116547A (en) * 2020-08-19 2020-12-22 南京航空航天大学 Feature map aggregation method for unconstrained video face recognition
CN112906810B (en) * 2021-03-08 2024-04-16 共达地创新技术(深圳)有限公司 Target detection method, electronic device, and storage medium
CN113033345B (en) * 2021-03-10 2024-02-20 南京航空航天大学 V2V video face recognition method based on public feature subspace
CN113378660B (en) * 2021-05-25 2023-11-07 广州紫为云科技有限公司 Face recognition method and device with low data cost
CN113239866B (en) * 2021-05-31 2022-12-13 西安电子科技大学 Face recognition method and system based on space-time feature fusion and sample attention enhancement
CN113205082B (en) * 2021-06-22 2021-10-15 中国科学院自动化研究所 Robust iris identification method based on acquisition uncertainty decoupling
CN113688708A (en) * 2021-08-12 2021-11-23 北京数美时代科技有限公司 Face recognition method, system and storage medium based on probability characteristics
CN113705647B (en) * 2021-08-19 2023-04-28 电子科技大学 Dual semantic feature extraction method based on dynamic interval
CN113792701A (en) * 2021-09-24 2021-12-14 北京市商汤科技开发有限公司 Living body detection method and device, computer equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107103281A (en) * 2017-03-10 2017-08-29 中山大学 Face identification method based on aggregation Damage degree metric learning
CN109815801A (en) * 2018-12-18 2019-05-28 北京英索科技发展有限公司 Face identification method and device based on deep learning
WO2020029356A1 (en) * 2018-08-08 2020-02-13 杰创智能科技股份有限公司 Method employing generative adversarial network for predicting face change

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107103281A (en) * 2017-03-10 2017-08-29 中山大学 Face identification method based on aggregation Damage degree metric learning
WO2020029356A1 (en) * 2018-08-08 2020-02-13 杰创智能科技股份有限公司 Method employing generative adversarial network for predicting face change
CN109815801A (en) * 2018-12-18 2019-05-28 北京英索科技发展有限公司 Face identification method and device based on deep learning

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Learning Expressionlets via Universal Manifold Model for Dynamic Facial Expression Recognition; Mengyi Liu et al.; IEEE Transactions on Image Processing; 2016-10-05; Vol. 25, No. 12; full text *
Deep learning *** based on an improved additive cosine margin loss function; *** et al.; 《传感技术学报》; 2019-12-31 (No. 12); full text *
A person semantic recognition model based on deep learning of video scenes; 高翔 et al.; 《计算机技术与发展》; 2018-02-07 (No. 06); full text *
A face clustering algorithm based on additive margin Softmax features; 王锟朋 et al.; 《计算机应用与软件》; 2020-02-12 (No. 02); full text *
Research on the application of support vector machines in machine learning; 罗瑜; 《中国优秀博硕士学位论文全文数据库(博士)信息科技辑》; 2008-06-15 (No. 06); full text *

Also Published As

Publication number Publication date
CN111339988A (en) 2020-06-26

Similar Documents

Publication Publication Date Title
CN111339988B (en) Video face recognition method based on dynamic interval loss function and probability characteristic
CN108647583B (en) Face recognition algorithm training method based on multi-target learning
CN106326886B (en) Finger vein image quality appraisal procedure based on convolutional neural networks
US7295687B2 (en) Face recognition method using artificial neural network and apparatus thereof
CN109711254B (en) Image processing method and device based on countermeasure generation network
CN106372581B (en) Method for constructing and training face recognition feature extraction network
Vignolo et al. Feature selection for face recognition based on multi-objective evolutionary wrappers
CN108427921A (en) A kind of face identification method based on convolutional neural networks
CN101464950B (en) Video human face identification and retrieval method based on on-line learning and Bayesian inference
JP2017517076A (en) Face authentication method and system
KR102036957B1 (en) Safety classification method of the city image using deep learning-based data feature
CN108520213B (en) Face beauty prediction method based on multi-scale depth
CN110097029B (en) Identity authentication method based on high way network multi-view gait recognition
CN102867191A (en) Dimension reducing method based on manifold sub-space study
CN112800876A (en) Method and system for embedding hypersphere features for re-identification
CN109726703B (en) Face image age identification method based on improved ensemble learning strategy
CN112232395B (en) Semi-supervised image classification method for generating countermeasure network based on joint training
CN108229432A (en) Face calibration method and device
Zuobin et al. Feature regrouping for cca-based feature fusion and extraction through normalized cut
CN109543637A (en) A kind of face identification method, device, equipment and readable storage medium storing program for executing
Wang et al. Occluded person re-identification via defending against attacks from obstacles
CN112084895A (en) Pedestrian re-identification method based on deep learning
Cheng et al. Student action recognition based on deep convolutional generative adversarial network
CN112836629A (en) Image classification method
Ahmad et al. Deep convolutional neural network using triplet loss to distinguish the identical twins

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant