CN111339988A - Video face recognition method based on dynamic interval loss function and probability characteristic - Google Patents

Video face recognition method based on dynamic interval loss function and probability characteristic

Info

Publication number
CN111339988A
Authority
CN
China
Prior art keywords
feature
face
uncertainty
function
recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010166807.8A
Other languages
Chinese (zh)
Other versions
CN111339988B (en)
Inventor
柯逍
郑毅腾
朱敏琛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN202010166807.8A priority Critical patent/CN111339988B/en
Publication of CN111339988A publication Critical patent/CN111339988A/en
Application granted granted Critical
Publication of CN111339988B publication Critical patent/CN111339988B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Biomedical Technology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a video face recognition method based on a dynamic interval loss function and probability features, which comprises the following steps: step S1: training a recognition network on a face recognition training set; step S2: using the trained recognition network as a feature extraction module and training an uncertainty module on the same training set; step S3: aggregating the feature set of the input video, using the learned uncertainty as the importance of each feature, to obtain aggregated features; step S4: comparing the aggregated features using the mutual likelihood score to complete the final recognition. The method can effectively recognize faces in video.

Description

Video face recognition method based on dynamic interval loss function and probability characteristic
Technical Field
The invention relates to the fields of pattern recognition and computer vision, and in particular to a video face recognition method based on a dynamic interval (margin) loss function and probability features.
Background
In recent years, deep convolutional neural networks have achieved great success in computer vision, and face recognition methods based on deep learning have exploited their strength in feature extraction, setting new records on public data sets and developing rapidly. An increasing number of papers related to face recognition are also published at the major computer vision conferences. Because face recognition has broad applications and great commercial value, new face recognition techniques are continuously explored in both academia and industry; in recent years, aided by the breakthroughs of deep learning and convolutional neural networks in computer vision, face recognition algorithms have repeatedly refreshed records on various public benchmark data sets and produced a number of standard products in industry.
Although face recognition technology has advanced greatly, it still faces many challenges in real environments, where factors such as illumination, pose, occlusion, and age affect recognition performance.
Disclosure of Invention
The invention aims to provide a video face recognition method based on a dynamic interval loss function and probability characteristics, which can effectively recognize faces in a video.
In order to achieve the purpose, the invention adopts the technical scheme that: a video face recognition method based on a dynamic interval loss function and probability characteristics comprises the following steps:
step S1: training a recognition network through a face recognition training set;
step S2: adopting a trained recognition network as a feature extraction module, and training an uncertainty module through the same training set;
step S3: aggregating the input video feature set by using the learned uncertainty as the importance degree of the features to obtain aggregated features;
step S4: comparing the aggregated features using the mutual likelihood score to complete the final recognition.
Further, the step S1 specifically includes the following steps:
step S11: acquiring a public face recognition training set from a network, and acquiring related labels of training data;
step S12: for the face images in the face recognition training set, outputting the face bounding box and the positions of the facial key points with a pre-trained RetinaFace detection model, aligning the faces by applying a similarity transformation, and normalizing all input face images by subtracting the mean from their pixel values;
step S13: adopting an 18-layer ResNet as the network model for extracting deep face features, replacing the first 7 × 7 convolution kernel with three 3 × 3 convolution kernels, setting the stride of the first convolution layer to 1 so that the last feature map keeps an output size of 7 × 7, setting the identity-mapping (shortcut) path to an average pooling of stride 2 followed by a 1 × 1 convolution of stride 1 to prevent information loss, and finally replacing the average pooling layer with a 7 × 7 convolution layer to output the final face feature x_i;
step S14: let D = {d_1, d_2, ..., d_N} be the face images in the test set, d_i the i-th face image, E(·) the deep convolutional neural network model used to extract depth features, and x_i = E(d_i) the feature of the i-th face image; the depth feature x_i is dot-multiplied with the j-th column of the last fully connected layer W to obtain the score z_{i,j} of the j-th category, which is fed into the Softmax activation function to produce the classification probability P_{i,j}, computed as

P_{i,j} = \frac{e^{z_{i,j}}}{\sum_{k=1}^{C} e^{z_{i,k}}}

wherein C is the total number of categories and k indexes the categories;
step S15: let y_i be the label of the i-th sample and θ_{y_i} the angle between the depth feature x_i and the corresponding class weight vector W_{y_i}; the point of maximum rate of change on the curve of P_{i,y_i} as a function of θ_{y_i} is taken as the reference point and tied to the dynamic interval parameter of the i-th sample (denoted m_i), i.e. once m_i is set, the absolute value of the derivative of the curve of P_{i,y_i} with respect to θ_{y_i} reaches its maximum at θ_m, where θ_m is the reference point that maximizes the derivative of the curve; the dynamic interval parameter m_i is computed by the formula given in the original publication (reproduced there only as an image), in which v is the corresponding scaling parameter used to prevent the classification probability from falling outside the desired range and the remaining term is the total score of all categories other than the sample's own category;
step S16: after obtaining the classification probability P_{i,j} and the dynamic interval parameter m_i, the difference between the predicted classification probability P_i and the true probability Q_i is computed with the cross-entropy loss function, yielding the loss value

L_{CE}(x_i) = -\sum_{j=1}^{C} Q_{i,j} \log P_{i,j}

where Q_{i,j} equals 1 when j = y_i and 0 otherwise;
and then updating the network parameters by using a gradient descent and back propagation algorithm.
Further, the step S2 specifically includes the following steps:
step S21: taking the face recognition model trained in step S1 as the feature extraction model, extracting the depth feature x_i of each face image from the same training data set, and outputting the corresponding last feature map as the input of the uncertainty module;
step S22: the uncertainty module is a shallow neural network comprising two fully connected layers with ReLU as the activation function; a batch normalization layer is inserted between each fully connected layer and the activation function to normalize its input, and an exponential function is used as the final activation to output the uncertainty σ_i corresponding to each face image, which has the same dimension as the depth feature x_i and represents the variance of the corresponding feature in the feature space;
step S23: computing the mutual likelihood score s(x_i, x_j) between any two samples as

s(x_i, x_j) = -\frac{1}{2} \sum_{l=1}^{h} \left( \frac{\left(\mu_i^{(l)} - \mu_j^{(l)}\right)^2}{\sigma_i^{(l)} + \sigma_j^{(l)}} + \log\left(\sigma_i^{(l)} + \sigma_j^{(l)}\right) \right) - \frac{h}{2}\log 2\pi

where \mu_i^{(l)} and \sigma_i^{(l)} denote the values of the feature mean μ and the feature variance σ of sample i in the l-th dimension, and h is the dimensionality of the face feature;
step S24: computing the final loss L_pair over the face images within one batch as

L_{pair} = -\frac{1}{|R|} \sum_{(i,j) \in R} s(x_i, x_j)

where R is the set of all face pairs belonging to the same person and s(·,·) is the mutual likelihood score function computed between the two faces of a pair; the goal of this loss function is to maximize the mutual likelihood scores between all face pairs of the same person.
Further, the specific method of step S3 is as follows:
the deep face feature x_i output by the feature extraction network reflects the most likely feature representation of the input face image, while the output σ_i of the uncertainty module represents the uncertainty of that feature in each dimension; σ_i varies with image quality and reflects the importance of the corresponding depth feature within the whole set of input video frames, and is therefore used as the weight with which the depth features x_i are fused into the aggregated feature a, according to the fusion formula given in the original publication (reproduced there only as an image), in which M is the number of samples in a batch;
and fusing the uncertainties corresponding to the features with a minimum-uncertainty method, i.e., taking, over all uncertainty vectors in the set, the minimum value in each dimension to form the final vector.
Further, in step S4, the input feature x_i and the corresponding uncertainty σ_i are compared using the mutual likelihood score, which specifically comprises the following steps:
step S41: performing ten-fold cross validation on the trained model on a validation set to obtain final average accuracy, traversing possible thresholds on each fold, and taking the threshold with the highest final accuracy as a comparison threshold t;
step S42: let G = {g_1, g_2, ..., g_M} be the face images in the database; the feature x_i of a probe face image is compared with the face image feature x_j of each person in G, with the nearest neighbor method and a threshold as the decision rule; for the face images in the database G and the test set D, the trained feature extraction model and uncertainty module are used to extract the corresponding depth features x_i and uncertainties σ_i and to compute the mutual likelihood score; if the score is greater than the comparison threshold t, the two images are considered to show the same person, otherwise different persons; each image in the database is traversed to obtain the final recognition result.
Compared with the prior art, the invention has the following beneficial effects:
1. The method can effectively recognize faces in video, improves the accuracy of face recognition, and reduces the influence of image quality on recognition.
2. The constraint can be gradually enhanced in the model training process, and the generalization of the features is improved.
3. To address the difficulty of selecting the interval (margin) parameter in traditional interval-based loss functions, a loss function based on a dynamic interval is provided. This loss function requires no tuning of the interval parameter and adaptively adjusts the size of the interval for different data sets and network structures, controlling the gradient magnitude of each sample in a fine-grained manner. In addition, the constraint strength gradually increases during training as the model converges, so that the model keeps receiving effective gradients and updating its parameters, improving the discriminability of the final features.
4. The method learns the uncertainty of the features on top of a pre-trained network, fuses the set features according to this uncertainty, and finally compares the fused features with the mutual likelihood score, which effectively improves face recognition in unconstrained scenarios.
Drawings
FIG. 1 is a flow chart of a method implementation of an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
As shown in fig. 1, the present invention provides a video face recognition method based on a dynamic interval loss function and probability features, which includes the following steps:
step S1: the recognition network is trained through a face recognition training set. The method specifically comprises the following steps:
step S11: and acquiring a public face recognition training set from the network, and acquiring related labels of training data.
Step S12: for the face images in the face recognition training set, the face bounding box and the positions of the facial key points are output by a pre-trained RetinaFace detection model, the faces are aligned by applying a similarity transformation, and all input face images are normalized by subtracting the mean value of 127.5 from their pixel values and dividing by 128.
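The preprocessing of step S12 can be sketched in Python as follows. The five landmarks are assumed to come from the face detector (e.g. a pre-trained RetinaFace model); the 112 × 112 crop size, the reference template coordinates, and the use of scikit-image and OpenCV are illustrative assumptions, since the patent does not specify them.

```python
import cv2
import numpy as np
from skimage.transform import SimilarityTransform

# Assumed 5-point reference template for a 112x112 crop (the widely used
# ArcFace template); the patent does not list its own coordinates.
TEMPLATE = np.array([[38.2946, 51.6963], [73.5318, 51.5014], [56.0252, 71.7366],
                     [41.5493, 92.3655], [70.7299, 92.2041]], dtype=np.float32)

def align_and_normalize(image, landmarks):
    """Align a detected face with a similarity transform and normalize pixels.

    `landmarks` is a (5, 2) array of facial key points from the detector.
    Pixel values are shifted by the mean 127.5 and divided by 128 (step S12).
    """
    tform = SimilarityTransform()
    tform.estimate(np.asarray(landmarks, dtype=np.float32), TEMPLATE)
    aligned = cv2.warpAffine(image, tform.params[:2], (112, 112))
    return (aligned.astype(np.float32) - 127.5) / 128.0
```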
Step S13: an 18-layer ResNet is adopted as the network model for extracting deep face features; the first 7 × 7 convolution kernel is replaced with three 3 × 3 convolution kernels, the stride of the first convolution layer is changed from 2 to 1 so that the last feature map keeps an output size of 7 × 7, the identity-mapping (shortcut) path is changed to an average pooling of stride 2 followed by a 1 × 1 convolution of stride 1 to prevent information loss, and finally the average pooling layer is replaced with a 7 × 7 convolution layer that outputs the final face feature x_i.
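A minimal PyTorch sketch of how these modifications could be applied to torchvision's ResNet-18 is given below; the intermediate channel widths of the stacked 3 × 3 convolutions and the 512-dimensional feature size are assumptions, as the patent does not specify them.

```python
import torch.nn as nn
from torchvision.models import resnet18

def build_backbone(feat_dim=512):
    net = resnet18()
    # Replace the first 7x7 convolution with three stacked 3x3 convolutions of
    # stride 1, so a 112x112 input still yields a 7x7 map after layer4.
    net.conv1 = nn.Sequential(
        nn.Conv2d(3, 32, 3, stride=1, padding=1, bias=False),
        nn.BatchNorm2d(32), nn.ReLU(inplace=True),
        nn.Conv2d(32, 32, 3, stride=1, padding=1, bias=False),
        nn.BatchNorm2d(32), nn.ReLU(inplace=True),
        nn.Conv2d(32, 64, 3, stride=1, padding=1, bias=False),
    )
    # Identity-mapping (shortcut) path: average pooling of stride 2 followed by
    # a stride-1 1x1 convolution instead of a stride-2 1x1 convolution.
    for layer in (net.layer2, net.layer3, net.layer4):
        block = layer[0]
        old_conv, old_bn = block.downsample[0], block.downsample[1]
        block.downsample = nn.Sequential(
            nn.AvgPool2d(kernel_size=2, stride=2),
            nn.Conv2d(old_conv.in_channels, old_conv.out_channels,
                      kernel_size=1, stride=1, bias=False),
            old_bn,
        )
    # Replace global average pooling with a 7x7 convolution that maps the
    # 512x7x7 map to the final face feature x_i.
    net.avgpool = nn.Sequential(
        nn.Conv2d(512, feat_dim, kernel_size=7, bias=False),
        nn.BatchNorm2d(feat_dim),
    )
    net.fc = nn.Identity()   # flatten in the parent forward yields (N, feat_dim)
    return net
```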
Step S14: let D = {d_1, d_2, ..., d_N} be the face images in the test set, d_i the i-th face image, E(·) the deep convolutional neural network model used to extract depth features, and x_i = E(d_i) the feature of the i-th face image. The depth feature x_i is dot-multiplied with the j-th column of the last fully connected layer W to obtain the score z_{i,j} of the j-th category, which is fed into the Softmax activation function to produce the classification probability P_{i,j}:

P_{i,j} = \frac{e^{z_{i,j}}}{\sum_{k=1}^{C} e^{z_{i,k}}}

where C is the total number of categories and k indexes the categories.
Step S15: let y_i be the label of the i-th sample and θ_{y_i} the angle between the depth feature x_i and the corresponding class weight vector W_{y_i}. The point of maximum rate of change on the curve of P_{i,y_i} as a function of θ_{y_i} is taken as the reference point and tied to the dynamic interval parameter of the i-th sample (denoted m_i): once m_i is set, the absolute value of the derivative of P_{i,y_i} with respect to θ_{y_i} reaches its maximum at θ_m, where θ_m is the reference point that maximizes the derivative of the curve and P(θ_m) is close to 0.5. In the early stages of training the angle θ_{y_i} is relatively large, so to provide a suitable constraint on the optimization of the network the reference point θ_m is limited to be smaller than π/4. The dynamic interval parameter m_i is then computed by the formula given in the original publication (reproduced there only as an image), in which v is the corresponding scaling parameter used to prevent the classification probability from falling outside the desired range and the remaining term is the sum of the scores of all categories other than the sample's own category, which can generally be taken as the total number of categories minus one.
Step S16: after obtaining the classification probability P_{i,j} and the dynamic interval parameter m_i, the difference between the predicted classification probability P_i and the true probability Q_i is computed with the cross-entropy loss function, yielding the loss value

L_{CE}(x_i) = -\sum_{j=1}^{C} Q_{i,j} \log P_{i,j}

where Q_{i,j} equals 1 when j = y_i and 0 otherwise.
and then updating the network parameters by using a gradient descent and back propagation algorithm.
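The structure of steps S14–S16 can be sketched as one margin-based softmax training step. The sketch below follows the common angular-margin formulation (normalized features and weights, a per-sample margin added to the target-class angle, scaling by v, then cross-entropy); the patent's dynamic-margin formula is reproduced only as an image, so `margin_fn` is a hypothetical callback standing in for it, and the whole block should be read as an assumption-laden illustration rather than the patented formula itself.

```python
import torch
import torch.nn.functional as F

def dynamic_margin_softmax_loss(features, weight, labels, scale, margin_fn):
    """Cross-entropy over scaled cosine logits with a per-sample angular margin.

    features: (N, h) backbone outputs x_i; weight: (C, h) last fully connected
    layer W; scale: scaling parameter v; margin_fn: placeholder returning one
    dynamic margin m_i per sample (the patent's formula is not reproduced here).
    """
    x = F.normalize(features, dim=1)
    w = F.normalize(weight, dim=1)
    cos_theta = x @ w.t()                                    # (N, C) cosine scores
    target_cos = cos_theta.gather(1, labels[:, None]).clamp(-1 + 1e-7, 1 - 1e-7)
    theta_y = torch.acos(target_cos).squeeze(1)              # angle to own class
    m = margin_fn(theta_y, cos_theta, labels)                # dynamic margin m_i
    one_hot = F.one_hot(labels, num_classes=cos_theta.size(1)).bool()
    target_logit = scale * torch.cos(theta_y + m)            # margin on own class
    logits = torch.where(one_hot, target_logit[:, None], scale * cos_theta)
    return F.cross_entropy(logits, labels)                   # L_CE over the batch
```

Calling `loss.backward()` on the returned value and then `optimizer.step()` realizes the gradient-descent and back-propagation update described in step S16.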
Step S2: and training the uncertainty module by using the trained recognition network as a feature extraction module and through the same training set. The method specifically comprises the following steps:
Step S21: the face recognition model trained in step S1 is taken as the feature extraction model, the depth feature x_i of each face image is extracted from the same training data set, and the corresponding last feature map is output as the input of the uncertainty module.
Step S22: the uncertainty module is a shallow neural network comprising two fully connected layers with ReLU as the activation function; a batch normalization layer is inserted between each fully connected layer and the activation function to normalize its input, and an exponential function is used as the final activation to output the uncertainty σ_i corresponding to each face image, which has the same dimension as the depth feature x_i and represents the variance of the corresponding feature in the feature space.
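A minimal PyTorch sketch of such an uncertainty head is given below; the layer widths and the flattened 512 × 7 × 7 input size are assumptions, since the patent only fixes the two fully connected layers, the batch normalization, the ReLU activation, and the exponential output.

```python
import torch
import torch.nn as nn

class UncertaintyModule(nn.Module):
    """Shallow head predicting a per-dimension variance for each face feature."""

    def __init__(self, in_dim=512 * 7 * 7, feat_dim=512):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, feat_dim)
        self.bn1 = nn.BatchNorm1d(feat_dim)   # batch norm between FC and activation
        self.relu = nn.ReLU(inplace=True)
        self.fc2 = nn.Linear(feat_dim, feat_dim)
        self.bn2 = nn.BatchNorm1d(feat_dim)

    def forward(self, feature_map):
        h = feature_map.flatten(1)            # last conv feature map, flattened
        h = self.relu(self.bn1(self.fc1(h)))
        log_var = self.bn2(self.fc2(h))
        return torch.exp(log_var)             # uncertainty sigma_i, same size as x_i
```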
Step S23: the mutual likelihood score s(x_i, x_j) between any two samples is computed as

s(x_i, x_j) = -\frac{1}{2} \sum_{l=1}^{h} \left( \frac{\left(\mu_i^{(l)} - \mu_j^{(l)}\right)^2}{\sigma_i^{(l)} + \sigma_j^{(l)}} + \log\left(\sigma_i^{(l)} + \sigma_j^{(l)}\right) \right) - \frac{h}{2}\log 2\pi

where \mu_i^{(l)} and \sigma_i^{(l)} denote the values of the feature mean μ and the feature variance σ of sample i in the l-th dimension, and h is the dimensionality of the face feature. As the formula shows, if the depth features x_i and x_j carry large uncertainty, the mutual likelihood score will be low regardless of the distance between the features; the score is high only when both inputs have little uncertainty and the corresponding means are very close.
Step S24: according to the face images within one batch, the final loss L_pair is computed as

L_{pair} = -\frac{1}{|R|} \sum_{(i,j) \in R} s(x_i, x_j)

where R is the set of all face pairs belonging to the same person and s(·,·) is the mutual likelihood score function computed between the two faces of a pair; the goal of this loss function is to maximize the mutual likelihood scores between all face pairs of the same person.
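Steps S23 and S24 can be sketched in PyTorch as follows. Enumerating the genuine (same-person) pairs inside a batch via the labels is an assumption about one reasonable implementation, and the constant term of the mutual likelihood score is dropped because it does not affect comparisons.

```python
import torch

def mutual_likelihood_score(mu_1, var_1, mu_2, var_2):
    """Mutual likelihood score between probabilistic embeddings (constant dropped).

    mu_*: (N, h) feature means; var_*: (N, h) per-dimension variances.
    """
    s2 = var_1 + var_2
    return -0.5 * (((mu_1 - mu_2) ** 2) / s2 + torch.log(s2)).sum(dim=1)

def pair_loss(mu, var, labels):
    """L_pair: negative mean mutual likelihood score over all genuine pairs."""
    i, j = torch.triu_indices(len(labels), len(labels), offset=1)
    genuine = labels[i] == labels[j]                  # keep same-person pairs only
    i, j = i[genuine], j[genuine]
    scores = mutual_likelihood_score(mu[i], var[i], mu[j], var[j])
    return -scores.mean()
```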
Step S3: and aggregating the input video feature set by using the learned uncertainty as the importance degree of the features to obtain the aggregated features.
The deep face feature x_i output by the feature extraction network reflects the most likely feature representation of the input face image, while the output σ_i of the uncertainty module represents the uncertainty of that feature in each dimension. σ_i varies with image quality and reflects the importance of the corresponding depth feature within the whole set of input video frames, and is therefore used as the weight with which the depth features x_i are fused into the aggregated feature a, according to the fusion formula given in the original publication (reproduced there only as an image), in which M is the number of samples in a batch.
in order to compare the aggregated features in the testing stage, the uncertainty corresponding to the features is fused by adopting a minimum uncertainty method, namely, the minimum value of each dimension is taken as a final vector for all uncertainty vectors in the set.
Step S4: the aggregated features are compared using the mutual likelihood score instead of the cosine similarity to complete the final recognition.
In the testing phase, the input feature x_i and the corresponding uncertainty σ_i are compared using the mutual likelihood score instead of the cosine similarity; because the mutual likelihood score also accounts for the influence of input image quality on the features, it suppresses the effect of poor image quality on the final recognition result more effectively. The comparison specifically comprises the following steps:
Step S41: compared with the cosine similarity, the mutual likelihood score has a wider range of values, which makes selecting the comparison threshold more difficult. To select it effectively, the trained model is evaluated with ten-fold cross validation on a validation set to obtain the final average accuracy; the possible thresholds are traversed on each fold, and the threshold that yields the highest final accuracy is taken as the comparison threshold t.
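A small NumPy sketch of this threshold search is given below; the exact cross-validation protocol, in particular averaging the per-fold best thresholds into the final t, is an assumption, since the patent only states that the candidate thresholds are traversed on each fold.

```python
import numpy as np

def choose_threshold(scores, same_person, n_folds=10, seed=0):
    """Ten-fold search for the comparison threshold t (step S41).

    scores: (P,) mutual likelihood scores of verification pairs;
    same_person: (P,) boolean ground truth for those pairs.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(scores)), n_folds)
    best_ts, accs = [], []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        cands = np.unique(scores[train])                      # thresholds to traverse
        acc = [np.mean((scores[train] > t) == same_person[train]) for t in cands]
        t_best = cands[int(np.argmax(acc))]
        best_ts.append(t_best)
        accs.append(np.mean((scores[test] > t_best) == same_person[test]))
    return float(np.mean(best_ts)), float(np.mean(accs))      # threshold t, mean accuracy
```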
Step S42: let G be { G ═ G1,g2,...,gMThe feature x of a tested face image is taken as the face image in the databaseiAnd the facial image characteristics x of each person in GjComparing, and adopting a nearest neighbor method and a threshold value method as a judgment basis; for the face images in the database G and the test set D, extracting corresponding depth features x by using a trained feature extraction model and an uncertainty moduleiAnd the corresponding uncertainty σiCalculating a mutual likelihood score, and if the score is greater than a comparison threshold t, the person is considered to be the same person, otherwise, the person is considered to be different; and traversing each image in the database to obtain a final recognition result.
The above are preferred embodiments of the present invention; all changes made according to the technical scheme of the present invention that produce its functional effects without exceeding the scope of the technical scheme belong to the protection scope of the present invention.

Claims (5)

1. A video face recognition method based on a dynamic interval loss function and probability features is characterized by comprising the following steps:
step S1: training a recognition network through a face recognition training set;
step S2: adopting a trained recognition network as a feature extraction module, and training an uncertainty module through the same training set;
step S3: aggregating the input video feature set by using the learned uncertainty as the importance degree of the features to obtain aggregated features;
step S4: comparing the aggregated features using the mutual likelihood score to complete the final recognition.
2. The video face recognition method based on the dynamic interval loss function and the probability feature of claim 1, wherein the step S1 specifically includes the following steps:
step S11: acquiring a public face recognition training set from a network, and acquiring related labels of training data;
step S12: for the face images in the face recognition training set, outputting the face bounding box and the positions of the facial key points with a pre-trained RetinaFace detection model, aligning the faces by applying a similarity transformation, and normalizing all input face images by subtracting the mean from their pixel values;
step S13: adopting an 18-layer ResNet as the network model for extracting deep face features, replacing the first 7 × 7 convolution kernel with three 3 × 3 convolution kernels, setting the stride of the first convolution layer to 1 so that the last feature map keeps an output size of 7 × 7, setting the identity-mapping (shortcut) path to an average pooling of stride 2 followed by a 1 × 1 convolution of stride 1 to prevent information loss, and finally replacing the average pooling layer with a 7 × 7 convolution layer to output the final face feature x_i;
step S14: let D = {d_1, d_2, ..., d_N} be the face images in the test set, d_i the i-th face image, E(·) the deep convolutional neural network model used to extract depth features, and x_i = E(d_i) the feature of the i-th face image; the depth feature x_i is dot-multiplied with the j-th column of the last fully connected layer W to obtain the score z_{i,j} of the j-th category, which is fed into the Softmax activation function to produce the classification probability P_{i,j}, computed as

P_{i,j} = \frac{e^{z_{i,j}}}{\sum_{k=1}^{C} e^{z_{i,k}}}

wherein C is the total number of categories and k indexes the categories;
step S15: let y_i be the label of the i-th sample and θ_{y_i} the angle between the depth feature x_i and the corresponding class weight vector W_{y_i}; the point of maximum rate of change on the curve of P_{i,y_i} as a function of θ_{y_i} is taken as the reference point and tied to the dynamic interval parameter of the i-th sample (denoted m_i), i.e. once m_i is set, the absolute value of the derivative of the curve of P_{i,y_i} with respect to θ_{y_i} reaches its maximum at θ_m, where θ_m is the reference point that maximizes the derivative of the curve; the dynamic interval parameter m_i is computed by the formula given in the original publication (reproduced there only as an image), in which v is the corresponding scaling parameter used to prevent the classification probability from falling outside the desired range and the remaining term is the total score of all categories other than the sample's own category;
step S16: after obtaining the classification probability P_{i,j} and the dynamic interval parameter m_i, the difference between the predicted classification probability P_i and the true probability Q_i is computed with the cross-entropy loss function, yielding the loss value

L_{CE}(x_i) = -\sum_{j=1}^{C} Q_{i,j} \log P_{i,j}

where Q_{i,j} equals 1 when j = y_i and 0 otherwise;
and then updating the network parameters by using a gradient descent and back propagation algorithm.
3. The video face recognition method based on the dynamic interval loss function and the probability feature of claim 2, wherein the step S2 specifically includes the following steps:
step S21: taking the face recognition model trained in step S1 as the feature extraction model, extracting the depth feature x_i of each face image from the same training data set, and outputting the corresponding last feature map as the input of the uncertainty module;
step S22: the uncertainty module is a shallow neural network comprising two fully connected layers with ReLU as the activation function; a batch normalization layer is inserted between each fully connected layer and the activation function to normalize its input, and an exponential function is used as the final activation to output the uncertainty σ_i corresponding to each face image, which has the same dimension as the depth feature x_i and represents the variance of the corresponding feature in the feature space;
step S23: computing the mutual likelihood score s(x_i, x_j) between any two samples as

s(x_i, x_j) = -\frac{1}{2} \sum_{l=1}^{h} \left( \frac{\left(\mu_i^{(l)} - \mu_j^{(l)}\right)^2}{\sigma_i^{(l)} + \sigma_j^{(l)}} + \log\left(\sigma_i^{(l)} + \sigma_j^{(l)}\right) \right) - \frac{h}{2}\log 2\pi

where \mu_i^{(l)} and \sigma_i^{(l)} denote the values of the feature mean μ and the feature variance σ of sample i in the l-th dimension, and h is the dimensionality of the face feature;
step S24: computing the final loss L_pair over the face images within one batch as

L_{pair} = -\frac{1}{|R|} \sum_{(i,j) \in R} s(x_i, x_j)

where R is the set of all face pairs belonging to the same person and s(·,·) is the mutual likelihood score function computed between the two faces of a pair; the goal of this loss function is to maximize the mutual likelihood scores between all face pairs of the same person.
4. The video face recognition method based on the dynamic interval loss function and the probability feature of claim 3, wherein the specific method of the step S3 is as follows:
the deep face feature x_i output by the feature extraction network reflects the most likely feature representation of the input face image, while the output σ_i of the uncertainty module represents the uncertainty of that feature in each dimension; σ_i varies with image quality and reflects the importance of the corresponding depth feature within the whole set of input video frames, and is therefore used as the weight with which the depth features x_i are fused into the aggregated feature a, according to the fusion formula given in the original publication (reproduced there only as an image), in which M is the number of samples in a batch;
and fusing the uncertainties corresponding to the features with a minimum-uncertainty method, i.e., taking, over all uncertainty vectors in the set, the minimum value in each dimension to form the final vector.
5. The video face recognition method based on the dynamic interval loss function and the probability feature of claim 4, wherein in step S4 the input feature x_i and the corresponding uncertainty σ_i are compared using the mutual likelihood score, which specifically comprises the following steps:
step S41: performing ten-fold cross validation on the trained model on a validation set to obtain final average accuracy, traversing possible thresholds on each fold, and taking the threshold with the highest final accuracy as a comparison threshold t;
step S42: let G = {g_1, g_2, ..., g_M} be the face images in the database; the feature x_i of a probe face image is compared with the face image feature x_j of each person in G, with the nearest neighbor method and a threshold as the decision rule; for the face images in the database G and the test set D, the trained feature extraction model and uncertainty module are used to extract the corresponding depth features x_i and uncertainties σ_i and to compute the mutual likelihood score; if the score is greater than the comparison threshold t, the two images are considered to show the same person, otherwise different persons; each image in the database is traversed to obtain the final recognition result.
CN202010166807.8A 2020-03-11 2020-03-11 Video face recognition method based on dynamic interval loss function and probability characteristic Active CN111339988B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010166807.8A CN111339988B (en) 2020-03-11 2020-03-11 Video face recognition method based on dynamic interval loss function and probability characteristic

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010166807.8A CN111339988B (en) 2020-03-11 2020-03-11 Video face recognition method based on dynamic interval loss function and probability characteristic

Publications (2)

Publication Number Publication Date
CN111339988A true CN111339988A (en) 2020-06-26
CN111339988B CN111339988B (en) 2023-04-07

Family

ID=71182200

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010166807.8A Active CN111339988B (en) 2020-03-11 2020-03-11 Video face recognition method based on dynamic interval loss function and probability characteristic

Country Status (1)

Country Link
CN (1) CN111339988B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112116547A (en) * 2020-08-19 2020-12-22 南京航空航天大学 Feature map aggregation method for unconstrained video face recognition
CN112906810A (en) * 2021-03-08 2021-06-04 共达地创新技术(深圳)有限公司 Object detection method, electronic device, and storage medium
CN113033345A (en) * 2021-03-10 2021-06-25 南京航空航天大学 V2V video face recognition method based on public feature subspace
CN113205082A (en) * 2021-06-22 2021-08-03 中国科学院自动化研究所 Robust iris identification method based on acquisition uncertainty decoupling
CN113239866A (en) * 2021-05-31 2021-08-10 西安电子科技大学 Face recognition method and system based on space-time feature fusion and sample attention enhancement
CN113378660A (en) * 2021-05-25 2021-09-10 广州紫为云科技有限公司 Low-data-cost face recognition method and device
CN113688708A (en) * 2021-08-12 2021-11-23 北京数美时代科技有限公司 Face recognition method, system and storage medium based on probability characteristics
CN113705647A (en) * 2021-08-19 2021-11-26 电子科技大学 Dynamic interval-based dual semantic feature extraction method
CN113792701A (en) * 2021-09-24 2021-12-14 北京市商汤科技开发有限公司 Living body detection method and device, computer equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107103281A (en) * 2017-03-10 2017-08-29 中山大学 Face identification method based on aggregation Damage degree metric learning
CN109815801A (en) * 2018-12-18 2019-05-28 北京英索科技发展有限公司 Face identification method and device based on deep learning
WO2020029356A1 (en) * 2018-08-08 2020-02-13 杰创智能科技股份有限公司 Method employing generative adversarial network for predicting face change

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107103281A (en) * 2017-03-10 2017-08-29 中山大学 Face identification method based on aggregation Damage degree metric learning
WO2020029356A1 (en) * 2018-08-08 2020-02-13 杰创智能科技股份有限公司 Method employing generative adversarial network for predicting face change
CN109815801A (en) * 2018-12-18 2019-05-28 北京英索科技发展有限公司 Face identification method and device based on deep learning

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
MENGYI LIU ET AL.: "Learning Expressionlets via Universal Manifold Model for Dynamic Facial Expression Recognition", 《 IEEE TRANSACTIONS ON IMAGE PROCESSING》 *
王锟朋 et al.: "Face clustering algorithm based on additive margin Softmax features", Computer Applications and Software *
章东平 et al.: "Deep learning face recognition based on an improved additive cosine margin loss function", Chinese Journal of Sensors and Actuators *
罗瑜: "Research on the application of support vector machines in machine learning", China Doctoral Dissertations Full-text Database, Information Science and Technology series *
高翔 et al.: "Person semantic recognition model based on deep learning of video scenes", Computer Technology and Development *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112116547A (en) * 2020-08-19 2020-12-22 南京航空航天大学 Feature map aggregation method for unconstrained video face recognition
CN112906810A (en) * 2021-03-08 2021-06-04 共达地创新技术(深圳)有限公司 Object detection method, electronic device, and storage medium
CN112906810B (en) * 2021-03-08 2024-04-16 共达地创新技术(深圳)有限公司 Target detection method, electronic device, and storage medium
CN113033345A (en) * 2021-03-10 2021-06-25 南京航空航天大学 V2V video face recognition method based on public feature subspace
CN113033345B (en) * 2021-03-10 2024-02-20 南京航空航天大学 V2V video face recognition method based on public feature subspace
CN113378660B (en) * 2021-05-25 2023-11-07 广州紫为云科技有限公司 Face recognition method and device with low data cost
CN113378660A (en) * 2021-05-25 2021-09-10 广州紫为云科技有限公司 Low-data-cost face recognition method and device
CN113239866A (en) * 2021-05-31 2021-08-10 西安电子科技大学 Face recognition method and system based on space-time feature fusion and sample attention enhancement
CN113205082A (en) * 2021-06-22 2021-08-03 中国科学院自动化研究所 Robust iris identification method based on acquisition uncertainty decoupling
CN113688708A (en) * 2021-08-12 2021-11-23 北京数美时代科技有限公司 Face recognition method, system and storage medium based on probability characteristics
CN113705647B (en) * 2021-08-19 2023-04-28 电子科技大学 Dual semantic feature extraction method based on dynamic interval
CN113705647A (en) * 2021-08-19 2021-11-26 电子科技大学 Dynamic interval-based dual semantic feature extraction method
CN113792701A (en) * 2021-09-24 2021-12-14 北京市商汤科技开发有限公司 Living body detection method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN111339988B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN111339988B (en) Video face recognition method based on dynamic interval loss function and probability characteristic
CN108647583B (en) Face recognition algorithm training method based on multi-target learning
CN103605972B (en) Non-restricted environment face verification method based on block depth neural network
CN107529650B (en) Closed loop detection method and device and computer equipment
CN105138973B (en) The method and apparatus of face authentication
CN106295694B (en) Face recognition method for iterative re-constrained group sparse representation classification
CN108427921A (en) A kind of face identification method based on convolutional neural networks
US7295687B2 (en) Face recognition method using artificial neural network and apparatus thereof
US7711156B2 (en) Apparatus and method for generating shape model of object and apparatus and method for automatically searching for feature points of object employing the same
CN109598268A (en) A kind of RGB-D well-marked target detection method based on single flow depth degree network
CN108520213B (en) Face beauty prediction method based on multi-scale depth
CN110378208B (en) Behavior identification method based on deep residual error network
CN112232184B (en) Multi-angle face recognition method based on deep learning and space conversion network
CN112800876A (en) Method and system for embedding hypersphere features for re-identification
CN102867191A (en) Dimension reducing method based on manifold sub-space study
CN109614866A (en) Method for detecting human face based on cascade deep convolutional neural networks
CN108229432A (en) Face calibration method and device
CN112084895A (en) Pedestrian re-identification method based on deep learning
CN110490028A (en) Recognition of face network training method, equipment and storage medium based on deep learning
Zuobin et al. Feature regrouping for cca-based feature fusion and extraction through normalized cut
Wang et al. Occluded person re-identification via defending against attacks from obstacles
KR20060089376A (en) A method of face recognition using pca and back-propagation algorithms
CN116258938A (en) Image retrieval and identification method based on autonomous evolution loss
CN114155572A (en) Facial expression recognition method and system
CN112836629A (en) Image classification method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant