CN111539263A - Video face recognition method based on aggregation countermeasure network - Google Patents

Video face recognition method based on aggregation countermeasure network

Info

Publication number
CN111539263A
CN111539263A (application number CN202010253595.7A)
Authority
CN
China
Prior art keywords
network
image
aggregation
video
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010253595.7A
Other languages
Chinese (zh)
Other versions
CN111539263B (en)
Inventor
陈莹
金炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangnan University
Original Assignee
Jiangnan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangnan University filed Critical Jiangnan University
Priority to CN202010253595.7A priority Critical patent/CN111539263B/en
Publication of CN111539263A publication Critical patent/CN111539263A/en
Application granted granted Critical
Publication of CN111539263B publication Critical patent/CN111539263B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a video face recognition method based on an aggregation countermeasure network, belonging to the technical field of video face recognition. The method adopts an aggregation countermeasure network built from an aggregation network, a discrimination network and a recognition network. The aggregation network and the discrimination network form adversarial learning, so that through competition the generated image becomes closer to the target-set static image; the recognition network computes a perceptual loss in a high-dimensional feature space, so that the generated image also becomes perceptually closer to the corresponding target-set static image, which improves the performance of the aggregation network. The discrimination network adopts a softmax multi-dimensional output, so it can judge not only whether an image is real or fake but also the identity category of the image; as a result, the identity of the generated image is closer to the ground truth, and subsequent recognition is more accurate and more efficient.

Description

Video face recognition method based on aggregation countermeasure network
Technical Field
The invention relates to a video face recognition method based on an aggregation countermeasure network, and belongs to the technical field of video face recognition.
Background
Video face recognition, as the name implies, is face recognition performed on video. With the continuing development of technology and growing demand, video face recognition has been applied in many fields, such as intelligent security, video surveillance and public security investigation.
Video face recognition differs from face recognition based on a single image: its query set is a video sequence, while its target set usually consists of high-definition face images; the identity of a person in the video is recognized by extracting face features from the video sequence and matching them against the target set.
However, in video surveillance, the most common scenario for video face recognition, the faces captured in video sequences often suffer from motion blur, noise, occlusion and the like, and therefore differ greatly from the target-set faces. Neither conventional methods nor current deep-learning-based methods handle this gap well, so the recognition performance is poor.
In addition, current video face recognition methods extract features from the video sequence frame by frame, which not only makes testing time excessively long but also leaves the recognition result vulnerable to interference from low-quality frames in the video sequence.
Disclosure of Invention
In order to solve the problems of low efficiency and low accuracy in existing video face recognition technology, the invention provides a video face recognition method in which an aggregation countermeasure network is adopted to aggregate multiple low-quality video sequences into a single high-quality frontal face image, and the quality of the generated frontal face image is improved during aggregation through adversarial learning, so that video face recognition can be carried out accurately;
the aggregation countermeasure network consists of an aggregation network, a discrimination network and a recognition network; the aggregation network and the discrimination network form adversarial learning, driving the generated image closer to the target-set static image through competition, and the recognition network computes a perceptual loss in a high-dimensional feature space, so that the generated frontal face image is perceptually closer to the corresponding target-set static image.
Optionally, the discrimination network outputs an (N+1)-dimensional vector in the form of a softmax multi-dimensional output, where N is the number of identity categories; the remaining dimension indicates whether the corresponding image is real or fake, "real" meaning that the corresponding image is a static image and "fake" meaning that it is a synthesized image.
Optionally, the method includes:
S1: construct the aggregation network G and pre-train it with the aggregation loss L_agg to obtain a pre-trained model of the aggregation network G;
S2: load the pre-trained model of the aggregation network G, construct the discrimination network D and the recognition network R, and compute the adversarial loss L_adv and the perceptual loss L_per;
S3: combine the aggregation loss L_agg, the adversarial loss L_adv and the perceptual loss L_per as a weighted sum to construct the final loss function L = L_agg + λ·L_adv + α·L_per, where λ and α are the weight coefficients of the adversarial loss L_adv and the perceptual loss L_per, so that different weights are assigned to L_agg, L_adv and L_per; train the aggregation network G, and save the model parameters after the pre-trained aggregation network G converges to obtain the aggregation countermeasure network video face recognition model;
S4: test the aggregation countermeasure network video face recognition model obtained in S3; after testing is finished, the model is used in practical applications of video face recognition.
Optionally, before S1 the method further includes:
acquiring a training video sequence data set, denoted V = {v_1, v_2, ..., v_i, ..., v_N}, where v_i denotes the video sequence of the i-th category, i = 1, 2, ..., N, and N is the number of categories of video sequences;
acquiring a static image data set corresponding to V, denoted S = {s_1, s_2, ..., s_i, ..., s_N}, where s_i denotes the static image corresponding to the i-th category.
Optionally, the S1 includes:
generating an image G(V_i^k) with the aggregation network: the input of the aggregation network G is k consecutive video frames belonging to the same category v_i, and its output is the generated image for category v_i, defined as G(V_i^k); k is a hyper-parameter denoting the number of input video frames of the aggregation network, and V_i^k denotes the video sequence of k consecutive frames of the i-th category;
computing the aggregation loss L_agg = || S_i - G(V_i^k) ||_2^2, where S_i denotes the static image of the same category as V_i^k; the parameters of the aggregation network G are updated with the gradient ∇L_agg, and L_agg is computed with a pixel-level L2 loss function;
after the aggregation network G converges, saving the network model parameters to obtain the pre-trained model of the aggregation network G.
Optionally, the S2 includes:
loading the pre-trained model of the aggregation network G to obtain the generated image G(V_i^k) and the corresponding static image S_i;
constructing the discrimination network D: two convolution layers with stride 1 convert the input image into feature maps, three combinations of a stride-2 convolution layer and a residual block further extract and down-sample the features, a pooling layer then down-samples these features again, and finally a fully connected layer outputs an (N+1)-dimensional vector representing the identity and the real/fake information of the corresponding image;
sending the generated image G(V_i^k) and its corresponding static image S_i into the discrimination network D to compute the adversarial loss L_adv (the formula is given as an image in the original publication), where D_i denotes the i-th dimension of the output of the discrimination network D;
constructing the recognition network R, which adopts the face recognition network LightCNN; sending the generated image G(V_i^k) and its corresponding static image S_i into the recognition network R to compute the perceptual loss L_per (the formula is given as an image in the original publication), where R(·) denotes the feature value of the penultimate pooling layer of the recognition network.
Optionally, in S3: λ is 0.01 and α is 0.003.
Optionally, the process of testing the aggregation countermeasure network video face recognition model in S4 includes:
during testing, the static images of the target set, denoted S = {s_1, s_2, ..., s_j, ..., s_M}, are sent into the recognition network R to obtain the last-layer feature values F = {f_1, f_2, ..., f_j, ..., f_M}, where M denotes the total number of identity categories and f_j denotes the feature of the target-set static image of the person with identity category j;
capturing face pictures in real time with a camera, denoting the captured face video sequence of unknown category as V, and feeding it into the aggregation network G to obtain a generated image G(V) of unknown category;
sending the generated image G(V) into R to obtain the query feature f_v, and computing the Euclidean distance between f_v and each target-set feature in F = {f_1, f_2, ..., f_j, ..., f_M}; the category with the smallest distance is the final recognition result.
The invention also provides application of the video face recognition method in the technical field of face recognition.
Optionally, the technical field of face recognition includes intelligent security, video monitoring and public security investigation.
The invention has the beneficial effects that:
the invention integrates the image generation technology into the video face recognition, and aggregates a plurality of low-quality video sequences into a single high-quality front face image through the aggregation network, thereby overcoming the defect of extracting image characteristics frame by frame in the current video face recognition technology and improving the video face recognition efficiency.
The aggregation countermeasure network constructed by the invention consists of three networks: an aggregation network, a discrimination network and a recognition network. The aggregation network and the discrimination network form adversarial learning, driving the generated image closer to the target-set static image through competition; the recognition network computes a perceptual loss in a high-dimensional feature space, so that the generated image is perceptually closer to the corresponding target-set static image, which improves the performance of the aggregation network.
The discrimination network designed by the invention adopts a softmax multi-dimensional output, so it can judge not only whether an image is real or fake but also its identity category. Through an adversarial loss that contains identity-category information, the identity category of the generated image is kept consistent with the target-set static image, so the identity of the generated image is closer to the ground truth and subsequent recognition is more accurate and more efficient.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a flowchart of a video face recognition technology based on an aggregation countermeasure network according to the present invention.
Fig. 2 is a network structure diagram of an aggregation countermeasure network used in the present invention.
Fig. 3A shows examples from the video sequence data set used by the invention.
Fig. 3B shows the ground-truth static images corresponding to Fig. 3A.
Fig. 3C shows the images finally synthesized from the video sequences by the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
The first embodiment is as follows:
the embodiment provides a video face recognition method based on an aggregation countermeasure network, and with reference to fig. 1, the method includes:
step 1, obtaining a training set, wherein the training set comprises a video sequence data set V and a corresponding static image data set S:
Step 1.1: acquire a training video sequence data set, denoted V = {v_1, v_2, ..., v_i, ..., v_N}, where v_i denotes the video sequence of the i-th category, i = 1, 2, ..., N, and N is the number of categories of video sequences;
in practice, N represents the number of different people present in V, and video sequences corresponding to the same person are referred to as a class.
Step 1.2: acquire the static image data set corresponding to V, denoted S = {s_1, s_2, ..., s_i, ..., s_N}, where s_i denotes the static image corresponding to the i-th category;
in practical applications, a high-definition camera can be used to capture S, and in some practical video monitoring scenes, the picture in S is usually a picture on an identification card or a specially-captured picture.
Examples from the video sequence data set V are shown in fig. 3A; the sequences may contain occlusion, motion blur, noise and non-frontal faces. As shown in fig. 3B, the static image data set S is captured under good conditions and consists of clear frontal face images.
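For illustration only, the following is a minimal PyTorch-style sketch of how a training pair (k consecutive frames V_i^k and the corresponding static image S_i) could be organized; the dataset class, tensor layout and random-window sampling are assumptions of this sketch and are not specified in the patent.

import random
import torch
from torch.utils.data import Dataset

class VideoFacePairs(Dataset):
    """Illustrative pairing of k consecutive video frames V_i^k with the static image S_i."""
    def __init__(self, video_frames, static_images, k=4):
        # video_frames: list of N lists of frame tensors (3 x H x W), one list per identity category
        # static_images: list of N static image tensors (3 x H x W), in the same category order
        self.video_frames = video_frames
        self.static_images = static_images
        self.k = k

    def __len__(self):
        return len(self.video_frames)

    def __getitem__(self, i):
        frames = self.video_frames[i]
        start = random.randint(0, len(frames) - self.k)        # random window of k consecutive frames
        v_ik = torch.cat(frames[start:start + self.k], dim=0)  # stack along channels: (3*k) x H x W
        return v_ik, self.static_images[i], i                  # input V_i^k, target S_i, identity label i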
Step 2: construct the aggregation network G and pre-train it with the aggregation loss L_agg:
the overall framework of the aggregation countermeasure network is shown in fig. 2, and in the present embodiment, the aggregation countermeasure network is composed of three networks: aggregation networks, discrimination networks, and identification networks.
Step 2.1: generate an image G(V_i^k) with the aggregation network;
the input of the aggregation network G is k consecutive video frames belonging to the same category v_i, and its output is the generated image for category v_i, defined as G(V_i^k); k is a hyper-parameter denoting the number of input video frames of the aggregation network, and V_i^k denotes the video sequence of k consecutive frames of the i-th category.
The aggregation network G adopts an encoding-decoding network structure, as shown in fig. 2: two convolution layers with stride 1 extract shallow features; three combinations of a stride-2 convolution and a residual block down-sample (encode) the shallow features; two combinations of a deconvolution and a residual block up-sample (decode) them to features of the same size as the original image; and two final convolution operations followed by a sigmoid function produce the final high-definition face image.
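A minimal PyTorch sketch of the encoding-decoding aggregation network described above; channel widths, the residual-block design and the (3*k)-channel stacked input are assumptions, and a symmetric three-stage decoder is assumed here so that the output matches the input resolution (the text above mentions three down-sampling and two up-sampling stages).

import torch.nn as nn

class ResBlock(nn.Module):
    """Simple residual block assumed for illustration."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, 1, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, 1, 1))

    def forward(self, x):
        return x + self.body(x)

class AggregationNet(nn.Module):
    """Aggregation network G: k stacked frames -> one frontal face image (illustrative sketch)."""
    def __init__(self, k=4, ch=64):
        super().__init__()
        self.shallow = nn.Sequential(                     # two stride-1 convolutions extract shallow features
            nn.Conv2d(3 * k, ch, 3, 1, 1), nn.ReLU(True),
            nn.Conv2d(ch, ch, 3, 1, 1), nn.ReLU(True))
        self.encoder = nn.Sequential(                     # stride-2 convolution + residual block combinations (down-sampling)
            nn.Conv2d(ch, ch, 3, 2, 1), ResBlock(ch),
            nn.Conv2d(ch, ch, 3, 2, 1), ResBlock(ch),
            nn.Conv2d(ch, ch, 3, 2, 1), ResBlock(ch))
        self.decoder = nn.Sequential(                     # deconvolution + residual block combinations (up-sampling)
            nn.ConvTranspose2d(ch, ch, 4, 2, 1), ResBlock(ch),
            nn.ConvTranspose2d(ch, ch, 4, 2, 1), ResBlock(ch),
            nn.ConvTranspose2d(ch, ch, 4, 2, 1), ResBlock(ch))
        self.head = nn.Sequential(                        # two final convolutions and a sigmoid
            nn.Conv2d(ch, ch, 3, 1, 1), nn.ReLU(True),
            nn.Conv2d(ch, 3, 3, 1, 1), nn.Sigmoid())

    def forward(self, v_ik):
        return self.head(self.decoder(self.encoder(self.shallow(v_ik))))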
Step 2.2: compute the aggregation loss L_agg = || S_i - G(V_i^k) ||_2^2, where S_i denotes the static image of the same category as V_i^k; the parameters of the aggregation network G are updated with the gradient ∇L_agg; computing L_agg with a pixel-level L2 loss function accelerates network convergence;
step 2.3, after the aggregation network G is converged, saving network model parameters for subsequent formal training;
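A minimal sketch of this pre-training stage, assuming the AggregationNet and VideoFacePairs classes sketched earlier; the pixel-level L2 loss follows the L_agg definition above, while the optimizer, batch size and checkpoint path are illustrative assumptions.

import torch
from torch.utils.data import DataLoader

def pretrain_aggregation(G, dataset, epochs=10, lr=1e-3, device="cpu"):
    """Pre-train the aggregation network G with the pixel-level L2 aggregation loss L_agg (sketch)."""
    loader = DataLoader(dataset, batch_size=8, shuffle=True)
    opt = torch.optim.SGD(G.parameters(), lr=lr, momentum=0.9)
    G.to(device).train()
    for _ in range(epochs):
        for v_ik, s_i, _ in loader:
            v_ik, s_i = v_ik.to(device), s_i.to(device)
            l_agg = torch.mean((s_i - G(v_ik)) ** 2)  # L_agg = ||S_i - G(V_i^k)||_2^2 averaged over pixels
            opt.zero_grad()
            l_agg.backward()                          # gradient of L_agg updates the parameters of G
            opt.step()
    torch.save(G.state_dict(), "aggregation_pretrained.pt")  # hypothetical checkpoint path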
Step 3: load the pre-trained model of the aggregation network G, construct the discrimination network D and the recognition network R, and jointly update the parameters of the aggregation network G with the adversarial loss L_adv and the perceptual loss L_per:
Step 3.1: load the pre-trained model of the aggregation network G to obtain the generated image G(V_i^k) and the corresponding static image S_i;
Step 3.2: construct the discrimination network D. Unlike the discrimination network in a traditional GAN (generative adversarial network), D can not only discriminate real from fake ("real" denoting a static image and "fake" denoting a synthesized image) but also predict the identity of the synthesized image.
Specifically, the output of the discrimination network D is an (N+1)-dimensional vector produced by a softmax function, where N is the number of identity categories; the identity information of the synthesized image is maximally preserved through adversarial learning, and the remaining dimension is used to judge whether the image is real or fake.
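A minimal sketch of the (N+1)-way discrimination network D described above, reusing the ResBlock from the earlier sketch; the stride-1 convolutions, the three stride-2 convolution + residual block stages, the pooling layer and the (N+1)-dimensional fully connected softmax output follow the description, while the channel widths are assumptions.

import torch
import torch.nn as nn

class DiscriminatorNet(nn.Module):
    """Discrimination network D: N identity dimensions plus one real/fake dimension (illustrative sketch)."""
    def __init__(self, num_ids, ch=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, ch, 3, 1, 1), nn.ReLU(True),   # two stride-1 convolutions
            nn.Conv2d(ch, ch, 3, 1, 1), nn.ReLU(True),
            nn.Conv2d(ch, ch, 3, 2, 1), ResBlock(ch),   # three stride-2 convolution + residual block stages
            nn.Conv2d(ch, ch, 3, 2, 1), ResBlock(ch),
            nn.Conv2d(ch, ch, 3, 2, 1), ResBlock(ch),
            nn.AdaptiveAvgPool2d(1))                    # pooling layer before the classifier
        self.fc = nn.Linear(ch, num_ids + 1)            # (N+1)-dimensional output

    def forward(self, x):
        h = self.features(x).flatten(1)
        return torch.softmax(self.fc(h), dim=1)         # softmax over the N identity dimensions and the real/fake dimension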
Step 3.3: send the generated image G(V_i^k) and its corresponding static image S_i into the discrimination network D and compute the adversarial loss L_adv (the formula is given as an image in the original publication), where D_i denotes the i-th dimension of the output of the discrimination network D. The goal of the discrimination network D is to maximize the loss L_adv, while the goal of the aggregation network is to minimize L_adv;
in other words, when the input of D is a static image S_i, D wants both D_{N+1}(S_i) and D_i(S_i) to be maximized towards 1; when the input of D is a synthesized image G(V_i^k), D wants both D_{N+1}(G(V_i^k)) and D_i(G(V_i^k)) to be minimized towards 0, whereas G wants both to be maximized towards 1. The two networks therefore form adversarial learning both in judging the identity category and in judging real versus fake;
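The exact formula of L_adv is given only as an image in the original publication; the sketch below therefore assumes a standard GAN-style logarithmic form that reproduces the min/max behaviour described above (D pushes D_i and D_{N+1} towards 1 for static images and towards 0 for synthesized images, while G pushes them towards 1). The function name and the label encoding are assumptions.

import torch

def adversarial_losses(D, s_i, g_vik, labels, eps=1e-8):
    """Assumed GAN-style adversarial terms over the identity dimension D_i and the real/fake dimension D_{N+1}."""
    idx = torch.arange(labels.shape[0])
    d_real = D(s_i)                      # discriminator output on static images S_i
    d_fake = D(g_vik.detach())           # discriminator output on synthesized images (detached for the D update)
    rf = d_real.shape[1] - 1             # index of the (N+1)-th, real/fake dimension
    # D maximizes confidence on real images and minimizes it on synthesized images
    loss_D = -(torch.log(d_real[idx, labels] + eps).mean()
               + torch.log(d_real[:, rf] + eps).mean()
               + torch.log(1 - d_fake[idx, labels] + eps).mean()
               + torch.log(1 - d_fake[:, rf] + eps).mean())
    # G tries to make D output 1 on its synthesized images (gradient flows back into G here)
    d_gen = D(g_vik)
    loss_G = -(torch.log(d_gen[idx, labels] + eps).mean()
               + torch.log(d_gen[:, rf] + eps).mean())
    return loss_D, loss_G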
step 3.4, constructing an identification network R, wherein the R network adopts the existing face identification network LightCNN, and an image G (V) is generatedi k) And a static image S corresponding theretoiSending into the recognition network R, calculating the perception loss
Figure BDA0002436395930000061
Where R (-) represents the feature value identifying the penultimate pooling layer of the network, perceptual loss allows the generation of an image G (V)i k) And a still image SiThe method has the advantages that the method is closer to a high-dimensional feature space, the perception similarity is higher, and meanwhile, the most obvious human face details in the synthetic image are kept, so that the method is more beneficial to the recognition process;
the face recognition network LightCNN refers to the article "A Light Cn for Deep faceReposition with noise Labels" of Xiang Wu, published in 2018 IEEE Transactions on information forms and Security 2884-2896.
Step 3.5: combine the aggregation loss L_agg, the adversarial loss L_adv and the perceptual loss L_per as a weighted sum to construct the final loss function L = L_agg + λ·L_adv + α·L_per, with λ = 0.01 and α = 0.003, so that different weights are assigned to the different losses; train the network with the stochastic gradient descent (SGD) algorithm, and save the model parameters after the network model converges;
for the stochastic gradient descent algorithm, see Leon Bottou, "Stochastic Gradient Descent Tricks", 2012, pages 421-436.
Step 4, video face recognition testing process:
Step 4.1: first, the static images of the target set at test time, denoted S = {s_1, s_2, ..., s_j, ..., s_M}, are sent into the recognition network R to obtain the last-layer feature values F = {f_1, f_2, ..., f_j, ..., f_M}, where M denotes the total number of identity categories and f_j denotes the feature of the target-set static image of the person with identity category j;
Step 4.2: capture face pictures in real time with a camera, denote the captured face video sequence of unknown category as V, and feed it into the aggregation network G to obtain a generated image G(V) of unknown category, as shown in fig. 3C;
Step 4.3: send the generated image G(V) into R to obtain the query feature f_v, and compute the Euclidean distance between f_v and each target-set feature in F = {f_1, f_2, ..., f_j, ..., f_M}; the category with the smallest distance is the final recognition result.
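The test procedure above amounts to nearest-neighbour matching in the recognition network's feature space. A minimal sketch follows, assuming the aggregation network G and an R_features extractor that returns a flat feature vector per image; both names are carried over from the earlier sketches.

import torch

def build_gallery(R_features, target_images):
    """Extract the last-layer feature f_j of every target-set static image s_j (sketch)."""
    with torch.no_grad():
        return torch.stack([R_features(s.unsqueeze(0)).squeeze(0) for s in target_images])

def recognize(G, R_features, gallery, v_unknown):
    """Aggregate an unknown video sequence, extract its query feature f_v, and return the closest identity."""
    with torch.no_grad():
        g_v = G(v_unknown.unsqueeze(0))          # generated image G(V) of unknown category
        f_v = R_features(g_v).squeeze(0)         # query feature f_v
        dists = torch.norm(gallery - f_v, dim=1) # Euclidean distances to f_1 ... f_M
    return int(torch.argmin(dists))              # index of the target-set identity with the smallest distance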
Step 5: to demonstrate the performance of the aggregation countermeasure network, the invention is compared on the COX Face video face data set with current state-of-the-art methods such as VGG-Face, GERML, TBE-CNN and Haar-Net. COX Face contains three subsets, V1, V2 and V3; the image quality of the V1 and V2 subsets is much poorer than that of the V3 subset and thus closer to the surveillance scenario.
The comparison results are shown in Table 1. As can be seen from Table 1, the recognition accuracy of the invention on the V1 and V2 subsets is 89.6 and 88.5 respectively, exceeding the second-best algorithm by 0.3 and 0.6; on the V3 subset, whose image quality is better, the invention performs relatively worse and is inferior to the Haar-Net algorithm. Meanwhile, the parameter count and the number of layers of the aggregation countermeasure network constructed by the method are 7.6 M and 34 respectively, which is 5.5 M fewer parameters and 22 fewer layers than Haar-Net, so the aggregation countermeasure network processes more efficiently and computes faster in the same amount of time. Therefore, in the video surveillance scenario the aggregation countermeasure network of the invention is clearly superior to the other methods in terms of both recognition accuracy and computational complexity.
Table 1: comparison results of the application and VGG-Face, GERML, TBE-CNN and Haar-Net methods
(Table 1 is provided as an image in the original publication and is not reproduced here.)
The COX Face video face data set is described in Huang Zhiwu et al., "A Benchmark and Comparative Study of Video-based Face Recognition on COX Face Database", IEEE Transactions on Image Processing, 2015, pages 5967-5981.
VGG-Face is described in Omkar M. Parkhi et al., "Deep Face Recognition", British Machine Vision Conference, 2015, page 6.
GERML is described in Huang Zhiwu et al., "Cross Euclidean-to-Riemannian Metric Learning with Application to Face Recognition from Video", IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, pages 2827-2840.
TBE-CNN is described in Changxing Ding et al., "Trunk-Branch Ensemble Convolutional Neural Networks for Video-based Face Recognition", IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, pages 1002-1014.
Haar-Net is described in Parchami Mostafa et al., "Video-based Face Recognition Using Ensemble of Haar-like Deep Convolutional Neural Networks", International Joint Conference on Neural Networks, 2017, pages 4625-4632.
Some steps in the embodiments of the present invention may be implemented by software, and the corresponding software program may be stored in a readable storage medium, such as an optical disc or a hard disk.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A video face recognition method, characterized in that, in the recognition process, an aggregation countermeasure network is adopted to aggregate multiple low-quality video sequences into a single high-quality frontal face image, and the quality of the generated frontal face image is improved during aggregation through adversarial learning, so that video face recognition is carried out accurately;
the aggregation countermeasure network consists of an aggregation network, a discrimination network and a recognition network; the aggregation network and the discrimination network form adversarial learning, driving the generated image closer to the target-set static image through competition, and the recognition network computes a perceptual loss in a high-dimensional feature space, so that the generated frontal face image is perceptually closer to the corresponding target-set static image.
2. The method of claim 1, wherein the discrimination network outputs an (N+1)-dimensional vector in the form of a softmax multi-dimensional output, where N is the number of identity categories; the remaining dimension indicates whether the corresponding image is real or fake, "real" meaning that the corresponding image is a static image and "fake" meaning that it is a synthesized image.
3. The method of claim 2, wherein the method comprises:
S1: construct the aggregation network G and pre-train it with the aggregation loss L_agg to obtain a pre-trained model of the aggregation network G;
S2: load the pre-trained model of the aggregation network G, construct the discrimination network D and the recognition network R, and compute the adversarial loss L_adv and the perceptual loss L_per;
S3: combine the aggregation loss L_agg, the adversarial loss L_adv and the perceptual loss L_per as a weighted sum to construct the final loss function L = L_agg + λ·L_adv + α·L_per, where λ and α are the weight coefficients of the adversarial loss L_adv and the perceptual loss L_per, so that different weights are assigned to L_agg, L_adv and L_per; train the aggregation network G, and save the model parameters after the pre-trained aggregation network G converges to obtain the aggregation countermeasure network video face recognition model;
S4: test the aggregation countermeasure network video face recognition model obtained in S3; after testing is finished, the model is used in practical applications of video face recognition.
4. The method according to claim 3, wherein the S1 is preceded by:
acquiring a training video sequence data set, denoted V = {v_1, v_2, ..., v_i, ..., v_N}, where v_i denotes the video sequence of the i-th category, i = 1, 2, ..., N, and N is the number of categories of video sequences;
acquiring a static image data set corresponding to V, denoted S = {s_1, s_2, ..., s_i, ..., s_N}, where s_i denotes the static image corresponding to the i-th category.
5. The method according to claim 4, wherein the S1 includes:
generating an image G(V_i^k) with the aggregation network: the input of the aggregation network G is k consecutive video frames belonging to the same category v_i, and its output is the generated image for category v_i, defined as G(V_i^k); k is a hyper-parameter denoting the number of input video frames of the aggregation network, and V_i^k denotes the video sequence of k consecutive frames of the i-th category;
computing the aggregation loss L_agg = || S_i - G(V_i^k) ||_2^2, where S_i denotes the static image of the same category as V_i^k; updating the parameters of the aggregation network G with the gradient ∇L_agg; L_agg is computed with a pixel-level L2 loss function;
after the aggregation network G converges, saving the network model parameters to obtain the pre-trained model of the aggregation network G.
6. The method according to claim 5, wherein the S2 includes:
loading the pre-trained model of the aggregation network G to obtain the generated image G(V_i^k) and the corresponding static image S_i;
constructing the discrimination network D: two convolution layers with stride 1 convert the input image into feature maps, three combinations of a stride-2 convolution layer and a residual block further extract and down-sample the features, a pooling layer then down-samples these features again, and finally a fully connected layer outputs an (N+1)-dimensional vector representing the identity and the real/fake information of the corresponding image;
sending the generated image G(V_i^k) and its corresponding static image S_i into the discrimination network D to compute the adversarial loss L_adv (the formula is given as an image in the original publication), where D_i denotes the i-th dimension of the output of the discrimination network D;
constructing the recognition network R, which adopts the face recognition network LightCNN; sending the generated image G(V_i^k) and its corresponding static image S_i into the recognition network R to compute the perceptual loss L_per (the formula is given as an image in the original publication), where R(·) denotes the feature value of the penultimate pooling layer of the recognition network.
7. The method according to claim 6, wherein in the S3: λ is 0.01 and α is 0.003.
8. The method according to claim 6, wherein the step of testing the aggregation countermeasure network video face recognition model in S4 includes:
during testing, the static images of the target set, denoted S = {s_1, s_2, ..., s_j, ..., s_M}, are sent into the recognition network R to obtain the last-layer feature values F = {f_1, f_2, ..., f_j, ..., f_M}, where M denotes the total number of identity categories and f_j denotes the feature of the target-set static image of the person with identity category j;
capturing face pictures in real time with a camera, denoting the captured face video sequence of unknown category as V, and feeding it into the aggregation network G to obtain a generated image G(V) of unknown category;
sending the generated image G(V) into R to obtain the query feature f_v, and computing the Euclidean distance between f_v and each target-set feature in F = {f_1, f_2, ..., f_j, ..., f_M}; the category with the smallest distance is the final recognition result.
9. The application of the video face recognition method of any one of claims 1-8 in the technical field of face recognition.
10. The application method of claim 9, wherein the technical field of face recognition includes intelligent security, video surveillance and public security investigation.
CN202010253595.7A 2020-04-02 2020-04-02 Video face recognition method based on aggregation countermeasure network Active CN111539263B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010253595.7A CN111539263B (en) 2020-04-02 2020-04-02 Video face recognition method based on aggregation countermeasure network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010253595.7A CN111539263B (en) 2020-04-02 2020-04-02 Video face recognition method based on aggregation countermeasure network

Publications (2)

Publication Number Publication Date
CN111539263A (en) 2020-08-14
CN111539263B (en) 2023-08-11

Family

ID=71974857

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010253595.7A Active CN111539263B (en) 2020-04-02 2020-04-02 Video face recognition method based on aggregation countermeasure network

Country Status (1)

Country Link
CN (1) CN111539263B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180075581A1 (en) * 2016-09-15 2018-03-15 Twitter, Inc. Super resolution using a generative adversarial network
CN107977932A (en) * 2017-12-28 2018-05-01 北京工业大学 It is a kind of based on can differentiate attribute constraint generation confrontation network face image super-resolution reconstruction method
CN108537743A (en) * 2018-03-13 2018-09-14 杭州电子科技大学 A kind of face-image Enhancement Method based on generation confrontation network
CN109902546A (en) * 2018-05-28 2019-06-18 华为技术有限公司 Face identification method, device and computer-readable medium
CN108985168A (en) * 2018-06-15 2018-12-11 江南大学 A kind of video face identification method based on the study of minimum normalized cumulant
CN110222628A (en) * 2019-06-03 2019-09-10 电子科技大学 A kind of face restorative procedure based on production confrontation network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
姜玉宁 (Jiang Yuning) et al.: "生成式对抗网络模型研究" (Research on Generative Adversarial Network Models) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114266946A (en) * 2021-12-31 2022-04-01 智慧眼科技股份有限公司 Feature identification method and device under shielding condition, computer equipment and medium

Also Published As

Publication number Publication date
CN111539263B (en) 2023-08-11

Similar Documents

Publication Publication Date Title
CN109615582B (en) Face image super-resolution reconstruction method for generating countermeasure network based on attribute description
Sabir et al. Recurrent convolutional strategies for face manipulation detection in videos
CN108537743B (en) Face image enhancement method based on generation countermeasure network
Liu et al. Robust video super-resolution with learned temporal dynamics
Singh et al. Muhavi: A multicamera human action video dataset for the evaluation of action recognition methods
Wang et al. Enhancing unsupervised video representation learning by decoupling the scene and the motion
Li et al. Beyond single reference for training: Underwater image enhancement via comparative learning
CN110717411A (en) Pedestrian re-identification method based on deep layer feature fusion
CN112784929B (en) Small sample image classification method and device based on double-element group expansion
CN110889375B (en) Hidden-double-flow cooperative learning network and method for behavior recognition
CN110852152B (en) Deep hash pedestrian re-identification method based on data enhancement
JPH1055444A (en) Recognition of face using feature vector with dct as base
CN110827265B (en) Image anomaly detection method based on deep learning
CN112801068B (en) Video multi-target tracking and segmenting system and method
CN113112519A (en) Key frame screening method based on interested target distribution
CN113112416A (en) Semantic-guided face image restoration method
Zhou et al. Transformer-based multi-scale feature integration network for video saliency prediction
Parde et al. Deep convolutional neural network features and the original image
CN115862103A (en) Method and system for identifying face of thumbnail
CN110188625B (en) Video fine structuring method based on multi-feature fusion
Zhou et al. Msflow: Multiscale flow-based framework for unsupervised anomaly detection
Luan et al. Learning unsupervised face normalization through frontal view reconstruction
CN111539263A (en) Video face recognition method based on aggregation countermeasure network
Revi et al. Gan-generated fake face image detection using opponent color local binary pattern and deep learning technique
CN111967331A (en) Face representation attack detection method and system based on fusion feature and dictionary learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant