CN111539263B - Video face recognition method based on aggregation countermeasure network - Google Patents

Video face recognition method based on aggregation countermeasure network

Info

Publication number
CN111539263B
CN111539263B (application CN202010253595.7A; publication CN111539263A)
Authority
CN
China
Prior art keywords
network
aggregation
image
video
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010253595.7A
Other languages
Chinese (zh)
Other versions
CN111539263A (en)
Inventor
陈莹
金炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangnan University
Original Assignee
Jiangnan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangnan University filed Critical Jiangnan University
Priority to CN202010253595.7A priority Critical patent/CN111539263B/en
Publication of CN111539263A publication Critical patent/CN111539263A/en
Application granted granted Critical
Publication of CN111539263B publication Critical patent/CN111539263B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a video face recognition method based on an aggregation countermeasure network, belonging to the technical field of video face recognition. The method uses an aggregation countermeasure network built from an aggregation network, a discrimination network and a recognition network. The aggregation network and the discrimination network form an adversarial pair, and the competition between them makes the generated image more similar to the corresponding static image of the target set; the recognition network computes a perceptual loss in a high-dimensional feature space, so that the generated image and the corresponding target-set static image also become closer perceptually, which further improves the aggregation network. The discrimination network uses a multidimensional softmax output: besides judging whether an image is real or synthesized, it also discriminates the identity category of the image, so the identity of the generated image stays close to the ground truth and subsequent recognition is both more accurate and more efficient.

Description

Video face recognition method based on aggregation countermeasure network
Technical Field
The application relates to a video face recognition method based on an aggregation countermeasure network, and belongs to the technical field of video face recognition.
Background
Video face recognition, as the name implies, performs face recognition on video. With the continual growth of both technology and demand, video face recognition has been applied in many fields, such as intelligent security, video surveillance and public security investigation.
Video face recognition differs from face recognition based on a single image: its query set is a video sequence while its target set usually consists of high-definition face images, and the identity of the person in the video is recognized by extracting face features from the video sequence and matching them against the target set.
However, in video surveillance, the most common scenario for video face recognition, the faces in the video sequence often suffer from motion blur, noise, occlusion and similar degradations, so they differ greatly from the target-set faces. Neither conventional methods nor current deep-learning-based methods handle these differences well, and the recognition results are therefore poor.
In addition, existing video face recognition methods extract features from the video sequence frame by frame, which not only makes testing slow but also leaves the recognition result easily disturbed by low-quality frames in the sequence.
Disclosure of Invention
In order to solve the low efficiency and low accuracy of existing video face recognition technology, the application provides a video face recognition method in which an aggregation countermeasure network aggregates multiple low-quality video frames into a single high-quality face image during recognition, and the quality of the generated face image is improved through adversarial learning during aggregation, so that video face recognition can be performed accurately.
The aggregation countermeasure network consists of an aggregation network, a discrimination network and a recognition network. The aggregation network and the discrimination network form an adversarial pair, and their competition makes the generated image more similar to the target-set static image; the recognition network computes a perceptual loss in a high-dimensional feature space, so that the generated frontal face image and the corresponding target-set static image also become more similar perceptually.
Optionally, the discrimination network uses a multidimensional softmax output and produces an (N+1)-dimensional vector, where N is the number of identity categories and the remaining dimension indicates whether the corresponding image is real or fake: "real" means the corresponding image is a static image, and "fake" means it is a synthesized image.
Optionally, the method includes:
S1, constructing an aggregation network G and pre-training it with the aggregation loss L_agg to obtain a pre-trained model of the aggregation network G;
S2, loading the pre-trained model of the aggregation network G, constructing a discrimination network D and a recognition network R, and computing the adversarial loss L_adv and the perceptual loss L_per;
S3, combining the aggregation loss L_agg, the adversarial loss L_adv and the perceptual loss L_per as a weighted sum to construct the final loss function L = L_agg + λL_adv + αL_per, where λ and α are the weight coefficients of the adversarial loss L_adv and the perceptual loss L_per respectively, so that the aggregation loss L_agg, the adversarial loss L_adv and the perceptual loss L_per receive different weights; training the aggregation network G with this loss and saving the model parameters after the pre-trained aggregation network G converges, which yields the aggregation countermeasure network video face recognition model;
S4, testing the aggregation countermeasure network video face recognition model obtained in S3 and, once testing is complete, using it for practical video face recognition.
Optionally, the step S1 further includes:
acquiring a training video sequence data set, denoted V = {v_1, v_2, ..., v_i, ..., v_N}, where v_i denotes the video sequence of the i-th category, i = 1, 2, ..., N, and N is the number of video sequence categories;
acquiring the static image data set corresponding to V, denoted S = {s_1, s_2, ..., s_i, ..., s_N}, where s_i denotes the static image of the i-th category.
Optionally, S1 includes:
generating an image G(V_i^k) with the aggregation network: the input to the aggregation network G is a set of video frames belonging to the same category v_i, and the output is a single high-quality face image of the corresponding category v_i; the generated image is defined as G(V_i^k), where k is a hyperparameter denoting the number of video frames fed to the aggregation network and V_i^k denotes a sequence of k consecutive frames of the i-th category;
computing the aggregation loss L_agg = ||G(V_i^k) - S_i||_2^2, where S_i denotes the static image of the same class as V_i^k, and updating the parameters of the aggregation network G by the gradient of L_agg; L_agg is computed as a pixel-level L2 loss;
after the aggregation network G converges, the network model parameters are saved, and the aggregation network G pre-training model is obtained.
Optionally, S2 includes:
loading the pre-trained model of the aggregation network G to obtain the generated image G(V_i^k) and the corresponding static image S_i;
constructing a discrimination network D: the original image is first converted into a feature map by two convolution layers with stride 1, the features then pass through three combinations of a stride-2 convolution layer and a residual block, the resulting features are downsampled by a pooling layer, and finally a fully connected layer outputs an (N+1)-dimensional vector representing the identity and the real/fake status of the corresponding image;
feeding the generated image G(V_i^k) and the corresponding static image S_i into the discrimination network D to compute the adversarial loss L_adv, where D_i denotes the i-th dimension of the output of the discrimination network D;
constructing a recognition network R, which uses the face recognition network LightCNN: the generated image G(V_i^k) and the corresponding static image S_i are fed into the recognition network R to compute the perceptual loss L_per, where R(·) denotes the feature values of the penultimate pooling layer of the recognition network.
Optionally, in S3: λ=0.01, α=0.003.
Optionally, the process of testing the aggregation countermeasure network video face recognition model in S4 includes:
the static images of the target set at test time are denoted S = {s_1, s_2, ..., s_j, ..., s_M}; they are fed into the recognition network R one by one to obtain the last-layer feature values F = {f_1, f_2, ..., f_j, ..., f_M}, where M denotes the total number of identity categories and f_j denotes the feature of the target-set static image of the person with identity category j;
capturing face images in real time with a camera, denoting the captured face video sequence of unknown category as V, and using it as the input of the aggregation network G to obtain a generated image G(V) of the unknown category;
feeding the generated image G(V) into R to obtain the query feature f_v, computing the Euclidean distance between the generated-image feature f_v and each target-set feature in F = {f_1, f_2, ..., f_j, ..., f_M}, and taking the category with the smallest distance as the final recognition result.
The application also provides the use of the above video face recognition method in the technical field of face recognition.
Optionally, the technical field of face recognition includes intelligent security, video surveillance and public security investigation.
The application has the beneficial effects that:
according to the application, the image generation technology is integrated into video face recognition, a plurality of low-quality video sequences are aggregated into a single high-quality front face image through the aggregation network, the defect of extracting image features frame by frame in the existing video face recognition technology is overcome, and the video face recognition efficiency is improved.
The aggregation countermeasure network constructed by the application consists of an aggregation network, a discrimination network and an identification network, wherein the aggregation network and the discrimination network form countermeasure learning, so that the generated image and the target set static image are more similar in a competition mode; the perception loss is calculated in the high-dimensional feature space through the identification network, so that the generated image and the corresponding target set static image are closer in perception performance, and the performance of the aggregation network is improved.
The discrimination network designed by the application adopts a mode of softmax multidimensional output, can judge the true and false of the image, can also distinguish the identity type of the image, ensures that the identity type of the generated image is consistent with the static image of the target set through the countermeasures containing the identity type information, ensures that the identity of the generated image is closer to the true value, and ensures that the subsequent recognition is more accurate and has higher recognition efficiency.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a video face recognition technology based on an aggregation countermeasure network provided by the application.
Fig. 2 is a network configuration diagram of an aggregation countermeasure network used in the present application.
Fig. 3A shows a partial subset of the video sequence data set used in the application.
Fig. 3B shows the ground-truth static images corresponding to the video sequences of Fig. 3A.
Fig. 3C shows the images finally synthesized by the application from video sequences.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
Embodiment one:
the embodiment provides a video face recognition method based on an aggregation countermeasure network, referring to fig. 1, the method includes:
step 1, acquiring a training set comprising a video sequence data set V and a corresponding static image data set S:
Step 1.1, acquiring a training video sequence data set, denoted V = {v_1, v_2, ..., v_i, ..., v_N}, where v_i denotes the video sequence of the i-th category, i = 1, 2, ..., N, and N is the number of video sequence categories;
in practice, N represents the number of different people present in V, and video sequences corresponding to the same person are referred to as one class.
Step 1.2, acquiring the static image data set corresponding to V, denoted S = {s_1, s_2, ..., s_i, ..., s_N}, where s_i denotes the static image of the i-th category;
In practical applications, S may be captured with a high-definition camera; in some real video surveillance scenarios, the pictures in S are typically ID-card photos or specially captured photos.
The video sequence data set V is shown in Fig. 3A and contains cases of occlusion, motion blur, noise and side faces; the static image data set S, shown in Fig. 3B, is photographed under good conditions and consists of clear frontal face images.
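For concreteness, the following is a minimal PyTorch sketch of how such paired training data could be organized; the directory layout (video/<class_id>/<frame>.jpg and static/<class_id>.jpg), the image size and the default value of k are illustrative assumptions rather than part of the application.

```python
# Hypothetical layout (assumption): video/<class_id>/<frame>.jpg and static/<class_id>.jpg
import os
import random

import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms


class VideoFacePairs(Dataset):
    """Returns (V_i^k, S_i, i): k consecutive frames of class i plus its static image."""

    def __init__(self, root, k=4, size=128):
        self.root, self.k = root, k
        self.classes = sorted(os.listdir(os.path.join(root, "video")))
        self.to_tensor = transforms.Compose(
            [transforms.Resize((size, size)), transforms.ToTensor()])

    def __len__(self):
        return len(self.classes)

    def __getitem__(self, i):
        cls = self.classes[i]
        frame_dir = os.path.join(self.root, "video", cls)
        frames = sorted(os.listdir(frame_dir))
        start = random.randint(0, len(frames) - self.k)      # pick k consecutive frames
        clip = [self.to_tensor(Image.open(os.path.join(frame_dir, f)).convert("RGB"))
                for f in frames[start:start + self.k]]
        v = torch.cat(clip, dim=0)                           # (3k, H, W), frames stacked on channels
        s = self.to_tensor(Image.open(
            os.path.join(self.root, "static", cls + ".jpg")).convert("RGB"))
        return v, s, i                                       # video frames, static image, class index
```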
Step 2, constructing an aggregation network G and pre-training it with the aggregation loss L_agg:
The overall framework of the aggregation countermeasure network is shown in Fig. 2; in this embodiment it is composed of three networks: the aggregation network, the discrimination network and the recognition network.
Step 2.1, generating an image G(V_i^k) with the aggregation network;
The input to the aggregation network G is a set of video frames belonging to the same category v_i, and the output is a single high-quality face image of the corresponding category v_i; the generated image is defined as G(V_i^k), where k is a hyperparameter denoting the number of video frames fed to the aggregation network and V_i^k denotes a sequence of k consecutive frames of the i-th category.
The aggregation network G uses an encoder-decoder structure. As shown in Fig. 2, it first extracts shallow features with two convolution layers of stride 1, then downsamples (encodes) the shallow features with three combinations of a stride-2 convolution and a residual block, then upsamples (decodes) with combinations of deconvolution and residual blocks to obtain features of the same size as the original image, and finally produces the final high-definition face image through two convolution operations and a sigmoid function.
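The following is a minimal PyTorch sketch of this encoder-decoder structure; the channel widths, the single residual block per stage and the choice of feeding the k frames stacked along the channel dimension are assumptions made for illustration and are not fixed by the application.

```python
# Minimal sketch of the encoder-decoder aggregation network G described above.
import torch
import torch.nn as nn


class ResBlock(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c, c, 3, 1, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, 1, 1))

    def forward(self, x):
        return x + self.body(x)


class AggregationNet(nn.Module):
    def __init__(self, k=4, base=64):
        super().__init__()
        self.shallow = nn.Sequential(              # two stride-1 convolutions
            nn.Conv2d(3 * k, base, 3, 1, 1), nn.ReLU(inplace=True),
            nn.Conv2d(base, base, 3, 1, 1), nn.ReLU(inplace=True))
        enc, c = [], base
        for _ in range(3):                         # three (stride-2 conv + residual block) stages
            enc += [nn.Conv2d(c, c * 2, 3, 2, 1), nn.ReLU(inplace=True), ResBlock(c * 2)]
            c *= 2
        self.encoder = nn.Sequential(*enc)
        dec = []
        for _ in range(3):                         # three (deconvolution + residual block) stages
            dec += [nn.ConvTranspose2d(c, c // 2, 4, 2, 1), nn.ReLU(inplace=True), ResBlock(c // 2)]
            c //= 2
        self.decoder = nn.Sequential(*dec)
        self.head = nn.Sequential(                 # two convolutions + sigmoid -> face image
            nn.Conv2d(c, base, 3, 1, 1), nn.ReLU(inplace=True),
            nn.Conv2d(base, 3, 3, 1, 1), nn.Sigmoid())

    def forward(self, v):                          # v: (B, 3k, H, W) stacked frames V_i^k
        return self.head(self.decoder(self.encoder(self.shallow(v))))
```

Stacking the frames on the channel axis is only one plausible way to present the k input frames to the network.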
Step 2.2, computing the aggregation loss L_agg = ||G(V_i^k) - S_i||_2^2, where S_i denotes the static image of the same class as V_i^k, and updating the parameters of the aggregation network G by the gradient of L_agg; the pixel-level L2 loss function speeds up the convergence of the network;
step 2.3, after convergence of the aggregation network G, saving network model parameters for subsequent formal training;
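A minimal sketch of this pre-training stage is given below, reusing the VideoFacePairs and AggregationNet sketches above; the optimizer, batch size, learning rate and epoch count are assumptions, since the description only specifies the pixel-level L2 loss L_agg.

```python
# Pre-training of G with the pixel-level L2 aggregation loss L_agg (step 2).
import torch
from torch.utils.data import DataLoader


def pretrain_aggregation(G, dataset, epochs=20, lr=1e-4, device="cuda"):
    loader = DataLoader(dataset, batch_size=16, shuffle=True, num_workers=4)
    opt = torch.optim.Adam(G.parameters(), lr=lr)       # optimizer choice is an assumption
    G.to(device).train()
    for _ in range(epochs):
        for v, s, _ in loader:                           # V_i^k frames and static image S_i
            v, s = v.to(device), s.to(device)
            loss_agg = torch.mean((G(v) - s) ** 2)       # pixel-level L2 loss L_agg
            opt.zero_grad()
            loss_agg.backward()
            opt.step()
    torch.save(G.state_dict(), "g_pretrained.pth")       # saved for the adversarial stage
```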
Step 3, loading the pre-trained model of the aggregation network G, constructing a discrimination network D and a recognition network R, and adding the adversarial loss L_adv and the perceptual loss L_per to jointly update the parameters of the aggregation network G:
Step 3.1, loading the pre-trained model of the aggregation network G to obtain the generated image G(V_i^k) and the corresponding static image S_i;
Step 3.2, constructing a discrimination network D. Unlike the discriminator in a conventional GAN (Generative Adversarial Network), the discrimination network D of the application not only discriminates real from fake (real denotes a static image, fake denotes a synthesized image) but also predicts the identity of the synthesized image.
Specifically, the output of the discrimination network D is an (N+1)-dimensional vector produced by a softmax function, where N is the number of identity classes and the remaining dimension judges real versus fake; through adversarial learning, the synthesized image is pushed to retain as much identity information as possible.
Step 3.3, an image G (V i k ) Corresponding static image S i Into the discrimination network D to calculate the countermeasures against lossWherein D is i Ith dimension representing a discrimination network DAnd outputting. For discriminating network D, its goal is to maximize the countering loss L adv Whereas for an aggregation network it is to combat the loss L adv Minimizing;
in other words, when the input of D is a still image S i When D is desired D N+1 (S i ) And D i (S i ) All maximized to 1; when the input of D is a composite image G (V i k ) When D is desiredAnd D i (G(V i k ) All minimized to 0, while G expects D N+1 (G(V i k ) D) and D i (G(V i k ) 1) so that both form an countermeasure study in judging identity class and judging true or false;
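The exact expression of L_adv is not reproduced in this text. The sketch below therefore implements one assumed GAN-style log-likelihood form that matches the behaviour just described (the discrimination network pushes D_{N+1} and D_i toward 1 for static images and toward 0 for synthesized images, while the aggregation network pulls them back toward 1), together with a discriminator that follows the structure of step 3.2; the channel widths are arbitrary and the ResBlock module from the aggregation network sketch above is reused.

```python
import torch
import torch.nn as nn


class Discriminator(nn.Module):
    """(N+1)-dim softmax output: N identity classes plus one real/fake dimension.
    Reuses the ResBlock defined in the aggregation network sketch above."""

    def __init__(self, n_classes, base=64):
        super().__init__()
        layers = [nn.Conv2d(3, base, 3, 1, 1), nn.LeakyReLU(0.2, inplace=True),
                  nn.Conv2d(base, base, 3, 1, 1), nn.LeakyReLU(0.2, inplace=True)]
        c = base
        for _ in range(3):                                   # three stride-2 conv + residual stages
            layers += [nn.Conv2d(c, c * 2, 3, 2, 1),
                       nn.LeakyReLU(0.2, inplace=True), ResBlock(c * 2)]
            c *= 2
        self.features = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)                  # pooling layer
        self.fc = nn.Linear(c, n_classes + 1)                # fully connected layer -> N+1 dims

    def forward(self, x):
        h = self.pool(self.features(x)).flatten(1)
        return torch.softmax(self.fc(h), dim=1)              # last dimension D_{N+1} = "real"


def adversarial_loss(D, fake, real, labels, eps=1e-8):
    """Assumed form of L_adv over the identity dimension D_i and the real/fake dimension
    D_{N+1}; the discrimination network maximizes it, the aggregation network minimizes it."""
    d_real, d_fake = D(real), D(fake)
    idx = labels.view(-1, 1)
    l_adv = (torch.log(d_real[:, -1:] + eps) + torch.log(d_real.gather(1, idx) + eps)
             + torch.log(1 - d_fake[:, -1:] + eps) + torch.log(1 - d_fake.gather(1, idx) + eps))
    return l_adv.mean()
```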
step 3.4, constructing a recognition network R, wherein the R network adopts the existing face recognition network LightCNN to generate an image G (V i k ) And a corresponding still image S i Into the recognition network R, calculating the perceived lossWhere R (-) represents a characteristic value identifying the penultimate pooling layer of the network, the perceived loss lets the image G (V i k ) And still image S i The method has the advantages that the method is closer in high-dimensional feature space, the perceived similarity is higher, and meanwhile, the most obvious face details in the synthesized image are reserved, so that the recognition process is facilitated;
the face recognition network LightCNN may refer to "A Light Cnn for Deep Face Representation with Noisy Labels" by Xiang Wu, which is published on pages 2884-2896 of IEEE Transactions on Information Forensics and Security in 2018.
Step 3.5, combining the aggregation loss L_agg, the adversarial loss L_adv and the perceptual loss L_per as a weighted sum to construct the final loss function L = L_agg + λL_adv + αL_per, with λ = 0.01 and α = 0.003, so that the different losses receive different weight coefficients; the network is trained with stochastic gradient descent (SGD), and the model parameters are saved after the network model converges;
The stochastic gradient descent algorithm is described in the document "Stochastic Gradient Descent Tricks" by Leon Bottou, published in 2012 on pages 421-436 of "Neural Networks: Tricks of the Trade".
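Putting the pieces together, one joint training iteration of step 3.5 might look as follows; the alternating update of D and G, the SGD hyperparameters and the use of a CUDA device are assumptions, while the weighted sum L = L_agg + λL_adv + αL_per with λ = 0.01 and α = 0.003 comes from the description.

```python
import torch


def train_step(G, D, perceptual, v, s, labels, opt_g, opt_d, lam=0.01, alpha=0.003):
    """One joint update: D ascends L_adv, then G descends L = L_agg + lam*L_adv + alpha*L_per."""
    v, s, labels = v.cuda(), s.cuda(), labels.cuda()

    # Update the discrimination network D (maximize L_adv, i.e. minimize -L_adv).
    fake = G(v).detach()
    opt_d.zero_grad()
    (-adversarial_loss(D, fake, s, labels)).backward()
    opt_d.step()

    # Update the aggregation network G with the weighted sum of the three losses.
    fake = G(v)
    l_agg = torch.mean((fake - s) ** 2)                      # pixel-level L2 loss L_agg
    l_adv = adversarial_loss(D, fake, s, labels)             # adversarial loss L_adv
    l_per = perceptual(fake, s)                              # perceptual loss L_per
    loss = l_agg + lam * l_adv + alpha * l_per               # L = L_agg + λL_adv + αL_per
    opt_g.zero_grad()
    loss.backward()
    opt_g.step()
    return loss.item()


# Illustrative optimizers (SGD as in step 3.5; learning rates are assumptions):
# opt_g = torch.optim.SGD(G.parameters(), lr=1e-3, momentum=0.9)
# opt_d = torch.optim.SGD(D.parameters(), lr=1e-3, momentum=0.9)
```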
Step 4, a video face recognition testing process:
Step 4.1, the static images of the target set at test time are denoted S = {s_1, s_2, ..., s_j, ..., s_M}; they are fed into the recognition network R one by one to obtain the last-layer feature values F = {f_1, f_2, ..., f_j, ..., f_M}, where M denotes the total number of identity categories and f_j denotes the feature of the target-set static image of the person with identity category j;
Step 4.2, capturing face images in real time with a camera, denoting the captured face video sequence of unknown category as V, and feeding it to the aggregation network G to obtain a generated image G(V) of the unknown category, as shown in Fig. 3C;
Step 4.3, feeding the generated image G(V) into R to obtain the query feature f_v, computing the Euclidean distances between the generated-image feature f_v and the target-set features F = {f_1, f_2, ..., f_j, ..., f_M}, and taking the category with the smallest distance as the final recognition result.
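A minimal sketch of this test procedure, assuming the gallery features F have already been extracted with the recognition network R and stacked into a single tensor:

```python
import torch


@torch.no_grad()
def identify(G, R, gallery_feats, video_frames):
    """Aggregate the query frames, extract the query feature f_v and return the index
    of the gallery identity with the smallest Euclidean distance.
    gallery_feats: (M, d) tensor of last-layer features of the target-set static images.
    video_frames:  (3k, H, W) tensor of k stacked query frames of unknown identity."""
    G.eval()
    R.eval()
    generated = G(video_frames.unsqueeze(0))        # G(V): single high-quality face image
    f_v = R(generated)                              # query feature f_v, shape (1, d)
    dists = torch.cdist(f_v, gallery_feats)         # Euclidean distances to every f_j
    return int(dists.argmin(dim=1))                 # class with the smallest distance
```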
To demonstrate the benefit that aggregation brings to network performance, the application is compared with the current state-of-the-art methods VGG-Face, GERML, TBE-CNN and Haar-Net on the COX Face video face data set. COX Face contains three subsets, V1, V2 and V3; the image quality of V1 and V2 is much worse than that of V3, which makes them closer to a surveillance scenario.
As shown in Table 1, the recognition accuracy of the application reaches 89.6 and 88.5 on the V1 and V2 subsets respectively, exceeding the second-best algorithm by 0.3 and 0.6; on the V3 subset, whose image quality is better, the result is comparatively weaker but still second only to the Haar-Net algorithm. Meanwhile, the aggregation countermeasure network constructed by the application has 5.5M parameters and 22 network layers, compared with 7.6M parameters and 34 layers for Haar-Net, so the aggregation countermeasure network processes data more efficiently and computes more in the same amount of time. The aggregation countermeasure network of the application is therefore clearly superior to the other methods in both recognition accuracy and computational complexity in the surveillance video scenario.
Table 1: comparison results of the present application with VGG-Face, GERML, TBE-CNN, haar-Net method
The COX Face video Face dataset may be referred to Huang Zhiwu, "A Benchmark and Comparative Study of Video-based Face Recognition on Cox Face Database," which is published 2015 on pages 5967-5981 of IEEE Transactions on Image Processing.
VGG-Face can be referred to "Deep Face Recognition" by Omkar M.Parkhi, which is published on page 6 of British Machine Vision Conference at 2015.
GERML is described in Huang Zhiwu, "Cross euclidean-to-riemannian metric learning with application to face recognition from video", which is published in 2018 at IEEE Transactions on Pattern Analysis and Machine Intelligence on pages 2827-2840.
TBE-CNN can be referred to as "Trunk-branch Ensemble Convolutional Neural Networks for Video-based Face Recognition" by Changxing Ding, which is published in 2018 on pages 1002-1014 of IEEE Transactions on Pattern Analysis and Machine Intelligence.
Haar-Net can be referred to Parchami Mostafa's "Video-based Face Recognition Using Ensemble of Haar-like Deep Convolutional Neural Networks", which is published in 2017 at International Joint Conference on Neural Networks, pages 4625-4632.
Some steps in the embodiments of the present application may be implemented by using software, and the corresponding software program may be stored in a readable storage medium, such as an optical disc or a hard disk.
The foregoing description of the preferred embodiments of the application is not intended to limit the application to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the application are intended to be included within the scope of the application.

Claims (6)

1. A video face recognition method, characterized in that, during recognition, an aggregation countermeasure network aggregates a plurality of low-quality video frames into a single high-quality frontal face image, and the quality of the generated frontal face image is improved through adversarial learning during aggregation, so that video face recognition is performed accurately;
the aggregation countermeasure network consists of an aggregation network, a discrimination network and a recognition network, wherein the aggregation network and the discrimination network form an adversarial pair whose competition makes the generated image more similar to the target-set static image, and the recognition network computes a perceptual loss in a high-dimensional feature space so that the generated frontal face image and the corresponding target-set static image become more similar perceptually;
the method comprises the following steps:
S1, constructing an aggregation network G and pre-training it with the aggregation loss L_agg to obtain a pre-trained model of the aggregation network G;
S2, loading the pre-trained model of the aggregation network G, constructing a discrimination network D and a recognition network R, and computing the adversarial loss L_adv and the perceptual loss L_per;
S3, combining the aggregation loss L_agg, the adversarial loss L_adv and the perceptual loss L_per as a weighted sum to construct the final loss function L = L_agg + λL_adv + αL_per, where λ and α are the weight coefficients of the adversarial loss L_adv and the perceptual loss L_per respectively, so that the aggregation loss L_agg, the adversarial loss L_adv and the perceptual loss L_per receive different weights; training the aggregation network G with this loss and saving the model parameters after the pre-trained aggregation network G converges, which yields the aggregation countermeasure network video face recognition model;
S4, testing the aggregation countermeasure network video face recognition model obtained in S3 and, once testing is complete, using it for practical video face recognition;
the step S1 further includes:
acquiring a training video sequence data set, denoted V = {v_1, v_2, ..., v_i, ..., v_N}, where v_i denotes the video sequence of the i-th category, i = 1, 2, ..., N, and N is the number of video sequence categories;
acquiring the static image data set corresponding to V, denoted S = {s_1, s_2, ..., s_i, ..., s_N}, where s_i denotes the static image of the i-th category;
the S1 comprises the following steps:
generating an image G(V_i^k) with the aggregation network: the input to the aggregation network G is a set of video frames belonging to the same category v_i, and the output is a single high-quality face image of the corresponding category v_i; the generated image is defined as G(V_i^k), where k is a hyperparameter denoting the number of video frames fed to the aggregation network and V_i^k denotes a sequence of k consecutive frames of the i-th category;
computing the aggregation loss L_agg = ||G(V_i^k) - S_i||_2^2, where S_i denotes the static image of the same class as V_i^k, and updating the parameters of the aggregation network G by the gradient of L_agg; L_agg is computed as a pixel-level L2 loss;
after convergence of the aggregation network G, saving network model parameters to obtain an aggregation network G pre-training model;
the step S2 comprises the following steps:
loading the pre-trained model of the aggregation network G to obtain the generated image G(V_i^k) and the corresponding static image S_i;
constructing a discrimination network D: the original image is converted into a feature map by two convolution layers with stride 1, the features then pass through three combinations of a stride-2 convolution layer and a residual block, the resulting features are downsampled by a pooling layer, and finally a fully connected layer outputs an (N+1)-dimensional vector representing the identity and the real/fake status of the corresponding image;
feeding the generated image G(V_i^k) and the corresponding static image S_i into the discrimination network D to compute the adversarial loss L_adv, where D_i denotes the i-th dimension of the output of the discrimination network D;
constructing a recognition network R, which uses the face recognition network LightCNN: the generated image G(V_i^k) and the corresponding static image S_i are fed into the recognition network R to compute the perceptual loss L_per, where R(·) denotes the feature values of the penultimate pooling layer of the recognition network.
2. The method of claim 1, wherein the discrimination network uses a multidimensional softmax output to produce an (N+1)-dimensional vector; N is the number of identity categories, and the remaining dimension indicates whether the corresponding image is real or fake, where "real" means the corresponding image is a static image and "fake" means it is a synthesized image.
3. The method according to claim 2, wherein in S3: λ=0.01, α=0.003.
4. The method according to claim 3, wherein the step of testing the aggregation countermeasure network video face recognition model in S4 includes:
denoting the static images of the target set at test time as S = {s_1, s_2, ..., s_j, ..., s_M} and feeding them into the recognition network R one by one to obtain the last-layer feature values F = {f_1, f_2, ..., f_j, ..., f_M}, where M denotes the total number of identity categories and f_j denotes the feature of the target-set static image of the person with identity category j;
capturing face images in real time with a camera, denoting the captured face video sequence of unknown category as V, and using it as the input of the aggregation network G to obtain a generated image G(V) of the unknown category;
feeding the generated image G(V) into R to obtain the query feature f_v, computing the Euclidean distance between the generated-image feature f_v and each target-set feature in F = {f_1, f_2, ..., f_j, ..., f_M}, and taking the category with the smallest distance as the final recognition result.
5. Use of the video face recognition method of any one of claims 1-4 in the field of face recognition technology.
6. The use according to claim 5, wherein the technical field of face recognition includes intelligent security, video surveillance and public security investigation.
CN202010253595.7A 2020-04-02 2020-04-02 Video face recognition method based on aggregation countermeasure network Active CN111539263B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010253595.7A CN111539263B (en) 2020-04-02 2020-04-02 Video face recognition method based on aggregation countermeasure network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010253595.7A CN111539263B (en) 2020-04-02 2020-04-02 Video face recognition method based on aggregation countermeasure network

Publications (2)

Publication Number Publication Date
CN111539263A CN111539263A (en) 2020-08-14
CN111539263B true CN111539263B (en) 2023-08-11

Family

ID=71974857

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010253595.7A Active CN111539263B (en) 2020-04-02 2020-04-02 Video face recognition method based on aggregation countermeasure network

Country Status (1)

Country Link
CN (1) CN111539263B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114266946A (en) * 2021-12-31 2022-04-01 智慧眼科技股份有限公司 Feature identification method and device under shielding condition, computer equipment and medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107977932A (en) * 2017-12-28 2018-05-01 北京工业大学 It is a kind of based on can differentiate attribute constraint generation confrontation network face image super-resolution reconstruction method
CN108537743A (en) * 2018-03-13 2018-09-14 杭州电子科技大学 A kind of face-image Enhancement Method based on generation confrontation network
CN108985168A (en) * 2018-06-15 2018-12-11 江南大学 A kind of video face identification method based on the study of minimum normalized cumulant
CN109902546A (en) * 2018-05-28 2019-06-18 华为技术有限公司 Face identification method, device and computer-readable medium
CN110222628A (en) * 2019-06-03 2019-09-10 电子科技大学 A kind of face restorative procedure based on production confrontation network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11024009B2 (en) * 2016-09-15 2021-06-01 Twitter, Inc. Super resolution using a generative adversarial network

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107977932A (en) * 2017-12-28 2018-05-01 北京工业大学 It is a kind of based on can differentiate attribute constraint generation confrontation network face image super-resolution reconstruction method
CN108537743A (en) * 2018-03-13 2018-09-14 杭州电子科技大学 A kind of face-image Enhancement Method based on generation confrontation network
CN109902546A (en) * 2018-05-28 2019-06-18 华为技术有限公司 Face identification method, device and computer-readable medium
CN108985168A (en) * 2018-06-15 2018-12-11 江南大学 A kind of video face identification method based on the study of minimum normalized cumulant
CN110222628A (en) * 2019-06-03 2019-09-10 电子科技大学 A kind of face restorative procedure based on production confrontation network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Jiang Yuning et al., "Research on Generative Adversarial Network Models", Journal of Qingdao University (Natural Science Edition), 2019, Vol. 32, No. 3, full text. *

Also Published As

Publication number Publication date
CN111539263A (en) 2020-08-14

Similar Documents

Publication Publication Date Title
Wang et al. Detect globally, refine locally: A novel approach to saliency detection
Sabir et al. Recurrent convolutional strategies for face manipulation detection in videos
CN109615582B (en) Face image super-resolution reconstruction method for generating countermeasure network based on attribute description
CN108537743B (en) Face image enhancement method based on generation countermeasure network
Liu et al. Robust video super-resolution with learned temporal dynamics
Kim et al. Fully deep blind image quality predictor
CN112818862B (en) Face tampering detection method and system based on multi-source clues and mixed attention
Lin et al. Real photographs denoising with noise domain adaptation and attentive generative adversarial network
CN112784929B (en) Small sample image classification method and device based on double-element group expansion
Wen et al. VIDOSAT: High-dimensional sparsifying transform learning for online video denoising
CN110580472A (en) video foreground detection method based on full convolution network and conditional countermeasure network
Wang et al. SmsNet: A new deep convolutional neural network model for adversarial example detection
Parde et al. Face and image representation in deep CNN features
CN110827265A (en) Image anomaly detection method based on deep learning
CN113112416A (en) Semantic-guided face image restoration method
Parde et al. Deep convolutional neural network features and the original image
Krishnan et al. SwiftSRGAN-Rethinking super-resolution for efficient and real-time inference
CN116453232A (en) Face living body detection method, training method and device of face living body detection model
CN111310516B (en) Behavior recognition method and device
CN111539263B (en) Video face recognition method based on aggregation countermeasure network
CN110659679B (en) Image source identification method based on adaptive filtering and coupling coding
Revi et al. Gan-generated fake face image detection using opponent color local binary pattern and deep learning technique
Prajapati et al. Mri-gan: A generalized approach to detect deepfakes using perceptual image assessment
CN115294424A (en) Sample data enhancement method based on generation countermeasure network
Žižakić et al. Efficient local image descriptors learned with autoencoders

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant