CN113420806A - Face detection quality scoring method and system - Google Patents

Face detection quality scoring method and system

Info

Publication number
CN113420806A
CN113420806A (application CN202110688239.2A)
Authority
CN
China
Prior art keywords
network
face
real
constructing
training
Prior art date
Legal status
Granted
Application number
CN202110688239.2A
Other languages
Chinese (zh)
Other versions
CN113420806B (en)
Inventor
刘芳
任保家
黄欣研
李玲玲
刘洋
刘旭
郭雨薇
郝泽华
Current Assignee
Xidian University
Original Assignee
Xidian University
Priority date
Filing date
Publication date
Application filed by Xidian University
Priority to CN202110688239.2A
Publication of CN113420806A
Application granted
Publication of CN113420806B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/217 - Validation; Performance evaluation; Active pattern learning techniques
    • G06F 18/22 - Matching criteria, e.g. proximity measures
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 - Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30 - Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a face detection quality scoring method and system. A face detection network is constructed and pre-trained so that the model can accurately locate faces; at the same time, a reward function that automatically adjusts its reward-punishment strength during training is proposed, and together with the face detection network it forms an environment generator; a shallow convolutional neural network is used to form an agent that scores face quality. An experience replay strategy and a target Q network algorithm are adopted when training the agent, which effectively improves the training speed and the performance of the model. By exploiting the large differences between faces of different quality and combining the idea of deep reinforcement learning with a self-adjusting reward-punishment mechanism, the method scores face quality, can efficiently select better-quality faces from video data, and improves the performance of a face recognition system.

Description

Face detection quality scoring method and system
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a face detection quality scoring method and system.
Background
In recent years, with the rapid development of deep learning, face detection technology has advanced greatly. This progress comes from continuously updated neural network architectures and from sustained research on face detection theory. Progress in deep-learning-based face detection has in turn driven successful applications: thanks to the strong feature extraction capability of deep neural networks and the real-time performance of lightweight networks, face detection has achieved good results in fields such as campus security and daily-life services.
However, certain problems remain for the face recognition system as a whole. Current face detection algorithms handle the detection of faces well; whether the quality of a detected face meets the recognition standard, however, remains a concern. In a real video surveillance scene, the state of a person appearing in the picture is random, in two respects. First, the external environment changes randomly: video quality is affected by uncertain weather conditions and by day-night changes. Second, the facial expression and pose of a person when they appear on screen are also uncertain. These factors play a crucial role in the final recognition result.
Disclosure of Invention
The invention aims to overcome the above defects in the prior art and provides a face detection quality scoring method and system.
The invention adopts the following technical scheme:
a face detection quality scoring method comprises the following steps:
S1, acquiring face images and their corresponding annotation data, and constructing a paired data set in the form of "face-face annotation";
S2, constructing a face detection network D, and inputting the paired data set constructed in step S1 into the face detection network D in batches for training, where the size of each batch is B;
S3, constructing an agent RLQAgent, whose input is a state s;
S4, constructing a reward function R(s, a) whose reward-punishment strength is automatically adjusted during training, and combining it with the face detection network D of step S2 to form an environment generator Env; Env generates a state s, which is input into the agent RLQAgent constructed in step S3 to obtain a decision action a; a reward value R is obtained from the state s and the decision action a;
S5, constructing an experience replay pool ReplayBuffer, and caching the records [s, a, R, s'] obtained in step S4, where s' is the state generated by the environment Env at the next moment;
S6, constructing a target Q network Q_target and a real-time Q network Q_real; the target Q network Q_target is a reference agent used to output the expectation of the cumulative reward value; the real-time Q network Q_real is the agent RLQAgent trained in real time; the real-time Q network Q_real is trained with the experience replay pool ReplayBuffer constructed in step S5 to obtain its network weight Θ;
S7, initializing the agent RLQAgent with the network weight Θ obtained in step S6 and combining it with the face detection network D of step S2, so that the quality of a face F is scored at the same time as the face F is detected.
Specifically, in step S2, constructing the face detection network D specifically includes:
S201, constructing a backbone network ResNet50 to generate 3 features T1, T2, T3 of different scales;
S202, constructing a feature pyramid network FPN composed of a first up-sampling layer U1, a second up-sampling layer U2 and a third up-sampling layer U3, to obtain intermediate features T1, T2, T3;
S203, constructing a context information module SSH comprising a first 3×3 convolution layer, a first 5×5 convolution layer, a second 5×5 convolution layer, a first 7×7 convolution layer and a second 7×7 convolution layer, to obtain final features F1, F2, F3;
S204, constructing a face box head BoxHead, a key point head LandMarkHead and a classifier Classification, to generate the final face position, the face key points and the probability of being a face.
Specifically, in step S3, constructing the agent RLQAgent specifically includes:
S301, constructing an agent network comprising a first convolution layer, a first max-pooling layer, a first BatchNorm layer, a second convolution layer, a second max-pooling layer, a second BatchNorm layer, a third convolution layer, a third max-pooling layer, a third BatchNorm layer and a fully connected layer;
S302, outputting the action classification a and the expected reward value Q through the agent network.
Specifically, in step S4, the reward function R(s, a) is:
[The formula is reproduced as an image (Figure BDA0003125359170000031) in the original publication.]
where Epochs is the total number of training epochs and epoch is the current epoch.
Specifically, in step S5, the experience replay pool ReplayBuffer is a double-ended queue with a fixed capacity of 512.
Specifically, step S6 specifically includes:
S601, resetting the environment Env to obtain an initial state s0;
S602, randomly initializing the experience replay pool ReplayBuffer to obtain [s, a, R, s'] records, where s in the first record is the initial state s0 obtained in step S601;
S603, sampling 64 records from the experience replay pool ReplayBuffer;
S604, training the real-time Q network Q_real with the 64 samples obtained in step S603: calculating the loss value, adding a regularization term to constrain the model, and updating the network weight Θ of the real-time Q network Q_real by batch stochastic gradient descent;
S605, letting the real-time Q network Q_real interact with the environment Env, and storing the newly generated record [s, a, R, s'] in the experience replay pool ReplayBuffer;
S606, repeating step S605 until 512 interactions have been performed;
S607, judging according to the update frequency whether the target Q network Q_target needs to be updated; if so, executing step S608, otherwise executing step S609;
S608, copying the weights Θ of the real-time Q network Q_real to the target Q network Q_target;
S609, repeating steps S603 to S608 until training finishes, and outputting the weight Θ of the real-time Q network Q_real.
Further, in step S604, the loss value L(Θ) is calculated as follows (the published formula is reproduced as an image; it is reconstructed here from the stated definitions):

L(Θ) = E[(y − Q_real(s, a; Θ))²] + λ·||Θ||², with y = R(s, a) + γ · max_a' Q_target(s', a'),

where y is the accumulated reward expectation output by the target Q network, L(Θ) is the loss value, γ is the decay factor, λ is the coefficient of the regularization term, and Θ is the weight of the real-time Q network Q_real.
Further, step S609 is repeated 200 times.
Specifically, step S7 specifically includes:
S701, inputting the image I into the face detection network D to obtain the specific position P of the face;
S702, obtaining the face F according to the specific position P of the face;
S703, inputting the face F into the agent network RLQAgent for scoring to obtain a quality score;
S704, outputting the face and its corresponding score [F, score].
Another technical solution of the present invention is a face detection quality scoring system, comprising:
a data module, which acquires face images and their corresponding annotation data and constructs a paired data set in the form of "face-face annotation";
a training module, which constructs a face detection network D and inputs the paired data set constructed by the data module into the face detection network D in batches for training, where the size of each batch is B;
an agent module, which constructs an agent RLQAgent whose input is a state s;
a reward module, which constructs a reward function R(s, a) whose reward-punishment strength is automatically adjusted during training, and combines it with the face detection network D of the training module to form an environment generator Env; Env generates a state s, which is input into the agent RLQAgent constructed by the agent module to obtain a decision action a; a reward value R is obtained from the state s and the decision action a;
an experience module, which constructs an experience replay pool ReplayBuffer and caches the records [s, a, R, s'] obtained by the reward module, where s' is the state generated by the environment Env at the next moment;
a weight module, which constructs a target Q network Q_target and a real-time Q network Q_real; the target Q network Q_target is a reference agent used to output the expectation of the cumulative reward value, and the real-time Q network Q_real is the agent RLQAgent trained in real time; the real-time Q network Q_real is trained with the experience replay pool ReplayBuffer constructed by the experience module to obtain its network weight Θ;
and a scoring module, which initializes the agent RLQAgent with the network weight Θ obtained by the weight module and combines it with the face detection network D of the training module, so that the quality of a face F is scored while the face F is detected.
Compared with the prior art, the invention has at least the following beneficial effects:
the invention discloses a face detection quality scoring method based on depth reinforcement learning and self-adjustment reward and punishment mechanisms. Firstly, constructing a face detection network and pre-training so that the model can accurately position the face; simultaneously, a gradually converging reward function is provided, and the reward function and a face detection network form an environment generator; the shallow convolutional neural network is used for forming an intelligent body to score the face quality, and the added calculated amount can be ignored while the scoring function is realized. An experience playback strategy and a target Q network algorithm are adopted during the training of the intelligent agent, so that the training speed and the performance of the model can be effectively improved. The invention realizes the quality scoring of the human face by utilizing the characteristic of larger difference between human faces with different qualities and combining with the ideas of depth reinforcement learning and self-adjustment reward and punishment mechanisms, and can efficiently process the problem of selecting a key human face from video data.
Further, the face detection network D is pre-trained using the data set, so that the specific position P of the face can be accurately detected from the video or image.
Further, the agent RLQAgent is constructed with a shallow convolutional neural network and trained by a reinforcement learning method, so that, combined with the face detection network D, it can evaluate face quality without adding overhead after training.
Further, a progressively converging reward function R(s, a) is constructed. In the early stage of agent training the model makes decisions with greater randomness and is therefore more likely to decide wrongly, so the penalty is increased at this stage. As training proceeds, the decision-making capability of the model is continuously enhanced, so the punishment strength is gradually reduced in the later stage of training and recovers to the same level as the reward.
Further, an experience replay pool ReplayBuffer is constructed for caching [s, a, R, s'] records. Adopting the experience replay strategy greatly shortens the training time of the model.
Further, the target Q network algorithm is used to train the agent RLQAgent; combined with the experience replay strategy, an agent with good decision-making capability can be trained quickly to score face quality.
In summary, the face detection network is first constructed and pre-trained so that the model can accurately locate faces; at the same time, a progressively converging reward function is proposed and, together with the face detection network, forms an environment generator; a shallow convolutional neural network forms an agent that scores face quality, adding negligible computation while realizing the scoring function. An experience replay strategy and a target Q network algorithm are adopted when training the agent, effectively improving the training speed and the performance of the model. By exploiting the large differences between faces of different quality and combining deep reinforcement learning with a self-adjusting reward-punishment mechanism, the invention scores face quality and can efficiently handle the problem of selecting key faces from video data.
The technical solution of the present invention is further described in detail below with reference to the accompanying drawings and embodiments.
Drawings
FIG. 1 is a block diagram of a face detection network D of the present invention;
FIG. 2 is a network architecture diagram of an agent RLQAgent;
FIG. 3 is a diagram of the interaction of the agent RLQAgent with the environment Env;
FIG. 4 is a schematic diagram of a target Q network algorithm;
FIG. 5 is a comparison of the scoring results, wherein (a) is the method of the present invention and (b) is the FaceQNet method.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Various structural schematics according to the disclosed embodiments of the invention are shown in the drawings. The figures are not drawn to scale, wherein certain details are exaggerated and possibly omitted for clarity of presentation. The shapes of various regions, layers and their relative sizes and positional relationships shown in the drawings are merely exemplary, and deviations may occur in practice due to manufacturing tolerances or technical limitations, and a person skilled in the art may additionally design regions/layers having different shapes, sizes, relative positions, according to actual needs.
The invention provides a face detection quality scoring method based on deep reinforcement learning and a self-adjusting reward-punishment mechanism. First, a face detection network is constructed and pre-trained so that the model can accurately locate faces; at the same time, a progressively converging reward function is proposed, and together with the face detection network it forms an environment generator; a shallow convolutional neural network is used to form an agent that scores face quality, adding negligible computation while realizing the scoring function. An experience replay strategy and a target Q network algorithm are adopted when training the agent, effectively improving the training speed and the performance of the model. By exploiting the large differences between faces of different quality and combining deep reinforcement learning with a self-adjusting reward-punishment mechanism, the invention scores face quality and can efficiently handle the problem of selecting key faces from video data.
The invention relates to a face detection quality scoring method, which comprises the following steps:
s1, acquiring face image
Figure BDA0003125359170000081
And corresponding label data
Figure BDA0003125359170000082
X is a face image, K is the number of face images, wherein X belongs to RNxNAnd R represents a real number domain. I (X) is belonged to {0,1}, and represents whether the face is a human face or not; p (X) epsilon R8x1Indicating the position of the face; l (X) ε R10xRepresenting the positions of key points of the human face; construction of paired datasets in the form of "face-face labeling
Figure BDA0003125359170000091
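For illustration, one sample of such a paired data set could be held in a container like the following Python sketch; the FaceSample name and the all-zero toy values are hypothetical, chosen only to show the shapes stated above.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class FaceSample:
    """One 'face-face annotation' pair from step S1 (hypothetical container)."""
    X: np.ndarray   # face image, X in R^{N x N}
    I: int          # I(X) in {0, 1}: whether the image contains a face
    P: np.ndarray   # P(X) in R^{8 x 1}: face position
    L: np.ndarray   # L(X) in R^{10 x 1}: face key-point positions

# a single toy sample (all-zero image and labels) just to illustrate the shapes
sample = FaceSample(
    X=np.zeros((128, 128)),
    I=1,
    P=np.zeros((8, 1)),
    L=np.zeros((10, 1)),
)
```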
S2, constructing a face detection network D, and inputting the paired data set constructed in step S1 into the face detection network D in batches for training, where the size of each batch is B;
Referring to fig. 1, the face detection network D specifically includes:
S201, constructing a backbone network ResNet50, used to generate 3 features T1, T2, T3 of different scales;
S202, constructing a feature pyramid network FPN composed of a first up-sampling layer U1, a second up-sampling layer U2 and a third up-sampling layer U3, used to obtain the intermediate features T1, T2, T3;
S203, constructing a context information module SSH composed of a first 3×3 convolution layer, a first 5×5 convolution layer, a second 5×5 convolution layer, a first 7×7 convolution layer and a second 7×7 convolution layer, used to obtain the final features F1, F2, F3;
S204, constructing a face box head BoxHead, a key point head LandMarkHead and a classifier Classification, used to generate the final face position, the face key points and the probability of being a face.
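For illustration, the following is a minimal PyTorch sketch of how a detector with the structure of steps S201 to S204 could be assembled. It follows the layout described above, but the channel widths, anchor count, branch split inside SSH and head shapes are assumptions made for the example, not values taken from this disclosure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

class SSH(nn.Module):
    """Context module of S203: a 3x3 branch plus stacked 5x5 and 7x7 branches,
    concatenated. The channel split across branches is an assumption."""
    def __init__(self, c):
        super().__init__()
        self.conv3   = nn.Conv2d(c, c // 2, 3, padding=1)        # first 3x3
        self.conv5_1 = nn.Conv2d(c, c // 4, 5, padding=2)        # first 5x5
        self.conv5_2 = nn.Conv2d(c // 4, c // 4, 5, padding=2)   # second 5x5
        self.conv7_1 = nn.Conv2d(c // 4, c // 4, 7, padding=3)   # first 7x7
        self.conv7_2 = nn.Conv2d(c // 4, c // 4, 7, padding=3)   # second 7x7

    def forward(self, x):
        b3 = self.conv3(x)
        t  = F.relu(self.conv5_1(x))
        b5 = self.conv5_2(t)
        b7 = self.conv7_2(F.relu(self.conv7_1(t)))
        return F.relu(torch.cat([b3, b5, b7], dim=1))

class FaceDetector(nn.Module):
    """D of S201-S204: ResNet50 backbone -> 3-level FPN -> SSH -> three heads."""
    def __init__(self, n_anchors=2):
        super().__init__()
        r = resnet50(weights=None)
        self.stem = nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool, r.layer1)
        self.layer2, self.layer3, self.layer4 = r.layer2, r.layer3, r.layer4
        # 1x1 lateral convs; with the up-sampling steps U1..U3 they form the FPN
        self.lat = nn.ModuleList([nn.Conv2d(c, 256, 1) for c in (512, 1024, 2048)])
        self.ssh = nn.ModuleList([SSH(256) for _ in range(3)])
        self.box_head = nn.Conv2d(256, n_anchors * 4, 1)   # BoxHead
        self.lmk_head = nn.Conv2d(256, n_anchors * 10, 1)  # LandMarkHead (5 points)
        self.cls_head = nn.Conv2d(256, n_anchors * 2, 1)   # Classification

    def forward(self, x):
        c2 = self.layer2(self.stem(x))                     # backbone features
        c3 = self.layer3(c2)
        c4 = self.layer4(c3)
        p4 = self.lat[2](c4)                               # top-down FPN merge
        p3 = self.lat[1](c3) + F.interpolate(p4, size=c3.shape[-2:], mode="nearest")
        p2 = self.lat[0](c2) + F.interpolate(p3, size=c2.shape[-2:], mode="nearest")
        feats = [ssh(p) for ssh, p in zip(self.ssh, (p2, p3, p4))]   # final F1, F2, F3
        return [(self.box_head(f), self.lmk_head(f), self.cls_head(f)) for f in feats]
```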
S3, constructing an agent RLQAgent. Its input is a state s, i.e. face images in different poses; in the training stage its output is an action a, i.e. a judgment of whether the face quality is good or bad; in the inference stage it outputs a face score q, q ∈ [0, 1];
Referring to fig. 2, constructing the agent RLQAgent specifically includes:
S301, constructing an agent network composed of a first convolution layer, a first max-pooling layer, a first BatchNorm layer, a second convolution layer, a second max-pooling layer, a second BatchNorm layer, a third convolution layer, a third max-pooling layer, a third BatchNorm layer and a fully connected layer;
S302, outputting the action classification a and the expected reward value Q.
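A minimal PyTorch sketch of such an agent network follows; the channel widths and the 64×64 input resolution are assumptions, since the disclosure only lists the layer types.

```python
import torch
import torch.nn as nn

class RLQAgent(nn.Module):
    """Shallow agent network of S301: three (conv -> max-pool -> BatchNorm)
    stages plus a fully connected layer outputting one Q value per action."""
    def __init__(self, n_actions=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2), nn.BatchNorm2d(16),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2), nn.BatchNorm2d(32),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2), nn.BatchNorm2d(64),
        )
        self.fc = nn.Linear(64 * 8 * 8, n_actions)   # S302: action / Q output

    def forward(self, s):                 # s: (B, 3, 64, 64) face crops, the state
        return self.fc(self.features(s).flatten(1))

agent = RLQAgent()
q = agent(torch.randn(1, 3, 64, 64))      # training: a = q.argmax(); inference: softmax -> score in [0, 1]
```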
S4, constructing the reward function R(s, a) and combining it with the face detection network D into an environment generator Env. Env generates a state s and sends it to the agent RLQAgent of step S3 to obtain a decision action a; a reward value R is then obtained from the state s and the action a. The reward function R(s, a) is given by the following formula:
[The formula is reproduced as an image (Figure BDA0003125359170000101) in the original publication.]
where a is the action generated by the agent according to the state, Epochs is the total number of training epochs, and epoch is the current epoch.
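Because the published formula survives only as an image, the following Python sketch encodes one plausible reward consistent with the stated behaviour (a heavier penalty early in training that decays back to the reward level as epoch approaches Epochs); the exact functional form is an assumption.

```python
def reward(action_correct: bool, epoch: int, epochs: int) -> float:
    """Self-adjusting reward of S4 (assumed shape, not the published formula):
    a correct decision earns +1; a wrong decision is punished more heavily
    early in training, with the penalty decaying back to -1 at the end."""
    if action_correct:
        return 1.0
    return -(2.0 - epoch / epochs)   # assumed: -2 at epoch 0, -1 at the final epoch
```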
S5, constructing an experience replay pool ReplayBuffer for caching [s, a, R, s'] records, where a is the action executed by the agent RLQAgent according to the state s, R is the reward value given by the environment, and s' is the state generated by the environment Env at the next moment. Specifically, the experience replay pool is a double-ended queue with a fixed capacity of 512, used to store the historical decision records [s, a, R, s'] of the agent.
Referring to fig. 3, the agent interacts with the environment and stores experience data as follows: the face detection network D outputs faces of different qualities, i.e. the state s, and the agent RLQAgent obtains an action a according to the state s; a specific reward value R is calculated with the reward function R(s, a); the face detection network D outputs the next state s'; and the record [s, a, R, s'] is buffered in the experience replay pool ReplayBuffer.
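A minimal Python sketch of such a fixed-capacity double-ended queue is given below; the class and method names are illustrative, not taken from the disclosure.

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience replay pool of S5: a double-ended queue with fixed capacity
    512; once full, the oldest [s, a, R, s'] record is evicted automatically."""
    def __init__(self, capacity: int = 512):
        self.buf = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))

    def sample(self, batch_size: int = 64):   # S603 draws 64 records
        return random.sample(self.buf, batch_size)

    def __len__(self):
        return len(self.buf)
```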
S6, constructing a target Q network Q_target and a real-time Q network Q_real; Q_target is the reference agent used to output the expectation of the cumulative reward value, and Q_real is the agent trained in real time; the experience replay pool ReplayBuffer of step S5 is used to train the real-time Q network Q_real, obtaining the network weight Θ;
Referring to fig. 4, the flow of the target Q network algorithm is specifically:
S601, resetting the environment Env to obtain an initial state s0;
S602, randomly initializing the experience replay pool ReplayBuffer to obtain [s, a, R, s'] records, where s in the first record in the ReplayBuffer is the initial state s0 obtained in step S601;
S603, sampling 64 records from the experience replay pool ReplayBuffer;
S604, calculating the loss value, adding a regularization term to constrain the model, and updating the weight Θ of the real-time Q network Q_real by batch stochastic gradient descent. The loss value L(Θ) is calculated as follows (the published formula is reproduced as an image; it is reconstructed here from the stated definitions):

L(Θ) = E[(y − Q_real(s, a; Θ))²] + λ·||Θ||², with y = R(s, a) + γ · max_a' Q_target(s', a'),

where R(s, a) is the reward value output by the environment according to the state s and the action a taken by the agent; y is the accumulated reward expectation output by the target Q network; γ is the decay factor; λ is the coefficient of the regularization term; and Θ is the weight of the real-time Q network Q_real.
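As a concrete illustration, a sketch of this loss under the reconstruction above is shown below; the γ and λ values are assumptions, since the disclosure only names the symbols.

```python
import torch
import torch.nn.functional as F

def dqn_loss(q_real, q_target, s, a, r, s_next, gamma=0.9, lam=1e-4):
    """Loss of S604 on a sampled minibatch: squared TD error against the
    frozen target network plus an L2 penalty on the real-time weights."""
    with torch.no_grad():                   # Q_target supplies the reward expectation y
        y = r + gamma * q_target(s_next).max(dim=1).values
    q_sa = q_real(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q_real(s, a; theta)
    l2 = sum((w * w).sum() for w in q_real.parameters())    # regularization term
    return F.mse_loss(q_sa, y) + lam * l2
```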
S605, letting the real-time Q network Q_real interact with the environment Env, and storing the newly generated record [s, a, R(s, a), s'] in the experience replay pool ReplayBuffer;
S606, repeating step S605 until 512 interactions have been performed;
S607, judging according to the update frequency whether the target Q network Q_target needs to be updated; if so, executing step S608, otherwise executing step S609;
S608, copying the weights Θ of the real-time Q network Q_real to the target Q network Q_target;
S609, repeating steps S603 to S608 until 200 repetitions are reached.
S7, combining the agent RLQAgent with the face detection network D, so that the quality of a face F is scored when the face F is detected:
S701, inputting the image I into the face detection network D to obtain the specific position P of the face;
S702, obtaining the face F according to the specific position P of the face;
S703, inputting the face F into the agent network RLQAgent for scoring to obtain a quality score;
S704, outputting the face and its corresponding score [F, score].
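A sketch of this inference flow is given below; the detector return format and the crop helper are hypothetical stand-ins for the surrounding pipeline, and the score is taken as the softmax probability of the "good quality" action, matching q ∈ [0, 1] from step S3.

```python
import torch
import torch.nn.functional as F

def score_faces(image, detector, agent, crop):
    """Inference flow of S701-S704. `detector` is assumed to return a list of
    face positions P for image I; `crop` is a hypothetical helper that cuts out
    and resizes the face region. Neither name comes from the disclosure."""
    results = []
    with torch.no_grad():
        for p in detector(image):                  # S701: specific positions P
            face = crop(image, p)                  # S702: face F from position P
            probs = F.softmax(agent(face.unsqueeze(0)), dim=1)
            score = probs[0, 1].item()             # S703: q in [0, 1]
            results.append((face, score))          # S704: output [F, score]
    return results
```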
In another embodiment of the present invention, a face detection quality scoring system is provided that can be used to implement the above face detection quality scoring method. Specifically, the system includes a data module, a training module, an agent module, a reward module, an experience module, a weight module and a scoring module.
The data module acquires face images and their corresponding annotation data, and constructs a paired data set in the form of "face-face annotation".
The training module constructs a face detection network D and inputs the paired data set constructed by the data module into the face detection network D in batches for training, where the size of each batch is B.
The agent module constructs an agent RLQAgent whose input is a state s.
The reward module constructs a reward function R(s, a) whose reward-punishment strength is automatically adjusted during training, and combines it with the face detection network D of the training module to form an environment generator Env; Env generates a state s, which is input into the agent RLQAgent constructed by the agent module to obtain a decision action a; a reward value R is obtained from the state s and the decision action a.
The experience module constructs an experience replay pool ReplayBuffer and caches the records [s, a, R, s'] obtained by the reward module, where s' is the state generated by the environment Env at the next moment.
The weight module constructs a target Q network Q_target and a real-time Q network Q_real; the target Q network Q_target is a reference agent used to output the expectation of the cumulative reward value, and the real-time Q network Q_real is the agent RLQAgent trained in real time; the real-time Q network Q_real is trained with the experience replay pool ReplayBuffer constructed by the experience module to obtain its network weight Θ.
The scoring module initializes the agent RLQAgent with the network weight Θ obtained by the weight module and combines it with the face detection network D of the training module, so that the quality of a face F is scored while the face F is detected.
In yet another embodiment of the present invention, a terminal device is provided that includes a processor and a memory, the memory storing a computer program comprising program instructions, and the processor being configured to execute the program instructions stored in the computer storage medium. The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.; it is the computing and control core of the terminal, adapted to load and execute one or more instructions to implement the corresponding method flow or function. The processor of this embodiment can be used to perform the face detection quality scoring method, including:
acquiring face images and their corresponding annotation data, and constructing a paired data set in the form of "face-face annotation"; constructing a face detection network D and inputting the paired data set into it in batches for training, where the size of each batch is B; constructing an agent RLQAgent whose input is a state s; constructing a reward function R(s, a) whose reward-punishment strength is automatically adjusted during training, and combining it with the face detection network D to form an environment generator Env, where Env generates a state s that is input into the agent RLQAgent to obtain a decision action a, and a reward value R is obtained from the state s and the decision action a; constructing an experience replay pool ReplayBuffer for caching the records [s, a, R, s'], where s' is the state generated by the environment Env at the next moment; constructing a target Q network Q_target and a real-time Q network Q_real, where Q_target is a reference agent used to output the expectation of the cumulative reward value and Q_real is the agent RLQAgent trained in real time, and training the real-time Q network Q_real with the experience replay pool ReplayBuffer to obtain its network weight Θ; and initializing the agent RLQAgent with the network weight Θ and combining it with the face detection network D, so that the quality of a face F is scored when the face F is detected.
In still another embodiment, the present invention further provides a storage medium, specifically a computer-readable storage medium, which is a memory device in a terminal device used for storing programs and data. It is understood that the computer-readable storage medium here may include a storage medium built into the terminal device and may also include an extended storage medium supported by the terminal device. The computer-readable storage medium provides a storage space storing the operating system of the terminal. One or more instructions, which may be one or more computer programs (including program code), are stored in this storage space and are adapted to be loaded and executed by the processor. The computer-readable storage medium may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory.
One or more instructions stored in the computer-readable storage medium can be loaded and executed by the processor to realize the corresponding steps of the face detection quality scoring method of the above embodiment; the one or more instructions in the computer-readable storage medium are loaded by the processor and perform the following steps:
acquiring face images and their corresponding annotation data, and constructing a paired data set in the form of "face-face annotation"; constructing a face detection network D and inputting the paired data set into it in batches for training, where the size of each batch is B; constructing an agent RLQAgent whose input is a state s; constructing a reward function R(s, a) whose reward-punishment strength is automatically adjusted during training, and combining it with the face detection network D to form an environment generator Env, where Env generates a state s that is input into the agent RLQAgent to obtain a decision action a, and a reward value R is obtained from the state s and the decision action a; constructing an experience replay pool ReplayBuffer for caching the records [s, a, R, s'], where s' is the state generated by the environment Env at the next moment; constructing a target Q network Q_target and a real-time Q network Q_real, where Q_target is a reference agent used to output the expectation of the cumulative reward value and Q_real is the agent RLQAgent trained in real time, and training the real-time Q network Q_real with the experience replay pool ReplayBuffer to obtain its network weight Θ; and initializing the agent RLQAgent with the network weight Θ and combining it with the face detection network D, so that the quality scoring of the face F is completed while the face F is detected.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The effects of the present invention can be further illustrated by the following simulation results.
1. Simulation conditions
The hardware for the simulation is a graphics workstation of the Intelligent Perception and Image Understanding Laboratory equipped with four GPUs, each with 11 GB of video memory. The data sets used are the CFP face image set and the CAS-PEAL face data set. The CFP data set contains about 500 IDs and about 7000 pictures; the CAS-PEAL data set contains about 1040 IDs and about 99,450 pictures. The data are classified into two categories, good quality and poor quality, according to the pose of the faces. 80% of the data were used for training and 20% for testing.
2. Simulation content
Using the above data sets, the proposed method was compared with a scoring method that uses only deep learning; the accuracy results on the test set are shown in Table 1.
TABLE 1
[Table 1 is reproduced as an image in the original publication: classification accuracy of the compared methods on the test set.]
3. Analysis of simulation results
Referring to FIG. 5, histograms are shown of the scores produced by the proposed RLQAgent model and the FaceQNet model on about 100,000 faces from the CFP test set and the CAS-PEAL data set. With the proposed method, the scores concentrate in two intervals: the number of faces in [0, 0.4] is substantially equal to the number in [0.65, 1.0], which is consistent with the distribution of the test data. By contrast, most FaceQNet scores fall within [0.2, 0.6]. Table 1 gives the classification accuracy of the above methods on the test set; the method provided by the present invention achieves better results.
In summary, the face detection quality scoring method based on deep reinforcement learning and a self-adjusting reward-punishment mechanism scores face quality at the same time as the face is detected. First, a face detection network is constructed and pre-trained so that the model can accurately locate faces; at the same time, a reward function that lets the model converge quickly is proposed, and together with the face detection network it forms an environment generator; a shallow convolutional neural network forms an agent that scores face quality, adding negligible computation while realizing the scoring function.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above-mentioned contents are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereby, and any modification made on the basis of the technical idea of the present invention falls within the protection scope of the claims of the present invention.

Claims (10)

1. A face detection quality scoring method is characterized by comprising the following steps:
S1, acquiring face images and their corresponding annotation data, and constructing a paired data set in the form of "face-face annotation";
S2, constructing a face detection network D, and inputting the paired data set constructed in step S1 into the face detection network D in batches for training, where the size of each batch is B;
S3, constructing an agent RLQAgent, whose input is a state s;
S4, constructing a reward function R(s, a) whose reward-punishment strength is automatically adjusted during training, and combining it with the face detection network D of step S2 to form an environment generator Env; Env generates a state s, which is input into the agent RLQAgent constructed in step S3 to obtain a decision action a; a reward value R is obtained from the state s and the decision action a;
S5, constructing an experience replay pool ReplayBuffer, and caching the records [s, a, R, s'] obtained in step S4, where s' is the state generated by the environment Env at the next moment;
S6, constructing a target Q network Q_target and a real-time Q network Q_real; the target Q network Q_target is a reference agent used to output the expectation of the cumulative reward value; the real-time Q network Q_real is the agent RLQAgent trained in real time; the real-time Q network Q_real is trained with the experience replay pool ReplayBuffer constructed in step S5 to obtain its network weight Θ;
S7, initializing the agent RLQAgent with the network weight Θ obtained in step S6 and combining it with the face detection network D of step S2, so that the quality of a face F is scored at the same time as the face F is detected.
2. The method according to claim 1, wherein in step S2, constructing the face detection network D specifically includes:
S201, constructing a backbone network ResNet50 to generate 3 features T1, T2, T3 of different scales;
S202, constructing a feature pyramid network FPN composed of a first up-sampling layer U1, a second up-sampling layer U2 and a third up-sampling layer U3, to obtain intermediate features T1, T2, T3;
S203, constructing a context information module SSH comprising a first 3×3 convolution layer, a first 5×5 convolution layer, a second 5×5 convolution layer, a first 7×7 convolution layer and a second 7×7 convolution layer, to obtain final features F1, F2, F3;
S204, constructing a face box head BoxHead, a key point head LandMarkHead and a classifier Classification, to generate the final face position, the face key points and the probability of being a face.
3. The method according to claim 1, wherein in step S3, the step of constructing the agent RLQAgent is specifically:
S301, constructing an agent network comprising a first convolution layer, a first max-pooling layer, a first BatchNorm layer, a second convolution layer, a second max-pooling layer, a second BatchNorm layer, a third convolution layer, a third max-pooling layer, a third BatchNorm layer and a fully connected layer;
S302, outputting the action classification a and the expected reward value Q through the agent network.
4. The method according to claim 1, wherein in step S4, the reward function R(s, a) is specifically:
[The formula is reproduced as an image (Figure FDA0003125359160000021) in the original publication.]
wherein Epochs is the total number of training epochs and epoch is the current epoch.
5. The method according to claim 1, wherein in step S5, the experience replay pool ReplayBuffer is a double-ended queue with a fixed capacity of 512.
6. The method according to claim 1, wherein step S6 is specifically:
S601, resetting the environment Env to obtain an initial state s0;
S602, randomly initializing the experience replay pool ReplayBuffer to obtain [s, a, R, s'] records, where s in the first record is the initial state s0 obtained in step S601;
S603, sampling 64 records from the experience replay pool ReplayBuffer;
S604, training the real-time Q network Q_real with the 64 samples obtained in step S603: calculating the loss value, adding a regularization term to constrain the model, and updating the network weight Θ of the real-time Q network Q_real by batch stochastic gradient descent;
S605, letting the real-time Q network Q_real interact with the environment Env, and storing the newly generated record [s, a, R, s'] in the experience replay pool ReplayBuffer;
S606, repeating step S605 until 512 interactions have been performed;
S607, judging according to the update frequency whether the target Q network Q_target needs to be updated; if so, executing step S608, otherwise executing step S609;
S608, copying the weights Θ of the real-time Q network Q_real to the target Q network Q_target;
S609, repeating steps S603 to S608 until training finishes, and outputting the weight Θ of the real-time Q network Q_real.
7. The method according to claim 6, wherein in step S604, the loss value L (Θ) is calculated as follows:
L(Θ) = E[(y − Q_real(s, a; Θ))²] + λ·||Θ||², with y = R(s, a) + γ · max_a' Q_target(s', a')
(the published formula is reproduced as an image; it is reconstructed here from the stated definitions),
wherein y is the accumulated reward expectation, L(Θ) is the loss value, γ is the decay factor, λ is the coefficient of the regularization term, and Θ is the weight of the real-time Q network Q_real.
8. The method of claim 6, wherein step S609 is repeated 200 times.
9. The method according to claim 1, wherein step S7 is specifically:
S701, inputting the image I into the face detection network D to obtain the specific position P of the face;
S702, obtaining the face F according to the specific position P of the face;
S703, inputting the face F into the agent network RLQAgent for scoring to obtain a quality score;
S704, outputting the face and its corresponding score [F, score].
10. A face detection quality scoring system, comprising:
the data module is used for acquiring a face image and corresponding annotation data thereof and constructing a paired data set in the form of face-face annotation;
the training module is used for constructing a face detection network D, inputting the paired data sets constructed by the data module into the face detection network D in batches for training, wherein the size of each batch is B;
an agent module, which constructs an agent RLQAgent whose input is a state s;
a reward module, which constructs a reward function R(s, a) whose reward-punishment strength is automatically adjusted during training, and combines it with the face detection network D of the training module to form an environment generator Env; Env generates a state s, which is input into the agent RLQAgent constructed by the agent module to obtain a decision action a; a reward value R is obtained from the state s and the decision action a;
an experience module, which constructs an experience replay pool ReplayBuffer and caches the records [s, a, R, s'] obtained by the reward module, where s' is the state generated by the environment Env at the next moment;
a weight module, which constructs a target Q network Q_target and a real-time Q network Q_real; the target Q network Q_target is a reference agent used to output the expectation of the cumulative reward value, and the real-time Q network Q_real is the agent RLQAgent trained in real time; the real-time Q network Q_real is trained with the experience replay pool ReplayBuffer constructed by the experience module to obtain its network weight Θ;
and a scoring module, which initializes the agent RLQAgent with the network weight Θ obtained by the weight module and combines it with the face detection network D of the training module, so that the quality of a face F is scored while the face F is detected.
CN202110688239.2A, filed 2021-06-21, Face detection quality scoring method and system, Active, granted as CN113420806B (en)

Priority Applications (1)

Application Number: CN202110688239.2A (granted as CN113420806B); Priority Date: 2021-06-21; Filing Date: 2021-06-21; Title: Face detection quality scoring method and system


Publications (2)

Publication Number Publication Date
CN113420806A true CN113420806A (en) 2021-09-21
CN113420806B CN113420806B (en) 2023-02-03

Family

ID=77789635

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110688239.2A Active CN113420806B (en) 2021-06-21 2021-06-21 Face detection quality scoring method and system

Country Status (1)

Country Link
CN (1) CN113420806B (en)



Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150100530A1 (en) * 2013-10-08 2015-04-09 Google Inc. Methods and apparatus for reinforcement learning
WO2018083671A1 (en) * 2016-11-04 2018-05-11 Deepmind Technologies Limited Reinforcement learning with auxiliary tasks
US20190236455A1 (en) * 2018-01-31 2019-08-01 Royal Bank Of Canada Pre-training neural networks with human demonstrations for deep reinforcement learning
CN108446619A (en) * 2018-03-12 2018-08-24 清华大学 Face critical point detection method and device based on deeply study
CN110866471A (en) * 2019-10-31 2020-03-06 Oppo广东移动通信有限公司 Face image quality evaluation method and device, computer readable medium and communication terminal
CN112800893A (en) * 2021-01-18 2021-05-14 南京航空航天大学 Human face attribute editing method based on reinforcement learning
CN112749686A (en) * 2021-01-29 2021-05-04 腾讯科技(深圳)有限公司 Image detection method, image detection device, computer equipment and storage medium
CN112801290A (en) * 2021-02-26 2021-05-14 中国人民解放军陆军工程大学 Multi-agent deep reinforcement learning method, system and application

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
HADO VAN HASSELT ET AL: "Deep Reinforcement Learning with Double Q-Learning", Thirtieth AAAI Conference on Artificial Intelligence
JIANKANG DENG ET AL: "RetinaFace: Single-shot Multi-level Face Localisation in the Wild", 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
RUISHAN LIU ET AL: "The Effects of Memory Replay in Reinforcement Learning", 2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
WANG YA ET AL: "Face image quality assessment in surveillance video based on CNN", Computer Systems & Applications

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114399698A (en) * 2021-11-30 2022-04-26 西安交通大学 Hand washing quality scoring method and system based on smart watch
CN114399698B (en) * 2021-11-30 2024-04-02 西安交通大学 Hand washing quality scoring method and system based on smart watch

Also Published As

Publication number Publication date
CN113420806B (en) 2023-02-03


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant