CN113420806B - Face detection quality scoring method and system - Google Patents

Face detection quality scoring method and system

Info

Publication number
CN113420806B
CN113420806B (application number CN202110688239.2A)
Authority
CN
China
Prior art keywords: network, real, face, training, constructing
Prior art date: 2021-06-21
Legal status: Active
Application number
CN202110688239.2A
Other languages
Chinese (zh)
Other versions
CN113420806A (en)
Inventor
刘芳
任保家
黄欣研
李玲玲
刘洋
刘旭
郭雨薇
郝泽华
Current Assignee
Xidian University
Original Assignee
Xidian University
Priority date: 2021-06-21
Filing date: 2021-06-21
Publication date: 2023-02-03
Application filed by Xidian University
Priority to CN202110688239.2A
Publication of CN113420806A
Application granted
Publication of CN113420806B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217: Validation; Performance evaluation; Active pattern learning techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a face detection quality scoring method and system. A face detection network is constructed and pre-trained so that the model can accurately locate faces. At the same time, a reward function that automatically adjusts the reward-punishment strength during training is proposed; together with the face detection network it forms an environment generator. A shallow convolutional neural network forms an agent that scores face quality. An experience replay strategy and a target Q-network algorithm are adopted when training the agent, which effectively improves the training speed and the performance of the model. By exploiting the large differences between faces of different quality and combining the ideas of deep reinforcement learning with a self-adjusting reward-punishment mechanism, the method scores face quality, can efficiently select high-quality faces from video data, and improves the performance of a face recognition system.

Description

Face detection quality scoring method and system
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a face detection quality scoring method and system.
Background
In recent years, with the rapid development of deep learning, face detection technology has advanced greatly. This progress benefits from continuously updated neural network architectures and the sustained efforts of researchers on face detection theory. Progress in deep-learning-based face detection has also driven the success of related application products: relying on the strong feature extraction capability of deep neural networks and the real-time performance of lightweight networks, face detection has achieved good results in fields such as campus safety and life services.
However, the face recognition system as a whole still faces certain problems. Current face detection algorithms handle the task of finding faces well; what remains in doubt is whether the quality of a detected face meets the recognition standard. In a real video surveillance scene, the state of a person appearing in the video frame is random. This randomness has two aspects. First, changes in the external environment are random: besides uncertain weather conditions, the day-night cycle also affects video quality. Second, the facial expression and pose of a person when they appear on screen are also uncertain. All of these factors have a crucial influence on the final recognition result.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a face detection quality scoring method and system.
The invention adopts the following technical scheme:
a face detection quality scoring method comprises the following steps:
s1, acquiring a face image and corresponding annotation data thereof, and constructing a paired data set in a form of face-face annotation;
s2, constructing a face detection network D, and inputting the paired data sets constructed in the step S1 into the face detection network D in batches for training, wherein the size of each batch is B;
S3, constructing an agent RLQAgent, wherein the input of the agent RLQAgent is a state s;
S4, constructing a reward function R(s, a) that automatically adjusts the reward-punishment strength during training; combining it with the face detection network D of step S2 to form an environment generator Env; Env generates a state s, which is input into the agent RLQAgent constructed in step S3 to obtain a decision action a; a reward value R is obtained according to the state s and the decision action a;
S5, constructing an experience replay pool ReplayBuffer, and caching the data [s, a, R, s'] obtained in step S4, wherein s' is the state generated by the environment Env at the next moment;
S6, constructing a target Q network Q_target and a real-time Q network Q_real, wherein the target Q network Q_target serves as a reference agent RLQAgent for outputting the expectation of the cumulative reward value, and the real-time Q network Q_real serves as the agent RLQAgent trained in real time; using the experience replay pool ReplayBuffer constructed in step S5 to train the real-time Q network Q_real and obtain its network weight Θ;
and S7, initializing the agent RLQAgent with the network weight Θ obtained in step S6, and combining it with the face detection network D of step S2, so that the quality score of a face F is completed while the face F is detected.
Specifically, in step S2, constructing the face detection network D includes:
S201, constructing a backbone network ResNet50 that generates features T_1, T_2, T_3 at 3 different scales;
S202, constructing a first upsampling layer U_1, a second upsampling layer U_2 and a third upsampling layer U_3 to form the feature pyramid network FPN and obtain intermediate features T_1, T_2, T_3;
S203, constructing a context information module SSH comprising a first 3×3 convolution layer, a first 5×5 convolution layer, a second 5×5 convolution layer, a first 7×7 convolution layer and a second 7×7 convolution layer to obtain final features F_1, F_2, F_3;
S204, constructing a face box head BoxHead, a key-point head LandMarkHead and a classifier Classification, and generating the final face position, the face key points and the probability of being a face.
Specifically, in step S3, the step of constructing the agent RLQAgent specifically includes:
S301, constructing an agent network comprising a first convolution layer, a first max-pooling layer, a first BatchNorm layer, a second convolution layer, a second max-pooling layer, a second BatchNorm layer, a third convolution layer, a third max-pooling layer, a third BatchNorm layer and a fully connected layer;
and S302, outputting the action classification a and the expected reward value Q through the agent network.
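As an illustration, the shallow agent network of S301-S302 might be sketched as follows in PyTorch; the input size, channel widths and kernel sizes are assumptions, and the two actions correspond to the good/bad quality judgment described in the detailed description.

```python
# Hedged sketch of the shallow agent network (three conv / max-pool / BatchNorm
# stages and one fully connected layer). Sizes are illustrative assumptions.
import torch
import torch.nn as nn


class RLQAgentNet(nn.Module):
    def __init__(self, num_actions: int = 2):
        super().__init__()

        def stage(cin: int, cout: int) -> nn.Sequential:
            return nn.Sequential(
                nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(),  # convolution layer
                nn.MaxPool2d(2),                                # max-pooling layer
                nn.BatchNorm2d(cout))                           # BatchNorm layer

        self.features = nn.Sequential(stage(3, 16), stage(16, 32), stage(32, 64))
        self.fc = nn.Linear(64, num_actions)                    # fully connected layer

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        # s: a batch of face crops, e.g. shape (B, 3, 64, 64).
        f = self.features(s).mean(dim=(2, 3))  # global average pool -> (B, 64)
        return self.fc(f)                      # expected reward value Q per action
```

During training the action classification is a = Q.argmax(dim=1); at inference a softmax over Q can be read as the face score q ∈ [0, 1] mentioned in the detailed description.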
Specifically, in step S4, the reward function R(s, a) is:
R(s, a) = 1, if the action a is a correct quality decision for the state s;
R(s, a) = -(2 - epoch/Epochs), if the action a is incorrect,
wherein Epochs is the total number of training epochs and epoch is the current epoch, so that the penalty starts at twice the reward and decays back to the same magnitude as the reward by the end of training.
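As a sketch, this self-adjusting reward can be written in a few lines; the linear decay is an assumption reconstructed from the described behavior (a heavier penalty early in training that falls back to the reward's magnitude at the end).

```python
# Hedged sketch of the self-adjusting reward R(s, a). The exact decay schedule
# is an assumption; only its qualitative behavior is described in the text.
def reward(correct: bool, epoch: int, total_epochs: int) -> float:
    if correct:
        return 1.0
    # Penalty scale: 2.0 at the first epoch, decaying linearly to 1.0 at the end.
    return -(2.0 - epoch / total_epochs)
```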
Specifically, in step S5, the experience replay pool ReplayBuffer is a double-ended queue with a fixed capacity of 512.
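A minimal sketch of such a replay pool, assuming uniform random sampling over a fixed-capacity double-ended queue:

```python
# Hedged sketch of the experience replay pool: a deque of (s, a, R, s')
# transitions with capacity 512, as stated above; uniform sampling is assumed.
import random
from collections import deque


class ReplayBuffer:
    def __init__(self, capacity: int = 512):
        self.buffer = deque(maxlen=capacity)   # oldest records are evicted first

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))  # cache one [s, a, R, s'] record

    def sample(self, batch_size: int = 64):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```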
Specifically, step S6 includes:
S601, resetting the environment Env to obtain an initial state s_0;
S602, randomly initializing the experience replay pool ReplayBuffer to obtain [s, a, R, s'] data, wherein s in the [s, a, R, s'] data is the initial state s_0 obtained in step S601;
S603, obtaining 64 samples from the experience replay pool ReplayBuffer;
S604, training the real-time Q network Q_real with the 64 samples obtained in step S603: calculating the loss value, adding a regularization term to constrain the model, and updating the network weight Θ of Q_real by mini-batch stochastic gradient descent;
S605, letting the real-time Q network Q_real interact with the environment Env, and storing the newly generated [s, a, R, s'] records in the experience replay pool ReplayBuffer;
S606, repeating step S605 until 512 records are reached;
S607, judging according to the update frequency whether the target Q network Q_target needs to be updated; if updating is needed, going to step S608, otherwise going to step S609;
S608, copying the weights Θ of the agent Q_real to the target Q network Q_target;
S609, repeating steps S603 to S608 until training is finished, and outputting the weight Θ of the real-time Q network Q_real.
Further, in step S604, the loss value L(Θ) is calculated as follows:
L(Θ) = [y - Q_real(s, a; Θ)]² + λ||Θ||²
wherein y = R(s, a) + γ·max_a' Q_target(s', a') is the accumulated reward expectation, L(Θ) is the loss value, γ is the decay factor, λ is the coefficient of the regularization term, and Θ is the weight of the real-time Q network Q_real.
Further, step S609 is repeated 200 times.
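Under this loss, one S603/S604 update can be sketched as follows; γ, λ, and the optimizer choice are assumptions, and ReplayBuffer and the agent network refer to the sketches given earlier.

```python
# Hedged sketch of one training step: sample 64 transitions (S603), form the
# target-Q regression value y, and take one regularized gradient step (S604).
import torch
import torch.nn.functional as F


def train_step(q_real, q_target, buffer, optimizer,
               gamma: float = 0.9, lam: float = 1e-4, batch_size: int = 64):
    batch = buffer.sample(batch_size)
    s = torch.stack([t[0] for t in batch])
    a = torch.tensor([t[1] for t in batch])
    r = torch.tensor([t[2] for t in batch], dtype=torch.float32)
    s_next = torch.stack([t[3] for t in batch])

    with torch.no_grad():  # the target network is held fixed
        y = r + gamma * q_target(s_next).max(dim=1).values  # accumulated reward expectation
    q = q_real(s).gather(1, a.unsqueeze(1)).squeeze(1)      # Q_real(s, a; Θ)

    l2 = sum((p ** 2).sum() for p in q_real.parameters())   # regularization term
    loss = F.mse_loss(q, y) + lam * l2                      # L(Θ)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                        # mini-batch update of Θ
    return loss.item()
```

Copying Θ to the target network in step S608 then amounts to q_target.load_state_dict(q_real.state_dict()).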
Specifically, step S7 includes:
S701, inputting an image I into the face detection network D to obtain the specific position P of a face;
S702, cropping the face F according to the specific position P of the face;
S703, inputting the face F into the agent network RLQAgent for scoring to obtain a quality score;
S704, outputting the face and the corresponding score [F, score].
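A sketch of this S701-S704 pipeline is given below; detect_boxes is a hypothetical helper standing in for the detector's post-processing, and the crop size and softmax reading are assumptions.

```python
# Hedged sketch of the inference pipeline: detect (S701), crop (S702),
# score (S703), and emit [F, score] (S704).
import torch
import torch.nn.functional as F


def score_faces(image, detector, agent, detect_boxes):
    # image: a (1, 3, H, W) tensor; detect_boxes: hypothetical post-processing
    # that turns the detector's raw outputs into (x1, y1, x2, y2) face boxes.
    results = []
    for x1, y1, x2, y2 in detect_boxes(detector, image):  # S701: positions P
        face = image[:, :, y1:y2, x1:x2]                  # S702: crop the face F
        face = F.interpolate(face, size=(64, 64))         # agent input size (assumed)
        q = agent(face)                                   # S703: agent forward pass
        score = torch.softmax(q, dim=1)[0, 1].item()      # quality score in [0, 1]
        results.append((face, score))                     # S704: [F, score]
    return results
```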
Another technical solution of the present invention is a face detection quality scoring system, comprising:
a data module, which acquires face images and the corresponding annotation data and constructs a paired dataset in the form "face-face annotation";
a training module, which constructs a face detection network D and feeds the paired dataset constructed by the data module into D in batches of size B for training;
an agent module, which constructs an agent RLQAgent whose input is a state s;
a reward module, which constructs a reward function R(s, a) that automatically adjusts the reward-punishment strength during training and combines it with the face detection network D of the training module to form an environment generator Env; Env generates a state s, which is input into the agent RLQAgent constructed by the agent module to obtain a decision action a; a reward value R is obtained according to the state s and the decision action a;
an experience module, which constructs an experience replay pool ReplayBuffer and caches the data [s, a, R, s'] obtained by the reward module, wherein s' is the state generated by the environment Env at the next moment;
a weight module, which constructs a target Q network Q_target and a real-time Q network Q_real, wherein the target Q network Q_target serves as a reference agent RLQAgent for outputting the expectation of the cumulative reward value, and the real-time Q network Q_real serves as the agent RLQAgent trained in real time; the experience replay pool ReplayBuffer constructed by the experience module is used to train the real-time Q network Q_real and obtain its network weight Θ;
and a scoring module, which initializes the agent RLQAgent with the network weight Θ obtained by the weight module and combines it with the face detection network D of the training module, so that the quality score of a face F is completed while the face F is detected.
Compared with the prior art, the invention has at least the following beneficial effects:
the invention discloses a face detection quality scoring method based on depth reinforcement learning and self-adjustment reward and punishment mechanisms. Firstly, constructing a face detection network and pre-training the face detection network so that the model can accurately position the face; simultaneously, a gradually converging reward function is provided, and the reward function and a face detection network form an environment generator; the shallow convolutional neural network is used for forming an intelligent body to score the face quality, and the added calculated amount can be ignored while the scoring function is realized. An experience playback strategy and a target Q network algorithm are adopted during the training of the intelligent agent, so that the training speed and the performance of the model can be effectively improved. The method realizes the quality scoring of the human face by utilizing the characteristic of larger difference between human faces with different qualities and combining the ideas of deep reinforcement learning and self-adjustment reward and punishment mechanisms, and can efficiently process the problem of selecting the key human face from the video data.
Further, the face detection network D is pre-trained using the data set, so that the specific position P of the face can be accurately detected from the video or image.
Furthermore, an intelligent RLQAgent is constructed, and the evaluation on the face quality can be realized by combining a shallow convolutional neural network with a face detection network D under the condition of not increasing the expenditure after training by adopting a reinforcement learning method.
Further, a progressively converging reward function R (s, a) is constructed, which makes it easier to make a wrong decision at an early stage of the agent training because the model has a greater randomness in making decisions. Therefore, at this stage, the penalty is increased. Along with the training, the decision-making capability of the model is continuously enhanced; therefore, the punishment strength is gradually reduced in the later period of training and is recovered to the same level as the reward.
Further, an empirical playback pool ReplayBuffer is constructed for caching [ s, a, R, s' ] data. The training time of the model can be greatly shortened by adopting the strategy of empirical replay.
Furthermore, the target Q network algorithm is used for training the RLQAccent of the intelligent body, and the intelligent body with good decision-making capability can be quickly trained to carry out quality scoring on the face by combining an experience playback strategy.
In summary, the face detection network is firstly constructed and pre-trained, so that the model can accurately position the face; simultaneously, a gradually converging reward function is provided, and the reward function and a face detection network form an environment generator; the shallow convolutional neural network is used for forming an intelligent body to score the face quality, and the added calculated amount can be ignored while the scoring function is realized. An experience playback strategy and a target Q network algorithm are adopted during the training of the intelligent agent, so that the training speed and the performance of the model can be effectively improved. The invention realizes the quality scoring of the human face by utilizing the characteristic of larger difference between human faces with different qualities and combining with the ideas of depth reinforcement learning and self-adjustment reward and punishment mechanisms, and can efficiently process the problem of selecting a key human face from video data.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
FIG. 1 is a block diagram of a face detection network D of the present invention;
FIG. 2 is a network architecture diagram of an agent RLQAgent;
FIG. 3 is a diagram of the interaction of an agent RLQAgent with the environment Env;
FIG. 4 is a schematic diagram of a target Q network algorithm;
FIG. 5 is a comparison of the results of scoring, wherein (a) is the method of the present invention and (b) is the faceQNet method.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Various structural schematics according to the disclosed embodiments of the invention are shown in the drawings. The figures are not drawn to scale, wherein certain details are exaggerated and possibly omitted for clarity of presentation. The shapes of various regions, layers and their relative sizes and positional relationships shown in the drawings are merely exemplary, and deviations may occur in practice due to manufacturing tolerances or technical limitations, and a person skilled in the art may additionally design regions/layers having different shapes, sizes, relative positions, according to actual needs.
The invention provides a face detection quality scoring method based on deep reinforcement learning and a self-adjusting reward-punishment mechanism. First, a face detection network is constructed and pre-trained so that the model can accurately locate faces. At the same time, a gradually converging reward function is proposed, which together with the face detection network forms an environment generator. A shallow convolutional neural network forms the agent that scores face quality; the added computation is negligible while the scoring function is realized. An experience replay strategy and a target Q-network algorithm are adopted when training the agent, which effectively improves the training speed and the performance of the model. By exploiting the large differences between faces of different quality and combining deep reinforcement learning with a self-adjusting reward-punishment mechanism, the method scores face quality and can efficiently handle the problem of selecting key faces from video data.
The invention relates to a face detection quality scoring method, which comprises the following steps:
S1, acquiring face images X = {X_k | k = 1, …, K} and the corresponding annotation data {I(X_k), P(X_k), L(X_k) | k = 1, …, K}, wherein X is a face image and K is the number of face images, X ∈ R^(N×N), and R denotes the real-number domain; I(X) ∈ {0, 1} indicates whether the image is a face; P(X) ∈ R^(8×1) indicates the face position; L(X) ∈ R^(10×1) indicates the positions of the face key points; a paired dataset is constructed in the form "face-face annotation";
S2, constructing a face detection network D, and inputting the paired dataset of step S1 into the face detection network D in batches for training, wherein the size of each batch is B;
Referring to fig. 1, the face detection network D specifically includes:
S201, a backbone network ResNet50 for generating features T_1, T_2, T_3 at 3 different scales;
S202, a feature pyramid network FPN composed of a first upsampling layer U_1, a second upsampling layer U_2 and a third upsampling layer U_3 for obtaining the intermediate features T_1, T_2, T_3;
S203, a context information module SSH composed of a first 3×3 convolution layer, a first 5×5 convolution layer, a second 5×5 convolution layer, a first 7×7 convolution layer and a second 7×7 convolution layer for obtaining the final features F_1, F_2, F_3;
S204, a face box head BoxHead, a key-point head LandMarkHead and a classifier Classification for generating the final face positions, the face key points and the probability of being a face.
S3, constructing an agent RLQAgent; the input of the agent RLQAgent is a state s, i.e., face images in different poses; in the training stage the output is an action a, i.e., a judgment of whether the face quality is good or bad; in the inference stage the output is a face score q, with q ∈ [0, 1];
Referring to fig. 2, constructing the agent RLQAgent specifically includes:
S301, constructing an agent network composed of a first convolution layer, a first max-pooling layer, a first BatchNorm layer, a second convolution layer, a second max-pooling layer, a second BatchNorm layer, a third convolution layer, a third max-pooling layer, a third BatchNorm layer and a fully connected layer;
S302, outputting the action classification a and the expected reward value Q.
S4, constructing the reward function R(s, a) and combining it with the face detection network D into an environment generator Env; Env generates a state s and sends it to the agent RLQAgent of step S3 to obtain a decision action a; a reward value R is then obtained according to the state s and the action a. The constructed reward function R(s, a) is specifically:
R(s, a) = 1, if the action a is a correct quality decision for the state s;
R(s, a) = -(2 - epoch/Epochs), if the action a is incorrect,
wherein a is the action generated by the agent according to the state, Epochs is the total number of training epochs, and epoch is the current epoch.
S5, constructing an experience replay pool ReplayBuffer for caching [s, a, R, s'] data, wherein a is the action executed by the agent RLQAgent according to the state s, R is the reward value given by the environment, and s' is the state generated by the environment Env at the next moment.
Specifically, the constructed experience replay pool is a double-ended queue with a fixed capacity of 512, used to store the historical decision data [s, a, R, s'] of the agent.
Referring to fig. 3, the specific process by which the agent interacts with the environment and stores experience data is as follows: the face detection network D outputs faces of different quality, i.e., a state s; the agent RLQAgent obtains an action a according to the state s; a specific reward value R is calculated using the reward function R(s, a); the face detection network D outputs the next-time state s'; and the data pair [s, a, R, s'] is cached in the experience replay pool ReplayBuffer.
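This interaction loop can be sketched as follows; env.reset and env.step are hypothetical names for the environment generator's interface (the environment holds the quality labels, so it can report whether a decision was correct), and reward is the sketch given earlier for R(s, a).

```python
# Hedged sketch of the fig. 3 loop: act, reward, and cache [s, a, R, s'].
def collect(env, agent, buffer, epoch: int, total_epochs: int, steps: int = 512):
    s = env.reset()                                     # hypothetical Env interface
    for _ in range(steps):
        a = agent(s.unsqueeze(0)).argmax(dim=1).item()  # decision action a
        correct, s_next = env.step(a)                   # next-time state s'
        r = reward(correct, epoch, total_epochs)        # reward value R(s, a)
        buffer.push(s, a, r, s_next)                    # cache [s, a, R, s']
        s = s_next
```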
S6, constructing a target Q network Q_target and a real-time Q network Q_real; Q_target is the reference agent for outputting the expectation of the cumulative reward value; Q_real is the agent trained in real time; the experience replay pool ReplayBuffer of step S5 is used to train the real-time Q network Q_real, obtaining the network weight Θ;
Referring to fig. 4, the flow of the target Q-network algorithm is specifically:
S601, resetting the environment Env to obtain an initial state s_0;
S602, randomly initializing the experience replay pool ReplayBuffer to obtain [s, a, R, s'] data; s in the first record of the ReplayBuffer is the s_0 obtained in step S601;
S603, obtaining 64 samples from the experience replay pool ReplayBuffer;
S604, calculating the loss value, adding a regularization term to constrain the model, and updating the weight Θ of the real-time Q network Q_real by mini-batch stochastic gradient descent; the loss value L(Θ) is calculated as:
L(Θ) = [y - Q_real(s, a; Θ)]² + λ||Θ||²
wherein y = R(s, a) + γ·max_a' Q_target(s', a'), R(s, a) is the reward value output by the environment according to the state s and the action a taken by the agent, γ is the decay factor, λ is the coefficient of the regularization term, and Θ is the weight of the real-time Q network Q_real;
S605, letting the agent Q_real interact with the environment Env, and storing the newly generated [s, a, R(s, a), s'] records in the experience replay pool ReplayBuffer;
S606, repeating step S605 until 512 records are reached;
S607, judging according to the update frequency whether the target Q network Q_target needs to be updated; if so, going to step S608, otherwise going to step S609;
S608, copying the weights Θ of the agent Q_real to the target Q network Q_target;
S609, repeating steps S603 to S608 until 200 iterations are reached.
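Assembled from the sketches above, the full S601-S609 flow might read as follows; the update frequency is not stated in the text, so update_every is an assumption, and the 200 outer iterations are used here as the epoch counter for the reward schedule.

```python
# Hedged sketch of the fig. 4 target-Q training loop (S601-S609).
def train(env, q_real, q_target, buffer, optimizer,
          iterations: int = 200, update_every: int = 4):
    # S601-S602: reset the environment and seed the replay pool.
    collect(env, q_real, buffer, epoch=0, total_epochs=iterations)
    for it in range(iterations):                           # S609: 200 iterations
        train_step(q_real, q_target, buffer, optimizer)    # S603-S604
        collect(env, q_real, buffer, epoch=it, total_epochs=iterations)  # S605-S606
        if it % update_every == 0:                         # S607: update frequency
            q_target.load_state_dict(q_real.state_dict())  # S608: copy weights Θ
    return q_real.state_dict()                             # output the weight Θ
```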
S7, combining the agent RLQAgent with the face detection network D, so that a quality score for a face F is produced while the face F is detected:
S701, inputting an image I into the face detection network D to obtain the specific position P of a face;
S702, cropping the face F according to the specific position P of the face;
S703, inputting the face F into the agent network RLQAgent for scoring to obtain a quality score;
S704, outputting the face and the corresponding score [F, score].
In another embodiment of the present invention, a face detection quality scoring system is provided, which can be used to implement the above-mentioned face detection quality scoring method, and specifically, the face detection quality scoring system includes a data module, a training module, an agent module, a reward module, an experience module, a weight module, and a scoring module.
The data module acquires face images and the corresponding annotation data, and constructs a paired dataset in the form "face-face annotation";
the training module constructs a face detection network D and feeds the paired dataset constructed by the data module into D in batches of size B for training;
the agent module constructs an agent RLQAgent whose input is a state s;
the reward module constructs a reward function R(s, a) that automatically adjusts the reward-punishment strength during training and combines it with the face detection network D of the training module to form an environment generator Env; Env generates a state s, which is input into the agent RLQAgent constructed by the agent module to obtain a decision action a; a reward value R is obtained according to the state s and the decision action a;
the experience module constructs an experience replay pool ReplayBuffer and caches the data [s, a, R, s'] obtained by the reward module, wherein s' is the state generated by the environment Env at the next moment;
the weight module constructs a target Q network Q_target and a real-time Q network Q_real, wherein the target Q network Q_target serves as a reference agent RLQAgent for outputting the expectation of the cumulative reward value, and the real-time Q network Q_real serves as the agent RLQAgent trained in real time; the experience replay pool ReplayBuffer constructed by the experience module is used to train the real-time Q network Q_real and obtain its network weight Θ;
and the scoring module initializes the agent RLQAgent with the network weight Θ obtained by the weight module and combines it with the face detection network D of the training module, so that the quality score of a face F is completed while the face F is detected.
In yet another embodiment of the present invention, a terminal device is provided that includes a processor and a memory for storing a computer program comprising program instructions, the processor being configured to execute the program instructions stored in the computer storage medium. The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.; it is the computing and control core of the terminal and is adapted to load and execute one or more instructions to implement the corresponding method flow or function. The processor of the embodiment of the invention can be used to perform the face detection quality scoring method, comprising the following steps:
acquiring face images and the corresponding annotation data, and constructing a paired dataset in the form "face-face annotation"; constructing a face detection network D and feeding the paired dataset into D in batches of size B for training; constructing an agent RLQAgent whose input is a state s; constructing a reward function R(s, a) that automatically adjusts the reward-punishment strength during training; combining it with the face detection network D to form an environment generator Env, wherein Env generates a state s that is input into the agent RLQAgent to obtain a decision action a; obtaining a reward value R according to the state s and the decision action a; constructing an experience replay pool ReplayBuffer for caching the data [s, a, R, s'], wherein s' is the state generated by the environment Env at the next moment; constructing a target Q network Q_target and a real-time Q network Q_real, wherein the target Q network Q_target serves as a reference agent RLQAgent for outputting the expectation of the cumulative reward value, and the real-time Q network Q_real serves as the agent RLQAgent trained in real time; training the real-time Q network Q_real with the experience replay pool ReplayBuffer to obtain its network weight Θ; and initializing the agent RLQAgent with the network weight Θ and combining it with the face detection network D, so that the quality score of a face F is completed while the face F is detected.
In still another embodiment of the present invention, the present invention further provides a storage medium, specifically a computer-readable storage medium (Memory), which is a Memory device in a terminal device and is used for storing programs and data. It is understood that the computer readable storage medium herein may include a built-in storage medium in the terminal device, and may also include an extended storage medium supported by the terminal device. The computer-readable storage medium provides a storage space storing an operating system of the terminal. Also, one or more instructions, which may be one or more computer programs (including program code), are stored in the memory space and are adapted to be loaded and executed by the processor. It should be noted that the computer readable storage medium may be a high-speed RAM memory, or a non-volatile memory (non-volatile memory), such as at least one disk memory.
One or more instructions stored in the computer-readable storage medium can be loaded and executed by a processor to realize the corresponding steps of the face detection quality scoring method in the above embodiment; the one or more instructions in the computer-readable storage medium are loaded by the processor and perform the following steps:
acquiring face images and the corresponding annotation data, and constructing a paired dataset in the form "face-face annotation"; constructing a face detection network D and feeding the paired dataset into D in batches of size B for training; constructing an agent RLQAgent whose input is a state s; constructing a reward function R(s, a) that automatically adjusts the reward-punishment strength during training; combining it with the face detection network D to form an environment generator Env, wherein Env generates a state s that is input into the agent RLQAgent to obtain a decision action a; obtaining a reward value R according to the state s and the decision action a; constructing an experience replay pool ReplayBuffer for caching the data [s, a, R, s'], wherein s' is the state generated by the environment Env at the next moment; constructing a target Q network Q_target and a real-time Q network Q_real, wherein the target Q network Q_target serves as a reference agent RLQAgent for outputting the expectation of the cumulative reward value, and the real-time Q network Q_real serves as the agent RLQAgent trained in real time; training the real-time Q network Q_real with the experience replay pool ReplayBuffer to obtain its network weight Θ; and initializing the agent RLQAgent with the network weight Θ and combining it with the face detection network D, so that the quality score of a face F is completed while the face F is detected.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The effects of the present invention can be further illustrated by the following simulation results.
1. Simulation conditions
The hardware used for the simulation is a graphics workstation of the intelligent perception and image understanding laboratory, carrying four GPUs with 11 GB of video memory each. The datasets used in the simulation are the CFP face image set and the CAS-PEAL face dataset. The CFP dataset contains about 500 IDs and about 7000 pictures. The CAS-PEAL dataset contains about 1040 IDs and about 99,450 pictures. According to the poses of the faces in the datasets, the data are divided into two classes: good quality and poor quality. 80% of the data are used for training and 20% for testing.
2. Simulation content
Using the above datasets, the proposed method is compared with a scoring method that uses only deep learning; the accuracy results on the test set are shown in Table 1.
TABLE 1
(Classification accuracy of the compared methods on the test set; the table contents are reproduced as an image in the original publication.)
3. Analysis of simulation results
Referring to FIG. 5, histograms are shown of the scores produced by the proposed RLQAgent model and the faceQNet model on about 100,000 faces from the CFP test set and the CAS-PEAL dataset. With the proposed method, the scores concentrate in two intervals: the number of faces in [0, 0.4] is substantially equal to the number in [0.65, 1.0], which is consistent with the distribution of the test data, while most of the faceQNet scores are scattered in the interval [0.2, 0.6]. Table 1 gives the classification accuracy of the above methods on the test set, and it can be seen that the proposed method achieves better results.
In summary, the face detection quality scoring method based on deep reinforcement learning and a self-adjusting reward-punishment mechanism can score face quality at the same time as the face is detected. First, a face detection network is constructed and pre-trained so that the model can accurately locate faces; at the same time, a reward function that lets the model converge quickly is proposed, which together with the face detection network forms an environment generator; a shallow convolutional neural network forms the agent that scores face quality, and the added computation is negligible while the scoring function is realized.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above content is only intended to illustrate the technical idea of the present invention and should not limit its protection scope; any modification made on the basis of the technical idea proposed by the present invention falls within the protection scope of the claims of the present invention.

Claims (7)

1. A face detection quality scoring method is characterized by comprising the following steps:
s1, acquiring a face image and corresponding annotation data thereof, and constructing a paired data set in a form of face-face annotation;
s2, constructing a face detection network D, and inputting the paired data sets constructed in the step S1 into the face detection network D in batches for training, wherein the size of each batch is B;
S3, constructing an agent RLQAgent, wherein the input of the agent RLQAgent is a state s;
S4, constructing a reward function R(s, a) that automatically adjusts the reward-punishment strength during training; combining it with the face detection network D of step S2 to form an environment generator Env; Env generates a state s, which is input into the agent RLQAgent constructed in step S3 to obtain a decision action a; a reward value R is obtained according to the state s and the decision action a;
S5, constructing an experience replay pool ReplayBuffer, and caching the data [s, a, R, s'] obtained in step S4, wherein s' is the state generated by the environment Env at the next moment;
S6, constructing a target Q network Q_target and a real-time Q network Q_real, wherein the target Q network Q_target serves as a reference agent RLQAgent for outputting the expectation of the cumulative reward value, and the real-time Q network Q_real serves as the agent RLQAgent trained in real time; using the experience replay pool ReplayBuffer constructed in step S5 to train the real-time Q network Q_real and obtain its network weight Θ, specifically:
S601, resetting the environment Env to obtain an initial state s_0;
S602, randomly initializing the experience replay pool ReplayBuffer to obtain [s, a, R, s'] data, wherein s in the [s, a, R, s'] data is the initial state s_0 obtained in step S601;
S603, obtaining 64 samples from the experience replay pool ReplayBuffer;
S604, training the real-time Q network Q_real with the 64 samples obtained in step S603: calculating the loss value, adding a regularization term to constrain the model, and updating the network weight Θ of Q_real by mini-batch stochastic gradient descent, the loss value L(Θ) being calculated as:
L(Θ) = [y - Q_real(s, a; Θ)]² + λ||Θ||²
wherein y = R(s, a) + γ·max_a' Q_target(s', a') is the cumulative reward expectation, L(Θ) is the loss value, γ is the decay factor, λ is the coefficient of the regularization term, and Θ is the weight of the real-time Q network Q_real;
S605, letting the real-time Q network Q_real interact with the environment Env, and storing the newly generated [s, a, R, s'] records in the experience replay pool ReplayBuffer;
S606, repeating step S605 until 512 records are reached;
S607, judging according to the update frequency whether the target Q network Q_target needs to be updated; if updating is needed, going to step S608, otherwise going to step S609;
S608, copying the weights Θ of the agent Q_real to the target Q network Q_target;
S609, repeating steps S603 to S608 until 200 iterations are reached, and outputting the weight Θ of the real-time Q network Q_real;
and S7, initializing the agent RLQAgent with the network weight Θ obtained in step S6, and combining it with the face detection network D of step S2, so that the quality score of a face F is completed while the face F is detected.
2. The method according to claim 1, wherein in step S2, constructing the face detection network D specifically includes:
S201, constructing a backbone network ResNet50 and generating features T_1, T_2, T_3 at 3 different scales;
S202, constructing a first upsampling layer U_1, a second upsampling layer U_2 and a third upsampling layer U_3 to form the feature pyramid network FPN and obtain intermediate features T_1, T_2, T_3;
S203, constructing a context information module SSH comprising a first 3×3 convolution layer, a first 5×5 convolution layer, a second 5×5 convolution layer, a first 7×7 convolution layer and a second 7×7 convolution layer to obtain final features F_1, F_2, F_3;
S204, constructing a BoxHead, a LandmarkHead and a Classification head, and generating the final face position, the face key points and the probability of being a face.
3. The method according to claim 1, characterized in that in step S3, the step of constructing the agent RLQAgent is embodied as:
S301, constructing an agent network comprising a first convolution layer, a first max-pooling layer, a first BatchNorm layer, a second convolution layer, a second max-pooling layer, a second BatchNorm layer, a third convolution layer, a third max-pooling layer, a third BatchNorm layer and a fully connected layer;
and S302, outputting the action classification a and the expected reward value Q through the agent network.
4. The method according to claim 1, wherein in step S4, the reward function R(s, a) is specifically:
R(s, a) = 1, if the action a is a correct quality decision for the state s;
R(s, a) = -(2 - epoch/Epochs), if the action a is incorrect,
wherein Epochs is the total number of training epochs and epoch is the current epoch.
5. The method according to claim 1, wherein in step S5, the experience replay pool ReplayBuffer is a double-ended queue with a fixed capacity of 512.
6. The method according to claim 1, wherein step S7 is specifically:
s701, inputting the image I into a face detection network D to obtain a specific position P of a face;
S702, cropping the face F according to the specific position P of the face;
S703, inputting the face F into the agent network RLQAgent for scoring to obtain a quality score;
s704, outputting the face and the corresponding score [ F, score ].
7. A face detection quality scoring system, comprising:
the data module is used for acquiring a face image and corresponding annotation data thereof and constructing a paired data set in the form of face-face annotation;
the training module is used for constructing a face detection network D, inputting the paired data sets constructed by the data module into the face detection network D in batches for training, wherein the size of each batch is B;
the agent module is used for constructing an agent RLQAgent, wherein the input of the agent RLQAgent is a state s;
the reward module constructs a reward function R(s, a) that automatically adjusts the reward-punishment strength during training, and combines it with the face detection network D of the training module to form an environment generator Env; Env generates a state s, which is input into the agent RLQAgent constructed by the agent module to obtain a decision action a; a reward value R is obtained according to the state s and the decision action a;
the experience module is used for constructing an experience replay pool ReplayBuffer and caching the data [s, a, R, s'] obtained by the reward module, wherein s' is the state generated by the environment Env at the next moment;
the weight module constructs a target Q network Q_target and a real-time Q network Q_real, wherein the target Q network Q_target serves as a reference agent RLQAgent for outputting the expectation of the cumulative reward value, and the real-time Q network Q_real serves as the agent RLQAgent trained in real time; the experience replay pool ReplayBuffer constructed by the experience module is used to train the real-time Q network Q_real and obtain its network weight Θ, specifically:
resetting the environment Env to obtain an initial state s_0; randomly initializing the experience replay pool ReplayBuffer to obtain [s, a, R, s'] data, wherein s in the [s, a, R, s'] data is the initial state s_0; obtaining 64 samples from the experience replay pool ReplayBuffer; training the real-time Q network Q_real with the 64 samples: calculating the loss value, adding a regularization term to constrain the model, and updating the network weight Θ of Q_real by mini-batch stochastic gradient descent, the loss value L(Θ) being calculated as:
L(Θ) = [y - Q_real(s, a; Θ)]² + λ||Θ||²
wherein y = R(s, a) + γ·max_a' Q_target(s', a') is the cumulative reward expectation, L(Θ) is the loss value, γ is the decay factor, λ is the coefficient of the regularization term, and Θ is the weight of the real-time Q network Q_real;
letting the real-time Q network Q_real interact with the environment Env, and storing the newly generated [s, a, R, s'] records in the experience replay pool ReplayBuffer; repeating until 512 records are reached; judging according to the update frequency whether the target Q network Q_target needs to be updated; if updating is needed, copying the weights Θ of the agent Q_real to the target Q network Q_target, otherwise continuing; repeating for 200 iterations and outputting the weight Θ of the real-time Q network Q_real;
and the scoring module is used for initializing the agent RLQAgent with the network weight Θ obtained by the weight module, and combining it with the face detection network D of the training module, so that the quality score of a face F is completed while the face F is detected.
CN202110688239.2A 2021-06-21 2021-06-21 Face detection quality scoring method and system Active CN113420806B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110688239.2A CN113420806B (en) 2021-06-21 2021-06-21 Face detection quality scoring method and system


Publications (2)

Publication Number Publication Date
CN113420806A CN113420806A (en) 2021-09-21
CN113420806B true CN113420806B (en) 2023-02-03

Family

ID=77789635

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110688239.2A Active CN113420806B (en) 2021-06-21 2021-06-21 Face detection quality scoring method and system

Country Status (1)

Country Link
CN (1) CN113420806B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114399698B (en) * 2021-11-30 2024-04-02 西安交通大学 Hand washing quality scoring method and system based on intelligent watch


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9679258B2 (en) * 2013-10-08 2017-06-13 Google Inc. Methods and apparatus for reinforcement learning
CA3032182A1 (en) * 2018-01-31 2019-07-31 Royal Bank Of Canada Pre-training neural netwoks with human demonstrations for deep reinforcement learning

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018083671A1 (en) * 2016-11-04 2018-05-11 Deepmind Technologies Limited Reinforcement learning with auxiliary tasks
CN108446619A (en) * 2018-03-12 2018-08-24 清华大学 Face critical point detection method and device based on deeply study
CN110866471A (en) * 2019-10-31 2020-03-06 Oppo广东移动通信有限公司 Face image quality evaluation method and device, computer readable medium and communication terminal
CN112800893A (en) * 2021-01-18 2021-05-14 南京航空航天大学 Human face attribute editing method based on reinforcement learning
CN112749686A (en) * 2021-01-29 2021-05-04 腾讯科技(深圳)有限公司 Image detection method, image detection device, computer equipment and storage medium
CN112801290A (en) * 2021-02-26 2021-05-14 中国人民解放军陆军工程大学 Multi-agent deep reinforcement learning method, system and application

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Hado van Hasselt et al., "Deep Reinforcement Learning with Double Q-Learning", Thirtieth AAAI Conference on Artificial Intelligence, 2016-03-02, pp. 2094-2100 *
Jiankang Deng et al., "RetinaFace: Single-shot Multi-level Face Localisation in the Wild", 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020-08-05, pp. 5202-5211 *
Ruishan Liu et al., "The Effects of Memory Replay in Reinforcement Learning", 2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2019-02-07, pp. 1-15 *
Wang Ya et al., "CNN-based face image quality assessment in surveillance video", Computer Systems & Applications (计算机系统应用), no. 11, 2018-11-14, pp. 71-77 *

Also Published As

Publication number Publication date
CN113420806A (en) 2021-09-21

Similar Documents

Publication Publication Date Title
CN111259982B (en) Attention mechanism-based premature infant retina image classification method and device
CN112396002B (en) SE-YOLOv 3-based lightweight remote sensing target detection method
CN110135267B (en) Large-scene SAR image fine target detection method
CN110458165B (en) Natural scene text detection method introducing attention mechanism
JP2022529557A (en) Medical image segmentation methods, medical image segmentation devices, electronic devices and computer programs
CN110929577A (en) Improved target identification method based on YOLOv3 lightweight framework
CN107945153A (en) A kind of road surface crack detection method based on deep learning
CN110189255A (en) Method for detecting human face based on hierarchical detection
CN112070729A (en) Anchor-free remote sensing image target detection method and system based on scene enhancement
CN112330684B (en) Object segmentation method and device, computer equipment and storage medium
CN114332578A (en) Image anomaly detection model training method, image anomaly detection method and device
CN110738160A (en) human face quality evaluation method combining with human face detection
CN107491734A (en) Semi-supervised Classification of Polarimetric SAR Image method based on multi-core integration Yu space W ishart LapSVM
CN110930378A (en) Emphysema image processing method and system based on low data demand
US20230053911A1 (en) Detecting an object in an image using multiband and multidirectional filtering
CN106296734A (en) Based on extreme learning machine and the target tracking algorism of boosting Multiple Kernel Learning
CN113420806B (en) Face detection quality scoring method and system
CN111694954B (en) Image classification method and device and electronic equipment
CN112232411A (en) Optimization method of HarDNet-Lite on embedded platform
CN113239866B (en) Face recognition method and system based on space-time feature fusion and sample attention enhancement
CN114155551A (en) Improved pedestrian detection method and device based on YOLOv3 under complex environment
CN111429414B (en) Artificial intelligence-based focus image sample determination method and related device
CN112883930A (en) Real-time true and false motion judgment method based on full-connection network
CN117315223A (en) Target detection method based on transducer architecture
CN116403127A (en) Unmanned aerial vehicle aerial image target detection method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant