CN113420806B - Face detection quality scoring method and system - Google Patents

Face detection quality scoring method and system

Info

Publication number
CN113420806B
CN113420806B (application number CN202110688239.2A)
Authority
CN
China
Prior art keywords: network, real, face, training, constructing
Prior art date: 2021-06-21
Legal status: Active
Application number
CN202110688239.2A
Other languages
Chinese (zh)
Other versions
CN113420806A (en)
Inventor
刘芳
任保家
黄欣研
李玲玲
刘洋
刘旭
郭雨薇
郝泽华
Current Assignee
Xidian University
Original Assignee
Xidian University
Priority date: 2021-06-21
Filing date: 2021-06-21
Publication date: 2023-02-03
Application filed by Xidian University
Priority to CN202110688239.2A
Publication of CN113420806A
Application granted
Publication of CN113420806B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217: Validation; Performance evaluation; Active pattern learning techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a face detection quality scoring method and system. A face detection network is constructed and pre-trained so that the model can accurately locate faces. At the same time, a reward function that automatically adjusts the reward-punishment strength during training is proposed; together with the face detection network it forms an environment generator. A shallow convolutional neural network forms an agent that scores face quality. An experience replay strategy and a target Q-network algorithm are adopted when training the agent, which effectively improves the training speed and the performance of the model. By exploiting the large differences between faces of different quality and combining the ideas of deep reinforcement learning with a self-adjusting reward-punishment mechanism, the method scores face quality, can efficiently select high-quality faces from video data, and improves the performance of a face recognition system.

Description

Face detection quality scoring method and system
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a face detection quality scoring method and system.
Background
In recent years, with the rapid development of deep learning, face detection technology has advanced greatly. This progress benefits from continuously updated neural network architectures and the sustained efforts of researchers on face detection theory. Progress in deep-learning-based face detection has also driven the success of related application products: relying on the strong feature extraction capability of deep neural networks and the real-time performance of lightweight networks, face detection has achieved good results in fields such as campus safety and life services.
However, the face recognition system as a whole still faces certain problems. Current face detection algorithms handle the task of finding faces well; what remains in doubt is whether the quality of a detected face meets the recognition standard. In a real video surveillance scene, the state of a person appearing in the video frame is random. This randomness has two aspects. First, changes in the external environment are random: besides uncertain weather conditions, the day-night cycle also affects video quality. Second, the facial expression and pose of a person when they appear on screen are also uncertain. All of these factors have a crucial influence on the final recognition result.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a face detection quality scoring method and system.
The invention adopts the following technical scheme:
a face detection quality scoring method comprises the following steps:
s1, acquiring a face image and corresponding annotation data thereof, and constructing a paired data set in a form of face-face annotation;
s2, constructing a face detection network D, and inputting the paired data sets constructed in the step S1 into the face detection network D in batches for training, wherein the size of each batch is B;
S3, constructing an agent RLQAgent, wherein the input of the agent RLQAgent is a state s;
S4, constructing a reward function R(s, a) that automatically adjusts the reward-punishment strength during training; combining it with the face detection network D of step S2 to form an environment generator Env; Env generates a state s, which is input into the agent RLQAgent constructed in step S3 to obtain a decision action a; a reward value R is obtained according to the state s and the decision action a;
S5, constructing an experience replay pool ReplayBuffer, and caching the data [s, a, R, s'] obtained in step S4, wherein s' is the state generated by the environment Env at the next moment;
S6, constructing a target Q network Q_target and a real-time Q network Q_real, wherein the target Q network Q_target serves as a reference agent RLQAgent for outputting the expectation of the cumulative reward value, and the real-time Q network Q_real serves as the agent RLQAgent trained in real time; using the experience replay pool ReplayBuffer constructed in step S5 to train the real-time Q network Q_real and obtain its network weight Θ;
and S7, initializing the agent RLQAgent with the network weight Θ obtained in step S6, and combining it with the face detection network D of step S2, so that the quality score of a face F is completed while the face F is detected.
Specifically, in step S2, constructing the face detection network D includes:
S201, constructing a backbone network ResNet50 that generates features T_1, T_2, T_3 at 3 different scales;
S202, constructing a first upsampling layer U_1, a second upsampling layer U_2 and a third upsampling layer U_3 to form the feature pyramid network FPN and obtain intermediate features T_1, T_2, T_3;
S203, constructing a context information module SSH comprising a first 3×3 convolution layer, a first 5×5 convolution layer, a second 5×5 convolution layer, a first 7×7 convolution layer and a second 7×7 convolution layer to obtain final features F_1, F_2, F_3;
S204, constructing a face box head BoxHead, a key-point head LandMarkHead and a classifier Classification, and generating the final face position, the face key points and the probability of being a face.
Specifically, in step S3, the step of constructing the agent RLQAgent specifically includes:
S301, constructing an agent network comprising a first convolution layer, a first max-pooling layer, a first BatchNorm layer, a second convolution layer, a second max-pooling layer, a second BatchNorm layer, a third convolution layer, a third max-pooling layer, a third BatchNorm layer and a fully connected layer;
and S302, outputting the action classification a and the expected reward value Q through the agent network.
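As an illustration, the shallow agent network of S301-S302 might be sketched as follows in PyTorch; the input size, channel widths and kernel sizes are assumptions, and the two actions correspond to the good/bad quality judgment described in the detailed description.

```python
# Hedged sketch of the shallow agent network (three conv / max-pool / BatchNorm
# stages and one fully connected layer). Sizes are illustrative assumptions.
import torch
import torch.nn as nn


class RLQAgentNet(nn.Module):
    def __init__(self, num_actions: int = 2):
        super().__init__()

        def stage(cin: int, cout: int) -> nn.Sequential:
            return nn.Sequential(
                nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(),  # convolution layer
                nn.MaxPool2d(2),                                # max-pooling layer
                nn.BatchNorm2d(cout))                           # BatchNorm layer

        self.features = nn.Sequential(stage(3, 16), stage(16, 32), stage(32, 64))
        self.fc = nn.Linear(64, num_actions)                    # fully connected layer

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        # s: a batch of face crops, e.g. shape (B, 3, 64, 64).
        f = self.features(s).mean(dim=(2, 3))  # global average pool -> (B, 64)
        return self.fc(f)                      # expected reward value Q per action
```

During training the action classification is a = Q.argmax(dim=1); at inference a softmax over Q can be read as the face score q ∈ [0, 1] mentioned in the detailed description.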
Specifically, in step S4, the reward function R(s, a) is:
R(s, a) = 1, if the action a is a correct quality decision for the state s;
R(s, a) = -(2 - epoch/Epochs), if the action a is incorrect,
wherein Epochs is the total number of training epochs and epoch is the current epoch, so that the penalty starts at twice the reward and decays back to the same magnitude as the reward by the end of training.
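As a sketch, this self-adjusting reward can be written in a few lines; the linear decay is an assumption reconstructed from the described behavior (a heavier penalty early in training that falls back to the reward's magnitude at the end).

```python
# Hedged sketch of the self-adjusting reward R(s, a). The exact decay schedule
# is an assumption; only its qualitative behavior is described in the text.
def reward(correct: bool, epoch: int, total_epochs: int) -> float:
    if correct:
        return 1.0
    # Penalty scale: 2.0 at the first epoch, decaying linearly to 1.0 at the end.
    return -(2.0 - epoch / total_epochs)
```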
Specifically, in step S5, the experience replay pool ReplayBuffer is a double-ended queue with a fixed capacity of 512.
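A minimal sketch of such a replay pool, assuming uniform random sampling over a fixed-capacity double-ended queue:

```python
# Hedged sketch of the experience replay pool: a deque of (s, a, R, s')
# transitions with capacity 512, as stated above; uniform sampling is assumed.
import random
from collections import deque


class ReplayBuffer:
    def __init__(self, capacity: int = 512):
        self.buffer = deque(maxlen=capacity)   # oldest records are evicted first

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))  # cache one [s, a, R, s'] record

    def sample(self, batch_size: int = 64):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```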
Specifically, step S6 includes:
S601, resetting the environment Env to obtain an initial state s_0;
S602, randomly initializing the experience replay pool ReplayBuffer to obtain [s, a, R, s'] data, wherein s in the [s, a, R, s'] data is the initial state s_0 obtained in step S601;
S603, obtaining 64 samples from the experience replay pool ReplayBuffer;
S604, training the real-time Q network Q_real with the 64 samples obtained in step S603: calculating the loss value, adding a regularization term to constrain the model, and updating the network weight Θ of Q_real by mini-batch stochastic gradient descent;
S605, letting the real-time Q network Q_real interact with the environment Env, and storing the newly generated [s, a, R, s'] records in the experience replay pool ReplayBuffer;
S606, repeating step S605 until 512 records are reached;
S607, judging according to the update frequency whether the target Q network Q_target needs to be updated; if updating is needed, going to step S608, otherwise going to step S609;
S608, copying the weights Θ of the agent Q_real to the target Q network Q_target;
S609, repeating steps S603 to S608 until training is finished, and outputting the weight Θ of the real-time Q network Q_real.
Further, in step S604, the loss value L(Θ) is calculated as follows:
L(Θ) = [y - Q_real(s, a; Θ)]² + λ||Θ||²
wherein y = R(s, a) + γ·max_a' Q_target(s', a') is the accumulated reward expectation, L(Θ) is the loss value, γ is the decay factor, λ is the coefficient of the regularization term, and Θ is the weight of the real-time Q network Q_real.
Further, step S609 is repeated 200 times.
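Under this loss, one S603/S604 update can be sketched as follows; γ, λ, and the optimizer choice are assumptions, and ReplayBuffer and the agent network refer to the sketches given earlier.

```python
# Hedged sketch of one training step: sample 64 transitions (S603), form the
# target-Q regression value y, and take one regularized gradient step (S604).
import torch
import torch.nn.functional as F


def train_step(q_real, q_target, buffer, optimizer,
               gamma: float = 0.9, lam: float = 1e-4, batch_size: int = 64):
    batch = buffer.sample(batch_size)
    s = torch.stack([t[0] for t in batch])
    a = torch.tensor([t[1] for t in batch])
    r = torch.tensor([t[2] for t in batch], dtype=torch.float32)
    s_next = torch.stack([t[3] for t in batch])

    with torch.no_grad():  # the target network is held fixed
        y = r + gamma * q_target(s_next).max(dim=1).values  # accumulated reward expectation
    q = q_real(s).gather(1, a.unsqueeze(1)).squeeze(1)      # Q_real(s, a; Θ)

    l2 = sum((p ** 2).sum() for p in q_real.parameters())   # regularization term
    loss = F.mse_loss(q, y) + lam * l2                      # L(Θ)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                        # mini-batch update of Θ
    return loss.item()
```

Copying Θ to the target network in step S608 then amounts to q_target.load_state_dict(q_real.state_dict()).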
Specifically, step S7 includes:
S701, inputting an image I into the face detection network D to obtain the specific position P of a face;
S702, cropping the face F according to the specific position P of the face;
S703, inputting the face F into the agent network RLQAgent for scoring to obtain a quality score;
S704, outputting the face and the corresponding score [F, score].
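A sketch of this S701-S704 pipeline is given below; detect_boxes is a hypothetical helper standing in for the detector's post-processing, and the crop size and softmax reading are assumptions.

```python
# Hedged sketch of the inference pipeline: detect (S701), crop (S702),
# score (S703), and emit [F, score] (S704).
import torch
import torch.nn.functional as F


def score_faces(image, detector, agent, detect_boxes):
    # image: a (1, 3, H, W) tensor; detect_boxes: hypothetical post-processing
    # that turns the detector's raw outputs into (x1, y1, x2, y2) face boxes.
    results = []
    for x1, y1, x2, y2 in detect_boxes(detector, image):  # S701: positions P
        face = image[:, :, y1:y2, x1:x2]                  # S702: crop the face F
        face = F.interpolate(face, size=(64, 64))         # agent input size (assumed)
        q = agent(face)                                   # S703: agent forward pass
        score = torch.softmax(q, dim=1)[0, 1].item()      # quality score in [0, 1]
        results.append((face, score))                     # S704: [F, score]
    return results
```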
Another technical solution of the present invention is a face detection quality scoring system, comprising:
a data module, which acquires face images and the corresponding annotation data and constructs a paired dataset in the form "face-face annotation";
a training module, which constructs a face detection network D and feeds the paired dataset constructed by the data module into D in batches of size B for training;
an agent module, which constructs an agent RLQAgent whose input is a state s;
a reward module, which constructs a reward function R(s, a) that automatically adjusts the reward-punishment strength during training and combines it with the face detection network D of the training module to form an environment generator Env; Env generates a state s, which is input into the agent RLQAgent constructed by the agent module to obtain a decision action a; a reward value R is obtained according to the state s and the decision action a;
an experience module, which constructs an experience replay pool ReplayBuffer and caches the data [s, a, R, s'] obtained by the reward module, wherein s' is the state generated by the environment Env at the next moment;
a weight module, which constructs a target Q network Q_target and a real-time Q network Q_real, wherein the target Q network Q_target serves as a reference agent RLQAgent for outputting the expectation of the cumulative reward value, and the real-time Q network Q_real serves as the agent RLQAgent trained in real time; the experience replay pool ReplayBuffer constructed by the experience module is used to train the real-time Q network Q_real and obtain its network weight Θ;
and a scoring module, which initializes the agent RLQAgent with the network weight Θ obtained by the weight module and combines it with the face detection network D of the training module, so that the quality score of a face F is completed while the face F is detected.
Compared with the prior art, the invention has at least the following beneficial effects:
the invention discloses a face detection quality scoring method based on depth reinforcement learning and self-adjustment reward and punishment mechanisms. Firstly, constructing a face detection network and pre-training the face detection network so that the model can accurately position the face; simultaneously, a gradually converging reward function is provided, and the reward function and a face detection network form an environment generator; the shallow convolutional neural network is used for forming an intelligent body to score the face quality, and the added calculated amount can be ignored while the scoring function is realized. An experience playback strategy and a target Q network algorithm are adopted during the training of the intelligent agent, so that the training speed and the performance of the model can be effectively improved. The method realizes the quality scoring of the human face by utilizing the characteristic of larger difference between human faces with different qualities and combining the ideas of deep reinforcement learning and self-adjustment reward and punishment mechanisms, and can efficiently process the problem of selecting the key human face from the video data.
Further, the face detection network D is pre-trained using the data set, so that the specific position P of the face can be accurately detected from the video or image.
Furthermore, an intelligent RLQAgent is constructed, and the evaluation on the face quality can be realized by combining a shallow convolutional neural network with a face detection network D under the condition of not increasing the expenditure after training by adopting a reinforcement learning method.
Further, a progressively converging reward function R (s, a) is constructed, which makes it easier to make a wrong decision at an early stage of the agent training because the model has a greater randomness in making decisions. Therefore, at this stage, the penalty is increased. Along with the training, the decision-making capability of the model is continuously enhanced; therefore, the punishment strength is gradually reduced in the later period of training and is recovered to the same level as the reward.
Further, an empirical playback pool ReplayBuffer is constructed for caching [ s, a, R, s' ] data. The training time of the model can be greatly shortened by adopting the strategy of empirical replay.
Furthermore, the target Q network algorithm is used for training the RLQAccent of the intelligent body, and the intelligent body with good decision-making capability can be quickly trained to carry out quality scoring on the face by combining an experience playback strategy.
In summary, the face detection network is firstly constructed and pre-trained, so that the model can accurately position the face; simultaneously, a gradually converging reward function is provided, and the reward function and a face detection network form an environment generator; the shallow convolutional neural network is used for forming an intelligent body to score the face quality, and the added calculated amount can be ignored while the scoring function is realized. An experience playback strategy and a target Q network algorithm are adopted during the training of the intelligent agent, so that the training speed and the performance of the model can be effectively improved. The invention realizes the quality scoring of the human face by utilizing the characteristic of larger difference between human faces with different qualities and combining with the ideas of depth reinforcement learning and self-adjustment reward and punishment mechanisms, and can efficiently process the problem of selecting a key human face from video data.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
FIG. 1 is a block diagram of a face detection network D of the present invention;
FIG. 2 is a network architecture diagram of an agent RLQAgent;
FIG. 3 is a diagram of the interaction of an agent RLQAgent with the environment Env;
FIG. 4 is a schematic diagram of a target Q network algorithm;
FIG. 5 is a comparison of the results of scoring, wherein (a) is the method of the present invention and (b) is the faceQNet method.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Various structural schematics according to the disclosed embodiments of the invention are shown in the drawings. The figures are not drawn to scale, wherein certain details are exaggerated and possibly omitted for clarity of presentation. The shapes of various regions, layers and their relative sizes and positional relationships shown in the drawings are merely exemplary, and deviations may occur in practice due to manufacturing tolerances or technical limitations, and a person skilled in the art may additionally design regions/layers having different shapes, sizes, relative positions, according to actual needs.
The invention provides a face detection quality scoring method based on deep reinforcement learning and a self-adjusting reward-punishment mechanism. First, a face detection network is constructed and pre-trained so that the model can accurately locate faces. At the same time, a gradually converging reward function is proposed, which together with the face detection network forms an environment generator. A shallow convolutional neural network forms the agent that scores face quality; the added computation is negligible while the scoring function is realized. An experience replay strategy and a target Q-network algorithm are adopted when training the agent, which effectively improves the training speed and the performance of the model. By exploiting the large differences between faces of different quality and combining deep reinforcement learning with a self-adjusting reward-punishment mechanism, the method scores face quality and can efficiently handle the problem of selecting key faces from video data.
The invention relates to a face detection quality scoring method, which comprises the following steps:
S1, acquiring face images X = {X_k | k = 1, …, K} and the corresponding annotation data {I(X_k), P(X_k), L(X_k) | k = 1, …, K}, wherein X is a face image and K is the number of face images, X ∈ R^(N×N), and R denotes the real-number domain; I(X) ∈ {0, 1} indicates whether the image is a face; P(X) ∈ R^(8×1) indicates the face position; L(X) ∈ R^(10×1) indicates the positions of the face key points; a paired dataset is constructed in the form "face-face annotation";
S2, constructing a face detection network D, and inputting the paired dataset of step S1 into the face detection network D in batches for training, wherein the size of each batch is B;
Referring to fig. 1, the face detection network D specifically includes:
S201, a backbone network ResNet50 for generating features T_1, T_2, T_3 at 3 different scales;
S202, a feature pyramid network FPN composed of a first upsampling layer U_1, a second upsampling layer U_2 and a third upsampling layer U_3 for obtaining the intermediate features T_1, T_2, T_3;
S203, a context information module SSH composed of a first 3×3 convolution layer, a first 5×5 convolution layer, a second 5×5 convolution layer, a first 7×7 convolution layer and a second 7×7 convolution layer for obtaining the final features F_1, F_2, F_3;
S204, a face box head BoxHead, a key-point head LandMarkHead and a classifier Classification for generating the final face positions, the face key points and the probability of being a face.
S3, constructing an agent RLQAgent; the input of the agent RLQAgent is a state s, i.e., face images in different poses; in the training stage the output is an action a, i.e., a judgment of whether the face quality is good or bad; in the inference stage the output is a face score q, with q ∈ [0, 1];
Referring to fig. 2, constructing the agent RLQAgent specifically includes:
S301, constructing an agent network composed of a first convolution layer, a first max-pooling layer, a first BatchNorm layer, a second convolution layer, a second max-pooling layer, a second BatchNorm layer, a third convolution layer, a third max-pooling layer, a third BatchNorm layer and a fully connected layer;
S302, outputting the action classification a and the expected reward value Q.
S4, constructing the reward function R(s, a) and combining it with the face detection network D into an environment generator Env; Env generates a state s and sends it to the agent RLQAgent of step S3 to obtain a decision action a; a reward value R is then obtained according to the state s and the action a. The constructed reward function R(s, a) is specifically:
R(s, a) = 1, if the action a is a correct quality decision for the state s;
R(s, a) = -(2 - epoch/Epochs), if the action a is incorrect,
wherein a is the action generated by the agent according to the state, Epochs is the total number of training epochs, and epoch is the current epoch.
S5, constructing an experience replay pool ReplayBuffer for caching [s, a, R, s'] data, wherein a is the action executed by the agent RLQAgent according to the state s, R is the reward value given by the environment, and s' is the state generated by the environment Env at the next moment.
Specifically, the constructed experience replay pool is a double-ended queue with a fixed capacity of 512, used to store the historical decision data [s, a, R, s'] of the agent.
Referring to fig. 3, the specific process by which the agent interacts with the environment and stores experience data is as follows: the face detection network D outputs faces of different quality, i.e., a state s; the agent RLQAgent obtains an action a according to the state s; a specific reward value R is calculated using the reward function R(s, a); the face detection network D outputs the next-time state s'; and the data pair [s, a, R, s'] is cached in the experience replay pool ReplayBuffer.
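This interaction loop can be sketched as follows; env.reset and env.step are hypothetical names for the environment generator's interface (the environment holds the quality labels, so it can report whether a decision was correct), and reward is the sketch given earlier for R(s, a).

```python
# Hedged sketch of the fig. 3 loop: act, reward, and cache [s, a, R, s'].
def collect(env, agent, buffer, epoch: int, total_epochs: int, steps: int = 512):
    s = env.reset()                                     # hypothetical Env interface
    for _ in range(steps):
        a = agent(s.unsqueeze(0)).argmax(dim=1).item()  # decision action a
        correct, s_next = env.step(a)                   # next-time state s'
        r = reward(correct, epoch, total_epochs)        # reward value R(s, a)
        buffer.push(s, a, r, s_next)                    # cache [s, a, R, s']
        s = s_next
```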
S6, constructing a target Q network Q_target and a real-time Q network Q_real; Q_target is the reference agent for outputting the expectation of the cumulative reward value; Q_real is the agent trained in real time; the experience replay pool ReplayBuffer of step S5 is used to train the real-time Q network Q_real, obtaining the network weight Θ;
Referring to fig. 4, the flow of the target Q-network algorithm is specifically:
S601, resetting the environment Env to obtain an initial state s_0;
S602, randomly initializing the experience replay pool ReplayBuffer to obtain [s, a, R, s'] data; s in the first record of the ReplayBuffer is the s_0 obtained in step S601;
S603, obtaining 64 samples from the experience replay pool ReplayBuffer;
S604, calculating the loss value, adding a regularization term to constrain the model, and updating the weight Θ of the real-time Q network Q_real by mini-batch stochastic gradient descent; the loss value L(Θ) is calculated as:
L(Θ) = [y - Q_real(s, a; Θ)]² + λ||Θ||²
wherein y = R(s, a) + γ·max_a' Q_target(s', a'), R(s, a) is the reward value output by the environment according to the state s and the action a taken by the agent, γ is the decay factor, λ is the coefficient of the regularization term, and Θ is the weight of the real-time Q network Q_real;
S605, letting the agent Q_real interact with the environment Env, and storing the newly generated [s, a, R(s, a), s'] records in the experience replay pool ReplayBuffer;
S606, repeating step S605 until 512 records are reached;
S607, judging according to the update frequency whether the target Q network Q_target needs to be updated; if so, going to step S608, otherwise going to step S609;
S608, copying the weights Θ of the agent Q_real to the target Q network Q_target;
S609, repeating steps S603 to S608 until 200 iterations are reached.
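Assembled from the sketches above, the full S601-S609 flow might read as follows; the update frequency is not stated in the text, so update_every is an assumption, and the 200 outer iterations are used here as the epoch counter for the reward schedule.

```python
# Hedged sketch of the fig. 4 target-Q training loop (S601-S609).
def train(env, q_real, q_target, buffer, optimizer,
          iterations: int = 200, update_every: int = 4):
    # S601-S602: reset the environment and seed the replay pool.
    collect(env, q_real, buffer, epoch=0, total_epochs=iterations)
    for it in range(iterations):                           # S609: 200 iterations
        train_step(q_real, q_target, buffer, optimizer)    # S603-S604
        collect(env, q_real, buffer, epoch=it, total_epochs=iterations)  # S605-S606
        if it % update_every == 0:                         # S607: update frequency
            q_target.load_state_dict(q_real.state_dict())  # S608: copy weights Θ
    return q_real.state_dict()                             # output the weight Θ
```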
S7, combining the agent RLQAgent with the face detection network D, so that a quality score for a face F is produced while the face F is detected:
S701, inputting an image I into the face detection network D to obtain the specific position P of a face;
S702, cropping the face F according to the specific position P of the face;
S703, inputting the face F into the agent network RLQAgent for scoring to obtain a quality score;
S704, outputting the face and the corresponding score [F, score].
In another embodiment of the present invention, a face detection quality scoring system is provided, which can be used to implement the above-mentioned face detection quality scoring method, and specifically, the face detection quality scoring system includes a data module, a training module, an agent module, a reward module, an experience module, a weight module, and a scoring module.
The data module acquires face images and the corresponding annotation data, and constructs a paired dataset in the form "face-face annotation";
the training module constructs a face detection network D and feeds the paired dataset constructed by the data module into D in batches of size B for training;
the agent module constructs an agent RLQAgent whose input is a state s;
the reward module constructs a reward function R(s, a) that automatically adjusts the reward-punishment strength during training and combines it with the face detection network D of the training module to form an environment generator Env; Env generates a state s, which is input into the agent RLQAgent constructed by the agent module to obtain a decision action a; a reward value R is obtained according to the state s and the decision action a;
the experience module constructs an experience replay pool ReplayBuffer and caches the data [s, a, R, s'] obtained by the reward module, wherein s' is the state generated by the environment Env at the next moment;
the weight module constructs a target Q network Q_target and a real-time Q network Q_real, wherein the target Q network Q_target serves as a reference agent RLQAgent for outputting the expectation of the cumulative reward value, and the real-time Q network Q_real serves as the agent RLQAgent trained in real time; the experience replay pool ReplayBuffer constructed by the experience module is used to train the real-time Q network Q_real and obtain its network weight Θ;
and the scoring module initializes the agent RLQAgent with the network weight Θ obtained by the weight module and combines it with the face detection network D of the training module, so that the quality score of a face F is completed while the face F is detected.
In yet another embodiment of the present invention, a terminal device is provided that includes a processor and a memory for storing a computer program comprising program instructions, the processor being configured to execute the program instructions stored in the computer storage medium. The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.; it is the computing and control core of the terminal and is adapted to load and execute one or more instructions to implement the corresponding method flow or function. The processor of the embodiment of the invention can be used to perform the face detection quality scoring method, comprising the following steps:
acquiring face images and the corresponding annotation data, and constructing a paired dataset in the form "face-face annotation"; constructing a face detection network D and feeding the paired dataset into D in batches of size B for training; constructing an agent RLQAgent whose input is a state s; constructing a reward function R(s, a) that automatically adjusts the reward-punishment strength during training; combining it with the face detection network D to form an environment generator Env, wherein Env generates a state s that is input into the agent RLQAgent to obtain a decision action a; obtaining a reward value R according to the state s and the decision action a; constructing an experience replay pool ReplayBuffer for caching the data [s, a, R, s'], wherein s' is the state generated by the environment Env at the next moment; constructing a target Q network Q_target and a real-time Q network Q_real, wherein the target Q network Q_target serves as a reference agent RLQAgent for outputting the expectation of the cumulative reward value, and the real-time Q network Q_real serves as the agent RLQAgent trained in real time; training the real-time Q network Q_real with the experience replay pool ReplayBuffer to obtain its network weight Θ; and initializing the agent RLQAgent with the network weight Θ and combining it with the face detection network D, so that the quality score of a face F is completed while the face F is detected.
In still another embodiment of the present invention, the present invention further provides a storage medium, specifically a computer-readable storage medium (Memory), which is a Memory device in a terminal device and is used for storing programs and data. It is understood that the computer readable storage medium herein may include a built-in storage medium in the terminal device, and may also include an extended storage medium supported by the terminal device. The computer-readable storage medium provides a storage space storing an operating system of the terminal. Also, one or more instructions, which may be one or more computer programs (including program code), are stored in the memory space and are adapted to be loaded and executed by the processor. It should be noted that the computer readable storage medium may be a high-speed RAM memory, or a non-volatile memory (non-volatile memory), such as at least one disk memory.
One or more instructions stored in the computer-readable storage medium can be loaded and executed by a processor to realize the corresponding steps of the face detection quality scoring method in the above embodiment; the one or more instructions in the computer-readable storage medium are loaded by the processor and perform the following steps:
acquiring face images and the corresponding annotation data, and constructing a paired dataset in the form "face-face annotation"; constructing a face detection network D and feeding the paired dataset into D in batches of size B for training; constructing an agent RLQAgent whose input is a state s; constructing a reward function R(s, a) that automatically adjusts the reward-punishment strength during training; combining it with the face detection network D to form an environment generator Env, wherein Env generates a state s that is input into the agent RLQAgent to obtain a decision action a; obtaining a reward value R according to the state s and the decision action a; constructing an experience replay pool ReplayBuffer for caching the data [s, a, R, s'], wherein s' is the state generated by the environment Env at the next moment; constructing a target Q network Q_target and a real-time Q network Q_real, wherein the target Q network Q_target serves as a reference agent RLQAgent for outputting the expectation of the cumulative reward value, and the real-time Q network Q_real serves as the agent RLQAgent trained in real time; training the real-time Q network Q_real with the experience replay pool ReplayBuffer to obtain its network weight Θ; and initializing the agent RLQAgent with the network weight Θ and combining it with the face detection network D, so that the quality score of a face F is completed while the face F is detected.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The effects of the present invention can be further illustrated by the following simulation results.
1. Simulation conditions
The hardware used for the simulation is a graphics workstation of the intelligent perception and image understanding laboratory, carrying four GPUs with 11 GB of video memory each. The datasets used in the simulation are the CFP face image set and the CAS-PEAL face dataset. The CFP dataset contains about 500 IDs and about 7000 pictures. The CAS-PEAL dataset contains about 1040 IDs and about 99,450 pictures. According to the poses of the faces in the datasets, the data are divided into two classes: good quality and poor quality. 80% of the data are used for training and 20% for testing.
2. Simulation content
Using the above datasets, the proposed method is compared with a scoring method that uses only deep learning; the accuracy results on the test set are shown in Table 1.
TABLE 1
(Classification accuracy of the compared methods on the test set; the table contents are reproduced as an image in the original publication.)
3. Analysis of simulation results
Referring to FIG. 5, histograms are shown of the scores produced by the proposed RLQAgent model and the faceQNet model on about 100,000 faces from the CFP test set and the CAS-PEAL dataset. With the proposed method, the scores concentrate in two intervals: the number of faces in [0, 0.4] is substantially equal to the number in [0.65, 1.0], which is consistent with the distribution of the test data, while most of the faceQNet scores are scattered in the interval [0.2, 0.6]. Table 1 gives the classification accuracy of the above methods on the test set, and it can be seen that the proposed method achieves better results.
In summary, the face detection quality scoring method based on deep reinforcement learning and a self-adjusting reward-punishment mechanism can score face quality at the same time as the face is detected. First, a face detection network is constructed and pre-trained so that the model can accurately locate faces; at the same time, a reward function that lets the model converge quickly is proposed, which together with the face detection network forms an environment generator; a shallow convolutional neural network forms the agent that scores face quality, and the added computation is negligible while the scoring function is realized.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above content is only intended to illustrate the technical idea of the present invention and should not limit its protection scope; any modification made on the basis of the technical idea proposed by the present invention falls within the protection scope of the claims of the present invention.

Claims (7)

1. A face detection quality scoring method is characterized by comprising the following steps:
s1, acquiring a face image and corresponding annotation data thereof, and constructing a paired data set in a form of face-face annotation;
s2, constructing a face detection network D, and inputting the paired data sets constructed in the step S1 into the face detection network D in batches for training, wherein the size of each batch is B;
S3, constructing an agent RLQAgent, wherein the input of the agent RLQAgent is a state s;
S4, constructing a reward function R(s, a) that automatically adjusts the reward-punishment strength during training; combining it with the face detection network D of step S2 to form an environment generator Env; Env generates a state s, which is input into the agent RLQAgent constructed in step S3 to obtain a decision action a; a reward value R is obtained according to the state s and the decision action a;
S5, constructing an experience replay pool ReplayBuffer, and caching the data [s, a, R, s'] obtained in step S4, wherein s' is the state generated by the environment Env at the next moment;
S6, constructing a target Q network Q_target and a real-time Q network Q_real, wherein the target Q network Q_target serves as a reference agent RLQAgent for outputting the expectation of the cumulative reward value, and the real-time Q network Q_real serves as the agent RLQAgent trained in real time; using the experience replay pool ReplayBuffer constructed in step S5 to train the real-time Q network Q_real and obtain its network weight Θ, specifically:
S601, resetting the environment Env to obtain an initial state s_0;
S602, randomly initializing the experience replay pool ReplayBuffer to obtain [s, a, R, s'] data, wherein s in the [s, a, R, s'] data is the initial state s_0 obtained in step S601;
S603, obtaining 64 samples from the experience replay pool ReplayBuffer;
S604, training the real-time Q network Q_real with the 64 samples obtained in step S603: calculating the loss value, adding a regularization term to constrain the model, and updating the network weight Θ of Q_real by mini-batch stochastic gradient descent, the loss value L(Θ) being calculated as:
L(Θ) = [y - Q_real(s, a; Θ)]² + λ||Θ||²
wherein y = R(s, a) + γ·max_a' Q_target(s', a') is the cumulative reward expectation, L(Θ) is the loss value, γ is the decay factor, λ is the coefficient of the regularization term, and Θ is the weight of the real-time Q network Q_real;
S605, letting the real-time Q network Q_real interact with the environment Env, and storing the newly generated [s, a, R, s'] records in the experience replay pool ReplayBuffer;
S606, repeating step S605 until 512 records are reached;
S607, judging according to the update frequency whether the target Q network Q_target needs to be updated; if updating is needed, going to step S608, otherwise going to step S609;
S608, copying the weights Θ of the agent Q_real to the target Q network Q_target;
S609, repeating steps S603 to S608 until 200 iterations are reached, and outputting the weight Θ of the real-time Q network Q_real;
and S7, initializing the agent RLQAgent with the network weight Θ obtained in step S6, and combining it with the face detection network D of step S2, so that the quality score of a face F is completed while the face F is detected.
2. The method according to claim 1, wherein in step S2, constructing the face detection network D specifically includes:
S201, constructing a backbone network ResNet50 and generating features T_1, T_2, T_3 at 3 different scales;
S202, constructing a first upsampling layer U_1, a second upsampling layer U_2 and a third upsampling layer U_3 to form the feature pyramid network FPN and obtain intermediate features T_1, T_2, T_3;
S203, constructing a context information module SSH comprising a first 3×3 convolution layer, a first 5×5 convolution layer, a second 5×5 convolution layer, a first 7×7 convolution layer and a second 7×7 convolution layer to obtain final features F_1, F_2, F_3;
S204, constructing a BoxHead, a LandmarkHead and a Classification head, and generating the final face position, the face key points and the probability of being a face.
3. The method according to claim 1, characterized in that in step S3, the step of constructing the agent RLQAgent is embodied as:
S301, constructing an agent network comprising a first convolution layer, a first max-pooling layer, a first BatchNorm layer, a second convolution layer, a second max-pooling layer, a second BatchNorm layer, a third convolution layer, a third max-pooling layer, a third BatchNorm layer and a fully connected layer;
and S302, outputting the action classification a and the expected reward value Q through the agent network.
4. The method according to claim 1, wherein in step S4, the reward function R(s, a) is specifically:
R(s, a) = 1, if the action a is a correct quality decision for the state s;
R(s, a) = -(2 - epoch/Epochs), if the action a is incorrect,
wherein Epochs is the total number of training epochs and epoch is the current epoch.
5. The method according to claim 1, wherein in step S5, the experience replay pool ReplayBuffer is a double-ended queue with a fixed capacity of 512.
6. The method according to claim 1, wherein step S7 is specifically:
s701, inputting the image I into a face detection network D to obtain a specific position P of a face;
S702, cropping the face F according to the specific position P of the face;
S703, inputting the face F into the agent network RLQAgent for scoring to obtain a quality score;
s704, outputting the face and the corresponding score [ F, score ].
7. A face detection quality scoring system, comprising:
the data module is used for acquiring a face image and corresponding annotation data thereof and constructing a paired data set in the form of face-face annotation;
the training module is used for constructing a face detection network D, inputting the paired data sets constructed by the data module into the face detection network D in batches for training, wherein the size of each batch is B;
the agent module is used for constructing an agent RLQAgent, wherein the input of the agent RLQAgent is a state s;
the reward module constructs a reward function R(s, a) that automatically adjusts the reward-punishment strength during training, and combines it with the face detection network D of the training module to form an environment generator Env; Env generates a state s, which is input into the agent RLQAgent constructed by the agent module to obtain a decision action a; a reward value R is obtained according to the state s and the decision action a;
the experience module is used for constructing an experience replay pool ReplayBuffer and caching the data [s, a, R, s'] obtained by the reward module, wherein s' is the state generated by the environment Env at the next moment;
the weight module constructs a target Q network Q_target and a real-time Q network Q_real, wherein the target Q network Q_target serves as a reference agent RLQAgent for outputting the expectation of the cumulative reward value, and the real-time Q network Q_real serves as the agent RLQAgent trained in real time; the experience replay pool ReplayBuffer constructed by the experience module is used to train the real-time Q network Q_real and obtain its network weight Θ, specifically:
resetting the environment Env to obtain an initial state s_0; randomly initializing the experience replay pool ReplayBuffer to obtain [s, a, R, s'] data, wherein s in the [s, a, R, s'] data is the initial state s_0; obtaining 64 samples from the experience replay pool ReplayBuffer; training the real-time Q network Q_real with the 64 samples: calculating the loss value, adding a regularization term to constrain the model, and updating the network weight Θ of Q_real by mini-batch stochastic gradient descent, the loss value L(Θ) being calculated as:
L(Θ) = [y - Q_real(s, a; Θ)]² + λ||Θ||²
wherein y = R(s, a) + γ·max_a' Q_target(s', a') is the cumulative reward expectation, L(Θ) is the loss value, γ is the decay factor, λ is the coefficient of the regularization term, and Θ is the weight of the real-time Q network Q_real;
letting the real-time Q network Q_real interact with the environment Env, and storing the newly generated [s, a, R, s'] records in the experience replay pool ReplayBuffer; repeating until 512 records are reached; judging according to the update frequency whether the target Q network Q_target needs to be updated; if updating is needed, copying the weights Θ of the agent Q_real to the target Q network Q_target, otherwise continuing; repeating for 200 iterations and outputting the weight Θ of the real-time Q network Q_real;
and the scoring module is used for initializing the agent RLQAgent with the network weight Θ obtained by the weight module, and combining it with the face detection network D of the training module, so that the quality score of a face F is completed while the face F is detected.
CN202110688239.2A 2021-06-21 2021-06-21 Face detection quality scoring method and system Active CN113420806B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110688239.2A CN113420806B (en) 2021-06-21 2021-06-21 Face detection quality scoring method and system


Publications (2)

Publication Number Publication Date
CN113420806A CN113420806A (en) 2021-09-21
CN113420806B true CN113420806B (en) 2023-02-03

Family

ID=77789635

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110688239.2A Active CN113420806B (en) 2021-06-21 2021-06-21 Face detection quality scoring method and system

Country Status (1)

Country Link
CN (1) CN113420806B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114399698B (en) * 2021-11-30 2024-04-02 西安交通大学 Hand washing quality scoring method and system based on intelligent watch


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9679258B2 (en) * 2013-10-08 2017-06-13 Google Inc. Methods and apparatus for reinforcement learning
CA3032182A1 (en) * 2018-01-31 2019-07-31 Royal Bank Of Canada Pre-training neural netwoks with human demonstrations for deep reinforcement learning

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018083671A1 (en) * 2016-11-04 2018-05-11 Deepmind Technologies Limited Reinforcement learning with auxiliary tasks
CN108446619A (en) * 2018-03-12 2018-08-24 清华大学 Face critical point detection method and device based on deeply study
CN110866471A (en) * 2019-10-31 2020-03-06 Oppo广东移动通信有限公司 Face image quality evaluation method and device, computer readable medium and communication terminal
CN112800893A (en) * 2021-01-18 2021-05-14 南京航空航天大学 Human face attribute editing method based on reinforcement learning
CN112749686A (en) * 2021-01-29 2021-05-04 腾讯科技(深圳)有限公司 Image detection method, image detection device, computer equipment and storage medium
CN112801290A (en) * 2021-02-26 2021-05-14 中国人民解放军陆军工程大学 Multi-agent deep reinforcement learning method, system and application

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Hado van Hasselt et al., "Deep Reinforcement Learning with Double Q-Learning", Thirtieth AAAI Conference on Artificial Intelligence, 2016-03-02, pp. 2094-2100 *
Jiankang Deng et al., "RetinaFace: Single-shot Multi-level Face Localisation in the Wild", 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020-08-05, pp. 5202-5211 *
Ruishan Liu et al., "The Effects of Memory Replay in Reinforcement Learning", 2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2019-02-07, pp. 1-15 *
Wang Ya et al., "CNN-based face image quality assessment in surveillance video", Computer Systems & Applications (计算机系统应用), no. 11, 2018-11-14, pp. 71-77 *

Also Published As

Publication number Publication date
CN113420806A (en) 2021-09-21

Similar Documents

Publication Publication Date Title
CN111259982B (en) Attention mechanism-based premature infant retina image classification method and device
CN112396002B (en) SE-YOLOv 3-based lightweight remote sensing target detection method
CN110135267B (en) Large-scene SAR image fine target detection method
CN110458165B (en) Natural scene text detection method introducing attention mechanism
JP2022529557A (en) Medical image segmentation methods, medical image segmentation devices, electronic devices and computer programs
CN110929577A (en) Improved target identification method based on YOLOv3 lightweight framework
CN107945153A (en) A kind of road surface crack detection method based on deep learning
CN110189255A (en) Method for detecting human face based on hierarchical detection
CN112070729A (en) Anchor-free remote sensing image target detection method and system based on scene enhancement
CN112330684B (en) Object segmentation method and device, computer equipment and storage medium
CN114332578A (en) Image anomaly detection model training method, image anomaly detection method and device
CN110738160A (en) human face quality evaluation method combining with human face detection
CN107491734A (en) Semi-supervised Classification of Polarimetric SAR Image method based on multi-core integration Yu space W ishart LapSVM
CN110930378A (en) Emphysema image processing method and system based on low data demand
US20230053911A1 (en) Detecting an object in an image using multiband and multidirectional filtering
CN106296734A (en) Based on extreme learning machine and the target tracking algorism of boosting Multiple Kernel Learning
CN113420806B (en) Face detection quality scoring method and system
CN111694954B (en) Image classification method and device and electronic equipment
CN112232411A (en) Optimization method of HarDNet-Lite on embedded platform
CN113239866B (en) Face recognition method and system based on space-time feature fusion and sample attention enhancement
CN114155551A (en) Improved pedestrian detection method and device based on YOLOv3 under complex environment
CN111429414B (en) Artificial intelligence-based focus image sample determination method and related device
CN112883930A (en) Real-time true and false motion judgment method based on full-connection network
CN117315223A (en) Target detection method based on transducer architecture
CN116403127A (en) Unmanned aerial vehicle aerial image target detection method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant