CN114495228A

CN114495228A - Training method and device for face detector, equipment, medium and product

Info

Publication number: CN114495228A
Application number: CN202210097245.5A
Authority: CN
Inventors: 黄泽斌 (Huang Zebin)
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2022-01-26
Filing date: 2022-01-26
Publication date: 2022-05-13

Abstract

The present disclosure provides a training method, apparatus, device, medium, and product for a face detector. They relate to the technical field of artificial intelligence, in particular to deep learning and computer vision, and can be applied to scenarios such as face recognition. The specific implementation scheme comprises: taking at least one sample image in a training sample set as input data of a face detector to obtain a face detection result associated with each of the at least one sample image; determining, according to the face detection result, a classification loss evaluation value associated with each sample image; determining, according to the classification loss evaluation value and the sample type of each sample image, a target loss evaluation value for adjusting model parameters of the face detector; and adjusting the model parameters of the face detector based on the target loss evaluation value to obtain an adjusted face detector.

Description

Training method and device for face detector, equipment, medium and product

Technical Field

The present disclosure relates to the field of artificial intelligence, in particular to deep learning and computer vision, and is applicable to scenarios such as face recognition.

Background

During training of a face detector, the model parameters of the face detector may be adjusted by back-propagating the loss evaluation values associated with the sample images. However, in some scenarios the loss evaluation values are too numerous and cluttered, which results in poor optimization efficiency and a poor training effect for the face detector.

Disclosure of Invention

The present disclosure provides a training method and apparatus, device, medium, and product for a face detector.

According to an aspect of the present disclosure, there is provided a training method of a face detector, including: taking at least one sample image in a training sample set as input data of a face detector to obtain a face detection result associated with each sample image in the at least one sample image; determining a classification loss evaluation value associated with each sample image according to the face detection result; determining a target loss evaluation value for adjusting model parameters of the face detector according to the classification loss evaluation value and the sample type of each sample image; and adjusting model parameters of the face detector based on the target loss evaluation value to obtain the adjusted face detector, wherein the sample types of the sample image comprise a positive sample type and a negative sample type, and the sample image of the positive sample type comprises a face area with a quality score meeting a preset condition.

According to another aspect of the present disclosure, there is provided a training apparatus for a face detector, including: a first processing module configured to take at least one sample image in a training sample set as input data of the face detector to obtain a face detection result associated with each of the at least one sample image; a second processing module configured to determine a classification loss evaluation value associated with each sample image according to the face detection result; a third processing module configured to determine a target loss evaluation value for adjusting model parameters of the face detector according to the classification loss evaluation value and the sample type of each sample image; and a fourth processing module configured to adjust the model parameters of the face detector based on the target loss evaluation value to obtain an adjusted face detector, wherein the sample types of the sample images comprise a positive sample type and a negative sample type, and a sample image of the positive sample type comprises a face region with a quality score meeting a preset condition.

According to another aspect of the present disclosure, there is provided an electronic device including at least one processor and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed, enable the at least one processor to perform the above-described training method of the face detector.

According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the above-described training method of the face detector.

According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the above-described training method of a face detector.

It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the present disclosure, nor are they intended to limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. In the drawings:

FIG. 1 schematically illustrates a system architecture of a training method and apparatus of a face detector according to an embodiment of the present disclosure;

FIG. 2 schematically illustrates a flow chart of a training method of a face detector according to an embodiment of the present disclosure;

FIG. 3 schematically shows a flow chart of a training method of a face detector according to another embodiment of the present disclosure;

FIG. 4 schematically illustrates a schematic diagram of a training process of a face detector according to an embodiment of the present disclosure;

FIG. 5 schematically illustrates a block diagram of a training apparatus for a face detector according to an embodiment of the present disclosure;

FIG. 6 schematically shows a block diagram of an electronic device for performing training of a face detector according to an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.

All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs, unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.

Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, C together, etc.).

The embodiments of the present disclosure provide a training method of a face detector. The training method comprises: taking at least one sample image in a training sample set as input data of a face detector to obtain a face detection result associated with each of the at least one sample image; determining a classification loss evaluation value associated with each sample image according to the face detection result; determining a target loss evaluation value for adjusting model parameters of the face detector according to the classification loss evaluation value and the sample type of each sample image; and adjusting the model parameters of the face detector based on the target loss evaluation value to obtain an adjusted face detector. The sample types of the sample images comprise a positive sample type and a negative sample type, and a sample image of the positive sample type comprises a face region with a quality score meeting a preset condition.

Fig. 1 schematically shows a system architecture of a training method and apparatus of a face detector according to an embodiment of the present disclosure. It should be noted that Fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied, provided to help those skilled in the art understand the technical content of the present disclosure; it does not mean that the embodiments of the present disclosure cannot be applied to other devices, systems, environments or scenarios.

The system architecture 100 according to this embodiment may include a data collection end 101, a network 102, and a server 103. The network 102 is the medium that provides a communication link between the data collection end 101 and the server 103. The network 102 may include various connection types, such as wired links, wireless communication links, or fiber optic cables. The server 103 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud computing, network services, and middleware services.

The data collection end 101 interacts with the server 103 through the network 102 to receive or send data. The data collection end 101 may be configured to collect sample data for training a deep network model, where the sample data may include, for example, positive sample data and negative sample data.

The server 103 may be a server providing various services, such as a background processing server (for example only) performing model training by using sample data provided by the data collection end 101.

For example, the server 103 receives a training sample set from the data acquisition terminal 101, and uses at least one sample image in the training sample set as input data of the face detector, to obtain a face detection result associated with each sample image in the at least one sample image. The server 103 is further configured to determine a classification loss evaluation value associated with each sample image according to the face detection result, determine a target loss evaluation value for adjusting model parameters of the face detector according to the classification loss evaluation value and the sample type of each sample image, and adjust the model parameters of the face detector based on the target loss evaluation value to obtain an adjusted face detector.

The sample types of the sample images can comprise positive sample types and negative sample types, and the sample images of the positive sample types comprise human face regions with quality scores meeting preset conditions.

It should be noted that the training method of the face detector provided by the embodiment of the present disclosure may be executed by the server 103. Accordingly, the training apparatus of the face detector provided by the embodiment of the present disclosure may be disposed in the server 103. The training method of the face detector provided by the embodiment of the present disclosure may also be executed by a server or a server cluster that is different from the server 103 and can communicate with the data acquisition terminal 101 and/or the server 103. Correspondingly, the training device of the face detector provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster that is different from the server 103 and can communicate with the data acquisition terminal 101 and/or the server 103.

It should be understood that the number of data collection terminals, networks, and servers in fig. 1 is merely illustrative. There may be any number of data collection terminals, networks, and servers, as desired for implementation.

The embodiment of the present disclosure provides a training method of a face detector, and the following describes the training method of a face detector according to an exemplary embodiment of the present disclosure with reference to fig. 2 to 4 in combination with the system architecture of fig. 1. The training method of the face detector of the embodiment of the present disclosure may be performed by the server 103 shown in fig. 1, for example.

Fig. 2 schematically shows a flow chart of a training method of a face detector according to an embodiment of the present disclosure.

As shown in fig. 2, the training method 200 of the face detector of the embodiment of the present disclosure may include, for example, operations S210 to S240.

In operation S210, at least one sample image in the training sample set is used as input data of the face detector, and a face detection result associated with each sample image in the at least one sample image is obtained.

In operation S220, a classification loss evaluation value associated with each sample image is determined according to the face detection result.

In operation S230, a target loss evaluation value for adjusting a model parameter of the face detector is determined according to the classification loss evaluation value and the sample type of each sample image.

In operation S240, model parameters of the face detector are adjusted based on the target loss evaluation value, resulting in an adjusted face detector.

An exemplary flow of each operation of the training method of the face detector of the present embodiment is illustrated below.

Illustratively, the training samples may be obtained in various public and legally compliant ways, for example from a public data set, or collected by the data collection end after authorization has been obtained from the users associated with the training samples. The training samples are not scene data of a specific user and cannot reflect the personal information of any specific user.

The sample types of the sample images may comprise a positive sample type and a negative sample type, and a sample image of the positive sample type comprises a face region with a quality score meeting a preset condition. Divided by sample type, the training sample set may include at least one positive sample image and at least one negative sample image. The negative sample images may include low-quality face images and non-face images.

At least one sample image in the training sample set is taken as input data of the face detector to obtain a face detection result associated with each of the at least one sample image. The face detection result for each sample image includes a classification prediction probability and face frame prediction coordinates.

A classification loss value associated with each sample image may be determined from the classification prediction probability associated with that sample image and a preset classification target probability. A face frame coordinate regression loss value associated with each sample image may be determined from the face frame prediction coordinates associated with that sample image and preset face frame target coordinates. The classification target probability and the face frame target coordinates may be ground-truth values obtained by manual labeling; the classification target probability may be, for example, 0 or 1.

Illustratively, a Softmax (normalized exponential) function may act on the fully-connected feature representation layer of the face detector. In each iteration of the iterative training, the Softmax function may output a classification prediction probability for a sample image from the sample image features associated with that sample image. The difference between the classification prediction probability and the preset classification target probability is then computed with a Softmax Loss function to obtain the classification loss value associated with the sample image.
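The classification branch described above can be sketched in plain Python as follows. This is a minimal illustration, not the disclosure's implementation: it assumes a two-class output (index 0 = non-face, index 1 = face) and treats the "Softmax Loss" as the standard softmax cross-entropy, i.e. the negative log-probability of the labeled class. The function names are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_loss(logits: List[float], target_class: int) -> float:
    """Softmax cross-entropy: negative log of the probability assigned to the labeled class."""
    probs = softmax(logits)
    return -math.log(probs[target_class])


# Hypothetical two-class layout: index 0 = "non-face", index 1 = "face".
print(softmax([0.0, 0.0]))          # [0.5, 0.5]
print(softmax_loss([0.0, 0.0], 1))  # ln 2, about 0.6931
```

A classification target probability of 1 for the face class corresponds here to `target_class=1`; the loss shrinks toward 0 as the predicted face probability approaches 1.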

In each iteration of the iterative training, face frame prediction coordinates for a sample image may be output from the sample image features associated with that sample image using a SmoothL1 (smooth L1 norm) function. The difference between the face frame prediction coordinates and the preset face frame target coordinates is then computed with a SmoothL1 Loss function to obtain the face frame coordinate regression loss value associated with the sample image.
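The face frame coordinate regression loss can be illustrated with the usual Smooth L1 definition: quadratic for small residuals and linear for large ones. This sketch assumes a transition point `beta = 1.0` and sums the per-coordinate terms over four box coordinates; the disclosure specifies neither the threshold nor the reduction, so both are assumptions.

```python
from typing import Sequence


def smooth_l1(x: float, beta: float = 1.0) -> float:
    """Smooth L1 on one residual: 0.5*x^2/beta if |x| < beta, else |x| - 0.5*beta."""
    ax = abs(x)
    if ax < beta:
        return 0.5 * ax * ax / beta
    return ax - 0.5 * beta


def box_regression_loss(pred: Sequence[float], target: Sequence[float], beta: float = 1.0) -> float:
    """Sum of Smooth L1 terms over the face frame coordinates (e.g. x1, y1, x2, y2)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of coordinates")
    return sum(smooth_l1(p - t, beta) for p, t in zip(pred, target))


# Residuals 0.5, 0, 2, 0 give 0.125 + 0 + 1.5 + 0.
print(box_regression_loss([0.5, 0.0, 3.0, 1.0], [0.0, 0.0, 1.0, 1.0]))  # 1.625
```

The linear region keeps gradients bounded for badly mislocalized boxes, which is the usual reason Smooth L1 is preferred over a plain squared error for box regression.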

A classification loss evaluation value associated with each sample image is then calculated from the classification loss value and the face frame coordinate regression loss value associated with that sample image. The classification loss evaluation value $L$ associated with a sample image may be expressed, for example, as $L = L_{cls} + \mu L_{c}$, where $L_{cls}$ denotes the classification loss value, $L_{c}$ denotes the face frame coordinate regression loss value, and $\mu$ denotes a preset weight coefficient.

Back-propagating this weighted sum of the loss values associated with each sample image through the face detector helps ensure the accuracy and recall of the trained face detector and helps improve the robustness of face detection.

A target loss evaluation value for adjusting the model parameters of the face detector is determined according to the sample type and the classification loss evaluation value of each sample image. The model parameters of the face detector are then adjusted according to the target loss evaluation value using a back-propagation algorithm and a gradient descent algorithm to obtain the adjusted face detector.

For example, among the classification loss evaluation values associated with the at least one positive sample image, those with the largest values, up to a first preset proportion, may be taken as target loss evaluation values for adjusting the model parameters of the face detector. Likewise, among the classification loss evaluation values associated with the at least one negative sample image, those with the largest values, up to a second preset proportion, may be taken as target loss evaluation values for adjusting the model parameters of the face detector.
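A minimal sketch of this per-type selection is shown below. It assumes each sample is represented as a hypothetical `(sample_type, loss)` pair with the type given as the string `"positive"` or `"negative"`, and it interprets a "preset proportion" as a fraction of the samples of that type, rounded up and never less than one. Those interpretations are assumptions, not details from the disclosure.

```python
import math
from typing import Dict, List, Tuple


def select_top_ratio(losses: List[float], ratio: float) -> List[float]:
    """Keep the largest ceil(ratio * n) losses, at least one if the list is non-empty."""
    if not losses:
        return []
    # Round before ceil to absorb floating-point error such as 0.3 * 10 = 3.0000000000000004.
    k = max(1, math.ceil(round(ratio * len(losses), 9)))
    return sorted(losses, reverse=True)[:k]


def target_losses(samples: List[Tuple[str, float]], pos_ratio: float,
                  neg_ratio: float) -> Dict[str, List[float]]:
    """Split (sample_type, loss) pairs by type and apply a separate preset proportion to each."""
    pos = [loss for kind, loss in samples if kind == "positive"]
    neg = [loss for kind, loss in samples if kind == "negative"]
    return {
        "positive": select_top_ratio(pos, pos_ratio),
        "negative": select_top_ratio(neg, neg_ratio),
    }


batch = [("positive", 0.9), ("positive", 0.2), ("negative", 0.05),
         ("negative", 0.7), ("negative", 0.01), ("negative", 0.3)]
print(target_losses(batch, pos_ratio=1.0, neg_ratio=0.5))
# {'positive': [0.9, 0.2], 'negative': [0.7, 0.3]}
```

Using two independent proportions lets the (typically scarce) positive samples be kept in full while the abundant negatives are thinned to their hardest examples.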

According to the embodiments of the present disclosure, at least one sample image in the training sample set is used as input data of the face detector to obtain a face detection result associated with each of the at least one sample image; a classification loss evaluation value associated with each sample image is determined according to the face detection result; a target loss evaluation value for adjusting model parameters of the face detector is determined according to the classification loss evaluation value and the sample type of each sample image; and the model parameters of the face detector are adjusted based on the target loss evaluation value to obtain the adjusted face detector. The sample types of the sample images comprise a positive sample type and a negative sample type, and a sample image of the positive sample type comprises a face region with a quality score meeting a preset condition.

Screening the target loss evaluation values used to adjust the model parameters of the face detector according to the sample type and the classification loss evaluation value of each sample image makes it possible to control the number of back-propagated classification loss evaluation values while retaining the useful information in the classification loss evaluation values of the sample images. This helps reduce the workload of training the face detector and improves its training efficiency.

Fig. 3 schematically shows a flow chart of a training method of a face detector according to another embodiment of the present disclosure.

As shown in fig. 3, operation S230 may include, for example, operations S310 to S330.

In operation S310, the classification loss evaluation values associated with the at least one negative sample image are sorted in descending order to obtain a ranking result.

In operation S320, at least one classification loss evaluation value ranked before a preset reference position in the ranking result is taken as a target loss evaluation value.

In operation S330, for the at least one classification loss evaluation value ranked at or after the preset reference position, a classification loss center value associated with that at least one classification loss evaluation value is taken as a target loss evaluation value.
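Operations S310 to S330 can be sketched as a single function. In this illustration the reference position is treated as a zero-based index into the descending ranking (values at indices below it are kept individually), and the median is used as the default classification loss center value; both choices are assumptions, and the function name is hypothetical.

```python
import statistics
from typing import Callable, List, Sequence


def negative_target_losses(
    neg_losses: Sequence[float],
    reference_position: int,
    center: Callable[[List[float]], float] = statistics.median,
) -> List[float]:
    """Return the target loss evaluation values for the negative samples."""
    if reference_position < 0:
        raise ValueError("reference_position must be non-negative")
    ranked = sorted(neg_losses, reverse=True)   # S310: sort in descending order
    head = ranked[:reference_position]          # S320: ranked before the reference position
    tail = ranked[reference_position:]          # S330: ranked at or after the reference position
    targets = list(head)
    if tail:
        targets.append(center(tail))            # the tail collapses into one center value
    return targets


print(negative_target_losses([0.05, 0.7, 0.01, 0.3, 0.02], reference_position=2))
# [0.7, 0.3, 0.02]
```

However many negative samples fall in the tail, they contribute only one back-propagated value, so the number of negative-sample loss terms is bounded by `reference_position + 1`.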

An exemplary flow of each operation of the training method of the face detector of the present embodiment is illustrated below.

Illustratively, among the classification loss evaluation values associated with the at least one negative sample image, a preset number of the largest values are used as target loss evaluation values for adjusting the model parameters of the face detector. Specifically, the classification loss evaluation values associated with the at least one negative sample image may be sorted in descending order, and each classification loss evaluation value ranked before the preset reference position may be used as a target loss evaluation value for adjusting the model parameters of the face detector.

A negative sample image does not include a face region with a quality score meeting the preset condition; it may be, for example, a low-quality face image or a non-face image. The classification loss evaluation values associated with negative sample images are typically relatively small. The larger a classification loss evaluation value is, the greater the influence of the corresponding sample image on the training of the face detector.

Using the preset number of largest classification loss evaluation values of the negative sample images as target loss evaluation values effectively alleviates the problem of too many ineffective classification loss evaluation values, which arises when negative sample images make up an excessive share of the training data. While fully retaining the useful information in the classification loss evaluation values of the negative sample images, this reduces the number of those values that are back-propagated and thereby improves the training efficiency of the face detector.

For the at least one classification loss evaluation value ranked at or after the preset reference position, a classification loss center value associated with those values may be calculated as a target loss evaluation value for adjusting the model parameters of the face detector. The classification loss center value may be, for example, the median, arithmetic mean, weighted mean, exponential mean, or quadratic (root-mean-square) mean of the at least one classification loss evaluation value, which is not limited in this embodiment.
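Several of the listed center values can be computed with Python's standard `statistics` module, as in the sketch below. It covers the median, arithmetic mean, weighted mean, and quadratic mean; the "exponential mean" is omitted because the disclosure does not define it. The weights for the weighted mean are a hypothetical input (uniform by default).

```python
import math
import statistics
from typing import Dict, Optional, Sequence


def center_values(losses: Sequence[float],
                  weights: Optional[Sequence[float]] = None) -> Dict[str, float]:
    """Candidate classification loss center values for a non-empty list of losses."""
    if not losses:
        raise ValueError("losses must be non-empty")
    if weights is None:
        weights = [1.0] * len(losses)  # uniform weights reduce to the arithmetic mean
    if len(weights) != len(losses) or sum(weights) <= 0:
        raise ValueError("weights must match losses and have a positive sum")
    return {
        "median": statistics.median(losses),
        "arithmetic_mean": statistics.fmean(losses),
        "weighted_mean": sum(w * x for w, x in zip(weights, losses)) / sum(weights),
        "quadratic_mean": math.sqrt(statistics.fmean([x * x for x in losses])),
    }


r = center_values([1.0, 2.0, 4.0], weights=[1, 1, 2])
print(r["median"], r["weighted_mean"])  # 2.0 2.75
```

The median is insensitive to a single outlying loss, while the quadratic mean leans toward the larger values in the tail; which behavior is preferable depends on how much influence the hardest tail samples should keep.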

Computing the classification loss center value fully retains the useful information in the smaller classification loss evaluation values of the negative sample images, which helps ensure the training precision of the face detector and improve the robustness of face detection.

For example, for the at least one classification loss evaluation value ranked at or after the preset reference position, some of those values may instead be selected, based on a preset sampling rule, as target loss evaluation values for adjusting the model parameters of the face detector. The preset sampling rule may include, for example, a uniform sampling rule, a discrete distribution sampling rule, or a Box-Muller algorithm sampling rule, which is not limited in this embodiment.
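Of the sampling rules listed, the uniform one is the simplest to illustrate. The sketch below draws a hypothetical `num_samples` values without replacement from the part of the descending ranking at or after a zero-based reference position; the discrete-distribution and Box-Muller variants are not shown. Passing a seeded `random.Random` instance is an assumption made so that runs are reproducible.

```python
import random
from typing import List, Optional, Sequence


def sample_tail_losses(
    ranked_losses: Sequence[float],
    reference_position: int,
    num_samples: int,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Uniformly sample, without replacement, from losses at or after the reference position."""
    rng = rng if rng is not None else random.Random()
    tail = list(ranked_losses[reference_position:])
    k = min(num_samples, len(tail))  # never ask for more values than the tail holds
    return rng.sample(tail, k)


ranked = [0.7, 0.3, 0.05, 0.02, 0.01, 0.005]  # already sorted in descending order
picked = sample_tail_losses(ranked, reference_position=2, num_samples=2, rng=random.Random(0))
print(picked)  # two distinct values drawn from [0.05, 0.02, 0.01, 0.005]
```

Compared with collapsing the tail into a single center value, sampling keeps a few individual low-loss examples in each iteration, and a different subset can be drawn every time.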

For the at least one positive sample image in the training sample set, the classification loss evaluation value associated with each positive sample image may be used as a target loss evaluation value for adjusting the model parameters of the face detector. Alternatively, only a preset number of the largest classification loss evaluation values may be used as target loss evaluation values.

This helps ensure that the positive sample images are fully trained, safeguards the training effect of the face detector, and improves the generality and convenience of face detector training.

In one example, a target sample set for the next training iteration may be selected from the training sample set according to the classification loss evaluation value and the sample type of each sample image, and the sample images in the target sample set may be used to continue training the currently adjusted face detector.

For the at least one negative sample image in the training sample set, the target negative sample images whose classification loss evaluation values are the largest, up to a preset proportion, may be determined according to the classification loss evaluation value associated with each negative sample image, yielding a first training sample subset of the target sample set.

For the other negative sample images (that is, the negative sample images other than the target negative sample images), a second training sample subset of the target sample set may be determined according to their associated classification loss evaluation values. For example, a classification loss center value of the classification loss evaluation values associated with the other negative sample images may be calculated, and the other negative sample image or images corresponding to that classification loss center value may constitute the second training sample subset.

Alternatively, the other negative sample images may be ranked according to their associated classification loss evaluation values, and the ranked images may be sampled according to a preset sampling rule to obtain the second training sample subset.
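The center-value variant of building the next iteration's negative subsets might look like the sketch below; the sampling-based alternative is not shown. Negative samples are keyed by hypothetical string IDs. `statistics.median_low` is used (an assumption) so that the center value is always an actual observed loss, which guarantees that at least one image "corresponds" to it.

```python
import math
import statistics
from typing import Dict, List, Tuple


def build_target_negative_subsets(
    neg_losses: Dict[str, float], top_ratio: float
) -> Tuple[List[str], List[str]]:
    """Return (first_subset, second_subset) of negative sample IDs for the next iteration."""
    ranked = sorted(neg_losses, key=neg_losses.get, reverse=True)  # IDs by descending loss
    k = max(1, math.ceil(round(top_ratio * len(ranked), 9))) if ranked else 0
    first_subset = ranked[:k]    # target negatives: the largest losses within the preset ratio
    others = ranked[k:]
    if not others:
        return first_subset, []
    # median_low always returns one of the observed values, so some image matches it.
    center = statistics.median_low([neg_losses[i] for i in others])
    second_subset = [i for i in others if neg_losses[i] == center]
    return first_subset, second_subset


neg = {"n1": 0.9, "n2": 0.05, "n3": 0.4, "n4": 0.01, "n5": 0.2, "n6": 0.1}
print(build_target_negative_subsets(neg, top_ratio=0.5))
# (['n1', 'n3', 'n5'], ['n2'])
```

The union of the two subsets, together with the positive samples, would then form the target sample set for the next iteration.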

Selecting the target sample set for the next training iteration from the training sample set according to the sample type and the classification loss evaluation value of each sample image helps ensure that informative samples are fully trained and improves the learning effect of the model, while also increasing the training speed and optimization efficiency of the face detector.

For example, a loss value convergence coefficient associated with each sample image may be determined from the classification loss evaluation values of that sample image over multiple training iterations of the face detector. A weight assignment value for each sample image is then determined according to its loss value convergence coefficient, and the currently adjusted face detector continues to be trained according to the weight assignment value associated with each sample image.

The loss value convergence coefficient associated with a sample image may be of a convergence type or a non-convergence type, and the magnitude of a convergence-type coefficient may indicate how quickly the classification loss evaluation value associated with the sample image converges. Illustratively, the classification loss evaluation values associated with a sample image over multiple training iterations of the face detector form a loss evaluation value sequence, and the loss value convergence coefficient may be determined from the mean of a first preset number of classification loss evaluation values at the tail of the sequence and the mean of a second preset number of classification loss evaluation values at its head.
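The disclosure states only that the coefficient is derived from the head mean and the tail mean of a sample's loss sequence, without giving a formula. The sketch below uses one plausible, hypothetical definition: the relative drop from the head mean to the tail mean, with the sample classified as non-converging when the tail mean is not lower than the head mean.

```python
import statistics
from typing import Sequence, Tuple


def loss_convergence(history: Sequence[float], head_n: int, tail_n: int) -> Tuple[bool, float]:
    """Return (converged, coefficient) for one sample's per-iteration loss history.

    Hypothetical definition: coefficient = (head_mean - tail_mean) / head_mean.
    """
    if head_n <= 0 or tail_n <= 0 or len(history) < max(head_n, tail_n):
        raise ValueError("history too short for the requested window sizes")
    head_mean = statistics.fmean(history[:head_n])   # second preset number, at the head
    tail_mean = statistics.fmean(history[-tail_n:])  # first preset number, at the tail
    if head_mean <= 0 or tail_mean >= head_mean:
        return False, 0.0                            # non-convergence type
    return True, (head_mean - tail_mean) / head_mean  # convergence type; larger means faster


print(loss_convergence([0.8, 0.6, 0.4, 0.2], head_n=2, tail_n=2))  # approximately (True, 0.571)
```

Normalizing by the head mean makes coefficients comparable across samples whose initial losses differ in scale.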

For a first-type sample image, whose classification loss evaluation value converges faster than a preset threshold, the associated weight assignment value may be reduced. For a second-type sample image, whose classification loss evaluation value converges more slowly than the preset threshold, the associated weight assignment value may be increased. For a third-type sample image, whose classification loss evaluation value does not converge, the associated weight assignment value may be reduced.
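The three cases can be expressed as a small update rule. In this sketch, `rate` stands for a convergence-type coefficient (larger means faster convergence), and the multiplicative `step`, the lower bound, and leaving the weight unchanged at exactly the threshold are illustrative assumptions not specified by the disclosure.

```python
def update_weight(weight: float, converged: bool, rate: float, threshold: float,
                  step: float = 0.1, min_weight: float = 0.0) -> float:
    """Adjust one sample's weight assignment value according to its convergence behavior."""
    if not converged:               # third type: loss does not converge -> reduce
        return max(min_weight, weight * (1.0 - step))
    if rate > threshold:            # first type: converges fast -> reduce
        return max(min_weight, weight * (1.0 - step))
    if rate < threshold:            # second type: converges slowly -> increase
        return weight * (1.0 + step)
    return weight                   # exactly at the threshold: left unchanged here


for converged, rate in [(True, 0.8), (True, 0.2), (False, 0.0)]:
    print(round(update_weight(1.0, converged, rate, threshold=0.5), 3))
# 0.9, 1.1, 0.9
```

Reducing the weight of both fast-converging (already learned) and non-converging (possibly noisy) samples concentrates training effort on samples that are still being learned steadily.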

The weight assignment value for each sample image is thus determined according to how its classification loss evaluation value changes over multiple training iterations and the relative relationship between such change trends. This allows the weight assignment value associated with a sample image to be adjusted according to the role the sample image plays in training the face detector, helping ensure the training effect and improve the training efficiency of the face detector.

Fig. 4 schematically shows a schematic diagram of a training process of a face detector according to an embodiment of the present disclosure.

As shown in fig. 4, in a training process 400 of a face detector, a training sample set may include a positive sample image set 401 and a negative sample image set 402.

At least one positive sample image in the positive sample image set 401 is used as input data of the face detector to obtain a face detection result associated with each of the at least one positive sample image. From the face detection results, a first classification loss evaluation value associated with each positive sample image is determined, yielding a first classification loss evaluation value set 403.

At least one negative sample image in the negative sample image set 402 is used as input data of the face detector to obtain a face detection result associated with each of the at least one negative sample image. From the face detection results, a second classification loss evaluation value associated with each negative sample image is determined, yielding a second classification loss evaluation value set 404.

Each first classification loss evaluation value in the first classification loss evaluation value set 403 is back-propagated to adjust the model parameters of the face detector. From the second classification loss evaluation value set 404, a preset number of target second classification loss evaluation values 405 with the largest values are selected and back-propagated to adjust the model parameters of the face detector.

For the remaining second classification loss evaluation values, i.e. those other than the target second classification loss evaluation values 405, an associated classification loss center value 407 is calculated. The classification loss center value 407 is back-propagated to adjust the model parameters of the face detector.
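The selection of target loss evaluation values in Fig. 4 can be sketched on plain floats: keep every positive-sample loss, keep the `top_k` largest negative-sample losses, and collapse the rest into one center value. The text does not define the center value, so this sketch assumes the arithmetic mean; `target_losses` and `top_k` (standing in for the preset number / preset reference position) are illustrative names.

```python
from statistics import mean
from typing import List, Sequence


def target_losses(
    pos_losses: Sequence[float], neg_losses: Sequence[float], top_k: int
) -> List[float]:
    """Collect the target loss evaluation values to back-propagate.

    - every positive-sample loss is kept as-is;
    - negative losses are sorted in descending order and the first top_k kept;
    - the remaining negative losses are replaced by one center value
      (assumed here to be their arithmetic mean).
    """
    targets = list(pos_losses)
    ranked = sorted(neg_losses, reverse=True)
    targets.extend(ranked[:top_k])   # positions before the reference position
    rest = ranked[top_k:]            # positions at and after the reference position
    if rest:
        targets.append(mean(rest))
    return targets
```

With tensors in an autograd framework, back-propagating the mean still sends gradient to every remaining negative sample, each scaled by 1/m for m remaining values, so their information is kept while only one extra term enters the loss.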

By screening the target loss evaluation values used to adjust the model parameters of the face detector, the number of back-propagated classification loss evaluation values can be controlled while the useful information in the classification loss evaluation values of the sample images is retained. This improves the training efficiency of the face detector and helps ensure its training effect.

Fig. 5 schematically shows a block diagram of a training apparatus of a face detector according to an embodiment of the present disclosure.

As shown in fig. 5, the training apparatus 500 of the face detector of the embodiment of the present disclosure includes, for example, a first processing module 510, a second processing module 520, a third processing module 530, and a fourth processing module 540.

A first processing module 510, configured to use at least one sample image in the training sample set as input data of a face detector, to obtain a face detection result associated with each sample image in the at least one sample image; a second processing module 520, configured to determine a classification loss evaluation value associated with each sample image according to a face detection result; a third processing module 530, configured to determine a target loss evaluation value for adjusting a model parameter of the face detector according to the classification loss evaluation value and the sample type of each sample image; and a fourth processing module 540, configured to adjust a model parameter of the face detector based on the target loss evaluation value, so as to obtain an adjusted face detector. The sample types of the sample images comprise a positive sample type and a negative sample type, and the sample images of the positive sample type comprise a human face area with quality scores meeting preset conditions.

According to the embodiment of the disclosure, at least one sample image in the training sample set is used as input data of the face detector to obtain a face detection result associated with each of the at least one sample image; a classification loss evaluation value associated with each sample image is determined according to the face detection result; a target loss evaluation value for adjusting the model parameters of the face detector is determined according to the classification loss evaluation value and the sample type of each sample image; and the model parameters of the face detector are adjusted based on the target loss evaluation value to obtain the adjusted face detector. The sample types of the sample images comprise a positive sample type and a negative sample type, and the sample images of the positive sample type comprise human face regions whose quality scores meet preset conditions.

According to the embodiment of the disclosure, divided by sample type, the training sample set includes at least one negative sample image, and the third processing module includes: a first processing sub-module, configured to sort the classification loss evaluation values associated with the at least one negative sample image in descending order to obtain a sorting result; and a second processing sub-module, configured to take at least one classification loss evaluation value whose distribution position precedes a preset reference position in the sorting result as the target loss evaluation value.

According to an embodiment of the present disclosure, the third processing module further includes a third processing sub-module, configured to take, for at least one classification loss evaluation value whose distribution position is at or after the preset reference position, a classification loss center value associated with the at least one classification loss evaluation value as the target loss evaluation value.

According to the embodiment of the disclosure, divided by sample type, the training sample set includes at least one positive sample image, and the third processing module includes a fourth processing sub-module, configured to take the classification loss evaluation value associated with each of the at least one positive sample image as the target loss evaluation value.

According to an embodiment of the present disclosure, the second processing module includes: a fifth processing sub-module, configured to determine, for each sample image, a classification loss value and a face frame coordinate regression loss value associated with the sample image according to its face detection result; and a sixth processing sub-module, configured to calculate the classification loss evaluation value associated with each sample image according to the classification loss value and the face frame coordinate regression loss value associated with that sample image.

According to an embodiment of the present disclosure, the face detection result for each sample image includes a classification prediction probability and face frame prediction coordinates, and the fifth processing sub-module includes: a first processing unit, configured to determine the classification loss value associated with each sample image according to the classification prediction probability associated with the sample image and a preset classification target probability; and a second processing unit, configured to determine the face frame coordinate regression loss value associated with each sample image according to the face frame prediction coordinates associated with the sample image and preset face frame target coordinates.
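The text does not specify the loss functions, so the following sketch assumes two common choices: binary cross-entropy between the predicted and target face probability, and smooth L1 summed over the face frame coordinates, combined as a weighted sum into the classification loss evaluation value. All function names and `box_weight` are hypothetical.

```python
import math
from typing import Sequence


def classification_loss(pred_prob: float, target_prob: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy between predicted and target face probability (assumed form)."""
    p = min(max(pred_prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target_prob * math.log(p) + (1.0 - target_prob) * math.log(1.0 - p))


def box_regression_loss(pred_box: Sequence[float], target_box: Sequence[float]) -> float:
    """Smooth L1 loss summed over the face frame coordinates (assumed form)."""
    if len(pred_box) != len(target_box):
        raise ValueError("predicted and target boxes must have the same length")
    total = 0.0
    for p, t in zip(pred_box, target_box):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5  # quadratic near 0, linear beyond 1
    return total


def classification_loss_evaluation(
    pred_prob: float,
    target_prob: float,
    pred_box: Sequence[float],
    target_box: Sequence[float],
    box_weight: float = 1.0,
) -> float:
    """Combine both terms; a weighted sum is an assumption, the text only says 'according to'."""
    return classification_loss(pred_prob, target_prob) + box_weight * box_regression_loss(
        pred_box, target_box
    )
```

For a negative sample image there is no face frame to regress, so, as is common in anchor-based detectors, one would likely pass `box_weight=0.0` in that case.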

According to an embodiment of the present disclosure, the apparatus further includes a fifth processing module, configured to: screen, from the training sample set, a target sample set for the next training iteration according to the classification loss evaluation value and the sample type of each sample image; and continue training the currently adjusted face detector using the sample images in the target sample set.
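The screening criterion is not given in the text. One hypothetical rule, consistent with how Fig. 4 treats negatives, keeps every positive sample image and only the hardest (highest-loss) fraction of negative sample images for the next iteration. `screen_target_samples` and `neg_keep_ratio` are illustrative names, not part of the disclosure.

```python
import math
from typing import Dict, List


def screen_target_samples(
    losses: Dict[str, float], is_positive: Dict[str, bool], neg_keep_ratio: float = 0.5
) -> List[str]:
    """Hypothetical screening rule for the next iteration's target sample set.

    Keeps every positive sample and the ceil(ratio * n) highest-loss negatives.
    """
    if not 0.0 <= neg_keep_ratio <= 1.0:
        raise ValueError("neg_keep_ratio must be in [0, 1]")
    positives = [s for s in losses if is_positive[s]]
    negatives = sorted(
        (s for s in losses if not is_positive[s]), key=lambda s: losses[s], reverse=True
    )
    n_keep = math.ceil(len(negatives) * neg_keep_ratio)
    return positives + negatives[:n_keep]
```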

According to an embodiment of the present disclosure, the apparatus further includes a sixth processing module, configured to: determine a loss value convergence coefficient associated with each sample image according to the classification loss evaluation values of the at least one sample image over multiple training iterations of the face detector; determine a weight assignment value for each sample image according to the loss value convergence coefficient associated with the sample image; and continue training the currently adjusted face detector according to the weight assignment value associated with each sample image.

It should be noted that, in the technical solutions of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other handling of the related information all comply with the relevant laws and regulations and do not violate public order and good customs.

The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.

Fig. 6 schematically shows a block diagram of an electronic device for performing a training method of a face detector according to an embodiment of the present disclosure.

FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. The electronic device 600 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 6, the device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. The RAM 603 can also store various programs and data required for the operation of the device 600. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.

A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.

The computing unit 601 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 601 performs the various methods and processes described above, such as the training method of a face detector. For example, in some embodiments, the training method of the face detector may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the training method of the face detector described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured by any other suitable means (e.g. by means of firmware) to perform the training method of the face detector.

Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable face detector training apparatus such that the program code, when executed by the processor or controller, causes the functions/acts specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which users can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims

1. A method of training a face detector, comprising:

taking at least one sample image in a training sample set as input data of a face detector to obtain a face detection result associated with each sample image in the at least one sample image;

determining a classification loss evaluation value associated with each sample image according to the face detection result;

determining a target loss evaluation value for adjusting model parameters of the face detector according to the classification loss evaluation value and the sample type of each sample image; and

adjusting model parameters of the face detector based on the target loss evaluation value to obtain an adjusted face detector,

the sample types of the sample images comprise positive sample types and negative sample types, and the sample images of the positive sample types comprise human face areas with quality scores meeting preset conditions.

2. The method of claim 1, wherein,

divided according to sample type, the training sample set comprises at least one negative sample image;

the determining a target loss evaluation value for adjusting a model parameter of the face detector according to the classification loss evaluation value and the sample type of each sample image includes:

sorting the classification loss evaluation values associated with the at least one negative sample image in a descending order to obtain a sorting result; and

taking at least one classification loss evaluation value whose distribution position precedes a preset reference position in the sorting result as the target loss evaluation value.

3. The method of claim 2, wherein the determining a target loss evaluation value for adjusting model parameters of the face detector according to the classification loss evaluation value and the sample type of each sample image further comprises:

for at least one classification loss evaluation value whose distribution position is at or after the preset reference position, taking a classification loss center value associated with the at least one classification loss evaluation value as the target loss evaluation value.

4. The method of claim 1, wherein,

divided according to sample type, the training sample set comprises at least one positive sample image;

the classification loss evaluation value associated with each of the at least one positive sample image is taken as the target loss evaluation value.

5. The method of claim 1, wherein the determining a classification loss evaluation value associated with the each sample image according to the face detection result comprises:

determining a classification loss value and a face frame coordinate regression loss value associated with each sample image according to the face detection result for the sample image; and

calculating the classification loss evaluation value associated with each sample image according to the classification loss value and the face frame coordinate regression loss value associated with that sample image.

6. The method of claim 5, wherein,

the face detection result for each sample image comprises a classification prediction probability and face frame prediction coordinates;

determining a classification loss value and a face frame coordinate regression loss value associated with each sample image according to a face detection result for each sample image, including:

determining a classification loss value associated with each sample image according to the classification prediction probability associated with each sample image and a preset classification target probability; and

determining a face frame coordinate regression loss value associated with each sample image according to the face frame prediction coordinates associated with the sample image and preset face frame target coordinates.

7. The method of any of claims 1 to 6, further comprising:

screening a target sample set for next iterative training in the training sample set according to the classification loss evaluation value and the sample type of each sample image; and

continuing to train the currently adjusted face detector using the sample images in the target sample set.

8. The method of any of claims 1 to 6, further comprising:

determining a loss value convergence coefficient associated with each sample image according to the classification loss evaluation values of the at least one sample image over multiple training iterations of the face detector;

determining a weight assignment value for the each sample image according to a loss value convergence coefficient associated with the each sample image; and

continuing to train the currently adjusted face detector according to the weight assignment value associated with each sample image.

9. An apparatus for training a face detector, comprising:

the first processing module is used for taking at least one sample image in the training sample set as input data of the face detector to obtain a face detection result associated with each sample image in the at least one sample image;

the second processing module is used for determining a classification loss evaluation value associated with each sample image according to the face detection result;

a third processing module, configured to determine a target loss evaluation value for adjusting a model parameter of the face detector according to the classification loss evaluation value and the sample type of each sample image; and

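a fourth processing module, configured to adjust a model parameter of the face detector based on the target loss evaluation value to obtain an adjusted face detector,

the sample types of the sample images comprise positive sample types and negative sample types, and the sample images of the positive sample types comprise human face areas with quality scores meeting preset conditions.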

10. The apparatus according to claim 9, wherein, divided according to sample type, the training sample set includes at least one negative sample image; the third processing module comprises:

a first processing sub-module, configured to sort the classification loss evaluation values associated with the at least one negative sample image in descending order to obtain a sorting result; and

a second processing sub-module, configured to take at least one classification loss evaluation value whose distribution position precedes a preset reference position in the sorting result as the target loss evaluation value.

11. The apparatus of claim 10, wherein the third processing module further comprises:

a third processing sub-module, configured to, for at least one classification loss evaluation value whose distribution position is at or after the preset reference position, take a classification loss center value associated with the at least one classification loss evaluation value as the target loss evaluation value.

12. The apparatus of claim 9, wherein, divided according to sample type, the training sample set includes at least one positive sample image; the third processing module comprises:

a fourth processing sub-module, configured to use the classification loss evaluation value associated with each of the at least one positive sample image as the target loss evaluation value.

13. The apparatus of claim 9, wherein the second processing module comprises:

a fifth processing submodule, configured to determine, according to a face detection result for each sample image, a classification loss value and a face frame coordinate regression loss value associated with each sample image; and

a sixth processing sub-module, configured to calculate the classification loss evaluation value associated with each sample image according to the classification loss value and the face frame coordinate regression loss value associated with that sample image.

14. The apparatus of claim 13, wherein the face detection result for each sample image comprises a classification prediction probability and face frame prediction coordinates; the fifth processing sub-module includes:

a first processing unit, configured to determine a classification loss value associated with each sample image according to a classification prediction probability associated with each sample image and a preset classification target probability; and

a second processing unit, configured to determine a face frame coordinate regression loss value associated with each sample image according to the face frame prediction coordinates associated with the sample image and preset face frame target coordinates.

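15. The apparatus of any of claims 9 to 14, further comprising a fifth processing module to:

screen, from the training sample set, a target sample set for the next training iteration according to the classification loss evaluation value and the sample type of each sample image; and

continue training the currently adjusted face detector using the sample images in the target sample set.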

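16. The apparatus of any of claims 9 to 14, further comprising a sixth processing module to:

determine a loss value convergence coefficient associated with each sample image according to the classification loss evaluation values of the at least one sample image over multiple training iterations of the face detector;

determine a weight assignment value for each sample image according to the loss value convergence coefficient associated with the sample image; and

continue training the currently adjusted face detector according to the weight assignment value associated with each sample image.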

17. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.

18. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any of claims 1-8.

19. A computer program product comprising a computer program which, when executed by a processor, implements a method according to any one of claims 1 to 8.