CN111753598A - Face detection method and device - Google Patents

Face detection method and device

Info

Publication number
CN111753598A
CN111753598A (application CN201910251460.4A)
Authority
CN
China
Prior art keywords
face
neural network
network model
screening candidate
candidate frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910251460.4A
Other languages
Chinese (zh)
Inventor
刘金财
王涛
樊星宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China United Network Communications Group Co Ltd filed Critical China United Network Communications Group Co Ltd
Priority to CN201910251460.4A priority Critical patent/CN111753598A/en
Publication of CN111753598A publication Critical patent/CN111753598A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes
    • G06T7/62Analysis of geometric attributes of area, perimeter, diameter or volume
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation


Abstract

The embodiment of the invention provides a face detection method and device, wherein the method comprises the following steps: performing scaling processing on a face image to be detected to obtain face images to be detected at a plurality of scales; processing the face images to be detected at the plurality of scales through a first-level neural network model to obtain first candidate frames, and removing overlapped first candidate frames to obtain first screening candidate frames at a plurality of scales; respectively obtaining a first face position and/or a first face area of each first screening candidate frame, and removing the first screening candidate frames whose first face positions exceed the position threshold of the corresponding scale and/or whose first face areas exceed the area threshold of the corresponding scale, to obtain second screening candidate frames; and processing the second screening candidate frames sequentially through a second-level neural network model and a third-level neural network model to obtain a face image. The embodiment of the invention can improve face detection efficiency.

Description

Face detection method and device
Technical Field
The embodiment of the invention relates to the technical field of face recognition, in particular to a face detection method and device.
Background
When a user transacts business, a face image of the user usually needs to be acquired as evidence for auditing. Owing to the influence of the surrounding environment, the collected user image often also includes other faces, so face detection needs to be performed on the collected user image to extract the face image of the user.
At present, a commonly used face detection method performs face detection through the Multi-Task Cascaded Convolutional Networks (MTCNN) algorithm. The MTCNN algorithm first screens out face candidate frames and then further screens those candidate frames to obtain the face image.
However, the inventors found that in the conventional MTCNN algorithm the number of face candidate frames to be screened is too large, which makes the computation in the subsequent screening stages excessive and reduces face detection efficiency.
Disclosure of Invention
The invention provides a face detection method and face detection equipment, which aim to solve the problem of low face detection efficiency in the prior art.
In a first aspect, an embodiment of the present invention provides a face detection method, including:
carrying out scaling processing on the face image to be detected to obtain the face image to be detected with a plurality of scales;
processing the face images to be detected in multiple scales through a first-level neural network model to obtain a first candidate frame, and removing the overlapped first candidate frame to obtain a first screening candidate frame in multiple scales;
respectively obtaining a first face position and/or a first face area of each first screening candidate frame, and removing the first screening candidate frames with the first face positions exceeding position thresholds of corresponding scales and/or the first screening candidate frames with the first face areas exceeding area thresholds of corresponding scales to obtain second screening candidate frames;
and processing the second screening candidate frame sequentially through a second-level neural network model and a third-level neural network model to obtain a face image.
As an embodiment of the present invention, the processing the second screening candidate frame sequentially through the second stage neural network model and the third stage neural network model to obtain a face image includes:
processing the second screening candidate frame through a second-level neural network model to obtain a second candidate frame, and removing the overlapped second candidate frame to obtain a third screening candidate frame with multiple scales;
respectively obtaining a second face position and/or a second face area of each third screening candidate frame, and removing the third screening candidate frames of which the second face positions exceed the position threshold values of the corresponding scales and/or the third screening candidate frames of which the second face areas exceed the area threshold values of the corresponding scales to obtain fourth screening candidate frames;
and processing the fourth screening candidate frame through a third-level neural network model to obtain a face image.
As an embodiment of the present invention, the processing the fourth filtering candidate frame through the third-level neural network model to obtain a face image includes:
processing the fourth screening candidate frame through a third-level neural network model to obtain a third candidate frame, and removing the overlapped third candidate frame to obtain a fifth screening candidate frame with multiple scales;
and acquiring a third face position and/or a third face area of each fifth screening candidate frame, and removing the fifth screening candidate frames of which the third face positions exceed the position threshold values of the corresponding scales and/or the fifth screening candidate frames of which the third face areas exceed the area threshold values of the corresponding scales to obtain the face images.
As an embodiment of the present invention, before removing the first filtering candidate frame whose first face position exceeds the position threshold of the corresponding scale, the method further includes:
acquiring a first preset number of face image samples, and determining the positions of faces in the face image samples;
determining a preset position threshold according to the positions of the faces in the face image samples;
and zooming the preset position threshold to obtain position thresholds of a plurality of scales, wherein the zooming scale of the preset position threshold is the same as that of the face image to be detected.
As an embodiment of the present invention, before removing the first screening candidate frame whose first human face area exceeds the area threshold of the corresponding scale and obtaining the second screening candidate frame, the method further includes:
acquiring a second preset number of face image samples, and determining the area of the face in each face image sample;
determining a preset area threshold according to the area of the face in each face image sample;
and zooming the preset area threshold to obtain area thresholds of a plurality of scales, wherein the zooming scale of the preset area threshold is the same as that of the face image to be detected.
As an embodiment of the invention, the first-level neural network model is a P-net neural network model, the second-level neural network model is an R-net neural network model, and the third-level neural network model is an O-net neural network model.
As an embodiment of the present invention, before performing scaling processing on the face image to be detected, the method further includes:
and constructing an initial face detection neural network model, and training the initial face detection neural network model to obtain a face detection neural network model, wherein the face detection neural network model comprises a P-net neural network model, an R-net neural network model and an O-net neural network model.
In a second aspect, an embodiment of the present invention provides a face detection apparatus, including:
the scaling module is used for scaling the face image to be detected to obtain the face images to be detected with a plurality of scales;
the first processing module is used for processing the face images to be detected in multiple scales through the first-level neural network model to obtain a first candidate frame, removing the overlapped first candidate frame and obtaining a first screening candidate frame in multiple scales;
the screening module is used for respectively acquiring a first face position and/or a first face area of each first screening candidate frame, removing the first screening candidate frames of which the first face positions exceed the position threshold values of the corresponding scales, and/or removing the first screening candidate frames of which the first face areas exceed the area threshold values of the corresponding scales, and obtaining second screening candidate frames;
and the second processing module is used for processing the second screening candidate frame sequentially through the second-level neural network model and the third-level neural network model to obtain the face image.
In a third aspect, an embodiment of the present invention provides a face detection device, including: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes computer-executable instructions stored in the memory, so that the at least one processor executes the face detection method according to any one of the first aspect of the embodiments of the present invention.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where computer-executable instructions are stored in the computer-readable storage medium, and when a processor executes the computer-executable instructions, the face detection method according to any one of the first aspect of the embodiments of the present invention is implemented.
According to the face detection method and device provided by the embodiment of the invention, the face images to be detected in multiple scales are processed through the first-level neural network model to obtain the first candidate frame, the overlapped first candidate frame is removed to obtain the first screening candidate frame, then the first screening candidate frame is further screened through the first face position and/or the first face area, the first screening candidate frame with the first face position exceeding the position threshold value of the corresponding scale and/or the first screening candidate frame with the first face area exceeding the area threshold value of the corresponding scale are removed to obtain the second screening candidate frame, so that the number of candidate frames is reduced, the calculation amount of the second-level neural network model and the third-level neural network model is reduced, and the face detection efficiency is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a first flowchart of a face detection method according to an embodiment of the present invention;
fig. 2 is a second flowchart of a face detection method according to an embodiment of the present invention;
fig. 3 is a third flowchart of the face detection method according to the embodiment of the present invention;
fig. 4 is a fourth flowchart of a face detection method according to an embodiment of the present invention;
fig. 5 is a first schematic structural diagram of a face detection device according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a face detection apparatus according to an embodiment of the present invention;
fig. 7 is a schematic diagram of a hardware structure of a face detection device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The MTCNN model generally includes three levels of neural network models: a P-net neural network model, an R-net neural network model, and an O-net neural network model. The P-net neural network model is used to generate candidate frames, and the R-net and O-net neural network models are used to screen the candidate frames to obtain the face image. In the existing MTCNN algorithm, too many face candidate frames are produced at the candidate-generation stage, so the computation in the subsequent screening is excessive and face detection efficiency is reduced. The embodiment of the invention improves the MTCNN algorithm to improve face detection efficiency.
The technical solution of the present invention will be described in detail below with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
Referring to fig. 1, fig. 1 is a flowchart illustrating a first method for detecting a face according to an embodiment of the present invention, as shown in fig. 1, the method according to the embodiment may include:
and S101, carrying out scaling processing on the face image to be detected to obtain the face images to be detected with a plurality of scales.
In the embodiment of the invention, the face image to be detected is the face image of the user acquired by an image acquisition device. Pyramid-level scaling is performed on the face image to be detected to obtain face images to be detected at a plurality of scales. The specific scaling steps include: determining a basic scaling factor a, scaling the face image to be detected by the basic scaling factor, and iterating according to the strategy that the area of each scaled image is a times the area of the image at the previous level, so as to obtain images at a plurality of scales. For example, if the basic scaling factor is a and the area of the face image to be detected is s, then the areas of the scaled face images to be detected, from largest to smallest, are s, s×a, s×a², …, s×aⁿ. The basic scaling factor may be set as required; for example, the basic scaling factor is 0.72.
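The pyramid iteration above can be sketched as follows. This is an illustration, not code from the patent: the helper name `pyramid_areas` and the stopping condition (iterating until the area would fall below some minimum) are assumptions, since the patent does not state when the iteration ends.

```python
def pyramid_areas(base_area, scale_factor, min_area):
    """Areas of the pyramid levels: each level's area is scale_factor
    times the previous one, i.e. s, s*a, s*a**2, ..., stopping once
    the next area would fall below min_area (an assumed stopping rule)."""
    areas = []
    area = float(base_area)
    while area >= min_area:
        areas.append(area)
        area *= scale_factor
    return areas
```

For example, with s = 1000, a = 0.72 and an assumed minimum area of 100, this produces eight pyramid levels.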
And S102, processing the face images to be detected in multiple scales through the first-level neural network model to obtain a first candidate frame, and removing the overlapped first candidate frame to obtain a first screening candidate frame in multiple scales.
In the embodiment of the invention, the first-level neural network model comprises convolutional layers with 3 × 3 convolution kernels. The input image area of the first-level neural network model is 12 × 12; the first-level neural network model performs face detection on each 12 × 12 region of the face images to be detected at the plurality of scales, judges whether a face exists in each 12 × 12 region, and outputs a face regression frame and face key point locations, thereby obtaining the first candidate frames. The face regression frame indicates the precise position of the face frame, for example face coordinates, and the face key point locations comprise 5 key points: the left eye, the right eye, the nose, the left corner of the mouth and the right corner of the mouth.
And after the first candidate frames are obtained, removing the overlapped first candidate frames through a non-maximum suppression algorithm (NMS) to obtain first screening candidate frames, wherein the scales of the first screening candidate frames are different.
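The overlap-removal step can be sketched as a generic greedy non-maximum suppression, shown here for illustration rather than as the patent's exact implementation; the box format `(x1, y1, x2, y2)` and the overlap threshold value are assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, threshold=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring box and discard
    any remaining box that overlaps it by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= threshold]
    return keep
```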
Step S103, respectively obtaining a first face position and/or a first face area of each first screening candidate frame, and removing the first screening candidate frames with the first face positions exceeding the position threshold values of the corresponding scales and/or the first screening candidate frames with the first face areas exceeding the area threshold values of the corresponding scales to obtain second screening candidate frames.
In the embodiment of the invention, the accurate position of each first candidate frame, namely the face position, such as face coordinates, can be obtained through the first-level neural network model, the face area can be obtained according to the face position, and the first screening candidate frames are further screened through the face position and/or the face area, so that the number of the first candidate frames is reduced.
In a possible implementation manner, the first screening candidate frame is further screened through the face position, and the specific implementation manner is as follows: and acquiring first face positions of the first screening candidate frames with different scales, and removing the first screening candidate frames with the first face positions exceeding the position threshold values of the corresponding scales to obtain second screening candidate frames.
In another possible implementation manner, the first screening candidate frame is further screened through the face area, and the specific implementation manner is as follows: and obtaining the first face areas of the first screening candidate frames with different scales, and removing the first screening candidate frames with the first face areas exceeding the area threshold values of the corresponding scales to obtain second screening candidate frames.
In another possible implementation manner, the first screening candidate frame is further screened according to the face position and the face area, and the specific implementation manner is as follows: the method comprises the steps of obtaining first face positions and first face areas of a plurality of first screening candidate frames with different scales, removing the first screening candidate frames with the first face positions exceeding position thresholds of corresponding scales, removing the first screening candidate frames with the first face areas exceeding area thresholds of corresponding scales, and obtaining second screening candidate frames.
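The three screening variants above share one underlying filter, sketched below. The helper is hypothetical: interpreting the position threshold as an allowed range for the frame centre and the area threshold as a (min, max) pair are both assumptions not stated in the patent.

```python
def filter_by_position_and_area(boxes, pos_range=None, area_range=None):
    """Keep boxes whose centre lies inside pos_range (x_min, y_min, x_max, y_max)
    and whose area lies inside area_range (min_area, max_area).
    Either filter may be disabled by passing None, matching the
    position-only / area-only / combined variants in the text."""
    kept = []
    for box in boxes:
        x1, y1, x2, y2 = box
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        area = (x2 - x1) * (y2 - y1)
        if pos_range is not None:
            if not (pos_range[0] <= cx <= pos_range[2]
                    and pos_range[1] <= cy <= pos_range[3]):
                continue
        if area_range is not None:
            if not (area_range[0] <= area <= area_range[1]):
                continue
        kept.append(box)
    return kept
```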
And step S104, processing the second screening candidate frame sequentially through a second-level neural network model and a third-level neural network model to obtain a face image.
In the embodiment of the invention, the second-level neural network model comprises 2 convolutional layers with 3 × 3 kernels, 1 convolutional layer with a 2 × 2 kernel, and 1 fully-connected layer. The third-level neural network model likewise comprises 2 convolutional layers with 3 × 3 kernels, 1 convolutional layer with a 2 × 2 kernel, and 1 fully-connected layer. The second screening candidate frames are input into the second-level neural network model, which removes non-face frames and finely adjusts the second screening candidate frames to narrow the range of the face candidate frames. The output of the second-level neural network model is taken as the input of the third-level neural network model, which performs further screening to obtain the face image. The face image contains only the face of the user and no other faces from the environment.
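The overall flow can be summarised as a simple pipeline. The sketch below treats each stage as an opaque callable that maps candidate frames to a refined subset; the actual P-net/R-net/O-net inference inside each stage is beyond this illustration, and the early-exit behaviour when no candidates remain is an assumption.

```python
def run_cascade(candidates, stages):
    """Pass candidate frames through the stages in order; each stage
    returns the frames that survive its screening.  Stops early if
    no candidates remain."""
    for stage in stages:
        candidates = stage(candidates)
        if not candidates:
            break
    return candidates
```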
According to the embodiment of the invention, the first-level neural network model is used for processing the face images to be detected in multiple scales to obtain the first candidate frame, the overlapped first candidate frame is removed to obtain the first screening candidate frame, then the first screening candidate frame is further screened by the first face position and/or the first face area, the first screening candidate frame with the first face position exceeding the position threshold value of the corresponding scale and/or the first screening candidate frame with the first face area exceeding the area threshold value of the corresponding scale are removed to obtain the second screening candidate frame, so that the number of candidate frames is reduced, the calculation amount of the second-level neural network model and the third-level neural network model is reduced, and the face detection efficiency is improved.
Referring to fig. 2, fig. 2 is a second flowchart of a face detection method according to an embodiment of the present invention, as shown in fig. 2, based on the embodiment shown in fig. 1, the method of this embodiment describes in detail a specific implementation manner of step S104, and the method of this embodiment may include:
step S201, the second screening candidate frame is processed through a second-level neural network model to obtain a second candidate frame, and the overlapped second candidate frame is removed to obtain a third screening candidate frame with multiple scales.
Step S202, respectively obtaining a second face position and/or a second face area of each third screening candidate frame, and removing the third screening candidate frames of which the second face positions exceed the position threshold values of the corresponding scales and/or the third screening candidate frames of which the second face areas exceed the area threshold values of the corresponding scales to obtain fourth screening candidate frames.
Step S203, processing the fourth screening candidate frame through a third-level neural network model to obtain a third candidate frame, and removing the overlapped third candidate frame to obtain a fifth screening candidate frame with multiple scales.
Step S204, acquiring a third face position and/or a third face area of each fifth screening candidate frame, and removing the fifth screening candidate frame of which the third face position exceeds the position threshold value of the corresponding scale and/or the fifth screening candidate frame of which the third face area exceeds the area threshold value of the corresponding scale to obtain the face image.
In the embodiment of the invention, the second candidate frame output by the second-level neural network model and the third candidate frame output by the third-level neural network model are further screened respectively through a non-maximum suppression algorithm, a face position and/or a face area.
The second screening candidate frames are taken as the input of the second-level neural network model, which removes non-face frames and finely adjusts the candidate frames to narrow the range of the face candidate frames, and outputs the second candidate frames. The overlapped second candidate frames are removed through the non-maximum suppression algorithm (NMS) to obtain the third screening candidate frames, which have different scales. The third screening candidate frames are further screened by face area and/or face position to obtain the fourth screening candidate frames; the specific implementation is similar to step S103 shown in fig. 1 and is not repeated here.
The fourth screening candidate frames are taken as the input of the third-level neural network model, which outputs the third candidate frames; the overlapped third candidate frames are removed through the non-maximum suppression algorithm (NMS) to obtain the fifth screening candidate frames, which have different scales. The fifth screening candidate frames are further screened by face area and/or face position to obtain the face image; the specific implementation is similar to step S103 shown in fig. 1 and is not repeated here.
The embodiment of the invention further screens the third screening candidate frame through the second face position and/or the second face area, and further screens the fifth screening candidate frame through the third face position and/or the third face area, thereby further improving the face detection efficiency.
Referring to fig. 3, fig. 3 is a flowchart of a third flow chart of a face detection method according to an embodiment of the present invention, as shown in fig. 3, before removing, in step S103, a first filtering candidate frame in which a first face position exceeds a position threshold of a corresponding scale, in the method according to the present embodiment, the method according to the present embodiment may further include:
step S301, a first preset number of face image samples are obtained, and the positions of faces in the face image samples are determined.
Step S302, a preset position threshold value is determined according to the positions of the human faces in the human face image samples.
Step S303, zooming the preset position threshold to obtain position thresholds of multiple scales, wherein the zooming scale of the preset position threshold is the same as that of the face image to be detected.
In the embodiment of the invention, the face image samples are acquired by the image acquisition device, and the acquisition scene of the face image samples is the same as that of the face image to be detected. Each face image sample is an image, acquired by the image acquisition device, of a sample subject standing within a specified position range, and the face image to be detected is likewise an image of the user standing within that specified position range. For example, when bank deposit business is transacted, the face image samples are face images acquired at a bank service window, and the face image to be detected is also a face image acquired at that service window.
The position of the face in each face image sample can be determined through face recognition technology, and the preset position threshold is determined according to these positions. For example, faces whose positions in the samples fall outside a preset range are removed; the maximum of the remaining face positions is taken as the upper limit of the preset position threshold and the minimum as the lower limit. Pyramid-level scaling is then similarly applied to the preset position threshold to obtain position thresholds at a plurality of scales, where the scaling factor of the preset position threshold is the same as that of the face image to be detected. The specific scaling steps include: determining the basic scaling factor a, scaling the preset position threshold by the basic scaling factor, and iterating according to the strategy that each scaled value is a times the previous value, to obtain position thresholds at a plurality of scales.
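A sketch of the threshold construction described above. The helper names, the one-dimensional treatment of "position", and the shape of the outlier range are illustrative assumptions; in practice the x and y coordinates would each get their own threshold.

```python
def preset_threshold(sample_values, valid_range):
    """Drop sample values outside valid_range, then take the min and
    max of what remains as the (lower, upper) preset threshold."""
    lo, hi = valid_range
    kept = [v for v in sample_values if lo <= v <= hi]
    return min(kept), max(kept)

def scaled_thresholds(base, scale_factor, levels):
    """Scale the (lower, upper) threshold once per pyramid level,
    using the same factor as the image pyramid."""
    lo, hi = base
    out = []
    for _ in range(levels):
        out.append((lo, hi))
        lo, hi = lo * scale_factor, hi * scale_factor
    return out
```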
Referring to fig. 4, fig. 4 is a fourth flowchart of a face detection method according to an embodiment of the present invention, as shown in fig. 4, based on the embodiment shown in fig. 1, before removing the first filtering candidate frame whose first face area exceeds the area threshold of the corresponding scale in step S103 to obtain the second filtering candidate frame, the method of this embodiment may further include:
step S401, obtaining a second preset number of face image samples, and determining the area of the face in each face image sample.
Step S402, determining a preset area threshold according to the area of the face in each face image sample.
Step S403, performing scaling processing on the preset area threshold to obtain area thresholds of multiple scales, wherein the scaling scale of the preset area threshold is the same as that of the face image to be detected.
In the embodiment of the present invention, the first preset number and the second preset number may be the same or different, and the embodiment of the present invention is not particularly limited.
The position of the face in each face image sample can be determined through a face recognition technology, the area of the face is computed from that position, and the preset area threshold is then determined from the face areas of the samples. For example, faces whose areas fall outside a preset range are removed; the maximum of the remaining face areas is taken as the upper limit of the preset area threshold, and the minimum as the lower limit. Pyramid-level scaling is applied to the preset area threshold to obtain area thresholds at multiple scales, where the scaling scale of the preset area threshold is the same as that of the face image to be detected. The specific scaling steps are: determine a base scaling factor a, scale the preset area threshold by a, and iterate under the strategy that each scaled value is a times the previous value, yielding area thresholds at multiple scales.
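The pyramid-level scaling of a threshold can be sketched as a simple geometric iteration; the function name and the use of a scalar threshold are assumptions for illustration, and in practice the factor a would match the image pyramid's scale factor:

```python
def threshold_pyramid(base_threshold, a, num_scales):
    """Scale a preset threshold by factor a per pyramid level (sketch).

    Iteration strategy from the description: each scaled value is
    a times the previous value.
    """
    thresholds = []
    value = base_threshold
    for _ in range(num_scales):
        value = value * a  # each scaled value is a times the previous one
        thresholds.append(value)
    return thresholds
```

With a base threshold of 100 and a = 0.5, three levels yield thresholds 50, 25, and 12.5, matching an image pyramid halved at each level.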
As an embodiment of the invention, the first-level neural network model is a P-net neural network model, the second-level neural network model is an R-net neural network model, and the third-level neural network model is an O-net neural network model.
The method of this embodiment may further include: and constructing an initial face detection neural network model, and training the initial face detection neural network model to obtain a face detection neural network model, wherein the face detection neural network model comprises a P-net neural network model, an R-net neural network model and an O-net neural network model.
In the embodiment of the invention, the face detection neural network model comprises: a P-net (Proposal Network) neural network model, an R-net (Refine Network) neural network model, and an O-net (Output Network) neural network model. The initial face detection neural network model is trained on training samples to obtain the face detection neural network model used for face detection.
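The three-stage cascade can be sketched at a high level as follows. This is an illustrative skeleton only: the stage interfaces (each network as a callable producing candidate boxes) and the `filter_boxes` hook standing in for the threshold screening between stages are assumptions, and the actual P-net/R-net/O-net architectures are not reproduced here.

```python
def cascade_detect(image_pyramid, p_net, r_net, o_net, filter_boxes):
    """Run the P-net -> R-net -> O-net cascade with screening between
    stages (sketch).

    image_pyramid: face images to be detected at multiple scales.
    p_net, r_net, o_net: callables producing candidate boxes.
    filter_boxes: removes candidates outside the position/area thresholds.
    """
    # Stage 1: P-net proposes candidate frames over all scales.
    boxes = [b for img in image_pyramid for b in p_net(img)]
    boxes = filter_boxes(boxes)          # screen out-of-threshold candidates
    # Stage 2: R-net refines the surviving candidate frames.
    boxes = filter_boxes(r_net(boxes))
    # Stage 3: O-net produces the final face frames.
    return filter_boxes(o_net(boxes))
```

Screening after every stage, rather than only at the end, is what reduces the number of candidate frames each subsequent (and more expensive) network must process.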
Fig. 5 is a schematic structural diagram of a first face detection apparatus according to an embodiment of the present invention, as shown in fig. 5, a face detection apparatus 500 includes a scaling module 501, a first processing module 502, a screening module 503, and a second processing module 504, and specific functions of each module are as follows:
the scaling module 501 is configured to scale the face image to be detected to obtain face images to be detected in multiple scales.
The first processing module 502 is configured to process the face images to be detected in multiple scales through the first-stage neural network model to obtain a first candidate frame, and remove the overlapped first candidate frame to obtain a first screening candidate frame in multiple scales.
The screening module 503 is configured to obtain a first face position and/or a first face area of each first screening candidate frame, and remove the first screening candidate frame whose first face position exceeds the position threshold of the corresponding scale, and/or the first screening candidate frame whose first face area exceeds the area threshold of the corresponding scale, so as to obtain a second screening candidate frame.
And the second processing module 504 is configured to process the second screening candidate frame sequentially through the second-level neural network model and the third-level neural network model to obtain a face image.
As an embodiment of the present invention, the second processing module 504 is specifically configured to process the second screening candidate frame through the second-level neural network model to obtain a second candidate frame, and remove the overlapped second candidate frames to obtain third screening candidate frames of multiple scales;
respectively obtaining a second face position and/or a second face area of each third screening candidate frame, and removing the third screening candidate frames of which the second face positions exceed the position threshold values of the corresponding scales and/or the third screening candidate frames of which the second face areas exceed the area threshold values of the corresponding scales to obtain fourth screening candidate frames;
and processing the fourth screening candidate frame through a third-level neural network model to obtain a face image.
As an embodiment of the present invention, the second processing module 504 is specifically configured to process the fourth screening candidate frame through the third-level neural network model to obtain a third candidate frame, and remove the overlapped third candidate frames to obtain fifth screening candidate frames of multiple scales;
and acquiring a third face position and/or a third face area of each fifth screening candidate frame, and removing the fifth screening candidate frames of which the third face positions exceed the position threshold values of the corresponding scales and/or the fifth screening candidate frames of which the third face areas exceed the area threshold values of the corresponding scales to obtain the face images.
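The screening applied at each stage can be sketched as a per-scale filter over candidate frames. The box representation `(x, y, w, h)`, the use of the x coordinate as the face position, and the `(lo, hi)` threshold tuples are assumptions for illustration:

```python
def screen_boxes(boxes, pos_threshold, area_threshold):
    """Keep candidate frames whose face position and face area fall
    within the thresholds of the corresponding scale (sketch).

    boxes: list of (x, y, w, h) candidate frames.
    pos_threshold: (lo, hi) bounds on the face position (here: x).
    area_threshold: (lo, hi) bounds on the face area (w * h).
    """
    kept = []
    for (x, y, w, h) in boxes:
        pos_ok = pos_threshold[0] <= x <= pos_threshold[1]
        area_ok = area_threshold[0] <= w * h <= area_threshold[1]
        # Remove frames whose position or area exceeds either threshold.
        if pos_ok and area_ok:
            kept.append((x, y, w, h))
    return kept
```

A frame at position 200 when the position threshold is (0, 100), or with area 10000 when the area threshold is (50, 200), would both be screened out; only frames satisfying both checks survive to the next network.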
Referring to fig. 6, fig. 6 is a schematic structural diagram of a second face detection apparatus according to an embodiment of the present invention, as shown in fig. 6, based on the embodiment shown in fig. 5, the apparatus of this embodiment may further include: the first determining module 505 is configured to obtain a first preset number of face image samples, and determine positions of faces in the face image samples;
determining a preset position threshold according to the positions of the faces in the face image samples;
and zooming the preset position threshold to obtain position thresholds of a plurality of scales, wherein the zooming scale of the preset position threshold is the same as that of the face image to be detected.
As an embodiment of the present invention, the apparatus of this embodiment may further include: the second determining module 506, the second determining module 506 is configured to obtain a second preset number of face image samples, and determine areas of faces in the face image samples;
determining a preset area threshold according to the area of the face in each face image sample;
and zooming the preset area threshold to obtain area thresholds of a plurality of scales, wherein the zooming scale of the preset area threshold is the same as that of the face image to be detected.
As an embodiment of the invention, the first-level neural network model is a P-net neural network model, the second-level neural network model is an R-net neural network model, and the third-level neural network model is an O-net neural network model.
As an embodiment of the present invention, the apparatus of this embodiment may further include: and the model training module 507, the model training module 507 is used for constructing an initial face detection neural network model, and training the initial face detection neural network model to obtain the face detection neural network model, wherein the face detection neural network model comprises a P-net neural network model, an R-net neural network model and an O-net neural network model.
The apparatus of the present embodiment may be used to implement the method embodiments shown in fig. 1 to fig. 4, and the implementation principle and technical effect are similar, which are not described herein again.
Fig. 7 is a schematic diagram of a hardware structure of a face detection device according to an embodiment of the present invention. As shown in fig. 7, the face detection apparatus 700 provided in this embodiment includes: at least one processor 701 and a memory 702. The face detection apparatus 700 further comprises a communication component 703. The processor 701, the memory 702, and the communication component 703 are connected by a bus 704.
In a specific implementation process, the at least one processor 701 executes the computer-executable instructions stored in the memory 702, so that the at least one processor 701 executes the face detection method in any one of the above-described method embodiments. The communication component 703 is used for communicating with the terminal device and/or the server.
For a specific implementation process of the processor 701, reference may be made to the above method embodiments, which implement principles and technical effects similar to each other, and details of this embodiment are not described herein again.
In the embodiment shown in fig. 7, it should be understood that the processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of a method disclosed in connection with the present invention may be embodied as being performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
The memory may comprise high speed RAM memory and may also include non-volatile storage NVM, such as at least one disk memory.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus.
The embodiment of the present invention further provides a computer-readable storage medium, where computer-executable instructions are stored in the computer-readable storage medium, and when a processor executes the computer-executable instructions, the face detection method in any of the above method embodiments is implemented.
The computer-readable storage medium described above may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk. Readable storage media can be any available media that can be accessed by a general purpose or special purpose computer.
An exemplary readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the readable storage medium. Of course, the readable storage medium may also be an integral part of the processor. The processor and the readable storage medium may reside in an Application-Specific Integrated Circuit (ASIC). Of course, the processor and the readable storage medium may also reside as discrete components in the apparatus.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A face detection method, comprising:
carrying out scaling processing on the face image to be detected to obtain the face image to be detected with a plurality of scales;
processing the face images to be detected in multiple scales through a first-level neural network model to obtain a first candidate frame, and removing the overlapped first candidate frame to obtain a first screening candidate frame in multiple scales;
respectively obtaining a first face position and/or a first face area of each first screening candidate frame, and removing the first screening candidate frames with the first face positions exceeding position thresholds of corresponding scales and/or the first screening candidate frames with the first face areas exceeding area thresholds of corresponding scales to obtain second screening candidate frames;
and processing the second screening candidate frame sequentially through a second-level neural network model and a third-level neural network model to obtain a face image.
2. The method of claim 1, wherein the processing the second screening candidate frame sequentially through a second-level neural network model and a third-level neural network model to obtain a face image comprises:
processing the second screening candidate frame through a second-level neural network model to obtain a second candidate frame, and removing the overlapped second candidate frame to obtain a third screening candidate frame with multiple scales;
respectively obtaining a second face position and/or a second face area of each third screening candidate frame, and removing the third screening candidate frames of which the second face positions exceed the position threshold values of the corresponding scales and/or the third screening candidate frames of which the second face areas exceed the area threshold values of the corresponding scales to obtain fourth screening candidate frames;
and processing the fourth screening candidate frame through a third-level neural network model to obtain a face image.
3. The method of claim 2, wherein the processing the fourth filtering candidate frame through the third-level neural network model to obtain a face image comprises:
processing the fourth screening candidate frame through a third-level neural network model to obtain a third candidate frame, and removing the overlapped third candidate frame to obtain a fifth screening candidate frame with multiple scales;
and acquiring a third face position and/or a third face area of each fifth screening candidate frame, and removing the fifth screening candidate frames of which the third face positions exceed the position threshold values of the corresponding scales and/or the fifth screening candidate frames of which the third face areas exceed the area threshold values of the corresponding scales to obtain the face images.
4. The method of claim 1, wherein before removing the first screening candidate frame whose first face position exceeds the position threshold of the corresponding scale, the method further comprises:
acquiring a first preset number of face image samples, and determining the positions of faces in the face image samples;
determining a preset position threshold according to the positions of the faces in the face image samples;
and zooming the preset position threshold to obtain position thresholds of a plurality of scales, wherein the zooming scale of the preset position threshold is the same as that of the face image to be detected.
5. The method of claim 1, wherein before removing the first screening candidate frame whose first face area exceeds the area threshold of the corresponding scale to obtain the second screening candidate frame, the method further comprises:
acquiring a second preset number of face image samples, and determining the area of the face in each face image sample;
determining a preset area threshold according to the area of the face in each face image sample;
and zooming the preset area threshold to obtain area thresholds of a plurality of scales, wherein the zooming scale of the preset area threshold is the same as that of the face image to be detected.
6. The method of any one of claims 1 to 5, wherein the first-level neural network model is a P-net neural network model, the second-level neural network model is an R-net neural network model, and the third-level neural network model is an O-net neural network model.
7. The method according to claim 6, wherein before the scaling process of the face image to be detected, the method further comprises:
and constructing an initial face detection neural network model, and training the initial face detection neural network model to obtain a face detection neural network model, wherein the face detection neural network model comprises a P-net neural network model, an R-net neural network model and an O-net neural network model.
8. A face detection apparatus, comprising:
the scaling module is used for scaling the face image to be detected to obtain the face images to be detected with a plurality of scales;
the first processing module is used for processing the face images to be detected in multiple scales through the first-level neural network model to obtain a first candidate frame, removing the overlapped first candidate frame and obtaining a first screening candidate frame in multiple scales;
the screening module is used for respectively acquiring a first face position and/or a first face area of each first screening candidate frame, removing the first screening candidate frames of which the first face positions exceed the position threshold values of the corresponding scales, and/or removing the first screening candidate frames of which the first face areas exceed the area threshold values of the corresponding scales, and obtaining second screening candidate frames;
and the second processing module is used for processing the second screening candidate frame sequentially through the second-level neural network model and the third-level neural network model to obtain the face image.
9. A face detection apparatus, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executing the computer-executable instructions stored by the memory causes the at least one processor to perform the face detection method of any of claims 1 to 7.
10. A computer-readable storage medium having computer-executable instructions stored therein, which when executed by a processor, implement the face detection method of any one of claims 1 to 7.
CN201910251460.4A 2019-03-29 2019-03-29 Face detection method and device Pending CN111753598A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910251460.4A CN111753598A (en) 2019-03-29 2019-03-29 Face detection method and device

Publications (1)

Publication Number Publication Date
CN111753598A true CN111753598A (en) 2020-10-09

Family

ID=72672503

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910251460.4A Pending CN111753598A (en) 2019-03-29 2019-03-29 Face detection method and device

Country Status (1)

Country Link
CN (1) CN111753598A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114821747A (en) * 2022-05-26 2022-07-29 深圳市科荣软件股份有限公司 Method and device for identifying abnormal state of construction site personnel

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102799877A (en) * 2012-09-11 2012-11-28 上海中原电子技术工程有限公司 Method and system for screening face images
CN107145867A (en) * 2017-05-09 2017-09-08 电子科技大学 Face and face occluder detection method based on multitask deep learning
CN107545249A (en) * 2017-08-30 2018-01-05 国信优易数据有限公司 A kind of population ages' recognition methods and device
CN107688786A (en) * 2017-08-30 2018-02-13 南京理工大学 A kind of method for detecting human face based on concatenated convolutional neutral net
CN108171196A (en) * 2018-01-09 2018-06-15 北京智芯原动科技有限公司 A kind of method for detecting human face and device
WO2018188453A1 (en) * 2017-04-11 2018-10-18 腾讯科技(深圳)有限公司 Method for determining human face area, storage medium, and computer device
CN109359603A (en) * 2018-10-22 2019-02-19 东南大学 A kind of vehicle driver's method for detecting human face based on concatenated convolutional neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
刘宏哲;杨少鹏;袁家政;王雪峤;薛建明;: "基于单一神经网络的多尺度人脸检测", 《电子与信息学报》 *
李帅杰;陈虎;兰时勇;: "基于级联神经网络的人脸检测", 《现代计算机(专业版)》 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination