CN111753598A - Face detection method and device - Google Patents

Face detection method and device

Info

Publication number
CN111753598A
CN111753598A (application CN201910251460.4A)
Authority
CN
China
Prior art keywords
face
neural network
network model
screening candidate
candidate frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910251460.4A
Other languages
Chinese (zh)
Inventor
刘金财
王涛
樊星宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China United Network Communications Group Co Ltd filed Critical China United Network Communications Group Co Ltd
Priority to CN201910251460.4A priority Critical patent/CN111753598A/en
Publication of CN111753598A publication Critical patent/CN111753598A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes
    • G06T7/62Analysis of geometric attributes of area, perimeter, diameter or volume
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation


Abstract

The embodiment of the invention provides a face detection method and device, wherein the method comprises the following steps: performing scaling processing on a face image to be detected to obtain face images to be detected at a plurality of scales; processing the face images to be detected at the plurality of scales through a first-level neural network model to obtain first candidate frames, and removing overlapped first candidate frames to obtain first screening candidate frames at a plurality of scales; respectively obtaining a first face position and/or a first face area of each first screening candidate frame, and removing the first screening candidate frames whose first face positions exceed the position threshold of the corresponding scale and/or whose first face areas exceed the area threshold of the corresponding scale, to obtain second screening candidate frames; and processing the second screening candidate frames sequentially through a second-level neural network model and a third-level neural network model to obtain a face image. The embodiment of the invention can improve face detection efficiency.

Description

Face detection method and device
Technical Field
The embodiment of the invention relates to the technical field of face recognition, in particular to a face detection method and device.
Background
When a user transacts business, a face image of the user usually needs to be acquired as evidence for auditing. Owing to the influence of the surrounding environment, the collected user image often also includes other faces, so face detection needs to be performed on the collected user image to extract the face image of the user.
At present, a commonly used face detection method performs face detection through the Multi-Task Cascaded Convolutional Networks (MTCNN) algorithm. The MTCNN algorithm first screens out face candidate frames and then further screens those candidate frames to obtain the face image.
However, the inventors found that in the conventional MTCNN algorithm the number of face candidate frames to be screened is too large, which makes the computation in the subsequent screening stages excessive and reduces face detection efficiency.
Disclosure of Invention
The invention provides a face detection method and face detection equipment, which aim to solve the problem of low face detection efficiency in the prior art.
In a first aspect, an embodiment of the present invention provides a face detection method, including:
carrying out scaling processing on the face image to be detected to obtain the face image to be detected with a plurality of scales;
processing the face images to be detected in multiple scales through a first-level neural network model to obtain a first candidate frame, and removing the overlapped first candidate frame to obtain a first screening candidate frame in multiple scales;
respectively obtaining a first face position and/or a first face area of each first screening candidate frame, and removing the first screening candidate frames with the first face positions exceeding position thresholds of corresponding scales and/or the first screening candidate frames with the first face areas exceeding area thresholds of corresponding scales to obtain second screening candidate frames;
and processing the second screening candidate frame sequentially through a second-level neural network model and a third-level neural network model to obtain a face image.
As an embodiment of the present invention, the processing the second screening candidate frame sequentially through the second stage neural network model and the third stage neural network model to obtain a face image includes:
processing the second screening candidate frame through a second-level neural network model to obtain a second candidate frame, and removing the overlapped second candidate frame to obtain a third screening candidate frame with multiple scales;
respectively obtaining a second face position and/or a second face area of each third screening candidate frame, and removing the third screening candidate frames of which the second face positions exceed the position threshold values of the corresponding scales and/or the third screening candidate frames of which the second face areas exceed the area threshold values of the corresponding scales to obtain fourth screening candidate frames;
and processing the fourth screening candidate frame through a third-level neural network model to obtain a face image.
As an embodiment of the present invention, the processing the fourth filtering candidate frame through the third-level neural network model to obtain a face image includes:
processing the fourth screening candidate frame through a third-level neural network model to obtain a third candidate frame, and removing the overlapped third candidate frame to obtain a fifth screening candidate frame with multiple scales;
and acquiring a third face position and/or a third face area of each fifth screening candidate frame, and removing the fifth screening candidate frames of which the third face positions exceed the position threshold values of the corresponding scales and/or the fifth screening candidate frames of which the third face areas exceed the area threshold values of the corresponding scales to obtain the face images.
As an embodiment of the present invention, before removing the first filtering candidate frame whose first face position exceeds the position threshold of the corresponding scale, the method further includes:
acquiring a first preset number of face image samples, and determining the positions of faces in the face image samples;
determining a preset position threshold according to the positions of the faces in the face image samples;
and zooming the preset position threshold to obtain position thresholds of a plurality of scales, wherein the zooming scale of the preset position threshold is the same as that of the face image to be detected.
As an embodiment of the present invention, before removing the first screening candidate frame whose first human face area exceeds the area threshold of the corresponding scale and obtaining the second screening candidate frame, the method further includes:
acquiring a second preset number of face image samples, and determining the area of the face in each face image sample;
determining a preset area threshold according to the area of the face in each face image sample;
and zooming the preset area threshold to obtain area thresholds of a plurality of scales, wherein the zooming scale of the preset area threshold is the same as that of the face image to be detected.
As an embodiment of the invention, the first-level neural network model is a P-net neural network model, the second-level neural network model is an R-net neural network model, and the third-level neural network model is an O-net neural network model.
As an embodiment of the present invention, before performing scaling processing on the face image to be detected, the method further includes:
and constructing an initial face detection neural network model, and training the initial face detection neural network model to obtain a face detection neural network model, wherein the face detection neural network model comprises a P-net neural network model, an R-net neural network model and an O-net neural network model.
In a second aspect, an embodiment of the present invention provides a face detection apparatus, including:
the scaling module is used for scaling the face image to be detected to obtain the face images to be detected with a plurality of scales;
the first processing module is used for processing the face images to be detected in multiple scales through the first-level neural network model to obtain a first candidate frame, removing the overlapped first candidate frame and obtaining a first screening candidate frame in multiple scales;
the screening module is used for respectively acquiring a first face position and/or a first face area of each first screening candidate frame, removing the first screening candidate frames of which the first face positions exceed the position threshold values of the corresponding scales, and/or removing the first screening candidate frames of which the first face areas exceed the area threshold values of the corresponding scales, and obtaining second screening candidate frames;
and the second processing module is used for processing the second screening candidate frame sequentially through the second-level neural network model and the third-level neural network model to obtain the face image.
In a third aspect, an embodiment of the present invention provides a face detection device, including: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes computer-executable instructions stored in the memory, so that the at least one processor executes the face detection method according to any one of the first aspect of the embodiments of the present invention.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where computer-executable instructions are stored in the computer-readable storage medium, and when a processor executes the computer-executable instructions, the face detection method according to any one of the first aspect of the embodiments of the present invention is implemented.
According to the face detection method and device provided by the embodiment of the invention, the face images to be detected in multiple scales are processed through the first-level neural network model to obtain the first candidate frame, the overlapped first candidate frame is removed to obtain the first screening candidate frame, then the first screening candidate frame is further screened through the first face position and/or the first face area, the first screening candidate frame with the first face position exceeding the position threshold value of the corresponding scale and/or the first screening candidate frame with the first face area exceeding the area threshold value of the corresponding scale are removed to obtain the second screening candidate frame, so that the number of candidate frames is reduced, the calculation amount of the second-level neural network model and the third-level neural network model is reduced, and the face detection efficiency is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a first flowchart of a face detection method according to an embodiment of the present invention;
fig. 2 is a second flowchart of a face detection method according to an embodiment of the present invention;
fig. 3 is a third flowchart of the face detection method according to the embodiment of the present invention;
fig. 4 is a fourth flowchart of a face detection method according to an embodiment of the present invention;
fig. 5 is a first schematic structural diagram of a face detection device according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a face detection apparatus according to an embodiment of the present invention;
fig. 7 is a schematic diagram of a hardware structure of a face detection device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The MTCNN model generally includes three levels of neural network models: a P-net neural network model, an R-net neural network model, and an O-net neural network model. The P-net neural network model is used to generate candidate frames, and the R-net and O-net neural network models are used to screen the candidate frames to obtain the face image. In the existing MTCNN algorithm, too many face candidate frames are produced at the candidate-generation stage, so the computation in the subsequent screening is excessive and face detection efficiency is reduced. The embodiment of the invention improves the MTCNN algorithm to improve face detection efficiency.
The technical solution of the present invention will be described in detail below with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
Referring to fig. 1, fig. 1 is a flowchart illustrating a first method for detecting a face according to an embodiment of the present invention, as shown in fig. 1, the method according to the embodiment may include:
and S101, carrying out scaling processing on the face image to be detected to obtain the face images to be detected with a plurality of scales.
In the embodiment of the invention, the face image to be detected is the face image of the user acquired by an image acquisition device. Pyramid-level scaling is performed on the face image to be detected to obtain face images to be detected at a plurality of scales. The specific scaling steps include: determining a basic scaling factor a, scaling the face image to be detected by the basic scaling factor, and iterating according to the strategy that the area of each scaled image is a times the area of the image at the previous level, so as to obtain images at a plurality of scales. For example, if the basic scaling factor is a and the area of the face image to be detected is s, then the areas of the scaled face images to be detected, from largest to smallest, are s, s×a, s×a², …, s×aⁿ. The basic scaling factor may be set as required; for example, the basic scaling factor is 0.72.
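The pyramid iteration above can be sketched as follows. This is an illustration, not code from the patent: the helper name `pyramid_areas` and the stopping condition (iterating until the area would fall below some minimum) are assumptions, since the patent does not state when the iteration ends.

```python
def pyramid_areas(base_area, scale_factor, min_area):
    """Areas of the pyramid levels: each level's area is scale_factor
    times the previous one, i.e. s, s*a, s*a**2, ..., stopping once
    the next area would fall below min_area (an assumed stopping rule)."""
    areas = []
    area = float(base_area)
    while area >= min_area:
        areas.append(area)
        area *= scale_factor
    return areas
```

For example, with s = 1000, a = 0.72 and an assumed minimum area of 100, this produces eight pyramid levels.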
And S102, processing the face images to be detected in multiple scales through the first-level neural network model to obtain a first candidate frame, and removing the overlapped first candidate frame to obtain a first screening candidate frame in multiple scales.
In the embodiment of the invention, the first-level neural network model comprises convolutional layers with 3 × 3 convolution kernels. The input image area of the first-level neural network model is 12 × 12; the first-level neural network model performs face detection on each 12 × 12 region of the face images to be detected at the plurality of scales, judges whether a face exists in each 12 × 12 region, and outputs a face regression frame and face key point locations, thereby obtaining the first candidate frames. The face regression frame indicates the precise position of the face frame, for example face coordinates, and the face key point locations comprise 5 key points: the left eye, the right eye, the nose, the left corner of the mouth and the right corner of the mouth.
And after the first candidate frames are obtained, removing the overlapped first candidate frames through a non-maximum suppression algorithm (NMS) to obtain first screening candidate frames, wherein the scales of the first screening candidate frames are different.
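The overlap-removal step can be sketched as a generic greedy non-maximum suppression, shown here for illustration rather than as the patent's exact implementation; the box format `(x1, y1, x2, y2)` and the overlap threshold value are assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, threshold=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring box and discard
    any remaining box that overlaps it by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= threshold]
    return keep
```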
Step S103, respectively obtaining a first face position and/or a first face area of each first screening candidate frame, and removing the first screening candidate frames with the first face positions exceeding the position threshold values of the corresponding scales and/or the first screening candidate frames with the first face areas exceeding the area threshold values of the corresponding scales to obtain second screening candidate frames.
In the embodiment of the invention, the accurate position of each first candidate frame, namely the face position, such as face coordinates, can be obtained through the first-level neural network model, the face area can be obtained according to the face position, and the first screening candidate frames are further screened through the face position and/or the face area, so that the number of the first candidate frames is reduced.
In a possible implementation manner, the first screening candidate frame is further screened through the face position, and the specific implementation manner is as follows: and acquiring first face positions of the first screening candidate frames with different scales, and removing the first screening candidate frames with the first face positions exceeding the position threshold values of the corresponding scales to obtain second screening candidate frames.
In another possible implementation manner, the first screening candidate frame is further screened through the face area, and the specific implementation manner is as follows: and obtaining the first face areas of the first screening candidate frames with different scales, and removing the first screening candidate frames with the first face areas exceeding the area threshold values of the corresponding scales to obtain second screening candidate frames.
In another possible implementation manner, the first screening candidate frame is further screened according to the face position and the face area, and the specific implementation manner is as follows: the method comprises the steps of obtaining first face positions and first face areas of a plurality of first screening candidate frames with different scales, removing the first screening candidate frames with the first face positions exceeding position thresholds of corresponding scales, removing the first screening candidate frames with the first face areas exceeding area thresholds of corresponding scales, and obtaining second screening candidate frames.
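The three screening variants above share one underlying filter, sketched below. The helper is hypothetical: interpreting the position threshold as an allowed range for the frame centre and the area threshold as a (min, max) pair are both assumptions not stated in the patent.

```python
def filter_by_position_and_area(boxes, pos_range=None, area_range=None):
    """Keep boxes whose centre lies inside pos_range (x_min, y_min, x_max, y_max)
    and whose area lies inside area_range (min_area, max_area).
    Either filter may be disabled by passing None, matching the
    position-only / area-only / combined variants in the text."""
    kept = []
    for box in boxes:
        x1, y1, x2, y2 = box
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        area = (x2 - x1) * (y2 - y1)
        if pos_range is not None:
            if not (pos_range[0] <= cx <= pos_range[2]
                    and pos_range[1] <= cy <= pos_range[3]):
                continue
        if area_range is not None:
            if not (area_range[0] <= area <= area_range[1]):
                continue
        kept.append(box)
    return kept
```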
And step S104, processing the second screening candidate frame sequentially through a second-level neural network model and a third-level neural network model to obtain a face image.
In the embodiment of the invention, the second-level neural network model comprises 2 convolutional layers with 3 × 3 kernels, 1 convolutional layer with a 2 × 2 kernel, and 1 fully-connected layer. The third-level neural network model likewise comprises 2 convolutional layers with 3 × 3 kernels, 1 convolutional layer with a 2 × 2 kernel, and 1 fully-connected layer. The second screening candidate frames are input into the second-level neural network model, which removes non-face frames and finely adjusts the second screening candidate frames to narrow the range of the face candidate frames. The output of the second-level neural network model is taken as the input of the third-level neural network model, which performs further screening to obtain the face image. The face image contains only the face of the user and no other faces from the environment.
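The overall flow can be summarised as a simple pipeline. The sketch below treats each stage as an opaque callable that maps candidate frames to a refined subset; the actual P-net/R-net/O-net inference inside each stage is beyond this illustration, and the early-exit behaviour when no candidates remain is an assumption.

```python
def run_cascade(candidates, stages):
    """Pass candidate frames through the stages in order; each stage
    returns the frames that survive its screening.  Stops early if
    no candidates remain."""
    for stage in stages:
        candidates = stage(candidates)
        if not candidates:
            break
    return candidates
```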
According to the embodiment of the invention, the first-level neural network model is used for processing the face images to be detected in multiple scales to obtain the first candidate frame, the overlapped first candidate frame is removed to obtain the first screening candidate frame, then the first screening candidate frame is further screened by the first face position and/or the first face area, the first screening candidate frame with the first face position exceeding the position threshold value of the corresponding scale and/or the first screening candidate frame with the first face area exceeding the area threshold value of the corresponding scale are removed to obtain the second screening candidate frame, so that the number of candidate frames is reduced, the calculation amount of the second-level neural network model and the third-level neural network model is reduced, and the face detection efficiency is improved.
Referring to fig. 2, fig. 2 is a second flowchart of a face detection method according to an embodiment of the present invention, as shown in fig. 2, based on the embodiment shown in fig. 1, the method of this embodiment describes in detail a specific implementation manner of step S104, and the method of this embodiment may include:
step S201, the second screening candidate frame is processed through a second-level neural network model to obtain a second candidate frame, and the overlapped second candidate frame is removed to obtain a third screening candidate frame with multiple scales.
Step S202, respectively obtaining a second face position and/or a second face area of each third screening candidate frame, and removing the third screening candidate frames of which the second face positions exceed the position threshold values of the corresponding scales and/or the third screening candidate frames of which the second face areas exceed the area threshold values of the corresponding scales to obtain fourth screening candidate frames.
Step S203, processing the fourth screening candidate frame through a third-level neural network model to obtain a third candidate frame, and removing the overlapped third candidate frame to obtain a fifth screening candidate frame with multiple scales.
Step S204, acquiring a third face position and/or a third face area of each fifth screening candidate frame, and removing the fifth screening candidate frame of which the third face position exceeds the position threshold value of the corresponding scale and/or the fifth screening candidate frame of which the third face area exceeds the area threshold value of the corresponding scale to obtain the face image.
In the embodiment of the invention, the second candidate frame output by the second-level neural network model and the third candidate frame output by the third-level neural network model are further screened respectively through a non-maximum suppression algorithm, a face position and/or a face area.
The second screening candidate frames are taken as the input of the second-level neural network model, which removes non-face frames and finely adjusts the candidate frames to narrow the range of the face candidate frames, and outputs the second candidate frames. The overlapped second candidate frames are removed through the non-maximum suppression algorithm (NMS) to obtain the third screening candidate frames, which have different scales. The third screening candidate frames are further screened by face area and/or face position to obtain the fourth screening candidate frames; the specific implementation is similar to step S103 shown in fig. 1 and is not repeated here.
The fourth screening candidate frames are taken as the input of the third-level neural network model, which outputs the third candidate frames; the overlapped third candidate frames are removed through the non-maximum suppression algorithm (NMS) to obtain the fifth screening candidate frames, which have different scales. The fifth screening candidate frames are further screened by face area and/or face position to obtain the face image; the specific implementation is similar to step S103 shown in fig. 1 and is not repeated here.
The embodiment of the invention further screens the third screening candidate frame through the second face position and/or the second face area, and further screens the fifth screening candidate frame through the third face position and/or the third face area, thereby further improving the face detection efficiency.
Referring to fig. 3, fig. 3 is a flowchart of a third flow chart of a face detection method according to an embodiment of the present invention, as shown in fig. 3, before removing, in step S103, a first filtering candidate frame in which a first face position exceeds a position threshold of a corresponding scale, in the method according to the present embodiment, the method according to the present embodiment may further include:
step S301, a first preset number of face image samples are obtained, and the positions of faces in the face image samples are determined.
Step S302, a preset position threshold value is determined according to the positions of the human faces in the human face image samples.
Step S303, zooming the preset position threshold to obtain position thresholds of multiple scales, wherein the zooming scale of the preset position threshold is the same as that of the face image to be detected.
In the embodiment of the invention, the face image samples are acquired by the image acquisition device, and the acquisition scene of the face image samples is the same as that of the face image to be detected. Each face image sample is an image, acquired by the image acquisition device, of a sample subject standing within a specified position range, and the face image to be detected is likewise an image of the user standing within that specified position range. For example, when bank deposit business is transacted, the face image samples are face images acquired at a bank service window, and the face image to be detected is also a face image acquired at that service window.
The position of the face in each face image sample can be determined through face recognition technology, and the preset position threshold is determined according to these positions. For example, faces whose positions in the samples fall outside a preset range are removed; the maximum of the remaining face positions is taken as the upper limit of the preset position threshold and the minimum as the lower limit. Pyramid-level scaling is then similarly applied to the preset position threshold to obtain position thresholds at a plurality of scales, where the scaling factor of the preset position threshold is the same as that of the face image to be detected. The specific scaling steps include: determining the basic scaling factor a, scaling the preset position threshold by the basic scaling factor, and iterating according to the strategy that each scaled value is a times the previous value, to obtain position thresholds at a plurality of scales.
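A sketch of the threshold construction described above. The helper names, the one-dimensional treatment of "position", and the shape of the outlier range are illustrative assumptions; in practice the x and y coordinates would each get their own threshold.

```python
def preset_threshold(sample_values, valid_range):
    """Drop sample values outside valid_range, then take the min and
    max of what remains as the (lower, upper) preset threshold."""
    lo, hi = valid_range
    kept = [v for v in sample_values if lo <= v <= hi]
    return min(kept), max(kept)

def scaled_thresholds(base, scale_factor, levels):
    """Scale the (lower, upper) threshold once per pyramid level,
    using the same factor as the image pyramid."""
    lo, hi = base
    out = []
    for _ in range(levels):
        out.append((lo, hi))
        lo, hi = lo * scale_factor, hi * scale_factor
    return out
```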
Referring to fig. 4, fig. 4 is a fourth flowchart of a face detection method according to an embodiment of the present invention, as shown in fig. 4, based on the embodiment shown in fig. 1, before removing the first filtering candidate frame whose first face area exceeds the area threshold of the corresponding scale in step S103 to obtain the second filtering candidate frame, the method of this embodiment may further include:
step S401, obtaining a second preset number of face image samples, and determining the area of the face in each face image sample.
Step S402, determining a preset area threshold according to the area of the face in each face image sample.
Step S403, performing scaling processing on the preset area threshold to obtain area thresholds of multiple scales, wherein the scaling scale of the preset area threshold is the same as that of the face image to be detected.
In the embodiment of the present invention, the first preset number and the second preset number may be the same or different, and the embodiment of the present invention is not particularly limited.
The position of the face in each face image sample can be determined through a face recognition technology, the area of the face is computed from that position, and the preset area threshold is then determined from the face areas of the samples. For example, faces whose areas fall outside a preset range are removed; the maximum of the remaining face areas is taken as the upper limit of the preset area threshold, and the minimum as the lower limit. Pyramid-level scaling is applied to the preset area threshold to obtain area thresholds at multiple scales, where the scaling scale of the preset area threshold is the same as that of the face image to be detected. The specific scaling steps are: determine a base scaling factor a, scale the preset area threshold by a, and iterate under the strategy that each scaled value is a times the previous value, yielding area thresholds at multiple scales.
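The pyramid-level scaling of a threshold can be sketched as a simple geometric iteration; the function name and the use of a scalar threshold are assumptions for illustration, and in practice the factor a would match the image pyramid's scale factor:

```python
def threshold_pyramid(base_threshold, a, num_scales):
    """Scale a preset threshold by factor a per pyramid level (sketch).

    Iteration strategy from the description: each scaled value is
    a times the previous value.
    """
    thresholds = []
    value = base_threshold
    for _ in range(num_scales):
        value = value * a  # each scaled value is a times the previous one
        thresholds.append(value)
    return thresholds
```

With a base threshold of 100 and a = 0.5, three levels yield thresholds 50, 25, and 12.5, matching an image pyramid halved at each level.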
As an embodiment of the invention, the first-level neural network model is a P-net neural network model, the second-level neural network model is an R-net neural network model, and the third-level neural network model is an O-net neural network model.
The method of this embodiment may further include: and constructing an initial face detection neural network model, and training the initial face detection neural network model to obtain a face detection neural network model, wherein the face detection neural network model comprises a P-net neural network model, an R-net neural network model and an O-net neural network model.
In the embodiment of the invention, the face detection neural network model comprises: a P-net (Proposal Network) neural network model, an R-net (Refine Network) neural network model, and an O-net (Output Network) neural network model. The initial face detection neural network model is trained on training samples to obtain the face detection neural network model used for face detection.
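The three-stage cascade can be sketched at a high level as follows. This is an illustrative skeleton only: the stage interfaces (each network as a callable producing candidate boxes) and the `filter_boxes` hook standing in for the threshold screening between stages are assumptions, and the actual P-net/R-net/O-net architectures are not reproduced here.

```python
def cascade_detect(image_pyramid, p_net, r_net, o_net, filter_boxes):
    """Run the P-net -> R-net -> O-net cascade with screening between
    stages (sketch).

    image_pyramid: face images to be detected at multiple scales.
    p_net, r_net, o_net: callables producing candidate boxes.
    filter_boxes: removes candidates outside the position/area thresholds.
    """
    # Stage 1: P-net proposes candidate frames over all scales.
    boxes = [b for img in image_pyramid for b in p_net(img)]
    boxes = filter_boxes(boxes)          # screen out-of-threshold candidates
    # Stage 2: R-net refines the surviving candidate frames.
    boxes = filter_boxes(r_net(boxes))
    # Stage 3: O-net produces the final face frames.
    return filter_boxes(o_net(boxes))
```

Screening after every stage, rather than only at the end, is what reduces the number of candidate frames each subsequent (and more expensive) network must process.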
Fig. 5 is a schematic structural diagram of a first face detection apparatus according to an embodiment of the present invention, as shown in fig. 5, a face detection apparatus 500 includes a scaling module 501, a first processing module 502, a screening module 503, and a second processing module 504, and specific functions of each module are as follows:
the scaling module 501 is configured to scale the face image to be detected to obtain face images to be detected in multiple scales.
The first processing module 502 is configured to process the face images to be detected in multiple scales through the first-stage neural network model to obtain a first candidate frame, and remove the overlapped first candidate frame to obtain a first screening candidate frame in multiple scales.
The screening module 503 is configured to obtain a first face position and/or a first face area of each first screening candidate frame, and remove the first screening candidate frame whose first face position exceeds the position threshold of the corresponding scale, and/or the first screening candidate frame whose first face area exceeds the area threshold of the corresponding scale, so as to obtain a second screening candidate frame.
And the second processing module 504 is configured to process the second screening candidate frame sequentially through the second-level neural network model and the third-level neural network model to obtain a face image.
As an embodiment of the present invention, the second processing module 504 is specifically configured to process the second screening candidate frame through the second-level neural network model to obtain a second candidate frame, and remove the overlapped second candidate frames to obtain third screening candidate frames of multiple scales;
respectively obtaining a second face position and/or a second face area of each third screening candidate frame, and removing the third screening candidate frames of which the second face positions exceed the position threshold values of the corresponding scales and/or the third screening candidate frames of which the second face areas exceed the area threshold values of the corresponding scales to obtain fourth screening candidate frames;
and processing the fourth screening candidate frame through a third-level neural network model to obtain a face image.
As an embodiment of the present invention, the second processing module 504 is specifically configured to process the fourth screening candidate frame through the third-level neural network model to obtain a third candidate frame, and remove the overlapped third candidate frames to obtain fifth screening candidate frames of multiple scales;
and acquiring a third face position and/or a third face area of each fifth screening candidate frame, and removing the fifth screening candidate frames of which the third face positions exceed the position threshold values of the corresponding scales and/or the fifth screening candidate frames of which the third face areas exceed the area threshold values of the corresponding scales to obtain the face images.
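The screening applied at each stage can be sketched as a per-scale filter over candidate frames. The box representation `(x, y, w, h)`, the use of the x coordinate as the face position, and the `(lo, hi)` threshold tuples are assumptions for illustration:

```python
def screen_boxes(boxes, pos_threshold, area_threshold):
    """Keep candidate frames whose face position and face area fall
    within the thresholds of the corresponding scale (sketch).

    boxes: list of (x, y, w, h) candidate frames.
    pos_threshold: (lo, hi) bounds on the face position (here: x).
    area_threshold: (lo, hi) bounds on the face area (w * h).
    """
    kept = []
    for (x, y, w, h) in boxes:
        pos_ok = pos_threshold[0] <= x <= pos_threshold[1]
        area_ok = area_threshold[0] <= w * h <= area_threshold[1]
        # Remove frames whose position or area exceeds either threshold.
        if pos_ok and area_ok:
            kept.append((x, y, w, h))
    return kept
```

A frame at position 200 when the position threshold is (0, 100), or with area 10000 when the area threshold is (50, 200), would both be screened out; only frames satisfying both checks survive to the next network.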
Referring to fig. 6, fig. 6 is a schematic structural diagram of a second face detection apparatus according to an embodiment of the present invention, as shown in fig. 6, based on the embodiment shown in fig. 5, the apparatus of this embodiment may further include: the first determining module 505 is configured to obtain a first preset number of face image samples, and determine positions of faces in the face image samples;
determining a preset position threshold according to the positions of the faces in the face image samples;
and zooming the preset position threshold to obtain position thresholds of a plurality of scales, wherein the zooming scale of the preset position threshold is the same as that of the face image to be detected.
As an embodiment of the present invention, the apparatus of this embodiment may further include: the second determining module 506, the second determining module 506 is configured to obtain a second preset number of face image samples, and determine areas of faces in the face image samples;
determining a preset area threshold according to the area of the face in each face image sample;
and zooming the preset area threshold to obtain area thresholds of a plurality of scales, wherein the zooming scale of the preset area threshold is the same as that of the face image to be detected.
As an embodiment of the invention, the first-level neural network model is a P-net neural network model, the second-level neural network model is an R-net neural network model, and the third-level neural network model is an O-net neural network model.
As an embodiment of the present invention, the apparatus of this embodiment may further include: and the model training module 507, the model training module 507 is used for constructing an initial face detection neural network model, and training the initial face detection neural network model to obtain the face detection neural network model, wherein the face detection neural network model comprises a P-net neural network model, an R-net neural network model and an O-net neural network model.
The apparatus of the present embodiment may be used to implement the method embodiments shown in fig. 1 to fig. 4, and the implementation principle and technical effect are similar, which are not described herein again.
Fig. 7 is a schematic diagram of a hardware structure of a face detection device according to an embodiment of the present invention. As shown in fig. 7, the face detection apparatus 700 provided in this embodiment includes: at least one processor 701 and a memory 702. The face detection apparatus 700 further comprises a communication component 703. The processor 701, the memory 702, and the communication component 703 are connected by a bus 704.
In a specific implementation process, the at least one processor 701 executes the computer-executable instructions stored in the memory 702, so that the at least one processor 701 executes the face detection method in any one of the above-described method embodiments. The communication component 703 is used for communicating with the terminal device and/or the server.
For a specific implementation process of the processor 701, reference may be made to the above method embodiments, which implement principles and technical effects similar to each other, and details of this embodiment are not described herein again.
In the embodiment shown in fig. 7, it should be understood that the processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of a method disclosed in connection with the present invention may be embodied as being performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
The memory may comprise high speed RAM memory and may also include non-volatile storage NVM, such as at least one disk memory.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus.
The embodiment of the present invention further provides a computer-readable storage medium, where computer-executable instructions are stored in the computer-readable storage medium, and when a processor executes the computer-executable instructions, the face detection method in any of the above method embodiments is implemented.
The computer-readable storage medium described above may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk. Readable storage media can be any available media that can be accessed by a general purpose or special purpose computer.
An exemplary readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the readable storage medium. Of course, the readable storage medium may also be an integral part of the processor. The processor and the readable storage medium may reside in an Application-Specific Integrated Circuit (ASIC). Of course, the processor and the readable storage medium may also reside as discrete components in the apparatus.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A face detection method, comprising:
carrying out scaling processing on the face image to be detected to obtain the face image to be detected with a plurality of scales;
processing the face images to be detected in multiple scales through a first-level neural network model to obtain a first candidate frame, and removing the overlapped first candidate frame to obtain a first screening candidate frame in multiple scales;
respectively obtaining a first face position and/or a first face area of each first screening candidate frame, and removing the first screening candidate frames with the first face positions exceeding position thresholds of corresponding scales and/or the first screening candidate frames with the first face areas exceeding area thresholds of corresponding scales to obtain second screening candidate frames;
and processing the second screening candidate frame sequentially through a second-level neural network model and a third-level neural network model to obtain a face image.
2. The method of claim 1, wherein the processing the second screening candidate frame sequentially through a second-level neural network model and a third-level neural network model to obtain a face image comprises:
processing the second screening candidate frame through a second-level neural network model to obtain a second candidate frame, and removing the overlapped second candidate frame to obtain a third screening candidate frame with multiple scales;
respectively obtaining a second face position and/or a second face area of each third screening candidate frame, and removing the third screening candidate frames of which the second face positions exceed the position threshold values of the corresponding scales and/or the third screening candidate frames of which the second face areas exceed the area threshold values of the corresponding scales to obtain fourth screening candidate frames;
and processing the fourth screening candidate frame through a third-level neural network model to obtain a face image.
3. The method of claim 2, wherein the processing the fourth filtering candidate frame through the third-level neural network model to obtain a face image comprises:
processing the fourth screening candidate frame through a third-level neural network model to obtain a third candidate frame, and removing the overlapped third candidate frame to obtain a fifth screening candidate frame with multiple scales;
and acquiring a third face position and/or a third face area of each fifth screening candidate frame, and removing the fifth screening candidate frames of which the third face positions exceed the position threshold values of the corresponding scales and/or the fifth screening candidate frames of which the third face areas exceed the area threshold values of the corresponding scales to obtain the face images.
4. The method of claim 1, wherein before removing the first screening candidate frame whose first face position exceeds the position threshold of the corresponding scale, the method further comprises:
acquiring a first preset number of face image samples, and determining the positions of faces in the face image samples;
determining a preset position threshold according to the positions of the faces in the face image samples;
and zooming the preset position threshold to obtain position thresholds of a plurality of scales, wherein the zooming scale of the preset position threshold is the same as that of the face image to be detected.
5. The method of claim 1, wherein before removing the first screening candidate frame whose first face area exceeds the area threshold of the corresponding scale to obtain the second screening candidate frame, the method further comprises:
acquiring a second preset number of face image samples, and determining the area of the face in each face image sample;
determining a preset area threshold according to the area of the face in each face image sample;
and zooming the preset area threshold to obtain area thresholds of a plurality of scales, wherein the zooming scale of the preset area threshold is the same as that of the face image to be detected.
6. The method of any one of claims 1 to 5, wherein the first-level neural network model is a P-net neural network model, the second-level neural network model is an R-net neural network model, and the third-level neural network model is an O-net neural network model.
7. The method according to claim 6, wherein before the scaling process of the face image to be detected, the method further comprises:
and constructing an initial face detection neural network model, and training the initial face detection neural network model to obtain a face detection neural network model, wherein the face detection neural network model comprises a P-net neural network model, an R-net neural network model and an O-net neural network model.
8. A face detection apparatus, comprising:
the scaling module is used for scaling the face image to be detected to obtain the face images to be detected with a plurality of scales;
the first processing module is used for processing the face images to be detected in multiple scales through the first-level neural network model to obtain a first candidate frame, removing the overlapped first candidate frame and obtaining a first screening candidate frame in multiple scales;
the screening module is used for respectively acquiring a first face position and/or a first face area of each first screening candidate frame, removing the first screening candidate frames of which the first face positions exceed the position threshold values of the corresponding scales, and/or removing the first screening candidate frames of which the first face areas exceed the area threshold values of the corresponding scales, and obtaining second screening candidate frames;
and the second processing module is used for processing the second screening candidate frame sequentially through the second-level neural network model and the third-level neural network model to obtain the face image.
9. A face detection apparatus, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executing the computer-executable instructions stored by the memory causes the at least one processor to perform the face detection method of any of claims 1 to 7.
10. A computer-readable storage medium having computer-executable instructions stored therein, which when executed by a processor, implement the face detection method of any one of claims 1 to 7.
CN201910251460.4A 2019-03-29 2019-03-29 Face detection method and device Pending CN111753598A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910251460.4A CN111753598A (en) 2019-03-29 2019-03-29 Face detection method and device

Publications (1)

Publication Number Publication Date
CN111753598A true CN111753598A (en) 2020-10-09

Family

ID=72672503

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910251460.4A Pending CN111753598A (en) 2019-03-29 2019-03-29 Face detection method and device

Country Status (1)

Country Link
CN (1) CN111753598A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114821747A (en) * 2022-05-26 2022-07-29 深圳市科荣软件股份有限公司 Method and device for identifying abnormal state of construction site personnel

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102799877A (en) * 2012-09-11 2012-11-28 上海中原电子技术工程有限公司 Method and system for screening face images
CN107145867A (en) * 2017-05-09 2017-09-08 电子科技大学 Face and face occluder detection method based on multitask deep learning
CN107545249A (en) * 2017-08-30 2018-01-05 国信优易数据有限公司 A kind of population ages' recognition methods and device
CN107688786A (en) * 2017-08-30 2018-02-13 南京理工大学 A kind of method for detecting human face based on concatenated convolutional neutral net
CN108171196A (en) * 2018-01-09 2018-06-15 北京智芯原动科技有限公司 A kind of method for detecting human face and device
WO2018188453A1 (en) * 2017-04-11 2018-10-18 腾讯科技(深圳)有限公司 Method for determining human face area, storage medium, and computer device
CN109359603A (en) * 2018-10-22 2019-02-19 东南大学 A kind of vehicle driver's method for detecting human face based on concatenated convolutional neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
刘宏哲;杨少鹏;袁家政;王雪峤;薛建明;: "基于单一神经网络的多尺度人脸检测", 《电子与信息学报》 *
李帅杰;陈虎;兰时勇;: "基于级联神经网络的人脸检测", 《现代计算机(专业版)》 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination