CN111241924A - Face detection and alignment method and device based on scale estimation and storage medium - Google Patents

Face detection and alignment method and device based on scale estimation and storage medium

Info

Publication number
CN111241924A
Authority
CN
China
Prior art keywords
scale
face
attention
anchor
detection
Prior art date
Legal status
Granted
Application number
CN201911387732.XA
Other languages
Chinese (zh)
Other versions
CN111241924B (en)
Inventor
徐小丹
刘小扬
何学智
王欢
Current Assignee
Newland Digital Technology Co ltd
Original Assignee
Newland Digital Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Newland Digital Technology Co ltd filed Critical Newland Digital Technology Co ltd
Priority to CN201911387732.XA priority Critical patent/CN111241924B/en
Publication of CN111241924A publication Critical patent/CN111241924A/en
Application granted granted Critical
Publication of CN111241924B publication Critical patent/CN111241924B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/2431Multiple classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification


Abstract

The invention discloses a face detection and alignment method based on scale estimation. A picture is input into a scale estimation network, which outputs the scales whose scale probability exceeds a preset threshold. When the scale estimation network is trained, attention weights are pre-assigned to the faces in the image according to face scale, and the training loss function includes the binary-classification loss of the face attention map. The image to be detected is zoomed by the scales obtained from the scale estimation network to produce images of multiple scales. The multi-scale images are input into anchor Pnet to obtain multiple candidate frames, and non-face candidate frames are removed by a non-maximum suppression algorithm to obtain preprocessed candidate frames. The preprocessed candidate frames are cut from the original image, zoomed to a preset size, and input into anchor Rnet; redundant frames are removed with a non-maximum suppression algorithm to obtain the detection frames, and the corresponding face feature points are extracted according to the detection frames. The method has strong adaptability and a higher detection rate for small-scale faces.

Description

Face detection and alignment method and device based on scale estimation and storage medium
Technical Field
The invention relates to the technical field of video monitoring and image processing, in particular to a face detection and alignment method and device based on scale estimation and a storage medium.
Background
With the rapid development of science and technology, computer vision is increasingly present in social life. Face detection and alignment is one of its research hotspots, with numerous real-life applications such as face-recognition access control, mobile phone unlocking, security monitoring, and identity verification, which bring great convenience to daily life. In an actual scene, an image may simultaneously contain faces of different scales, both small and large. To detect faces of different scales simultaneously, existing methods either use a uniformly distributed image pyramid and detect on the dense pyramid images, or design a large network that detects on multi-scale feature maps. However, these methods suffer from high computational complexity. To reduce the number of pyramid levels, some detection techniques use a scale estimation method; but when multi-scale faces exist in an image, such methods easily ignore small-scale faces, causing missed face detections and bringing inconvenience to applications of face detection.
Disclosure of Invention
The technical problem the invention aims to solve is how to provide a face detection and alignment method and device that have low computational complexity and do not easily ignore small-scale faces.
In order to solve the technical problems, the technical scheme of the invention is as follows:
a face detection and alignment method based on scale estimation comprises the following steps:
inputting the picture into a scale estimation network, and outputting the scale with the scale probability vector larger than a preset threshold value; when the scale estimation network is used for training, attention weights are pre-distributed to faces in the images according to the face scales so as to make a face attention diagram; the loss function of the scale estimation network during training comprises the binary loss of the face attention diagram;
scaling an image to be detected through a scale obtained by a scale estimation network to obtain images of multiple scales;
inputting the images with multiple scales into an anchor Pnet to obtain multiple candidate frames, and removing non-face candidate frames through a non-maximum suppression algorithm to obtain a pre-processing candidate frame;
and cutting the preprocessing candidate frame on the original image, zooming the preprocessing candidate frame to a preset size, inputting the preprocessing candidate frame into anchor Rnet, removing redundant frames by using a non-maximum suppression algorithm to obtain a detection frame, and extracting corresponding human face characteristic points according to the detection frame.
Preferably, the training of the scale estimation network comprises:
labeling a face scale vector: presetting a plurality of scale intervals, taking the average value of the width and the height of the face as the face scale, and if the face belonging to one interval scale exists, setting the corresponding score on the score vector as 1; if no face belonging to the interval scale exists, setting the corresponding score on the score vector as 0;
making a face attention diagram: making a face mask, and pre-distributing attention weight according to the face scale, wherein the formula for pre-distributing the attention weight comprises the following steps:
$$w(s)=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(s-\mu)^{2}}{2\sigma^{2}}\right)$$
wherein s is the face scale, and σ and μ are probability distribution parameters;
loss of class two classes Using metricssAnd loss of binary classification of face attention mapsaAs a loss function, the loss is losss+λlossaWherein λ is a weight coefficient.
Preferably,
$$loss_{s}=-\frac{1}{N_{a}}\sum_{n=1}^{N_{a}}\left[p_{n}\log\hat{p}_{n}+(1-p_{n})\log(1-\hat{p}_{n})\right]$$
wherein N_a denotes the number of scale intervals, p_n denotes the label of the nth scale interval, and \hat{p}_n denotes the estimation result of the nth scale interval.
Preferably,
$$loss_{a}=-\frac{1}{N_{a}}\sum_{n=1}^{N_{a}}\left[q_{n}\log\hat{q}_{n}+(1-q_{n})\log(1-\hat{q}_{n})\right]$$
wherein N_a denotes the number of pixels of the face attention map, q_n denotes the label of the nth pixel, and \hat{q}_n denotes the estimation result of the nth pixel.
Preferably, the model training process of anchor Pnet and anchor Rnet comprises:
anchor Pnet training: the anchor Pnet is a full convolution network, K anchors with different proportions are preset, if the intersection ratio of a predefined frame and a marking frame corresponding to the anchors is greater than a first preset value, the anchors are marked as positive samples, and meanwhile classification and regression calculation are involved; if the intersection ratio is smaller than a second preset value, the negative sample is considered to be only involved in classification and not involved in regression calculation; if the intersection ratio is larger than a second preset value and smaller than a first preset value, the samples are not classified and judged, and only participate in regression; during training, K anchors are required to be classified and detected simultaneously;
anchor Rnet training: and generating required training data by using the result and the labeling frame after the Anchor Pnet detection and a preset anchor, and simultaneously performing tasks during training, wherein the tasks comprise face classification, boundary frame regression and feature point positioning on K preset anchors.
Preferably, when the non-maximum suppression algorithm is used to remove the redundant frames to obtain the detection frames, an additional constraint is imposed: a local maximum must cover at least N_n candidate frames, wherein N_n is the coverage threshold.
Preferably, the scale estimation network comprises a feature extraction module, an attention-assisted prediction module and a prediction module;
the feature extraction module is a full convolution network and is used for generating features;
the attention auxiliary prediction module is used for deconvoluting the feature map into the size of an original map and learning a human face attention map and human face attention features;
and the prediction module is used for obtaining a scale probability vector by combining the characteristics of the characteristic module and the attention characteristics of the human face and outputting the scale with the scale probability vector larger than a preset threshold value.
In a second aspect, the present invention further provides a face detection and alignment system based on scale estimation, including:
a scale estimation module: inputting the picture into a scale estimation network, and outputting the scale with the scale probability vector larger than a preset threshold value; when the scale estimation network is used for training, attention weights are pre-distributed to faces in the images according to the face scales so as to make a face attention diagram; the loss function of the scale estimation network during training comprises the binary loss of the face attention diagram;
a scaling module: scaling an image to be detected through a scale obtained by a scale estimation network to obtain images of multiple scales;
anchor Pnet Module: inputting the images with multiple scales into an anchor Pnet to obtain multiple candidate frames, and removing non-face candidate frames through a non-maximum suppression algorithm to obtain a pre-processing candidate frame;
anchor Rnet module: and cutting the preprocessing candidate frame on the original image, zooming the preprocessing candidate frame to a preset size, inputting the preprocessing candidate frame into anchor Rnet, removing redundant frames by using a non-maximum suppression algorithm to obtain a detection frame, and extracting corresponding human face characteristic points according to the detection frame.
In a third aspect, the present invention further provides an electronic device for face detection and alignment based on scale estimation, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above face detection and alignment method based on scale estimation when executing the program.
In a fourth aspect, the present invention further provides a computer-readable storage medium for face detection and alignment based on scale estimation, on which a computer program is stored, where the computer program is executed by a processor to implement the steps of the face detection and alignment based on scale estimation method.
By adopting the above technical scheme, because an anchor-based cascade face detection method is used, the candidate regions of the face are extracted with the simple and fast anchor Pnet and then gradually corrected with the relatively complex anchor Rnet, so face detection is faster and more accurate and adapts to a certain range of scales; adaptability to the scale estimation results is strengthened, and the two tasks of face detection and alignment are performed simultaneously. In addition, because the attention-based face scale estimation network is adopted, parameters need not be adjusted for different scenes and the method adapts to different scenes by itself; the attention-based scale estimation network achieves a higher detection rate for small-scale faces and prevents small-scale faces from being ignored during detection.
Drawings
FIG. 1 is a flowchart illustrating steps of an embodiment of a scale estimation-based face detection and alignment method according to the present invention;
FIG. 2 is an original image to be processed for face detection and alignment based on scale estimation according to the present invention;
FIG. 3 is a prepared face attention diagram of the face detection and alignment based on scale estimation of the present invention;
FIG. 4 is a block diagram of a scale estimation network;
FIG. 5 is a diagram showing the structure of anchor Pnet;
FIG. 6 is a structural diagram of anchor Rnet.
Detailed Description
The following further describes embodiments of the present invention with reference to the drawings. It should be noted that the description of the embodiments is provided to help understanding of the present invention, but the present invention is not limited thereto. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The technical scheme of the invention provides a face detection and alignment method based on scale estimation, which comprises the following steps:
s10: inputting the picture into a scale estimation network, and outputting the scale with the scale probability vector larger than a preset threshold value; when the scale estimation network is used for training, attention weights are pre-distributed to faces in the images according to the face scales so as to make a face attention diagram; the loss function of the scale estimation network during training comprises the binary loss of the face attention diagram;
s20: scaling an image to be detected through a scale obtained by a scale estimation network to obtain a plurality of scale images;
s30: inputting the images with multiple scales into an anchor Pnet to obtain multiple candidate frames, and removing non-face candidate frames through a non-maximum suppression algorithm to obtain a pre-processing candidate frame;
s40: and cutting the preprocessing candidate frame on the original image, zooming the preprocessing candidate frame to a preset size, inputting the preprocessing candidate frame into anchor Rnet, removing redundant frames by using a non-maximum suppression algorithm to obtain a detection frame, and extracting corresponding human face characteristic points according to the detection frame.
In step S10, the training process of the scale estimation network is:
labeling a face scale vector: presetting a plurality of scale intervals and taking the mean of the face width and height as the face scale; if a face whose scale belongs to an interval exists, the corresponding score in the score vector is set to 1; if no face belonging to the interval exists, the corresponding score is set to 0. Making a face attention map: making a face mask and pre-assigning attention weights according to the face scale. Using the multi-class binary-classification loss of scale, loss_s, and the binary-classification loss of the face attention map, loss_a, as the loss function: loss = loss_s + λ·loss_a, wherein λ is a weight coefficient.
By adopting the above technical scheme, because an anchor-based cascade face detection method is used, the candidate regions of the face are extracted with the simple and fast anchor Pnet and then gradually corrected with the relatively complex anchor Rnet, so face detection is faster and more accurate and adapts to a certain range of scales. Adaptability to the scale estimation results is strengthened, the two tasks of face detection and alignment are performed simultaneously, and because the two networks adapt to each other, a small network structure can achieve good performance. In addition, because the attention-based face scale estimation network is adopted, parameters need not be adjusted for different scenes and the method adapts to different scenes by itself; the attention-based scale estimation network achieves a higher detection rate for small-scale faces and prevents small-scale faces from being ignored during detection.
In an embodiment of the present invention, the step of implementing face scale estimation includes:
the method comprises the following steps: attention-based face scale estimation.
Designing an attention-based scale estimation network for generating a scale probability vector, and then taking the scales whose face scale probability is larger than a threshold T_1 as the final scale set S = {s_1, s_2, s_3, …, s_n};
The method comprises the following specific steps:
step 1: attention-based scale estimation network training
Referring to fig. 4, the scale estimation network is composed of a feature extraction module, an attention-assisted prediction module, and a prediction module. The feature extraction module is a full convolution network and is used for generating features; the attention auxiliary prediction module deconvolves the feature graph into the size of an original graph, learns the human face attention graph and learns the human face attention feature; and the prediction module combines the feature module features and the human face attention features to obtain a scale probability vector, and outputs the scale with the scale probability vector larger than a preset threshold value.
Step 1.1: labeling and making the face scale vector. Owing to the adaptability of the detection network, the scale interval of the face scale can be set relatively large; the interval may be 2^1. The preset scales are X = {2^2.5, 2^3.5, 2^4.5, …, 2^n}, and the corresponding scale spaces are XS = {(2^2, 2^3), (2^3, 2^4), …, (2^(n-0.5), 2^(n+0.5))}. The face scale s is taken as the mean of the face width and height, and each scale interval is labeled 0 or 1 according to the corresponding preset scale. In the implementation, the preset scales are X = {2^2, 2^3, 2^4, …, 2^8}, giving 7 scale spaces in total with scale range [2^2, 2^8].
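The labeling rule of step 1.1 can be sketched as follows. This is a minimal sketch under the embodiment's preset scales 2^2…2^8; the function name and the bin-by-rounded-log2 implementation are illustrative assumptions.

```python
import math

import numpy as np

def label_scale_vector(face_sizes, n_bins=7, lo_exp=2):
    """0/1 scale label vector: bin n covers (2**(lo_exp+n-0.5), 2**(lo_exp+n+0.5)).

    face_sizes: list of (width, height); the face scale s is (w + h) / 2.
    Rounding log2(s) to the nearest integer picks exactly that half-octave bin.
    """
    labels = np.zeros(n_bins, dtype=np.float32)
    for w, h in face_sizes:
        s = (w + h) / 2.0
        n = int(round(math.log2(s))) - lo_exp
        if 0 <= n < n_bins:  # faces outside [2^2, 2^8] are ignored
            labels[n] = 1.0
    return labels
```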
Step 1.2, making the human face attention map. Taking a binary segmentation image formed by inscribed ellipses in a face labeling frame as a face mask, and pre-distributing attention weight to the face in the image according to the face scale, wherein the pre-distributing attention weight formula is as follows:
$$w(s)=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(s-\mu)^{2}}{2\sigma^{2}}\right)$$
wherein s is a face scale, and sigma and mu are probability distribution parameters;
as shown in the figure, fig. 2 is an original figure, and fig. 3 is a prepared human face attention diagram.
Step 1.3 uses a multitask penalty function.
The training loss is composed of two parts: one is the multi-class binary-classification loss of scale, loss_s; the other is the binary-classification loss of the face attention map, loss_a. The training loss is loss = loss_s + λ·loss_a.
wherein
$$loss_{s}=-\frac{1}{N_{a}}\sum_{n=1}^{N_{a}}\left[p_{n}\log\hat{p}_{n}+(1-p_{n})\log(1-\hat{p}_{n})\right]$$
N_a denotes the number of scale intervals, p_n denotes the label of the nth scale interval, and \hat{p}_n denotes the estimation result of the nth scale interval.
$$loss_{a}=-\frac{1}{N_{a}}\sum_{n=1}^{N_{a}}\left[q_{n}\log\hat{q}_{n}+(1-q_{n})\log(1-\hat{q}_{n})\right]$$
In the formula, N_a represents the number of pixels of the face attention map, q_n denotes the label of the nth pixel, and \hat{q}_n denotes the estimation result of the nth pixel. For the weighting coefficient λ, 2 may be used in the implementation.
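Both terms are binary cross-entropies over labels and estimates, combined as loss_s + λ·loss_a. A minimal numpy sketch (function names are illustrative):

```python
import numpy as np

def bce(labels, preds, eps=1e-7):
    """Mean binary cross-entropy -- the common form of loss_s and loss_a."""
    preds = np.clip(preds, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(labels * np.log(preds)
                          + (1.0 - labels) * np.log(1.0 - preds)))

def scale_estimation_loss(scale_labels, scale_preds,
                          attn_labels, attn_preds, lam=2.0):
    # loss = loss_s + lambda * loss_a, with lambda = 2 as in the embodiment
    return bce(scale_labels, scale_preds) + lam * bce(attn_labels, attn_preds)
```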
Step 2: the network test is estimated based on the scale of attention.
Step 2.1: in the test, the attention auxiliary prediction module does not participate; only the forward feature extraction module and the prediction module are needed. In the implementation, the picture is down-sampled to 256 × 256 and input into the attention-based scale estimation network to obtain a 1 × 7 scale probability vector. The scales whose probability is greater than the threshold T_0 are taken as the suggested face scales S = {s_1, s_2, …, s_n}.
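The test-time thresholding then reduces to a simple filter over the 1 × 7 probability vector. A sketch: the preset scale values follow step 1.1 and T_0 = 0.6 follows the embodiment, but the names are illustrative.

```python
def select_scales(probs, preset_scales=(4, 8, 16, 32, 64, 128, 256), t0=0.6):
    """Keep the preset scales 2^2..2^8 whose estimated probability exceeds T0."""
    return [s for s, p in zip(preset_scales, probs) if p > t0]
```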
Step two: the method for cascade face detection and alignment based on anchors refers to fig. 5 and 6.
The anchor-based cascade face detection and alignment method is composed of a cascade of two convolutional neural networks, anchor Pnet and anchor Rnet. First, candidate regions of the face are extracted with the simple and fast anchor Pnet, and then gradually corrected with the relatively complex anchor Rnet, so face detection is faster and more accurate. The specific steps are as follows:
step 1: anchor-based cascading face detection and alignment method training
Step 1.1: anchor Pnet training.
The anchor Pnet is a full convolution network. K anchors with different proportions, A = {a_1, a_2, …, a_K}, are designed and matched with the labeling frames for training. If the IoU of the predefined frame corresponding to an anchor and the labeling frame is greater than 0.65, the anchor is marked as a positive sample and participates in both classification and regression calculation; if it is less than 0.3, it is considered a negative sample and participates only in classification, not regression; samples with IoU in [0.4, 0.65] are not used for classification judgment and participate only in regression. During training, the K anchors are classified and detected simultaneously.
The anchors may have any aspect ratio; in this embodiment, for convenience, the aspect ratio is 1, and the anchor sizes are obtained with the formula a_k = γ·a_{k−1}, where a_1 = 16, γ = 0.709, and the number of anchors is 3. The 3 dashed boxes on the 16 × 16 diagram are the 3 preset anchors.
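The geometric anchor progression a_k = γ·a_{k−1} can be sketched directly, with a_1 = 16 and γ = 0.709 as in the embodiment (the function name is illustrative):

```python
def make_anchors(a1=16.0, gamma=0.709, k=3):
    """Square anchor side lengths a_k = gamma * a_(k-1), aspect ratio 1."""
    sides = [a1]
    for _ in range(k - 1):
        sides.append(sides[-1] * gamma)
    return sides
```

With γ = 0.709 ≈ 1/√2, each successive anchor roughly halves the area of the previous one.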
Step 1.2: anchor Rnet is trained.
Required training data are generated using the detection results of anchor Pnet, the labeling frames, and the preset anchors. Three tasks are performed simultaneously during training on the K preset anchors A = {a_1, a_2, …, a_K}: face classification, bounding-box regression, and feature-point localization, with 48 × 48 inputs. The anchor design rule is consistent with step 1.1; in the implementation a_1 = 48, γ = 0.709, and the number of anchors is 3.
Step 2: anchor-based face detection and alignment method test
Step 2.1: anchor Pnet generates candidate boxes. The scales S = {s_1, s_2, …, s_n} obtained from the scale estimation network are used to zoom the image into images of multiple scales. Since anchor Pnet is a full convolution network, it accepts input of any size; the multi-scale images are input into Pnet in turn to obtain a large number of candidate frames. Tests show that regions with dense candidate frames have a high probability of containing faces, while isolated candidate frames have a high probability of being non-face regions. Therefore, more non-face candidate boxes can be removed using an improved non-maximum suppression algorithm, which adds to the standard algorithm the constraint that a local maximum must cover at least N_n candidate frames. The improved non-maximum suppression algorithm proceeds as follows:
(The improved NMS procedure is given in the original as a figure.) Here iou denotes the intersection-over-union ratio:
$$\mathrm{iou}(A,B)=\frac{\mathrm{area}(A\cap B)}{\mathrm{area}(A\cup B)}$$
in this embodiment, the coverage threshold NnNMS threshold N2t0.5, confidence threshold T1=0.6。
Step 2.2: anchor Rnet produces the final result. The candidate frames generated in the first stage are clipped from the original image, scaled to 48 × 48, and input into anchor Rnet; each 48 × 48 input yields K candidate frames, corresponding to the K anchors. The frames whose confidence is greater than the threshold T_2 are kept, redundant frames are removed with the non-maximum suppression algorithm to obtain the detection frames, and the corresponding face feature points are extracted according to the detection frames. The non-maximum suppression algorithm proceeds as follows:
(The standard NMS procedure is given in the original as a figure: repeatedly keep the highest-scoring remaining box and discard all boxes whose iou with it exceeds the threshold N_t.)
in practice, NMS threshold Nt0.5, confidence threshold T2=0.7。
The invention also provides a face detection and alignment system based on scale estimation, which comprises:
a scale estimation module: inputting the picture into a scale estimation network, and outputting the scale with the scale probability vector larger than a preset threshold value; when the scale estimation network is used for training, attention weights are pre-distributed to faces in the images according to the face scales so as to make a face attention diagram; the loss function of the scale estimation network during training comprises the binary loss of the face attention diagram;
a scaling module: scaling an image to be detected through a scale obtained by a scale estimation network to obtain a plurality of scale images;
anchor Pnet Module: inputting the images with multiple scales into an anchor Pnet to obtain multiple candidate frames, and removing non-face candidate frames through a non-maximum suppression algorithm to obtain a pre-processing candidate frame;
anchor Rnet module: and cutting the preprocessing candidate frame on the original image, zooming the preprocessing candidate frame to a preset size, inputting the preprocessing candidate frame into anchor Rnet, removing redundant frames by using a non-maximum suppression algorithm to obtain a detection frame, and extracting corresponding human face characteristic points according to the detection frame.
The invention also provides electronic equipment for detecting and aligning the face based on the scale estimation, which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the steps of the method for detecting and aligning the face scale are realized when the processor executes the program. The method comprises the following steps:
inputting the picture into a scale estimation network, and outputting the scale with the scale probability vector larger than a preset threshold value; when the scale estimation network is used for training, attention weights are pre-distributed to faces in the images according to the face scales so as to make a face attention diagram; the loss function of the scale estimation network during training comprises the binary loss of the face attention diagram;
scaling an image to be detected through a scale obtained by a scale estimation network to obtain a plurality of scale images;
inputting the images with multiple scales into an anchor Pnet to obtain multiple candidate frames, and removing non-face candidate frames through a non-maximum suppression algorithm to obtain a pre-processing candidate frame;
and cutting the preprocessing candidate frame on the original image, zooming the preprocessing candidate frame to a preset size, inputting the preprocessing candidate frame into anchor Rnet, removing redundant frames by using a non-maximum suppression algorithm to obtain a detection frame, and extracting corresponding human face characteristic points according to the detection frame.
The invention also proposes a computer-readable storage medium for face detection and alignment based on scale estimation, on which a computer program is stored, the computer program being executed by a processor to implement the steps of the above-mentioned method for face scale detection and alignment.
The method comprises the following steps:
inputting the picture into a scale estimation network, and outputting the scale with the scale probability vector larger than a preset threshold value; when the scale estimation network is used for training, attention weights are pre-distributed to faces in the images according to the face scales so as to make a face attention diagram; the loss function of the scale estimation network during training comprises the binary loss of the face attention diagram;
scaling an image to be detected through a scale obtained by a scale estimation network to obtain a plurality of scale images;
inputting the images with multiple scales into an anchor Pnet to obtain multiple candidate frames, and removing non-face candidate frames through a non-maximum suppression algorithm to obtain a pre-processing candidate frame;
and cutting the preprocessing candidate frame on the original image, zooming the preprocessing candidate frame to a preset size, inputting the preprocessing candidate frame into anchor Rnet, removing redundant frames by using a non-maximum suppression algorithm to obtain a detection frame, and extracting corresponding human face characteristic points according to the detection frame.
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the described embodiments. It will be apparent to those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, and such variations still fall within the scope of protection of the invention.

Claims (10)

1. A face detection and alignment method based on scale estimation is characterized by comprising the following steps:
inputting a picture into the scale estimation network, and outputting the scales whose entries in the scale probability vector are greater than a preset threshold; during training of the scale estimation network, attention weights are pre-assigned to the faces in the image according to their scales to produce a face attention map, and the training loss function of the scale estimation network includes the binary classification loss of the face attention map;
scaling the image to be detected by the scales obtained from the scale estimation network to obtain images of multiple scales;
inputting the multi-scale images into the anchor Pnet to obtain a plurality of candidate boxes, and removing non-face candidate boxes with a non-maximum suppression algorithm to obtain pre-processed candidate boxes;
and cropping the pre-processed candidate boxes from the original image, scaling them to a preset size, and inputting them into the anchor Rnet; removing redundant boxes with a non-maximum suppression algorithm to obtain detection boxes, and extracting the corresponding face feature points according to the detection boxes.
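The scaling step of claim 1 can be sketched as follows: keep the scale bins the network scores above the threshold, then turn each into a resize factor that maps faces of that scale onto the detector's reference anchor size. The 0.5 probability threshold and the 12-pixel reference size are hypothetical defaults (the patent specifies only "a preset threshold"):

```python
def select_scales(probs, scale_bins, threshold=0.5):
    """Keep the scale bins whose predicted probability exceeds the
    preset threshold (the threshold value is an assumed default)."""
    return [s for p, s in zip(probs, scale_bins) if p > threshold]

def pyramid_factors(face_scales, anchor_size=12.0):
    """Resize factor per selected scale: faces of that scale are
    mapped onto the detector's reference anchor size (12 px is an
    assumed reference, not a value fixed by the patent)."""
    return [anchor_size / s for s in face_scales]
```

Scaling the input image by each returned factor yields the multi-scale images fed to the anchor Pnet, without the dense pyramid an exhaustive cascade would need.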
2. The scale estimation-based face detection and alignment method according to claim 1, wherein the training of the scale estimation network comprises:
labeling the face scale vector: presetting a plurality of scale intervals and taking the mean of the face width and height as the face scale; if a face whose scale falls within an interval exists, the corresponding entry of the score vector is set to 1; if no face falls within the interval, the corresponding entry is set to 0;
making the face attention map: making a face mask and pre-assigning attention weights according to the face scale, wherein the formula for pre-assigning the attention weights is:
[formula image: attention weight as a function of the face scale s with parameters σ and μ]
wherein s is the face scale, and σ and μ are probability distribution parameters;
using the binary classification loss loss_s of the scale vector and the binary classification loss loss_a of the face attention map as the loss function: loss = loss_s + λ·loss_a, wherein λ is a weight coefficient.
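The attention-weight formula is given only as a figure in the patent; the sketch below assumes the Gaussian form suggested by the claim's "probability distribution parameters" σ and μ, so the true formula may differ. The combined loss follows the claim directly:

```python
import math

def attention_weight(s, mu, sigma):
    """Hypothetical pre-assigned attention weight for a face of scale s.
    A Gaussian in the probability-distribution parameters mu and sigma
    is assumed here; the patent's exact formula is in a figure."""
    return math.exp(-((s - mu) ** 2) / (2.0 * sigma ** 2))

def total_loss(loss_s, loss_a, lam):
    """Combined training loss from claim 2: loss_s + lambda * loss_a."""
    return loss_s + lam * loss_a
```

Under this assumption, faces near the reference scale μ receive weight close to 1 and faces far from it are down-weighted.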
3. The scale estimation-based face detection and alignment method of claim 2, wherein:
loss_s = −(1/N_a) · Σ_{n=1}^{N_a} [ p_n·log(p̂_n) + (1 − p_n)·log(1 − p̂_n) ]
wherein N_a denotes the number of scale intervals, p_n denotes the label of the nth scale interval, and p̂_n denotes the estimate for the nth scale interval.
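Claim 3's loss_s is a binary classification loss over the N_a scale intervals; a Python transcription of the conventional binary cross-entropy form (the patent presents the formula only as a figure, and the epsilon clamp is an added numerical-safety detail):

```python
import math

def scale_bce_loss(labels, preds, eps=1e-7):
    """Binary cross-entropy over the Na scale intervals: labels holds
    the 0/1 interval labels p_n, preds holds the estimates."""
    na = len(labels)
    total = 0.0
    for pn, qn in zip(labels, preds):
        qn = min(max(qn, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += pn * math.log(qn) + (1 - pn) * math.log(1 - qn)
    return -total / na
```

The loss_a of claim 4 has the same form, summed over attention-map pixels instead of scale intervals.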
4. The scale estimation-based face detection and alignment method of claim 2, wherein:
loss_a = −(1/N_a) · Σ_{n=1}^{N_a} [ q_n·log(q̂_n) + (1 − q_n)·log(1 − q̂_n) ]
wherein N_a denotes the number of pixels of the face attention map, q_n denotes the label of the nth pixel, and q̂_n denotes the estimate for the nth pixel.
5. The scale estimation-based face detection and alignment method of claim 1, wherein the model training process for the anchor Pnet and the anchor Rnet comprises:
anchor Pnet training: the anchor Pnet is a fully convolutional network with K anchors of different aspect ratios preset; if the intersection-over-union of the predefined box corresponding to an anchor and a labeled box is greater than a first preset value, the anchor is marked as a positive sample and participates in both classification and regression; if the intersection-over-union is smaller than a second preset value, it is considered a negative sample that participates only in classification and not in regression; if the intersection-over-union lies between the second and the first preset value, the sample is excluded from classification and participates only in regression; during training, classification and detection are performed on the K anchors simultaneously;
anchor Rnet training: the required training data are generated from the anchor Pnet detection results, the labeled boxes, and the preset anchors; multiple tasks are performed simultaneously during training, including face classification, bounding-box regression, and feature-point localization for the K preset anchors.
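The anchor labeling rule of claim 5 can be sketched as follows; the 0.65 and 0.3 thresholds stand in for the claim's unspecified "first" and "second" preset values:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def label_anchor(anchor_box, gt_box, pos_thresh=0.65, neg_thresh=0.3):
    """Training role of one anchor per claim 5: positives join
    classification and regression, negatives join classification only,
    anchors in between participate only in regression.  The threshold
    values are illustrative assumptions."""
    v = iou(anchor_box, gt_box)
    if v > pos_thresh:
        return "positive"          # classification + regression
    if v < neg_thresh:
        return "negative"          # classification only
    return "regression_only"       # no classification judgment
```

Ignoring the ambiguous middle band in the classification loss keeps near-boundary anchors from injecting noisy labels while still using them for box refinement.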
6. The scale estimation-based face detection and alignment method of claim 1, wherein: when executing the step of removing non-face candidate boxes with the non-maximum suppression algorithm and the step of removing redundant boxes with the non-maximum suppression algorithm to obtain the detection boxes, the algorithm further includes the restriction that a local maximum must cover a number of non-maxima no smaller than N_n, wherein N_n is the coverage threshold.
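One possible reading of claim 6's coverage restriction: during greedy NMS, a local maximum is kept only if it covers at least N_n overlapping non-maximum boxes, so isolated single detections are discarded as likely false positives. This is an interpretation, since the claim does not fully specify the rule:

```python
def _iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms_with_coverage(boxes, scores, iou_thresh=0.5, n_cover=1):
    """Greedy NMS with an extra restriction: a local maximum is kept
    only if it covers at least n_cover overlapping non-maximum boxes
    (one reading of the coverage threshold Nn in claim 6)."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        i = order.pop(0)
        covered = [j for j in order if _iou(boxes[i], boxes[j]) > iou_thresh]
        if len(covered) >= n_cover:
            keep.append(i)         # enough supporting detections
        order = [j for j in order if j not in covered]
    return keep
```

With n_cover=0 this reduces to ordinary greedy NMS.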
7. The scale estimation-based face detection and alignment method according to claim 1, wherein the scale estimation network comprises a feature extraction module, an attention-assisted prediction module and a prediction module;
the feature extraction module is a fully convolutional network for generating features;
the attention-assisted prediction module deconvolves the feature map to the size of the original image and learns the face attention map and face attention features;
and the prediction module combines the features of the feature extraction module with the face attention features to obtain a scale probability vector, and outputs the scales whose probabilities are greater than a preset threshold.
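A shape-level sketch of how the prediction module might fuse the backbone features with the face attention map; `w` and `b` stand in for the module's learned parameters, and the fusion scheme (element-wise weighting plus global pooling) is an assumption, not the patent's exact architecture:

```python
import numpy as np

def predict_scales(features, attention_map, w, b):
    """Weight the backbone features by the learned face attention map,
    pool globally, and map the fused vector to per-interval scale
    probabilities with independent sigmoids (multiple scale intervals
    may be active at once)."""
    attended = features * attention_map        # (C, H, W) element-wise
    fused = attended.mean(axis=(1, 2))         # global average pool -> (C,)
    logits = w @ fused + b                     # (Na, C) @ (C,) -> (Na,)
    return 1.0 / (1.0 + np.exp(-logits))
```

Independent sigmoids rather than a softmax match the claim's thresholding of a probability vector, since several face scales can be present in one image.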
8. A face detection and alignment device based on scale estimation is characterized by comprising:
a scale estimation module: inputting a picture into the scale estimation network, and outputting the scales whose entries in the scale probability vector are greater than a preset threshold; during training of the scale estimation network, attention weights are pre-assigned to the faces in the image according to their scales to produce a face attention map, and the training loss function includes the binary classification loss of the face attention map;
a scaling module: scaling the image to be detected by the scales obtained from the scale estimation network to obtain images of multiple scales;
an anchor Pnet module: inputting the multi-scale images into the anchor Pnet to obtain a plurality of candidate boxes, and removing non-face candidate boxes with a non-maximum suppression algorithm to obtain pre-processed candidate boxes;
and an anchor Rnet module: cropping the pre-processed candidate boxes from the original image, scaling them to a preset size, and inputting them into the anchor Rnet; removing redundant boxes with a non-maximum suppression algorithm to obtain detection boxes, and extracting the corresponding face feature points according to the detection boxes.
9. An apparatus for scale estimation based face detection and alignment, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein: the processor, when executing the program, implements the steps of the scale estimation based face detection and alignment method of any one of claims 1-7.
10. A storage medium for scale estimation based face detection and alignment, having a computer program stored thereon, wherein: the computer program is executed by a processor to perform the steps of the scale estimation based face detection and alignment method of any one of claims 1 to 7.
CN201911387732.XA 2019-12-30 2019-12-30 Face detection and alignment method, device and storage medium based on scale estimation Active CN111241924B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911387732.XA CN111241924B (en) 2019-12-30 2019-12-30 Face detection and alignment method, device and storage medium based on scale estimation


Publications (2)

Publication Number Publication Date
CN111241924A true CN111241924A (en) 2020-06-05
CN111241924B CN111241924B (en) 2024-06-07

Family

ID=70864141

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911387732.XA Active CN111241924B (en) 2019-12-30 2019-12-30 Face detection and alignment method, device and storage medium based on scale estimation

Country Status (1)

Country Link
CN (1) CN111241924B (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107403141A (en) * 2017-07-05 2017-11-28 中国科学院自动化研究所 Method for detecting human face and device, computer-readable recording medium, equipment
CN107844785A (en) * 2017-12-08 2018-03-27 浙江捷尚视觉科技股份有限公司 A kind of method for detecting human face based on size estimation
CN109670452A (en) * 2018-12-20 2019-04-23 北京旷视科技有限公司 Method for detecting human face, device, electronic equipment and Face datection model
WO2019091271A1 (en) * 2017-11-13 2019-05-16 苏州科达科技股份有限公司 Human face detection method and human face detection system
CN109886128A (en) * 2019-01-24 2019-06-14 南京航空航天大学 A kind of method for detecting human face under low resolution
CN110135243A (en) * 2019-04-02 2019-08-16 上海交通大学 A kind of pedestrian detection method and system based on two-stage attention mechanism


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111783784A (en) * 2020-06-30 2020-10-16 创新奇智(合肥)科技有限公司 Method and device for detecting building cavity, electronic equipment and storage medium
CN112037118A (en) * 2020-07-16 2020-12-04 新大陆数字技术股份有限公司 Image scaling hardware acceleration method, device and system and readable storage medium
CN112037118B (en) * 2020-07-16 2024-02-02 新大陆数字技术股份有限公司 Image scaling hardware acceleration method, device and system and readable storage medium
CN111860510A (en) * 2020-07-29 2020-10-30 浙江大华技术股份有限公司 X-ray image target detection method and device
CN111860510B (en) * 2020-07-29 2021-06-18 浙江大华技术股份有限公司 X-ray image target detection method and device
CN112183463A (en) * 2020-10-23 2021-01-05 珠海大横琴科技发展有限公司 Ship identification model verification method and device based on radar image
CN112183463B (en) * 2020-10-23 2021-10-15 珠海大横琴科技发展有限公司 Ship identification model verification method and device based on radar image
CN112733671A (en) * 2020-12-31 2021-04-30 新大陆数字技术股份有限公司 Pedestrian detection method, device and readable storage medium

Also Published As

Publication number Publication date
CN111241924B (en) 2024-06-07

Similar Documents

Publication Publication Date Title
CN109543606B (en) Human face recognition method with attention mechanism
US11830230B2 (en) Living body detection method based on facial recognition, and electronic device and storage medium
CN111241924A (en) Face detection and alignment method and device based on scale estimation and storage medium
CN110633745B (en) Image classification training method and device based on artificial intelligence and storage medium
CN111814902A (en) Target detection model training method, target identification method, device and medium
CN112508975A (en) Image identification method, device, equipment and storage medium
CN111079739B (en) Multi-scale attention feature detection method
CN108846404B (en) Image significance detection method and device based on related constraint graph sorting
CN111695421B (en) Image recognition method and device and electronic equipment
CN110245621B (en) Face recognition device, image processing method, feature extraction model, and storage medium
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN112614136A (en) Infrared small target real-time instance segmentation method and device
CN114092833A (en) Remote sensing image classification method and device, computer equipment and storage medium
CN112084952B (en) Video point location tracking method based on self-supervision training
CN111428664A (en) Real-time multi-person posture estimation method based on artificial intelligence deep learning technology for computer vision
CN112348116A (en) Target detection method and device using spatial context and computer equipment
CN110969110A (en) Face tracking method and system based on deep learning
CN113065379B (en) Image detection method and device integrating image quality and electronic equipment
CN111951283A (en) Medical image identification method and system based on deep learning
CN113807237B (en) Training of in vivo detection model, in vivo detection method, computer device, and medium
CN110222568B (en) Cross-visual-angle gait recognition method based on space-time diagram
CN114677357A (en) Model, method and equipment for detecting self-explosion defect of aerial photographing insulator and storage medium
CN113158860A (en) Deep learning-based multi-dimensional output face quality evaluation method and electronic equipment
CN111639537A (en) Face action unit identification method and device, electronic equipment and storage medium
CN113591647B (en) Human motion recognition method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant