CN111310718A - High-accuracy detection and comparison method for occluded face images - Google Patents

High-accuracy detection and comparison method for occluded face images

Info

Publication number
CN111310718A
Authority
CN
China
Prior art keywords
network
face
loss
training
size
Prior art date
Legal status
Pending
Application number
CN202010156376.7A
Other languages
Chinese (zh)
Inventor
孙冰
潘召军
Current Assignee
Kehong New Technology Institute of Sichuan University
Original Assignee
Kehong New Technology Institute of Sichuan University
Priority date
Filing date
Publication date
Application filed by Kehong New Technology Institute of Sichuan University
Priority to CN202010156376.7A
Publication of CN111310718A
Legal status: Pending

Classifications

    • G06V 40/171 — Image or video recognition; human faces; feature extraction and face representation; local features and components; facial parts; occluding parts, e.g. glasses; geometrical relationships
    • G06N 3/045 — Computing arrangements based on biological models; neural networks; architecture; combinations of networks
    • G06N 3/08 — Computing arrangements based on biological models; neural networks; learning methods
    • G06V 40/172 — Image or video recognition; human faces; classification, e.g. identification


Abstract

The invention discloses a high-accuracy detection and comparison method for occluded face images, comprising the following steps: preprocessing the training data; building a set of generation-target pictures for training the generation branch; constructing a feature enhancement branch to obtain local features focused on the face; constructing a parallel feature extraction network branch to strengthen the extraction and use of detail features; constructing the complete network model and performing face region classification and bounding-box regression based on the fused features and parallel feature extraction; training the network end to end and updating the network parameters to obtain the trained detection model; and inputting an occluded face image sample and using the trained detection model to box the face positions, completing detection of the occluded face image. The method effectively increases the proportion of visible face region features in the overall features, improves the robustness of the detection model to occluded face images, and achieves higher detection accuracy and recall on occluded face images.

Description

High-accuracy detection and comparison method for occluded face images
Technical Field
The invention relates to face image detection methods, and in particular to a high-accuracy detection and comparison method for occluded face images.
Background
Face image detection, usually called face detection for short, is the process of determining whether an input image contains any face and locating every face region in it. With the growing adoption of intelligent recognition technology, automatic face detection has important applications in a wide range of scenarios such as criminal investigation, identity verification, mobile social networking, and photo beautification.
Face image detection techniques fall broadly into traditional detection methods and deep-learning-based methods. Traditional face detection mainly relies on hand-crafted features such as grayscale, contour and skin-color features to classify image regions as face or non-face. The VJ detector proposed by Paul Viola et al. is a representative traditional algorithm: it uses Haar features and an AdaBoost cascade strategy, building a strong detector by training weak classifiers stage by stage, and achieves real-time detection speed with reasonably good accuracy.
Compared with traditional machine learning methods, neural networks have clear advantages in fitting nonlinear functions. With recent progress in deep learning, related models perform very well at image feature extraction, classification and detection, so deep learning is increasingly widely applied to face image detection. For example, the classical RCNN family of detection models extracts features from the input image through convolution and pooling layers, generates candidate regions of different scales on the feature map, and then classifies each candidate region as face or non-face and regresses its bounding box.
Existing face detection models achieve good results under constrained conditions, but practical scenes usually contain occlusions of various kinds, and face images with partially missing features make accurate detection difficult. For example, Faster RCNN reaches high accuracy on the public VOC2007 dataset, but when processing heavily occluded face images it produces a large number of missed and false detections.
Disclosure of Invention
The present invention aims to solve the above problems by providing a high-accuracy detection and comparison method for occluded face images, which significantly improves detection accuracy on occluded faces.
The invention achieves this object through the following technical solution:
A high-accuracy detection and comparison method for occluded face images comprises the following steps:
step 1, preprocessing the training data;
step 2, building a set of generation-target pictures for training the generation branch;
step 3, constructing a feature enhancement branch to obtain local features focused on the face, and constructing a parallel feature extraction network branch to strengthen the extraction and use of detail features;
step 4, constructing the complete network model and performing face region classification and bounding-box regression based on the fused features and parallel feature extraction;
step 5, training the network end to end and updating the network parameters to obtain the trained detection model;
step 6, inputting an occluded face image sample and using the trained detection model to box the face positions, completing detection of the occluded face image.
Preferably, in step 1 the training data set is the WiderFace public data set, and the preprocessing includes scaling all input images so that they do not occupy too much GPU memory.
More specifically, in step 1 the WiderFace public data set contains a large number of pictures with facial occlusion; the occlusion field in the annotations indicates the degree of occlusion, with levels 0, 1 and 2 denoting no occlusion, slight occlusion and large-area occlusion respectively. 50% of the samples with occlusion level 0 are randomly selected; for each such picture a square background patch is cropped from a non-ground-truth region, with side length randomly chosen in [0.2, 0.8] times the side length of the largest GT box, and the cropped patch is pasted over part of a GT box to create artificial occlusion. Before being fed into the network, all input images are resized so that the short side is no more than 600 pixels and the long side no more than 800 pixels.
Preferably, the method of step 2 is: based on the WiderFace data set, a corresponding generation-target picture is made for each training picture and is used to compute the similarity (generation) loss.
Further, in step 2 the pixel values outside the GT regions of each input image are set to zero, giving an image that contains only the face regions; this image serves as the generation target of the enhancement branch.
Preferably, step 3 comprises the following steps:
step 3.1, constructing the feature enhancement branch to obtain local features focused on the face, specifically:
step 3.1.1, constructing the feature enhancement branch, which consists of two networks: a convolutional network for feature screening and a deconvolutional network for picture generation;
step 3.1.2, passing the features output by the backbone feature extraction stage through 3 convolutional layers with 3 × 3 kernels, padding 1 and stride 1, giving 512-channel intermediate features at unchanged scale;
step 3.1.3, generating the target region from the intermediate features through a decoder module; specifically, under the adopted framework, 4 deconvolution (deconv) layers produce a 1-channel output image of the same size as the input;
step 3.1.4, computing the similarity loss between the generated image and the generation-target image produced in step 2, and adjusting the enhancement branch parameters based on this loss; the similarity loss is an L2 loss, computed as
Lsim = α·Lf + (1 − α)·Lnf
where Lsim is the generation loss, α is a parameter adjusting how much the face region contributes to the total loss, Lf is the loss over the face region and Lnf is the loss over the non-face region; Lf and Lnf use the same L2 form,
L = Σi (yi − yi*)²
where yi is a pixel value of the generated picture and yi* is the corresponding value of the target (label) picture;
step 3.2, constructing the parallel feature extraction network and fusing it with the backbone network to strengthen the extraction of face detail features, specifically:
step 3.2.1, constructing the parallel feature extraction network; like the backbone, the parallel network adopts the front convolutional modules of VGG16 and contains 5 convolution modules Conv1′, Conv2′, …, Conv5′: Conv1′ has two 3 × 3 convolutions with 64 channels and a max-pooling layer, its output feature map being 1/2 of the original size; Conv2′ has two 3 × 3 convolutions with 128 channels and a max-pooling layer, output 1/4 of the original size; Conv3′ has three 3 × 3 convolutions with 256 channels and a max-pooling layer, output 1/8 of the original size; Conv4′ has three 3 × 3 convolutions with 512 channels and a max-pooling layer, output 1/16 of the original size; Conv5′ has three 3 × 3 convolutions with 512 channels;
step 3.2.2, on the basis of the backbone structure, laterally connecting each convolution module to the backbone through a 1 × 1 convolution, the rest of the structure being identical to the backbone;
step 3.2.3, except for the first module, having each conv module fuse the feature map produced by the previous module with the corresponding backbone feature map before passing the result to the next module;
step 3.2.4, fusing the output feature map of the conv5_3′ layer with the output feature map of the backbone conv5_3 layer, the fused map serving as the input of the enhancement branch and the subsequent network.
Preferably, step 4 comprises the following steps:
step 4.1, constructing the feature extraction backbone network; the backbone adopts the front convolutional modules of VGG16 and contains 5 convolution modules Conv1, Conv2, …, Conv5: Conv1 has two 3 × 3 convolutions with 64 channels and a max-pooling layer, its output feature map being 1/2 of the original size; Conv2 has two 3 × 3 convolutions with 128 channels and a max-pooling layer, output 1/4 of the original size; Conv3 has three 3 × 3 convolutions with 256 channels and a max-pooling layer, output 1/8 of the original size; Conv4 has three 3 × 3 convolutions with 512 channels and a max-pooling layer, output 1/16 of the original size; Conv5 has three 3 × 3 convolutions with 512 channels;
step 4.2, taking the fusion of the backbone conv5_3 output feature map with the conv5_3′ feature map of the parallel feature extraction network as the input of the enhancement branch and the subsequent network;
step 4.3, fusing the same-size features output by the enhancement branch and the parallel branch with the original conv5_3 features by element-wise (point) multiplication, increasing the weight of the visible face region in the classification features and strengthening the extraction of face detail features;
step 4.4, on the basis of the fused features, obtaining normalized proposal regions using the RPN module and the ROI module;
step 4.5, completing face/non-face binary classification and bounding-box fine-tuning of the proposal regions through the classification branch and the regression branch, where the classification-regression loss is computed as
L({pi}, {ti}) = (1/Ncls)·Σi Lcls(pi, pi*) + λ·(1/Nreg)·Σi pi*·Lreg(ti, ti*)
where Lcls is the classification loss, pi is the classification score, pi* is the anchor label (1 for positive anchors, 0 for negative anchors), and Lreg is the regression loss; multiplying by pi* means that bounding boxes are regressed only for anchors classified as foreground; ti is one of the predicted bounding-box parameters (x, y, w, h), and ti* is the ground-truth box parameter corresponding to an anchor marked positive.
Preferably, step 5 comprises the following steps:
step 5.1, setting the enhancement branch loss function and the classification-regression loss function;
step 5.2, training the network end to end, updating the network parameters based on the joint loss, and obtaining the trained detection model.
More specifically, in step 5.2 a VGG16 pre-trained model is used to initialize the network, and the parameters are trained with stochastic gradient descent with momentum and weight decay: momentum 0.8, weight decay 0.0005, 2 pictures per mini-batch, initial learning rate 0.001, decayed by a factor of 0.1 every 18000 steps.
The invention has the following beneficial effects:
To counter the interference that occlusion causes in face image detection, the method takes a Faster RCNN model as the backbone and adds a feature enhancement branch, driven by generation of the visible region, together with a parallel feature extraction network branch that strengthens the extraction of face detail features. Superimposing the original image features, the parallel-branch features and the generated face-region features effectively increases the proportion of visible face region features in the overall features, improves the robustness of the detection model to occluded face images, suppresses the feature loss and interference caused by occlusion, localizes and extracts face images in image samples better, and yields higher detection accuracy and recall on occluded face images.
Drawings
FIG. 1 is the overall flow chart of the high-accuracy detection method for occluded face images according to the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings in which:
As shown in FIG. 1, the high-accuracy detection method for occluded face images comprises the following steps:
Step 1, preprocessing the training data.
In this step the training data set is the WiderFace public data set, and the preprocessing includes scaling all input images so that they do not occupy too much GPU memory. The WiderFace data set contains a large number of pictures with facial occlusion; the occlusion field in the annotations indicates the degree of occlusion, with levels 0, 1 and 2 denoting no occlusion, slight occlusion and large-area occlusion respectively. 50% of the samples with occlusion level 0 are randomly selected; for each such picture a square background patch is cropped from a non-ground-truth region, with side length randomly chosen in [0.2, 0.8] times the side length of the largest GT box, and the cropped patch is pasted over part of a GT box to create artificial occlusion (a sketch of this augmentation follows below). Before being fed into the network, all input images are resized so that the short side is no more than 600 pixels and the long side no more than 800 pixels.
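As an illustration of this preprocessing, the following sketch shows one way the artificial-occlusion augmentation and the size scaling could be implemented; the helper names, the (x1, y1, x2, y2) box format and the use of OpenCV for resizing are our own assumptions, not from the patent.

```python
import random
import numpy as np
import cv2  # OpenCV assumed available for resizing


def add_artificial_occlusion(image, gt_boxes, min_ratio=0.2, max_ratio=0.8):
    """Cut a square background patch from a non-GT area and paste it over part of a GT box.

    image: HxWx3 uint8 array; gt_boxes: list of (x1, y1, x2, y2) ground-truth face boxes.
    Sketch of the step-1 augmentation, applied to ~50% of the occlusion-level-0 samples.
    """
    h, w = image.shape[:2]
    if not gt_boxes:
        return image
    # patch side length: random [0.2, 0.8] x side of the largest GT box
    max_side = max(max(x2 - x1, y2 - y1) for x1, y1, x2, y2 in gt_boxes)
    side = int(max_side * random.uniform(min_ratio, max_ratio))
    if side < 2 or side >= min(h, w):
        return image

    def overlaps_gt(px, py):
        return any(px < x2 and px + side > x1 and py < y2 and py + side > y1
                   for x1, y1, x2, y2 in gt_boxes)

    # sample a source patch location that does not overlap any GT box
    for _ in range(50):
        sx, sy = random.randint(0, w - side), random.randint(0, h - side)
        if not overlaps_gt(sx, sy):
            break
    else:
        return image  # no suitable background patch found
    patch = image[sy:sy + side, sx:sx + side].copy()

    # paste the patch so that it covers part of a randomly chosen GT box
    x1, y1, x2, y2 = random.choice(gt_boxes)
    tx = int(np.clip(random.uniform(x1 - side / 2, x2 - side / 2), 0, w - side))
    ty = int(np.clip(random.uniform(y1 - side / 2, y2 - side / 2), 0, h - side))
    image[ty:ty + side, tx:tx + side] = patch
    return image


def rescale_image(image, short_max=600, long_max=800):
    """Downscale so the short side is at most 600 px and the long side at most 800 px (one reading of the constraint)."""
    h, w = image.shape[:2]
    scale = min(short_max / min(h, w), long_max / max(h, w), 1.0)
    return cv2.resize(image, (int(w * scale), int(h * scale)))
```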
Step 2, building the set of generation-target pictures for training the generation branch: based on the WiderFace data set, a corresponding generation-target picture is made for each training picture and is used to compute the similarity (generation) loss. The pixel values outside the GT regions of each input image are set to zero, giving an image that contains only the face regions; this image serves as the generation target of the enhancement branch (see the sketch below).
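A generation target of this kind can be produced by zeroing out every pixel outside the GT boxes; the sketch below is our own illustration, and the single-channel grayscale target is an assumption made to match the 1-channel output of the enhancement branch in step 3.1.3.

```python
import numpy as np


def make_generation_target(image, gt_boxes):
    """Zero all pixels outside the GT face boxes to build the enhancement-branch target.

    image: HxWx3 uint8 array; gt_boxes: list of (x1, y1, x2, y2) boxes.
    Returns a 1-channel float target in [0, 1] plus the binary face mask used by the similarity loss.
    """
    gray = image.mean(axis=2) / 255.0          # simple grayscale conversion (assumption)
    mask = np.zeros_like(gray)
    for x1, y1, x2, y2 in gt_boxes:
        mask[int(y1):int(y2), int(x1):int(x2)] = 1.0
    return gray * mask, mask                   # non-GT pixels set to zero
```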
Step 3, constructing the feature enhancement branch to obtain local features focused on the face, and constructing the parallel feature extraction network branch to strengthen the extraction and use of detail features.
This step specifically comprises:
Step 3.1, constructing the feature enhancement branch to obtain local features focused on the face, specifically:
step 3.1.1, constructing the feature enhancement branch, which consists of two networks: a convolutional network for feature screening and a deconvolutional network for picture generation;
step 3.1.2, passing the features output by the backbone feature extraction stage through 3 convolutional layers with 3 × 3 kernels, padding 1 and stride 1, giving 512-channel intermediate features at unchanged scale;
step 3.1.3, generating the target region from the intermediate features through a decoder module; specifically, under the adopted framework, 4 deconvolution (deconv) layers produce a 1-channel output image of the same size as the input;
step 3.1.4, computing the similarity loss between the generated image and the generation-target image produced in step 2, and adjusting the enhancement branch parameters based on this loss; the similarity loss is an L2 loss, computed as
Lsim = α·Lf + (1 − α)·Lnf
where Lsim is the generation loss, α is a parameter adjusting how much the face region contributes to the total loss, Lf is the loss over the face region and Lnf is the loss over the non-face region; Lf and Lnf use the same L2 form,
L = Σi (yi − yi*)²
where yi is a pixel value of the generated picture and yi* is the corresponding value of the target (label) picture.
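The following PyTorch-style sketch illustrates steps 3.1.1 to 3.1.4: a three-layer 3 × 3 convolutional encoder producing 512-channel features at unchanged scale, a four-layer deconvolution decoder producing a 1-channel image at the input resolution, and the weighted L2 similarity loss Lsim = α·Lf + (1 − α)·Lnf. The deconvolution channel widths, the stride-2 upsampling, the sigmoid output and the default value of α are not stated in the patent and are assumptions here.

```python
import torch.nn as nn


class FeatureEnhanceBranch(nn.Module):
    """Feature enhancement branch: conv encoder for feature screening + deconv decoder for generation."""

    def __init__(self, in_channels=512):
        super().__init__()
        # step 3.1.2: three 3x3 conv layers, padding 1, stride 1, 512-channel output, scale unchanged
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 512, 3, stride=1, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, 3, stride=1, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, 3, stride=1, padding=1), nn.ReLU(inplace=True),
        )
        # step 3.1.3: four stride-2 deconv layers (channel widths assumed) back to input resolution, 1 channel
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, feat):
        screened = self.encoder(feat)        # 512-channel screened features, later fused by point multiplication
        generated = self.decoder(screened)   # 1-channel generated face-region image
        return screened, generated


def similarity_loss(generated, target, face_mask, alpha=0.5):
    """Lsim = alpha * Lf + (1 - alpha) * Lnf with L2 losses over face / non-face pixels (alpha value assumed)."""
    sq_err = (generated - target) ** 2
    l_f = (sq_err * face_mask).sum() / face_mask.sum().clamp(min=1)
    l_nf = (sq_err * (1 - face_mask)).sum() / (1 - face_mask).sum().clamp(min=1)
    return alpha * l_f + (1 - alpha) * l_nf
```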
Step 3.2, constructing the parallel feature extraction network and fusing it with the backbone network to strengthen the extraction of face detail features, specifically:
step 3.2.1, constructing the parallel feature extraction network; like the backbone, the parallel network adopts the front convolutional modules of VGG16 and contains 5 convolution modules Conv1′, Conv2′, …, Conv5′: Conv1′ has two 3 × 3 convolutions with 64 channels and a max-pooling layer, its output feature map being 1/2 of the original size; Conv2′ has two 3 × 3 convolutions with 128 channels and a max-pooling layer, output 1/4 of the original size; Conv3′ has three 3 × 3 convolutions with 256 channels and a max-pooling layer, output 1/8 of the original size; Conv4′ has three 3 × 3 convolutions with 512 channels and a max-pooling layer, output 1/16 of the original size; Conv5′ has three 3 × 3 convolutions with 512 channels;
step 3.2.2, on the basis of the backbone structure, laterally connecting each convolution module to the backbone through a 1 × 1 convolution, the rest of the structure being identical to the backbone;
step 3.2.3, except for the first module, having each conv module fuse the feature map produced by the previous module with the corresponding backbone feature map before passing the result to the next module;
step 3.2.4, fusing the output feature map of the conv5_3′ layer with the output feature map of the backbone conv5_3 layer, the fused map serving as the input of the enhancement branch and the subsequent network.
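A minimal sketch of this parallel branch follows (PyTorch-style; module and variable names are our own): a VGG16-style tower Conv1′–Conv5′ whose stages receive the backbone stage outputs through 1 × 1 lateral convolutions before each following stage. The patent does not state the fusion operator used at these intermediate stages, so element-wise addition is assumed; the final conv5_3′/conv5_3 fusion of step 3.2.4 is done outside this module.

```python
import torch.nn as nn


def vgg_stage(in_ch, out_ch, n_convs, pool=True):
    """One VGG-style stage: n_convs 3x3 convolutions (+ ReLU), optionally followed by 2x2 max pooling."""
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True)]
    if pool:
        layers.append(nn.MaxPool2d(2, 2))
    return nn.Sequential(*layers)


class ParallelFeatureBranch(nn.Module):
    """Parallel VGG16-style branch (Conv1'..Conv5') laterally fused with the backbone stage outputs."""

    def __init__(self):
        super().__init__()
        # (in_channels, out_channels, n_convs, pool) for Conv1'..Conv5'; Conv5' keeps the 1/16 scale
        cfg = [(3, 64, 2, True), (64, 128, 2, True), (128, 256, 3, True),
               (256, 512, 3, True), (512, 512, 3, False)]
        self.stages = nn.ModuleList([vgg_stage(i, o, n, p) for i, o, n, p in cfg])
        # 1x1 lateral convolutions projecting the first four backbone stage outputs into this branch
        self.laterals = nn.ModuleList([nn.Conv2d(o, o, 1) for _, o, _, _ in cfg[:-1]])

    def forward(self, image, backbone_feats):
        """backbone_feats: outputs of the 5 backbone stages (conv1..conv5_3), matching the parallel scales."""
        x = self.stages[0](image)                          # Conv1': no fusion before the first stage
        for k in range(1, len(self.stages)):
            # fuse the previous parallel feature with the laterally projected backbone feature (addition assumed)
            x = x + self.laterals[k - 1](backbone_feats[k - 1])
            x = self.stages[k](x)
        return x                                           # conv5_3'-level map, later fused with backbone conv5_3
```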
Step 4, constructing the complete network model and performing face region classification and bounding-box regression based on the fused features and parallel feature extraction.
This step specifically comprises:
step 4.1, constructing the feature extraction backbone network; the backbone adopts the front convolutional modules of VGG16 and contains 5 convolution modules Conv1, Conv2, …, Conv5: Conv1 has two 3 × 3 convolutions with 64 channels and a max-pooling layer, its output feature map being 1/2 of the original size; Conv2 has two 3 × 3 convolutions with 128 channels and a max-pooling layer, output 1/4 of the original size; Conv3 has three 3 × 3 convolutions with 256 channels and a max-pooling layer, output 1/8 of the original size; Conv4 has three 3 × 3 convolutions with 512 channels and a max-pooling layer, output 1/16 of the original size; Conv5 has three 3 × 3 convolutions with 512 channels;
step 4.2, taking the fusion of the backbone conv5_3 output feature map with the conv5_3′ feature map of the parallel feature extraction network as the input of the enhancement branch and the subsequent network;
step 4.3, fusing the same-size features output by the enhancement branch and the parallel branch with the original conv5_3 features by element-wise (point) multiplication, increasing the weight of the visible face region in the classification features and strengthening the extraction of face detail features;
step 4.4, on the basis of the fused features, obtaining normalized proposal regions using the RPN module and the ROI module;
step 4.5, completing face/non-face binary classification and bounding-box fine-tuning of the proposal regions through the classification branch and the regression branch, where the classification-regression loss is computed as
L({pi}, {ti}) = (1/Ncls)·Σi Lcls(pi, pi*) + λ·(1/Nreg)·Σi pi*·Lreg(ti, ti*)
where Lcls is the classification loss, pi is the classification score, pi* is the anchor label (1 for positive anchors, 0 for negative anchors), and Lreg is the regression loss; multiplying by pi* means that bounding boxes are regressed only for anchors classified as foreground; ti is one of the predicted bounding-box parameters (x, y, w, h), and ti* is the ground-truth box parameter corresponding to an anchor marked positive.
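Steps 4.3–4.5 can be illustrated with the sketch below: a point-multiplication fusion of the backbone, parallel-branch and enhancement-branch features (the exact combination order is our reading of steps 4.2–4.3), followed by the joint classification/regression loss reconstructed above. The normalizers Ncls and Nreg, λ = 1 and the smooth L1 regression loss follow common Faster R-CNN practice and are assumptions here.

```python
import torch.nn.functional as F


def fuse_features(conv5_3, parallel_feat, enhance_feat):
    """Point-multiplication fusion of backbone conv5_3, parallel conv5_3' and enhancement-branch features.

    All three tensors are assumed to be (N, 512, H, W) at the same 1/16 scale; the patent specifies
    point multiplication but not the combination order, so this is one possible reading.
    """
    return conv5_3 * parallel_feat * enhance_feat


def detection_loss(cls_scores, anchor_labels, box_preds, box_targets, lam=1.0):
    """Joint loss: (1/Ncls) * sum Lcls(pi, pi*) + lam * (1/Nreg) * sum pi* * Lreg(ti, ti*).

    cls_scores:    (N, 2) face / non-face logits per anchor or proposal
    anchor_labels: (N,)  pi* in {1: positive, 0: negative, -1: ignored}
    box_preds:     (N, 4) predicted offsets ti = (x, y, w, h)
    box_targets:   (N, 4) ground-truth offsets ti* for the matched boxes
    """
    valid = anchor_labels >= 0
    cls_loss = F.cross_entropy(cls_scores[valid], anchor_labels[valid].long())  # mean over Ncls anchors

    pos = anchor_labels == 1                      # pi* mask: regress boxes only for foreground anchors
    if pos.any():
        reg_loss = F.smooth_l1_loss(box_preds[pos], box_targets[pos])           # mean over Nreg positives
    else:
        reg_loss = box_preds.sum() * 0.0          # no positive anchors in this batch
    return cls_loss + lam * reg_loss
```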
Step 5, training the network end to end, updating the network parameters and obtaining the trained detection model.
This step specifically comprises:
step 5.1, setting the enhancement branch loss function and the classification-regression loss function;
step 5.2, training the network end to end, updating the network parameters based on the joint loss, and obtaining the trained detection model. In this step a VGG16 pre-trained model is used to initialize the network, and the parameters are trained with stochastic gradient descent with momentum and weight decay: momentum 0.8, weight decay 0.0005, 2 pictures per mini-batch, initial learning rate 0.001, decayed by a factor of 0.1 every 18000 steps.
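The training configuration of step 5.2 maps naturally onto SGD with momentum and weight decay plus a step learning-rate schedule. The sketch below is PyTorch-style; the forward signature, the target dictionary keys, the total step count and the equal weighting of the two losses are assumptions, and similarity_loss / detection_loss refer to the sketches above.

```python
import torch


def train_detector(model, train_loader, num_steps=70000):
    """End-to-end training sketch for step 5.2; `model` is the full network initialized from VGG16 weights."""
    optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.8, weight_decay=0.0005)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=18000, gamma=0.1)  # x0.1 every 18000 steps
    step = 0
    while step < num_steps:                                     # total step count not stated in the patent
        for images, targets in train_loader:                    # 2 pictures per mini-batch
            generated, cls_scores, box_preds = model(images)    # hypothetical forward signature
            loss = similarity_loss(generated, targets["gen_target"], targets["face_mask"]) \
                 + detection_loss(cls_scores, targets["labels"], box_preds, targets["boxes"])
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            scheduler.step()                                    # step-based (not epoch-based) decay
            step += 1
            if step >= num_steps:
                break
```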
Step 6, inputting an occluded face image sample and using the trained detection model to box the face positions, completing detection of the occluded face image.
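Step 6 then reduces to a standard detection forward pass; a minimal inference sketch follows (the predict helper and the score threshold are our own, not from the patent).

```python
import torch


@torch.no_grad()
def detect_occluded_faces(model, image_tensor, score_threshold=0.8):
    """Run the trained detector on one preprocessed image and return face boxes above a score threshold."""
    model.eval()
    boxes, scores = model.predict(image_tensor.unsqueeze(0))   # hypothetical inference helper
    keep = scores > score_threshold
    return boxes[keep], scores[keep]                           # boxes mark the detected (possibly occluded) faces
```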
Note on the drawing: the steps in FIG. 1 are not worded exactly as above, but correspond to them one to one; the wording is simplified so the method can be read as a flow chart.
The invention designs a feature enhancement branch based on generation of the face region and a parallel feature extraction network that strengthens the extraction of face detail features using an attention-like mechanism. The feature enhancement branch generates an image around the ground-truth region from the original image features; the trained features, which are able to generate a good target, are fused with the backbone convolutional features by point multiplication, increasing the proportion of visible facial features and reducing the interference of occlusion on the features. Experimental results show that the fused features significantly improve the accuracy of the model in detecting occluded faces. The feature extraction branch parallel to the backbone, laterally connected through 1 × 1 convolutions, effectively captures the detail features of face images, strengthens the localization of face regions, and effectively improves recognition accuracy for face images.
In order to recover the face region accurately from the features, the invention trains the enhancement branch in a supervised way by constructing a target data set: a generation-target picture is created for each input picture by zeroing its non-GT regions. The feature enhancement branch is integrated into the Faster RCNN detection model, and experimental results show that the model with the enhancement branch detects occluded faces better than the original model. Since face regions usually occupy only a small proportion of a picture, the constructed parallel feature extraction network branch can further extract low-level features, which are then processed step by step by convolution, pooling and related operations; this benefits the regression of face regions that occupy only a small proportion of the picture and the recognition of face images.
The above embodiments are only preferred embodiments of the present invention and do not limit its technical solutions; any solution that can be realized on the basis of the above embodiments without creative effort should be regarded as falling within the protection scope of this patent.

Claims (9)

1. A high-accuracy detection and comparison method for occluded face images, characterized by comprising the following steps:
step 1, preprocessing the training data;
step 2, building a set of generation-target pictures for training the generation branch;
step 3, constructing a feature enhancement branch to obtain local features focused on the face, and constructing a parallel feature extraction network branch to strengthen the extraction and use of detail features;
step 4, constructing the complete network model and performing face region classification and bounding-box regression based on the fused features and parallel feature extraction;
step 5, training the network end to end and updating the network parameters to obtain the trained detection model;
step 6, inputting an occluded face image sample and using the trained detection model to box the face positions, completing detection of the occluded face image.
2. The high-accuracy detection method for occluded face images according to claim 1, wherein in step 1 the training data set is the WiderFace public data set, and the preprocessing includes scaling all input images so that they do not occupy too much GPU memory.
3. The high-accuracy detection method for occluded face images according to claim 2, wherein in step 1 the WiderFace public data set contains a large number of pictures with facial occlusion; the occlusion field in the annotations indicates the degree of occlusion, with levels 0, 1 and 2 denoting no occlusion, slight occlusion and large-area occlusion respectively; 50% of the samples with occlusion level 0 are randomly selected; for each such picture a square background patch is cropped from a non-ground-truth region, with side length randomly chosen in [0.2, 0.8] times the side length of the largest GT box, and the cropped patch is pasted over part of a GT box to create artificial occlusion; before being fed into the network, all input images are resized so that the short side is no more than 600 pixels and the long side no more than 800 pixels.
4. The high-accuracy detection method for occluded face images according to claim 2 or 3, wherein the method of step 2 is: based on the WiderFace data set, a corresponding generation-target picture is made for each training picture and is used to compute the similarity (generation) loss.
5. The high-accuracy detection method for occluded face images according to claim 4, wherein in step 2 the pixel values outside the GT regions of each input image are set to zero, giving an image that contains only the face regions, which serves as the generation target of the enhancement branch.
6. The high-accuracy detection method for occluded face images according to claim 4, wherein step 3 comprises the following steps:
step 3.1, constructing the feature enhancement branch to obtain local features focused on the face, specifically:
step 3.1.1, constructing the feature enhancement branch, which consists of two networks: a convolutional network for feature screening and a deconvolutional network for picture generation;
step 3.1.2, passing the features output by the backbone feature extraction stage through 3 convolutional layers with 3 × 3 kernels, padding 1 and stride 1, giving 512-channel intermediate features at unchanged scale;
step 3.1.3, generating the target region from the intermediate features through a decoder module; specifically, under the adopted framework, 4 deconvolution (deconv) layers produce a 1-channel output image of the same size as the input;
step 3.1.4, computing the similarity loss between the generated image and the generation-target image produced in step 2, and adjusting the enhancement branch parameters based on this loss; the similarity loss is an L2 loss, computed as
Lsim = α·Lf + (1 − α)·Lnf
where Lsim is the generation loss, α is a parameter adjusting how much the face region contributes to the total loss, Lf is the loss over the face region and Lnf is the loss over the non-face region; Lf and Lnf use the same L2 form,
L = Σi (yi − yi*)²
where yi is a pixel value of the generated picture and yi* is the corresponding value of the target (label) picture;
step 3.2, constructing the parallel feature extraction network and fusing it with the backbone network to strengthen the extraction of face detail features, specifically:
step 3.2.1, constructing the parallel feature extraction network; like the backbone, the parallel network adopts the front convolutional modules of VGG16 and contains 5 convolution modules Conv1′, Conv2′, …, Conv5′: Conv1′ has two 3 × 3 convolutions with 64 channels and a max-pooling layer, its output feature map being 1/2 of the original size; Conv2′ has two 3 × 3 convolutions with 128 channels and a max-pooling layer, output 1/4 of the original size; Conv3′ has three 3 × 3 convolutions with 256 channels and a max-pooling layer, output 1/8 of the original size; Conv4′ has three 3 × 3 convolutions with 512 channels and a max-pooling layer, output 1/16 of the original size; Conv5′ has three 3 × 3 convolutions with 512 channels;
step 3.2.2, on the basis of the backbone structure, laterally connecting each convolution module to the backbone through a 1 × 1 convolution, the rest of the structure being identical to the backbone;
step 3.2.3, except for the first module, having each conv module fuse the feature map produced by the previous module with the corresponding backbone feature map before passing the result to the next module;
step 3.2.4, fusing the output feature map of the conv5_3′ layer with the output feature map of the backbone conv5_3 layer, the fused map serving as the input of the enhancement branch and the subsequent network.
7. The high-accuracy detection method for occluded face images according to claim 6, wherein step 4 comprises the following steps:
step 4.1, constructing the feature extraction backbone network; the backbone adopts the front convolutional modules of VGG16 and contains 5 convolution modules Conv1, Conv2, …, Conv5: Conv1 has two 3 × 3 convolutions with 64 channels and a max-pooling layer, its output feature map being 1/2 of the original size; Conv2 has two 3 × 3 convolutions with 128 channels and a max-pooling layer, output 1/4 of the original size; Conv3 has three 3 × 3 convolutions with 256 channels and a max-pooling layer, output 1/8 of the original size; Conv4 has three 3 × 3 convolutions with 512 channels and a max-pooling layer, output 1/16 of the original size; Conv5 has three 3 × 3 convolutions with 512 channels;
step 4.2, taking the fusion of the backbone conv5_3 output feature map with the conv5_3′ feature map of the parallel feature extraction network as the input of the enhancement branch and the subsequent network;
step 4.3, fusing the same-size features output by the enhancement branch and the parallel branch with the original conv5_3 features by element-wise (point) multiplication, increasing the weight of the visible face region in the classification features and strengthening the extraction of face detail features;
step 4.4, on the basis of the fused features, obtaining normalized proposal regions using the RPN module and the ROI module;
step 4.5, completing face/non-face binary classification and bounding-box fine-tuning of the proposal regions through the classification branch and the regression branch, where the classification-regression loss is computed as
L({pi}, {ti}) = (1/Ncls)·Σi Lcls(pi, pi*) + λ·(1/Nreg)·Σi pi*·Lreg(ti, ti*)
where Lcls is the classification loss, pi is the classification score, pi* is the anchor label (1 for positive anchors, 0 for negative anchors), and Lreg is the regression loss; multiplying by pi* means that bounding boxes are regressed only for anchors classified as foreground; ti is one of the predicted bounding-box parameters (x, y, w, h), and ti* is the ground-truth box parameter corresponding to an anchor marked positive.
8. The high-accuracy detection method for occluded face images according to claim 7, wherein step 5 comprises the following steps:
step 5.1, setting the enhancement branch loss function and the classification-regression loss function;
step 5.2, training the network end to end, updating the network parameters based on the joint loss, and obtaining the trained detection model.
9. The high-accuracy detection method for occluded face images according to claim 8, wherein in step 5.2 a VGG16 pre-trained model is used to initialize the network, and the parameters are trained with stochastic gradient descent with momentum and weight decay: momentum 0.8, weight decay 0.0005, 2 pictures per mini-batch, initial learning rate 0.001, decayed by a factor of 0.1 every 18000 steps.
CN202010156376.7A 2020-03-09 2020-03-09 High-accuracy detection and comparison method for face-shielding image Pending CN111310718A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010156376.7A CN111310718A (en) 2020-03-09 2020-03-09 High-accuracy detection and comparison method for face-shielding image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010156376.7A CN111310718A (en) 2020-03-09 2020-03-09 High-accuracy detection and comparison method for face-shielding image

Publications (1)

Publication Number Publication Date
CN111310718A true CN111310718A (en) 2020-06-19

Family

ID=71149579

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010156376.7A Pending CN111310718A (en) 2020-03-09 2020-03-09 High-accuracy detection and comparison method for face-shielding image

Country Status (1)

Country Link
CN (1) CN111310718A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108985181A (en) * 2018-06-22 2018-12-11 华中科技大学 A kind of end-to-end face mask method based on detection segmentation
CN109145854A (en) * 2018-08-31 2019-01-04 东南大学 A kind of method for detecting human face based on concatenated convolutional neural network structure
CN110399826A (en) * 2019-07-22 2019-11-01 清华大学深圳研究生院 A kind of end-to-end human face detection and recognition method
CN110728224A (en) * 2019-10-08 2020-01-24 西安电子科技大学 Remote sensing image classification method based on attention mechanism depth Contourlet network
CN110909690A (en) * 2019-11-26 2020-03-24 电子科技大学 Method for detecting occluded face image based on region generation

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860174A (en) * 2020-06-22 2020-10-30 西安工程大学 Method for detecting shielding face by fusing RepGT-RepBox function
CN111914665B (en) * 2020-07-07 2023-06-20 泰康保险集团股份有限公司 Face shielding detection method, device, equipment and storage medium
CN111914665A (en) * 2020-07-07 2020-11-10 泰康保险集团股份有限公司 Face shielding detection method, device, equipment and storage medium
CN112464701A (en) * 2020-08-26 2021-03-09 北京交通大学 Method for detecting whether people wear masks or not based on light weight characteristic fusion SSD
CN112417980A (en) * 2020-10-27 2021-02-26 南京邮电大学 Single-stage underwater biological target detection method based on feature enhancement and refinement
CN112232292A (en) * 2020-11-09 2021-01-15 泰康保险集团股份有限公司 Face detection method and device applied to mobile terminal
CN112232292B (en) * 2020-11-09 2023-12-26 泰康保险集团股份有限公司 Face detection method and device applied to mobile terminal
CN112270326A (en) * 2020-11-18 2021-01-26 珠海大横琴科技发展有限公司 Detection optimization method and device for ship sheltering and electronic equipment
CN112270326B (en) * 2020-11-18 2022-03-22 珠海大横琴科技发展有限公司 Detection optimization method and device for ship sheltering and electronic equipment
CN113723414A (en) * 2021-08-12 2021-11-30 中国科学院信息工程研究所 Mask face shelter segmentation method and device
CN113723414B (en) * 2021-08-12 2023-12-15 中国科学院信息工程研究所 Method and device for dividing mask face shielding object
CN114266946A (en) * 2021-12-31 2022-04-01 智慧眼科技股份有限公司 Feature identification method and device under shielding condition, computer equipment and medium
CN114998605A (en) * 2022-05-10 2022-09-02 北京科技大学 Target detection method for image enhancement guidance under severe imaging condition
CN115937906A (en) * 2023-02-16 2023-04-07 武汉图科智能科技有限公司 Occlusion scene pedestrian re-identification method based on occlusion inhibition and feature reconstruction
CN116883670A (en) * 2023-08-11 2023-10-13 智慧眼科技股份有限公司 Anti-shielding face image segmentation method
CN116883670B (en) * 2023-08-11 2024-05-14 智慧眼科技股份有限公司 Anti-shielding face image segmentation method
CN117275075A (en) * 2023-11-01 2023-12-22 浙江同花顺智能科技有限公司 Face shielding detection method, system, device and storage medium
CN117275075B (en) * 2023-11-01 2024-02-13 浙江同花顺智能科技有限公司 Face shielding detection method, system, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200619