CN115527254B - Face recognition and model training method and device, electronic equipment and storage medium - Google Patents

Face recognition and model training method and device, electronic equipment and storage medium

Info

Publication number
CN115527254B
CN115527254B
Authority
CN
China
Prior art keywords
mask
face
feature
face recognition
coverage area
Prior art date
Legal status (assumed; not a legal conclusion)
Active
Application number
CN202211154023.9A
Other languages
Chinese (zh)
Other versions
CN115527254A (en)
Inventor
魏梦
随海亮
赵欲苗
陈智超
户磊
Current Assignee (the listed assignee has not been legally verified)
Hefei Dilusense Technology Co Ltd
Original Assignee
Hefei Dilusense Technology Co Ltd
Priority date (assumed; not a legal conclusion)
Filing date
Publication date
Application filed by Hefei Dilusense Technology Co Ltd filed Critical Hefei Dilusense Technology Co Ltd
Priority to CN202211154023.9A
Publication of CN115527254A
Application granted
Publication of CN115527254B

Classifications

    • G06V40/161 Human faces: Detection; Localisation; Normalisation
    • G06V10/764 Recognition using machine-learning classification, e.g. of video objects
    • G06V10/765 Classification using rules for partitioning the feature space
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/82 Recognition using neural networks
    • G06V40/168 Human faces: Feature extraction; Face representation
    • G06V40/171 Human faces: Local features and components; Occluding parts, e.g. glasses
    • G06V40/172 Human faces: Classification, e.g. identification
    • Y02T10/40 Engine management systems (climate-change mitigation tag)


Abstract

An embodiment of the invention relates to the field of artificial intelligence and discloses a face recognition and model training method and device, an electronic device and a storage medium. The model training method comprises the following steps: acquiring a face feature map of an input image, wherein the input images comprise mask-wearing face images and non-mask-wearing face images; generating, according to the face feature map and the mask coverage area position information of the input image, a mask feature mask in which the features of the mask coverage area are set to zero, the mask coverage area position information being determined based on a face keypoint detection result; acquiring a feature vector of the input image according to the mask feature mask and the face feature map; and obtaining a loss value of the face recognition result according to the feature vector and a preset loss function, then updating the parameters of the face recognition model according to the loss value. When a mask-wearing face image is recognized, the generated mask feature mask reduces the contribution of the mask coverage area features to recognition, so that mask-wearing faces are recognized accurately.

Description

Face recognition and model training method and device, electronic equipment and storage medium
Technical Field
Embodiments of the invention relate to the field of artificial intelligence, and in particular to a face recognition and model training method and device, an electronic device and a storage medium.
Background
Face recognition systems in wide use today have excellent recognition capability in cooperative scenarios, but still face challenges in complex real-world applications. One important factor is facial occlusion: when part of the face is occluded, the visible facial information is reduced, and the recognition capability of a general-purpose system without a dedicated design drops sharply. In particular, as people have commonly worn masks when traveling in recent years, improving the recognition of mask-wearing faces has become a problem that urgently needs to be solved in the face recognition field.
Two kinds of solution are generally used for mask-wearing face recognition. One idea is to restore the occluded facial region with a face generation model and then recognize the restored face. This approach depends on the accuracy of the generation model; face generation is itself a difficult problem, and the generated face easily loses identity information, so recognition accuracy is insufficient and the algorithm is complex.
The other idea is to discard the features occluded and destroyed by the mask and perform recognition using only the effective features of the visible part of the face. One variant runs mask detection before face recognition, judges whether the facial region is occluded by a mask, and applies image segmentation to the occluded face. Another variant adds a feature mask generator to the face recognition model: several rectangular feature masks are defined in advance, feature mask labels are added to the training samples, and the generator and the recognition model are supervised and trained together. However, the feature mask is limited by the predefined mask types and can cover only a limited range of occlusion conditions, so face recognition is easily disturbed and the recognition accuracy is unstable; moreover, the feature mask generator and the recognition model must be trained both separately and jointly, which makes parameter tuning during training difficult.
Disclosure of Invention
Embodiments of the present application aim to provide a face recognition method, a model training method, a device, an electronic device and a storage medium that reduce the difficulty and cost of training a face recognition model, while enabling the trained model, when it receives a mask-wearing face image, to generate a mask feature mask that reduces the contribution of the mask coverage area features during recognition, so that mask-wearing face images are recognized accurately.
In order to solve the above technical problems, an embodiment of the present application provides a face recognition model training method, comprising: acquiring a face feature map of an input image, wherein the input images comprise mask-wearing face images and non-mask-wearing face images; generating a mask feature mask in which the mask coverage area features are set to zero, according to the face feature map and the mask coverage area position information of the input image, the mask coverage area position information being determined based on a face keypoint detection result; acquiring a feature vector of the input image according to the mask feature mask and the face feature map; and obtaining a loss value of the face recognition result according to the feature vector and a preset loss function, and updating the parameters of the face recognition model according to the loss value.
In order to solve the above technical problems, an embodiment of the present application further provides a face recognition method, comprising: obtaining a face recognition model trained according to the above face recognition model training method; and acquiring a face image to be recognized, taking it as the input of the face recognition model, and obtaining a face recognition result.
In order to solve the above technical problems, an embodiment of the present application further provides a face recognition model training device, comprising: an acquisition module configured to acquire a face feature map of an input image, wherein the input images comprise mask-wearing face images and non-mask-wearing face images; a generation module configured to generate a mask feature mask in which the mask coverage area features are set to zero, according to the face feature map and the mask coverage area position information of the input image, the mask coverage area position information being determined based on a face keypoint detection result; an extraction module configured to acquire a feature vector of the input image according to the mask feature mask and the face feature map; and a training module configured to obtain a loss value of the face recognition result according to the feature vector and a preset loss function, and to update the parameters of the face recognition model according to the loss value.
In order to solve the above technical problems, an embodiment of the present application further provides a face recognition device, comprising: an acquisition module configured to obtain a face recognition model trained according to the above face recognition model training method; and a recognition module configured to acquire a face image to be recognized, take it as the input of the face recognition model, and obtain a face recognition result.
In order to solve the above technical problem, an embodiment of the present application further provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the face recognition model training method, or face recognition method, described above.
To solve the above technical problem, the embodiments of the present application further provide a computer readable storage medium storing a computer program, where the computer program implements the face recognition model training method or the face recognition method when executed by a processor.
In the face recognition model training method provided by embodiments of the present application, features are first extracted from the input face images, with and without masks, to obtain face feature maps. A mask feature mask whose mask coverage area features are set to zero is then generated according to the face feature map and the mask coverage area position information determined from the face keypoint detection result, and the feature vector of the input image is obtained from the mask feature mask and the face feature map. Finally, a loss value of the recognition result is obtained according to the feature vector and a preset loss function, and the model parameters are updated based on the loss value. Because the mask feature mask is generated from position information derived from keypoint detection, no additional mask labels are needed for supervised training, which reduces the difficulty of parameter tuning and the cost of training. Because the feature vector used for training is obtained from the mask feature mask and the face feature map, the trained model, on receiving an image whose mask coverage area features are degraded, can directly generate a mask feature mask that makes those features contribute less to recognition, so mask-wearing faces are recognized accurately.
Drawings
One or more embodiments are illustrated by the figures of the accompanying drawings, in which like reference numerals denote similar elements; unless otherwise indicated, the figures are exemplary and not limiting.
Fig. 1 is a flowchart of a face recognition model training method provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of a key point detection result in an embodiment of the present application;
fig. 3 is a schematic view of an added simulated mask and mask coverage area in an embodiment of the present application;
FIG. 4 is a schematic diagram of a feature extraction network model structure in an embodiment of the present application;
FIG. 5 is a schematic diagram of the working principle of the model in the embodiment of the present application;
fig. 6 is a flowchart of a face recognition method according to another embodiment of the present application;
fig. 7 is a schematic structural diagram of a face recognition model training device according to another embodiment of the present application;
fig. 8 is a schematic structural diagram of a face recognition device according to another embodiment of the present application;
fig. 9 is a schematic structural diagram of an electronic device according to another embodiment of the present application.
Detailed Description
As described in the Background, in existing training of mask-wearing face recognition models based on mask feature masks, parameter tuning is difficult, supervised training is limited by the custom-defined mask types, the trained model's recognition is easily disturbed, and its accuracy is unstable. How to train, simply and efficiently, a face recognition model that recognizes mask-wearing face images efficiently and accurately is therefore an urgent problem.
In order to train, simply and efficiently, a face recognition model that recognizes mask-wearing face images efficiently and accurately, an embodiment of the invention provides a face recognition model training method, comprising: acquiring a face feature map of an input image, wherein the input images comprise mask-wearing face images and non-mask-wearing face images; generating a mask feature mask in which the mask coverage area features are set to zero, according to the face feature map and the mask coverage area position information of the input image, the mask coverage area position information being determined based on a face keypoint detection result; acquiring a feature vector of the input image according to the mask feature mask and the face feature map; and obtaining a loss value of the face recognition result according to the feature vector and a preset loss function, and updating the parameters of the face recognition model according to the loss value.
In this training method, features are first extracted from the input face images, with and without masks, to obtain face feature maps. A mask feature mask whose mask coverage area features are set to zero is then generated according to the face feature map and the mask coverage area position information determined from the keypoint detection result, and the feature vector of the input image is obtained from the mask feature mask and the face feature map. Finally, a loss value of the recognition result is obtained according to the feature vector and a preset loss function, and the model parameters are updated based on the loss value. Because the mask feature mask is generated from position information derived from keypoint detection, no additional mask labels are needed for supervised training, which reduces the difficulty of parameter tuning and the cost of training. Because the feature vector used for training is obtained from the mask feature mask and the face feature map, the trained model, on receiving an image whose mask coverage area features are degraded, can directly generate a mask feature mask that makes those features contribute less to recognition, so mask-wearing faces are recognized accurately.
For clarity of the objects, technical solutions and advantages of the embodiments of the invention, the embodiments are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that numerous technical details are set forth in the various embodiments to aid understanding of the present application; the claimed technical solutions can nevertheless be implemented without some of these details, and with various changes and modifications based on the following embodiments. The division into embodiments below is for convenience of description, should not be construed as limiting specific implementations, and the embodiments may be combined and cross-referenced where not contradictory.
Implementation details of the face recognition model training method of the present application are described below with reference to specific embodiments; the following details are provided to aid understanding and are not all necessary to implement this embodiment.
In a specific application, the face recognition model training method may run on any terminal capable of data interaction and computation, such as a computer, a mobile phone or another electronic device; this embodiment takes a computer as an example. The training process, with reference to fig. 1, includes the following steps:
step 101, acquiring a face feature map of an input image, wherein the input image comprises a face image of a wearer and a face image of a wearer without the wearer.
Specifically, during training, the computer obtains the face recognition model to be trained and the training samples used for training, namely a number of mask-wearing face images and non-mask-wearing face images, from a preset storage address or local storage. Each training sample is then fed, one by one, into the model to be trained, and the feature extraction network of the model extracts features from the input image to obtain its face feature map. The training samples may be generated by reading pre-stored processed images, or by obtaining unprocessed images and processing them in real time.
In one example, the non-mask-wearing face images used in training may be obtained as follows: detect face keypoints on a number of raw mask-free face images according to a preset face detection algorithm; then correct the keypoints of each image according to the detection results and generate the non-mask-wearing face images.
Specifically, a number of unprocessed mask-free face images are obtained in advance or in real time. For each image, the face region and the face keypoints, including the five positions of the left and right eyes, the nose tip and the left and right mouth corners, are detected according to the preset face detection algorithm, yielding a keypoint detection result that includes the face contour. The keypoints are then corrected according to this result, and the corrected image is used as a non-mask-wearing face image in the training samples. Correcting the keypoints against the detection result ensures the accuracy of the facial features in the input images used during training, improving training efficiency and effect.
It should be noted that after keypoint correction, several specific keypoints can be chosen from the corrected detection result according to the keypoints contained in a template, aligned with their counterparts in the template, and the image then normalized, producing a fixed-size face crop that serves as the non-mask-wearing face image. Processing the input images to a fixed size improves training efficiency, as the sketch below illustrates.
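The alignment just described can be illustrated with a short Python sketch. This is an illustrative assumption, not code from the patent: the five corrected keypoints are mapped onto fixed template positions with a similarity transform, producing the normalized fixed-size face crop. The template coordinates and the 112×112 crop size are assumed values for illustration.

    import cv2
    import numpy as np

    # Assumed template positions of the 5 keypoints in a 112x112 crop
    # (left eye, right eye, nose tip, left mouth corner, right mouth corner).
    TEMPLATE_5PTS = np.float32([
        [38.3, 51.7], [73.5, 51.5],
        [56.0, 71.7],
        [41.5, 92.4], [70.7, 92.2],
    ])

    def align_face(image, five_points, size=112):
        """Warp the image so the corrected 5 keypoints land on the template."""
        src = np.float32(five_points)
        # Similarity transform: rotation + uniform scale + translation.
        matrix, _ = cv2.estimateAffinePartial2D(src, TEMPLATE_5PTS)
        return cv2.warpAffine(image, matrix, (size, size))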
In another example, mask-wearing face images may be obtained as follows: select several of the non-mask-wearing face images; then add a simulated mask to each selected image according to a preset simulated-mask-adding algorithm, generating a number of mask-wearing face images (see the sketch after this example).
Specifically, because directly collecting mask-wearing face images is costly and difficult, several images can be selected directly from the generated non-mask-wearing face images. For each selected image, a template for the simulated mask is obtained according to the preset simulated-mask-adding algorithm, the simulated mask is added according to that template, and the position information of the simulated mask's coverage area is determined from the facial keypoint detection result and stored. Adding simulated masks in this way reduces the cost of obtaining mask-wearing face images.
It should be noted that when selecting non-mask-wearing images for simulated-mask addition, the selection may be random, at fixed intervals, or by image index, and the simulated mask may be added by three-dimensional modeling, AR simulation, or by a closed contour that delimits the simulated mask's coverage; this embodiment does not limit the specific manner of image selection or mask addition.
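One simple way to add a simulated mask is sketched below under stated assumptions: a polygon bounded by lower-face keypoints is filled over the image, and the same polygon is stored as the mask coverage area position information. The 68-point indices used (jawline points 2 to 14 plus a nose-bridge point) are an illustrative assumption, not the patent's selection rule; 3D modeling or AR simulation could be substituted.

    import cv2
    import numpy as np

    def add_simulated_mask(image, landmarks68, color=(200, 200, 200)):
        """Fill a lower-face polygon as a simulated mask; return image + coverage."""
        jaw = landmarks68[2:15]            # assumed lower jawline contour points
        nose = landmarks68[29:30]          # assumed point on the nose bridge
        polygon = np.int32(np.concatenate([jaw, nose]))
        masked = image.copy()
        cv2.fillPoly(masked, [polygon], color)   # draw the simulated mask
        coverage_info = polygon                  # keypoint positions = coverage area
        return masked, coverage_info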
In another example, the preset face detection algorithm is any one of the following detection algorithms: 68 keypoint detection algorithm, 98 keypoint detection algorithm, and 128 keypoint detection algorithm.
Specifically, during model training, a suitable face detection algorithm is chosen, according to the accuracy requirements on the trained model, to adjust the feature accuracy of the generated input images; this improves the applicability of the training method.
For example, when the input images are generated by detecting and correcting unprocessed images with the 68-keypoint detection algorithm, the face region and five keypoints, namely the left eye, right eye, nose tip, and left and right mouth corners, are detected, and contour keypoints covering the face outline, eyebrows, eyes, nose and mouth are obtained; the keypoint detection result may refer to fig. 2. The face image is then cropped as required, and the 5 keypoints are corrected against the 68 keypoints to obtain more accurate positions. The corrected 5 keypoints are aligned with the template's 5 keypoints, the image is normalized, and the resulting fixed-size face crop is used as the non-mask-wearing face image. Several of these images are then selected, simulated masks are added according to the preset algorithm to generate mask-wearing face images, and the position information of the simulated masks' coverage areas is determined from the keypoint detection results. The mask-wearing face image with a simulated mask and its coverage area may refer to fig. 3, where the region enclosed by the black contour marks the simulated mask's coverage.
Step 102: generate a mask feature mask in which the mask coverage area features are set to zero, according to the face feature map and the mask coverage area position information of the input image; the mask coverage area position information is determined based on the face keypoint detection result.
Specifically, after the face feature map of the input image is acquired, the mask coverage area position information of the input image is read. For a mask-wearing image this information is determined from the simulated mask boundary and the keypoint detection result; for a non-mask-wearing image it is zero or empty. A mask feature mask with the mask coverage area features set to zero is then generated according to the face feature map and this position information.
In one example, generating the mask feature mask from the face feature map and the mask coverage area position information includes: obtaining a weight for each feature value in the face feature map according to a preset function; and setting the weights of the feature values in the mask coverage area to zero according to the position information, thereby generating the mask feature mask.
Specifically, to generate the mask feature mask, a masker is added to the base face recognition model in advance. Its inputs are the face feature map extracted by the recognition model and the mask coverage area position information of the input image. After receiving the feature map, the masker applies a convolution layer whose channel count matches that of the feature map to compute a weight for every feature, then sets the weights inside the mask coverage area to zero according to the position information, for example position information derived from the 68-keypoint regions, producing the mask feature mask of the input feature map. Zeroing the weights by position ensures that the generated mask accurately reduces the contribution of mask coverage area features during recognition.
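A minimal PyTorch sketch of such a masker follows. It is one possible reading of the text, not the patent's reference implementation: a 1×1 convolution with matching channel count produces per-feature weights, a sigmoid (described later in this embodiment) normalizes them to [0, 1], and the weights inside the coverage area are zeroed.

    import torch
    import torch.nn as nn

    class Masker(nn.Module):
        def __init__(self, channels):
            super().__init__()
            # Convolution layer whose channel count matches the feature map.
            self.conv = nn.Conv2d(channels, channels, kernel_size=1)

        def forward(self, feat, coverage):
            # feat: (N, C, H, W); coverage: (N, 1, H, W), 1 inside the mask area.
            weights = torch.sigmoid(self.conv(feat))   # weights in [0, 1]
            weights = weights * (1.0 - coverage)       # zero the coverage area
            return weights                             # the mask feature mask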
In another example, the mask coverage area position information may be obtained as follows: select a group of face keypoints as the boundary of the mask coverage area, according to the keypoint detection result and a preset selection rule; then use the codes and positions of the selected keypoints as the mask coverage area position information.
Specifically, following the simulated-mask-adding algorithm, a group of face keypoints lying on the added mask's boundary is selected from the keypoint detection result, and the codes and positions of these keypoints are used as the coverage area position information. Selecting keypoints as the mask boundary yields a mask feature mask that better fits the face shape, without requiring extra labels for supervised training; this avoids both the cost of obtaining mask labels and the convergence difficulties that arise when the masker and the recognition model must be trained jointly with hard-to-fuse mask weight parameters.
It should be noted that the keypoints may be selected according to the scenario the simulated mask corresponds to, so that multiple coverage areas fitting different face shapes can be combined; this embodiment does not limit the specific keypoint selection.
In another example, generating the mask feature mask includes: obtaining the i-th-layer original face feature map output by the feature extraction network, where i is a positive integer; obtaining the weight of each feature value in that map according to a preset function; and setting the weights of the feature values in the mask coverage area to zero according to the position information, generating the i-th-layer mask feature mask. Acquiring the face feature map of the input image then includes: multiplying the i-th-layer original face feature map by the i-th-layer mask feature mask channel by channel, and taking the product as the i-th-layer face feature map.
Specifically, the face recognition model to be trained is built by adding maskers to a convolutional neural network that updates its parameters by back-propagating the loss gradient; for example, it can be based on a VGG-Net deep convolutional network, a ResNet residual network, a DenseNet, or the like. Since the feature extraction network contains multiple layers of feature extraction, a masker can be added after each layer's output: once the network outputs the i-th-layer original face feature map, the masker receives it, computes the weight of each feature value according to the preset function, and zeroes the weights in the mask coverage area according to the position information, generating the i-th-layer mask feature mask. The i-th-layer original face feature map is then multiplied by this mask channel by channel, and the product is used as the i-th-layer face feature map for subsequent processing. Each layer's output is handled in the same way until a one-dimensional face feature vector, processed by the mask feature masks, is finally produced. Adding a masker to each feature layer and using the product of its mask and the layer's original feature map as that layer's feature map avoids, as far as possible, interference from the degraded features of the mask-occluded region during feature extraction.
In another example, setting the feature values of the mask coverage area to zero according to the position information includes: obtaining the feature size of the i-th-layer original face feature map; scaling the mask coverage area position information to that feature size; and zeroing the weights of the feature values of the mask coverage area in the i-th-layer original face feature map according to the scaled position information.
Specifically, the feature maps output by the layers of the feature extraction network differ in size. When generating the i-th-layer mask feature mask, the feature size of the i-th-layer original face feature map is first obtained, the coverage area position information is scaled by the ratio of that feature size to the original size, and the weights of the coverage area's feature values in the i-th-layer map are then zeroed according to the scaled position information. Scaling the position information to each layer's feature size ensures the accuracy of mask generation, eliminating the influence of the mask region's degraded features while avoiding loss of features from the visible facial area.
For example, take a ResNet-50 model as the base of the face recognition model. Its feature extraction network contains 5 stages, and one masker is added to each of stages 2 to 5; a schematic of the network with maskers may refer to fig. 4, where the scaling of the coverage area position information is shown in picture form. The working principle of the model with maskers may refer to fig. 5. Taking the second stage as an example: the layer-2 original face feature map output by the network has size H×W×C, so the initial mask feature mask the masker generates through its convolution layer also has size H×W×C. The masker scales the coverage area position information from its original size to the current feature map size, then zeroes the weights of the coverage area's feature values in the initial mask, obtaining the layer-2 mask feature mask. This mask is multiplied channel by channel with the input layer-2 original feature map, and the product is taken as the layer-2 face feature map. Layers 3 to 5 are processed in the same way until a one-dimensional face feature vector processed by the masks is generated for subsequent recognition (see the sketch after this example).
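The per-stage wiring can be sketched as follows, assuming the Masker module sketched earlier and nearest-neighbor rescaling of the coverage map; this is an assumed implementation consistent with the description of figs. 4 and 5, not the patent's reference code.

    import torch.nn.functional as F

    def masked_stage(stage, masker, feat, coverage_full):
        """Run one ResNet stage, then apply its masker to the stage output."""
        feat = stage(feat)                              # i-th original face feature map
        coverage = F.interpolate(coverage_full,         # scale coverage to (H, W)
                                 size=feat.shape[2:], mode="nearest")
        mask = masker(feat, coverage)                   # i-th mask feature mask
        return feat * mask                              # multiply channel by channel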
In another example, obtaining the weights of the feature values in the face feature map according to a preset function includes: normalizing the feature value at each point of the mask feature mask into the range 0 to 1 according to the preset function.
Specifically, when generating the mask feature mask from the acquired face feature map, the masker first produces an initial mask through convolution, then normalizes each weight in it with a preset function, for example a sigmoid function, so that the feature values become continuous values in the interval [0, 1]. With continuous values in [0, 1], the trained model weakens the feature contribution of the mask-occluded region without removing its features entirely, which avoids information loss when the occluded region is estimated incorrectly. At the same time, as network depth grows and the mask operations are stacked repeatedly, the feature values of genuinely occluded regions become smaller and smaller, while regions mistakenly masked in earlier layers are not removed outright; this strengthens feature robustness.
And step 103, obtaining the feature vector of the input image according to the mask feature mask and the face feature map.
Specifically, the face recognition model to be trained obtains the face feature map and the generated mask feature mask layer by layer, multiplies each face feature map with its corresponding mask channel by channel, and takes the product as that layer's face feature map. Layer-by-layer feature extraction and mask processing finally yield the one-dimensional face feature vector corresponding to the input image.
And 104, acquiring a loss value of a face recognition result according to the feature vector and a preset loss function, and updating parameters of the face recognition model according to the loss value.
Specifically, after the feature vector of the input image is obtained, it is taken as the input of the recognition head, with the face identity corresponding to the input image as the supervision signal: given an input face image, the model predicts the identity from the feature vector, and the loss between the prediction and the supervision signal is computed with a preset loss function, yielding the loss value of the recognition result. The loss function may be a classification loss based on the Softmax function and its variants, or a loss based on metric learning. The model then performs a loss-based gradient update. Training and parameter updating proceed with a number of mask-wearing and non-mask-wearing face images as inputs until the loss value converges, at which point training is judged complete (a sketch of one training step follows).
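A hedged sketch of one training step under these definitions: CrossEntropyLoss stands in for the "Softmax function and its variants" family named above, and the split into a feature model and an identity classifier is an assumption for illustration.

    import torch.nn as nn

    criterion = nn.CrossEntropyLoss()

    def train_step(model, classifier, optimizer, images, coverage, labels):
        features = model(images, coverage)   # one-dimensional face feature vectors
        logits = classifier(features)        # identity predictions
        loss = criterion(logits, labels)     # loss vs. the identity supervision
        optimizer.zero_grad()
        loss.backward()                      # back-propagate the loss gradient
        optimizer.step()
        return loss.item()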
Mixed training on mask-wearing and non-mask-wearing face images until the loss converges means that, after the masker in the trained model receives a face feature map, it can automatically judge from the map whether the input is a mask-wearing image: for mask-wearing images it adaptively generates a mask feature mask with low weights in the mask-covered region, while for non-mask-wearing images it generates a mask whose weights are all 1. The face feature vector of the input image is then obtained according to the mask feature mask, and accurate recognition of both mask-wearing and non-mask-wearing face images is completed from that vector.
Another aspect of the embodiments of the present application provides a face recognition method, where a face recognition process may refer to fig. 6, and the face recognition method includes the following steps:
step 601, obtaining a face recognition model after training.
Specifically, a face recognition model is trained in real time according to the above face recognition model training method, or a model already trained by that method is read from a preset storage address.
Step 602, obtaining a face image to be recognized, taking the face image to be recognized as input of a face recognition model, and obtaining a face recognition result.
Specifically, after the trained face recognition model is obtained, the face image to be recognized is obtained via a communication interface or captured in real time and fed to the model as input, yielding the face recognition result.
Another aspect of the embodiments of the present application provides a face recognition model training device, referring to fig. 7, including:
the acquiring module 701 is configured to acquire a face feature map of an input image, where the input image includes a face image of a wearer and a face image of a non-wearer.
The generating module 702 is configured to generate a mask feature mask with a mask coverage area feature set to zero according to the face feature map and the mask coverage area location information of the input image; the mask coverage area position information is determined based on the face key point detection result.
The extracting module 703 is configured to obtain a feature vector of the input image according to the mask feature mask and the face feature map.
The training module 704 is configured to obtain a loss value of the face recognition result according to the feature vector and a preset loss function, and update parameters of the face recognition model according to the loss value.
It is readily seen that this embodiment is a device embodiment corresponding to the face recognition model training method embodiment, and the two may be implemented in cooperation. The related technical details mentioned in the method embodiment remain valid here and are not repeated in order to reduce repetition; correspondingly, the related technical details mentioned in this embodiment may also be applied to the method embodiment.
It should be noted that, each module involved in this embodiment is a logic module, and in practical application, one logic unit may be one physical unit, or may be a part of one physical unit, or may be implemented by a combination of multiple physical units. In addition, in order to highlight the innovative part of the present invention, units less closely related to solving the technical problem presented by the present invention are not introduced in the present embodiment, but it does not indicate that other units are not present in the present embodiment.
Another aspect of the embodiments of the present application provides a face recognition apparatus, referring to fig. 8, including:
an obtaining module 801, configured to obtain a trained face recognition model according to the face recognition model training method described above;
the recognition module 802 is configured to obtain a face image to be recognized, and take the face image to be recognized as input of a face recognition model to obtain a face recognition result.
It is to be noted that this embodiment is an embodiment of the apparatus corresponding to an embodiment of the face recognition method, and this embodiment may be implemented in cooperation with an embodiment of the face recognition method. The details of the related technologies mentioned in the embodiment of the face recognition method are still valid in this embodiment, and in order to reduce repetition, they are not repeated here. Accordingly, the related technical details mentioned in the present embodiment may also be applied to the face recognition method embodiment.
It should be noted that, each module involved in this embodiment is a logic module, and in practical application, one logic unit may be one physical unit, or may be a part of one physical unit, or may be implemented by a combination of multiple physical units. In addition, in order to highlight the innovative part of the present invention, units less closely related to solving the technical problem presented by the present invention are not introduced in the present embodiment, but it does not indicate that other units are not present in the present embodiment.
Another aspect of the embodiments of the present application further provides an electronic device, referring to fig. 9, including: at least one processor 901; and a memory 902 communicatively coupled to the at least one processor 901; the memory 902 stores instructions executable by the at least one processor 901, where the instructions are executable by the at least one processor 901 to enable the at least one processor 901 to perform the face recognition model training method, or face recognition method, described in any one of the method embodiments above.
Where the memory 902 and the processor 901 are connected by a bus, the bus may comprise any number of interconnected buses and bridges, the buses connecting the various circuits of the one or more processors 901 and the memory 902 together. The bus may also connect various other circuits such as peripherals, voltage regulators, and power management circuits, which are well known in the art, and therefore, will not be described any further herein. The bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or may be a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. The data processed by the processor 901 is transmitted over a wireless medium via an antenna, which further receives the data and transmits the data to the processor 901.
The processor 901 is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And memory 902 may be used to store data used by processor 901 in performing operations.
Another aspect of the embodiments of the present application also provides a computer-readable storage medium storing a computer program. The computer program implements the above-described method embodiments when executed by a processor.
That is, it will be understood by those skilled in the art that all or part of the steps in implementing the methods of the embodiments described above may be implemented by a program stored in a storage medium, where the program includes several instructions for causing a device (which may be a single-chip microcomputer, a chip or the like) or a processor (processor) to perform all or part of the steps in the methods of the embodiments described herein. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific embodiments in which the present application is implemented and that various changes in form and details may be made therein without departing from the spirit and scope of the present application.

Claims (12)

1. A face recognition model training method, characterized by comprising:
acquiring a face feature map of an input image, wherein the input images comprise mask-wearing face images and non-mask-wearing face images;
generating a mask feature mask with the mask coverage area feature set to zero according to the face feature map and the mask coverage area position information of the input image; the mask coverage area position information is determined based on a face key point detection result;
acquiring a feature vector of the input image according to the mask feature mask and the face feature map;
acquiring a loss value of a face recognition result according to the feature vector and a preset loss function, and updating parameters of the face recognition model according to the loss value;
the feature extraction network of the face recognition model comprises multi-layer feature extraction, and the generating of the mask feature mask with the mask coverage area feature set to zero comprises the following steps:
acquiring an original face feature map of an ith layer output by a feature extraction network; wherein i is a positive integer; acquiring weights corresponding to all feature values in the original face feature map of the ith layer according to a preset function; according to the position information of the mask coverage area, the weight of the characteristic value of the mask coverage area is set to zero, and an i-th mask characteristic mask is generated;
the step of obtaining the face feature map of the input image comprises the following steps: and obtaining a result of multiplication of the original face feature map of the ith layer and the mask feature mask of the ith layer according to the corresponding channel, and taking the multiplied result as the face feature map of the ith layer.
2. The face recognition model training method according to claim 1, wherein setting the weights of the feature values of the mask coverage area to zero according to the mask coverage area position information comprises:
acquiring the feature size of the original face feature map of the ith layer;
performing size scaling on the mask coverage area position information according to the feature size of the i-th layer original face feature map;
and setting the weight of the characteristic value of the mask coverage area in the original face characteristic diagram of the ith layer to zero according to the scaled mask coverage area position information.
3. The face recognition model training method according to claim 1, wherein the obtaining weights corresponding to the feature values in the face feature map according to a preset function includes:
and normalizing the characteristic values of each point in the mask characteristic mask to be in a range of 0 to 1 according to the preset function.
4. The face recognition model training method according to any one of claims 1 to 3, wherein the non-mask-wearing face images are obtained by:
detecting face keypoints on a plurality of mask-free face images according to a preset face detection algorithm;
correcting the face keypoints of each face image according to the face keypoint detection result, and generating a plurality of non-mask-wearing face images.
5. The face recognition model training method according to claim 4, wherein the mask-wearing face images are obtained by:
selecting a plurality of the non-mask-wearing face images;
adding a simulated mask to each selected non-mask-wearing face image according to a preset simulated-mask-adding algorithm, thereby generating a plurality of mask-wearing face images.
6. The face recognition model training method of claim 4, wherein the preset face detection algorithm is any one of the following detection algorithms: 68 keypoint detection algorithm, 98 keypoint detection algorithm, and 128 keypoint detection algorithm.
7. The face recognition model training method of claim 4, wherein the mask coverage area location information is obtained by:
selecting a group of face key points as boundaries of a mask coverage area according to the face key point detection result and a preset selection rule;
and taking the codes and the position information of the selected key points of the face as the position information of the covering area of the mask.
8. A face recognition method, comprising:
obtaining a trained face recognition model according to the face recognition model training method of any one of claims 1 to 7;
and acquiring a face image to be recognized, taking the face image to be recognized as the input of the face recognition model, and acquiring a face recognition result.
9. A face recognition model training device, comprising:
the device comprises an acquisition module, a display module and a display module, wherein the acquisition module is used for acquiring a face feature image of an input image, wherein the input image comprises a face image of a mask and a face image of a mask not to be worn;
a generating module, configured to generate a mask feature mask with the feature weights in the mask coverage area set to zero, according to the face feature map of the input image and the mask coverage area position information, wherein the mask coverage area position information is determined based on face key point detection results;
an extraction module, configured to obtain a feature vector of the input image according to the mask feature mask and the face feature map; and
a training module, configured to obtain a loss value of the face recognition result according to the feature vector and a preset loss function, and to update the parameters of the face recognition model according to the loss value;
wherein the feature extraction network of the face recognition model comprises multiple feature extraction layers, and the generating module is specifically configured to: acquire an i-th layer original face feature map output by the feature extraction network, wherein i is a positive integer; acquire weights corresponding to the feature values in the i-th layer original face feature map according to a preset function; and set the weights of the feature values in the mask coverage area to zero according to the mask coverage area position information, so as to generate an i-th layer mask feature mask;
and the acquisition module is specifically configured to: multiply the i-th layer original face feature map by the i-th layer mask feature mask channel by channel, and take the product as the i-th layer face feature map.
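One way the four modules of claim 9 could compose in a single training step, reusing `apply_mask_feature_mask` from the sketch under claim 1 and a plain cross-entropy as the preset loss (the patent names neither the loss nor these identifiers):

```python
import torch.nn as nn

def train_step(backbone, classifier, optimizer, images, labels, mask_boxes):
    """backbone: iterable of feature extraction layers (e.g. nn.ModuleList);
    mask_boxes: per-layer mask coverage boxes, already scaled to each
    layer's feature size (see the sketch under claim 2)."""
    feats = images
    for layer, box in zip(backbone, mask_boxes):
        feats = layer(feats)                         # acquisition: i-th layer map
        feats = apply_mask_feature_mask(feats, box)  # generating + extraction
    logits = classifier(feats.flatten(1))            # feature vector -> logits
    loss = nn.functional.cross_entropy(logits, labels)  # training module
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```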
10. A face recognition device, comprising:
an acquisition module, configured to obtain a trained face recognition model according to the face recognition model training method of any one of claims 1 to 7; and
a recognition module, configured to acquire a face image to be recognized, input the face image to be recognized into the trained face recognition model, and obtain a face recognition result.
11. An electronic device, comprising: at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the face recognition model training method of any one of claims 1 to 7 or the face recognition method of claim 8.
12. A computer readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the face recognition model training method of any one of claims 1 to 7, or the face recognition method of claim 8.
CN202211154023.9A 2022-09-21 2022-09-21 Face recognition and model training method and device, electronic equipment and storage medium Active CN115527254B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211154023.9A 2022-09-21 2022-09-21 Face recognition and model training method and device, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN115527254A (en) 2022-12-27
CN115527254B (en) 2023-06-20

Family

ID=84699408

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211154023.9A Active 2022-09-21 2022-09-21 Face recognition and model training method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115527254B (en)




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230404

Address after: 230091 room 611-217, R & D center building, China (Hefei) international intelligent voice Industrial Park, 3333 Xiyou Road, high tech Zone, Hefei, Anhui Province

Applicant after: Hefei lushenshi Technology Co.,Ltd.

Address before: 100083 room 3032, North B, bungalow, building 2, A5 Xueyuan Road, Haidian District, Beijing

Applicant before: BEIJING DILUSENSE TECHNOLOGY CO.,LTD.

Applicant before: Hefei lushenshi Technology Co.,Ltd.

GR01 Patent grant