CN111291700A - Face attribute identification method, device and equipment and readable storage medium - Google Patents

Face attribute identification method, device and equipment and readable storage medium

Info

Publication number
CN111291700A
CN111291700A CN202010104961.2A CN202010104961A CN 111291700 A
Authority
CN
China
Prior art keywords
region
face
feature map
regions
attribute recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010104961.2A
Other languages
Chinese (zh)
Inventor
赵文忠
毛晓蛟
章勇
曹李军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Keda Technology Co Ltd
Original Assignee
Suzhou Keda Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Keda Technology Co Ltd filed Critical Suzhou Keda Technology Co Ltd
Priority to CN202010104961.2A priority Critical patent/CN111291700A/en
Publication of CN111291700A publication Critical patent/CN111291700A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/171Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a face attribute recognition method, device, and equipment, and a readable storage medium, wherein the method comprises the following steps: a face attribute recognition network acquires a target face image and extracts a feature map of the target face image; the feature map is divided by using a region attention mechanism to obtain a feature map of each region; the feature map of each region is classified to obtain a classification result of each region; and the classification results of the regions are taken as the attribute recognition result of the target face image. In this method, multi-attribute recognition is performed on the target face image: with the target face image as input, the corresponding attribute recognition result can be obtained using only one face attribute recognition network, without setting up multiple networks and classification models, so that multi-attribute recognition of face images can be performed even on equipment with limited computing performance and storage resources.

Description

Face attribute identification method, device and equipment and readable storage medium
Technical Field
The invention relates to the technical field of computer vision, and in particular to a face attribute recognition method, device, and equipment, and a readable storage medium.
Background
Biological characteristics are intrinsic attributes of human beings; they are highly stable within an individual and differ across individuals, which makes them an ideal basis for identity authentication. The human face is a particularly important biological feature: it has a complex structure with many fine variations, and it carries a great deal of information such as sex, race, age, and expression. A normal adult can read this information from a face with ease.
A system analogous to a human can be built from an image acquisition device such as a camera and a computer, where the camera plays the role of the eyes and the computer that of the brain. Hardware alone, however, is not enough for a machine to understand face information; the system must also carry the ability to "think", that is, algorithms.
At present, algorithms that process face images discriminate attributes of different regions of the face, such as whether a mask, sunglasses, or a hat is worn. Existing recognition methods use a separate network structure and classification model for each attribute: the input of each network is the corresponding face region, and the output is whether the corresponding attribute is present. For example, to judge whether a mask is worn, the face image is input into the dedicated network, which outputs whether a mask is present. This approach has clear disadvantages: there are many networks and classification models, and different networks use different inputs. Because each attribute corresponds to its own network and classification model, judging multiple attributes of a face requires multiple networks and classification models, which makes network and model management complex, increases the amount of computation, occupies storage space, and hinders the deployment and use of the models.
In summary, how to perform face attribute recognition effectively is a technical problem that urgently needs to be solved by those skilled in the art.
Disclosure of Invention
The invention aims to provide a face attribute recognition method, device, and equipment, and a readable storage medium, which reduce the amount of computation and the storage space occupied, so that multi-attribute recognition of face images can be performed on equipment with lower computing capacity and smaller storage space.
In order to solve the technical problems, the invention provides the following technical scheme:
a face attribute identification method comprises the following steps:
a face attribute identification network acquires a target face image and extracts a feature map of the target face image;
segmenting the feature map by using a region attention mechanism to obtain a feature map of each region;
classifying the feature map of each region to obtain a classification result of each region;
and taking the classification results of the regions as the attribute recognition result of the target face image.
Preferably, the step of segmenting the feature map by using a region attention mechanism to obtain the feature map of each region includes:
and utilizing the region attention mechanism and combining mapping transformation parameters to cut the feature map so as to obtain the feature map of each region.
Preferably, while cropping the feature map by using the region attention mechanism in combination with affine transformation parameters, the method further includes:
rotating and/or translating the feature map.
Preferably, the face attribute recognition network includes:
a feature extraction network structure, a regional attention mechanism network structure, and at least one attribute classification branching network structure.
Preferably, the step of classifying the feature map of each region to obtain the classification result of each region includes:
inputting the feature maps of one or more regions into the corresponding attribute classification branch network structures for classification, so as to obtain the classification result of each region.
Preferably, the step of segmenting the feature map by using a region attention mechanism to obtain the feature map of each region includes:
the region attention mechanism network structure divides the feature map into region feature maps, each corresponding to an attribute; the attributes comprise at least one of whether glasses are worn, whether a mask is worn, and whether a hat is worn.
Preferably, the method further comprises the following steps:
and when the face attribute recognition network is trained, calculating a loss value corresponding to the face attribute recognition network by using the loss function corresponding to each attribute classification branch network structure.
A face attribute recognition apparatus, comprising:
a feature map acquisition unit, used for acquiring a target face image and extracting a feature map of the target face image;
a feature region segmentation unit, used for segmenting the feature map with a region attention mechanism so as to obtain a feature map of each region;
a classification and recognition unit, used for classifying the feature map of each region to obtain a classification result of each region;
and an attribute recognition result acquisition unit, used for taking the classification results of the regions as the attribute recognition result of the target face image.
A face attribute recognition device, comprising:
a memory for storing a computer program;
and a processor, used for implementing the steps of the above face attribute recognition method when executing the computer program.
A readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the above face attribute recognition method.
By applying the method provided by the embodiment of the invention, the face attribute recognition network acquires a target face image and extracts a feature map of the target face image; the feature map is divided by using a region attention mechanism to obtain a feature map of each region; the feature map of each region is classified to obtain a classification result of each region; and the classification results of the regions are taken as the attribute recognition result of the target face image.
Different face attributes correspond to different areas of the face image: whether a hat is worn corresponds to the upper half of the face image; whether a mask is worn corresponds to the lower half; whether glasses are worn corresponds to the middle. It is difficult to extract features for the attributes of these different regions with one and the same network structure, which is why the prior art uses multiple networks to extract the corresponding attribute features from the face image separately. In this method, to reduce the network scale and the classifier training difficulty of multi-attribute recognition of face images, after the feature map of a face image is extracted, a region attention mechanism divides the feature map into regions, so that only each region feature map needs to be classified; this reduces both the classification difficulty and the model training difficulty, and since the feature extraction step no longer needs multiple feature extraction network architectures, the network scale can be compressed. With the target face image as input, the corresponding attribute recognition result can be obtained using only one face attribute recognition network, without setting up multiple networks and classification models, so that multi-attribute recognition of face images can be performed even on equipment with limited computing performance and storage resources.
Correspondingly, the embodiment of the invention also provides a face attribute recognition device, equipment, and readable storage medium corresponding to the above face attribute recognition method; they have the same technical effects and are not described again here.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present invention; those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a flow chart of a method for identifying human face attributes according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a face attribute recognition network structure according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a regional attention network structure according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating a segmentation of a feature map of each region according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a face attribute recognition network according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a face attribute recognition apparatus according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a face attribute recognition device in an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a face attribute recognition device in an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the disclosure, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely some, not all, of the embodiments of the invention. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
Embodiment one:
referring to fig. 1, fig. 1 is a flowchart of a face attribute recognition method according to an embodiment of the present invention, where the method is applicable to a face attribute recognition network, and the face attribute recognition network may be deployed in a single computer or in a single computer system. The method comprises the following steps:
s101, acquiring a target face image by a face attribute recognition network, and extracting a feature map of the target face image.
In the embodiment of the present invention, a face attribute recognition network may be established, and the network may specifically include: a feature extraction network structure, a regional attention mechanism network structure, and at least one attribute classification branching network structure.
The feature extraction network structure is used for extracting features of the target face image; the region attention mechanism network structure implements the region attention mechanism; an attribute classification branch network structure is a branch that classifies and identifies one specific attribute, and can be regarded as a classifier. It should be noted that this embodiment does not limit the specific number of attribute classification branch network structures in the face attribute recognition network, nor which attributes they correspond to. For example, when it is necessary to identify whether a face image shows a hat or a mask, the attribute classification branch network structures may comprise a classifier 1 for judging whether a hat is worn and a classifier 2 for judging whether a mask is worn. That is, the number of attribute classification branch network structures and the attributes they judge may be configured according to the actual application scenario.
For example, referring to fig. 2, fig. 2 is a schematic diagram of a face attribute recognition network structure according to an embodiment of the present invention. Here, c1, c2, c3, and c4 form the feature extraction network structure; the region attention mechanism comprises one localization network; the feature map is divided into 3 region feature maps (each corresponding to one region), which are respectively input into 3 different attribute classification branches. The attribute classification branch for the hat comprises c51, c61, c71, and c81; the branch for the glasses comprises c52, c62, c72, and c82; the branch for the mask comprises c53, c63, c73, and c84.
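To make this concrete, the following is a minimal, purely illustrative Python sketch of the pipeline of fig. 2: one shared feature extraction backbone, one region attention split into three region feature maps, and one classification branch per attribute. Every function body here is a stand-in invented for illustration (in the real network these are stacks of convolutional and fully connected layers), not the patent's implementation:

```python
def extract_features(image):
    # Stand-in for the shared backbone (c1-c4): pass the "image"
    # (a 2D list of activations) through unchanged.
    return image

def region_attention(feature_map):
    # Stand-in for the region attention split: top / middle / bottom thirds,
    # one region per attribute, mirroring hat / glasses / mask.
    h = len(feature_map)
    third = h // 3
    return {
        "hat": feature_map[:third],               # upper region
        "glasses": feature_map[third:2 * third],  # middle region
        "mask": feature_map[2 * third:],          # lower region
    }

def classify(region_map):
    # Stand-in binary classifier branch: "attribute present" when the
    # region's mean activation exceeds 0.5.
    values = [v for row in region_map for v in row]
    return sum(values) / len(values) > 0.5

def recognize_attributes(image):
    # Steps S101-S104 in miniature: extract a feature map, split it by
    # region, classify each region with its own branch, and collect the
    # branch outputs as the attribute recognition result.
    feature_map = extract_features(image)
    regions = region_attention(feature_map)
    return {name: classify(fmap) for name, fmap in regions.items()}
```

Running `recognize_attributes` on a 6-row map whose top third is active reports the hat as present and the other attributes as absent; the point of the sketch is that a single forward pass yields all attribute results at once.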
The target face image may be any face image that requires multi-attribute recognition: for example, a face image cropped from a video frame acquired by a video monitoring device, any face image in a face image database, or a face image retrieved over a network.
After the face attribute recognition network acquires the target face image, it performs feature extraction on the target face image to obtain the corresponding feature map.
And S102, segmenting the feature map by using a region attention mechanism to obtain the feature map of each region.
The attention mechanism stems from the study of human vision. In cognitive science, because of bottlenecks in information processing, humans selectively focus on a portion of all available information while ignoring the rest; this mechanism is commonly referred to as attention. A region attention mechanism focuses attention on the features of a certain region of the feature map, so that the feature map can be divided to obtain the feature map of each region. Note that the same feature map may be divided into several region feature maps.
Specifically, the feature map is cropped by using the attention mechanism in combination with mapping transformation parameters to obtain the feature map of each region. Preferably, while the feature map is cropped using the attention mechanism in combination with affine transformation parameters, it may also be rotated and/or translated (that is, only rotated, only translated, or both rotated and translated), so as to obtain region feature maps that are convenient for attribute judgment.
That is, the region attention mechanism network structure divides the feature map into region feature maps, each corresponding to an attribute; the attributes include at least one of whether glasses are worn, whether a mask is worn, and whether a hat is worn. In practical applications, the local attributes may further include whether the eyebrows are drawn, the eyelid type, pupil color, and the like.
For convenience of description, the following takes the three attributes of mask, sunglasses, and hat as an example to describe how the feature map is divided by the region attention mechanism to obtain the feature map of each region.
Specifically, the region attention mechanism divides the face feature map into three regions through an affine transformation of the image, each region corresponding to one attribute: the mask, the sunglasses, and the hat. This makes it convenient for each attribute classification branch to extract, with a deep neural network, the features corresponding to its face attribute from the feature map of the corresponding region, and finally to complete the judgment of the mask, sunglasses, and hat attributes. Referring to fig. 3, fig. 3 is a schematic diagram of a region attention network structure according to an embodiment of the present invention; as can be seen, the region attention mechanism (i.e., the region attention mechanism network structure) comprises:
and one local network, which may be a simple lightweight regression network, is used to perform two convolution operations on the input feature map, and then one complete connection regresses feature vectors (θ 1, θ 2, θ 3) of 3 segmentation regions. Each theta has 4 characteristic values which respectively correspond to the parameters of cutting and translation, namely the region transformation (image affine transformation) of the image, so that the cutting and translation of the face characteristic diagram are realized.
xd = θ0·xs + θ1
yd = θ2·ys + θ3
In the above formulas, (xs, ys) is a pixel point on the original feature map, and (xd, yd) denotes the corresponding pixel point of the feature map after the affine transformation. As shown in fig. 4, fig. 4 is a schematic diagram of the segmentation of the feature map of each region according to an embodiment of the present invention. The feature map can be divided based on the feature vectors of the segmentation regions, obtaining the feature map of each corresponding region.
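As an illustration only, the affine cropping formulas above can be sketched as a nearest-neighbor sampler: each regressed θ is assumed to hold 4 values (θ0, θ1, θ2, θ3) acting on coordinates normalized to [0, 1]. The function name and the sampling scheme are assumptions for this sketch, not the patent's implementation:

```python
def affine_crop(feature_map, theta, out_h, out_w):
    """Sample an out_h x out_w region from a 2D feature map (list of lists)."""
    t0, t1, t2, t3 = theta  # crop (scale) and translation parameters
    h, w = len(feature_map), len(feature_map[0])
    region = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Normalized coordinates of this output pixel.
            ys = i / max(out_h - 1, 1)
            xs = j / max(out_w - 1, 1)
            # The affine transformation from the formulas above:
            # xd = theta0*xs + theta1 ; yd = theta2*ys + theta3
            xd = t0 * xs + t1
            yd = t2 * ys + t3
            # Nearest-neighbor lookup in the source feature map.
            si = min(int(round(yd * (h - 1))), h - 1)
            sj = min(int(round(xd * (w - 1))), w - 1)
            row.append(feature_map[si][sj])
        region.append(row)
    return region
```

On a 4 x 4 map, θ = (1, 0, 1/3, 0) samples the top half and θ = (1, 0, 1/3, 2/3) the bottom half: θ0 and θ2 crop (scale) while θ1 and θ3 translate, matching the description of the parameters above.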
That is, by using the region attention mechanism network structure to divide the feature map based on the affine transformation parameters, region feature maps corresponding only to the regions of interest can be obtained.
S103, classifying the feature maps of the regions to obtain classification results of the regions.
After the feature maps of the regions are obtained, classification can be performed on each of them to obtain the classification result of the corresponding region.
Specifically, the feature map of each region may be input into the corresponding attribute classification branch network structure for classification, yielding the classification result of that region. For example, when the attribute of whether a hat is worn needs to be identified, the region feature map segmented from the upper half of the face image, where a hat would appear, may be input into the attribute classification branch network structure that judges whether a hat is worn.
And S104, taking the classification result of each region as an attribute identification result of the target face image.
After obtaining the classification result of each region, the classification result of each region can be used as the attribute recognition result of the target face image.
It should be noted that in this embodiment, before multi-attribute recognition is performed on a target face image by the face attribute recognition network, that is, before the face attributes are structured, the loss value of the face attribute recognition network is calculated during training by using the loss function corresponding to each attribute classification branch network structure. For convenience of description, the training process of the face attribute recognition network shown in fig. 5 is described in detail below. In fig. 5, each box represents a neural network layer; conv: a convolutional network layer; pooling: a pooling network layer; fc (fully connected layer): the fully connected network layer, which plays the role of a classifier in the whole convolutional neural network; global average pooling: the global average pooling network layer; 2 x 2, 3 x 3, 5 x 5: kernel sizes of 2 x 2, 3 x 3, and 5 x 5; 64, 128: the number of convolution kernels; /2: a stride of 2.
Each attribute classification branch network structure uses an independent loss function during training; since each branch is only a two-class problem (such as whether a hat is worn, whether sunglasses are worn, or whether a mask is worn), a softmax loss can simply be used. For the whole network, the loss may be obtained by summing or averaging the three branch losses, or by a weighted sum or weighted average (for example, giving greater weight to attributes of particular interest). The region attention network structure does not need a separate loss function of its own; it is optimized as part of the overall network.
The training process inputs a face image with a known correct attribute recognition result into the network shown in fig. 5, obtains a predicted attribute recognition result through the processing of steps S101 to S104, calculates a loss value by applying the loss function to the predicted and correct attribute recognition results, and decides based on the loss value whether to continue or finish training. For how to adjust the network parameters based on the loss value, reference may be made to a common recognition network training process, which is not detailed here. The condition for completing training may be set based on the loss value, for example the loss value falling below a certain threshold, or the training duration or number of training rounds reaching a preset upper limit.
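As an illustrative sketch only (not the patent's code), the per-branch two-class softmax losses and their combination into one network loss by a plain or weighted sum can be written as:

```python
import math

def softmax_cross_entropy(logits, label):
    # Two-class softmax loss for one attribute branch:
    # -log(softmax(logits)[label]), with the max subtracted for stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    return -math.log(exps[label] / sum(exps))

def combined_loss(branch_logits, branch_labels, weights=None):
    # Whole-network loss: weighted sum of the per-branch losses.
    # Equal weights give the simple sum; attributes of particular
    # interest can be given greater weight.
    if weights is None:
        weights = [1.0] * len(branch_logits)
    return sum(w * softmax_cross_entropy(lg, lb)
               for w, lg, lb in zip(weights, branch_logits, branch_labels))
```

With three branches (hat, sunglasses, mask), `combined_loss` receives three pairs of two-class logits and labels; consistent with the scheme described above, the region attention network gets no loss of its own and is updated through this single combined objective.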
By applying the method provided by the embodiment of the invention, the face attribute recognition network acquires a target face image and extracts a feature map of the target face image; the feature map is divided by using a region attention mechanism to obtain a feature map of each region; the feature map of each region is classified to obtain a classification result of each region; and the classification results of the regions are taken as the attribute recognition result of the target face image.
Different face attributes correspond to different areas of the face image: whether a hat is worn corresponds to the upper half of the face image; whether a mask is worn corresponds to the lower half; whether glasses are worn corresponds to the middle. It is difficult to extract features for the attributes of these different regions with one and the same network structure (which is why the prior art uses multiple networks to extract the corresponding attribute features separately). In this method, to reduce the network scale and the classifier training difficulty of multi-attribute recognition of face images, after the feature map of a face image is extracted, a region attention mechanism divides the feature map into regions, so that only each region feature map needs to be classified; this reduces both the classification difficulty and the model training difficulty, and since the feature extraction step no longer needs multiple feature extraction network architectures, the network scale can be compressed. With the target face image as input, the corresponding attribute recognition result can be obtained using only one face attribute recognition network, without setting up multiple networks and classification models, so that multi-attribute recognition of face images can be performed even on equipment with limited computing performance and storage resources.
Embodiment two:
Corresponding to the above method embodiment, the embodiment of the present invention further provides a face attribute recognition apparatus; the apparatus described below and the method described above may be referred to against each other.
Referring to fig. 6, the apparatus includes the following modules:
a feature map acquisition unit 101, configured to acquire a target face image and extract a feature map of the target face image;
a feature region segmentation unit 102, configured to segment a feature map by using a region attention mechanism to obtain a feature map of each region;
a classification identification unit 103, configured to classify the feature maps of the regions to obtain a classification result of each region;
and an attribute recognition result acquisition unit 104, configured to take the classification result of each region as an attribute recognition result of the target face image.
With the apparatus provided by this embodiment of the invention, the face attribute recognition network acquires a target face image and extracts a feature map of the target face image; the feature map is divided with a region attention mechanism to obtain a feature map of each region; the region feature maps are classified to obtain a classification result for each region; and the classification results of the regions are taken as the attribute recognition result of the target face image.
Different face attributes correspond to different areas of the face image: whether a hat is worn corresponds to the upper half of the face image, whether a mask is worn corresponds to the lower half, and whether glasses are worn corresponds to the middle part. It is difficult to extract features for attributes in different areas with one and the same network structure (in the prior art, for example, a plurality of networks are used to extract the corresponding attribute features separately). In this apparatus, in order to reduce the network scale and the classifier training difficulty required for multi-attribute recognition of face images, after the feature map of a face image is extracted, a region attention mechanism is adopted to divide the feature map into regions, so that only each region feature map needs to be classified. This reduces both the classification difficulty and the model training difficulty, and since no plurality of feature extraction network architectures is needed in the feature extraction stage, the network scale can be compressed. When multi-attribute recognition is performed on a target face image, the target face image is taken as input and a corresponding attribute recognition result can be obtained with only one face attribute recognition network; a plurality of networks and classification models need not be provided, so multi-attribute recognition of face images can be performed even on devices with limited computing performance and storage resources.
In an embodiment of the present invention, the feature region segmentation unit 102 is specifically configured to crop the feature map by using the region attention mechanism in combination with mapping transformation parameters, so as to obtain the feature map of each region.
In an embodiment of the present invention, the feature region segmentation unit 102 is further configured to rotate and/or translate the feature map while cropping the feature map by using the region attention mechanism in combination with affine transformation parameters.
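Cropping with affine transformation parameters, as unit 102 describes, can rotate, translate, and scale in a single sampling step. A minimal nearest-neighbour sketch is shown below; the 2×3 matrix layout (mapping normalized output coordinates in [-1, 1] to normalized input coordinates) and the output size are illustrative assumptions:

```python
import numpy as np

def affine_crop(feature_map, theta, out_h, out_w):
    """Sample an out_h x out_w region from feature_map (C x H x W) using a
    2x3 affine matrix theta that maps normalized output coordinates in
    [-1, 1] to normalized input coordinates (rotation, translation, and
    scaling are all expressed by theta)."""
    C, H, W = feature_map.shape
    out = np.zeros((C, out_h, out_w), dtype=feature_map.dtype)
    ys = np.linspace(-1.0, 1.0, out_h)
    xs = np.linspace(-1.0, 1.0, out_w)
    for i, y in enumerate(ys):
        for j, x in enumerate(xs):
            # Apply the affine map to the normalized target coordinate.
            sx = theta[0, 0] * x + theta[0, 1] * y + theta[0, 2]
            sy = theta[1, 0] * x + theta[1, 1] * y + theta[1, 2]
            # Back to pixel indices, nearest-neighbour rounding.
            px = int(round((sx + 1) / 2 * (W - 1)))
            py = int(round((sy + 1) / 2 * (H - 1)))
            if 0 <= px < W and 0 <= py < H:
                out[:, i, j] = feature_map[:, py, px]
    return out

# Crop the upper half (e.g. a hat region): scale y by 0.5, shift up by 0.5.
fmap = np.arange(2 * 8 * 8, dtype=float).reshape(2, 8, 8)
theta = np.array([[1.0, 0.0, 0.0],
                  [0.0, 0.5, -0.5]])
upper = affine_crop(fmap, theta, 4, 8)
print(upper.shape)  # (2, 4, 8)
```

The same sampler handles a rotated or translated region by changing only `theta`, which is why a single parameterized mechanism can serve all attribute regions.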
In a specific embodiment of the present invention, a face attribute recognition network may be deployed in the apparatus, where the face attribute recognition network includes: a feature extraction network structure, a region attention mechanism network structure, and at least one attribute classification branch network structure.
In an embodiment of the present invention, the classification identification unit 103 is specifically configured to input each region feature map into the corresponding attribute classification branch network structure for classification, so as to obtain the classification result of each region.
In an embodiment of the present invention, the feature region segmentation unit 102 is specifically configured to divide the feature map, by using the region attention mechanism network structure, into region feature maps respectively corresponding to the attributes; the attributes include at least one local attribute among whether glasses are worn, whether a mask is worn, and whether a hat is worn.
In an embodiment of the present invention, the apparatus further comprises:
a face attribute recognition network training module, configured to calculate, when the face attribute recognition network is trained, the loss value of the face attribute recognition network by using the loss functions respectively corresponding to the attribute classification branch network structures.
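Training with one loss function per attribute branch, as the training module describes, can combine the per-branch losses into a single network loss. The binary cross-entropy form and the equal branch weights below are assumptions for illustration; the patent does not fix a particular loss:

```python
import math

def binary_cross_entropy(p, label):
    """Loss of one attribute branch: predicted probability p vs. a 0/1 label."""
    eps = 1e-7
    p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

# One (predicted probability, ground-truth label) pair per attribute branch.
branch_outputs = {"hat": (0.9, 1), "glasses": (0.2, 0), "mask": (0.6, 1)}

# Total network loss: sum of the per-branch losses (equal weights assumed),
# so one backward pass trains all branches and the shared backbone jointly.
total_loss = sum(binary_cross_entropy(p, y) for p, y in branch_outputs.values())
print(round(total_loss, 4))
```

Because all branches share the feature extraction network, a single summed loss lets every attribute supervise the shared features at once.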
Example three:
corresponding to the above method embodiment, an embodiment of the present invention further provides a face attribute recognition device; the face attribute recognition device described below and the face attribute recognition method described above may be referred to in correspondence with each other.
Referring to fig. 7, the face attribute recognition apparatus includes:
a memory D1 for storing computer programs;
and a processor D2, configured to implement the steps of the face attribute identification method of the above-mentioned method embodiment when executing the computer program.
Specifically, referring to fig. 8, which is a schematic diagram of a specific structure of the face attribute recognition device provided in this embodiment, the device may differ considerably in configuration or performance, and may include one or more central processing units (CPUs) 322 and memory 332, as well as one or more storage media 330 (e.g., one or more mass storage devices) storing applications 342 or data 344. The memory 332 and the storage media 330 may provide transient or persistent storage. The program stored on a storage medium 330 may include one or more modules (not shown), each of which may include a series of instruction operations on a data processing device. Further, the central processor 322 may be configured to communicate with the storage medium 330 and execute the series of instruction operations in the storage medium 330 on the face attribute recognition device 301.
The face attribute recognition device 301 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input/output interfaces 358, and/or one or more operating systems 341, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
The steps of the face attribute recognition method described above may be implemented by the structure of the face attribute recognition device.
Example four:
corresponding to the above method embodiment, an embodiment of the present invention further provides a readable storage medium; the readable storage medium described below and the face attribute recognition method described above may be referred to in correspondence with each other.
A readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the face attribute identification method of the above-mentioned method embodiment.
The readable storage medium may be a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any of various other readable storage media capable of storing program code.
Those of skill would further appreciate that the various illustrative units and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of both. To clearly illustrate this interchangeability of hardware and software, the various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

Claims (10)

1. A face attribute recognition method is characterized by comprising the following steps:
a face attribute recognition network acquires a target face image and extracts a feature map of the target face image;
segmenting the feature map by using a region attention mechanism to obtain feature maps of all regions;
classifying the feature maps of the regions to obtain classification results of the regions;
and taking the classification result of each region as the attribute recognition result of the target face image.
2. The method according to claim 1, wherein the step of segmenting the feature map by using a region attention mechanism to obtain the feature map of each region comprises:
and cropping the feature map by using the region attention mechanism in combination with mapping transformation parameters, so as to obtain the feature map of each region.
3. The face attribute recognition method according to claim 2, wherein, while clipping the feature map by using the region attention mechanism in combination with affine transformation parameters, the method further comprises:
rotating and/or translating the feature map.
4. The face attribute recognition method according to any one of claims 1 to 3, wherein the face attribute recognition network includes:
a feature extraction network structure, a region attention mechanism network structure, and at least one attribute classification branch network structure.
5. The method for identifying human face attributes according to claim 4, wherein the step of classifying the feature maps of the regions to obtain the classification result of the regions comprises:
and respectively inputting each region feature map into the corresponding attribute classification branch network structure for classification, so as to obtain the classification result of each region.
6. The method according to claim 4, wherein the step of segmenting the feature map by using a region attention mechanism to obtain the feature map of each region comprises:
the region attention mechanism network structure divides the feature map into region feature maps respectively corresponding to the attributes; the attributes comprise at least one of whether glasses are worn, whether a mask is worn, and whether a hat is worn.
7. The face attribute recognition method of claim 4, further comprising:
and when the face attribute recognition network is trained, calculating a loss value corresponding to the face attribute recognition network by using the loss function corresponding to each attribute classification branch network structure.
8. A face attribute recognition apparatus, comprising:
a feature map acquisition unit, configured to acquire a target face image and extract a feature map of the target face image;
a feature region segmentation unit, configured to segment the feature map by using a region attention mechanism network structure, so as to obtain a feature map of each region;
a classification identification unit, configured to classify the feature maps of the regions to obtain a classification result of each region;
and an attribute recognition result acquisition unit, configured to take the classification result of each region as the attribute recognition result of the target face image.
9. A face attribute recognition apparatus, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the face attribute recognition method according to any one of claims 1 to 7 when executing the computer program.
10. A readable storage medium, characterized in that the readable storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps of the face attribute recognition method according to any one of claims 1 to 7.
CN202010104961.2A 2020-02-20 2020-02-20 Face attribute identification method, device and equipment and readable storage medium Pending CN111291700A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010104961.2A CN111291700A (en) 2020-02-20 2020-02-20 Face attribute identification method, device and equipment and readable storage medium


Publications (1)

Publication Number Publication Date
CN111291700A true CN111291700A (en) 2020-06-16

Family

ID=71021474

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010104961.2A Pending CN111291700A (en) 2020-02-20 2020-02-20 Face attribute identification method, device and equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN111291700A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112084953A (en) * 2020-09-10 2020-12-15 济南博观智能科技有限公司 Method, system and equipment for identifying face attributes and readable storage medium
CN112329867A (en) * 2020-11-10 2021-02-05 宁波大学 MRI image classification method based on task-driven hierarchical attention network
CN112764649A (en) * 2021-01-29 2021-05-07 北京字节跳动网络技术有限公司 Method, device and equipment for generating virtual image and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110135389A (en) * 2019-05-24 2019-08-16 北京探境科技有限公司 Face character recognition methods and device
CN110349135A (en) * 2019-06-27 2019-10-18 歌尔股份有限公司 Object detection method and device
CN110414428A (en) * 2019-07-26 2019-11-05 厦门美图之家科技有限公司 A method of generating face character information identification model




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200616)