CN111401294B - Multi-task face attribute classification method and system based on adaptive feature fusion - Google Patents

Multi-task face attribute classification method and system based on adaptive feature fusion

Info

Publication number
CN111401294B
Authority
CN
China
Prior art keywords
fusion
adaptive feature
feature fusion
face
self
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010228805.7A
Other languages
Chinese (zh)
Other versions
CN111401294A (en)
Inventor
崔超然
申朕
黄瑾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University of Finance and Economics
Original Assignee
Shandong University of Finance and Economics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University of Finance and Economics filed Critical Shandong University of Finance and Economics
Priority to CN202010228805.7A
Publication of CN111401294A
Application granted
Publication of CN111401294B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/178Human faces, e.g. facial parts, sketches or expressions estimating age from face image; using age information for improving recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/179Human faces, e.g. facial parts, sketches or expressions metadata assisted face recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-task face attribute classification method and system based on adaptive feature fusion. The method comprises the following steps: acquiring a face image to be classified; preprocessing the face image to be classified; and inputting the preprocessed face image into a multi-task face attribute classification model based on adaptive feature fusion to obtain, for each face attribute, the probability of the image belonging to each class, and selecting the class with the maximum probability as the classification result for the corresponding attribute. By constructing adaptive feature fusion layers, the network branches of different tasks are connected into a unified multi-task deep convolutional neural network, so that information can be effectively shared among the different tasks and the classification accuracy is remarkably improved.

Description

Multi-task face attribute classification method and system based on adaptive feature fusion
Technical Field
The disclosure relates to the technical field of computer vision and machine learning, in particular to a multitask face attribute classification method and system based on adaptive feature fusion.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
In recent years, deep convolutional neural networks have achieved breakthroughs in many computer vision tasks, such as object detection, semantic segmentation, and depth prediction. Multi-task deep convolutional neural networks aim to handle several related tasks jointly, which improves learning efficiency while also improving prediction accuracy and generalization through feature interaction between tasks, helping to prevent overfitting.
When implementing a multi-task deep convolutional neural network, the most common scheme is to construct a network architecture based on hard parameter sharing. In this scheme, different tasks share the lower network layers and maintain separate branches at the higher layers. The shared layers must be specified manually, based on experience, before training. This approach lacks theoretical guidance, and an unreasonable choice of shared layers can severely degrade performance.
In view of this, many researchers have proposed automatically building shared network layers by learning the optimal feature combinations for different tasks at a given network layer, thereby avoiding the costly enumeration and repeated model training required by hard parameter sharing.
For example, in the Cross-Stitch method (see Ishan Misra, Abhinav Shrivastava, Abhinav Gupta, and Martial Hebert. Cross-stitch networks for multi-task learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3994-4003, 2016), researchers learn linear combinations of the feature maps from different tasks through learnable cross-stitch units; in the NDDR method (see Yuan Gao, Jiayi Ma, Mingbo Zhao, Wei Liu, and Alan L. Yuille. NDDR-CNN: Layerwise feature fusing in multi-task CNNs by neural discriminative dimensionality reduction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3205-3214, 2019), researchers stack the feature maps from different tasks along the channel dimension and reduce them with a 1 × 1 convolution to meet the channel-size requirements of the subsequent network branches.
In the course of implementing the present disclosure, the inventors found that the following technical problems exist in the prior art:
Although the above works have been shown experimentally to achieve better performance, they essentially all learn a fixed feature fusion strategy: after training is complete, all input samples share the same set of feature fusion weights, so the fused features cannot adequately express the characteristics of each individual image.
Disclosure of Invention
To overcome the shortcomings of the prior art, the present disclosure provides a multi-task face attribute classification method and system based on adaptive feature fusion. In multi-task face attribute classification, for some samples the features that need to be fused across tasks may be very similar, while for other samples the features may be very different or even complementary to each other. Therefore, when fusing features for multi-task learning, the characteristics of the features to be fused should be fully considered. Motivated by this observation, the present disclosure introduces a dynamic feature fusion mechanism into the design of the multi-task deep convolutional neural network, adaptively fusing features according to the dependencies among them so as to realize feature sharing and interaction between tasks.
In a first aspect, the present disclosure provides a multi-task face attribute classification method based on adaptive feature fusion.
The multi-task face attribute classification method based on adaptive feature fusion comprises the following steps:
acquiring a face image to be classified;
preprocessing the face image to be classified;
inputting the preprocessed face image to be classified into a multi-task face attribute classification model based on adaptive feature fusion to obtain, for each face attribute, the probability of the image belonging to each class, and selecting the class with the maximum probability as the classification result for the corresponding attribute.
In a second aspect, the present disclosure provides a multi-task face attribute classification system based on adaptive feature fusion.
The multi-task face attribute classification system based on adaptive feature fusion comprises:
an acquisition module configured to: acquire a face image to be classified;
a preprocessing module configured to: preprocess the face image to be classified;
a classification module configured to: input the preprocessed face image to be classified into a multi-task face attribute classification model based on adaptive feature fusion to obtain, for each face attribute, the probability of the image belonging to each class, and select the class with the maximum probability as the classification result for the corresponding attribute.
In a third aspect, the present disclosure also provides an electronic device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, wherein the computer instructions, when executed by the processor, perform the method of the first aspect.
In a fourth aspect, the present disclosure also provides a computer-readable storage medium for storing computer instructions which, when executed by a processor, perform the method of the first aspect.
Compared with the prior art, the beneficial effects of the present disclosure are:
The method and system take into account the relations among the feature maps of different tasks in the multi-task deep convolutional neural network; that is, when feature fusion is performed, the degree to which feature information is shared or retained is determined according to the characteristics of the feature maps themselves.
In implementation, adaptive feature fusion layers are constructed to connect the network branches of different tasks into a unified multi-task deep convolutional neural network, so that information can be effectively shared among the different tasks and the classification accuracy is remarkably improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and are not to be construed as limiting the disclosure.
Fig. 1 is a flowchart of a deep multi-task learning method based on adaptive feature fusion according to a first embodiment of the present disclosure.
FIG. 2 is a schematic diagram of a network branch connecting two tasks by using an adaptive feature fusion layer to form a unified multitask deep convolutional neural network according to a first embodiment of the present disclosure;
FIG. 3 is a schematic view of the internal connection of a feature fusion layer according to the first embodiment of the disclosure;
fig. 4 is a schematic diagram of an internal connection relationship of a channel level fusion module according to a first embodiment of the present disclosure;
fig. 5 is a schematic diagram of a spatial hierarchy fusion module according to a first embodiment of the present disclosure.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit example embodiments according to the present disclosure. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well unless the context clearly indicates otherwise, and it should be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
Embodiment 1 provides a multi-task face attribute classification method based on adaptive feature fusion.
as shown in fig. 1, the multi-task face attribute classification method based on adaptive feature fusion includes:
s1: acquiring a face image to be classified;
s2: carrying out preprocessing operation on the face image to be classified;
s3: inputting the preprocessed face images to be classified into a multitask face attribute classification model based on self-adaptive feature fusion to obtain the probability of different classes of the images on each face attribute, and selecting the class with the maximum probability as a classification result on the corresponding attribute.
As one or more embodiments, the preprocessing operation specifically includes:
first, all images are scaled to 224 × 224 pixels;
then, the pixel-wise mean of the training-set images is computed, and this mean is subtracted from each face image to be classified as a normalization operation.
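By way of illustration only, the following is a minimal PyTorch sketch of steps S1-S3 together with the preprocessing described above; the model file, the mean-image file, and the assumption that the trained model returns one probability vector per attribute (age, then gender) are hypothetical placeholders, not part of the disclosure.

```python
import numpy as np
import torch
from PIL import Image

# Hypothetical artifacts: a trained adaptive-fusion model and the training-set pixel mean.
model = torch.load("fusion_model.pt", map_location="cpu")   # placeholder path
model.eval()
mean_image = np.load("train_pixel_mean.npy")                # assumed shape (224, 224, 3)

AGE_GROUPS = ["0-2", "4-6", "8-12", "15-20", "25-32", "38-43", "48-53", "60+"]  # Adience age bins
GENDERS = ["male", "female"]

def classify(path: str):
    # S2: scale to 224 x 224 and subtract the training-set pixel mean.
    img = Image.open(path).convert("RGB").resize((224, 224))
    x = np.asarray(img, dtype=np.float32) - mean_image
    x = torch.from_numpy(x).permute(2, 0, 1).unsqueeze(0)    # (1, 3, 224, 224)
    # S3: forward pass yields per-attribute class probabilities; take the arg-max class.
    with torch.no_grad():
        p_age, p_sex = model(x)
    return AGE_GROUPS[p_age.argmax(1).item()], GENDERS[p_sex.argmax(1).item()]
```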
As one or more embodiments, the multi-task face attribute classification model based on adaptive feature fusion is obtained as follows:
constructing a multi-task neural network model based on adaptive feature fusion;
constructing a training set comprising a plurality of face images, each face image having at least two known attributes;
preprocessing the training-set images: first, all images are scaled to 224 × 224 pixels; then, the pixel-wise mean of the training-set images is computed and subtracted from each image as a normalization operation; finally, before each round of training, the training images are horizontally flipped and Gaussian-blurred with a set probability;
training the multi-task neural network model based on adaptive feature fusion with the preprocessed images to obtain the trained model, i.e. the multi-task face attribute classification model based on adaptive feature fusion.
The beneficial effects of the above technical scheme are: through the preprocessing step, the number of training samples can be effectively expanded, and the diversity of the training samples is improved.
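As a sketch of the training-time preprocessing described above (scaling to 224 × 224, mean subtraction, random horizontal flipping, and Gaussian blurring), assuming PyTorch/torchvision; the flip and blur probabilities and the blur kernel size are illustrative choices, since the disclosure only specifies "a set probability".

```python
import numpy as np
import torch
from torchvision import transforms

FLIP_PROB, BLUR_PROB = 0.5, 0.3   # illustrative values for the "set probability"

def make_train_transform(mean_image: np.ndarray):
    """mean_image: pixel-wise mean of the training set, assumed shape (224, 224, 3)."""
    mean = torch.from_numpy(mean_image).float().permute(2, 0, 1)   # (3, 224, 224)
    return transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.RandomHorizontalFlip(p=FLIP_PROB),
        transforms.RandomApply([transforms.GaussianBlur(kernel_size=5)], p=BLUR_PROB),
        transforms.PILToTensor(),
        transforms.Lambda(lambda x: x.float() - mean),   # subtract the training-set pixel mean
    ])
```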
It is to be understood that the known attributes include, for example, one or more of the following: age, gender, expression, etc.
It should be appreciated that in this embodiment, the Adience dataset is selected so that age classification and gender classification are performed on the face image simultaneously. In the Adience dataset, the age classification task has eight categories: 0-2, 4-6, 8-12, 15-20, 25-32, 38-43, 48-53, and 60+; gender classification has two categories, male and female.
it should be understood that the criterion for model training is that the loss function reaches a minimum. Defining the loss in gender classification as L using a cross-entropy loss functionageThe loss in age classification is LsexThen the total loss function is L ═ λ Lage+Lsex. Wherein, lambda is a hyperparameter of two types of losses of the balance model. Considering that gender classification is a two-classification problem and age classification is a multiple-classification problem, the value of λ is set to 1/2. Training the network by adopting a random gradient descent algorithm, and determining the network weight which can minimize the loss function;
as one or more embodiments, the adaptive feature fusion based multitasking neural network model comprises:
two network branches in parallel: a first network branch and a second network branch;
a first network branch comprising: the system comprises a convolution layer group A1, a convolution layer group A2, a convolution layer group A3, a convolution layer group A4, a convolution layer group A5, a full connection layer A6 and a softmax layer A7 which are connected in sequence;
a second network branch comprising: a convolution layer group B1, a convolution layer group B2, a convolution layer group B3, a convolution layer group B4, a convolution layer group B5, a full connection layer B6 and a Softmax layer B7 which are connected in sequence;
and the convolution layer groups corresponding to the first network branch and the second network branch are connected through four self-adaptive feature fusion layers.
Further, the convolution layer groups corresponding to the first network branch and the second network branch are connected through four adaptive feature fusion layers, specifically:
the output end of convolution layer group A1 and the output end of convolution layer group B1 are both connected to the input end of the first adaptive feature fusion layer;
the input end of convolution layer group A2 and the input end of convolution layer group B2 are both connected to the output end of the first adaptive feature fusion layer;
the output end of convolution layer group A2 and the output end of convolution layer group B2 are both connected to the input end of the second adaptive feature fusion layer;
the input end of convolution layer group A3 and the input end of convolution layer group B3 are both connected to the output end of the second adaptive feature fusion layer;
the output end of convolution layer group A3 and the output end of convolution layer group B3 are both connected to the input end of the third adaptive feature fusion layer;
the input end of convolution layer group A4 and the input end of convolution layer group B4 are both connected to the output end of the third adaptive feature fusion layer;
the output end of convolution layer group A4 and the output end of convolution layer group B4 are both connected to the input end of the fourth adaptive feature fusion layer;
the input end of convolution layer group A5 and the input end of convolution layer group B5 are both connected to the output end of the fourth adaptive feature fusion layer.
It should be understood that the working principle of the above multitask neural network model based on adaptive feature fusion is as follows:
the first network branch and the second network branch receive the same input image, the first network branch is responsible for classifying the age of the face in the input image, the second network branch is responsible for classifying the gender of the face in the input image, and the output of the network branches represents the probability that the input image belongs to each category on the corresponding attribute;
the first network branch and the second network branch are identical in structure and are based on the ResNet101 network structure (see Kaiming He, Xiangyu Zhuang, Shaoqingren, and Jianan Sun. deep residual learning for image Recognition in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770-778, 2016). Each network branch consists of five convolutional layer groups, one fully-connected layer and one softmax layer. Wherein each convolution layer group comprises a plurality of continuous convolution layers and a maximum pooling layer.
And respectively introducing a first adaptive feature fusion layer, a second adaptive feature fusion layer and a fourth adaptive feature fusion layer, and connecting the convolution layer groups corresponding to the first network branch and the second network branch, thereby realizing feature interaction between two tasks and constructing a uniform multi-task deep convolution neural network, wherein the structure of the network is shown in figure 2.
Further, the fully-connected layer A6 of the first network branch performs a nonlinear transformation on the input feature map and maps it into a column vector; the dimension of the column vector is equal to the number of categories of the age attribute, with each dimension corresponding to a specific age category.
Further, the fully-connected layer B6 of the second network branch performs a nonlinear transformation on the input feature map and maps it into a column vector; the dimension of the column vector is equal to the number of categories of the gender attribute, with each dimension corresponding to a specific gender category.
Further, the Softmax layer A7 of the first network branch converts each dimension of the input vector into a probability value, representing the probability of the input image belonging to each category of the age attribute.
Further, the Softmax layer B7 of the second network branch converts each dimension of the input vector into a probability value, representing the probability of the input image belonging to each category of the gender attribute.
for one or more embodiments, the first adaptive feature fusion layer, the second adaptive feature fusion layer, the third adaptive feature fusion layer, and the fourth adaptive feature fusion layer are identical in structure.
As one or more embodiments, as shown in Fig. 3, the first adaptive feature fusion layer includes:
a channel-level fusion module and a spatial-level fusion module connected in sequence, where the input end of the channel-level fusion module is the input end of the current adaptive feature fusion layer, and the output end of the spatial-level fusion module is the output end of the current adaptive feature fusion layer.
As one or more embodiments, the channel-level fusion module includes:
a first average pooling layer and a second average pooling layer in parallel;
the output ends of the first average pooling layer and the second average pooling layer are connected to a concatenation unit;
the concatenation unit is connected to the first fully-connected layer, and the first fully-connected layer is connected to the second fully-connected layer;
the second fully-connected layer is connected to the third fully-connected layer and the fourth fully-connected layer, respectively;
the third fully-connected layer is connected to the first Softmax function layer;
the fourth fully-connected layer is connected to the second Softmax function layer;
the first Softmax function layer is connected to the first multiplier and the second multiplier, respectively;
the second Softmax function layer is connected to the third multiplier and the fourth multiplier, respectively;
the first multiplier and the second multiplier are both connected to the first adder;
the third multiplier and the fourth multiplier are both connected to the second adder.
As one or more embodiments, as shown in Fig. 4, the working principle of the channel-level fusion module is as follows:
First, in the channel-level fusion module, the original feature maps x_A and x_B input from the two network branches are each average-pooled along the channel dimension (each channel being averaged over its spatial extent) to obtain the channel descriptor vectors s_A and s_B, and s_A and s_B are concatenated together.
Then, the concatenated result is passed through the first fully-connected layer and the second fully-connected layer, respectively, for dimensionality reduction, yielding two guide vectors g_A and g_B.
Passing g_A through the third fully-connected layer yields the fusion weight vectors u_A and u_B corresponding to x_A and x_B, respectively; passing g_B through the fourth fully-connected layer yields the fusion weight vectors v_A and v_B corresponding to x_A and x_B, respectively. The dimensions of u_A and u_B are equal to the number of channels of the original feature map x_A, and the dimensions of v_A and v_B are equal to the number of channels of the original feature map x_B.
A Softmax operation is applied pairwise to the corresponding elements of u_A and u_B, so that u_A + u_B = 1 element-wise; likewise, a Softmax operation is applied pairwise to the corresponding elements of v_A and v_B, so that v_A + v_B = 1 element-wise.
Finally, the original feature maps are multiplied channel-wise by the fusion weight vectors and summed, yielding the channel-level fused feature maps x̃_A = u_A ⊙ x_A + u_B ⊙ x_B and x̃_B = v_A ⊙ x_A + v_B ⊙ x_B, where ⊙ denotes channel-wise multiplication.
x̃_A and x̃_B are input to the spatial-level fusion module.
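A minimal sketch of the channel-level fusion module just described, following the working-principle text above; the exact wiring of the four fully-connected layers and the guide-vector dimension are interpretations/illustrative choices, and the two branches are assumed to have the same number of channels, as they do for identical ResNet-101 branches.

```python
import torch
import torch.nn as nn

class ChannelLevelFusion(nn.Module):
    """Channel-level adaptive fusion sketch: per-channel fusion weights from pooled descriptors."""
    def __init__(self, channels: int, guide_dim: int = None):
        super().__init__()
        guide_dim = guide_dim or max(channels // 4, 8)        # illustrative reduction ratio
        self.fc_guide_a = nn.Linear(2 * channels, guide_dim)  # first FC layer  -> guide vector g_A
        self.fc_guide_b = nn.Linear(2 * channels, guide_dim)  # second FC layer -> guide vector g_B
        self.fc_w_a = nn.Linear(guide_dim, 2 * channels)      # third FC layer  -> weights u_A, u_B
        self.fc_w_b = nn.Linear(guide_dim, 2 * channels)      # fourth FC layer -> weights v_A, v_B

    def forward(self, x_a, x_b):
        n, c = x_a.shape[:2]
        # Average each channel over its spatial extent and concatenate the two descriptors.
        s = torch.cat([x_a.mean(dim=(2, 3)), x_b.mean(dim=(2, 3))], dim=1)   # (n, 2c)
        g_a, g_b = self.fc_guide_a(s), self.fc_guide_b(s)
        # Pairwise softmax so the two weights at each channel position sum to 1.
        w_a = self.fc_w_a(g_a).view(n, 2, c, 1, 1).softmax(dim=1)
        w_b = self.fc_w_b(g_b).view(n, 2, c, 1, 1).softmax(dim=1)
        fused_a = w_a[:, 0] * x_a + w_a[:, 1] * x_b   # channel-weighted sum for branch A
        fused_b = w_b[:, 0] * x_a + w_b[:, 1] * x_b   # channel-weighted sum for branch B
        return fused_a, fused_b
```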
As one or more embodiments, the spatial-level fusion module includes:
a third average pooling layer and a fourth average pooling layer in parallel;
the output ends of the third average pooling layer and the fourth average pooling layer are connected to a stacking unit;
the stacking unit is connected to the first convolution layer and the second convolution layer, respectively;
the first convolution layer is connected to the fifth fully-connected layer, and the second convolution layer is connected to the sixth fully-connected layer;
the fifth fully-connected layer is connected to the third Softmax function layer; the sixth fully-connected layer is connected to the fourth Softmax function layer;
the third Softmax function layer is connected to the fifth multiplier and the sixth multiplier, respectively;
the fourth Softmax function layer is connected to the seventh multiplier and the eighth multiplier, respectively;
the fifth multiplier and the sixth multiplier are both connected to the third adder;
the seventh multiplier and the eighth multiplier are both connected to the fourth adder.
As one or more embodiments, as shown in Fig. 5, the working principle of the spatial-level fusion module is as follows:
First, in the spatial-level fusion module, the input feature maps x̃_A and x̃_B are each average-pooled along the spatial dimension (each spatial position being averaged across channels) to obtain the spatial descriptor maps t_A and t_B, and t_A and t_B are stacked together.
Then, the stacked result is passed through two convolution layers, respectively, each convolution layer having only a single 1 × 1 convolution kernel, to obtain two guide matrices M_A and M_B.
M_A is vectorized and passed through a fully-connected layer (the fifth fully-connected layer) to obtain the fusion weight vectors p_A and p_B corresponding to x̃_A and x̃_B, respectively; M_B is vectorized and passed through a fully-connected layer (the sixth fully-connected layer) to obtain the fusion weight vectors q_A and q_B corresponding to x̃_A and x̃_B, respectively.
p_A and p_B are reshaped into matrices P_A and P_B whose size is equal to the spatial size of the input feature map x̃_A; q_A and q_B are reshaped into matrices Q_A and Q_B whose size is equal to the spatial size of the input feature map x̃_B.
A Softmax operation is applied pairwise to the corresponding elements of P_A and P_B, so that P_A + P_B = 1 element-wise; likewise, a Softmax operation is applied pairwise to the corresponding elements of Q_A and Q_B, so that Q_A + Q_B = 1 element-wise.
Finally, the input feature maps are multiplied position-wise by the fusion weight matrices and summed, yielding the fused feature maps y_A = P_A ⊙ x̃_A + P_B ⊙ x̃_B and y_B = Q_A ⊙ x̃_A + Q_B ⊙ x̃_B, where ⊙ here denotes position-wise (spatial) multiplication.
y_A and y_B are fed into the next convolution layer group of the first network branch and the second network branch, respectively.
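Correspondingly, a minimal sketch of the spatial-level fusion module and of an adaptive feature fusion layer composed of the two modules (the `AdaptiveFusionLayer` used in the earlier network sketch); fixing the spatial size at construction time so that the fully-connected layers can be dimensioned, and reusing `ChannelLevelFusion` from the previous sketch, are assumptions of this illustration.

```python
import torch
import torch.nn as nn

class SpatialLevelFusion(nn.Module):
    """Spatial-level adaptive fusion sketch: per-position fusion weights from channel-pooled maps."""
    def __init__(self, spatial):
        super().__init__()
        h, w = spatial
        self.conv_a = nn.Conv2d(2, 1, kernel_size=1)   # first 1x1 conv  -> guide matrix M_A
        self.conv_b = nn.Conv2d(2, 1, kernel_size=1)   # second 1x1 conv -> guide matrix M_B
        self.fc_a = nn.Linear(h * w, 2 * h * w)        # fifth FC layer  -> weights P_A, P_B
        self.fc_b = nn.Linear(h * w, 2 * h * w)        # sixth FC layer  -> weights Q_A, Q_B

    def forward(self, x_a, x_b):
        n, _, h, w = x_a.shape
        # Average across channels at each position and stack the two spatial maps.
        s = torch.stack([x_a.mean(dim=1), x_b.mean(dim=1)], dim=1)   # (n, 2, h, w)
        m_a = self.conv_a(s).flatten(1)                               # vectorized guide matrix M_A
        m_b = self.conv_b(s).flatten(1)                               # vectorized guide matrix M_B
        # Pairwise softmax so the two weights at each spatial position sum to 1.
        w_a = self.fc_a(m_a).view(n, 2, 1, h, w).softmax(dim=1)
        w_b = self.fc_b(m_b).view(n, 2, 1, h, w).softmax(dim=1)
        fused_a = w_a[:, 0] * x_a + w_a[:, 1] * x_b   # position-weighted sum for branch A
        fused_b = w_b[:, 0] * x_a + w_b[:, 1] * x_b   # position-weighted sum for branch B
        return fused_a, fused_b

class AdaptiveFusionLayer(nn.Module):
    """Channel-level fusion followed by spatial-level fusion, as in Fig. 3."""
    def __init__(self, channels: int, spatial):
        super().__init__()
        self.channel = ChannelLevelFusion(channels)   # from the previous sketch
        self.spatial = SpatialLevelFusion(spatial)

    def forward(self, x_a, x_b):
        return self.spatial(*self.channel(x_a, x_b))
```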
The method thus takes into account the relations among the feature maps of different tasks in the multi-task deep convolutional neural network; that is, when feature fusion is performed, the degree to which feature information is shared or retained is determined according to the characteristics of the feature maps themselves, thereby realizing adaptive feature fusion.
Embodiment 2 provides a multi-task face attribute classification system based on adaptive feature fusion.
The multi-task face attribute classification system based on adaptive feature fusion comprises:
an acquisition module configured to: acquire a face image to be classified;
a preprocessing module configured to: preprocess the face image to be classified;
a classification module configured to: input the preprocessed face image to be classified into a multi-task face attribute classification model based on adaptive feature fusion to obtain, for each face attribute, the probability of the image belonging to each class, and select the class with the maximum probability as the classification result for the corresponding attribute.
In a third embodiment, the present invention further provides an electronic device, which includes a memory, a processor, and computer instructions stored in the memory and executed on the processor, where the computer instructions, when executed by the processor, implement the method in the first embodiment.
In a fourth embodiment, the present embodiment further provides a computer-readable storage medium for storing computer instructions, and the computer instructions, when executed by a processor, implement the method of the first embodiment.
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims (9)

1. A multi-task face attribute classification method based on adaptive feature fusion, characterized by comprising the following steps:
acquiring a face image to be classified;
preprocessing the face image to be classified;
inputting the preprocessed face image to be classified into a multi-task face attribute classification model based on adaptive feature fusion to obtain, for each face attribute, the probability of the image belonging to each class, and selecting the class with the maximum probability as the classification result for the corresponding attribute;
wherein the multi-task face attribute classification model based on adaptive feature fusion is obtained by:
constructing a multi-task neural network model based on adaptive feature fusion;
the multi-task neural network model based on adaptive feature fusion comprising:
two parallel network branches: a first network branch and a second network branch;
the convolution layer groups corresponding to the first network branch and the second network branch being connected through four adaptive feature fusion layers;
each adaptive feature fusion layer comprising:
a channel-level fusion module and a spatial-level fusion module connected in sequence;
the working principle of the channel-level fusion module being as follows:
first, in the channel-level fusion module, the original feature maps x_A and x_B input from the two network branches are each average-pooled along the channel dimension (each channel being averaged over its spatial extent) to obtain the channel descriptor vectors s_A and s_B, and s_A and s_B are concatenated together;
then, the concatenated result is passed through the first fully-connected layer and the second fully-connected layer, respectively, for dimensionality reduction, yielding two guide vectors g_A and g_B;
g_A is passed through the third fully-connected layer to obtain the fusion weight vectors u_A and u_B corresponding to x_A and x_B, respectively; g_B is passed through the fourth fully-connected layer to obtain the fusion weight vectors v_A and v_B corresponding to x_A and x_B, respectively; wherein the dimensions of u_A and u_B are equal to the number of channels of the original feature map x_A, and the dimensions of v_A and v_B are equal to the number of channels of the original feature map x_B;
a Softmax operation is applied pairwise to the corresponding elements of u_A and u_B, so that u_A + u_B = 1 element-wise; a Softmax operation is applied pairwise to the corresponding elements of v_A and v_B, so that v_A + v_B = 1 element-wise;
finally, the original feature maps are multiplied channel-wise by the fusion weight vectors and summed, yielding the channel-level fused feature maps x̃_A = u_A ⊙ x_A + u_B ⊙ x_B and x̃_B = v_A ⊙ x_A + v_B ⊙ x_B, where ⊙ denotes channel-wise multiplication;
x̃_A and x̃_B are input to the spatial-level fusion module.
2. The method according to claim 1, wherein the preprocessing operation comprises:
first, all images are scaled to 224 × 224 pixels;
then, computing the pixel-wise mean of the training-set images, and subtracting this mean from each face image to be classified as a normalization operation.
3. The method of claim 1, wherein the obtaining of the multi-tasking face attribute classification model based on adaptive feature fusion further comprises:
constructing a training set, wherein the training set comprises: the method comprises the following steps of (1) obtaining a plurality of face images, wherein each face image comprises at least two known attributes;
preprocessing the images in the training set, including: first, scaling all images to 224 × 224 pixels; then, computing the pixel-wise mean of the training-set images and subtracting it from each image as a normalization operation; finally, before each round of training, horizontally flipping and Gaussian-blurring the training images with a set probability;
training the multi-task neural network model based on the adaptive feature fusion by using the image after the preprocessing operation to obtain a trained multi-task neural network model based on the adaptive feature fusion; namely, the multi-task human face attribute classification model based on the self-adaptive feature fusion.
4. The method as set forth in claim 3,
a first network branch comprising: the system comprises a convolution layer group A1, a convolution layer group A2, a convolution layer group A3, a convolution layer group A4, a convolution layer group A5, a full connection layer A6 and a softmax layer A7 which are connected in sequence;
a second network branch comprising: connected in sequence are convolution layer group B1, convolution layer group B2, convolution layer group B3, convolution layer group B4, convolution layer group B5, full connection layer B6 and Softmax layer B7.
5. The method as set forth in claim 4, wherein,
the working principle of the multitask neural network model based on the self-adaptive feature fusion is as follows:
the first network branch and the second network branch receive the same input image, the first network branch is responsible for classifying the age of the face in the input image, the second network branch is responsible for classifying the gender of the face in the input image, and the output of the network branches represents the probability that the input image belongs to each category on the corresponding attribute;
the input end of the channel level fusion module is the input end of the current adaptive feature fusion layer; and the output end of the spatial hierarchy fusion module is the output end of the current self-adaptive feature fusion layer.
6. The method according to claim 5, wherein the working principle of the spatial-level fusion module comprises:
first, in the spatial-level fusion module, the input feature maps x̃_A and x̃_B are each average-pooled along the spatial dimension (each spatial position being averaged across channels) to obtain the spatial descriptor maps t_A and t_B, and t_A and t_B are stacked together;
then, the stacked result is passed through two convolution layers, respectively, each convolution layer having only a single 1 × 1 convolution kernel, to obtain two guide matrices M_A and M_B;
M_A is vectorized and passed through a fully-connected layer to obtain the fusion weight vectors p_A and p_B corresponding to x̃_A and x̃_B, respectively; M_B is vectorized and passed through a fully-connected layer to obtain the fusion weight vectors q_A and q_B corresponding to x̃_A and x̃_B, respectively;
p_A and p_B are reshaped into matrices P_A and P_B whose size is equal to the spatial size of the input feature map x̃_A; q_A and q_B are reshaped into matrices Q_A and Q_B whose size is equal to the spatial size of the input feature map x̃_B;
a Softmax operation is applied pairwise to the corresponding elements of P_A and P_B, so that P_A + P_B = 1 element-wise; a Softmax operation is applied pairwise to the corresponding elements of Q_A and Q_B, so that Q_A + Q_B = 1 element-wise;
finally, the input feature maps are multiplied position-wise by the fusion weight matrices and summed, yielding the fused feature maps y_A = P_A ⊙ x̃_A + P_B ⊙ x̃_B and y_B = Q_A ⊙ x̃_A + Q_B ⊙ x̃_B;
y_A and y_B are fed into the next convolution layer group of the first network branch and the second network branch, respectively.
7. A multi-task face attribute classification system based on adaptive feature fusion, characterized by comprising:
an acquisition module configured to: acquire a face image to be classified;
a preprocessing module configured to: preprocess the face image to be classified;
a classification module configured to: input the preprocessed face image to be classified into a multi-task face attribute classification model based on adaptive feature fusion to obtain, for each face attribute, the probability of the image belonging to each class, and select the class with the maximum probability as the classification result for the corresponding attribute;
wherein the multi-task face attribute classification model based on adaptive feature fusion is obtained by:
constructing a multi-task neural network model based on adaptive feature fusion;
the multi-task neural network model based on adaptive feature fusion comprising:
two parallel network branches: a first network branch and a second network branch;
the convolution layer groups corresponding to the first network branch and the second network branch being connected through four adaptive feature fusion layers;
each adaptive feature fusion layer comprising:
a channel-level fusion module and a spatial-level fusion module connected in sequence;
the working principle of the channel-level fusion module being as follows:
first, in the channel-level fusion module, the original feature maps x_A and x_B input from the two network branches are each average-pooled along the channel dimension (each channel being averaged over its spatial extent) to obtain the channel descriptor vectors s_A and s_B, and s_A and s_B are concatenated together;
then, the concatenated result is passed through the first fully-connected layer and the second fully-connected layer, respectively, for dimensionality reduction, yielding two guide vectors g_A and g_B;
g_A is passed through the third fully-connected layer to obtain the fusion weight vectors u_A and u_B corresponding to x_A and x_B, respectively; g_B is passed through the fourth fully-connected layer to obtain the fusion weight vectors v_A and v_B corresponding to x_A and x_B, respectively; wherein the dimensions of u_A and u_B are equal to the number of channels of the original feature map x_A, and the dimensions of v_A and v_B are equal to the number of channels of the original feature map x_B;
a Softmax operation is applied pairwise to the corresponding elements of u_A and u_B, so that u_A + u_B = 1 element-wise; a Softmax operation is applied pairwise to the corresponding elements of v_A and v_B, so that v_A + v_B = 1 element-wise;
finally, the original feature maps are multiplied channel-wise by the fusion weight vectors and summed, yielding the channel-level fused feature maps x̃_A = u_A ⊙ x_A + u_B ⊙ x_B and x̃_B = v_A ⊙ x_A + v_B ⊙ x_B, where ⊙ denotes channel-wise multiplication;
x̃_A and x̃_B are input to the spatial-level fusion module.
8. An electronic device comprising a memory and a processor and computer instructions stored on the memory and executable on the processor, the computer instructions when executed by the processor performing the steps of any of the methods of claims 1-6.
9. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the method of any one of claims 1 to 6.
CN202010228805.7A 2020-03-27 2020-03-27 Multi-task face attribute classification method and system based on adaptive feature fusion Active CN111401294B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010228805.7A CN111401294B (en) 2020-03-27 2020-03-27 Multi-task face attribute classification method and system based on adaptive feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010228805.7A CN111401294B (en) 2020-03-27 2020-03-27 Multi-task face attribute classification method and system based on adaptive feature fusion

Publications (2)

Publication Number Publication Date
CN111401294A CN111401294A (en) 2020-07-10
CN111401294B true CN111401294B (en) 2022-07-15

Family

ID=71432935

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010228805.7A Active CN111401294B (en) 2020-03-27 2020-03-27 Multi-task face attribute classification method and system based on adaptive feature fusion

Country Status (1)

Country Link
CN (1) CN111401294B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111832522B (en) * 2020-07-21 2024-02-27 深圳力维智联技术有限公司 Face data set construction method, system and computer readable storage medium
CN112215157B (en) * 2020-10-13 2021-05-25 北京中电兴发科技有限公司 Multi-model fusion-based face feature dimension reduction extraction method
CN112651960A (en) * 2020-12-31 2021-04-13 上海联影智能医疗科技有限公司 Image processing method, device, equipment and storage medium
CN112784776B (en) * 2021-01-26 2022-07-08 山西三友和智慧信息技术股份有限公司 BPD facial emotion recognition method based on improved residual error network

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106529402B (en) * 2016-09-27 2019-05-28 中国科学院自动化研究所 The face character analysis method of convolutional neural networks based on multi-task learning
CN106815566B (en) * 2016-12-29 2021-04-16 天津中科智能识别产业技术研究院有限公司 Face retrieval method based on multitask convolutional neural network
CN107766850B (en) * 2017-11-30 2020-12-29 电子科技大学 Face recognition method based on combination of face attribute information
CN108615010B (en) * 2018-04-24 2022-02-11 重庆邮电大学 Facial expression recognition method based on parallel convolution neural network feature map fusion
CN109919097A (en) * 2019-03-08 2019-06-21 中国科学院自动化研究所 Face and key point combined detection system, method based on multi-task learning
CN109978074A (en) * 2019-04-04 2019-07-05 山东财经大学 Image aesthetic feeling and emotion joint classification method and system based on depth multi-task learning
CN110119689A (en) * 2019-04-18 2019-08-13 五邑大学 A kind of face beauty prediction technique based on multitask transfer learning
CN110197217B (en) * 2019-05-24 2020-12-18 中国矿业大学 Image classification method based on deep interleaving fusion packet convolution network
CN110796239A (en) * 2019-10-30 2020-02-14 福州大学 Deep learning target detection method based on channel and space fusion perception

Also Published As

Publication number Publication date
CN111401294A (en) 2020-07-10

Similar Documents

Publication Publication Date Title
CN111401294B (en) Multi-task face attribute classification method and system based on adaptive feature fusion
CN111767979B (en) Training method, image processing method and image processing device for neural network
CN110175671B (en) Neural network construction method, image processing method and device
Gao et al. Global second-order pooling convolutional networks
CN112926641B (en) Three-stage feature fusion rotating machine fault diagnosis method based on multi-mode data
CN112215332B (en) Searching method, image processing method and device for neural network structure
CN108710906B (en) Real-time point cloud model classification method based on lightweight network LightPointNet
CN114255361A (en) Neural network model training method, image processing method and device
CN113298235A (en) Neural network architecture of multi-branch depth self-attention transformation network and implementation method
CN112561028A (en) Method for training neural network model, and method and device for data processing
WO2022156475A1 (en) Neural network model training method and apparatus, and data processing method and apparatus
CN114463759A (en) Lightweight character detection method and device based on anchor-frame-free algorithm
CN114898171B (en) Real-time target detection method suitable for embedded platform
WO2022227024A1 (en) Operational method and apparatus for neural network model and training method and apparatus for neural network model
CN113239949A (en) Data reconstruction method based on 1D packet convolutional neural network
Obeso et al. Introduction of explicit visual saliency in training of deep cnns: Application to architectural styles classification
CN116246110A (en) Image classification method based on improved capsule network
CN115330759B (en) Method and device for calculating distance loss based on Hausdorff distance
CN113516580B (en) Method and device for improving neural network image processing efficiency and NPU
CN114118415B (en) Deep learning method of lightweight bottleneck attention mechanism
CN113688946B (en) Multi-label image recognition method based on spatial correlation
CN112529064B (en) Efficient real-time semantic segmentation method
CN115169548A (en) Tensor-based continuous learning method and device
Huang et al. Algorithm of image classification based on Atrous-CNN
CN112560824A (en) Facial expression recognition method based on multi-feature adaptive fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant