CN115588218A - Face recognition method and device - Google Patents


Info

Publication number
CN115588218A
CN115588218A
Authority
CN
China
Prior art keywords
feature map
attention
processing
convolution
channel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211055844.7A
Other languages
Chinese (zh)
Inventor
王夏洪
Current Assignee
Beijing Longzhi Digital Technology Service Co Ltd
Original Assignee
Beijing Longzhi Digital Technology Service Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Longzhi Digital Technology Service Co Ltd
Priority claimed from CN202211055844.7A
PCT application PCT/CN2022/129343, published as WO2024045320A1
Publication of CN115588218A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168: Feature extraction; Face representation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure relates to the field of computer technology and provides a face recognition method and apparatus. The method comprises the following steps: acquiring a first feature map of a face image to be recognized; performing depthwise convolution on the first feature map to obtain a second feature map; performing attention circulation processing on the second feature map to obtain a third feature map; and performing, on the third feature map in sequence, channel-expanding convolution, attention circulation processing, channel-reducing convolution, and attention circulation processing to obtain a target feature map corresponding to the first feature map.

Description

Face recognition method and device
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a face recognition method and apparatus.
Background
Face recognition technology often needs to be deployed both in the cloud and at the edge. Constrained by the computing power and storage resources of edge devices such as embedded terminals, an edge-side face recognition model must combine high accuracy with a small model size, low computational complexity, and fast inference.
In the related art, common lightweight networks capable of performing face recognition include SqueezeNet, MobileNet, ShuffleNet, and the like; owing to the particularity of facial structure, these models achieve poor accuracy on face recognition tasks. MobileFaceNet, a mobile-side lightweight network designed specifically for face recognition, builds on MobileNet with a smaller expansion rate and replaces the global average pooling layer with a global depthwise convolution layer. However, its main building block is still the ordinary residual bottleneck module, with identical computation in every module, so its accuracy also suffers.
Disclosure of Invention
In view of this, embodiments of the present disclosure provide a face recognition method, a face recognition apparatus, an electronic device, and a computer-readable storage medium, so as to solve the problem in the prior art that the accuracy of face recognition models is insufficient.
In a first aspect of the embodiments of the present disclosure, a face recognition method is provided, the method comprising: acquiring a first feature map of a face image to be recognized; performing depthwise convolution on the first feature map to obtain a second feature map; performing attention circulation processing on the second feature map to obtain a third feature map; and performing, on the third feature map in sequence, channel-expanding convolution, attention circulation processing, channel-reducing convolution, and attention circulation processing to obtain a target feature map corresponding to the first feature map.
In a second aspect of the embodiments of the present disclosure, a face recognition apparatus is provided, the apparatus comprising: an acquisition module for acquiring a first feature map of a face image to be recognized; a convolution module for performing depthwise convolution on the first feature map to obtain a second feature map; an attention circulation module for performing attention circulation processing on the second feature map to obtain a third feature map; and a mixed processing module for performing, on the third feature map in sequence, channel-expanding convolution, attention circulation processing, channel-reducing convolution, and attention circulation processing to obtain a target feature map corresponding to the first feature map.
In a third aspect of the embodiments of the present disclosure, an electronic device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the above method when executing the computer program.
In a fourth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, which stores a computer program, which when executed by a processor, implements the steps of the above-mentioned method.
Compared with the prior art, embodiments of the present disclosure provide the following beneficial effects: feature-map processing for face recognition is performed by combining convolution with attention circulation processing, which promotes the circulation of attention across multiple directional dimensions, so that the final feature map is highly discriminative in every directional dimension, thereby improving the recognition accuracy of the face recognition model.
Specifically, embodiments of the present disclosure provide a lightweight attention circulation module. Its tensor dimensionality is very low, so convolution over these low-dimensional tensors is cheap and the overall module runs fast. Extracting features for the entire network in a low-dimensional space could, however, lead to incomplete information and unreliable features; the disclosed embodiments therefore expand the channel count by a configured expansion coefficient during the middle convolution stage, which improves the feature extraction capability of the module and strikes a balance between computation and feature expression capability.
In the disclosed embodiments, the attention circulation module as a whole combines different types of convolution, expansion and compression of the channel count, and attention circulation techniques so that the attention relevant to the face recognition task circulates and converts between space and channels. Feature fusion becomes more efficient, and the feature map is ultimately focused on the regions of interest for face recognition. In addition, the attention circulation module has few parameters, a small computational cost, and high speed.
Drawings
To more clearly illustrate the technical solutions in the embodiments of the present disclosure, the drawings needed for the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present disclosure, and other drawings can be obtained by those skilled in the art without inventive efforts.
FIG. 1 is a schematic diagram of an application scenario of an embodiment of the present disclosure;
fig. 2 is a schematic flow chart of a face recognition method according to an embodiment of the present disclosure;
fig. 3 is a flow chart of attention flow processing provided by an embodiment of the present disclosure;
fig. 4 is a schematic flow chart of another face recognition method provided in the embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of a face recognition apparatus according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of an electronic device provided in an embodiment of the present disclosure.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the disclosed embodiments. However, it will be apparent to one skilled in the art that the present disclosure may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present disclosure with unnecessary detail.
A face recognition method and apparatus according to an embodiment of the present disclosure will be described in detail below with reference to the accompanying drawings.
Fig. 1 is a scene schematic diagram of an application scenario of an embodiment of the present disclosure. The application scenario may include terminal devices 101, 102, and 103, server 104, and network 105.
The terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they may be various electronic devices having a display screen and supporting communication with server 104, including but not limited to smartphones, robots, laptop computers, desktop computers, and the like (e.g., 102 may be a robot); when they are software, they may be installed in electronic devices such as those above. The terminal devices 101, 102, and 103 may be implemented as multiple pieces of software or software modules, or as a single piece of software or a single software module, which is not limited by the embodiments of the present disclosure. Furthermore, various applications may be installed on the terminal devices 101, 102, and 103, such as data processing applications, instant messaging tools, social platform software, search applications, and shopping applications.
The server 104 may be a server providing various services, for example, a backend server receiving a request sent by a terminal device establishing a communication connection with the server, and the backend server may receive and analyze the request sent by the terminal device and generate a processing result. The server 104 may be a server, may also be a server cluster composed of a plurality of servers, or may also be a cloud computing service center, which is not limited in this disclosure.
The server 104 may be hardware or software. When the server 104 is hardware, it may be various electronic devices that provide various services to the terminal devices 101, 102, and 103. When the server 104 is software, it may be multiple software or software modules providing various services for the terminal devices 101, 102, and 103, or may be a single software or software module providing various services for the terminal devices 101, 102, and 103, which is not limited by the embodiment of the present disclosure.
The network 105 may be a wired network connected by coaxial cable, twisted pair, or optical fiber, or a wireless network that interconnects communication devices without wiring, for example Bluetooth, Near Field Communication (NFC), or infrared, which is not limited by the embodiments of the present disclosure.
The target user can establish a communication connection with the server 104 via the network 105 through the terminal devices 101, 102, and 103 to receive or transmit information or the like. It should be noted that the specific types, numbers and combinations of the terminal devices 101, 102 and 103, the server 104 and the network 105 may be adjusted according to the actual requirements of the application scenario, and the embodiment of the present disclosure does not limit this.
In the related art, the computing power and storage resources of edge devices such as embedded terminals are limited and can support only small models, and general lightweight face models do not achieve high recognition accuracy.
To solve this technical problem, embodiments of the present disclosure provide a face recognition scheme with a compact and effective lightweight general model for extracting face features: a face recognition model with real-time response designed specifically for edge and embedded devices, improving the accuracy of face recognition.
Specifically, the technical solution of the disclosed embodiments provides a general attention circulation technique that effectively captures attention in space and in channels respectively, and improves feature discriminability through channel-wise learnable nonlinear mappings. The technique as a whole extracts effective feature combinations, thereby promoting the circulation of attention across multiple directional dimensions.
Fig. 2 is a schematic flow chart of a face recognition method according to an embodiment of the present disclosure. The method provided by the embodiment of the disclosure can be executed by any electronic equipment with computer processing capability, such as a terminal or a server. As shown in fig. 2, the face recognition method includes:
step S201, a first feature map of a face image to be recognized is obtained.
Specifically, the first feature map is a 4-dimensional tensor with dimensions (N, C, H, W), where N is the batch size, C the number of channels, H the height, and W the width. The first feature map is obtained by feature extraction on the face image to be recognized.
Step S202, performing depthwise convolution on the first feature map to obtain a second feature map.
Specifically, a depthwise convolution (DWConv) convolves each channel independently: in ordinary convolution every kernel spans all input channels, whereas in depthwise convolution each kernel operates on exactly one channel.
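This per-channel behavior can be sketched in NumPy (a minimal sketch with hypothetical shapes; the bias-free plain correlation here is an assumption, not the patent's exact implementation):

```python
import numpy as np

def depthwise_conv2d(x, kernels, stride=1, padding=1):
    """Depthwise convolution: one k x k filter per channel, channels kept independent."""
    N, C, H, W = x.shape
    k = kernels.shape[-1]                      # kernels: (C, k, k)
    xp = np.pad(x, ((0, 0), (0, 0), (padding, padding), (padding, padding)))
    Ho = (H + 2 * padding - k) // stride + 1
    Wo = (W + 2 * padding - k) // stride + 1
    out = np.zeros((N, C, Ho, Wo))
    for i in range(Ho):
        for j in range(Wo):
            patch = xp[:, :, i * stride:i * stride + k, j * stride:j * stride + k]
            out[:, :, i, j] = (patch * kernels[None]).sum(axis=(2, 3))
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4, 6, 6))          # (N, C, H, W)
w = rng.standard_normal((4, 3, 3))             # one 3x3 kernel per channel
y = depthwise_conv2d(x, w)
```

Because each output channel reads only its own input channel, zeroing one input channel changes only the corresponding output channel, which is the defining property of DWConv.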
In step S203, attention circulation processing is performed on the second feature map to obtain a third feature map.
In particular, attention circulation processing lets attention flow between the spatial and channel dimensions, enabling more efficient feature fusion.
Step S204, performing, on the third feature map in sequence, channel-expanding convolution, attention circulation processing, channel-reducing convolution, and attention circulation processing to obtain the target feature map corresponding to the first feature map.
Specifically, the channel-expanding and channel-reducing convolutions are a corresponding pair of ordinary convolutions: the expanding convolution first increases the number of channels, and the reducing convolution then restores the channel count to its previous value.
According to the technical solution of the disclosed embodiments, attention circulation processing extracts effective feature combinations and promotes attention circulation across multiple directional dimensions. By designing and combining attention circulation processing with different types of convolution, both the requirements of the face recognition task and the lightweight requirements of embedded devices can be met, achieving higher recognition accuracy with fewer parameters than the prior art.
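The expand-then-restore pattern of step S204 can be sketched as follows (NumPy; the 1×1 kernels, the expansion factor n, and the placeholder middle step are illustrative assumptions, not the patent's exact configuration):

```python
import numpy as np

def pointwise_conv(x, weight):
    """1x1 convolution: weight has shape (C_out, C_in), x has shape (N, C_in, H, W)."""
    return np.einsum('oc,nchw->nohw', weight, x)

rng = np.random.default_rng(0)
N, C, H, W = 2, 8, 7, 7
n = 4                                      # hypothetical channel expansion factor
x = rng.standard_normal((N, C, H, W))

W_up = rng.standard_normal((n * C, C))     # channel-expanding convolution weights
W_down = rng.standard_normal((C, n * C))   # channel-reducing convolution weights

mid = pointwise_conv(x, W_up)              # (N, n*C, H, W): richer feature space
# ... attention circulation processing would run on the expanded tensor here ...
out = pointwise_conv(mid, W_down)          # (N, C, H, W): channel count restored
```

The reducing convolution undoes the expanding one dimensionally, so whatever runs in between sees a wider feature space while the block's interface stays fixed.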
As shown in fig. 3, the attention circulation processing in steps S203 and S204 includes the following steps:
step S301, a first dimension and a second dimension of the input feature map are flattened to obtain a first intermediate feature map.
In particular, the first dimension may be the height and the second dimension the width. Assume the input feature map is f_1; flattening f_1 transforms its dimensions from (N, C, H, W) to (N, C, R), where R = H × W.
Step S302, a second intermediate feature map is obtained according to the first intermediate feature map and the first learnable parameter matrix.
In the technical solution of the disclosed embodiments, a first product of the first intermediate feature map and its softmax value may be obtained, and the second intermediate feature map obtained from the mean of that product. Specifically, the first intermediate feature map may be right-multiplied by the first learnable parameter matrix to obtain a tensor; the Hadamard product of the softmax of that tensor with the tensor itself gives a matrix; and averaging that matrix along a certain dimension yields the second intermediate feature map. The first learnable parameter matrix learns attention circulation information in the spatial dimension.
Step S303, acquiring a spatial attention feature map according to the product of the second intermediate feature map and the input feature map.
Specifically, the spatial attention feature map is a feature map with fused spatial attention.
Step S304, a channel attention feature map is obtained according to the second learnable parameter matrix, the third learnable parameter matrix and the spatial attention feature map, wherein a first dimension of the second learnable parameter matrix is equal to a second dimension of the third learnable parameter matrix, and the first dimension of the third learnable parameter matrix is equal to the second dimension of the second learnable parameter matrix.
Specifically, the spatial attention feature map may be right-multiplied by the second learnable parameter matrix to obtain a second product; the second product is then sparsified and right-multiplied by the third learnable parameter matrix to obtain the channel attention feature map. The second and third learnable parameter matrices learn attention circulation information in the channel dimension: by capturing feature relationships between different channels, the weight of each channel is learned, making the features more discriminative for each channel's information.
In step S305, an attention flow feature map is obtained according to the spatial attention feature map and the channel attention feature map.
Specifically, when acquiring the attention circulation feature map from the spatial attention feature map and the channel attention feature map, the spatial attention feature map may first be nonlinearly mapped to obtain a third intermediate feature map; a fourth intermediate feature map is then obtained from the product of the third intermediate feature map and the channel attention feature map; and the fourth intermediate feature map is nonlinearly mapped to obtain the attention circulation feature map. An attention circulation feature map obtained in this way learns attention circulation information in both the spatial and channel dimensions, enhancing the accuracy of attention circulation in both.
The following is a detailed description of steps S301 to S305:
In step S301, assume the input feature map f_1 has dimensions (N, C, H, W). The two dimensions H and W of f_1 are flattened, transforming the dimensions to (N, C, R) with R = H × W and yielding the first intermediate feature map.
To learn attention in the H × W dimension of the features, so that attention circulates in the spatial dimension, the disclosed embodiment introduces a first learnable parameter matrix Q_1 of dimensions (R, r), where r < R.
In step S302, the first intermediate feature map obtained after the dimension transformation is right-multiplied by Q_1 to obtain a tensor f'_1 of dimensions (N, C, r). A softmax over the r dimension of f'_1 gives a tensor A_s of the same dimensions (N, C, r). Multiplying the corresponding elements of f'_1 and A_s along the r dimension gives their Hadamard product, a matrix M_1 of size (N, C, r); M_1 represents a fusion of various feature combinations, and the larger r is, the higher the complexity. Averaging M_1 over the r dimension compresses that dimension to 1 and yields the second intermediate feature map f̄_1 of dimensions (N, C). The calculation is shown in formula (1):

f̄_1 = avg_r(M_1) = avg_r(f'_1 ⊙ softmax_r(f'_1)),  with f'_1 = flatten(f_1) Q_1    (1)
In the disclosed embodiment, the first learnable parameter matrix Q_1 is introduced in order to compute r spatially linear-transformed results, from which representative feature combinations in space can be extracted. In the extracted face feature map, although every spatial pixel has the same receptive field, the regions of the original image those receptive fields map to differ, and their contributions to the final recognition task differ, so different pixels should be given different weights. Using the first learnable parameter matrix Q_1, attention in the H × W dimension of the features can be learned, so that attention circulates in the spatial dimension and a fusion of various feature combinations is obtained.
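Steps S301 and S302 can be sketched as follows (NumPy; the concrete shapes and the small rank r are illustrative assumptions):

```python
import numpy as np

def spatial_descriptor(x, Q1):
    """Flatten H, W into R, project with Q1 (R, r), softmax over r,
    Hadamard-multiply, and average over r to get an (N, C) descriptor."""
    N, C, H, W = x.shape
    f = x.reshape(N, C, H * W)                 # first intermediate map: (N, C, R)
    fp = f @ Q1                                # f'_1: (N, C, r)
    a = np.exp(fp - fp.max(axis=-1, keepdims=True))
    a = a / a.sum(axis=-1, keepdims=True)      # A_s: softmax over the r dimension
    return (fp * a).mean(axis=-1)              # avg over r of M_1 = f'_1 * A_s -> (N, C)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4, 5, 5))          # (N, C, H, W), so R = 25
Q1 = rng.standard_normal((25, 6)) * 0.1        # r = 6 < R, a hypothetical choice
desc = spatial_descriptor(x, Q1)               # (N, C) second intermediate map
```

Each of the r columns of Q1 is one learned linear combination of spatial positions; the softmax-weighted average then fuses those r combinations into one value per channel.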
In step S303, the second intermediate feature map f̄_1 output by step S302 is multiplied with f_1 to obtain the spatial attention feature map f_1^s, of dimensions (N, C, H, W). The calculation is shown in formula (2):

f_1^s = f̄_1 · f_1  (broadcast over H and W)    (2)

where f_1^s is the feature map with spatial attention fused in.
In step S304, the spatial attention feature map of dimensions (N, C, H, W) is processed with a second learnable parameter matrix Q_2 and a third learnable parameter matrix Q_3 to obtain the channel attention feature map f̂_1. Specifically, Q_2 has dimensions (C, C//p) and Q_3 has dimensions (C//p, C), where p is a natural number. The first dimension of Q_2 thus equals the second dimension of Q_3, and the first dimension of Q_3 equals the second dimension of Q_2. The (N, C) representation v from the spatial attention branch is right-multiplied by Q_2 to obtain a tensor of dimensions (N, C//p), sparsified with relu, and right-multiplied by Q_3 to obtain the channel attention feature map f̂_1 of dimensions (N, C). The calculation is shown in formula (3):

f̂_1 = relu(v Q_2) Q_3    (3)

Introducing Q_2 and Q_3 allows attention circulation information in the channel dimension to be learned; this part of the design focuses on the feature relationships between channels, and by capturing the feature relationships between different channels it learns the weight of each channel, making the features more discriminative for each channel's information. p is a scaling coefficient; choosing p appropriately reduces computation and controls model size.
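The bottleneck of formula (3) can be sketched as follows (NumPy; the (N, C) input descriptor and p = 2 are illustrative assumptions):

```python
import numpy as np

def channel_attention(v, Q2, Q3):
    """v: (N, C) descriptor. Q2 (C, C//p) compresses, relu sparsifies,
    Q3 (C//p, C) restores, yielding per-channel attention of shape (N, C)."""
    h = np.maximum(v @ Q2, 0.0)   # relu sparsification, shape (N, C//p)
    return h @ Q3                 # back to (N, C)

rng = np.random.default_rng(0)
C, p = 8, 2                        # p is the scaling coefficient from the text
v = rng.standard_normal((3, C))
Q2 = rng.standard_normal((C, C // p))
Q3 = rng.standard_normal((C // p, C))
att = channel_attention(v, Q2, Q3)
# Parameter count: 2 * C * (C // p), i.e. (2/p) * C*C, so p directly trades
# model size and computation against capacity, as the text notes.
```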
The spatial attention feature map f_1^s is then passed through a nonlinear mapping to obtain a third intermediate feature map f_s, as shown in formulas (4) and (5):

f_s = φ(f_1^s),  with φ applied channel by channel    (4)(5)

where i denotes the i-th channel: the nonlinear mapping is performed channel by channel on the feature map, the nonlinear mapping function of each channel may differ, and the mapping parameters ε_i and k_i of each channel must be learned.
When data is processed with this nonlinear mapping, negative inputs are not simply mapped to 0 as with relu; both positive and negative responses of a convolution kernel are accepted, that is, the face model is allowed to learn from negative inputs. Applying such a nonlinear mapping can capture more complex relationships in the data. Second, it is beneficial to learn the mapping values depth by depth, that is, to perform channel-independent weight learning, which can be regarded as a form of attention learning across channels and enhances the accuracy of attention circulation between channels. In addition, with the channel-by-channel mapping, the nonlinearity gradually strengthens as depth increases: the model tends to preserve information in shallow layers and to enhance discriminative power in deep layers. This matches the common observation that low-level feature maps have high resolution and rich spatial information but weak semantics, while high-level feature maps have low resolution but strong semantics.
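As an assumed illustration only (the exact per-channel form of formulas (4) and (5) is not reproduced here), a mapping that keeps negative responses, unlike relu, and learns ε_i and k_i per channel could look like:

```python
import numpy as np

def channelwise_nonlinear(x, eps, k):
    """Hypothetical per-channel mapping: identity for x >= 0, an affine
    branch eps_i * x + k_i for x < 0, so negative responses are retained
    rather than zeroed. eps and k are the learnable per-channel parameters
    the text describes; the patent's exact formulas may differ."""
    e = eps[None, :, None, None]
    b = k[None, :, None, None]
    return np.where(x >= 0.0, x, e * x + b)

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 3, 4, 4))
eps = np.array([0.25, 0.5, 1.0])   # one slope per channel
k = np.zeros(3)                    # one offset per channel
y = channelwise_nonlinear(x, eps, k)
```

With per-channel parameters, each channel learns its own response to negative inputs, which is the channel-independent weight learning the text describes.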
Further, f_s is multiplied with the channel attention feature map f̂_1 to obtain the fourth intermediate feature map f_c, of dimensions (N, C, H, W). The calculation is shown in formula (6):

f_c = f_s · f̂_1  (broadcast over H and W)    (6)

To further enhance the expressive power of the features, the fourth intermediate feature map f_c is passed through another channel-by-channel nonlinear mapping with learnable parameters to obtain the attention circulation feature map f_C, as shown in formulas (7) and (8):

f_C = φ'(f_c),  with φ' applied channel by channel    (7)(8)

f_C represents a feature map in which attention has circulated sufficiently in both the spatial and channel directions, so that the attention flow of interest spans the entire feature space.
From the above, f_C has dimensions (N, C, H, W), consistent with the input feature map f_1, so the attention circulation technique can be inserted as a plug-and-play module into any module and any position of a neural network, and its usage is flexible. The attention circulation technique mainly performs more effective feature fusion through the circulation of attention between space and channels, and enhances feature expression capability through channel-by-channel learnable nonlinear mappings that accept both positive and negative responses, so that more discriminative face features can be extracted. If this attention circulation technique is defined as a function SC with input f_1 and output f_C, the following formula (9) is obtained:

f_C = SC(f_1)    (9)
In the disclosed embodiments, an attention circulation module may be built from this attention circulation technique as a basic building block of a neural network. Through a refined convolution module design tailored to the structural particularity of the face, the module extracts strongly discriminative face features with minimal computation and effectively focuses the feature map's attention on regions that benefit the recognition task.
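Putting steps S301 to S305 together, a shape-preserving SC sketch might read as follows (NumPy; feeding the (N, C) spatial descriptor into the channel branch and the per-channel nonlinear form are assumptions, as are all concrete shapes):

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sc(x, Q1, Q2, Q3, eps_s, k_s, eps_c, k_c):
    """Attention circulation SC: input and output both (N, C, H, W)."""
    N, C, H, W = x.shape
    # S301/S302: spatial branch -> (N, C) descriptor (formula (1))
    f = x.reshape(N, C, H * W)
    fp = f @ Q1                                  # (N, C, r)
    fbar = (fp * softmax(fp)).mean(axis=-1)      # (N, C)
    # S303: spatial attention map (formula (2)), broadcast over H and W
    fs_map = x * fbar[:, :, None, None]          # (N, C, H, W)
    # S304: channel attention (formula (3)); using fbar as input is an assumption
    fhat = np.maximum(fbar @ Q2, 0.0) @ Q3       # (N, C)
    # S305: assumed per-channel nonlinear mappings and fusion (formulas (4)-(8))
    def phi(t, eps, k):
        return np.where(t >= 0.0, t,
                        eps[None, :, None, None] * t + k[None, :, None, None])
    f_s = phi(fs_map, eps_s, k_s)                # third intermediate map
    f_c = f_s * fhat[:, :, None, None]           # fourth intermediate map
    return phi(f_c, eps_c, k_c)                  # f_C, same shape as x

rng = np.random.default_rng(0)
N, C, H, W, r, p = 2, 8, 6, 6, 4, 2
x = rng.standard_normal((N, C, H, W))
out = sc(
    x,
    rng.standard_normal((H * W, r)) * 0.1,
    rng.standard_normal((C, C // p)),
    rng.standard_normal((C // p, C)),
    np.full(C, 0.25), np.zeros(C),
    np.full(C, 0.25), np.zeros(C),
)
```

Because `out` has the same shape as `x`, the block can be dropped into a network at any position, which is the plug-and-play property the text claims.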
When the attention circulation module is applied in step S201 to step S204, the implementation process of step S201 to step S204 may be detailed as follows:
In step S202, depth-wise convolution processing may be performed on the first feature map, and batch normalization may be performed on the depth-wise convolution result to obtain the second feature map. Specifically, a depth-wise convolution (DWConv) with an n × n kernel (n > 1), C input channels, C output channels, a padding of 1 and a stride of s may be performed, followed by batch normalization (BatchNorm, BN for short) to obtain the result f'_1. Taking n = 3 as an example, the specific calculation process is shown in the following formula (10):

f'_1 = BN(DWConv(f_1, 3×3))  (10)
The stride varies with the network design and is a configurable hyper-parameter. In the embodiment of the disclosure, following the idea of designing a compact module, depth-wise convolution is adopted in place of ordinary convolution to reduce the parameter count; the parameter count of the depth-wise convolution can be calculated to be 1/C of that of ordinary convolution. It should be noted that the 3 × 3 convolution may be replaced with a 5 × 5 or 7 × 7 kernel, but the 3 × 3 convolution is the most cost-effective.
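The depth-wise convolution and batch normalization of formula (10) can be sketched in PyTorch as follows, assuming C = 64 and s = 1; the sketch also checks the stated 1/C parameter ratio between depth-wise and ordinary convolution.

```python
import torch
import torch.nn as nn

C = 64
# groups=C makes each kernel operate on exactly one channel (depth-wise)
dwconv = nn.Conv2d(C, C, kernel_size=3, stride=1, padding=1,
                   groups=C, bias=False)
bn = nn.BatchNorm2d(C)

f1 = torch.randn(1, C, 56, 56)
f1p = bn(dwconv(f1))            # f'_1 = BN(DWConv(f_1, 3x3))
assert f1p.shape == f1.shape

# parameter comparison: depth-wise uses 1/C the parameters of ordinary conv
ordinary = nn.Conv2d(C, C, 3, padding=1, bias=False)
dw_params = sum(p.numel() for p in dwconv.parameters())      # C * 3 * 3
full_params = sum(p.numel() for p in ordinary.parameters())  # C * C * 3 * 3
assert full_params == C * dw_params
```

The final assertion confirms the 1/C parameter saving claimed above.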
In step S203, the output f'_1 of step S202 is subjected to the attention circulation calculation described above to obtain f''_1. The specific calculation process is shown in the following formula (11):

f''_1 = SC(f'_1)  (11)
In step S204, the channel-increasing convolution processing includes: performing a convolution that increases the number of channels by a factor of N on the input feature map, and performing batch normalization on the convolution result, where N is a natural number. The channel-reducing convolution processing includes: performing a convolution that reduces the number of channels to 1/N on the input feature map, and performing batch normalization on the convolution result. Specifically, in step S204, the following steps may be performed in order:
the output f''_1 of step S203 is subjected to a convolution (Conv) with a 1 × 1 kernel, C input channels, C × expansion output channels and a stride of 1, followed by batch normalization, to obtain the result f_2. The specific calculation process is shown in the following formula (12):

f_2 = BN(Conv(f''_1, 1×1))  (12)
f_2 is then subjected to the attention circulation calculation described above to obtain f'_2. The specific calculation process is shown in the following formula (13):

f'_2 = SC(f_2)  (13)
f'_2 is subjected to a convolution with a 1 × 1 kernel, C × expansion input channels, C output channels and a stride of 1, followed by batch normalization, to obtain the result f_3. The specific calculation process is shown in the following formula (14):

f_3 = BN(Conv(f'_2, 1×1))  (14)
finally, f_3 is subjected to the attention circulation calculation described above to obtain f'_3. The specific calculation process is shown in the following formula (15):

f'_3 = SC(f_3)  (15)
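The chain of formulas (10)–(15) can be sketched as one module in PyTorch. Here the SC function is stubbed out as an identity so that only the data flow and tensor shapes are illustrated; the argument names (expansion, stride) are illustrative, not terms fixed by this disclosure.

```python
import torch
import torch.nn as nn

class SCStub(nn.Module):
    """Placeholder for the attention circulation (SC) function."""
    def forward(self, x):
        return x

class AttentionCirculationModule(nn.Module):
    def __init__(self, c: int, expansion: int = 2, stride: int = 1):
        super().__init__()
        self.dw = nn.Conv2d(c, c, 3, stride=stride, padding=1,
                            groups=c, bias=False)              # depth-wise conv
        self.bn1 = nn.BatchNorm2d(c)
        self.sc1 = SCStub()
        self.expand = nn.Conv2d(c, c * expansion, 1, bias=False)  # channel-increasing
        self.bn2 = nn.BatchNorm2d(c * expansion)
        self.sc2 = SCStub()
        self.reduce = nn.Conv2d(c * expansion, c, 1, bias=False)  # channel-reducing
        self.bn3 = nn.BatchNorm2d(c)
        self.sc3 = SCStub()

    def forward(self, f1):
        f1p = self.bn1(self.dw(f1))        # (10)  f'_1 = BN(DWConv(f_1))
        f1pp = self.sc1(f1p)               # (11)  f''_1 = SC(f'_1)
        f2 = self.bn2(self.expand(f1pp))   # (12)  f_2 = BN(Conv 1x1, C -> C*exp)
        f2p = self.sc2(f2)                 # (13)  f'_2 = SC(f_2)
        f3 = self.bn3(self.reduce(f2p))    # (14)  f_3 = BN(Conv 1x1, C*exp -> C)
        return self.sc3(f3)                # (15)  f'_3 = SC(f_3), target feature map

m = AttentionCirculationModule(64, expansion=2, stride=2)
y = m(torch.randn(1, 64, 56, 56))
assert y.shape == (1, 64, 28, 28)  # stride 2 halves the spatial resolution
```

Note how the module restores the channel count to C at the output, so it can be stacked freely, as in the network of fig. 4 below.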
The embodiment of the disclosure provides a lightweight attention circulation module designed specifically for face recognition. Its convolution design and its linear and nonlinear mappings follow two principles: first, reduce network parameters, save computation and increase running speed; second, perform more effective feature fusion in the spatial and channel dimensions, enhance feature expression capability, and extract more discriminative face features.
The base channel number of the attention circulation module in the embodiment of the present disclosure can be set to 64; the tensor dimensions of the module are thus very low, the convolution cost on low-dimensional tensors is very small, and a relatively high overall running speed can be achieved. However, if the whole network extracted features only in a low-dimensional space, the information could be incomplete and the features unreliable. In the embodiment of the disclosure, the number of channels is therefore expanded by a set expansion coefficient in the intermediate convolution, which improves the feature extraction capability of the whole module and strikes a delicate balance between computation and feature expression capability.
In the embodiment of the disclosure, through the combination of different types of convolution, expansion and compression of the channel number, and the attention circulation technique, the attention circulation module lets the attention flow relevant to the face recognition task circulate and transform between space and channels, making feature fusion more efficient and finally focusing the feature map effectively on the regions of interest for face recognition. In addition, the module has few parameters, a small computation cost and high speed.
As shown in fig. 4, a face recognition method provided in the embodiments of the present disclosure includes the following steps:
Step S401: the face image to be recognized is input into a convolution layer and normalization layer with a 3 × 3 kernel, 64 channels and a stride of 1. In one embodiment, the resolution of the face image to be recognized is (1, 3, 112, 112). The resolution of the feature map output in step S401 is (1, 64, 112, 112).
Step S402: the feature map obtained in the previous step is input into 1 attention circulation module with a base channel number of 64, an expansion coefficient of 1 and a stride of 2. The resolution of the feature map output in step S402 is (1, 64, 56, 56).
Step S403: the feature map obtained in the previous step is input into 1 attention circulation module with a base channel number of 64, an expansion coefficient of 1 and a stride of 1. The resolution of the feature map output in step S403 is (1, 64, 56, 56).
Step S404: the feature map obtained in the previous step is input into 1 attention circulation module with a base channel number of 64, an expansion coefficient of 2 and a stride of 2. The resolution of the feature map output in step S404 is (1, 64, 28, 28).
Step S405: the feature map obtained in the previous step is input into 4 attention circulation modules with a base channel number of 64, an expansion coefficient of 2 and a stride of 1. The resolution of the feature map output in step S405 is (1, 64, 28, 28).
Step S406: the feature map obtained in the previous step is input into 1 attention circulation module with a base channel number of 128, an expansion coefficient of 2 and a stride of 2. The resolution of the feature map output in step S406 is (1, 128, 14, 14).
Step S407: the feature map obtained in the previous step is input into 6 attention circulation modules with a base channel number of 128, an expansion coefficient of 2 and a stride of 1. The resolution of the feature map output in step S407 is (1, 128, 14, 14).
Step S408: the feature map obtained in the previous step is input into 1 attention circulation module with a base channel number of 128, an expansion coefficient of 2 and a stride of 2. The resolution of the feature map output in step S408 is (1, 128, 7, 7).
Step S409: the feature map obtained in the previous step is input into 2 attention circulation modules with a base channel number of 128, an expansion coefficient of 2 and a stride of 1. The resolution of the feature map output in step S409 is (1, 128, 7, 7).
Step S410: the feature map obtained in the previous step is input into a convolution layer and normalization layer with a 1 × 1 kernel and 512 channels. The resolution of the feature map output in step S410 is (1, 512, 7, 7).
Step S411: the feature map obtained in the previous step is input into a convolution layer and normalization layer with a 7 × 7 kernel and 512 channels. The resolution of the feature map output in step S411 is (1, 512, 1, 1).
Step S412: after the feature map obtained in the previous step is flattened, a (512, 512) fully connected matrix calculation is performed to obtain a 512-dimensional vector as the target feature map.
In the face recognition method shown in fig. 4, steps S402 and S403 may be regarded as one stage, steps S404 and S405 as one stage, steps S406 and S407 as one stage, and steps S408 and S409 as one stage; the numbers of attention circulation modules contained in the four stages are (2, 5, 7, 3) respectively. This combination of attention circulation modules is merely exemplary, and the technical effect of the technical solution of the embodiments of the present disclosure can also be achieved with other combinations of attention circulation modules.
The technical solution of the embodiment of the disclosure provides a general attention circulation technique that effectively captures attention in space and in channels respectively, and improves feature discrimination through a channel-by-channel learnable nonlinear mapping; the technique as a whole can extract effective feature combinations, promoting the circulation of attention across multiple dimensions.
According to the face recognition method of the embodiment of the disclosure, feature map processing for face recognition is performed through the combination of convolution processing and attention circulation processing, promoting the circulation of attention across multiple dimensions, so that the final feature map is more discriminative in every dimension and the recognition accuracy of the face recognition model is improved.
The following are embodiments of the disclosed apparatus that may be used to perform embodiments of the disclosed methods. The face recognition apparatus described below and the face recognition method described above may be referred to in correspondence with each other. For details not disclosed in the embodiments of the apparatus of the present disclosure, refer to the embodiments of the method of the present disclosure.
Fig. 5 is a schematic diagram of a face recognition apparatus according to an embodiment of the present disclosure. As shown in fig. 5, the face recognition apparatus includes:
the obtaining module 501 may be configured to obtain a first feature map of a face image to be recognized.
Specifically, the first feature map is a 4-dimensional tensor having dimensions (N, C, H, W), where N represents the number of batch images, C represents the number of channels, H represents the height, and W represents the width. The first feature map is obtained by extracting features of the face image to be recognized.
The convolution module 502 may be configured to perform depth-by-depth convolution processing on the first feature map to obtain a second feature map.
Specifically, the depth-wise convolution performs the convolution operation within each individual channel: in conventional convolution each kernel computes over every channel, whereas in depth-wise convolution each kernel computes over only one channel.
The attention circulation module 503 may be configured to perform attention circulation processing on the second feature map to obtain a third feature map.
In particular, the attention-flow process may cause attention to flow between space and channels for more efficient feature fusion.
The hybrid processing module 504 may be configured to perform convolution processing of increasing channels, attention circulation processing, convolution processing of decreasing channels, and attention circulation processing on the third feature map in sequence to obtain a target feature map corresponding to the first feature map.
Specifically, the convolution processing for increasing the channels and the convolution processing for decreasing the channels are two corresponding conventional convolution calculation processes, the convolution processing for increasing the channels is performed first to increase the number of the channels, and then the convolution processing for decreasing the channels is performed to restore the number of the channels to the previous number.
According to the technical solution of the embodiment of the disclosure, the attention circulation processing can extract effective feature combinations and promote the circulation of attention across multiple dimensions. Through the design and combination of the attention circulation technique with different types of convolution, the requirements of the face recognition task and the lightweight requirements of embedded devices can be met simultaneously; compared with the prior art, higher recognition accuracy can be achieved with fewer parameters.
In this embodiment of the present disclosure, the attention circulation module 503 may be further configured to flatten a first dimension and a second dimension of the input feature map to obtain a first intermediate feature map; obtain a second intermediate feature map according to the first intermediate feature map and a first learnable parameter matrix; obtain a spatial attention feature map according to the product of the second intermediate feature map and the input feature map; obtain a channel attention feature map according to a second learnable parameter matrix, a third learnable parameter matrix and the spatial attention feature map, wherein a first dimension of the second learnable parameter matrix is equal to a second dimension of the third learnable parameter matrix, and a first dimension of the third learnable parameter matrix is equal to a second dimension of the second learnable parameter matrix; and obtain an attention circulation feature map according to the spatial attention feature map and the channel attention feature map.
In the technical solution of the embodiment of the present disclosure, a first product of the first intermediate feature map and its logistic regression (softmax) function value may be obtained, and the second intermediate feature map may then be obtained from the mean of the first product. Specifically, the first intermediate feature map may be right-multiplied by the first learnable parameter matrix to obtain a tensor; the Hadamard product of that tensor and its softmax function value is then computed to obtain a matrix, and the matrix is averaged along a certain dimension to obtain the second intermediate feature map.
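The computation just described can be sketched numerically as follows, with illustrative tensor sizes (the value of r, the number of spatial linear transformation results, is an assumption):

```python
import torch
import torch.nn.functional as F

N, C, H, W = 2, 4, 8, 8
r = 6
f1 = torch.randn(N, C, H, W).flatten(2)   # first intermediate map: (N, C, H*W)
Q1 = torch.randn(H * W, r)                # first learnable parameter matrix
t = f1 @ Q1                               # right-multiply by Q1: (N, C, r)
first_product = F.softmax(t, dim=-1) * t  # Hadamard product with its softmax
second = first_product.mean(dim=-1)       # average over the last dimension
assert second.shape == (N, C)             # second intermediate feature map
```

Broadcasting the (N, C) result over the spatial dimensions of the input then yields the spatial attention feature map described above.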
Specifically, the spatial attention feature map is a feature map with fused spatial attention. The first learnable parameter matrix may learn attention flow information in a spatial dimension. The second learnable parameter matrix and the third learnable parameter matrix can learn attention circulation information on channel dimensions, and the weight of each channel is learned by capturing feature relations among different channels, so that features are more discriminative for information of each channel. According to the attention circulation feature map obtained from the space attention feature map and the channel attention feature map, attention circulation information in the space dimension and the channel dimension can be learned, and therefore accuracy of attention circulation in the space dimension and the channel dimension can be enhanced.
In this embodiment of the present disclosure, the attention diverting module 503 may be further configured to perform a nonlinear mapping process on the spatial attention feature map to obtain a third intermediate feature map; obtaining a fourth intermediate feature map according to the product of the third intermediate feature map and the channel attention feature map; and carrying out nonlinear mapping processing on the fourth intermediate characteristic diagram to obtain an attention circulation characteristic diagram.
In the disclosed embodiment, applying such a nonlinear mapping makes it possible to learn more complex relationships in the data. Learning the mapping values depth by depth, i.e. performing channel-independent weight learning, can be regarded as a way of learning attention between different channels, and it enhances the accuracy of attention circulation between channels. In addition, with the channel-by-channel mapping, the nonlinear mapping gradually becomes more "nonlinear" as depth increases: the model tends to preserve information in the shallow layers and enhance discriminative power in the deep layers. This matches the common observation that low-level feature maps have high resolution and rich spatial information but weak semantics, while high-level feature maps have low resolution but strong semantic information.
In the embodiment of the present disclosure, the attention circulation module 503 may be further configured to obtain a first product of the first intermediate feature map and its logistic regression function value, and to obtain the second intermediate feature map according to the mean of the first product.
In this embodiment of the present disclosure, the attention circulation module 503 may be further configured to right-multiply the spatial attention feature map by the second learnable parameter matrix to obtain a second product, and to sparsify the second product and right-multiply it by the third learnable parameter matrix to obtain the channel attention feature map.
In the disclosed embodiment, the first learnable parameter matrix Q_1 is introduced to compute r spatial linear transformation results, from which representative spatial feature combinations can be extracted. In the extracted face feature map, although every spatial pixel has the same receptive field, those receptive fields map to different regions of the original image and contribute differently to the final recognition task, so different pixels should be given different weights. Using the first learnable parameter matrix Q_1, attention over the H × W dimensions of the features can be learned, so that attention circulates in the spatial dimension and a fusion of multiple feature combinations is obtained. The second learnable parameter matrix Q_2 and the third learnable parameter matrix Q_3 are introduced to learn attention circulation information in the channel dimension; this design focuses on the feature relationships between channels, and by capturing the relationships between different channels the weight of each channel is learned, making the features more discriminative with respect to each channel's information.
In this embodiment of the disclosure, the hybrid processing module 504 may be further configured such that the channel-increasing convolution processing includes: performing a convolution that increases the number of channels by a factor of N on the input feature map, and performing batch normalization on the convolution result, where N is a natural number; and the channel-reducing convolution processing includes: performing a convolution that reduces the number of channels to 1/N on the input feature map, and performing batch normalization on the convolution result.
In this embodiment of the present disclosure, the convolution module 502 may be further configured to perform depth-by-depth convolution processing on the first feature map, and perform batch normalization processing on the depth-by-depth convolution result to obtain a second feature map.
The embodiment of the disclosure provides a lightweight attention circulation module designed specifically for face recognition. Its convolution design and its linear and nonlinear mappings follow two principles: first, reduce network parameters, save computation and increase running speed; second, perform more effective feature fusion in the spatial and channel dimensions, enhance feature expression capability, and extract more discriminative face features.
The base channel number of the attention circulation module in the embodiment of the present disclosure can be set to 64; the tensor dimensions of the module are thus very low, the convolution cost on low-dimensional tensors is very small, and a relatively high overall running speed can be achieved. However, if the whole network extracted features only in a low-dimensional space, the information could be incomplete and the features unreliable. In the embodiment of the disclosure, the number of channels is therefore expanded by a set expansion coefficient in the intermediate convolution, which improves the feature extraction capability of the whole module and strikes a delicate balance between computation and feature expression capability.
In the embodiment of the disclosure, through the combination of different types of convolution, expansion and compression of the channel number, and the attention circulation technique, the attention circulation module lets the attention flow relevant to the face recognition task circulate and transform between space and channels, making feature fusion more efficient and finally focusing the feature map effectively on the regions of interest for face recognition. In addition, the module has few parameters, a small computation cost and high speed.
The technical solution of the embodiment of the disclosure provides a general attention circulation technique that effectively captures attention in space and in channels respectively, improves feature discrimination through a channel-by-channel learnable nonlinear mapping, and can extract effective feature combinations, promoting the circulation of attention across multiple dimensions.
As each functional module of the face recognition apparatus in the exemplary embodiment of the present disclosure corresponds to the step of the exemplary embodiment of the face recognition method, please refer to the embodiment of the face recognition method in the present disclosure for details that are not disclosed in the embodiment of the apparatus in the present disclosure.
According to the face recognition apparatus of the embodiment of the disclosure, feature map processing for face recognition is performed through the combination of convolution processing and attention circulation processing, promoting the circulation of attention across multiple dimensions, so that the final feature map is more discriminative in every dimension and the recognition accuracy of the face recognition model is improved.
Fig. 6 is a schematic diagram of an electronic device 6 provided by an embodiment of the present disclosure. As shown in fig. 6, the electronic apparatus 6 of this embodiment includes: a processor 601, a memory 602, and a computer program 603 stored in the memory 602 and executable on the processor 601. The steps in the various method embodiments described above are implemented when the processor 601 executes the computer program 603. Alternatively, the processor 601 implements the functions of the respective modules in the above-described respective apparatus embodiments when executing the computer program 603.
The electronic device 6 may be a desktop computer, a notebook, a palm computer, a cloud server, or another electronic device. The electronic device 6 may include, but is not limited to, the processor 601 and the memory 602. Those skilled in the art will appreciate that fig. 6 is merely an example of the electronic device 6 and does not constitute a limitation of it; the device may include more or fewer components than those shown, or different components.
The Processor 601 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, or the like.
The memory 602 may be an internal storage unit of the electronic device 6, for example, a hard disk or memory of the electronic device 6. The memory 602 may also be an external storage device of the electronic device 6, for example, a plug-in hard disk provided on the electronic device 6, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), and the like. The memory 602 may also include both internal and external storage units of the electronic device 6. The memory 602 is used for storing computer programs and other programs and data required by the electronic device.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules, so as to perform all or part of the functions described above. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit.
The integrated module, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in a computer-readable storage medium. Based on such understanding, the present disclosure may implement all or part of the flow of the methods in the above embodiments through a computer program instructing related hardware; the computer program may be stored in a computer-readable storage medium, and when executed by a processor, implements the steps of the above method embodiments. The computer program may comprise computer program code, which may be in the form of source code, object code, an executable file or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so on. It should be noted that the contents of the computer-readable medium may be appropriately added to or deleted from according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer-readable media may not include electrical carrier signals or telecommunications signals.
The above examples are only intended to illustrate the technical solutions of the present disclosure, not to limit them; although the present disclosure has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present disclosure, and are intended to be included within the scope of the present disclosure.

Claims (10)

1. A face recognition method, comprising:
acquiring a first feature map of a face image to be recognized;
carrying out depth-by-depth convolution processing on the first feature map to obtain a second feature map;
performing attention circulation processing on the second feature map to obtain a third feature map;
and sequentially performing convolution processing of increasing channels, attention circulation processing, convolution processing of reducing channels and attention circulation processing on the third feature map to obtain a target feature map corresponding to the first feature map.
2. The method of claim 1, wherein the attention diversion process comprises:
flattening a first dimension and a second dimension of the input feature map to obtain a first intermediate feature map;
acquiring a second intermediate feature map according to the first intermediate feature map and a first learnable parameter matrix;
acquiring a spatial attention feature map according to the product of the second intermediate feature map and the input feature map;
acquiring a channel attention feature map according to a second learnable parameter matrix, a third learnable parameter matrix and the spatial attention feature map, wherein a first dimension of the second learnable parameter matrix is equal to a second dimension of the third learnable parameter matrix, and the first dimension of the third learnable parameter matrix is equal to the second dimension of the second learnable parameter matrix;
and acquiring an attention circulation feature map according to the spatial attention feature map and the channel attention feature map.
3. The method of claim 2, wherein obtaining an attention flow profile from the spatial attention profile and the channel attention profile comprises:
carrying out nonlinear mapping processing on the spatial attention feature map to obtain a third intermediate feature map;
obtaining a fourth intermediate feature map according to the product of the third intermediate feature map and the channel attention feature map;
and carrying out the nonlinear mapping processing on the fourth intermediate feature map to obtain the attention circulation feature map.
4. The method of claim 2, wherein obtaining a second intermediate feature map from the first intermediate feature map and a first learnable parameter matrix comprises:
obtaining a first product of the first intermediate feature map and a logistic regression function value thereof;
and acquiring the second intermediate feature map according to the mean value of the first product.
5. The method of claim 2, wherein acquiring the channel attention feature map according to the second learnable parameter matrix, the third learnable parameter matrix and the spatial attention feature map comprises:
multiplying the spatial attention feature map by the second learnable parameter matrix to obtain a second product;
and performing sparsification processing on the second product and right-multiplying the result by the third learnable parameter matrix to obtain the channel attention feature map.
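For illustration only, claim 5 can be sketched as two matrix products with a sparsifying step between them. The claim does not define the "sparsification processing"; zeroing negative entries (ReLU) is assumed here. With W2 of shape (c, r) and W3 of shape (r, c), claim 2's dimension constraint is satisfied and the output keeps the input's channel width.

```python
import numpy as np

def sparsify(x):
    # assumed sparsification: zero out negative entries (ReLU-style)
    return np.maximum(x, 0.0)

def channel_attention(spatial_attn, w2, w3):
    # Claim 5: multiply by W2, sparsify, then right-multiply by W3.
    second_product = spatial_attn @ w2
    return sparsify(second_product) @ w3
```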
6. The method of claim 1, wherein
the channel-increasing convolution processing comprises: performing convolution processing that increases the number of channels by a factor of N on the input feature map, and performing batch normalization processing on the convolution result, where N is a natural number;
the channel-reducing convolution processing comprises: performing convolution processing that reduces the number of channels to 1/N on the input feature map, and performing batch normalization processing on the convolution result.
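For illustration only, claim 6's channel-increasing and channel-reducing convolutions can be sketched with 1x1 (pointwise) convolutions followed by batch normalization. The kernel size is not fixed by the claim, so 1x1 is an assumption; random weights stand in for learned parameters, and an inference-style batch normalization without learned scale/shift is used.

```python
import numpy as np

def conv1x1(x, weight):
    # x: (C_in, H, W); weight: (C_out, C_in). A 1x1 convolution is a
    # per-pixel linear map over the channel axis.
    c_in, h, w = x.shape
    return (weight @ x.reshape(c_in, -1)).reshape(weight.shape[0], h, w)

def batch_norm(x, eps=1e-5):
    # per-channel normalization, inference-style, no learned scale/shift
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def expand_channels(x, n):
    # claim 6: channel count C -> N*C, then batch normalization
    c = x.shape[0]
    w = np.random.default_rng(0).standard_normal((c * n, c))
    return batch_norm(conv1x1(x, w))

def reduce_channels(x, n):
    # claim 6: channel count C -> C/N, then batch normalization
    c = x.shape[0]
    w = np.random.default_rng(1).standard_normal((c // n, c))
    return batch_norm(conv1x1(x, w))
```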
7. The method of claim 6, wherein obtaining the second feature map according to the depth-wise convolution processing of the first feature map comprises:
performing depth-wise convolution processing on the first feature map, and performing batch normalization processing on the depth-wise convolution result to obtain the second feature map.
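For illustration only, a depth-wise convolution applies one kernel per channel with no cross-channel mixing. This sketch uses 'valid' padding and the cross-correlation convention common in deep learning; real implementations usually pad to preserve spatial size.

```python
import numpy as np

def depthwise_conv(x, kernels):
    # x: (C, H, W); kernels: (C, k, k) -- one kernel per channel,
    # no cross-channel mixing ('valid' padding for brevity).
    c, h, w = x.shape
    k = kernels.shape[-1]
    out = np.zeros((c, h - k + 1, w - k + 1))
    for ch in range(c):
        for i in range(h - k + 1):
            for j in range(w - k + 1):
                out[ch, i, j] = np.sum(x[ch, i:i + k, j:j + k] * kernels[ch])
    return out
```

Per claim 7, the result would then be batch-normalized to obtain the second feature map.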
8. An apparatus for face recognition, the apparatus comprising:
an acquisition module, configured to acquire a first feature map of a face image to be recognized;
a convolution module, configured to perform depth-wise convolution processing on the first feature map to obtain a second feature map;
an attention flow module, configured to perform attention flow processing on the second feature map to obtain a third feature map;
and a mixed processing module, configured to sequentially perform channel-increasing convolution processing, attention flow processing, channel-reducing convolution processing and attention flow processing on the third feature map to obtain a target feature map corresponding to the first feature map.
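For illustration only, the module pipeline of claim 8 can be sketched as a class whose stages are injected as callables. The class and parameter names are illustrative, not from the patent; each stand-in would be replaced by the learned convolutions and the attention flow processing of claims 2 to 7.

```python
class FaceFeatureExtractor:
    # Sketch of the claimed apparatus; each module is an injected
    # callable stand-in. Names here are illustrative only.
    def __init__(self, depthwise, attention_flow, expand_conv, reduce_conv):
        self.depthwise = depthwise            # convolution module
        self.attention_flow = attention_flow  # attention flow module
        self.expand_conv = expand_conv        # channel-increasing convolution
        self.reduce_conv = reduce_conv        # channel-reducing convolution

    def forward(self, first_feature_map):
        second = self.depthwise(first_feature_map)   # convolution module
        third = self.attention_flow(second)          # attention flow module
        # mixed processing module: expand -> attention -> reduce -> attention
        x = self.expand_conv(third)
        x = self.attention_flow(x)
        x = self.reduce_conv(x)
        return self.attention_flow(x)                # target feature map
```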
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202211055844.7A 2022-08-31 2022-08-31 Face recognition method and device Pending CN115588218A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211055844.7A CN115588218A (en) 2022-08-31 2022-08-31 Face recognition method and device
PCT/CN2022/129343 WO2024045320A1 (en) 2022-08-31 2022-11-02 Facial recognition method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211055844.7A CN115588218A (en) 2022-08-31 2022-08-31 Face recognition method and device

Publications (1)

Publication Number Publication Date
CN115588218A true CN115588218A (en) 2023-01-10

Family

ID=84772610

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211055844.7A Pending CN115588218A (en) 2022-08-31 2022-08-31 Face recognition method and device

Country Status (2)

Country Link
CN (1) CN115588218A (en)
WO (1) WO2024045320A1 (en)


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10679085B2 (en) * 2017-10-31 2020-06-09 University Of Florida Research Foundation, Incorporated Apparatus and method for detecting scene text in an image
CN111582044B (en) * 2020-04-15 2023-06-20 华南理工大学 Face recognition method based on convolutional neural network and attention model
CN112766279B (en) * 2020-12-31 2023-04-07 中国船舶重工集团公司第七0九研究所 Image feature extraction method based on combined attention mechanism
CN114782403A (en) * 2022-05-17 2022-07-22 河南大学 Pneumonia image detection method and device based on mixed space and inter-channel attention

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117894058A (en) * 2024-03-14 2024-04-16 山东远桥信息科技有限公司 Smart city camera face recognition method based on attention enhancement
CN117894058B (en) * 2024-03-14 2024-05-24 山东远桥信息科技有限公司 Smart city camera face recognition method based on attention enhancement

Also Published As

Publication number Publication date
WO2024045320A1 (en) 2024-03-07

Similar Documents

Publication Publication Date Title
CN110263909B (en) Image recognition method and device
CN111369427B (en) Image processing method, image processing device, readable medium and electronic equipment
CN109214543B (en) Data processing method and device
CN111539353A (en) Image scene recognition method and device, computer equipment and storage medium
CN110490295B (en) Data processing method and processing device
CN115588218A (en) Face recognition method and device
CN114692085B (en) Feature extraction method and device, storage medium and electronic equipment
CN113033580A (en) Image processing method, image processing device, storage medium and electronic equipment
CN110717405B (en) Face feature point positioning method, device, medium and electronic equipment
CN114612531B (en) Image processing method and device, electronic equipment and storage medium
US20230281956A1 (en) Method for generating objective function, apparatus, electronic device and computer readable medium
CN113284206A (en) Information acquisition method and device, computer readable storage medium and electronic equipment
CN117373064A (en) Human body posture estimation method based on self-adaptive cross-dimension weighting, computer equipment and storage medium
CN112862095A (en) Self-distillation learning method and device based on characteristic analysis and readable storage medium
CN112330671A (en) Method and device for analyzing cell distribution state, computer equipment and storage medium
CN114700957B (en) Robot control method and device with low computational power requirement of model
CN115953803A (en) Training method and device for human body recognition model
CN115965520A (en) Special effect prop, special effect image generation method, device, equipment and storage medium
CN113139490B (en) Image feature matching method and device, computer equipment and storage medium
US20210224632A1 (en) Methods, devices, chips, electronic apparatuses, and storage media for processing data
CN116310615A (en) Image processing method, device, equipment and medium
CN116912631B (en) Target identification method, device, electronic equipment and storage medium
CN115630182A (en) Method and device for determining representative pictures in picture set
CN116910566B (en) Target recognition model training method and device
CN114708625A (en) Face recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination