CN116563909A - Human face recognition method of visual semantic interaction module based on fusion attention mechanism


Info

Publication number
CN116563909A (application CN202310243882.3A)
Authority
CN
China
Prior art keywords: face recognition, network model, visual, module, data set
Prior art date: 2023-03-15
Legal status
Pending
Application number
CN202310243882.3A
Other languages
Chinese (zh)
Inventor
庞志刚
王波
杨巨成
王伟
国英龙
陈燕
贾智洋
孙笑
徐振宇
魏峰
赵婷婷
王嫄
潘旭冉
Current Assignee
Baotou Yihui Information Technology Co ltd
Original Assignee
Baotou Yihui Information Technology Co ltd
Priority date: 2023-03-15
Filing date: 2023-03-15
Publication date: 2023-08-08
Application filed by Baotou Yihui Information Technology Co ltd
Priority to CN202310243882.3A
Publication of CN116563909A


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a face recognition method based on a visual semantic interaction module with a fused attention mechanism, comprising the following steps. S1: acquire a face dataset containing text descriptions. S2: construct a knowledge-guided face recognition network model using a visual semantic interaction module that fuses an attention mechanism. S3: initialize the face recognition network model built in step S2, select an optimizer, and set the network training parameters. S4: optimize the face recognition network model with the loss functions and save it. S5: load the optimal face recognition network model produced during training, acquire a test dataset, feed it into the network model, and generate the corresponding face recognition results. The method extracts visual features and text features separately, extracts visual knowledge more effectively through visual semantic interaction, and improves the accuracy of face recognition.

Description

Human face recognition method of visual semantic interaction module based on fusion attention mechanism
Technical Field
The invention belongs to the field of computer vision, and in particular relates to a face recognition method based on a visual semantic interaction module with a fused attention mechanism.
Background
In today's information age, accurately verifying personal identity and protecting information security has become a pressing social problem. Traditional identity credentials are easily forged or lost and increasingly fail to meet social needs; biometric recognition is currently the most convenient and secure solution. Compared with other biometric technologies such as fingerprint and iris recognition, face recognition is intuitive, contactless, easy to capture, and highly interactive and extensible, which has made it a very active research field.
Deep-learning-based face recognition is currently a flourishing research area. The pipeline mainly comprises face preprocessing, feature learning, and feature comparison. Feature learning is the key to face recognition, and recognition accuracy depends mainly on the network architecture and loss function adopted. Mainstream architectures include VGGNet, GoogLeNet, ResNet, and the like, and choosing a suitable loss function helps separate face images of different classes in feature space, improving recognition accuracy. Many scholars have therefore studied loss functions for face recognition, for example the ArcFace loss proposed by Deng et al. in 2019, the GroupFace loss proposed by Yonghyun Kim et al. in 2020, and the MagFace loss proposed by Meng et al. in 2021, which have significantly improved face recognition performance.
Although deep-learning-based face recognition has achieved significant improvements, challenges remain. First, performance under various extreme conditions is still unsatisfactory, for example under large age spans, large pose variation, extreme illumination, and low resolution. Second, training CNNs effectively requires large amounts of data, and once trained, a deep neural network model becomes an end-to-end regularized mapping function lacking interpretability and transparency. Driven by data alone and lacking the guidance of 'knowledge', a deep learning model cannot expose its decision-making reasoning during recognition, and its effectiveness and robustness still need improvement.
Introducing knowledge can effectively alleviate the problems faced by data-driven models, increasing model interpretability, reducing dependence on data volume, and improving robustness. Knowledge divides into explicit and implicit knowledge. Explicit knowledge generally includes word-sense interpretations, semantic relations, and the like from knowledge bases such as semantic dictionaries, semantic networks, and knowledge graphs, and is commonly called common-sense or world knowledge. Implicit knowledge refers to knowledge that is hard to state explicitly but benefits model understanding, such as sample properties, contextual information, causal relations, and mined features used during training; it generally takes the form of domain regularities and must be generated and applied in a specific scenario.
Some research has explored knowledge-guided deep learning models. For example, an approach proposed in 2017 combined images with text information, using encyclopedic text as external auxiliary information: the visual stream was a standard deep convolutional neural network, the text stream extracted text features through a combination of CNN and RNN and matched them against the visual features, and the two streams jointly drove fine-grained image classification. Face recognition has been a research hot spot at home and abroad in recent years; although researchers have proposed a large number of face recognition methods, knowledge-guided face recognition remains little studied and requires continued exploration and refinement.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by providing a face recognition method based on a visual semantic interaction module with a fused attention mechanism, so as to improve the accuracy and robustness of face recognition systems.
To achieve the above purpose, the technical solution of the invention is realized as follows:
A face recognition method based on a visual semantic interaction module with a fused attention mechanism comprises the following steps:
S1: acquire a face dataset containing text descriptions;
S2: construct a knowledge-guided face recognition network model using a visual semantic interaction module that fuses an attention mechanism, the model comprising three parts: text feature extraction, visual feature extraction, and visual semantic interaction;
S3: initialize the face recognition network model built in step S2, select an optimizer, and set the network training parameters;
S4: optimize the face recognition network model with the loss functions and save it;
S5: load the optimal face recognition network model produced during training, acquire a test dataset, feed it into the network model, and generate the corresponding face recognition results.
Further, in step S1 a training set and a test set are divided from the large-scale face image dataset Multi-Modal-CelebA-HQ; this dataset selects high-resolution face images from the CelebA dataset, each image having a corresponding text description.
Further, in step S2 the text feature extraction uses a long short-term memory (LSTM) network, the visual feature extraction uses a residual network (ResNet), and the visual semantic interaction consists of a modality transfer and several attention modules.
Further, step S3 builds the network model with the PyTorch framework, initializes the network model weights, selects a stochastic gradient descent (SGD) optimizer for training, and initializes the learning rate.
Further, in step S4 the visual feature extraction part adopts the MagFace loss function and the text feature extraction part adopts the cross-entropy loss function, the model being jointly optimized through the two loss functions.
The invention also provides a face recognition device based on a visual semantic interaction module with a fused attention mechanism, comprising:
a dataset module, which acquires a face dataset containing text descriptions;
a construction module, which constructs a knowledge-guided face recognition network model using a visual semantic interaction module that fuses an attention mechanism, the model comprising a text feature extraction module, a visual feature extraction module, and a visual semantic interaction module;
an initialization module, which initializes the constructed face recognition network model, selects an optimizer, and sets the network training parameters;
a loss function module, which optimizes the face recognition network model with the loss functions and saves it;
a recognition module, which loads the optimal face recognition network model produced during training, acquires a test dataset, feeds it into the network model, and generates the corresponding face recognition results.
Further, the dataset module divides a training set and a test set from the large-scale face image dataset Multi-Modal-CelebA-HQ; this dataset selects high-resolution face images from the CelebA dataset, each image having a corresponding text description.
Further, the text feature extraction module adopts a long short-term memory (LSTM) network, the visual feature extraction module adopts a residual network (ResNet), and the visual semantic interaction module consists of a modality transfer module and several attention modules.
Further, the initialization module builds the network model with the PyTorch framework, initializes the network model weights, selects a stochastic gradient descent (SGD) optimizer for training, and initializes the learning rate.
Further, the loss function module adopts the MagFace loss function in the visual feature extraction part and the cross-entropy loss function in the text feature extraction part, the model being jointly optimized through the two loss functions.
Compared with the prior art, the invention has the following beneficial effects:
1. The proposed face recognition method based on a visual semantic interaction module with a fused attention mechanism extracts visual features and text features separately, extracts visual knowledge more effectively through visual semantic interaction, and improves the accuracy of face recognition.
2. The proposed visual semantic interaction is based on the attention mechanism: visual knowledge is extracted piecewise by multiple attention units, and the outputs are averaged to obtain text-guided visual information, which further improves the accuracy and robustness of the face recognition system.
Drawings
Fig. 1 is a flowchart of a face recognition method according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of the overall structure of face recognition according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of an attention network of an embodiment of the present invention.
Detailed Description
It should be noted that, in the absence of conflict, the embodiments of the invention and the features in the embodiments may be combined with each other.
To make the objects and features of the invention more comprehensible, embodiments are described in detail below with reference to the accompanying drawings. Note that the drawings are highly simplified and drawn to imprecise proportions, serving only to illustrate the embodiments conveniently and clearly.
Fig. 1 shows the implementation flow of the face recognition method based on a visual semantic interaction module with a fused attention mechanism, which comprises the following steps:
step 1: a face dataset is obtained that contains a textual description.
A dataset Multi-Modal-CelebA-HQ was used, which was 30000 high resolution face images selected from the CelebA dataset, each image having a corresponding textual description. The training set and the test set are divided according to the ratio of 7:3, wherein the training set comprises 21000 face images, the test set comprises 9000 face images, and 10 text descriptions are selected for each image.
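As a minimal sketch of this split (the in-memory layout and file names are assumptions, since the patent does not specify a storage format):

```python
import random

random.seed(0)

# assumed in-memory index: one image path plus its 10 caption strings
# (illustrative stand-ins; the real dataset ships images and caption files)
samples = [(f"img_{i:05d}.jpg", [f"caption {j} of image {i}" for j in range(10)])
           for i in range(30000)]

random.shuffle(samples)
train_set, test_set = samples[:21000], samples[21000:]  # 7:3 split
print(len(train_set), len(test_set))                    # 21000 9000
```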
Step 2: construct a knowledge-guided face recognition model using a visual semantic interaction module that fuses an attention mechanism.
As shown in Fig. 2, the knowledge-guided face recognition model is constructed from three main parts: a text feature extraction module, a visual feature extraction module, and a visual semantic interaction module. The text feature extraction module uses a long short-term memory (LSTM) network to extract text features; such a network can learn long-range dependencies and handles sequential tasks well. A discard rate r is then defined as the probability of randomly discarding real samples, which lets the network randomly drop some information during training and improves generalization.
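A minimal sketch of such a text encoder (the vocabulary size, embedding dimension, and hidden dimension are assumptions, and realizing the discard rate r as dropout is an interpretation for illustration):

```python
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=256, hidden_dim=512, r=0.3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.drop = nn.Dropout(p=r)  # discard rate r: randomly drop information

    def forward(self, tokens):                # tokens: (batch, seq_len) int64
        h, _ = self.lstm(self.embed(tokens))  # per-token hidden states
        return self.drop(h)                   # text feature matrix S

S = TextEncoder()(torch.randint(0, 10000, (4, 20)))
print(S.shape)  # torch.Size([4, 20, 512])
```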
The visual feature extraction module first performs face detection and face alignment on the input image, then extracts visual features with a residual network (ResNet-100). The very deep structure improves the representational capacity of the network, while the residual blocks prevent vanishing or exploding gradients.
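A sketch of the visual backbone; torchvision provides no ResNet-100, so resnet50 stands in here purely as an assumption for illustration:

```python
import torch
import torchvision.models as models

backbone = models.resnet50(weights=None)
backbone.fc = torch.nn.Identity()      # strip the classifier head, keep features

aligned = torch.randn(8, 3, 112, 112)  # a batch of detected and aligned faces
V_global = backbone(aligned)           # global visual features, shape (8, 2048)
```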
The visual semantic interaction module consists of a modality transfer and several attention modules. The modality transfer function is implemented by a two-layer fully connected network with trainable parameters; given the global visual features of an image, the function approximately produces a simulated text encoding of the image content, enabling asynchronous training and testing behavior. Once the model is trained, if only an image is input, the text encoding of the image content can be simulated through the modality transfer function, so recognition can proceed regardless of text availability.
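A sketch of this modality transfer function as described, a two-layer fully connected network (the dimensions are assumptions):

```python
import torch.nn as nn

class ModalityTransfer(nn.Module):
    """Maps a global visual feature to a simulated text encoding,
    so the model can run image-only at test time."""
    def __init__(self, vis_dim=2048, hidden_dim=1024, txt_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vis_dim, hidden_dim),   # first trainable layer
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, txt_dim),   # second trainable layer
        )

    def forward(self, v_global):
        return self.net(v_global)             # simulated text encoding
```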
The scaled dot-product attention mechanism is defined as equation (1):

$$A(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \tag{1}$$

where Q is the query matrix (Query), K is the key matrix (Key), K^T is the transpose of K, V is the value matrix (Value), d_k is the dimension of the key vectors, and the scaling factor 1/sqrt(d_k) prevents the products from becoming too large.
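Equation (1) is the standard scaled dot-product attention and translates directly into code; a minimal sketch:

```python
import math
import torch

def scaled_dot_product_attention(Q, K, V):
    d_k = K.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)  # QK^T / sqrt(d_k)
    return torch.softmax(scores, dim=-1) @ V           # weights times values

A = scaled_dot_product_attention(torch.randn(4, 8, 64),
                                 torch.randn(4, 8, 64),
                                 torch.randn(4, 8, 64))
print(A.shape)  # torch.Size([4, 8, 64])
```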
The attention-unit network of the visual semantic interaction module is shown in Fig. 3. Inspired by the scaled dot-product attention mechanism, it is defined as equation (2):

$$f = A\left(S^{T}W_q,\; V^{T}W_k,\; V^{T}W_v\right)W_f \tag{2}$$

where V is the matrix of visual features of a face image and V^T its transpose; S is the matrix of text features of the corresponding sentence description and S^T its transpose; and W_q, W_k, W_v are learnable parameter matrices that map S and V into three different vector spaces Q, K, V, respectively. The query matrix Q and the key matrix K undergo the scaled dot-product operation to produce scores; a softmax over the scores yields a weight-distribution matrix, which is multiplied by the value matrix V to obtain the output matrix A. W_f is a learnable parameter matrix that maps the output A back to the original dimension, producing the fused feature f. Passing the inputs through n attention networks yields n fused features f_1, ..., f_n (n is an empirically chosen factor), and these n fused features are then averaged to obtain the fused feature vector F, defined as equation (3):

$$F = \delta\left(\frac{1}{n}\sum_{i=1}^{n} f_i\right) \tag{3}$$

where f_1, ..., f_n are the outputs of the n attention units and δ denotes the global average pooling operation, finally yielding a fused feature vector F of dimension 2.
After the fused feature vector F is obtained, it is input into a classifier formed by a two-layer fully connected neural network to obtain the final output result.
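Putting equations (2) and (3) together, a sketch of one attention unit and the n-way fusion, reusing the scaled_dot_product_attention sketch above (all dimensions, the value of n, and the output size of 2 are assumptions, with δ taken as averaging as described):

```python
import torch
import torch.nn as nn

class AttentionUnit(nn.Module):
    """One unit of equation (2): f = A(S^T Wq, V^T Wk, V^T Wv) Wf."""
    def __init__(self, txt_dim=512, vis_dim=2048, d=256):
        super().__init__()
        self.Wq = nn.Linear(txt_dim, d, bias=False)
        self.Wk = nn.Linear(vis_dim, d, bias=False)
        self.Wv = nn.Linear(vis_dim, d, bias=False)
        self.Wf = nn.Linear(d, vis_dim, bias=False)  # map back to original dim

    def forward(self, S, V):       # S: (b, ls, txt_dim), V: (b, lv, vis_dim)
        A = scaled_dot_product_attention(self.Wq(S), self.Wk(V), self.Wv(V))
        return self.Wf(A)          # fused feature f

class VisualSemanticInteraction(nn.Module):
    def __init__(self, n=4, num_classes=2):     # n is chosen empirically
        super().__init__()
        self.units = nn.ModuleList(AttentionUnit() for _ in range(n))
        self.classifier = nn.Sequential(         # two-layer FC classifier
            nn.Linear(2048, 512), nn.ReLU(inplace=True),
            nn.Linear(512, num_classes))

    def forward(self, S, V):
        F_fused = torch.stack([u(S, V) for u in self.units]).mean(0)  # eq. (3)
        return self.classifier(F_fused.mean(dim=1))  # pool tokens, classify
```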
Step 3: initialize the network model, select an optimizer, and set the network training parameters.
The network model is built with the PyTorch framework and its weights are initialized; a stochastic gradient descent (SGD) optimizer is selected for training, the mini-batch size is set to 64, and the learning rate is decayed dynamically from 0.001 to 0.00015.
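A sketch of this setup; the momentum value and the exact decay schedule are assumptions, since the patent gives only the endpoints 0.001 and 0.00015:

```python
import torch

model = VisualSemanticInteraction()          # from the sketch above
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
# decay the learning rate from 0.001 toward 0.00015 over (assumed) 100 epochs
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=100, eta_min=1.5e-4)
```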
Step 4: optimize and save the network model using the loss functions.
The visual feature extraction module adopts the MagFace loss function, which introduces an adaptive mechanism: pulling high-quality samples toward the class center and pushing low-quality samples away strengthens intra-class compactness and thereby improves face recognition capability. The MagFace loss is defined as equation (4):

$$L_{Mag} = \frac{1}{N}\sum_{i=1}^{N}\left(-\log\frac{e^{s\cos(\theta_{y_i}+m(a_i))}}{e^{s\cos(\theta_{y_i}+m(a_i))}+\sum_{j\neq y_i} e^{s\cos\theta_j}} + \lambda_g\, g(a_i)\right) \tag{4}$$

where N is the number of face samples in a training batch and the hyperparameter λ_g balances the classification loss against the regularization loss. In the classification loss, s is the scaling parameter, θ_{y_i} is the angle between the weight W_{y_i} and the feature x_i, and m(a_i) is an angular-margin penalty function whose boundary adjusts dynamically with the feature magnitude. g(a_i) is a regularization function that pushes low-quality samples toward the boundary of the feasible region and pulls high-quality samples toward the class center.
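A minimal sketch of this loss, following the published MagFace formulation (Meng et al., 2021); the hyperparameter values below come from that paper, not the patent, and are assumptions here:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MagFaceLoss(nn.Module):
    def __init__(self, feat_dim=512, num_classes=1000, s=64.0,
                 l_a=10.0, u_a=110.0, l_m=0.45, u_m=0.8, lambda_g=20.0):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_classes, feat_dim))
        nn.init.xavier_uniform_(self.weight)
        self.s, self.l_a, self.u_a = s, l_a, u_a
        self.l_m, self.u_m, self.lambda_g = l_m, u_m, lambda_g

    def forward(self, x, labels):
        # a_i: feature magnitude, clamped to the feasible interval [l_a, u_a]
        a = x.norm(dim=1, keepdim=True).clamp(self.l_a, self.u_a)
        # m(a_i): angular margin growing linearly with the magnitude
        m = (self.u_m - self.l_m) / (self.u_a - self.l_a) * (a - self.l_a) + self.l_m
        # g(a_i): regularizer pulling high-quality (large-magnitude) samples inward
        g = 1.0 / a + a / (self.u_a ** 2)
        # cos(theta_j) between the normalized feature and every class weight
        cos = F.linear(F.normalize(x), F.normalize(self.weight))
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        one_hot = F.one_hot(labels, self.weight.size(0)).float()
        # add the margin m(a_i) only on the target-class angle
        logits = self.s * (one_hot * torch.cos(theta + m) + (1 - one_hot) * cos)
        return F.cross_entropy(logits, labels) + self.lambda_g * g.mean()

loss = MagFaceLoss()(torch.randn(8, 512), torch.randint(0, 1000, (8,)))
```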
The text feature extraction module adopts the cross-entropy loss function, defined as equation (5):

$$L_{PE} = -\frac{1}{N}\sum_{i=1}^{N}\left[\,y_i\log p_i + (1-y_i)\log(1-p_i)\,\right] \tag{5}$$

where N is the number of samples, y_i is the label of sample i (1 for the positive class, 0 for the negative class), and p_i is the predicted probability that sample i is positive.
This step optimizes the model shown in Fig. 2 through the two loss functions jointly; the total loss is defined as equation (6):

$$L = L_{Mag} + \lambda L_{PE} \tag{6}$$

where λ is a hyperparameter balancing the two terms.
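A sketch of one joint optimization step for equation (6), assuming the modules sketched above and a hypothetical value of λ:

```python
import torch
import torch.nn.functional as F

lam = 0.5                   # hyperparameter lambda of eq. (6); assumed value
mag_loss = MagFaceLoss()    # from the sketch above

def training_step(v_feats, labels, p_text, y_text):
    l_mag = mag_loss(v_feats, labels)              # visual branch, eq. (4)
    l_pe = F.binary_cross_entropy(p_text, y_text)  # text branch, eq. (5)
    return l_mag + lam * l_pe                      # total loss, eq. (6)
```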
Step 5: load the optimal network model produced during training, acquire the test dataset, feed it into the network model, and generate the corresponding face recognition results.
The test dataset comprises the last 9000 images and their texts in the Multi-Modal-CelebA-HQ dataset. During testing, either an image alone or an image-text pair can be input to obtain the corresponding face recognition result.
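A sketch of image-only inference through the modality transfer function (module names follow the sketches above; the input tensor is a stand-in for detected and aligned test faces):

```python
import torch

mt = ModalityTransfer()
interaction = VisualSemanticInteraction()
backbone.eval(); mt.eval(); interaction.eval()

aligned_faces = torch.randn(8, 3, 112, 112)  # stand-in test batch
with torch.no_grad():
    v = backbone(aligned_faces)              # global visual features (8, 2048)
    s_hat = mt(v)                            # simulated text encoding
    # feed the simulated encoding exactly as a real text feature matrix
    logits = interaction(s_hat.unsqueeze(1), v.unsqueeze(1))
```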
Step 6: calculate evaluation indicators to assess the performance of the network model.
From the face recognition results generated in step 5, the evaluation indicators false acceptance rate (FAR) and false rejection rate (FRR) and the ROC curve are calculated to evaluate the performance of the network model.
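A sketch of computing these indicators from verification scores with scikit-learn (the toy labels and scores are placeholders; the pairing protocol is an assumption):

```python
import numpy as np
from sklearn.metrics import roc_curve

# y: 1 if a test pair shares an identity, 0 otherwise; scores: model similarity
y = np.array([1, 0, 1, 1, 0, 0])
scores = np.array([0.9, 0.2, 0.7, 0.4, 0.6, 0.1])

fpr, tpr, thresholds = roc_curve(y, scores)  # points of the ROC curve
far = fpr                                    # false acceptance rate (FAR)
frr = 1.0 - tpr                              # false rejection rate (FRR)
eer = far[np.nanargmin(np.abs(far - frr))]   # equal error rate, where FAR ≈ FRR
```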
The foregoing description covers only preferred embodiments of the application and is not intended to limit it; those skilled in the art may make various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the application shall fall within its protection scope.

Claims (10)

1. A face recognition method based on a visual semantic interaction module with a fused attention mechanism, characterized by comprising the following steps:
S1: acquiring a face dataset containing text descriptions;
S2: constructing a knowledge-guided face recognition network model using a visual semantic interaction module that fuses an attention mechanism, the model comprising three parts: text feature extraction, visual feature extraction, and visual semantic interaction;
S3: initializing the face recognition network model built in step S2, selecting an optimizer, and setting the network training parameters;
S4: optimizing the face recognition network model with the loss functions and saving it;
S5: loading the optimal face recognition network model produced during training, acquiring a test dataset, feeding it into the network model, and generating the corresponding face recognition results.
2. The face recognition method according to claim 1, wherein step S1 divides a training set and a test set from the large-scale face image dataset Multi-Modal-CelebA-HQ, the dataset selecting high-resolution face images from the CelebA dataset, each image having a corresponding text description.
3. The face recognition method according to claim 1, wherein in step S2 the text feature extraction adopts a long short-term memory (LSTM) network, the visual feature extraction adopts a residual network (ResNet), and the visual semantic interaction consists of a modality transfer and several attention modules.
4. The face recognition method according to claim 1, wherein step S3 builds the network model with the PyTorch framework, initializes the network model weights, selects a stochastic gradient descent (SGD) optimizer for training, and initializes the learning rate.
5. The face recognition method according to claim 1, wherein step S4 adopts the MagFace loss function in the visual feature extraction part and the cross-entropy loss function in the text feature extraction part, the model being jointly optimized through the two loss functions.
6. A face recognition device based on a visual semantic interaction module with a fused attention mechanism, characterized by comprising:
a dataset module, which acquires a face dataset containing text descriptions;
a construction module, which constructs a knowledge-guided face recognition network model using a visual semantic interaction module that fuses an attention mechanism, the model comprising a text feature extraction module, a visual feature extraction module, and a visual semantic interaction module;
an initialization module, which initializes the constructed face recognition network model, selects an optimizer, and sets the network training parameters;
a loss function module, which optimizes the face recognition network model with the loss functions and saves it;
a recognition module, which loads the optimal face recognition network model produced during training, acquires a test dataset, feeds it into the network model, and generates the corresponding face recognition results.
7. The face recognition device according to claim 6, wherein the dataset module divides a training set and a test set from the large-scale face image dataset Multi-Modal-CelebA-HQ, the dataset selecting high-resolution face images from the CelebA dataset, each image having a corresponding text description.
8. The face recognition device according to claim 6, wherein the text feature extraction module adopts a long short-term memory (LSTM) network, the visual feature extraction module adopts a residual network (ResNet), and the visual semantic interaction module consists of a modality transfer module and several attention modules.
9. The face recognition device according to claim 6, wherein the initialization module builds the network model with the PyTorch framework, initializes the network model weights, selects a stochastic gradient descent (SGD) optimizer for training, and initializes the learning rate.
10. The face recognition device according to claim 6, wherein the loss function module adopts the MagFace loss function in the visual feature extraction part and the cross-entropy loss function in the text feature extraction part, the model being jointly optimized through the two loss functions.
CN202310243882.3A, filed 2023-03-15 (priority 2023-03-15): Human face recognition method of visual semantic interaction module based on fusion attention mechanism. Status: pending (CN116563909A).

Priority Applications (1)

CN202310243882.3A (priority date 2023-03-15, filing date 2023-03-15): Human face recognition method of visual semantic interaction module based on fusion attention mechanism


Publications (1)

CN116563909A (en), published 2023-08-08

Family

ID=87490502

Family Applications (1)

CN202310243882.3A: CN116563909A (en), pending; Human face recognition method of visual semantic interaction module based on fusion attention mechanism

Country Status (1)

CN: CN116563909A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111814706A (en) * 2020-07-14 2020-10-23 电子科技大学 Face recognition and attribute classification method based on multitask convolutional neural network
CN112200161A (en) * 2020-12-03 2021-01-08 北京电信易通信息技术股份有限公司 Face recognition detection method based on mixed attention mechanism
CN113128369A (en) * 2021-04-01 2021-07-16 重庆邮电大学 Lightweight network facial expression recognition method fusing balance loss
WO2022151535A1 (en) * 2021-01-15 2022-07-21 苏州大学 Deep learning-based face feature point detection method



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination