CN112307939B - Video frame enhancement method using position mask attention mechanism - Google Patents

Video frame enhancement method using position mask attention mechanism

Info

Publication number
CN112307939B
CN112307939B (application CN202011172682.6A)
Authority
CN
China
Prior art keywords
feature map
video frame
matrix
attention mechanism
mask
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011172682.6A
Other languages
Chinese (zh)
Other versions
CN112307939A (en)
Inventor
马汝辉
王超逸
宋涛
华扬
管海兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN202011172682.6A priority Critical patent/CN112307939B/en
Publication of CN112307939A publication Critical patent/CN112307939A/en
Application granted granted Critical
Publication of CN112307939B publication Critical patent/CN112307939B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A video frame enhancement method using a position mask attention mechanism takes as input the feature maps of two adjacent video frames and uses position information to align the positions of the same pixel across the two frames, so that the information of the previous frame enhances the information content of the current frame. The method comprises two parts: position distance mask generation and position attention information fusion. Position distance mask generation uses the distances between pixel positions of the two adjacent frames to produce a mask matched to the size of the input feature map. Position attention information fusion uses the generated position distance mask to guide the original attention mechanism to assign larger weights to aligned pixels, producing an enhanced feature map that replaces the original feature map of the current frame for subsequent processing. The method is based on the attention mechanism, requires no additional training parameters, converges faster and yields better predictions than the original attention mechanism, and can be widely applied to various video tasks.

Description

Video frame enhancement method using position mask attention mechanism
Technical Field
The invention relates to the field of video processing in computer vision, and in particular to a method for enhancing the current frame in various video tasks by using an attention mechanism that encodes position information.
Background
The attention mechanism is one of the hot research topics in deep learning. Attention mechanisms and their variants have attracted wide interest in many fields and have driven substantial progress. Beyond natural language processing (NLP), many attention-based approaches have achieved great success in computer vision (CV), for example in object detection and instance segmentation.
In the video domain, attention mechanisms are often used to enhance the information of a frame. The feature maps of two frames, produced by a feature extractor, are taken as input; three different 1×1 convolutions convert the target-frame feature map into a query and the reference-frame feature map into a key and a value, and the attention mechanism then produces a new feature map of the same size as the original, which replaces the target-frame feature map for subsequent processing. During training, the attention mechanism learns the similarity between pixel positions of the two input frames and assigns larger weights to similar regions. The attention mechanism is therefore a general approach to problems such as occlusion and motion blur in various video tasks.
The original attention mechanism is position-insensitive: its output does not change when the input sequence is rearranged. Some position-sensitive tasks, however, carry positional prior knowledge. Video frame enhancement, for example, assumes by default that the pixel of the previous frame that aligns with a pixel of the current frame lies at approximately the same position; encoding position information into the original attention mechanism therefore allows such tasks to be modeled better.
Existing methods for encoding position information in the attention mechanism all rely on position embedding. Position embedding defines a set of independent trainable parameters that act on the relative position vectors, and the result is applied to the query-key similarity matrix inside the softmax operation. The position embedding method therefore requires additional parameters during training, which brings extra memory cost, slow convergence and high training variance. In addition, the input size must be fixed in advance so that the number of embedding parameters stays unchanged; even a small difference in input size makes the method unusable, which limits the transferability of the model.
Disclosure of Invention
In view of the above drawbacks of the prior art, the technical problem to be solved by the present invention is to design, on the basis of the original attention-based video frame enhancement method, a general video frame enhancement module that does not restrict the input size and can encode position information in the attention mechanism. The module takes the feature maps of two adjacent video frames as input and replaces the original feature map with its output, so it is a plug-and-play module applicable to various video tasks. Two technical difficulties need to be overcome:
(1) How to make the model focus more on the regions of relatively high importance in the video. Frames in a video differ in importance, and some regions within a frame are more important than others; performance improves if the model can concentrate on those regions.
(2) How to design a representation of position information that can be encoded without additional training. The original position embedding method learns position information with a fixed set of parameters, which forces a fixed input size, requires extra memory for storing the parameters, slows down convergence and increases the variance of the training results.
The invention adopts position distance mask generation and position attention information fusion. Position distance mask generation uses the Manhattan distance to build, for each pixel of the current-frame feature map, a pixel-distance matrix over the pixels of the previous-frame feature map, and these matrices are combined into the position distance mask. Position attention information fusion scales the generated mask with a learnable scale factor and multiplies it element-wise with the product of the embeddings of the two adjacent feature maps, so that position information is encoded in the attention mechanism and nearby positions receive larger weights in the resulting enhanced feature map, improving on the original attention mechanism.
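Written compactly, the fused attention can be summarized as follows; this is only a summary of the steps detailed below, and how the learnable scalar α maps distance to weight (for example its sign) is left to training, since the text only states that a trainable scalar is broadcast over the Manhattan-distance mask M:

$$
W=\operatorname{softmax}\Big(\tanh\big(QK^{\top}\big)\odot\sigma\big(\alpha M\big)\Big),\qquad \hat{F}=W\,V,
$$

where Q is the query embedded from the current frame, K and V are the key and value embedded from the previous frame, M_{ij} is the Manhattan distance between pixel positions i and j, σ is the sigmoid function, ⊙ denotes element-wise multiplication, the softmax is taken along the last dimension, and \hat{F} is reshaped back to the size of the original feature map.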
The method comprises the following steps:
Step 1, input a video frame and extract a feature map with a pre-trained convolutional neural network.
Step 2, obtain an enhanced feature map by using the feature map enhancement module.
Step 3, perform subsequent processing and prediction with the enhanced feature map.
Step 4, output the prediction result.
Before the feature map enhancement module is used, it needs to be trained. The training steps comprise:
Step 2.1, initialize the iteration counter;
Step 2.2, if the number of iterations is within N, continue; otherwise, end the training;
Step 2.3, input two adjacent frames of the video;
Step 2.4, extract feature maps with the feature extractor;
Step 2.5, embed the two feature maps into q, k and v respectively;
Step 2.6, process with the multi-head attention mechanism;
Step 2.7, compute the position distance mask;
Step 2.8, obtain the enhanced feature map to replace the original feature map, i.e. the feature map obtained in step 2.4, perform subsequent processing, and go to step 2.2.
Further, in step 2.4, video frame features are extracted by using a pre-trained convolutional neural network.
Preferably, in step 2.4, the extracted video frame features form a feature map with a smaller spatial size and more channels than the original image.
Preferably, in step 2.4, the feature extractor performs down-sampling, typically using ResNet, to obtain the feature map.
Preferably, in step 2.4, the number of feature map channels per frame is 1024.
Further, in step 2.5, q, k and v refer to the query, key and value, respectively. The feature map of the current frame is channel-compressed with a 1×1 convolution to serve as the query, and the feature map of the previous frame is channel-compressed with two different 1×1 convolutions to obtain the key and the value, respectively.
Further, in step 2.6, using the multi-head attention mechanism, the query, key and value are reshaped from tensors of size (batch, channel, height, width) to tensors of size (batch, group, height×width, sub_channel), which serve as the new query, key and value.
Further, in step 2.7, the new query is multiplied by the transpose of the key using matrix multiplication to obtain a relationship matrix, and the activation function is applied to the relationship matrix.
The height and width of the original feature map, i.e. the feature map obtained in step 2.4, are taken as input; the Manhattan distance between each pixel position and every other position is calculated, generating a matrix of size height×width at each position and thus height×width matrices in total. These matrices are reshaped and concatenated to obtain a position mask matrix of size (height×width, height×width); a trainable scalar scale is broadcast over it, and an activation function is applied.
Preferably, in step 2.7, tanh is used as the activation function to act on the relationship matrix, and sigmoid is used as the activation function to act on the position mask matrix.
Further, in step 2.8, element-wise multiplication of the activated relationship matrix and the position mask matrix yields a weight matrix. Softmax is applied to the weight matrix along the last dimension, the result is multiplied by the new value obtained in step 2.6 and reshaped to the same size as the original feature map, and the resulting enhanced feature map replaces that of the current frame for subsequent processing and training.
Compared with the prior art, the invention has the following beneficial effects:
(1) Based on prior knowledge in video frame alignment, the invention uses a heuristic method to generate a position distance mask matched with the input size, better models the position relation in the video and obtains better performance in various tasks needing video frame enhancement.
(2) Based on prior knowledge in video frame alignment, the invention uses a heuristic method to generate a position distance mask matched to the input size, which removes the fixed-input-size restriction of existing position embedding methods and makes it convenient to train and transfer the model across different input sizes.
(3) Based on prior knowledge in video frame alignment, the invention uses a heuristic method to generate a position distance mask matched to the input size, which requires no additional trainable parameters, reduces the memory burden of model training, and allows the model to converge to the optimal result more quickly.
Drawings
FIG. 1 is a functional block diagram of an embodiment of the present application;
FIG. 2 is a schematic of a training flow of an embodiment of the present application;
FIG. 3 is a schematic operational flow diagram of an embodiment of the present application.
Detailed Description
The preferred embodiments of the present application will be described below with reference to the accompanying drawings for clarity and understanding of the technical contents thereof. The present application may be embodied in many different forms of embodiments and the scope of the present application is not limited to only the embodiments set forth herein.
The conception, the specific structure and the technical effects will be further described in order to fully understand the objects, the features and the effects of the present invention, but the present invention is not limited thereto.
An embodiment of the invention
As shown in FIG. 1, the present embodiment provides two modules to implement the method of the present invention: a feature extractor and a video frame enhancement module.
The feature extractor includes a pre-trained convolutional neural network; its function is to receive an input video frame and to extract and output a feature map.
The function of the video frame enhancement module is to output the enhanced feature map through attention information enhancement and the position distance mask.
As shown in FIG. 3, the method for enhancing video frames using a position mask attention mechanism according to the present invention includes the following steps:
Step 1, input a video frame and extract a feature map with a pre-trained convolutional neural network.
Step 2, obtain an enhanced feature map by using the feature map enhancement module.
Step 3, perform subsequent processing and prediction with the enhanced feature map.
Step 4, output the prediction result.
Before the feature map enhancement module is used, it needs to be trained. As shown in FIG. 2, the training steps include:
Step 2.1, initialize the iteration counter;
Step 2.2, if the number of iterations is within N, continue; otherwise, end the training;
Step 2.3, input two adjacent frames of the video;
Step 2.4, extract feature maps with the feature extractor;
Step 2.5, embed the two feature maps into q, k and v respectively;
Step 2.6, process with the multi-head attention mechanism;
Step 2.7, compute the position distance mask;
Step 2.8, obtain the enhanced feature map to replace the original feature map, i.e. the feature map obtained in step 2.4, perform subsequent processing, and go to step 2.2.
In step 2.4, video frame features are extracted with the pre-trained convolutional neural network. The extracted features are usually a feature map with a smaller spatial size and more channels than the original image; the feature extractor typically uses ResNet to downsample and obtain the feature map, and the number of channels of each frame's feature map is 1024.
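As an illustration only, such a 1024-channel feature map can be obtained by truncating a pre-trained ResNet-50 after its third stage; the sketch below uses torchvision for convenience, which is an assumption and not something prescribed by the patent:

```python
import torch
import torchvision

# Minimal sketch of a feature extractor (assumption): a pre-trained ResNet-50
# truncated after layer3, whose output feature map has 1024 channels at stride 16.
backbone = torchvision.models.resnet50(weights="IMAGENET1K_V1")
feature_extractor = torch.nn.Sequential(
    backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool,
    backbone.layer1, backbone.layer2, backbone.layer3,
)
feature_extractor.eval()

with torch.no_grad():
    frame = torch.randn(1, 3, 384, 640)   # one RGB video frame (batch of 1)
    feat = feature_extractor(frame)       # -> shape (1, 1024, 24, 40)
print(feat.shape)
```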
In step 2.5, q, k and v refer to the query, key and value, respectively. The feature map of the current frame is channel-compressed with a 1×1 convolution to serve as the query, and the feature map of the previous frame is channel-compressed with two different 1×1 convolutions to obtain the key and the value, respectively.
In step 2.6, using the multi-head attention mechanism, the query, key and value are reshaped from tensors of size (batch, channel, height, width) to tensors of size (batch, group, height×width, sub_channel), which serve as the new query, key and value.
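A minimal PyTorch sketch of steps 2.5 and 2.6; the compressed channel count and the number of heads (`groups`) are illustrative choices, not values fixed by the patent:

```python
import torch
import torch.nn as nn

channels, embed_channels, groups = 1024, 256, 8   # illustrative sizes (assumptions)

q_conv = nn.Conv2d(channels, embed_channels, kernel_size=1)  # current frame  -> query
k_conv = nn.Conv2d(channels, embed_channels, kernel_size=1)  # previous frame -> key
v_conv = nn.Conv2d(channels, embed_channels, kernel_size=1)  # previous frame -> value

def embed_and_split(cur_feat, prev_feat):
    """Step 2.5: 1x1-convolution channel compression; step 2.6: multi-head reshape."""
    b, _, h, w = cur_feat.shape
    def to_heads(x):
        # (batch, channel, height, width) -> (batch, group, height*width, sub_channel)
        return x.view(b, groups, embed_channels // groups, h * w).transpose(2, 3)
    q = to_heads(q_conv(cur_feat))
    k = to_heads(k_conv(prev_feat))
    v = to_heads(v_conv(prev_feat))
    return q, k, v
```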
In step 2.7, the new query is multiplied by the transpose of the key using matrix multiplication to obtain a relationship matrix, and tanh is applied to the relationship matrix as the activation function.
The height and width of the original feature map, i.e. the feature map obtained in step 2.4, are taken as input; the Manhattan distance between each pixel position and every other position is calculated, generating a matrix of size height×width at each position and thus height×width matrices in total. These matrices are reshaped and concatenated to obtain a position mask matrix of size (height×width, height×width); a trainable scalar scale is broadcast over it, and sigmoid is applied as the activation function.
It should be noted that in the above process, the mask matrix is calculated according to the size of the input feature map, and the position information parameter to be trained is only a scalar.
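A sketch of the position distance mask computation under these assumptions; the initial value and sign of the single trainable scalar `scale` are not specified in the text, so the negative initialization below (which makes nearby positions receive larger weights) is only an assumption:

```python
import torch
import torch.nn as nn

def manhattan_mask(height, width, device=None):
    """(height*width, height*width) matrix of Manhattan distances between pixel positions."""
    ys, xs = torch.meshgrid(torch.arange(height, device=device),
                            torch.arange(width, device=device), indexing="ij")
    coords = torch.stack([ys.flatten(), xs.flatten()], dim=1).float()     # (H*W, 2)
    return (coords[:, None, :] - coords[None, :, :]).abs().sum(dim=-1)    # (H*W, H*W)

scale = nn.Parameter(torch.tensor(-0.1))  # the only trainable position parameter: a scalar

def position_mask(height, width, device=None):
    # Step 2.7: broadcast the trainable scalar over the distance matrix, then apply sigmoid.
    return torch.sigmoid(scale * manhattan_mask(height, width, device))
```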
In step 2.8, element-wise multiplication of the activated relationship matrix and the position mask matrix yields a weight matrix. Softmax is applied to the weight matrix along the last dimension, the result is multiplied by the new value obtained in step 2.6 and reshaped to the same size as the original feature map, and the resulting enhanced feature map replaces that of the current frame for subsequent processing and training.
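Combining the pieces above, the fusion of steps 2.7-2.8 can be sketched as follows; the helper names come from the earlier illustrative sketches, not from the patent, and the final 1×1 projection back to the original channel count is an assumption needed only because the value was channel-compressed in the sketch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

out_proj = nn.Conv2d(embed_channels, channels, kernel_size=1)  # assumed projection back to 1024 channels

def enhance(cur_feat, prev_feat):
    """Returns an enhanced feature map with the same shape as cur_feat."""
    b, c, h, w = cur_feat.shape
    q, k, v = embed_and_split(cur_feat, prev_feat)        # (b, g, h*w, sub) each
    relation = torch.tanh(q @ k.transpose(-1, -2))        # (b, g, h*w, h*w)
    mask = position_mask(h, w, cur_feat.device)           # (h*w, h*w), broadcast over b and g
    weight = F.softmax(relation * mask, dim=-1)           # element-wise fusion, then softmax
    out = weight @ v                                      # (b, g, h*w, sub)
    out = out.transpose(2, 3).reshape(b, -1, h, w)        # back to (b, embed_channels, h, w)
    return out_proj(out)                                  # (b, channels, h, w) replaces cur_feat
```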
The pseudocode of the main training procedure is given as an image in the original publication (figure BDA0002747781440000051) and is not reproduced here in text form.
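In place of that figure, a minimal sketch of the training loop of steps 2.1-2.8 is given below; the task head, loss, data loader and iteration budget N are placeholders, the feature extractor is assumed to stay frozen, and `feature_extractor`, `embed_and_split`, `position_mask` and `enhance` refer to the illustrative sketches above rather than to components defined by the patent:

```python
import torch

# Hypothetical task-specific pieces: a prediction head, a loss, and a data loader
# yielding (previous frame, current frame, target) triples.
task_head = torch.nn.Conv2d(1024, 21, kernel_size=1)   # placeholder prediction head
criterion = torch.nn.CrossEntropyLoss()
params = (list(q_conv.parameters()) + list(k_conv.parameters()) +
          list(v_conv.parameters()) + list(out_proj.parameters()) +
          [scale] + list(task_head.parameters()))
optimizer = torch.optim.SGD(params, lr=1e-3, momentum=0.9)

N = 10000                                               # step 2.2: iteration budget
for it, (prev_frame, cur_frame, target) in enumerate(loader):   # steps 2.1-2.3
    if it >= N:
        break
    with torch.no_grad():                               # step 2.4: frozen feature extractor
        prev_feat = feature_extractor(prev_frame)
        cur_feat = feature_extractor(cur_frame)
    enhanced = enhance(cur_feat, prev_feat)             # steps 2.5-2.8
    loss = criterion(task_head(enhanced), target)       # subsequent processing and prediction
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```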
The method exploits prior information in video feature enhancement and, through a heuristic approach, avoids a large number of training parameters, so that the model converges faster and performs markedly better than the conventional attention mechanism.
The foregoing is a detailed description of the preferred embodiments of the present application. It should be understood that numerous modifications and variations can be devised by those skilled in the art in light of the present teachings without departing from the inventive concept. Therefore, technical solutions obtainable by those skilled in the art through logical analysis, reasoning and limited experiments based on the concepts of the present application shall fall within the scope of protection defined by the claims.

Claims (7)

1. A method for video frame enhancement using a position mask attention mechanism, comprising the steps of:
step 1, inputting a video frame, and extracting a feature map through a pre-trained convolutional neural network;
step 2, obtaining an enhanced feature map by using a feature map enhancement module;
step 3, using the enhanced feature map to perform subsequent processing and prediction;
step 4, outputting a prediction result;
the step of training the feature map enhancement module comprises:
step 2.1, initializing iterative counting;
step 2.2, if the iteration times are within N times, continuing, otherwise, ending the training;
step 2.3, inputting two adjacent frames of the video;
step 2.4, extracting feature maps by using a feature extractor;
step 2.5, embedding the two feature maps into a query, a key and a value respectively;
step 2.6, processing by using a multi-head attention mechanism;
step 2.7, calculating a position distance mask;
step 2.8, obtaining the enhanced feature map, replacing the original feature map for subsequent processing, and going to step 2.2;
in the step 2.5, the feature map of the current frame is subjected to channel compression by a 1×1 convolution to serve as the query, and the feature map of the previous frame is subjected to channel compression by two different 1×1 convolutions to obtain the key and the value, respectively;
in the step 2.6, the query, the key and the value obtained in the step 2.5 are reshaped, by using the multi-head attention mechanism, from tensors of size (batch, channel, height, width) to tensors of size (batch, group, height×width, sub_channel), which serve as the new query, key and value;
in the step 2.7, multiplying the new query obtained in the step 2.6 by the transpose of the key by using matrix multiplication to obtain a relation matrix, and acting an activation function on the relation matrix;
inputting the height and width of the original feature map, calculating the Manhattan distance between each pixel position and every other position, and generating a matrix of size height×width at each position, so that height×width matrices are obtained in total; the matrices are reshaped and concatenated to obtain a position mask matrix of size (height×width, height×width), a trainable scalar scale is broadcast over it, and an activation function is applied.
2. The video frame enhancement method of claim 1, wherein in step 2.4, the feature extractor comprises a pre-trained convolutional neural network, and the video frame features are extracted by using the pre-trained convolutional neural network.
3. The method of claim 2, wherein in step 2.4, the extracted features of the video frame are a feature map of smaller size and more channels than the original image.
4. The video frame enhancement method of claim 2, wherein in step 2.4, the feature extractor performs downsampling using ResNet to obtain a feature map.
5. The video frame enhancement method of claim 2, wherein in step 2.4, the number of feature map channels per frame is 1024.
6. The video frame enhancement method of claim 1, wherein in step 2.7, tanh is used as an activation function to act on the relation matrix, and sigmoid is used as an activation function to act on the position mask matrix.
7. The video frame enhancement method according to claim 6, characterized in that in step 2.8, element-wise multiplication is performed on the activated relationship matrix and the position mask matrix to obtain a weight matrix; softmax is performed on the weight matrix along the last dimension, the obtained result is multiplied by the new value obtained in step 2.6 and reshaped to the same size as the original feature map, and the enhanced feature map thus obtained replaces that of the current frame to complete subsequent processing and training.
CN202011172682.6A 2020-10-28 2020-10-28 Video frame enhancement method using position mask attention mechanism Active CN112307939B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011172682.6A CN112307939B (en) 2020-10-28 2020-10-28 Video frame enhancement method using position mask attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011172682.6A CN112307939B (en) 2020-10-28 2020-10-28 Video frame enhancement method using position mask attention mechanism

Publications (2)

Publication Number Publication Date
CN112307939A CN112307939A (en) 2021-02-02
CN112307939B true CN112307939B (en) 2022-10-04

Family

ID=74331565

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011172682.6A Active CN112307939B (en) 2020-10-28 2020-10-28 Video frame enhancement method using position mask attention mechanism

Country Status (1)

Country Link
CN (1) CN112307939B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113205005B (en) * 2021-04-12 2022-07-19 武汉大学 Low-illumination low-resolution face image reconstruction method
CN113793393B (en) * 2021-09-28 2023-05-09 中国人民解放军国防科技大学 Unmanned vehicle multi-resolution video generation method and device based on attention mechanism
CN114913273B (en) * 2022-05-30 2024-07-09 大连理工大学 Animation video line manuscript coloring method based on deep learning
CN116665110B (en) * 2023-07-25 2023-11-10 上海蜜度信息技术有限公司 Video action recognition method and device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110413838A (en) * 2019-07-15 2019-11-05 上海交通大学 A kind of unsupervised video frequency abstract model and its method for building up

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111079532B (en) * 2019-11-13 2021-07-13 杭州电子科技大学 Video content description method based on text self-encoder
CN111242837B (en) * 2020-01-03 2023-05-12 杭州电子科技大学 Face anonymity privacy protection method based on generation countermeasure network
CN111488474B (en) * 2020-03-21 2022-03-18 复旦大学 Fine-grained freehand sketch image retrieval method based on attention enhancement

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110413838A (en) * 2019-07-15 2019-11-05 上海交通大学 A kind of unsupervised video frequency abstract model and its method for building up

Also Published As

Publication number Publication date
CN112307939A (en) 2021-02-02

Similar Documents

Publication Publication Date Title
CN112307939B (en) Video frame enhancement method using position mask attention mechanism
Zhang et al. Context encoding for semantic segmentation
Tian et al. Designing and training of a dual CNN for image denoising
CN109344288B (en) Video description combining method based on multi-modal feature combining multi-layer attention mechanism
CN109543502B (en) Semantic segmentation method based on deep multi-scale neural network
CN109948475B (en) Human body action recognition method based on skeleton features and deep learning
CN110458085B (en) Video behavior identification method based on attention-enhanced three-dimensional space-time representation learning
CN112990116B (en) Behavior recognition device and method based on multi-attention mechanism fusion and storage medium
CN113313173B (en) Human body analysis method based on graph representation and improved transducer
CN111310766A (en) License plate identification method based on coding and decoding and two-dimensional attention mechanism
CN112801280A (en) One-dimensional convolution position coding method of visual depth self-adaptive neural network
CN113076957A (en) RGB-D image saliency target detection method based on cross-modal feature fusion
Sethy et al. Off-line Odia handwritten numeral recognition using neural network: a comparative analysis
CN114494701A (en) Semantic segmentation method and device based on graph structure neural network
CN107239827B (en) Spatial information learning method based on artificial neural network
CN116863194A (en) Foot ulcer image classification method, system, equipment and medium
CN115713632A (en) Feature extraction method and device based on multi-scale attention mechanism
Zhen et al. Toward compact transformers for end-to-end object detection with decomposed chain tensor structure
CN109063555B (en) Multi-pose face recognition method based on low-rank decomposition and sparse representation residual error comparison
Liu et al. Deep dual-stream network with scale context selection attention module for semantic segmentation
CN111242216A (en) Image generation method for generating anti-convolution neural network based on conditions
CN113627436B (en) Unsupervised segmentation method for surface-stamped character image
CN112560712B (en) Behavior recognition method, device and medium based on time enhancement graph convolutional network
CN113011163A (en) Compound text multi-classification method and system based on deep learning model
Zhou et al. Lightweight Self-Attention Network for Semantic Segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant