CN112070114A - Scene character recognition method and system based on Gaussian constraint attention mechanism network - Google Patents

Scene character recognition method and system based on Gaussian constraint attention mechanism network

Info

Publication number
CN112070114A
CN112070114A
Authority
CN
China
Prior art keywords
original
feature vector
dimensional
time step
hidden state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010767079.6A
Other languages
Chinese (zh)
Other versions
CN112070114B (en)
Inventor
王伟平 (Wang Weiping)
乔峙 (Qiao Zhi)
秦绪功 (Qin Xugong)
周宇 (Zhou Yu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS filed Critical Institute of Information Engineering of CAS
Priority to CN202010767079.6A priority Critical patent/CN112070114B/en
Publication of CN112070114A publication Critical patent/CN112070114A/en
Application granted granted Critical
Publication of CN112070114B publication Critical patent/CN112070114B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Image Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

The invention provides a scene character recognition method and system based on a Gaussian constrained attention mechanism network, relating to the field of image information recognition. Visual features are extracted from the picture to be recognized to obtain a two-dimensional feature map; the two-dimensional feature map is converted into a one-dimensional feature sequence, from which global semantic information is extracted; the global semantic information is input at the first time step to initialize the decoding hidden state, and at each time step an original attention weight is calculated from the hidden state and the two-dimensional feature map, with a weighted sum of the features giving the original weighted feature vector; a two-dimensional Gaussian distribution mask is constructed from the hidden state and the original weighted feature vector and multiplied with the original attention weight to obtain a corrected attention weight, from which a corrected weighted feature vector is obtained; the original weighted feature vector and the corrected weighted feature vector are fused to predict the characters of the picture to be recognized, which alleviates attention drift.

Description

Scene character recognition method and system based on Gaussian constraint attention mechanism network
Technical Field
The invention relates to the field of image information recognition, in particular to a scene character recognition method and system based on a Gaussian constrained attention mechanism network.
Background
Text detection and recognition in scene images has been a research hotspot in recent years. Character recognition is the core of the whole pipeline: its task is to transcribe the characters in an image into a form a computer can edit directly. With the development of deep learning, the field has advanced rapidly. Inspired by machine translation, the current mainstream methods are based on an encoder-decoder structure: the encoder extracts rich visual features through a convolutional neural network and a recurrent neural network, and the decoder obtains the required features through an attention mechanism to predict each character in order along the text sequence.
However, the prior art has the following defects:
1. At each decoding time step, character recognition only needs the specific region of the current character in the text image, and existing methods do not fully exploit this property of text recognition.
2. Existing methods do not constrain the attention weights but let the model predict them freely; on some images this causes attention drift, i.e., the weights cannot concentrate on a specific character.
3. Some existing approaches supervise the attention weights with Gaussian-distributed labels of each character's position, thereby constraining them implicitly. However, because no explicit constraint is introduced into the computation, attention drift still occurs on some images.
Disclosure of Invention
The invention aims to provide a scene character recognition method and system based on a Gaussian constrained attention mechanism network, which introduce an explicit constraint into the computation of the attention weights to correct the original attention weight, so that the corrected attention weight concentrates more on the region corresponding to a character, alleviating attention drift.
In order to achieve the purpose, the invention adopts the following technical scheme:
a scene character recognition method based on a Gaussian constraint attention mechanism network comprises the following steps:
extracting visual features of the picture to be identified to obtain a two-dimensional feature map;
converting the two-dimensional feature map into a one-dimensional feature sequence, and extracting global semantic information according to the one-dimensional feature sequence;
inputting global semantic information into a first time step to initialize a decoding hidden state, calculating an original attention weight according to the hidden state and a two-dimensional feature map in each time step, and obtaining an original weighted feature vector by weighting and summing the weights;
constructing a two-dimensional Gaussian distribution mask according to the hidden state and the original weighted feature vector, multiplying the mask by the original attention weight to obtain a corrected attention weight, and obtaining a corrected weighted feature vector according to the weight;
and fusing the original weighted feature vector and the corrected weighted feature vector together to predict characters of the picture to be recognized.
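The final step above fuses the original and corrected weighted feature vectors to predict a character. A minimal numpy sketch follows; the concatenation-plus-linear-layer fusion and all shapes are assumptions for illustration, since the patent only states that the two vectors are "fused together".

```python
import numpy as np

# Hypothetical sizes: feature dimension C and character-set size (illustrative).
C, num_classes = 512, 37
glimpse_orig = np.random.rand(C)     # original weighted feature vector
glimpse_refined = np.random.rand(C)  # corrected weighted feature vector

# Assumed fusion: concatenate the two glimpses, then apply a linear classifier.
W_fuse = np.random.rand(num_classes, 2 * C) * 0.01
fused = np.concatenate([glimpse_orig, glimpse_refined])
logits = W_fuse @ fused

# Softmax over the character set gives the prediction for this time step.
probs = np.exp(logits - logits.max())
probs /= probs.sum()
predicted_char_idx = int(probs.argmax())
```

In a trained model `W_fuse` would be learned; here it is random, so only the shapes and normalization are meaningful.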
A scene character recognition system based on a Gaussian constraint attention mechanism network comprises:
the characteristic extraction module comprises a multilayer residual error network and is responsible for extracting visual characteristics of the picture to be identified to obtain a two-dimensional characteristic diagram;
the encoder module comprises a unidirectional two-layer long short-term memory network (LSTM) and is responsible for converting the two-dimensional feature map into a one-dimensional feature sequence and inputting the one-dimensional feature sequence into the LSTM to extract global semantic information;
the decoder module comprises an attention-based unidirectional two-layer long short-term memory network (AM-LSTM), and is responsible for initializing the hidden state of the AM-LSTM at the first time step based on the global semantic information, calculating the original attention weight at each time step according to the hidden state of the AM-LSTM and the two-dimensional feature map, obtaining an original weighted feature vector by weighted summation, and fusing the original weighted feature vector and the corrected weighted feature vector to predict the characters of the picture to be recognized;
and the correction module based on Gaussian constraint is responsible for constructing a two-dimensional Gaussian distribution mask according to the hidden state of the AM-LSTM and the original weighted feature vector, multiplying the mask by the original attention weight to obtain a corrected attention weight, and obtaining a corrected weighted feature vector according to the weight.
Further, the feature extraction module includes a 31-layer residual network.
Further, the encoder module is responsible for performing maximum pooling on the two-dimensional feature map and converting the two-dimensional feature map into a one-dimensional feature sequence.
Further, the decoder module is responsible for updating the hidden state of the AM-LSTM at each time step starting from the second time step based on the decoding result of the previous time step.
Furthermore, the correction module based on the Gaussian constraint is responsible for concatenating the hidden state of the AM-LSTM with the original weighted feature vector, predicting a set of Gaussian distribution parameters through a fully connected layer, and constructing the two-dimensional Gaussian distribution mask from these parameters, wherein the parameters comprise a mean and a variance.
Further, the system is trained by optimizing a character recognition loss and an attention weight loss, wherein the character recognition loss is a cross-entropy loss between the predicted character probabilities and the recognition labels, and the attention weight loss is an L1 regression loss between the predicted character attention distribution and the character position labels.
Compared with existing methods, the invention provides a brand-new correction module based on a Gaussian constraint, which predicts a Gaussian mask to correct the original attention weight. Since the characters in a character recognition task usually have regular shapes, the model predicts a Gaussian mask as an explicit constraint to correct the original attention weight. The corrected attention weight concentrates more on the region corresponding to the character, which alleviates attention drift. Experiments show that the invention achieves superior performance on existing datasets, and the proposed module is flexible enough to be used in existing attention-based methods.
Drawings
Fig. 1 is a schematic structural diagram of a scene character recognition network based on a gaussian constraint attention mechanism network according to an embodiment.
Fig. 2 is a schematic diagram of a decoder according to an embodiment.
Fig. 3 is a comparison graph of visualization of recognition results of the present invention and the prior art method.
Detailed Description
In order to make the technical solution of the present invention more comprehensible, embodiments accompanied with figures are described in detail below.
The embodiment discloses a scene character recognition method and system based on a Gaussian constrained attention mechanism network (GCAN). As shown in FIG. 1, GCAN is a recognition model based on a two-dimensional attention mechanism that introduces a brand-new Gaussian constrained correction module (GCRM). The input of the GCRM is the feature weighted by the original, unconstrained attention weight, and its output is the feature weighted by the corrected attention weight. The two feature vectors are fused and then used to predict the character of the current decoding time step, where a time step refers to one step of the iterative decoding; decoding proceeds step by step, with each step predicting the corresponding character of the word. The system consists of four parts: a feature extraction module, an encoder module, a decoder module, and a correction module based on a Gaussian constraint.
The feature extraction module consists of a 31-layer residual network, which extracts rich visual features for the subsequent encoding and decoding processes.
The encoder module consists of a unidirectional two-layer long short-term memory network (LSTM). The two-dimensional feature map output by the feature extraction module is first max-pooled along the vertical direction to obtain a one-dimensional feature sequence. The one-dimensional feature sequence is then input into the LSTM to extract context information. The output of the encoder module is the hidden state of the LSTM at the last time step, which serves as the global semantic information that guides the decoder.
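The encoder's 2D-to-1D conversion can be sketched as follows; the feature-map shape is an illustrative assumption, not taken from the patent.

```python
import numpy as np

# Hypothetical H x W x C feature map from the feature extractor.
H, W, C = 8, 25, 512
feature_map = np.random.rand(H, W, C)

# Max-pool along the vertical direction, collapsing each column of the map
# into a single feature vector and yielding a sequence of W vectors.
feature_sequence = feature_map.max(axis=0)  # shape (W, C)

# An LSTM would then consume this sequence; its hidden state at the last
# time step serves as the global semantic information.
```

In the full system this pooling sits between the residual network and the encoder LSTM.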
The decoder module consists of an attention-based unidirectional two-layer long short-term memory network (AM-LSTM for short); its structure is shown in FIG. 2. The global semantic information output by the encoder is input at the first decoding time step, and at each subsequent time step the decoding result of the previous time step is input to update the hidden state of the AM-LSTM decoder. At each time step, the original attention weight is calculated from the hidden state of the AM-LSTM and the feature map output by the feature extraction module, and the feature map is summed with these weights to obtain the original weighted feature vector. Finally, the original weighted feature vector and the corrected weighted feature vector are fused to predict the character of the current time step.
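The per-step attention computation can be sketched as an additive (Bahdanau-style) 2D attention; the patent does not give exact formulas, so the projection sizes, the scoring function, and all weight matrices below are assumptions for illustration.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Illustrative shapes: feature map (H, W, C) and decoder hidden size D.
H, W, C, D = 8, 25, 512, 256
feats = np.random.rand(H, W, C)    # feature map from the extractor
hidden = np.random.rand(D)         # decoder AM-LSTM hidden state
W_f = np.random.rand(C, D) * 0.01  # assumed projection of the features
W_h = np.random.rand(D, D) * 0.01  # assumed projection of the hidden state
v = np.random.rand(D) * 0.01       # assumed scoring vector

# Attention energies over every spatial position, normalized to weights.
scores = np.tanh(feats @ W_f + hidden @ W_h) @ v       # (H, W)
alpha = softmax(scores.ravel()).reshape(H, W)          # original attention weights

# Weighted sum over all positions gives the original weighted feature vector.
glimpse = (alpha[..., None] * feats).sum(axis=(0, 1))  # (C,)
```

The `alpha` map is what the Gaussian mask of the correction module later multiplies.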
At each decoding time step, the correction module based on the Gaussian constraint concatenates the hidden state of the AM-LSTM at the corresponding time step with the original weighted feature vector, predicts a set of Gaussian distribution parameters (mean and variance) through a fully connected layer, constructs a two-dimensional Gaussian distribution as a mask from these parameters, and finally multiplies the mask with the original attention weight to obtain the corrected attention weight, from which a new corrected weighted feature vector is computed. Compared with the original attention weight, the corrected attention is more concentrated, which alleviates attention drift.
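The mask construction and correction can be sketched as follows. The mean and variance values below are placeholders standing in for the output of the fully connected layer, and the renormalization of the corrected weights is an assumption; the patent only specifies the mask multiplication.

```python
import numpy as np

# Placeholder Gaussian parameters, assumed to come from a fully connected
# layer applied to [hidden state; original weighted feature vector].
H, W = 8, 25
mu_x, mu_y = 12.0, 4.0
var_x, var_y = 6.0, 3.0

# Axis-aligned 2D Gaussian mask over the feature-map grid, peaking at the mean.
ys, xs = np.mgrid[0:H, 0:W]
mask = np.exp(-((xs - mu_x) ** 2 / (2 * var_x) + (ys - mu_y) ** 2 / (2 * var_y)))

# Multiply the original attention weights by the mask to concentrate them.
alpha = np.random.rand(H, W)
alpha /= alpha.sum()        # stand-in for the original attention weights
refined = alpha * mask
refined /= refined.sum()    # renormalize (an assumption)
```

The corrected weighted feature vector is then the `refined`-weighted sum of the feature map, exactly as for the original glimpse.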
The whole process of recognizing the scene characters by adopting the method and the system comprises the following steps:
1. and (4) extracting visual features of the input picture through a feature extraction module to obtain a two-dimensional feature map.
2. The extracted visual features are passed through an encoder module to extract global semantic information, which is then input into a decoder module.
3. The decoder module adopts an attention mechanism: it calculates the original attention weight from its hidden state and the feature map output by the feature extraction module, and then obtains the original weighted feature vector by weighted summation.
4. The original weighted feature vector and the hidden state are input into the correction module based on the Gaussian constraint, which corrects the original attention weight predicted by the decoder with a two-dimensional Gaussian mask to obtain the corrected attention weight and then the corrected weighted feature vector.
5. And fusing the corrected weighted feature vector and the original weighted feature vector together to predict a corresponding character.
6. The whole model is trained by optimizing a character recognition loss and an attention weight loss. The character recognition loss is a cross-entropy loss between the predicted character probabilities and the recognition labels, and the attention weight loss is an L1 regression loss between the predicted character attention distribution and the character position labels.
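The two losses can be sketched as follows; the exact forms (reduction, weighting between the two terms) are assumptions, as the patent only names the loss types.

```python
import numpy as np

def cross_entropy(pred_probs, target_idx, eps=1e-12):
    # Recognition loss: negative log-likelihood of the ground-truth character.
    return -np.log(pred_probs[target_idx] + eps)

def l1_attention_loss(pred_attn, gt_attn):
    # Attention loss: L1 regression between the predicted attention map and
    # the Gaussian label built from the character position (mean reduction assumed).
    return np.abs(pred_attn - gt_attn).mean()

probs = np.array([0.1, 0.7, 0.2])     # predicted character distribution
target = 1                             # ground-truth character index
pred_attn = np.full((8, 25), 1 / 200)  # predicted attention map (uniform here)
gt_attn = np.full((8, 25), 1 / 200)    # Gaussian position label (identical here)

total = cross_entropy(probs, target) + l1_attention_loss(pred_attn, gt_attn)
```

With identical attention maps the L1 term vanishes and the total reduces to the cross-entropy term alone.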
Extensive experiments were conducted to evaluate the effectiveness of GCAN. GCAN is trained on two synthetic datasets, Syn90K and SynthText, and tested on several mainstream scene text datasets. IIIT5K has 3000 images, mostly high-quality horizontal images; SVT has 647 images, mostly horizontal text; SVT-Perspective (SVTP) has 645 images, in which most of the text is strongly distorted; ICDAR2013 (IC13) has 1015 images, mostly high-quality horizontal text; ICDAR2015 (IC15) has 1811 images, mostly arbitrarily shaped, low-quality text images; CUTE has 288 images, mostly high-quality curved text.
Table 1 compares the effects of the GCAN modules; the results show that the proposed GCRM brings an obvious improvement, while character-position supervision alone cannot obviously improve the existing method. Table 2 compares the invention with other mainstream methods on the test datasets; the invention achieves the best performance on multiple datasets, which proves its effectiveness. Fig. 3 visualizes the recognition results and attention weights of the conventional method and of the invention: for each recognized picture on the left, the first line on the right is the recognition result of the conventional method and the second line is the recognition result of the invention; the white rings in the picture mark the attended positions, and the letter below is the recognized character. It can be seen that the invention effectively resolves attention drift and obtains better recognition results.
Table 1 comparative experiments on the respective modules
(table image not reproduced)
TABLE 2 comparison of GCAN with other methods on individual datasets
(table image not reproduced)
The above embodiments are only intended to illustrate the technical solution of the present invention, but not to limit it, and a person skilled in the art can modify the technical solution of the present invention or substitute it with an equivalent, and the protection scope of the present invention is subject to the claims.

Claims (10)

1. A scene character recognition method based on a Gaussian constraint attention mechanism network is characterized by comprising the following steps:
extracting visual features of the picture to be identified to obtain a two-dimensional feature map;
converting the two-dimensional feature map into a one-dimensional feature sequence, and extracting global semantic information according to the one-dimensional feature sequence;
inputting global semantic information into a first time step to initialize a decoding hidden state, calculating an original attention weight according to the hidden state and a two-dimensional feature map in each time step, and obtaining an original weighted feature vector by weighting and summing the weights;
constructing a two-dimensional Gaussian distribution mask according to the hidden state and the original weighted feature vector, multiplying the mask by the original attention weight to obtain a corrected attention weight, and obtaining a corrected weighted feature vector according to the weight;
and fusing the original weighted feature vector and the corrected weighted feature vector together to predict characters of the picture to be recognized.
2. The method of claim 1, wherein, starting from the second time step, the decoding result of the previous time step is input at each time step to update the hidden state.
3. A scene character recognition system based on a Gaussian constraint attention mechanism network is characterized by comprising:
the characteristic extraction module comprises a multilayer residual error network and is responsible for extracting visual characteristics of the picture to be identified to obtain a two-dimensional characteristic diagram;
the encoder module comprises a unidirectional two-layer long short-term memory network (LSTM) and is responsible for converting the two-dimensional feature map into a one-dimensional feature sequence, inputting the one-dimensional feature sequence into the LSTM to extract global semantic information, and outputting the hidden state of the LSTM at the last time step;
the decoder module comprises an attention-based unidirectional two-layer long short-term memory network (AM-LSTM), and is responsible for updating the hidden state of the AM-LSTM at each time step based on the global semantic information, calculating the original attention weight at each time step according to the hidden state of the AM-LSTM and the two-dimensional feature map, obtaining an original weighted feature vector by weighted summation, and fusing the original weighted feature vector and the corrected weighted feature vector to predict the characters of the picture to be recognized;
and the correction module based on Gaussian constraint is responsible for constructing a two-dimensional Gaussian distribution mask according to the hidden state of the AM-LSTM and the original weighted feature vector, multiplying the mask by the original attention weight to obtain a corrected attention weight, and obtaining a corrected weighted feature vector according to the weight.
4. The system of claim 3, wherein the feature extraction module comprises a 31-layer residual network.
5. The system of claim 3, wherein the encoder module is responsible for max pooling the two-dimensional feature map into a one-dimensional feature sequence.
6. The system of claim 3, wherein the decoder module is responsible for inputting the global semantic information at the first decoding time step, and from the second time step onward updating the hidden state of the AM-LSTM at each time step according to the decoding result of the previous time step.
7. The system of claim 3, wherein the correction module based on the Gaussian constraint is responsible for concatenating the hidden state of the AM-LSTM with the original weighted feature vector and then predicting a set of Gaussian distribution parameters through a fully connected layer, and using the parameters to construct the two-dimensional Gaussian distribution mask.
8. The system of claim 7, wherein the parameters of the gaussian distribution include a mean and a variance.
9. The system of claim 3, wherein the system optimizes training by calculating a character recognition penalty and an attention weight penalty.
10. The system of claim 9, wherein the character recognition penalty is optimized by calculating a cross entropy penalty between the predicted character probability and the recognition token, and the attention weight penalty is optimized by calculating an L1 regression penalty between the predicted character attention distribution and the character position token.
CN202010767079.6A 2020-08-03 2020-08-03 Scene character recognition method and system based on Gaussian constraint attention mechanism network Active CN112070114B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010767079.6A CN112070114B (en) 2020-08-03 2020-08-03 Scene character recognition method and system based on Gaussian constraint attention mechanism network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010767079.6A CN112070114B (en) 2020-08-03 2020-08-03 Scene character recognition method and system based on Gaussian constraint attention mechanism network

Publications (2)

Publication Number Publication Date
CN112070114A true CN112070114A (en) 2020-12-11
CN112070114B CN112070114B (en) 2023-05-16

Family

ID=73657592

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010767079.6A Active CN112070114B (en) 2020-08-03 2020-08-03 Scene character recognition method and system based on Gaussian constraint attention mechanism network

Country Status (1)

Country Link
CN (1) CN112070114B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112541501A (en) * 2020-12-18 2021-03-23 北京中科研究院 Scene character recognition method based on visual language modeling network
CN113065561A (en) * 2021-03-15 2021-07-02 国网河北省电力有限公司 Scene text recognition method based on fine character segmentation
CN113221874A (en) * 2021-06-09 2021-08-06 上海交通大学 Character recognition system based on Gabor convolution and linear sparse attention
CN113591546A (en) * 2021-06-11 2021-11-02 中国科学院自动化研究所 Semantic enhanced scene text recognition method and device
CN114463675A (en) * 2022-01-11 2022-05-10 北京市农林科学院信息技术研究中心 Underwater fish group activity intensity identification method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5710833A (en) * 1995-04-20 1998-01-20 Massachusetts Institute Of Technology Detection, recognition and coding of complex objects using probabilistic eigenspace analysis
CN111428727A (en) * 2020-03-27 2020-07-17 华南理工大学 Natural scene text recognition method based on sequence transformation correction and attention mechanism

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5710833A (en) * 1995-04-20 1998-01-20 Massachusetts Institute Of Technology Detection, recognition and coding of complex objects using probabilistic eigenspace analysis
CN111428727A (en) * 2020-03-27 2020-07-17 华南理工大学 Natural scene text recognition method based on sequence transformation correction and attention mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
何鎏一 (HE Liuyi) et al., "基于深度学习的光照不均匀文本图像的识别***" [Deep-learning-based recognition *** for text images with uneven illumination], 《计算机应用与软件》 (Computer Applications and Software) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112541501A (en) * 2020-12-18 2021-03-23 北京中科研究院 Scene character recognition method based on visual language modeling network
CN112541501B (en) * 2020-12-18 2021-09-07 北京中科研究院 Scene character recognition method based on visual language modeling network
CN113065561A (en) * 2021-03-15 2021-07-02 国网河北省电力有限公司 Scene text recognition method based on fine character segmentation
CN113221874A (en) * 2021-06-09 2021-08-06 上海交通大学 Character recognition system based on Gabor convolution and linear sparse attention
CN113591546A (en) * 2021-06-11 2021-11-02 中国科学院自动化研究所 Semantic enhanced scene text recognition method and device
CN113591546B (en) * 2021-06-11 2023-11-03 中国科学院自动化研究所 Semantic enhancement type scene text recognition method and device
CN114463675A (en) * 2022-01-11 2022-05-10 北京市农林科学院信息技术研究中心 Underwater fish group activity intensity identification method and device

Also Published As

Publication number Publication date
CN112070114B (en) 2023-05-16

Similar Documents

Publication Publication Date Title
CN111753827B (en) Scene text recognition method and system based on semantic enhancement encoder and decoder framework
CN112070114A (en) Scene character recognition method and system based on Gaussian constraint attention mechanism network
CN108875807B (en) Image description method based on multiple attention and multiple scales
CN109948691B (en) Image description generation method and device based on depth residual error network and attention
CN110533044B (en) Domain adaptive image semantic segmentation method based on GAN
CN107239801A (en) Video attribute represents that learning method and video text describe automatic generation method
CN106570464A (en) Human face recognition method and device for quickly processing human face shading
CN111967471A (en) Scene text recognition method based on multi-scale features
CN114022882B (en) Text recognition model training method, text recognition device, text recognition equipment and medium
CN114998673B (en) Dam defect time sequence image description method based on local self-attention mechanism
CN113221879A (en) Text recognition and model training method, device, equipment and storage medium
CN113807340B (en) Attention mechanism-based irregular natural scene text recognition method
CN107463928A (en) Word sequence error correction algorithm, system and its equipment based on OCR and two-way LSTM
CN109766918A (en) Conspicuousness object detecting method based on the fusion of multi-level contextual information
CN115132201A (en) Lip language identification method, computer device and storage medium
CN116206314A (en) Model training method, formula identification method, device, medium and equipment
CN111144407A (en) Target detection method, system, device and readable storage medium
CN117058266B (en) Handwriting word generation method based on skeleton and outline
Li Research on methods of english text detection and recognition based on neural network detection model
CN111814508A (en) Character recognition method, system and equipment
CN111259197A (en) Video description generation method based on pre-coding semantic features
CN115496134A (en) Traffic scene video description generation method and device based on multi-modal feature fusion
CN113095319B (en) Multidirectional scene character detection method and device based on full convolution angular point correction network
CN113722536B (en) Video description method based on bilinear adaptive feature interaction and target perception
CN114495076A (en) Character and image recognition method with multiple reading directions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant