CN110717336A - Scene text recognition method based on semantic relevance prediction and attention decoding - Google Patents

Scene text recognition method based on semantic relevance prediction and attention decoding

Info

Publication number
CN110717336A
CN110717336A
Authority
CN
China
Prior art keywords
semantic
neural network
network model
deep neural
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910898753.1A
Other languages
Chinese (zh)
Inventor
陈晓雪
金连文
王天玮
毛慧芸
朱远志
罗灿杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201910898753.1A
Publication of CN110717336A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Character Discrimination (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a scene text recognition method based on semantic relevance prediction and attention decoding, comprising the following steps. S1, data acquisition: acquire a synthetic training data set, a real evaluation data set and a common root statistical table, the common root statistical table serving as semantic guidance. S2, data processing: stretch-transform the synthetic training data set and the real evaluation data set to a uniform standard. S3, deep neural network model training. S4, scene text recognition: input the scene text image to be recognized into the deep neural network model, which accurately recognizes it and returns a string of characters as the recognition result. The semantic relevance prediction module of the invention uses the common root statistical table as semantic guidance to provide more accurate high-order prior information for the semantic attention mechanism; the learned parameters better fit the image characteristics of real scene text, and the recognition accuracy is higher.

Description

Scene text recognition method based on semantic relevance prediction and attention decoding
Technical Field
The invention relates to the technical field of pattern recognition and artificial intelligence, and in particular to a scene text recognition method based on semantic relevance prediction and attention decoding.
Background
Text carries a large amount of accurate and rich semantic information that is valuable in many practical applications, such as intelligent retrieval, autonomous driving, and assistive devices for visually impaired people. Scene text recognition has therefore long been one of the research topics in the field of computer vision. Unlike optical character recognition in scanned documents, scene text recognition is very challenging because of the variety of text fonts, low image resolution, and the susceptibility of images to lighting and shadow variations. In recent years, the rapid development of deep neural networks has greatly advanced innovative applications of artificial intelligence. Deep neural network models, particularly those based on the attention mechanism, have achieved strong performance in scene text recognition. An attention-based recognition network focuses on text regions and implicitly embeds high-order prior information about adjacent characters, thereby providing a high-order statistical language model for the subsequent transcription process and improving recognition performance. However, the attention mechanism widely used in existing scene text recognition lacks selectivity over this high-order prior information: it supplies equally weighted prior guidance in all recognition situations, so the correlation between strongly related characters is not strengthened, nor is the correlation between unrelated characters weakened.
Disclosure of Invention
The invention aims to overcome the above defects in the prior art by providing a scene text recognition method based on semantic relevance prediction and attention decoding that offers high recognition accuracy, no additional computational overhead in the test stage, and high recognition speed.
The purpose of the invention is realized by the following technical scheme:
a scene text recognition method based on semantic relatedness prediction and attention decoding comprises the following steps:
S1, data acquisition: acquiring a synthetic training data set, a real evaluation data set and a common root statistical table, the common root statistical table being used as semantic guidance;
S2, data processing: stretch-transforming the synthetic training data set and the real evaluation data set to a uniform standard;
S3, deep neural network model training: inputting the normalized synthetic training data set, the corresponding annotated text data and the common root statistical table into the deep neural network model for training, wherein the annotated text data and the semantic guidance are used for supervised parameter learning during training; the deep neural network model comprises a semantic relevance prediction module and a semantic attention mechanism decoding module;
S4, scene text recognition: inputting the scene text image to be recognized into the deep neural network model, which accurately recognizes it and returns a string of characters as the recognition result.
Preferably, the scene text in the synthetic training data set and the real evaluation data set occupies more than two thirds of the area of the scene text image; the text part of the synthetic training data set covers N different font styles, with N ≥ 2; the real evaluation data set is captured with a camera; and the common root statistical table contains 707 common roots whose lengths range from 2 to 10 characters.
Preferably, the stretch transform in step S2 is a bilinear interpolation or downsampling operation.
Preferably, step S3 includes:
S31, constructing a deep neural network model;
S32, setting the parameters for deep neural network model training: number of iterations 1,000,000, optimizer Adadelta, learning rate 1.0;
S33, training the deep neural network under the set initialization parameters.
Preferably, the model structure of the deep neural network model is as shown in Table 1:
Table 1. Model structure of the deep neural network model [available only as an image in the original publication]
Table 2. Model structure of the residual layer [available only as an image in the original publication]
The model structure of the residual layer within the deep neural network model is shown in Table 2; the nonlinear layers in the residual layer all use the ReLU activation function, and the downsampling layer is implemented by a convolution layer and a batch normalization layer.
Preferably, step S4 comprises: the scene text image to be recognized is passed through the deep convolutional neural network model to obtain a robust high-level feature representation, and the semantic relevance prediction module, using the common root statistical table as semantic guidance, predicts the semantic relevance parameters of adjacent characters; the semantic attention mechanism decoding module then performs transcription and correction based on the adjacent-character semantic relevance parameters and the high-level features of the text image, yielding a string of characters as the recognition result.
Preferably, between steps S3 and S4 the method further comprises testing the deep neural network model: the real evaluation data set is input into the deep neural network model, which recognizes it and returns a string of characters as the recognition result; if the recognition result is consistent with the annotated text data corresponding to the real evaluation data set, the recognition capability of the deep neural network model meets the preset requirement.
Compared with the prior art, the invention has the following advantages:
(1) The deep neural network model comprises a semantic relevance prediction module and a semantic attention mechanism decoding module. The semantic relevance prediction module uses the root statistical table as semantic guidance to predict the semantic relevance parameters of adjacent characters, providing more accurate high-order prior information for the semantic attention mechanism; the learned parameters better fit the image characteristics of real scene text, and the recognition accuracy is higher.
(2) The semantic attention mechanism relies only on the common root statistical table as semantic guidance, so the semantic relevance annotation requires no manual labeling, saving considerable manpower and material resources; in practical applications, recognition accuracy is effectively improved.
(3) A back-propagation algorithm automatically adjusts the convolution kernel parameters, yielding more robust filters that can cope with application conditions such as image blur, perspective transformation, and lighting changes.
(4) Compared with manual processing, the scheme completes scene text recognition automatically, saving manpower and material resources.
(5) Compared with traditional computer-vision attention mechanism methods, the method constructs semantic relevance selectively, and is simple to implement, highly accurate, free of additional computational overhead in the test stage, and fast at recognition.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a flowchart illustrating a scene text recognition method based on semantic relatedness prediction and attention decoding according to the present invention.
Detailed Description
The invention is further illustrated by the following figures and examples.
Referring to FIG. 1, a scene text recognition method based on semantic relevance prediction and attention decoding comprises:
S1, data acquisition: acquire a synthetic training data set, a real evaluation data set and a common root statistical table, the common root statistical table serving as semantic guidance. The scene text in both the synthetic training data set and the real evaluation data set occupies more than two thirds of the area of the scene text image; the text part of the synthetic training data set covers N different font styles, with N ≥ 2, and the synthetic training data set is allowed to contain a certain degree of lighting and resolution variation. The real evaluation data set is captured with a camera; during shooting, the text in the normalized scene text image occupies more than two thirds of the image area, and a certain amount of tilt and blur is allowed. The common root statistical table contains 707 common roots whose lengths range from 2 to 10 characters. Both the training data set and the real evaluation data set cover a variety of font styles, lighting changes and resolution changes.
the natural scene picture or image refers to a picture or image obtained by an electronic device such as a mobile phone, for example, a street view image such as a street sign or a signboard. Scene character recognition refers to recognizing character information in a natural scene picture. Because characters in natural scene pictures are rich in display forms, the image background is complex, the resolution ratio is low and the like, the difficulty is far higher than that of character recognition in traditional scanned document images.
S2, data processing: stretch-transform the synthetic training data set and the real evaluation data set to a uniform 32 x 100 standard so that the deep neural network model can process batches in parallel; the stretch transform is a bilinear interpolation or downsampling operation.
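As an illustration, the following minimal sketch (assuming PyTorch, with images already loaded as float tensors in (N, C, H, W) layout; the sample input size is an arbitrary assumption) performs the 32 x 100 bilinear stretch transform described above:

    import torch
    import torch.nn.functional as F

    def normalize_batch(images: torch.Tensor) -> torch.Tensor:
        """Stretch-transform a batch of scene text images to the uniform 32 x 100 standard."""
        return F.interpolate(images, size=(32, 100), mode="bilinear", align_corners=False)

    crop = torch.rand(1, 3, 48, 160)    # one 48 x 160 RGB crop (shape is an assumption)
    print(normalize_batch(crop).shape)  # torch.Size([1, 3, 32, 100])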
S3, deep neural network model training: input the normalized synthetic training data set, the corresponding annotated text data and the common root statistical table into the deep neural network model for training; the annotated text data and the semantic guidance are used for supervised parameter learning. The deep neural network model comprises a semantic relevance prediction module and a semantic attention mechanism decoding module; the semantic relevance prediction module uses the root statistical table as semantic guidance to predict the semantic relevance parameters of adjacent characters, providing more accurate high-order prior information for the semantic attention mechanism.
In step S3, the corresponding annotated text data refers to the annotation of the text contained in an image of the synthetic training data set. For example, if a street-view image contains the word "china", the annotated text data of that image is "china". Each image corresponds to one line of annotated text data.
Between steps S3 and S4, the method further comprises testing the deep neural network model: the real evaluation data set is input into the deep neural network model, which recognizes it and returns a string of characters as the recognition result; if the recognition result is consistent with the annotated text data corresponding to the real evaluation data set, the recognition capability of the deep neural network model meets the preset requirement.
S4, scene text recognition: input the scene text image to be recognized into the deep neural network model, which accurately recognizes it and returns a string of characters as the recognition result.
It should be noted that the procedures for testing the deep neural network model and for scene text recognition are identical; they differ only in the images fed to the model. The test uses text images from the real evaluation data set, whose texts are known in advance; if the recognition results match these known texts, the recognition capability of the model is good. Scene text recognition instead takes a scene text image to be recognized, feeds it into the deep neural network model whose recognition capability has been verified, and returns a string of characters as the text contained in the image.
Further, step S4 comprises: the scene text image to be recognized is passed through the deep convolutional neural network model to obtain a robust high-level feature representation, and the semantic relevance prediction module, using the common root statistical table as semantic guidance, predicts the semantic relevance parameters of adjacent characters; the semantic attention mechanism decoding module then performs transcription and correction based on the adjacent-character semantic relevance parameters and the high-level features of the text image, yielding a string of characters as the recognition result.
In the present embodiment, step S3 includes:
S31, constructing a deep neural network model;
S32, setting the parameters for deep neural network model training: number of iterations 1,000,000, optimizer Adadelta, learning rate 1.0;
S33, training the deep neural network under the set initialization parameters.
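The training loop of S32-S33 can be sketched as follows (a hedged illustration assuming PyTorch; the stand-in model and data replace the real network of Tables 1-2, which is not reproduced here, and the loss is a placeholder for the function L defined later):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    model = nn.Linear(10, 5)  # stand-in for the deep neural network model of S31
    optimizer = torch.optim.Adadelta(model.parameters(), lr=1.0)  # S32: Adadelta, learning rate 1.0

    for iteration in range(1_000_000):            # S32: 1,000,000 iterations
        x = torch.randn(32, 10)                   # stand-in training batch
        target = torch.randint(0, 5, (32,))       # stand-in labels
        loss = F.cross_entropy(model(x), target)  # stand-in for the loss L defined below
        optimizer.zero_grad()
        loss.backward()                           # back-propagation, per the training strategy
        optimizer.step()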
The model structure of the deep neural network model is shown in Table 1:
Table 1. Model structure of the deep neural network model [available only as an image in the original publication]
Table 2. Model structure of the residual layer [available only as an image in the original publication]
The model structure of the residual layer within the deep neural network model is shown in Table 2; the nonlinear layers in the residual layer all use the ReLU activation function, and the downsampling layer is implemented by a convolution layer and a batch normalization layer. The stride of the last three residual layers is changed from 2 x 2 to 2 x 1, which better suits the aspect ratio of scene text images and facilitates the extraction of robust spatial features.
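A minimal sketch of one such residual layer follows, assuming PyTorch; the channel counts and the two-convolution main branch are illustrative assumptions, since Tables 1-2 are available only as images in the original publication:

    import torch
    import torch.nn as nn

    class ResidualLayer(nn.Module):
        def __init__(self, in_ch: int, out_ch: int, stride=(2, 1)):
            super().__init__()
            # Main branch: two 3x3 convolutions with BN; nonlinearities are ReLU.
            self.body = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
                nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1, bias=False),
                nn.BatchNorm2d(out_ch),
            )
            # Downsampling shortcut: a convolution layer plus a batch normalization
            # layer, as described in the text.
            self.down = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            return self.relu(self.body(x) + self.down(x))

    # A 2x1 stride halves the height while preserving the width,
    # matching the wide aspect ratio of scene text images.
    feat = ResidualLayer(64, 128, stride=(2, 1))(torch.rand(1, 64, 16, 100))
    print(feat.shape)  # torch.Size([1, 128, 8, 100])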
The semantic relevance prediction module uses the common root statistical table as semantic guidance to provide more accurate high-order prior information for the semantic attention mechanism. After removing duplicate roots and single-letter roots, the common root statistical table contains 707 common roots in total. Root lengths are mainly distributed between 2 and 10 characters; roots of 3 to 4 characters account for the largest share, about 71.99%, with typical examples such as 'ing' and 'ane'. Very few roots exceed 8 characters.
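The preparation of the root table can be illustrated as follows (a sketch under the assumption that the raw list is available in memory; the inline sample list is hypothetical and far smaller than the real 707-entry table):

    from collections import Counter

    # Hypothetical raw root list; the actual table holds many more entries.
    raw = ["ing", "ane", "a", "at", "ation", "ion", "for", "form", "or", "in", "ing"]

    # Remove duplicate and single-letter roots, as described above; after this
    # filtering the table used by the method contains 707 common roots.
    roots = sorted({r for r in raw if len(r) >= 2})

    # Check the length distribution (in the real table, 3-4 character roots
    # account for about 71.99%, and very few roots exceed 8 characters).
    length_hist = Counter(len(r) for r in roots)
    print(len(roots), dict(sorted(length_hist.items())))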
Given the input picture I and its ground-truth annotation g = (g_1, g_2, ..., g_L), let the symbol γ̂ = (γ̂_1, γ̂_2, ..., γ̂_{L-1}) denote the ground-truth annotation of the high-order prior information score γ_t; each value of γ̂ represents the semantic relevance between a pair of adjacent characters, so the vector γ̂ has length L-1. The annotation γ̂ is constructed as follows:
let the scene text picture label information be "information", and the character length be 11 characters, so
Figure BDA0002211119430000094
The character length is 10 characters. If two adjacent characters form the root, thenIncreases by 1 and vice versa by 0. The annotation information "contains 7 roots in total, which are 'at', 'position', 'or', 'for', 'form', 'in' and 'ion', respectively, and the above process is repeated to obtain the final high-level semantic vector
Figure BDA0002211119430000096
Is [1, 0, 2, 3, 1,0, 2, 1, 2]. In the course of the deep neural network training process,
Figure BDA0002211119430000097
is normalized to the interval [0, 1 ]]. The process does not require manual labeling.
A semantic prior loss function L_p is further defined as

L_p = MSELoss(γ, γ̂),   (1)

where MSELoss denotes the mean square error between the predicted values and the ground-truth labels.
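The label construction and the prior loss L_p can be sketched as follows (assuming PyTorch; the root-matching rule, counting every occurrence of every root covering an adjacent character pair, is our reading of the description, and the small root list is just the subset relevant to the example word):

    import torch
    import torch.nn.functional as F

    ROOTS = ["in", "for", "form", "or", "at", "ation", "ion"]  # roots found in "information"

    def adjacent_pair_counts(word: str, roots=ROOTS) -> torch.Tensor:
        counts = [0] * (len(word) - 1)           # one entry per adjacent character pair
        for root in roots:
            start = word.find(root)
            while start != -1:                   # every occurrence of this root
                for j in range(start, start + len(root) - 1):
                    counts[j] += 1               # pair (j, j+1) lies inside the root
                start = word.find(root, start + 1)
        return torch.tensor(counts, dtype=torch.float)

    gamma_hat = adjacent_pair_counts("information")
    print(gamma_hat.tolist())                    # [1, 0, 2, 3, 1, 0, 2, 1, 2, 2]
    gamma_hat = gamma_hat / gamma_hat.max().clamp(min=1)  # normalize to [0, 1]

    # Semantic prior loss L_p, equation (1): mean square error between the
    # module's prediction and the constructed labels.
    predicted = torch.rand_like(gamma_hat)       # stand-in prediction
    L_p = F.mse_loss(predicted, gamma_hat)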
The semantic attention mechanism decoding module then performs targeted transcription and correction based on the semantic relevance parameters and the high-level feature representation produced by the deep convolutional neural network, yielding a string of characters as the recognition result.
Let F_e(I) = (h_1, h_2, ..., h_N) denote the encoding process of the deep convolutional neural network. The decoding module based on the semantic attention mechanism aligns the prediction sequence y = (y_1, y_2, ..., y_T) with the ground truth g = (g_1, g_2, ..., g_L), where T denotes the maximum decoding step. At time t, the output y_t of the recognition model can be expressed as

y_t = Softmax(W_o s_t + b_o),   (2)
where s_t is the hidden state of the Gated Recurrent Unit (GRU) at time t. The GRU is a variant of the recurrent neural network that is often used to model long-range semantic dependencies of text sequences. s_t is computed as

s_t = GRU((p'_t, c_t), s_{t-1}).   (3)
Here p'_t is derived from the embedding p_t of the previous output y_{t-1}. Unlike the conventional attention mechanism, the semantic attention mechanism computes p'_t as

p'_t = γ_t p_t,   (4)

where γ_t reflects the degree of correlation between the adjacent characters y_t and y_{t-1}: a larger γ_t indicates stronger semantic correlation between the adjacent characters, a smaller γ_t indicates weaker correlation, and γ_t = 0 means the adjacent characters are semantically unrelated. Accordingly, γ_t is computed as
γ_t = f_emb(c_t, c_{t-1}),   (5)
further, a priori function fembThe calculation method is that,
femb(ct,ct-1)=σ(VcTanh(Wpct-1+Wcct+bc), (6)
where σ is the Sigmoid activation function and the symbol c_t denotes the semantic vector, given by the weighted sum of the features:

c_t = Σ_{j=1}^{N} α_{t,j} h_j,   (7)

where the symbol N denotes the length of the feature sequence and α_{t,j} is the weight vector of the attention mechanism, conventionally given by

α_{t,j} = exp(e_{t,j}) / Σ_{k=1}^{N} exp(e_{t,k}),   (8)

e_{t,j} = f_attn(s_{t-1}, h_j),   (9)
where the alignment function f_attn is computed as

f_attn(s_{t-1}, h_j) = V_a Tanh(W_s s_{t-1} + W_f h_j + b).   (10)
The quantities W_o, b_o, V_a, W_s, W_f, b, V_c, W_p, W_c and b_c above are all learnable parameters. When the recognition model predicts the end-of-sequence token EOS, the semantic attention mechanism decoding module finishes the transcription process.
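One decoding step implementing equations (2)-(10) can be sketched as follows (assuming PyTorch; all dimensions, and the use of an embedding layer for p_t, are illustrative assumptions):

    import torch
    import torch.nn as nn

    class SemanticAttentionDecoderStep(nn.Module):
        def __init__(self, feat_dim=512, hidden=256, emb=128, num_classes=97):
            super().__init__()
            self.embed = nn.Embedding(num_classes, emb)         # p_t: embedding of y_{t-1}
            self.gru = nn.GRUCell(emb + feat_dim, hidden)       # eq. (3)
            self.W_s = nn.Linear(hidden, hidden, bias=False)    # eq. (10)
            self.W_f = nn.Linear(feat_dim, hidden)              # eq. (10), carries bias b
            self.V_a = nn.Linear(hidden, 1, bias=False)         # eq. (10)
            self.W_p = nn.Linear(feat_dim, hidden, bias=False)  # eq. (6)
            self.W_c = nn.Linear(feat_dim, hidden)              # eq. (6), carries bias b_c
            self.V_c = nn.Linear(hidden, 1, bias=False)         # eq. (6)
            self.W_o = nn.Linear(hidden, num_classes)           # eq. (2)

        def forward(self, y_prev, s_prev, c_prev, H):
            # H: encoder features F_e(I) = (h_1 ... h_N), shape (B, N, feat_dim).
            e = self.V_a(torch.tanh(self.W_s(s_prev).unsqueeze(1) + self.W_f(H)))  # eqs. (9)-(10)
            alpha = torch.softmax(e, dim=1)                                        # eq. (8)
            c = (alpha * H).sum(dim=1)                                             # eq. (7)
            gamma = torch.sigmoid(self.V_c(torch.tanh(self.W_p(c_prev) + self.W_c(c))))  # eqs. (5)-(6)
            p = gamma * self.embed(y_prev)                                         # eq. (4)
            s = self.gru(torch.cat([p, c], dim=-1), s_prev)                        # eq. (3)
            return self.W_o(s), s, c, gamma                                        # eq. (2), pre-softmax

    # One step: batch of 2, N = 25 feature columns; decoding repeats until EOS.
    step = SemanticAttentionDecoderStep()
    H = torch.rand(2, 25, 512)
    y_prev = torch.zeros(2, dtype=torch.long)   # e.g. a start token
    s_prev, c_prev = torch.zeros(2, 256), torch.zeros(2, 512)
    logits, s, c, gamma = step(y_prev, s_prev, c_prev, H)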
The attention mechanism loss function, denoted L_attn, is expressed as

L_attn = -Σ_{t=1}^{L} log P(g_t | I, θ),   (11)

where θ denotes all learnable parameters of the deep neural network model.
Combining the semantic prior loss L_p provided by the semantic relevance prediction module, the final optimization objective of the deep recognition network is defined as

L = L_attn + λ L_p,   (12)

where the hyper-parameter λ balances the attention loss against the semantic prior loss; in the experiments λ is set to the constant 1.
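The combined objective of equations (11)-(12) can be sketched as follows (assuming PyTorch; cross-entropy over the ground-truth characters is used as the negative log-likelihood of equation (11)):

    import torch
    import torch.nn.functional as F

    def total_loss(char_logits, char_targets, gamma_pred, gamma_labels, lam=1.0):
        # L_attn: negative log-likelihood of the ground-truth characters (eq. (11)),
        # implemented here as cross-entropy over the decoder's per-step logits.
        L_attn = F.cross_entropy(char_logits, char_targets)
        # L_p: mean square error of the semantic relevance prediction (eq. (1)).
        L_p = F.mse_loss(gamma_pred, gamma_labels)
        return L_attn + lam * L_p  # eq. (12), with lambda = 1 in the experiments

    # Example with stand-in tensors: 10 decoding steps, 97 character classes.
    loss = total_loss(torch.randn(10, 97), torch.randint(0, 97, (10,)),
                      torch.rand(9), torch.rand(9))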
During network model training, a back-propagation algorithm is used: the gradient is computed at the last layer and propagated layer by layer to update all parameters of the network model. The training strategy is supervised: a general deep recognition network is trained with the synthetic image data, the corresponding annotations and the root table. The input of the recognition model is a normalized scene text image, the output is the character sequence in the image, and the training loss is the function L above.
The scene text recognition of this scheme can be used for automatic guideboard recognition, intelligent retrieval, archiving of image data, and the like.
The scene text recognition method based on semantic relevance prediction and attention decoding makes full use of the semantic guidance of the common root table and, building on the learning capability of the deep network model and the physical meaning of the back-propagated residual, learns the distribution of the data samples to provide accurate scene text recognition. The method is simple to implement, highly accurate, free of additional computational overhead in the test stage, and fast at recognition, and therefore has good practical value.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited thereto; any other modification or equivalent substitution that does not depart from the technical spirit of the present invention falls within the scope of the present invention.

Claims (7)

1. A scene text recognition method based on semantic relevance prediction and attention decoding, characterized by comprising the following steps:
S1, data acquisition: acquiring a synthetic training data set, a real evaluation data set and a common root statistical table;
S2, data processing: stretch-transforming the synthetic training data set and the real evaluation data set to a uniform standard;
S3, deep neural network model training: inputting the normalized synthetic training data set, the corresponding annotated text data and the common root statistical table into the deep neural network model for training, wherein the deep neural network model comprises a semantic relevance prediction module and a semantic attention mechanism decoding module, and the semantic relevance prediction module uses the root statistical table as semantic guidance to predict the semantic relevance parameters of adjacent characters;
S4, scene text recognition: inputting the scene text image to be recognized into the deep neural network model, which accurately recognizes it and returns a string of characters as the recognition result.
2. The scene text recognition method based on semantic relevance prediction and attention decoding of claim 1, wherein: the scene text in the synthetic training data set and the real evaluation data set occupies more than two thirds of the area of the scene text image; the text part of the synthetic training data set covers N different font styles, with N ≥ 2; the real evaluation data set is captured with a camera; and the common root statistical table contains 707 common roots whose lengths range from 2 to 10 characters.
3. The scene text recognition method based on semantic relevance prediction and attention decoding of claim 1, wherein: the stretch transform in step S2 is a bilinear interpolation or downsampling operation.
4. The scene text recognition method based on semantic relevance prediction and attention decoding of claim 1, wherein step S3 comprises:
S31, constructing a deep neural network model;
S32, setting the parameters for deep neural network model training: number of iterations 1,000,000, optimizer Adadelta, learning rate 1.0;
S33, training the deep neural network under the set initialization parameters.
5. The scene text recognition method based on semantic relevance prediction and attention decoding of claim 4, wherein the model structure of the deep neural network model is as shown in Table 1:
Table 1. Model structure of the deep neural network model [available only as an image in the original publication]
Table 2. Model structure of the residual layer [available only as an image in the original publication]
The model structure of the residual layer within the deep neural network model is shown in Table 2; the nonlinear layers in the residual layer all use the ReLU activation function, and the downsampling layer is implemented by a convolution layer and a batch normalization layer.
6. The scene text recognition method based on semantic relevance prediction and attention decoding of claim 1, wherein step S4 comprises:
the method comprises the steps that a scene text image to be recognized obtains high-level feature expression with robustness through a deep convolutional neural network model, and a semantic relevancy prediction module predicts to obtain semantic relevancy parameters of adjacent characters by taking a common root statistical table as semantic guidance; and the semantic attention mechanism decoding module performs transcription and correction according to the adjacent character semantic relatedness parameter and the high-level feature expression of the text image to obtain a string of characters as a recognition result.
7. The scene text recognition method based on semantic relevance prediction and attention decoding of claim 1, further comprising, between steps S3 and S4: testing the deep neural network model;
the deep neural network model test comprises the following steps: inputting the real evaluation data set into a deep neural network model, accurately identifying the real evaluation data set by the deep neural network model, and returning a string of characters as an identification result; and if the recognition result is consistent with the labeled text data corresponding to the real evaluation data set, the recognition capability of the deep neural network model reaches the preset requirement.
CN201910898753.1A 2019-09-23 2019-09-23 Scene text recognition method based on semantic relevance prediction and attention decoding Pending CN110717336A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910898753.1A CN110717336A (en) 2019-09-23 2019-09-23 Scene text recognition method based on semantic relevance prediction and attention decoding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910898753.1A CN110717336A (en) 2019-09-23 2019-09-23 Scene text recognition method based on semantic relevance prediction and attention decoding

Publications (1)

Publication Number Publication Date
CN110717336A true CN110717336A (en) 2020-01-21

Family

ID=69210752

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910898753.1A Pending CN110717336A (en) 2019-09-23 2019-09-23 Scene text recognition method based on semantic relevance prediction and attention decoding

Country Status (1)

Country Link
CN (1) CN110717336A (en)



Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110097049A (en) * 2019-04-03 2019-08-06 中国科学院计算技术研究所 A kind of natural scene Method for text detection and system
CN110147763A (en) * 2019-05-20 2019-08-20 哈尔滨工业大学 Video semanteme dividing method based on convolutional neural networks

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111428593A (en) * 2020-03-12 2020-07-17 北京三快在线科技有限公司 Character recognition method and device, electronic equipment and storage medium
CN113553885A (en) * 2020-04-26 2021-10-26 复旦大学 Natural scene text recognition method based on generation countermeasure network
CN111860116A (en) * 2020-06-03 2020-10-30 南京邮电大学 Scene identification method based on deep learning and privilege information
CN111783705A (en) * 2020-07-08 2020-10-16 厦门商集网络科技有限责任公司 Character recognition method and system based on attention mechanism
CN111783705B (en) * 2020-07-08 2023-11-14 厦门商集网络科技有限责任公司 Character recognition method and system based on attention mechanism
CN113673507A (en) * 2020-08-10 2021-11-19 广东电网有限责任公司 Electric power professional equipment nameplate recognition algorithm
CN111967471A (en) * 2020-08-20 2020-11-20 华南理工大学 Scene text recognition method based on multi-scale features
CN111967470A (en) * 2020-08-20 2020-11-20 华南理工大学 Text recognition method and system based on decoupling attention mechanism
CN112990196A (en) * 2021-03-16 2021-06-18 北京大学 Scene character recognition method and system based on hyper-parameter search and two-stage training
CN112990196B (en) * 2021-03-16 2023-10-24 北京大学 Scene text recognition method and system based on super-parameter search and two-stage training
CN113743291A (en) * 2021-09-02 2021-12-03 南京邮电大学 Method and device for detecting text in multiple scales by fusing attention mechanism
CN113743291B (en) * 2021-09-02 2023-11-07 南京邮电大学 Method and device for detecting texts in multiple scales by fusing attention mechanisms
CN118072973A (en) * 2024-04-15 2024-05-24 慧医谷中医药科技(天津)股份有限公司 Intelligent inquiry method and system based on medical knowledge base

Similar Documents

Publication Publication Date Title
CN110717336A (en) Scene text recognition method based on semantic relevance prediction and attention decoding
CN110083831B (en) Chinese named entity identification method based on BERT-BiGRU-CRF
CN111950453A (en) Optional-shape text recognition method based on selective attention mechanism
CN111967471A (en) Scene text recognition method based on multi-scale features
Mathew et al. Benchmarking scene text recognition in Devanagari, Telugu and Malayalam
CN111967470A (en) Text recognition method and system based on decoupling attention mechanism
CN111738169B (en) Handwriting formula recognition method based on end-to-end network model
CN111950528B (en) Graph recognition model training method and device
CN112819686A (en) Image style processing method and device based on artificial intelligence and electronic equipment
CN112257716A (en) Scene character recognition method based on scale self-adaption and direction attention network
CN110472248A (en) A kind of recognition methods of Chinese text name entity
CN114492646A (en) Image-text matching method based on cross-modal mutual attention mechanism
Wu et al. STR transformer: a cross-domain transformer for scene text recognition
Selvam et al. A transformer-based framework for scene text recognition
CN113886615A (en) Hand-drawn image real-time retrieval method based on multi-granularity association learning
CN116740362B (en) Attention-based lightweight asymmetric scene semantic segmentation method and system
CN111242114B (en) Character recognition method and device
CN110909645B (en) Crowd counting method based on semi-supervised manifold embedding
CN117292126A (en) Building elevation analysis method and system using repeated texture constraint and electronic equipment
CN110929013A (en) Image question-answer implementation method based on bottom-up entry and positioning information fusion
CN115546553A (en) Zero sample classification method based on dynamic feature extraction and attribute correction
CN114694133A (en) Text recognition method based on combination of image processing and deep learning
CN113362088A (en) CRNN-based telecommunication industry intelligent customer service image identification method and system
CN114298047A (en) Chinese named entity recognition method and system based on stroke volume and word vector
CN113361277A (en) Medical named entity recognition modeling method based on attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200121)