CN113065432A - Handwritten Mongolian recognition method based on data enhancement and ECA-Net - Google Patents

Handwritten Mongolian recognition method based on data enhancement and ECA-Net Download PDF

Info

Publication number
CN113065432A
Authority
CN
China
Prior art keywords
mongolian
image
handwritten
channel
data enhancement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110306372.7A
Other languages
Chinese (zh)
Inventor
仁庆道尔吉
麻泽蕊
尹玉娟
程坤
李媛
苏依拉
李雷孝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inner Mongolia University of Technology
Original Assignee
Inner Mongolia University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inner Mongolia University of Technology filed Critical Inner Mongolia University of Technology
Priority to CN202110306372.7A priority Critical patent/CN113065432A/en
Publication of CN113065432A publication Critical patent/CN113065432A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/32 Digital ink
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/048 Activation functions
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Character Discrimination (AREA)

Abstract

A handwritten Mongolian recognition method based on data enhancement and ECA-Net: data enhancement is performed on an existing Mongolian handwriting recognition database using elastic deformation data enhancement and/or random erasing data enhancement to obtain an enhanced database; a picture of handwritten Mongolian is acquired as an input image, and features are extracted from it with a residual network containing an efficient channel attention module to obtain a feature map; the feature map is vectorized, and the handwritten Mongolian is recognized using the enhanced database. By using elastic deformation and random erasing data enhancement, the method obtains a Mongolian handwriting recognition database with richer forms and improves the robustness of the model to occlusion. The efficient channel attention module avoids dimensionality reduction and effectively captures cross-channel interaction information. Finally, the recognition system is trained with the ACE loss function combined with a GRU, so that inference and back propagation are faster.

Description

Handwritten Mongolian recognition method based on data enhancement and ECA-Net
Technical Field
The invention belongs to the technical field of character recognition, and particularly relates to a handwritten Mongolian recognition method based on data enhancement and ECA-Net.
Background
With the rapid development of the internet and artificial intelligence, educational informatization has begun to influence and change traditional education; human-computer interaction scenarios such as online answering are increasingly common, and handwriting recognition has become a research direction in the field of computer vision. Recognizing handwritten text is simple for humans but very complicated for computers. In recent years, the development of deep convolutional neural networks has brought revolutionary changes to computer vision, and the combination of convolutional and recurrent neural networks has achieved great success in image-based sequence recognition, promoting the development of handwriting recognition. As an important research area of pattern recognition, handwriting recognition has received extensive research and attention from academia. Handwriting recognition research in popular languages (e.g., Chinese, English, Japanese) has progressed from simple isolated word recognition to text line recognition, unconstrained handwriting recognition, document recognition, and scene character recognition.
However, handwriting recognition for Mongolian and other low-resource languages started late and related research is scarce. Mongolian has a huge vocabulary, a free writing style, and severe character deformation, which pose great challenges to handwritten Mongolian recognition.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a handwritten Mongolian recognition method based on data enhancement and ECA-Net. Random erasing and/or elastic deformation is used to enhance the handwritten Mongolian database, generating training images with different degrees of occlusion. This further improves the generalization ability of the neural network, reduces the risk of overfitting, makes the model robust to occlusion (erasure), and alleviates the small scale of directly usable Mongolian handwriting recognition databases. ECA-Net extracts features from the input image to obtain a feature map; finally, the feature map is vectorized and the handwritten Mongolian is recognized using the enhanced database.
In order to achieve the purpose, the invention adopts the technical scheme that:
a handwritten Mongolian recognition method based on data enhancement and ECA-Net comprises the following steps:
the first step is as follows: performing data enhancement on the existing Mongolian handwriting recognition database by utilizing elastic deformation data enhancement and/or random erasure data enhancement to obtain an enhanced database;
the second step is that: acquiring a picture of handwritten Mongolian as an input image, and performing feature extraction on the input image with a deep convolutional neural network to obtain a feature map, wherein the deep convolutional neural network is a residual network containing an efficient channel attention module, namely ECA-Net;
the third step: vectorizing the feature map, and recognizing the handwritten Mongolian by using the enhanced database.
The elastic deformation data enhancement applies elastic spatial deformation to the image of a handwritten Mongolian character; each enhancement generates one enhanced handwritten Mongolian character image, i.e., an elastically deformed handwritten Mongolian character, and the Mongolian text corresponding to the original character image is used as the data label of the enhanced image.
The elastic deformation data enhancement divides the image evenly into N small blocks, initializes 2(N+1) reference points p along the top and bottom boundaries of the image, sets a circular transformation area of radius R centered on each reference point p, and enhances the image by randomly moving each reference point p to a position q within radius R through a similarity transformation based on moving least squares, wherein for any point u in the image the transformation follows the expression:

T(u) = (u - p*)M + q*

M is a linear transformation matrix, constrained to satisfy M^T M = λ^2 I for some scalar λ;

p* and q* are the weighted centroids of the reference points p_i and the moved points q_i, respectively:

p* = Σ_i w_i p_i / Σ_i w_i,  q* = Σ_i w_i q_i / Σ_i w_i

p_i denotes the i-th initial reference point and q_i the i-th moved reference point, i.e., p_i after a random move; w_i denotes the weight of reference point p_i with respect to the point u, with the formula:

w_i = 1 / |p_i - u|^(2a)

a is set to the fixed value 1; when u is close to p_i the weight increases, meaning that u depends mainly on the motion of the nearest reference points.
The random erasing data enhancement applies random erasing to the image of a handwritten Mongolian character; each enhancement generates one enhanced handwritten Mongolian character image, i.e., a randomly erased handwritten Mongolian character, and the Mongolian text corresponding to the original character image is used as the data label of the enhanced image.
The random erasing data enhancement randomly selects a rectangular region I_e in the image I and erases its pixels with random values, generating training images with different degrees of occlusion. The steps are as follows:

Step 1: input an image I of size S = W * H, where W and H are the width and height of I; set the erased-area ratio range [S_l, S_h] and the erased aspect-ratio range [r_1, r_2]; initialize the erasing probability p, 0 ≤ p ≤ 1.

Step 2: randomly select a rectangular region I_e in image I and erase its pixels with random values, where the area of I_e is randomly initialized to S_e and its aspect ratio to r_e, such that S_e / S lies in [S_l, S_h] and r_e lies in [r_1, r_2]. The size of I_e is calculated by the following formulas:

H_e = sqrt(S_e * r_e),  W_e = sqrt(S_e / r_e)

W_e and H_e are the height and width of the randomly erased rectangle I_e.

Step 3: randomly initialize a point P = (x_e, y_e) in image I, where x_e and y_e are the coordinates of the randomly initialized point.

Step 4: decide on the erased portion: if x_e + W_e ≤ W and y_e + H_e ≤ H, the region (x_e, y_e, x_e + W_e, y_e + H_e) is set as the selected rectangle I_e; otherwise, the above process is repeated until a rectangle I_e meeting the requirements is selected. Each pixel in the selected rectangle I_e is then assigned a random value in [0, 255].
The efficient channel attention module performs a fast 1D convolution with kernel size k to generate a weight for each channel of the input image. The kernel size k, i.e., the coverage of local cross-channel interaction, determines how many neighboring channels participate in the attention prediction of a channel y_i. After channel-wise global pooling without dimensionality reduction, local cross-channel interaction information is captured by considering each channel of the input image together with its k neighbors. A parameter matrix W_k represents the learned channel attention weights; W_k is a band matrix in which each row contains only k non-zero entries corresponding to the k neighbors of that channel, so W_k involves k * C parameters in total, where C is the channel dimension of the input feature map. The weight of image channel y_i only considers the information interaction between y_i and its k neighboring channels, and is calculated as:

ω_i = σ( Σ_{j=1}^{k} w^j y_i^j ),  y_i^j ∈ Ω_i^k

where y_i^j denotes the j-th neighboring channel of y_i, w^j denotes its shared weight, Ω_i^k denotes the set of k neighbors of y_i, and σ is the sigmoid function. k is in direct proportion to C, with the relationship:

C = φ(k) = 2^(γk - b)

Given C, k is adaptively determined by the following equation:

k = ψ(C) = | log2(C)/γ + b/γ |_odd

In the formula, |x|_odd denotes the odd number nearest to x, and γ and b are set to fixed constants. Through the mapping ψ, which is nonlinear, high-dimensional channels interact over a longer range while low-dimensional channels interact over a shorter range.
Using ECA-Net for feature extraction of the input image effectively avoids dimensionality reduction and captures cross-channel interaction information. This helps achieve higher character recognition accuracy while reducing model complexity; that is, channel attention is learned in a more efficient way.
In the third step, a gated recurrent unit is combined with the aggregation cross-entropy loss function to vectorize the feature map. The encoding process of the gated recurrent unit is as follows:

Step 1: from the hidden state h^<t-1> transmitted by the previous node and the input x of the current node, obtain the two gating states (reset gate and update gate); the obtained information is normalized by a sigmoid function so that it serves as a gating signal;

Step 2: for the activation of the j-th hidden unit, the reset gate is computed as:

r_j = σ([W_r x]_j + [U_r h^<t-1>]_j)

where σ is the logistic sigmoid function, [·]_j denotes the j-th element of a vector, i.e., the j-th hidden unit, and W_r and U_r are the weight matrices learned in the reset gate;

Step 3: the update gate is computed as:

z_j = σ([W_z x]_j + [U_z h^<t-1>]_j)

W_z and U_z are the weight matrices learned in the update gate;

Step 4: the hidden state of the j-th hidden unit at the current time slice t is calculated by the following formula:

h_j^<t> = z_j h_j^<t-1> + (1 - z_j) h̃_j^<t>

where the candidate state h̃_j^<t> combines the input of the current node with the hidden state of the j-th hidden unit at the previous time slice:

h̃_j^<t> = tanh([W x]_j + [U (r ⊙ h^<t-1>)]_j)

W represents the weight of the current input, and U represents the weight of the hidden state of the j-th hidden unit at the previous time slice.
In the GRU, when the update gate is close to 0, the hidden state is forced to ignore the previous hidden state and update with only the current input. This effectively allows the hidden state to discard any information found to be irrelevant in the future, allowing for a more compact representation.
In the third step, a gated recurrent unit is combined with the aggregation cross-entropy loss function to vectorize the feature map. The decoding process of the gated recurrent unit is as follows:

Step 1: the decoder takes the feature vector c, the output y^<t-1> of the previous time slice, and the previous hidden state h^<t-1> as input to obtain h^<t>, with the formula:

h^<t> = f(h^<t-1>, y^<t-1>, c)

f represents the activation function, which must produce valid probabilities, e.g., softmax.

Step 2: the output y^<t> of the decoder at time t is determined by the conditional distribution given c, y^<t-1>, and h^<t>, as follows:

P(y_t | y_{t-1}, y_{t-2}, ..., y_1, c) = g(h^<t>, y_{t-1}, c)

g represents the activation function, which must produce valid probabilities, e.g., softmax.
The aggregation cross-entropy loss function includes the following stages:

(1) aggregate the probabilities of each label category along the time dimension.

The number of characters of each category predicted by the network is treated as a probability distribution ȳ_k, as follows:

ȳ_k = y_k / T

y_k denotes the predicted number of characters of the k-th category, and T is the total number of predictions;

the number of characters of each category in the actual label is treated as another probability distribution N̄_k, as follows:

N̄_k = N_k / T

N_k denotes the number of characters of the k-th category in the actual label;

(2) combined with the label annotation, normalize the aggregation result into a probability distribution over all categories;

(3) compare the two probability distributions using cross entropy.

The similarity between the predicted distribution and the actual label distribution is expressed by the aggregation cross-entropy function, which is taken as the loss function of the Mongolian handwriting recognition model, as follows:

L = - Σ_k N̄_k ln ȳ_k

The aggregation cross-entropy loss function is an optimization of CTC; it is simpler to implement and adapts well to 2D prediction, which can be flattened into a 1D prediction and applied directly as input.

Assume the output 2-dimensional prediction map has height H and width W (not equal to the original image size after passing through the CNN), and denote the prediction output at row h and column w as y_hw. The probability distribution of the number of characters of each category predicted by the network is:

ȳ_k = ( Σ_{h=1}^{H} Σ_{w=1}^{W} y_hw^k ) / (H * W)

The loss function is then expressed as:

L = - Σ_k (N_k / (H * W)) ln ȳ_k

As the above equation shows, the loss can be calculated with the ACE loss function by straightening the original 2-dimensional prediction into a 1-dimensional prediction result.
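The aggregation and comparison described above can be sketched in a few lines. The snippet below is an illustrative sketch, not the patent's implementation: `probs` is assumed to hold per-timestep softmax outputs, a 2D prediction map is flattened to T = H*W rows before the call, and a blank class is assumed to absorb the label slots not occupied by characters.

```python
import math
import numpy as np

def ace_loss(probs, label_counts):
    """Sketch of the aggregation cross-entropy (ACE) loss.

    probs:        (T, C) per-timestep class probabilities; a 2D map of
                  shape (H, W, C) is first flattened to T = H*W rows.
    label_counts: (C,) occurrences of each class in the label, summing
                  to T (a blank class absorbs the remaining slots).
    """
    T = probs.shape[0]
    y_bar = probs.sum(axis=0) / T    # aggregated prediction distribution
    n_bar = label_counts / T         # label-count distribution
    return -np.sum(n_bar * np.log(y_bar + 1e-10))  # cross entropy
```

For a prediction whose aggregated counts match the label exactly, the loss reduces to the entropy of the label-count distribution, its minimum; note that no per-character alignment is ever computed, which is why inference and back propagation are cheap compared with CTC.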
Compared with the prior art, the invention has the beneficial effects that:
(1) the Mongolian handwriting recognition database with richer forms can be obtained by utilizing elastic deformation data enhancement and random erasure data enhancement according to the existing Mongolian handwriting database, and meanwhile, the robustness of the model to shielding is improved.
(2) The ECA-Net is selected and used by the feature extraction network, and the efficient attention mechanism module is used, so that dimension reduction is avoided, and cross-channel interaction information is effectively captured.
(3) In training the final Mongolian handwriting recognition system, the ACE loss function is used in combination with a GRU, so inference and back propagation are faster; the ACE loss function adapts to the 2D prediction problem by flattening the 2D prediction into a 1D prediction.
Drawings
FIG. 1 is a block diagram of a data enhancement and ECA-Net based method for Mongolian handwriting recognition according to the present invention.
FIG. 2 is a schematic diagram of the ECA-Net structure.
FIG. 3 is a schematic diagram of the long short-term memory (LSTM) network structure.
FIG. 4 is a schematic diagram of the gated recurrent unit (GRU) structure.
FIG. 5 is a schematic diagram of the structure of an implementation of the Aggregate Cross Entropy (ACE) loss function.
FIG. 6 shows test-set images of handwritten Mongolian used by the invention.
FIG. 7 shows partial test results of the handwritten Mongolian recognition model.
Detailed Description
The embodiments of the present invention will be described in detail below with reference to the drawings and examples.
As shown in FIG. 1, the invention is a handwritten Mongolian recognition method based on data enhancement and ECA-Net, comprising the following steps:
the first step is as follows: the limited data are enhanced from the perspective of spatial transformation and/or noise addition; that is, the existing Mongolian handwriting recognition database is enhanced using elastic deformation data enhancement and/or random erasing data enhancement to obtain an enhanced database. This yields Mongolian handwriting recognition data with richer forms and provides a data basis for improving the recognition ability and occlusion robustness of the model.
1. Elastic deformation data enhancement
The elastic deformation data enhancement starts from spatial transformation of the image and enhances the existing data. Its principle is that text images are generated by elastic deformation using similarity deformation based on the moving least squares method as the transformation strategy, with the aim of increasing the diversity of each character in a text string. Specifically, the image of a handwritten Mongolian character undergoes elastic spatial deformation; each enhancement generates one enhanced handwritten Mongolian character image, i.e., an elastically deformed handwritten Mongolian character, and the Mongolian text corresponding to the original character image is used as the data label of the enhanced image.

The elastic deformation data enhancement requires customized reference points for the deformation. The specific implementation process is as follows: divide the image evenly into N small blocks, initialize 2(N+1) reference points p along the top and bottom boundaries of the image, set a circular transformation area of radius R centered on each reference point p, and enhance the image by randomly moving each reference point p to a position q within radius R through a similarity transformation based on moving least squares, wherein for any point u in the image the transformation follows the expression:

T(u) = (u - p*)M + q*

M is a linear transformation matrix, constrained to satisfy M^T M = λ^2 I for some scalar λ;

p* and q* are the weighted centroids of the reference points p_i and the moved points q_i, respectively:

p* = Σ_i w_i p_i / Σ_i w_i,  q* = Σ_i w_i q_i / Σ_i w_i

p_i denotes the i-th initial reference point and q_i the i-th moved reference point, i.e., p_i after a random move; w_i denotes the weight of reference point p_i with respect to the point u, with the formula:

w_i = 1 / |p_i - u|^(2a)

a is set to the fixed value 1; when u is close to p_i the weight increases, meaning that u depends mainly on the motion of the nearest reference points.
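The weighting scheme above can be illustrated with a minimal sketch. The function below computes the weighted centroids p* and q* and applies the transform with M fixed to the identity for brevity; the full method instead solves for a similarity matrix M satisfying M^T M = λ^2 I. The function and argument names are illustrative, not from the patent.

```python
import numpy as np

def mls_translate(u, p, q, a=1.0):
    """Simplified moving-least-squares warp of a single point u.

    p: (n, 2) initial reference points; q: (n, 2) randomly moved points.
    Implements T(u) = (u - p*) M + q* with M fixed to the identity;
    the full method solves for a similarity matrix M with M^T M = l^2 I.
    """
    d2 = np.sum((p - u) ** 2, axis=1)               # |p_i - u|^2
    w = 1.0 / np.maximum(d2, 1e-12) ** a            # w_i = 1 / |p_i - u|^(2a)
    p_star = (w[:, None] * p).sum(axis=0) / w.sum() # weighted centroid of p_i
    q_star = (w[:, None] * q).sum(axis=0) / w.sum() # weighted centroid of q_i
    return (u - p_star) + q_star                    # M = I (identity)
```

A useful sanity check: when every reference point moves by the same offset, the weighted centroids shift by that offset too, so any point u is simply translated by it.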
2. Random erasure data enhancement
The random erasing data enhancement starts from adding noise points and enhances the existing data, making the model robust to occlusion (erasure) and reducing the risk of overfitting. Random erasing is applied to the image of a handwritten Mongolian character; each enhancement generates one enhanced handwritten Mongolian character image, i.e., a randomly erased handwritten Mongolian character, and the Mongolian text corresponding to the original character image is used as the data label of the enhanced image.
The specific implementation process of random erasure data enhancement is as follows:
Randomly select a rectangular region I_e in the image I and erase its pixels with random values, generating training images with different degrees of occlusion. The steps are as follows:

Step 1: input an image I of size S = W * H, where W and H are the width and height of I; set the erased-area ratio range [S_l, S_h] and the erased aspect-ratio range [r_1, r_2]; initialize the erasing probability p, 0 ≤ p ≤ 1.

Step 2: randomly select a rectangular region I_e in image I and erase its pixels with random values, where the area of I_e is randomly initialized to S_e and its aspect ratio to r_e, such that S_e / S lies in [S_l, S_h] and r_e lies in [r_1, r_2]. The size of I_e is calculated by the following formulas:

H_e = sqrt(S_e * r_e),  W_e = sqrt(S_e / r_e),  I_e = W_e * H_e

W_e and H_e are the width and height of the randomly erased rectangle I_e.

Step 3: randomly initialize a point P = (x_e, y_e) in image I, where x_e and y_e are the coordinates of the randomly initialized point.

Step 4: decide on the erased portion: if x_e + W_e ≤ W and y_e + H_e ≤ H, the region (x_e, y_e, x_e + W_e, y_e + H_e) is set as the selected rectangle I_e; otherwise, the above process is repeated until a rectangle I_e meeting the requirements is selected. Each pixel in the selected rectangle I_e is then assigned a random value in [0, 255].
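The four steps above can be sketched as follows. This is an illustrative sketch, not the patent's code: the default ranges for the area ratio and aspect ratio follow common settings in the random-erasing literature and are assumptions, as is the cap on placement retries.

```python
import random

def random_erase(img, p=0.5, sl=0.02, sh=0.4, r1=0.3, r2=3.33, rng=None):
    """Random-erasing sketch for a grayscale image stored as a list of rows.

    With probability p, a rectangle I_e whose area ratio S_e/S lies in
    [sl, sh] and whose aspect ratio r_e lies in [r1, r2] is filled with
    random values in [0, 255].
    """
    rng = rng or random.Random()
    H, W = len(img), len(img[0])
    if rng.random() > p:                        # skip erasing with prob 1 - p
        return img
    for _ in range(100):                        # retry until the rectangle fits
        Se = rng.uniform(sl, sh) * W * H        # erased area S_e
        re = rng.uniform(r1, r2)                # aspect ratio r_e
        He = int(round((Se * re) ** 0.5))       # H_e = sqrt(S_e * r_e)
        We = int(round((Se / re) ** 0.5))       # W_e = sqrt(S_e / r_e)
        xe, ye = rng.randint(0, W - 1), rng.randint(0, H - 1)
        if xe + We <= W and ye + He <= H:       # rectangle fully inside image
            for y in range(ye, ye + He):
                for x in range(xe, xe + We):
                    img[y][x] = rng.randint(0, 255)
            return img
    return img                                  # no valid placement found
```

Because the rectangle is re-drawn until it fits, the erased region never crosses the image border, matching the decision rule in step 4.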
The Mongolian handwriting image is enhanced by adopting the two operation modes, and the Mongolian handwriting image with richer and more diversified forms can be generated.
The second step is that: acquire a picture of handwritten Mongolian as the input image, and extract features from the input image with a deep convolutional neural network to obtain a feature map, where the deep convolutional neural network is a residual network containing an efficient channel attention (ECA) module, i.e., ECA-Net. Introducing the efficient channel attention module in the feature extraction stage avoids dimensionality reduction and effectively captures cross-channel interaction information. This helps achieve higher character recognition accuracy while reducing model complexity; that is, effective channel attention is learned in a more effective way.
Specifically, referring to FIG. 2, the efficient channel attention module performs a fast 1D convolution with kernel size k to generate a weight for each channel of the input image. The kernel size k, i.e., the coverage of local cross-channel interaction, determines how many neighboring channels participate in the attention prediction of a channel y_i. After channel-wise global pooling without dimensionality reduction, the efficient channel attention module captures local cross-channel interaction information by considering each channel of the input image together with its k neighbors. A parameter matrix W_k represents the learned channel attention weights; W_k is a band matrix in which each row contains only k non-zero entries corresponding to the k neighbors of that channel, so W_k involves k * C parameters in total, where C is the channel dimension of the input feature map. To avoid making the channels completely independent, the weight of image channel y_i considers the information interaction between y_i and its k neighboring channels, and is calculated as:

ω_i = σ( Σ_{j=1}^{k} w^j y_i^j ),  y_i^j ∈ Ω_i^k

where y_i^j denotes the j-th neighboring channel of y_i, w^j denotes its shared weight, Ω_i^k denotes the set of k neighbors of y_i, and σ is the sigmoid function. k is in direct proportion to C, with the relationship:

C = φ(k) = 2^(γk - b)

Given C, k is adaptively determined by the following equation:

k = ψ(C) = | log2(C)/γ + b/γ |_odd

In the formula, |x|_odd denotes the odd number nearest to x, and γ and b are set to fixed constants. Through the mapping ψ, which is nonlinear, high-dimensional channels interact over a longer range while low-dimensional channels interact over a shorter range.
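The adaptive kernel size and the 1D-convolution attention can be illustrated with the sketch below. The constants γ = 2 and b = 1 and the upward tie-breaking to the next odd number are taken from common ECA-Net practice and are assumptions of this sketch; the convolution kernel is initialized to 1/k purely for illustration, whereas in the real network it is a learned, shared parameter.

```python
import math
import numpy as np

def eca_kernel_size(C, gamma=2, b=1):
    """Adaptive kernel size k = psi(C) = |log2(C)/gamma + b/gamma|_odd."""
    t = int(abs(math.log2(C) / gamma + b / gamma))
    return t if t % 2 else t + 1                 # round up to an odd number

def eca_weights(pooled, gamma=2, b=1):
    """Per-channel attention weights from globally pooled features.

    pooled: (C,) one value per channel after global average pooling.
    A 1D convolution of size k over the channel axis captures local
    cross-channel interaction without dimensionality reduction, then a
    sigmoid maps the result to per-channel weights.
    """
    C = pooled.shape[0]
    k = eca_kernel_size(C, gamma, b)
    kernel = np.full(k, 1.0 / k)                 # shared weights w^1 .. w^k
    padded = np.pad(pooled, k // 2, mode="edge") # keep output length C
    conv = np.convolve(padded, kernel, mode="valid")
    return 1.0 / (1.0 + np.exp(-conv))           # sigmoid -> channel weights
```

For example, C = 256 gives k = 5 while C = 64 gives k = 3, so wider feature maps interact over a longer channel range, exactly the behavior the mapping ψ is designed to produce.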
Using ECA-Net for feature extraction of the input image effectively avoids dimensionality reduction and captures cross-channel interaction information. This helps achieve higher character recognition accuracy while reducing model complexity; that is, channel attention is learned in a more efficient way.
The third step: vectorizing the feature map, and recognizing the handwritten Mongolian by using the enhanced database.
Specifically, the invention combines a gated recurrent unit (GRU) with the aggregation cross-entropy (ACE) loss function to vectorize the feature map and complete the recognition of handwritten Mongolian; that is, the GRU and ACE loss are combined to construct a sequence recognition neural network that performs feature serialization and sequence recognition. In the character recognition process, the GRU is easier to train, which greatly reduces training difficulty and improves training efficiency. The aggregation cross-entropy loss function mainly targets sequence recognition and is an optimization of CTC (Connectionist Temporal Classification) and the attention mechanism: it does not consider the order of characters within a sequence, only the number of occurrences of each character class in the string. Meanwhile, the ACE loss function is simple to implement and adapts well to the 2D prediction problem by flattening the 2D prediction into a 1D prediction.
Referring to FIG. 3 and FIG. 4, the GRU has its own gating states compared with a plain RNN. The update gate (z) functions like the forget and input gates of the LSTM: it decides which information to discard and which new information to add. In effect, this step forgets part of the transmitted h^<t-1> and adds some dimensions of the current node's input.
The reset gate (r) is another gate, used to decide how much past information to forget; it resets the hidden state, combines it with the input of the current time slice, and normalizes the result. The reset gate purposefully adds the current input to the current hidden state, corresponding to "remembering the state at the current time", similar to the selective memory stage of the LSTM.
1. The coding process of the gated cyclic unit is as follows:
step 1: hidden state h transmitted by last node<t-1>Acquiring two gating states of a reset gate and an update gate from an input x of the current node, and normalizing the acquired information through a sigmoid function to enable the acquired information to serve as a gating signal;
step 2: when the activation operation of the jth hidden unit is performed, the operation of the reset gate is as follows:
rj=σ([Wrx]j+[Urh<t-1>]j)
where σ is a logical sigmoid function,[ ]jThe jth element representing the vector, i.e. the jth hidden unit, WrAnd UrIs the weight matrix learned in the reset gate;
Step 3: the update gate is computed as:

z_j = σ([W_z x]_j + [U_z h^{<t-1>}]_j)

where W_z and U_z are the weight matrices learned in the update gate;
Step 4: the hidden state of the j-th hidden unit at the current time slice t is computed as:

h_j^{<t>} = z_j h_j^{<t-1>} + (1 − z_j) h̃_j^{<t>}

where h̃_j^{<t>} is the candidate hidden state. The hidden state of the j-th hidden unit of the previous time slice is purposefully added to the current hidden state, and the candidate is computed as:

h̃_j^{<t>} = tanh([W x]_j + [U (r ⊙ h^{<t-1>})]_j)
W represents the weight of the current input, U represents the weight of the hidden state of the j-th hidden unit at the previous time slice, and ⊙ denotes element-wise multiplication. In the GRU, when the update gate is close to 0, the hidden state is forced to ignore the previous hidden state and update with only the current input. This effectively allows the hidden state to discard any information later found to be irrelevant, yielding a more compact representation.
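As a minimal illustration, the encoding step above can be sketched in NumPy. The weight names (Wr, Ur, Wz, Uz, W, U) follow the equations; all shapes and initializations are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h_prev, Wr, Ur, Wz, Uz, W, U):
    """One GRU encoding step following the equations above.

    x: current input vector; h_prev: previous hidden state h^{<t-1>}.
    Wr/Ur, Wz/Uz, W/U are the reset-gate, update-gate and
    candidate-state weight matrices (names are illustrative).
    """
    r = sigmoid(Wr @ x + Ur @ h_prev)            # reset gate r_j
    z = sigmoid(Wz @ x + Uz @ h_prev)            # update gate z_j
    h_tilde = np.tanh(W @ x + U @ (r * h_prev))  # candidate hidden state
    return z * h_prev + (1.0 - z) * h_tilde      # h^{<t>}
```

The returned state interpolates between the previous hidden state and the candidate, so with z near 0 the previous state is ignored, as described above.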
The decoding process of the gated recurrent unit is as follows:
Step 1: the decoder takes the feature vector c, the output y^{<t-1>} of the previous time slice, and the previous hidden node h^{<t-1>} as input to obtain h^{<t>}, computed as:

h^{<t>} = f(h^{<t-1>}, y^{<t-1>}, c)
where f represents a given activation function, which must produce valid probabilities, e.g. softmax.
Step 2: the output y^{<t>} of the decoder at time t is determined by the conditional distribution conditioned on c, y^{<t-1>} and h^{<t>}; the y^{<t>} with the highest conditional probability is selected as the output at the current time:

P(y_t | y_{t-1}, y_{t-2}, …, y_1, c) = g(h^{<t>}, y_{t-1}, c)
g represents the given activation function, which must produce a valid probability, e.g. softmax.
2. Referring to fig. 5, the aggregation cross-entropy loss function comprises the following stages:
(1) Aggregating the probabilities of each label category along the time dimension.

The number of characters of each category predicted by the network is regarded as a probability distribution ȳ_k, as follows:

ȳ_k = y_k / T

where y_k denotes the aggregated number of characters of the k-th class in the prediction result, and T is the total number of characters;
The number of characters of each category in the actual label is regarded as another probability distribution N̄_k, as follows:

N̄_k = N_k / T

where N_k denotes the number of characters of the k-th class in the actual label;
(2) Combining the label annotation, normalizing the aggregated result into a probability distribution over all categories.

(3) Comparing the two probability distributions using cross entropy.

The degree of similarity between the predicted distribution and the actual label distribution is expressed by the aggregation cross-entropy function, which is taken as the loss function of the handwritten Mongolian recognition model:

L = − Σ_k N̄_k ln ȳ_k

where the sum runs over all character classes.
Furthermore, the aggregation cross-entropy loss function can be applied directly to 2D predictions by flattening the 2D prediction into a 1D prediction.
Assuming that the output 2-dimensional prediction map has height H and width W (not equal to the original image size after passing through the CNN), the prediction output at row h and column w is denoted y_{hw}. The probability distribution of the number of characters of each class predicted by the network is:

ȳ_k = ( Σ_{h=1}^{H} Σ_{w=1}^{W} y_{hw}^k ) / (HW)

and the loss function is expressed as:

L = − Σ_k N̄_k ln ȳ_k,  with N̄_k = N_k / (HW)
As shown in the above equations, the ACE loss can be computed by flattening the original 2-dimensional prediction into a 1-dimensional prediction result.
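The aggregation steps above (aggregate per-class probabilities over time, normalize by T, compare with the label distribution via cross entropy) can be sketched as follows. Function and variable names are illustrative, and a small epsilon is an added assumption for numerical safety:

```python
import numpy as np

def ace_loss(probs, counts):
    """Aggregation Cross-Entropy loss (sketch).

    probs:  (T, K) per-timestep class probabilities from the network;
            for a 2D prediction of shape (H, W, K), pass
            probs.reshape(H * W, K), i.e. flatten 2D into 1D as above.
    counts: (K,) number of characters of each class in the label,
            with one class treated as blank so counts.sum() == T
            (following the ACE formulation).
    """
    T = probs.shape[0]
    y_bar = probs.sum(axis=0) / T   # aggregated class probabilities
    n_bar = counts / T              # normalized label counts
    eps = 1e-12                     # numerical safety (assumption)
    return -np.sum(n_bar * np.log(y_bar + eps))
```

Note that the loss depends only on the per-class totals, not on character order, which is exactly the property claimed for ACE above.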
Referring to fig. 6 and 7, a specific handwritten Mongolian recognition case is described.
In constructing the handwritten Mongolian recognition model, a 200,000-word Mongolian handwriting recognition database was used. Part of the handwritten Mongolian test-set images are shown in FIG. 6, and the experimental results are shown in FIG. 7. In the results, the first column is the Mongolian label, the second column is the model's recognition output, and the rightmost column is the per-character/word recognition accuracy. The experiments show high recognition accuracy, and the model training efficiency is also improved; overall, the recognition effect is good.

Claims (10)

1. A handwritten Mongolian recognition method based on data enhancement and ECA-Net is characterized by comprising the following steps:
the first step is as follows: performing data enhancement on the existing Mongolian handwriting recognition database by utilizing elastic deformation data enhancement and/or random erasure data enhancement to obtain an enhanced database;
the second step is that: acquiring a picture of handwritten Mongolian as an input image, and performing feature extraction on the input image by using a deep convolutional neural network to obtain a feature map, wherein the deep convolutional neural network is a residual network containing an efficient channel attention module, namely ECA-Net;
the third step: vectorizing the feature map, and recognizing the handwritten Mongolian by using the enhanced database.
2. The handwritten Mongolian recognition method based on data enhancement and ECA-Net according to claim 1, wherein the elastic deformation data enhancement is a spatial data enhancement that elastically deforms an image of handwritten Mongolian characters; each enhancement generates an enhanced handwritten Mongolian digital image, i.e. an elastically deformed handwritten Mongolian character, and the enhanced Mongolian image uses the Mongolian corresponding to the original digital image as its data label.
3. The handwritten Mongolian recognition method based on data enhancement and ECA-Net according to claim 2, wherein the elastic deformation data enhancement divides the image evenly into N small blocks, initializes 2(N+1) reference points p along the top and bottom boundaries of the image, sets a circular transformation area with radius R around each reference point p, and enhances the image by randomly moving each reference point p to a point q within radius R based on the moving-least-squares similarity transformation, wherein for any point u in the image the transformation follows the expression:
T(u) = (u − p_*) M + q_*

where M is a linear transformation matrix constrained to satisfy M^T M = λ^2 I for some scalar λ;
p_* and q_* are the weighted centroids of the reference points p and q, respectively:

p_* = Σ_i w_i p_i / Σ_i w_i,  q_* = Σ_i w_i q_i / Σ_i w_i
where p_i denotes the i-th initialized reference point and q_i denotes the i-th moved reference point, i.e. p_i after random movement; w_i denotes the weight of any point u in the image, with the formula:

w_i = 1 / |p_i − u|^{2a}

where a is set to a fixed value of 1; when u is close to p_i, the weight increases, meaning that u depends mainly on the motion of the nearest reference point.
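The weights w_i, the centroids p_*, q_*, and the transform T(u) = (u − p_*)M + q_* can be sketched as follows. For M, this sketch uses the closed-form similarity solution of the moving-least-squares method (Schaefer et al.), which satisfies M^T M = λ² I by construction; this is an assumption about how the constraint is realized, and all names are illustrative:

```python
import numpy as np

def perp(v):
    """Rotate a 2D vector by 90 degrees."""
    return np.array([-v[1], v[0]])

def mls_similarity(u, p, q, a=1.0):
    """Transform one image point u by T(u) = (u - p_*) M + q_*.

    u: (2,) image point; p, q: (n, 2) original and moved reference
    points. M is the weighted similarity solution, so M^T M is a
    scalar multiple of the identity by construction.
    """
    w = 1.0 / (np.sum((p - u) ** 2, axis=1) ** a + 1e-12)  # w_i = 1/|p_i-u|^{2a}
    p_star = (w[:, None] * p).sum(axis=0) / w.sum()        # weighted centroids
    q_star = (w[:, None] * q).sum(axis=0) / w.sum()
    ph, qh = p - p_star, q - q_star                        # centered points
    mu = np.sum(w * np.sum(ph * ph, axis=1))               # normalizer
    M = np.zeros((2, 2))
    for wi, pi, qi in zip(w, ph, qh):
        A = np.stack([pi, -perp(pi)])                      # rows: p_hat, -p_hat_perp
        B = np.stack([qi, -perp(qi)])
        M += wi * (A.T @ B)
    M /= mu
    return (u - p_star) @ M + q_star
```

When no reference point moves (q = p), M reduces to the identity and T(u) = u, so the deformation magnitude is controlled entirely by how far the points are moved within radius R.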
4. The method according to claim 1, wherein the random erasure data enhancement performs random erasing on an image of handwritten Mongolian characters; each enhancement generates a randomly erased handwritten Mongolian digital image, and the enhanced Mongolian image uses the Mongolian corresponding to the original digital image as its data label.
5. The method according to claim 4, wherein the random erasure data enhancement randomly selects a rectangular area I_e in the image I and erases its pixels with random values to generate training images with different occlusion levels, comprising the following steps:
Step 1: input an image I of size S = W × H, where W and H are the width and height of image I, respectively; set the erased-area ratio range [S_l, S_h] and the erase aspect-ratio range [r_1, r_2], and initialize the erasing probability p in [0, 1];
Step 2: randomly select a rectangular area I_e in image I and erase its pixels with random values, where the area of the rectangular region I_e is randomly initialized to S_e and the erase aspect ratio is randomly initialized to r_e;
S_e/S lies in the range [S_l, S_h] and r_e lies in [r_1, r_2]; the size of I_e is calculated as:

H_e = sqrt(S_e × r_e),  W_e = sqrt(S_e / r_e),  I_e = W_e × H_e

where W_e and H_e are the width and height of the randomly erased rectangular area I_e;
Step 3: randomly initialize a point P = (x_e, y_e) in image I, where x_e and y_e are randomly initialized point coordinates;
Step 4: decide on the erased portion: if x_e + W_e ≤ W and y_e + H_e ≤ H, the area (x_e, y_e, x_e + W_e, y_e + H_e) is set as the selected rectangular area I_e; otherwise, the above process is repeated until a rectangular area I_e meeting the requirement is selected; each pixel in the selected rectangular area I_e is then assigned a random value in [0, 255].
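Steps 1-4 above can be sketched as follows. The default hyperparameter values are those commonly used for Random Erasing and are assumptions, not values fixed by the claim; the retry cap is likewise an illustrative choice:

```python
import random
import numpy as np

def random_erase(img, p=0.5, s_l=0.02, s_h=0.4, r1=0.3, r2=3.33, rng=None):
    """Random Erasing sketch following steps 1-4 above.

    img: (H, W) grayscale array. Hyperparameters follow the claim:
    erased-area ratio range [s_l, s_h], aspect-ratio range [r1, r2],
    erasing probability p (defaults are illustrative assumptions).
    """
    rng = rng or random.Random()
    h, w = img.shape
    out = img.copy()
    if rng.random() > p:                      # skip erasing with prob. 1 - p
        return out
    for _ in range(100):                      # retry until the region fits
        s_e = rng.uniform(s_l, s_h) * h * w   # target erased area S_e
        r_e = rng.uniform(r1, r2)             # target aspect ratio r_e
        h_e = int(round((s_e * r_e) ** 0.5))  # H_e = sqrt(S_e * r_e)
        w_e = int(round((s_e / r_e) ** 0.5))  # W_e = sqrt(S_e / r_e)
        x_e, y_e = rng.randrange(w), rng.randrange(h)
        if 0 < h_e and 0 < w_e and x_e + w_e <= w and y_e + h_e <= h:
            out[y_e:y_e + h_e, x_e:x_e + w_e] = np.array(
                [[rng.randrange(256) for _ in range(w_e)] for _ in range(h_e)])
            return out
    return out                                # no fitting region found
```

Each call therefore yields a training image with a different occlusion level while leaving the label unchanged, as described in claims 4 and 5.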
6. The handwritten Mongolian recognition method based on data enhancement and ECA-Net according to claim 1, wherein the efficient channel attention module executes a fast 1D convolution with kernel size k to generate the weight of each channel of the input image; the kernel size k, i.e. the coverage of local cross-channel interaction, determines the k adjacent channels weighted for a channel y_i, i.e. how many neighbors around the channel participate in the attention prediction of that channel. After channel-wise global pooling without dimensionality reduction, local cross-channel interaction information is captured by considering each channel of the input image and its k neighbors. A parameter matrix W_k represents the learned channel attention weights, and W_k is expressed as follows:
W_k =
| w^{1,1}  …  w^{1,k}      0        0    …      0      |
|    0    w^{2,2}  …   w^{2,k+1}    0    …      0      |
|    ⋮        ⋮        ⋱            ⋱    ⋱      ⋮      |
|    0     …      0    w^{C,C−k+1}  …        w^{C,C}   |

W_k involves k × C parameters, where C denotes the channel dimension, i.e. the size of the feature matrix of the input image.
7. The handwritten Mongolian recognition method based on data enhancement and ECA-Net according to claim 6, wherein for the weight of an image channel y_i, only the information interaction between y_i and its k adjacent channels needs to be considered; the weight of y_i is calculated as:

ω_i = σ( Σ_{j=1}^{k} w^j y_i^j ),  y_i^j ∈ Ω_i^k
where y_i^j denotes the j-th of the k adjacent channels of y_i, w^j denotes the weight of y_i^j, and Ω_i^k denotes the set of k adjacent channels of y_i; k is in direct proportion to C, with the relationship:

C = φ(k) = 2^{(γk − b)}
Given C, k is adaptively determined by:

k = ψ(C) = | log_2(C)/γ + b/γ |_odd

where |x|_odd denotes the odd number nearest to x, and γ and b are set to fixed constants. Through the mapping ψ, and by using a nonlinear mapping, high-dimensional channels have a longer range of interaction, while low-dimensional channels undergo a shorter range of interaction.
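In the ECA-Net paper the relation is C = φ(k) = 2^{(γk − b)}, giving k = ψ(C) = |log_2(C)/γ + b/γ|_odd with γ = 2, b = 1. A sketch of this adaptive kernel-size rule follows; the floor-then-bump rounding to an odd number mirrors the reference ECA implementation and is an assumption here:

```python
import math

def eca_kernel_size(C, gamma=2, b=1):
    """Adaptive 1D-convolution kernel size k = psi(C).

    k = |log2(C)/gamma + b/gamma|_odd (nearest odd number);
    gamma=2, b=1 are the constants from the ECA-Net paper.
    """
    t = abs(math.log2(C) / gamma + b / gamma)
    k = int(t)                     # floor ...
    return k if k % 2 else k + 1   # ... then bump to the next odd number
```

For example, channel dimensions 64, 256, and 512 map to kernel sizes 3, 5, and 5 respectively, so wider feature maps interact over a longer channel range.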
8. The method for recognizing handwritten Mongolian based on data enhancement and ECA-Net according to claim 1, wherein in the third step, a gated recurrent unit is combined with the aggregation cross-entropy loss function to realize vectorization of the feature map.
9. The method for recognizing handwritten Mongolian based on data enhancement and ECA-Net according to claim 8, wherein the encoding process of the gated recurrent unit is as follows:
Step 1: from the hidden state h^{<t-1>} transmitted by the previous node and the input x of the current node, obtain the two gating states of the reset gate and the update gate, and normalize the obtained information through a sigmoid function so that it serves as a gating signal;
Step 2: when performing the activation operation of the j-th hidden unit, the reset gate is computed as:

r_j = σ([W_r x]_j + [U_r h^{<t-1>}]_j)

where σ is the logistic sigmoid function, [·]_j denotes the j-th element of a vector, i.e. the j-th hidden unit, and W_r and U_r are the weight matrices learned in the reset gate;
Step 3: the update gate is computed as:

z_j = σ([W_z x]_j + [U_z h^{<t-1>}]_j)

where W_z and U_z are the weight matrices learned in the update gate;
Step 4: the hidden state of the j-th hidden unit at the current time slice t is computed as:

h_j^{<t>} = z_j h_j^{<t-1>} + (1 − z_j) h̃_j^{<t>}

where h̃_j^{<t>} is the candidate hidden state; the hidden state of the j-th hidden unit of the previous time slice is purposefully added to the current hidden state, and the candidate is computed as:

h̃_j^{<t>} = tanh([W x]_j + [U (r ⊙ h^{<t-1>})]_j)
where W represents the weight of the current input, and U represents the weight of the hidden state of the j-th hidden unit at the previous time slice;
the decoding process of the gated recurrent unit is as follows:
Step 1: the decoder takes the feature vector c, the output y^{<t-1>} of the previous time slice, and the previous hidden node h^{<t-1>} as input to obtain h^{<t>}, computed as:

h^{<t>} = f(h^{<t-1>}, y^{<t-1>}, c)
Step 2: the output y^{<t>} of the decoder at time t is determined by the conditional distribution conditioned on c, y^{<t-1>} and h^{<t>}, as follows:

P(y_t | y_{t-1}, y_{t-2}, …, y_1, c) = g(h^{<t>}, y_{t-1}, c)
where both f and g represent given activation functions.
10. The handwritten Mongolian recognition method based on data enhancement and ECA-Net according to claim 8, wherein the aggregation cross-entropy loss function comprises the following stages:
(1) aggregating the probabilities for each label category along a time dimension;
The number of characters of each category predicted by the network is regarded as a probability distribution ȳ_k, as follows:

ȳ_k = y_k / T

where y_k denotes the aggregated number of characters of the k-th class in the prediction result, and T is the total number of characters;
The number of characters of each category in the actual label is regarded as another probability distribution N̄_k, as follows:

N̄_k = N_k / T

where N_k denotes the number of characters of the k-th class in the actual label;
(2) combining the label annotation, normalizing the aggregated result into a probability distribution over all categories;

(3) comparing the two probability distributions using cross entropy;
The degree of similarity between the predicted distribution and the actual label distribution is expressed by the aggregation cross-entropy function, which is taken as the loss function of the handwritten Mongolian recognition model:

L = − Σ_k N̄_k ln ȳ_k
The output 2-dimensional prediction map has height H and width W, and the prediction output at row h and column w is denoted y_{hw}; the probability distribution of the number of characters of each class predicted by the network is:

ȳ_k = ( Σ_{h=1}^{H} Σ_{w=1}^{W} y_{hw}^k ) / (HW)

and the loss function is expressed as:

L = − Σ_k N̄_k ln ȳ_k,  with N̄_k = N_k / (HW)
CN202110306372.7A 2021-03-23 2021-03-23 Handwritten Mongolian recognition method based on data enhancement and ECA-Net Pending CN113065432A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110306372.7A CN113065432A (en) 2021-03-23 2021-03-23 Handwritten Mongolian recognition method based on data enhancement and ECA-Net


Publications (1)

Publication Number Publication Date
CN113065432A true CN113065432A (en) 2021-07-02

Family

ID=76562965

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110306372.7A Pending CN113065432A (en) 2021-03-23 2021-03-23 Handwritten Mongolian recognition method based on data enhancement and ECA-Net

Country Status (1)

Country Link
CN (1) CN113065432A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113469123A (en) * 2021-07-21 2021-10-01 内蒙古工业大学 Traditional Mongolian letter recognition method based on improved VGG-16 model
CN113887328A (en) * 2021-09-10 2022-01-04 天津理工大学 Method for extracting space-time characteristics of photonic crystal space transmission spectrum in parallel by ECA-CNN fusion dual-channel RNN

Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107330379A (en) * 2017-06-13 2017-11-07 内蒙古大学 A kind of Mongol hand-written recognition method and device
CN108447062A (en) * 2018-02-01 2018-08-24 浙江大学 A kind of dividing method of the unconventional cell of pathological section based on multiple dimensioned mixing parted pattern
WO2018194456A1 (en) * 2017-04-20 2018-10-25 Universiteit Van Amsterdam Optical music recognition omr : converting sheet music to a digital format
CN109325243A (en) * 2018-10-22 2019-02-12 内蒙古大学 Mongolian word cutting method and its word cutting system of the character level based on series model
CN109492232A (en) * 2018-10-22 2019-03-19 内蒙古工业大学 A kind of illiteracy Chinese machine translation method of the enhancing semantic feature information based on Transformer
CN110414498A (en) * 2019-06-14 2019-11-05 华南理工大学 A kind of natural scene text recognition method based on intersection attention mechanism
CN110443127A (en) * 2019-06-28 2019-11-12 天津大学 In conjunction with the musical score image recognition methods of residual error convolutional coding structure and Recognition with Recurrent Neural Network
CN110598221A (en) * 2019-08-29 2019-12-20 内蒙古工业大学 Method for improving translation quality of Mongolian Chinese by constructing Mongolian Chinese parallel corpus by using generated confrontation network
CN111104884A (en) * 2019-12-10 2020-05-05 电子科技大学 Chinese lip language identification method based on two-stage neural network model
US20200184278A1 (en) * 2014-03-18 2020-06-11 Z Advanced Computing, Inc. System and Method for Extremely Efficient Image and Pattern Recognition and Artificial Intelligence Platform
CN111476793A (en) * 2020-03-10 2020-07-31 西北大学 Dynamic enhanced magnetic resonance imaging processing method, system, storage medium and terminal
CN111612871A (en) * 2020-04-09 2020-09-01 北京旷视科技有限公司 Handwritten sample generation method and device, computer equipment and storage medium
CN111695527A (en) * 2020-06-15 2020-09-22 内蒙古大学 Mongolian online handwriting recognition method
CN111738169A (en) * 2020-06-24 2020-10-02 北方工业大学 Handwriting formula recognition method based on end-to-end network model
CN111783705A (en) * 2020-07-08 2020-10-16 厦门商集网络科技有限责任公司 Character recognition method and system based on attention mechanism
CN112101241A (en) * 2020-09-17 2020-12-18 西南科技大学 Lightweight expression recognition method based on deep learning
CN112215236A (en) * 2020-10-21 2021-01-12 科大讯飞股份有限公司 Text recognition method and device, electronic equipment and storage medium
CN112329760A (en) * 2020-11-17 2021-02-05 内蒙古工业大学 Method for recognizing and translating Mongolian in printed form from end to end based on space transformation network
CN112364668A (en) * 2020-11-10 2021-02-12 内蒙古工业大学 Mongolian Chinese machine translation method based on model independent element learning strategy and differentiable neural machine


Non-Patent Citations (12)

* Cited by examiner, † Cited by third party
Title
CANJIE LUO et al.: "Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition", arXiv:2003.06606v1 *
CHEN666CONG: "Aggregation cross-entropy loss function for sequence recognition problems (ACE loss)", https://blog.csdn.net/chen666cong/article/details/94392249 *
PATRICE Y. SIMARD et al.: "Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis", Proceedings of the Seventh International Conference on Document Analysis and Recognition *
QILONG WANG et al.: "ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks", arXiv:1910.03151v4 *
ZECHENG XIE et al.: "Aggregation Cross-Entropy for Sequence Recognition", arXiv:1904.08364v2 *
ZHUN ZHONG et al.: "Random Erasing Data Augmentation", arXiv:1708.04896v2 *
LIU Cong: "Research on Large-Vocabulary Offline Handwritten Mongolian Whole-Word Recognition", China Master's Theses Full-text Database, Information Science and Technology Series *
Dashixiong: "RNN Encoder-Decoder and GRU", https://zhuanlan.zhihu.com/p/42722623 *
ZHANG Zhen et al.: "Application of Cross-Lingual Multi-Task Learning Deep Neural Networks to Mongolian-Chinese Machine Translation", Computer Applications and Software *
JI Mingxuan: "A New Machine Translation Model Based on Improved Self-Attention", China Master's Theses Full-text Database, Information Science and Technology Series *
FAN Daoerji: "Research on Offline Handwritten Mongolian Recognition", China Doctoral Dissertations Full-text Database, Information Science and Technology Series *
GAO Xue et al.: "Similar Handwritten Chinese Character Recognition Based on CNN and Random Elastic Deformation", Journal of South China University of Technology (Natural Science Edition) *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210702