CN113065432A - Handwritten Mongolian recognition method based on data enhancement and ECA-Net - Google Patents

Handwritten Mongolian recognition method based on data enhancement and ECA-Net Download PDF

Info

Publication number
CN113065432A
Authority
CN
China
Prior art keywords
mongolian
image
handwritten
channel
data enhancement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110306372.7A
Other languages
Chinese (zh)
Inventor
仁庆道尔吉
麻泽蕊
尹玉娟
程坤
李媛
苏依拉
李雷孝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inner Mongolia University of Technology
Original Assignee
Inner Mongolia University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inner Mongolia University of Technology filed Critical Inner Mongolia University of Technology
Priority to CN202110306372.7A priority Critical patent/CN113065432A/en
Publication of CN113065432A publication Critical patent/CN113065432A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/32 Digital ink
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/048 Activation functions
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Character Discrimination (AREA)

Abstract

A handwritten Mongolian recognition method based on data enhancement and ECA-Net: data enhancement is performed on an existing Mongolian handwriting recognition database using elastic deformation data enhancement and/or random erasing data enhancement to obtain an enhanced database; a picture of handwritten Mongolian is acquired as an input image, and features are extracted from it with a residual network containing an efficient channel attention module to obtain a feature map; the feature map is vectorized, and the handwritten Mongolian is recognized using the enhanced database. By using elastic deformation and random erasing data enhancement, the method obtains a Mongolian handwriting recognition database with richer forms and improves the robustness of the model to occlusion. The efficient channel attention module avoids dimensionality reduction and effectively captures cross-channel interaction information. Finally, the recognition system is trained with the ACE loss function combined with a GRU, so that inference and back propagation are faster.

Description

Handwritten Mongolian recognition method based on data enhancement and ECA-Net
Technical Field
The invention belongs to the technical field of character recognition, and particularly relates to a handwritten Mongolian recognition method based on data enhancement and ECA-Net.
Background
With the rapid development of the internet and artificial intelligence, educational informatization has begun to influence and change traditional education; human-computer interaction scenarios such as online answering are increasingly common, and handwriting recognition has become a research direction in the field of computer vision. Recognizing handwritten text is simple for humans but very complicated for computers. In recent years, the development of deep convolutional neural networks has brought revolutionary changes to computer vision, and the combination of convolutional and recurrent neural networks has achieved great success in image-based sequence recognition, promoting the development of handwriting recognition. As an important research area of pattern recognition, handwriting recognition has received extensive research and attention from academia. Handwriting recognition research in popular languages (e.g., Chinese, English, Japanese) has progressed from simple isolated word recognition to text line recognition, unconstrained handwriting recognition, document recognition, and scene character recognition.
However, handwriting recognition for Mongolian and other low-resource languages started late and related research is scarce. Mongolian has a huge vocabulary, a free writing style, and severe character deformation, which pose great challenges to handwritten Mongolian recognition.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a handwritten Mongolian recognition method based on data enhancement and ECA-Net. Random erasing and/or elastic deformation is used to enhance the handwritten Mongolian database, generating training images with different degrees of occlusion. This further improves the generalization ability of the neural network, reduces the risk of overfitting, makes the model robust to occlusion (erasure), and alleviates the small scale of directly usable Mongolian handwriting recognition databases. ECA-Net extracts features from the input image to obtain a feature map; finally, the feature map is vectorized and the handwritten Mongolian is recognized using the enhanced database.
In order to achieve the purpose, the invention adopts the technical scheme that:
a handwritten Mongolian recognition method based on data enhancement and ECA-Net comprises the following steps:
the first step is as follows: performing data enhancement on the existing Mongolian handwriting recognition database by utilizing elastic deformation data enhancement and/or random erasure data enhancement to obtain an enhanced database;
the second step is that: acquiring a picture of handwritten Mongolian as an input image, and performing feature extraction on the input image with a deep convolutional neural network to obtain a feature map, wherein the deep convolutional neural network is a residual network containing an efficient channel attention module, namely ECA-Net;
the third step: vectorizing the feature map, and recognizing the handwritten Mongolian by using the enhanced database.
The elastic deformation data enhancement applies elastic spatial deformation to the image of a handwritten Mongolian character; each enhancement generates one enhanced handwritten Mongolian character image, i.e., an elastically deformed handwritten Mongolian character, and the Mongolian text corresponding to the original character image is used as the data label of the enhanced image.
The elastic deformation data enhancement divides the image evenly into N small blocks, initializes 2(N+1) reference points p along the top and bottom boundaries of the image, sets a circular transformation area of radius R centered on each reference point p, and enhances the image by randomly moving each reference point p to a position q within radius R through a similarity transformation based on moving least squares, wherein for any point u in the image the transformation follows the expression:

T(u) = (u - p*)M + q*

M is a linear transformation matrix, constrained to satisfy M^T M = λ^2 I for some scalar λ;

p* and q* are the weighted centroids of the reference points p_i and the moved points q_i, respectively:

p* = Σ_i w_i p_i / Σ_i w_i,  q* = Σ_i w_i q_i / Σ_i w_i

p_i denotes the i-th initial reference point and q_i the i-th moved reference point, i.e., p_i after a random move; w_i denotes the weight of reference point p_i with respect to the point u, with the formula:

w_i = 1 / |p_i - u|^(2a)

a is set to the fixed value 1; when u is close to p_i the weight increases, meaning that u depends mainly on the motion of the nearest reference points.
The random erasing data enhancement applies random erasing to the image of a handwritten Mongolian character; each enhancement generates one enhanced handwritten Mongolian character image, i.e., a randomly erased handwritten Mongolian character, and the Mongolian text corresponding to the original character image is used as the data label of the enhanced image.
The random erasing data enhancement randomly selects a rectangular region I_e in the image I and erases its pixels with random values, generating training images with different degrees of occlusion. The steps are as follows:

Step 1: input an image I of size S = W * H, where W and H are the width and height of I; set the erased-area ratio range [S_l, S_h] and the erased aspect-ratio range [r_1, r_2]; initialize the erasing probability p, 0 ≤ p ≤ 1.

Step 2: randomly select a rectangular region I_e in image I and erase its pixels with random values, where the area of I_e is randomly initialized to S_e and its aspect ratio to r_e, such that S_e / S lies in [S_l, S_h] and r_e lies in [r_1, r_2]. The size of I_e is calculated by the following formulas:

H_e = sqrt(S_e * r_e),  W_e = sqrt(S_e / r_e)

W_e and H_e are the height and width of the randomly erased rectangle I_e.

Step 3: randomly initialize a point P = (x_e, y_e) in image I, where x_e and y_e are the coordinates of the randomly initialized point.

Step 4: decide on the erased portion: if x_e + W_e ≤ W and y_e + H_e ≤ H, the region (x_e, y_e, x_e + W_e, y_e + H_e) is set as the selected rectangle I_e; otherwise, the above process is repeated until a rectangle I_e meeting the requirements is selected. Each pixel in the selected rectangle I_e is then assigned a random value in [0, 255].
The efficient channel attention module performs a fast 1D convolution with kernel size k to generate a weight for each channel of the input image. The kernel size k, i.e., the coverage of local cross-channel interaction, determines how many neighboring channels participate in the attention prediction of a channel y_i. After channel-wise global pooling without dimensionality reduction, local cross-channel interaction information is captured by considering each channel of the input image together with its k neighbors. A parameter matrix W_k represents the learned channel attention weights; W_k is a band matrix in which each row contains only k non-zero entries corresponding to the k neighbors of that channel, so W_k involves k * C parameters in total, where C is the channel dimension of the input feature map. The weight of image channel y_i only considers the information interaction between y_i and its k neighboring channels, and is calculated as:

ω_i = σ( Σ_{j=1}^{k} w^j y_i^j ),  y_i^j ∈ Ω_i^k

where y_i^j denotes the j-th neighboring channel of y_i, w^j denotes its shared weight, Ω_i^k denotes the set of k neighbors of y_i, and σ is the sigmoid function. k is in direct proportion to C, with the relationship:

C = φ(k) = 2^(γk - b)

Given C, k is adaptively determined by the following equation:

k = ψ(C) = | log2(C)/γ + b/γ |_odd

In the formula, |x|_odd denotes the odd number nearest to x, and γ and b are set to fixed constants. Through the mapping ψ, which is nonlinear, high-dimensional channels interact over a longer range while low-dimensional channels interact over a shorter range.
Using ECA-Net for feature extraction of the input image effectively avoids dimensionality reduction and captures cross-channel interaction information. This helps achieve higher character recognition accuracy while reducing model complexity; that is, channel attention is learned in a more efficient way.
In the third step, a gated recurrent unit is combined with the aggregation cross-entropy loss function to vectorize the feature map. The encoding process of the gated recurrent unit is as follows:

Step 1: from the hidden state h^<t-1> transmitted by the previous node and the input x of the current node, obtain the two gating states (reset gate and update gate); the obtained information is normalized by a sigmoid function so that it serves as a gating signal;

Step 2: for the activation of the j-th hidden unit, the reset gate is computed as:

r_j = σ([W_r x]_j + [U_r h^<t-1>]_j)

where σ is the logistic sigmoid function, [·]_j denotes the j-th element of a vector, i.e., the j-th hidden unit, and W_r and U_r are the weight matrices learned in the reset gate;

Step 3: the update gate is computed as:

z_j = σ([W_z x]_j + [U_z h^<t-1>]_j)

W_z and U_z are the weight matrices learned in the update gate;

Step 4: the hidden state of the j-th hidden unit at the current time slice t is calculated by the following formula:

h_j^<t> = z_j h_j^<t-1> + (1 - z_j) h̃_j^<t>

where the candidate state h̃_j^<t> combines the input of the current node with the hidden state of the j-th hidden unit at the previous time slice:

h̃_j^<t> = tanh([W x]_j + [U (r ⊙ h^<t-1>)]_j)

W represents the weight of the current input, and U represents the weight of the hidden state of the j-th hidden unit at the previous time slice.
In the GRU, when the update gate is close to 0, the hidden state is forced to ignore the previous hidden state and update with only the current input. This effectively allows the hidden state to discard any information found to be irrelevant in the future, allowing for a more compact representation.
In the third step, a gated recurrent unit is combined with the aggregation cross-entropy loss function to vectorize the feature map. The decoding process of the gated recurrent unit is as follows:

Step 1: the decoder takes the feature vector c, the output y^<t-1> of the previous time slice, and the previous hidden state h^<t-1> as input to obtain h^<t>, with the formula:

h^<t> = f(h^<t-1>, y^<t-1>, c)

f represents the activation function, which must produce valid probabilities, e.g., softmax.

Step 2: the output y^<t> of the decoder at time t is determined by the conditional distribution given c, y^<t-1>, and h^<t>, as follows:

P(y_t | y_{t-1}, y_{t-2}, ..., y_1, c) = g(h^<t>, y_{t-1}, c)

g represents the activation function, which must produce valid probabilities, e.g., softmax.
The aggregation cross-entropy loss function includes the following stages:

(1) aggregate the probabilities of each label category along the time dimension.

The number of characters of each category predicted by the network is treated as a probability distribution ȳ_k, as follows:

ȳ_k = y_k / T

y_k denotes the predicted number of characters of the k-th category, and T is the total number of predictions;

the number of characters of each category in the actual label is treated as another probability distribution N̄_k, as follows:

N̄_k = N_k / T

N_k denotes the number of characters of the k-th category in the actual label;

(2) combined with the label annotation, normalize the aggregation result into a probability distribution over all categories;

(3) compare the two probability distributions using cross entropy.

The similarity between the predicted distribution and the actual label distribution is expressed by the aggregation cross-entropy function, which is taken as the loss function of the Mongolian handwriting recognition model, as follows:

L = - Σ_k N̄_k ln ȳ_k

The aggregation cross-entropy loss function is an optimization of CTC; it is simpler to implement and adapts well to 2D prediction, which can be flattened into a 1D prediction and applied directly as input.

Assume the output 2-dimensional prediction map has height H and width W (not equal to the original image size after passing through the CNN), and denote the prediction output at row h and column w as y_hw. The probability distribution of the number of characters of each category predicted by the network is:

ȳ_k = ( Σ_{h=1}^{H} Σ_{w=1}^{W} y_hw^k ) / (H * W)

The loss function is then expressed as:

L = - Σ_k (N_k / (H * W)) ln ȳ_k

As the above equation shows, the loss can be calculated with the ACE loss function by straightening the original 2-dimensional prediction into a 1-dimensional prediction result.
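The aggregation and comparison described above can be sketched in a few lines. The snippet below is an illustrative sketch, not the patent's implementation: `probs` is assumed to hold per-timestep softmax outputs, a 2D prediction map is flattened to T = H*W rows before the call, and a blank class is assumed to absorb the label slots not occupied by characters.

```python
import math
import numpy as np

def ace_loss(probs, label_counts):
    """Sketch of the aggregation cross-entropy (ACE) loss.

    probs:        (T, C) per-timestep class probabilities; a 2D map of
                  shape (H, W, C) is first flattened to T = H*W rows.
    label_counts: (C,) occurrences of each class in the label, summing
                  to T (a blank class absorbs the remaining slots).
    """
    T = probs.shape[0]
    y_bar = probs.sum(axis=0) / T    # aggregated prediction distribution
    n_bar = label_counts / T         # label-count distribution
    return -np.sum(n_bar * np.log(y_bar + 1e-10))  # cross entropy
```

For a prediction whose aggregated counts match the label exactly, the loss reduces to the entropy of the label-count distribution, its minimum; note that no per-character alignment is ever computed, which is why inference and back propagation are cheap compared with CTC.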
Compared with the prior art, the invention has the beneficial effects that:
(1) the Mongolian handwriting recognition database with richer forms can be obtained by utilizing elastic deformation data enhancement and random erasure data enhancement according to the existing Mongolian handwriting database, and meanwhile, the robustness of the model to shielding is improved.
(2) The ECA-Net is selected and used by the feature extraction network, and the efficient attention mechanism module is used, so that dimension reduction is avoided, and cross-channel interaction information is effectively captured.
(3) In training the final Mongolian handwriting recognition system, the ACE loss function is used in combination with a GRU, so inference and back propagation are faster; the ACE loss function adapts to the 2D prediction problem by flattening the 2D prediction into a 1D prediction.
Drawings
FIG. 1 is a block diagram of a data enhancement and ECA-Net based method for Mongolian handwriting recognition according to the present invention.
FIG. 2 is a schematic diagram of the ECA-Net structure.
FIG. 3 is a schematic diagram of the long short-term memory (LSTM) network structure.
FIG. 4 is a schematic diagram of the gated recurrent unit (GRU) structure.
FIG. 5 is a schematic diagram of the structure of an implementation of the Aggregate Cross Entropy (ACE) loss function.
FIG. 6 shows test-set images of handwritten Mongolian used by the invention.
FIG. 7 shows partial test results of the handwritten Mongolian recognition model.
Detailed Description
The embodiments of the present invention will be described in detail below with reference to the drawings and examples.
As shown in FIG. 1, the invention is a handwritten Mongolian recognition method based on data enhancement and ECA-Net, comprising the following steps:
the first step is as follows: the limited data are enhanced from the perspective of spatial transformation and/or noise addition; that is, the existing Mongolian handwriting recognition database is enhanced using elastic deformation data enhancement and/or random erasing data enhancement to obtain an enhanced database. This yields Mongolian handwriting recognition data with richer forms and provides a data basis for improving the recognition ability and occlusion robustness of the model.
1. Elastic deformation data enhancement
The elastic deformation data enhancement starts from spatial transformation of the image and enhances the existing data. Its principle is that text images are generated by elastic deformation using similarity deformation based on the moving least squares method as the transformation strategy, with the aim of increasing the diversity of each character in a text string. Specifically, the image of a handwritten Mongolian character undergoes elastic spatial deformation; each enhancement generates one enhanced handwritten Mongolian character image, i.e., an elastically deformed handwritten Mongolian character, and the Mongolian text corresponding to the original character image is used as the data label of the enhanced image.

The elastic deformation data enhancement requires customized reference points for the deformation. The specific implementation process is as follows: divide the image evenly into N small blocks, initialize 2(N+1) reference points p along the top and bottom boundaries of the image, set a circular transformation area of radius R centered on each reference point p, and enhance the image by randomly moving each reference point p to a position q within radius R through a similarity transformation based on moving least squares, wherein for any point u in the image the transformation follows the expression:

T(u) = (u - p*)M + q*

M is a linear transformation matrix, constrained to satisfy M^T M = λ^2 I for some scalar λ;

p* and q* are the weighted centroids of the reference points p_i and the moved points q_i, respectively:

p* = Σ_i w_i p_i / Σ_i w_i,  q* = Σ_i w_i q_i / Σ_i w_i

p_i denotes the i-th initial reference point and q_i the i-th moved reference point, i.e., p_i after a random move; w_i denotes the weight of reference point p_i with respect to the point u, with the formula:

w_i = 1 / |p_i - u|^(2a)

a is set to the fixed value 1; when u is close to p_i the weight increases, meaning that u depends mainly on the motion of the nearest reference points.
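The weighting scheme above can be illustrated with a minimal sketch. The function below computes the weighted centroids p* and q* and applies the transform with M fixed to the identity for brevity; the full method instead solves for a similarity matrix M satisfying M^T M = λ^2 I. The function and argument names are illustrative, not from the patent.

```python
import numpy as np

def mls_translate(u, p, q, a=1.0):
    """Simplified moving-least-squares warp of a single point u.

    p: (n, 2) initial reference points; q: (n, 2) randomly moved points.
    Implements T(u) = (u - p*) M + q* with M fixed to the identity;
    the full method solves for a similarity matrix M with M^T M = l^2 I.
    """
    d2 = np.sum((p - u) ** 2, axis=1)               # |p_i - u|^2
    w = 1.0 / np.maximum(d2, 1e-12) ** a            # w_i = 1 / |p_i - u|^(2a)
    p_star = (w[:, None] * p).sum(axis=0) / w.sum() # weighted centroid of p_i
    q_star = (w[:, None] * q).sum(axis=0) / w.sum() # weighted centroid of q_i
    return (u - p_star) + q_star                    # M = I (identity)
```

A useful sanity check: when every reference point moves by the same offset, the weighted centroids shift by that offset too, so any point u is simply translated by it.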
2. Random erasure data enhancement
The random erasing data enhancement starts from adding noise points and enhances the existing data, making the model robust to occlusion (erasure) and reducing the risk of overfitting. Random erasing is applied to the image of a handwritten Mongolian character; each enhancement generates one enhanced handwritten Mongolian character image, i.e., a randomly erased handwritten Mongolian character, and the Mongolian text corresponding to the original character image is used as the data label of the enhanced image.
The specific implementation process of random erasure data enhancement is as follows:
Randomly select a rectangular region I_e in the image I and erase its pixels with random values, generating training images with different degrees of occlusion. The steps are as follows:

Step 1: input an image I of size S = W * H, where W and H are the width and height of I; set the erased-area ratio range [S_l, S_h] and the erased aspect-ratio range [r_1, r_2]; initialize the erasing probability p, 0 ≤ p ≤ 1.

Step 2: randomly select a rectangular region I_e in image I and erase its pixels with random values, where the area of I_e is randomly initialized to S_e and its aspect ratio to r_e, such that S_e / S lies in [S_l, S_h] and r_e lies in [r_1, r_2]. The size of I_e is calculated by the following formulas:

H_e = sqrt(S_e * r_e),  W_e = sqrt(S_e / r_e),  I_e = W_e * H_e

W_e and H_e are the width and height of the randomly erased rectangle I_e.

Step 3: randomly initialize a point P = (x_e, y_e) in image I, where x_e and y_e are the coordinates of the randomly initialized point.

Step 4: decide on the erased portion: if x_e + W_e ≤ W and y_e + H_e ≤ H, the region (x_e, y_e, x_e + W_e, y_e + H_e) is set as the selected rectangle I_e; otherwise, the above process is repeated until a rectangle I_e meeting the requirements is selected. Each pixel in the selected rectangle I_e is then assigned a random value in [0, 255].
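The four steps above can be sketched as follows. This is an illustrative sketch, not the patent's code: the default ranges for the area ratio and aspect ratio follow common settings in the random-erasing literature and are assumptions, as is the cap on placement retries.

```python
import random

def random_erase(img, p=0.5, sl=0.02, sh=0.4, r1=0.3, r2=3.33, rng=None):
    """Random-erasing sketch for a grayscale image stored as a list of rows.

    With probability p, a rectangle I_e whose area ratio S_e/S lies in
    [sl, sh] and whose aspect ratio r_e lies in [r1, r2] is filled with
    random values in [0, 255].
    """
    rng = rng or random.Random()
    H, W = len(img), len(img[0])
    if rng.random() > p:                        # skip erasing with prob 1 - p
        return img
    for _ in range(100):                        # retry until the rectangle fits
        Se = rng.uniform(sl, sh) * W * H        # erased area S_e
        re = rng.uniform(r1, r2)                # aspect ratio r_e
        He = int(round((Se * re) ** 0.5))       # H_e = sqrt(S_e * r_e)
        We = int(round((Se / re) ** 0.5))       # W_e = sqrt(S_e / r_e)
        xe, ye = rng.randint(0, W - 1), rng.randint(0, H - 1)
        if xe + We <= W and ye + He <= H:       # rectangle fully inside image
            for y in range(ye, ye + He):
                for x in range(xe, xe + We):
                    img[y][x] = rng.randint(0, 255)
            return img
    return img                                  # no valid placement found
```

Because the rectangle is re-drawn until it fits, the erased region never crosses the image border, matching the decision rule in step 4.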
The Mongolian handwriting image is enhanced by adopting the two operation modes, and the Mongolian handwriting image with richer and more diversified forms can be generated.
The second step is that: acquire a picture of handwritten Mongolian as the input image, and extract features from the input image with a deep convolutional neural network to obtain a feature map, where the deep convolutional neural network is a residual network containing an efficient channel attention (ECA) module, i.e., ECA-Net. Introducing the efficient channel attention module in the feature extraction stage avoids dimensionality reduction and effectively captures cross-channel interaction information. This helps achieve higher character recognition accuracy while reducing model complexity; that is, effective channel attention is learned in a more effective way.
Specifically, referring to FIG. 2, the efficient channel attention module performs a fast 1D convolution with kernel size k to generate a weight for each channel of the input image. The kernel size k, i.e., the coverage of local cross-channel interaction, determines how many neighboring channels participate in the attention prediction of a channel y_i. After channel-wise global pooling without dimensionality reduction, the efficient channel attention module captures local cross-channel interaction information by considering each channel of the input image together with its k neighbors. A parameter matrix W_k represents the learned channel attention weights; W_k is a band matrix in which each row contains only k non-zero entries corresponding to the k neighbors of that channel, so W_k involves k * C parameters in total, where C is the channel dimension of the input feature map. To avoid making the channels completely independent, the weight of image channel y_i considers the information interaction between y_i and its k neighboring channels, and is calculated as:

ω_i = σ( Σ_{j=1}^{k} w^j y_i^j ),  y_i^j ∈ Ω_i^k

where y_i^j denotes the j-th neighboring channel of y_i, w^j denotes its shared weight, Ω_i^k denotes the set of k neighbors of y_i, and σ is the sigmoid function. k is in direct proportion to C, with the relationship:

C = φ(k) = 2^(γk - b)

Given C, k is adaptively determined by the following equation:

k = ψ(C) = | log2(C)/γ + b/γ |_odd

In the formula, |x|_odd denotes the odd number nearest to x, and γ and b are set to fixed constants. Through the mapping ψ, which is nonlinear, high-dimensional channels interact over a longer range while low-dimensional channels interact over a shorter range.
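The adaptive kernel size and the 1D-convolution attention can be illustrated with the sketch below. The constants γ = 2 and b = 1 and the upward tie-breaking to the next odd number are taken from common ECA-Net practice and are assumptions of this sketch; the convolution kernel is initialized to 1/k purely for illustration, whereas in the real network it is a learned, shared parameter.

```python
import math
import numpy as np

def eca_kernel_size(C, gamma=2, b=1):
    """Adaptive kernel size k = psi(C) = |log2(C)/gamma + b/gamma|_odd."""
    t = int(abs(math.log2(C) / gamma + b / gamma))
    return t if t % 2 else t + 1                 # round up to an odd number

def eca_weights(pooled, gamma=2, b=1):
    """Per-channel attention weights from globally pooled features.

    pooled: (C,) one value per channel after global average pooling.
    A 1D convolution of size k over the channel axis captures local
    cross-channel interaction without dimensionality reduction, then a
    sigmoid maps the result to per-channel weights.
    """
    C = pooled.shape[0]
    k = eca_kernel_size(C, gamma, b)
    kernel = np.full(k, 1.0 / k)                 # shared weights w^1 .. w^k
    padded = np.pad(pooled, k // 2, mode="edge") # keep output length C
    conv = np.convolve(padded, kernel, mode="valid")
    return 1.0 / (1.0 + np.exp(-conv))           # sigmoid -> channel weights
```

For example, C = 256 gives k = 5 while C = 64 gives k = 3, so wider feature maps interact over a longer channel range, exactly the behavior the mapping ψ is designed to produce.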
Using ECA-Net for feature extraction of the input image effectively avoids dimensionality reduction and captures cross-channel interaction information. This helps achieve higher character recognition accuracy while reducing model complexity; that is, channel attention is learned in a more efficient way.
The third step: vectorizing the feature map, and recognizing the handwritten Mongolian by using the enhanced database.
Specifically, the invention combines a gated recurrent unit (GRU) with the aggregation cross-entropy (ACE) loss function to vectorize the feature map and complete the recognition of handwritten Mongolian; that is, the GRU and ACE loss are combined to construct a sequence recognition neural network that performs feature serialization and sequence recognition. In the character recognition process, the GRU is easier to train, which greatly reduces training difficulty and improves training efficiency. The aggregation cross-entropy loss function mainly targets sequence recognition and is an optimization of CTC (Connectionist Temporal Classification) and the attention mechanism: it does not consider the order of characters within a sequence, only the number of occurrences of each character class in the string. Meanwhile, the ACE loss function is simple to implement and adapts well to the 2D prediction problem by flattening the 2D prediction into a 1D prediction.
Referring to FIG. 3 and FIG. 4, the GRU has its own gating states compared with a plain RNN. The update gate (z) functions like the forget and input gates of the LSTM: it decides which information to discard and which new information to add. In effect, this step forgets part of the transmitted h^<t-1> and adds some dimensions of the current node's input.
The reset gate (r) is another gate, used to decide how much past information to forget; it resets the hidden state, combines it with the input of the current time slice, and normalizes the result. The reset gate purposefully adds the current input to the current hidden state, corresponding to "remembering the state at the current time", similar to the selective memory stage of the LSTM.
1. The coding process of the gated cyclic unit is as follows:
step 1: hidden state h transmitted by last node<t-1>Acquiring two gating states of a reset gate and an update gate from an input x of the current node, and normalizing the acquired information through a sigmoid function to enable the acquired information to serve as a gating signal;
step 2: when the activation operation of the jth hidden unit is performed, the operation of the reset gate is as follows:
rj=σ([Wrx]j+[Urh<t-1>]j)
where σ is a logical sigmoid function,[ ]jThe jth element representing the vector, i.e. the jth hidden unit, WrAnd UrIs the weight matrix learned in the reset gate;
Step 3: the update gate is computed as:

z_j = σ([W_z x]_j + [U_z h^{<t-1>}]_j)

where W_z and U_z are the weight matrices learned in the update gate;
Step 4: the hidden state of the j-th hidden unit at the current time slice t is computed as:

h_j^{<t>} = z_j h_j^{<t-1>} + (1 − z_j) h̃_j^{<t>}

where h̃_j^{<t>} is the candidate hidden state. The hidden state of the j-th hidden unit of the previous time slice is purposefully added to the current hidden state, and the candidate is computed as:

h̃_j^{<t>} = tanh([W x]_j + [U (r ⊙ h^{<t-1>})]_j)
W represents the weight of the current input, U represents the weight of the hidden state of the j-th hidden unit at the previous time slice, and ⊙ denotes element-wise multiplication. In the GRU, when the update gate is close to 0, the hidden state is forced to ignore the previous hidden state and update with only the current input. This effectively allows the hidden state to discard any information later found to be irrelevant, yielding a more compact representation.
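As a minimal illustration, the encoding step above can be sketched in NumPy. The weight names (Wr, Ur, Wz, Uz, W, U) follow the equations; all shapes and initializations are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h_prev, Wr, Ur, Wz, Uz, W, U):
    """One GRU encoding step following the equations above.

    x: current input vector; h_prev: previous hidden state h^{<t-1>}.
    Wr/Ur, Wz/Uz, W/U are the reset-gate, update-gate and
    candidate-state weight matrices (names are illustrative).
    """
    r = sigmoid(Wr @ x + Ur @ h_prev)            # reset gate r_j
    z = sigmoid(Wz @ x + Uz @ h_prev)            # update gate z_j
    h_tilde = np.tanh(W @ x + U @ (r * h_prev))  # candidate hidden state
    return z * h_prev + (1.0 - z) * h_tilde      # h^{<t>}
```

The returned state interpolates between the previous hidden state and the candidate, so with z near 0 the previous state is ignored, as described above.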
The decoding process of the gated recurrent unit is as follows:
Step 1: the decoder takes the feature vector c, the output y^{<t-1>} of the previous time slice, and the previous hidden node h^{<t-1>} as input to obtain h^{<t>}, computed as:

h^{<t>} = f(h^{<t-1>}, y^{<t-1>}, c)
where f represents a given activation function, which must produce valid probabilities, e.g. softmax.
Step 2: the output y^{<t>} of the decoder at time t is determined by the conditional distribution conditioned on c, y^{<t-1>} and h^{<t>}; the y^{<t>} with the highest conditional probability is selected as the output at the current time:

P(y_t | y_{t-1}, y_{t-2}, …, y_1, c) = g(h^{<t>}, y_{t-1}, c)
g represents the given activation function, which must produce a valid probability, e.g. softmax.
2. Referring to fig. 5, the aggregation cross-entropy loss function comprises the following stages:
(1) Aggregating the probabilities of each label category along the time dimension.

The number of characters of each category predicted by the network is regarded as a probability distribution ȳ_k, as follows:

ȳ_k = y_k / T

where y_k denotes the aggregated number of characters of the k-th class in the prediction result, and T is the total number of characters;
The number of characters of each category in the actual label is regarded as another probability distribution N̄_k, as follows:

N̄_k = N_k / T

where N_k denotes the number of characters of the k-th class in the actual label;
(2) Combining the label annotation, normalizing the aggregated result into a probability distribution over all categories.

(3) Comparing the two probability distributions using cross entropy.

The degree of similarity between the predicted distribution and the actual label distribution is expressed by the aggregation cross-entropy function, which is taken as the loss function of the handwritten Mongolian recognition model:

L = − Σ_k N̄_k ln ȳ_k

where the sum runs over all character classes.
Furthermore, the aggregation cross-entropy loss function can be applied directly to 2D predictions by flattening the 2D prediction into a 1D prediction.
Assuming that the output 2-dimensional prediction map has height H and width W (not equal to the original image size after passing through the CNN), the prediction output at row h and column w is denoted y_{hw}. The probability distribution of the number of characters of each class predicted by the network is:

ȳ_k = ( Σ_{h=1}^{H} Σ_{w=1}^{W} y_{hw}^k ) / (HW)

and the loss function is expressed as:

L = − Σ_k N̄_k ln ȳ_k,  with N̄_k = N_k / (HW)
As shown in the above equations, the ACE loss can be computed by flattening the original 2-dimensional prediction into a 1-dimensional prediction result.
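The aggregation steps above (aggregate per-class probabilities over time, normalize by T, compare with the label distribution via cross entropy) can be sketched as follows. Function and variable names are illustrative, and a small epsilon is an added assumption for numerical safety:

```python
import numpy as np

def ace_loss(probs, counts):
    """Aggregation Cross-Entropy loss (sketch).

    probs:  (T, K) per-timestep class probabilities from the network;
            for a 2D prediction of shape (H, W, K), pass
            probs.reshape(H * W, K), i.e. flatten 2D into 1D as above.
    counts: (K,) number of characters of each class in the label,
            with one class treated as blank so counts.sum() == T
            (following the ACE formulation).
    """
    T = probs.shape[0]
    y_bar = probs.sum(axis=0) / T   # aggregated class probabilities
    n_bar = counts / T              # normalized label counts
    eps = 1e-12                     # numerical safety (assumption)
    return -np.sum(n_bar * np.log(y_bar + eps))
```

Note that the loss depends only on the per-class totals, not on character order, which is exactly the property claimed for ACE above.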
Referring to fig. 6 and 7, a specific handwritten Mongolian recognition case is described.
In constructing the handwritten Mongolian recognition model, a 200,000-word Mongolian handwriting recognition database was used. Part of the handwritten Mongolian test-set images are shown in FIG. 6, and the experimental results are shown in FIG. 7. In the results, the first column is the Mongolian label, the second column is the model's recognition output, and the rightmost column is the per-character/word recognition accuracy. The experiments show high recognition accuracy, and the model training efficiency is also improved; overall, the recognition effect is good.

Claims (10)

1. A handwritten Mongolian recognition method based on data enhancement and ECA-Net is characterized by comprising the following steps:
the first step is as follows: performing data enhancement on the existing Mongolian handwriting recognition database by utilizing elastic deformation data enhancement and/or random erasure data enhancement to obtain an enhanced database;
the second step is that: acquiring a picture of handwritten Mongolian as an input image, and performing feature extraction on the input image by using a deep convolutional neural network to obtain a feature map, wherein the deep convolutional neural network is a residual network containing an efficient channel attention module, namely ECA-Net;
the third step: vectorizing the feature map, and recognizing the handwritten Mongolian by using the enhanced database.
2. The handwritten Mongolian recognition method based on data enhancement and ECA-Net according to claim 1, wherein the elastic deformation data enhancement is a spatial data enhancement that elastically deforms an image of handwritten Mongolian characters; each enhancement generates an enhanced handwritten Mongolian digital image, i.e. an elastically deformed handwritten Mongolian character, and the enhanced Mongolian image uses the Mongolian corresponding to the original digital image as its data label.
3. The handwritten Mongolian recognition method based on data enhancement and ECA-Net according to claim 2, wherein the elastic deformation data enhancement divides the image evenly into N small blocks, initializes 2(N+1) reference points p along the top and bottom boundaries of the image, sets a circular transformation area with radius R around each reference point p, and enhances the image by randomly moving each reference point p to a point q within radius R based on the moving-least-squares similarity transformation, wherein for any point u in the image the transformation follows the expression:
T(u) = (u − p_*) M + q_*

where M is a linear transformation matrix constrained to satisfy M^T M = λ^2 I for some scalar λ;
p_* and q_* are the weighted centroids of the reference points p and q, respectively:

p_* = Σ_i w_i p_i / Σ_i w_i,  q_* = Σ_i w_i q_i / Σ_i w_i
where p_i denotes the i-th initialized reference point and q_i denotes the i-th moved reference point, i.e. p_i after random movement; w_i denotes the weight of any point u in the image, with the formula:

w_i = 1 / |p_i − u|^{2a}

where a is set to a fixed value of 1; when u is close to p_i, the weight increases, meaning that u depends mainly on the motion of the nearest reference point.
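The weights w_i, the centroids p_*, q_*, and the transform T(u) = (u − p_*)M + q_* can be sketched as follows. For M, this sketch uses the closed-form similarity solution of the moving-least-squares method (Schaefer et al.), which satisfies M^T M = λ² I by construction; this is an assumption about how the constraint is realized, and all names are illustrative:

```python
import numpy as np

def perp(v):
    """Rotate a 2D vector by 90 degrees."""
    return np.array([-v[1], v[0]])

def mls_similarity(u, p, q, a=1.0):
    """Transform one image point u by T(u) = (u - p_*) M + q_*.

    u: (2,) image point; p, q: (n, 2) original and moved reference
    points. M is the weighted similarity solution, so M^T M is a
    scalar multiple of the identity by construction.
    """
    w = 1.0 / (np.sum((p - u) ** 2, axis=1) ** a + 1e-12)  # w_i = 1/|p_i-u|^{2a}
    p_star = (w[:, None] * p).sum(axis=0) / w.sum()        # weighted centroids
    q_star = (w[:, None] * q).sum(axis=0) / w.sum()
    ph, qh = p - p_star, q - q_star                        # centered points
    mu = np.sum(w * np.sum(ph * ph, axis=1))               # normalizer
    M = np.zeros((2, 2))
    for wi, pi, qi in zip(w, ph, qh):
        A = np.stack([pi, -perp(pi)])                      # rows: p_hat, -p_hat_perp
        B = np.stack([qi, -perp(qi)])
        M += wi * (A.T @ B)
    M /= mu
    return (u - p_star) @ M + q_star
```

When no reference point moves (q = p), M reduces to the identity and T(u) = u, so the deformation magnitude is controlled entirely by how far the points are moved within radius R.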
4. The method according to claim 1, wherein the random erasure data enhancement performs random erasing on an image of handwritten Mongolian characters; each enhancement generates a randomly erased handwritten Mongolian digital image, and the enhanced Mongolian image uses the Mongolian corresponding to the original digital image as its data label.
5. The method according to claim 4, wherein the random erasure data enhancement randomly selects a rectangular area I_e in the image I and erases its pixels with random values to generate training images with different occlusion levels, comprising the following steps:
Step 1: input an image I of size S = W × H, where W and H are the width and height of image I, respectively; set the erased-area ratio range [S_l, S_h] and the erase aspect-ratio range [r_1, r_2], and initialize the erasing probability p in [0, 1];
Step 2: randomly select a rectangular area I_e in image I and erase its pixels with random values, where the area of the rectangular region I_e is randomly initialized to S_e and the erase aspect ratio is randomly initialized to r_e;
S_e/S lies in the range [S_l, S_h] and r_e lies in [r_1, r_2]; the size of I_e is calculated as:

H_e = sqrt(S_e × r_e),  W_e = sqrt(S_e / r_e),  I_e = W_e × H_e

where W_e and H_e are the width and height of the randomly erased rectangular area I_e;
Step 3: randomly initialize a point P = (x_e, y_e) in image I, where x_e and y_e are randomly initialized point coordinates;
Step 4: decide on the erased portion: if x_e + W_e ≤ W and y_e + H_e ≤ H, the area (x_e, y_e, x_e + W_e, y_e + H_e) is set as the selected rectangular area I_e; otherwise, the above process is repeated until a rectangular area I_e meeting the requirement is selected; each pixel in the selected rectangular area I_e is then assigned a random value in [0, 255].
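Steps 1-4 above can be sketched as follows. The default hyperparameter values are those commonly used for Random Erasing and are assumptions, not values fixed by the claim; the retry cap is likewise an illustrative choice:

```python
import random
import numpy as np

def random_erase(img, p=0.5, s_l=0.02, s_h=0.4, r1=0.3, r2=3.33, rng=None):
    """Random Erasing sketch following steps 1-4 above.

    img: (H, W) grayscale array. Hyperparameters follow the claim:
    erased-area ratio range [s_l, s_h], aspect-ratio range [r1, r2],
    erasing probability p (defaults are illustrative assumptions).
    """
    rng = rng or random.Random()
    h, w = img.shape
    out = img.copy()
    if rng.random() > p:                      # skip erasing with prob. 1 - p
        return out
    for _ in range(100):                      # retry until the region fits
        s_e = rng.uniform(s_l, s_h) * h * w   # target erased area S_e
        r_e = rng.uniform(r1, r2)             # target aspect ratio r_e
        h_e = int(round((s_e * r_e) ** 0.5))  # H_e = sqrt(S_e * r_e)
        w_e = int(round((s_e / r_e) ** 0.5))  # W_e = sqrt(S_e / r_e)
        x_e, y_e = rng.randrange(w), rng.randrange(h)
        if 0 < h_e and 0 < w_e and x_e + w_e <= w and y_e + h_e <= h:
            out[y_e:y_e + h_e, x_e:x_e + w_e] = np.array(
                [[rng.randrange(256) for _ in range(w_e)] for _ in range(h_e)])
            return out
    return out                                # no fitting region found
```

Each call therefore yields a training image with a different occlusion level while leaving the label unchanged, as described in claims 4 and 5.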
6. The handwritten Mongolian recognition method based on data enhancement and ECA-Net according to claim 1, wherein the efficient channel attention module executes a fast 1D convolution with kernel size k to generate the weight of each channel of the input image; the kernel size k, i.e. the coverage of local cross-channel interaction, determines the k adjacent channels weighted for a channel y_i, i.e. how many neighbors around the channel participate in the attention prediction of that channel. After channel-wise global pooling without dimensionality reduction, local cross-channel interaction information is captured by considering each channel of the input image and its k neighbors. A parameter matrix W_k represents the learned channel attention weights, and W_k is expressed as follows:
W_k =
| w^{1,1}  …  w^{1,k}      0        0    …      0      |
|    0    w^{2,2}  …   w^{2,k+1}    0    …      0      |
|    ⋮        ⋮        ⋱            ⋱    ⋱      ⋮      |
|    0     …      0    w^{C,C−k+1}  …        w^{C,C}   |

W_k involves k × C parameters, where C denotes the channel dimension, i.e. the size of the feature matrix of the input image.
7. The handwritten Mongolian recognition method based on data enhancement and ECA-Net according to claim 6, wherein for the weight of an image channel y_i, only the information interaction between y_i and its k adjacent channels needs to be considered; the weight of y_i is calculated as:

ω_i = σ( Σ_{j=1}^{k} w^j y_i^j ),  y_i^j ∈ Ω_i^k
where y_i^j denotes the j-th of the k adjacent channels of y_i, w^j denotes the weight of y_i^j, and Ω_i^k denotes the set of k adjacent channels of y_i; k is in direct proportion to C, with the relationship:

C = φ(k) = 2^{(γk − b)}
Given C, k is adaptively determined by:

k = ψ(C) = | log_2(C)/γ + b/γ |_odd

where |x|_odd denotes the odd number nearest to x, and γ and b are set to fixed constants. Through the mapping ψ, and by using a nonlinear mapping, high-dimensional channels have a longer range of interaction, while low-dimensional channels undergo a shorter range of interaction.
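In the ECA-Net paper the relation is C = φ(k) = 2^{(γk − b)}, giving k = ψ(C) = |log_2(C)/γ + b/γ|_odd with γ = 2, b = 1. A sketch of this adaptive kernel-size rule follows; the floor-then-bump rounding to an odd number mirrors the reference ECA implementation and is an assumption here:

```python
import math

def eca_kernel_size(C, gamma=2, b=1):
    """Adaptive 1D-convolution kernel size k = psi(C).

    k = |log2(C)/gamma + b/gamma|_odd (nearest odd number);
    gamma=2, b=1 are the constants from the ECA-Net paper.
    """
    t = abs(math.log2(C) / gamma + b / gamma)
    k = int(t)                     # floor ...
    return k if k % 2 else k + 1   # ... then bump to the next odd number
```

For example, channel dimensions 64, 256, and 512 map to kernel sizes 3, 5, and 5 respectively, so wider feature maps interact over a longer channel range.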
8. The method for recognizing handwritten Mongolian based on data enhancement and ECA-Net according to claim 1, wherein in the third step, a gated recurrent unit is combined with the aggregation cross-entropy loss function to realize vectorization of the feature map.
9. The method for recognizing handwritten Mongolian based on data enhancement and ECA-Net according to claim 8, wherein the encoding process of the gated recurrent unit is as follows:
Step 1: from the hidden state h^{<t-1>} transmitted by the previous node and the input x of the current node, obtain the two gating states of the reset gate and the update gate, and normalize the obtained information through a sigmoid function so that it serves as a gating signal;
Step 2: when performing the activation operation of the j-th hidden unit, the reset gate is computed as:

r_j = σ([W_r x]_j + [U_r h^{<t-1>}]_j)

where σ is the logistic sigmoid function, [·]_j denotes the j-th element of a vector, i.e. the j-th hidden unit, and W_r and U_r are the weight matrices learned in the reset gate;
Step 3: the update gate is computed as:

z_j = σ([W_z x]_j + [U_z h^{<t-1>}]_j)

where W_z and U_z are the weight matrices learned in the update gate;
Step 4: the hidden state of the j-th hidden unit at the current time slice t is computed as:

h_j^{<t>} = z_j h_j^{<t-1>} + (1 − z_j) h̃_j^{<t>}

where h̃_j^{<t>} is the candidate hidden state; the hidden state of the j-th hidden unit of the previous time slice is purposefully added to the current hidden state, and the candidate is computed as:

h̃_j^{<t>} = tanh([W x]_j + [U (r ⊙ h^{<t-1>})]_j)
where W represents the weight of the current input, and U represents the weight of the hidden state of the j-th hidden unit at the previous time slice;
the decoding process of the gated recurrent unit is as follows:
Step 1: the decoder takes the feature vector c, the output y^{<t-1>} of the previous time slice, and the previous hidden node h^{<t-1>} as input to obtain h^{<t>}, computed as:

h^{<t>} = f(h^{<t-1>}, y^{<t-1>}, c)
Step 2: the output y^{<t>} of the decoder at time t is determined by the conditional distribution conditioned on c, y^{<t-1>} and h^{<t>}, as follows:

P(y_t | y_{t-1}, y_{t-2}, …, y_1, c) = g(h^{<t>}, y_{t-1}, c)
where both f and g represent given activation functions.
10. The handwritten Mongolian recognition method based on data enhancement and ECA-Net according to claim 8, wherein the aggregation cross-entropy loss function comprises the following stages:
(1) aggregating the probabilities for each label category along a time dimension;
The number of characters of each category predicted by the network is regarded as a probability distribution ȳ_k, as follows:

ȳ_k = y_k / T

where y_k denotes the aggregated number of characters of the k-th class in the prediction result, and T is the total number of characters;
The number of characters of each category in the actual label is regarded as another probability distribution N̄_k, as follows:

N̄_k = N_k / T

where N_k denotes the number of characters of the k-th class in the actual label;
(2) combining the label annotation, normalizing the aggregated result into a probability distribution over all categories;

(3) comparing the two probability distributions using cross entropy;
The degree of similarity between the predicted distribution and the actual label distribution is expressed by the aggregation cross-entropy function, which is taken as the loss function of the handwritten Mongolian recognition model:

L = − Σ_k N̄_k ln ȳ_k
The output 2-dimensional prediction map has height H and width W, and the prediction output at row h and column w is denoted y_{hw}; the probability distribution of the number of characters of each class predicted by the network is:

ȳ_k = ( Σ_{h=1}^{H} Σ_{w=1}^{W} y_{hw}^k ) / (HW)

and the loss function is expressed as:

L = − Σ_k N̄_k ln ȳ_k,  with N̄_k = N_k / (HW)
CN202110306372.7A 2021-03-23 2021-03-23 Handwritten Mongolian recognition method based on data enhancement and ECA-Net Pending CN113065432A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110306372.7A CN113065432A (en) 2021-03-23 2021-03-23 Handwritten Mongolian recognition method based on data enhancement and ECA-Net


Publications (1)

Publication Number Publication Date
CN113065432A true CN113065432A (en) 2021-07-02

Family

ID=76562965

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110306372.7A Pending CN113065432A (en) 2021-03-23 2021-03-23 Handwritten Mongolian recognition method based on data enhancement and ECA-Net

Country Status (1)

Country Link
CN (1) CN113065432A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113469123A (en) * 2021-07-21 2021-10-01 内蒙古工业大学 Traditional Mongolian letter recognition method based on improved VGG-16 model
CN113887328A (en) * 2021-09-10 2022-01-04 天津理工大学 Method for extracting space-time characteristics of photonic crystal space transmission spectrum in parallel by ECA-CNN fusion dual-channel RNN

Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107330379A (en) * 2017-06-13 2017-11-07 内蒙古大学 A kind of Mongol hand-written recognition method and device
CN108447062A (en) * 2018-02-01 2018-08-24 浙江大学 A kind of dividing method of the unconventional cell of pathological section based on multiple dimensioned mixing parted pattern
WO2018194456A1 (en) * 2017-04-20 2018-10-25 Universiteit Van Amsterdam Optical music recognition omr : converting sheet music to a digital format
CN109325243A (en) * 2018-10-22 2019-02-12 内蒙古大学 Mongolian word cutting method and its word cutting system of the character level based on series model
CN109492232A (en) * 2018-10-22 2019-03-19 内蒙古工业大学 A kind of illiteracy Chinese machine translation method of the enhancing semantic feature information based on Transformer
CN110414498A (en) * 2019-06-14 2019-11-05 华南理工大学 A kind of natural scene text recognition method based on intersection attention mechanism
CN110443127A (en) * 2019-06-28 2019-11-12 天津大学 In conjunction with the musical score image recognition methods of residual error convolutional coding structure and Recognition with Recurrent Neural Network
CN110598221A (en) * 2019-08-29 2019-12-20 内蒙古工业大学 Method for improving translation quality of Mongolian Chinese by constructing Mongolian Chinese parallel corpus by using generated confrontation network
CN111104884A (en) * 2019-12-10 2020-05-05 电子科技大学 Chinese lip language identification method based on two-stage neural network model
US20200184278A1 (en) * 2014-03-18 2020-06-11 Z Advanced Computing, Inc. System and Method for Extremely Efficient Image and Pattern Recognition and Artificial Intelligence Platform
CN111476793A (en) * 2020-03-10 2020-07-31 西北大学 Dynamic enhanced magnetic resonance imaging processing method, system, storage medium and terminal
CN111612871A (en) * 2020-04-09 2020-09-01 北京旷视科技有限公司 Handwritten sample generation method and device, computer equipment and storage medium
CN111695527A (en) * 2020-06-15 2020-09-22 内蒙古大学 Mongolian online handwriting recognition method
CN111738169A (en) * 2020-06-24 2020-10-02 北方工业大学 Handwriting formula recognition method based on end-to-end network model
CN111783705A (en) * 2020-07-08 2020-10-16 厦门商集网络科技有限责任公司 Character recognition method and system based on attention mechanism
CN112101241A (en) * 2020-09-17 2020-12-18 西南科技大学 Lightweight expression recognition method based on deep learning
CN112215236A (en) * 2020-10-21 2021-01-12 科大讯飞股份有限公司 Text recognition method and device, electronic equipment and storage medium
CN112329760A (en) * 2020-11-17 2021-02-05 内蒙古工业大学 Method for recognizing and translating Mongolian in printed form from end to end based on space transformation network
CN112364668A (en) * 2020-11-10 2021-02-12 内蒙古工业大学 Mongolian Chinese machine translation method based on model independent element learning strategy and differentiable neural machine


Non-Patent Citations (12)

* Cited by examiner, † Cited by third party
Title
CANJIE LUO et al.: "Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition", arXiv:2003.06606v1 *
CHEN666CONG: "Aggregation cross-entropy loss function for sequence recognition problems (ACE loss)", https://blog.csdn.net/chen666cong/article/details/94392249 *
PATRICE Y. SIMARD et al.: "Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis", Proceedings of the Seventh International Conference on Document Analysis and Recognition *
QILONG WANG et al.: "ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks", arXiv:1910.03151v4 *
ZECHENG XIE et al.: "Aggregation Cross-Entropy for Sequence Recognition", arXiv:1904.08364v2 *
ZHUN ZHONG et al.: "Random Erasing Data Augmentation", arXiv:1708.04896v2 *
LIU Cong: "Research on Large-Vocabulary Offline Handwritten Mongolian Whole-Word Recognition", China Master's Theses Full-text Database, Information Science and Technology Series *
Dashixiong: "RNN Encoder-Decoder and GRU", https://zhuanlan.zhihu.com/p/42722623 *
ZHANG Zhen et al.: "Application of Cross-Lingual Multi-Task Learning Deep Neural Networks to Mongolian-Chinese Machine Translation", Computer Applications and Software *
JI Mingxuan: "A New Machine Translation Model Based on Improved Self-Attention", China Master's Theses Full-text Database, Information Science and Technology Series *
FAN Daoerji: "Research on Offline Handwritten Mongolian Recognition", China Doctoral Dissertations Full-text Database, Information Science and Technology Series *
GAO Xue et al.: "Similar Handwritten Chinese Character Recognition Based on CNN and Random Elastic Deformation", Journal of South China University of Technology (Natural Science Edition) *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210702