CN109977950A - Character recognition method based on a hybrid CNN-LSTM network

Info

Publication number: CN109977950A
Application number: CN201910222217.XA
Authority: CN
Inventors: 袁三男; 沈兆轩; 刘虹; 孙哲; 刘志超
Original assignee: Shanghai University of Electric Power
Current assignee: Shanghai University of Electric Power; University of Shanghai for Science and Technology
Priority date: 2019-03-22
Filing date: 2019-03-22
Publication date: 2019-07-05

Abstract

The present invention relates to a character recognition method based on a hybrid CNN-LSTM network, comprising the steps of: 1) obtaining a picture containing text, converting the picture to grayscale, and normalizing it; 2) dividing the pictures into training samples and test samples, and transcoding the text corresponding to each training sample picture into a binary-coded sequence according to a dictionary, to serve as the label; 3) inputting the processed training sample pictures and labels into the hybrid CNN-LSTM network for training, to obtain a recognition model; 4) inputting the processed images and labels into the recognition model, which outputs a feature matrix; 5) computing CTC_loss on the feature matrix using a gradient descent method, to obtain a loss function result; 6) repeating steps 1) to 3) until the loss function result is minimal, and taking the corresponding text label sequence as the prediction data; 7) decoding the prediction data back into text according to the dictionary, to obtain the text recognition result. Compared with the prior art, the present invention has advantages such as simplified computation and improved recognition.

Description

Character recognition method based on a hybrid CNN-LSTM network

Technical field

The present invention relates to the fields of deep learning and character recognition, and more particularly to a character recognition method based on a hybrid CNN-LSTM network.

Background art

With the rapid development of deep learning in recent years, excellent results have been achieved in many fields such as speech recognition and text recognition. In the prior art, the neural network designs commonly used for text recognition and speech recognition usually have few layers and cannot extract high-dimensional features well, which leads to poor recognition. Meanwhile, the computation of deep-learning-based neural network structures usually requires enormous computing resources, which hinders development for mobile terminals.

Summary of the invention

The object of the present invention is to overcome the above drawbacks of the prior art by providing a character recognition method based on a hybrid CNN-LSTM network.

The object of the present invention can be achieved through the following technical solution:

A character recognition method based on a hybrid CNN-LSTM network, comprising the following steps:

S1: obtain a picture containing text, convert the picture to grayscale, and normalize it;

S2: divide the pictures into training samples and test samples, and transcode the text corresponding to each training sample picture into a binary-coded sequence according to a dictionary, to serve as the label;

S3: input the training sample pictures and labels processed in step S2 into the hybrid CNN-LSTM network for training; the recognition model is obtained once training is complete;

The hybrid CNN-LSTM network is a deep neural network comprising a convolutional neural network and a convolutional LSTM (Long Short-Term Memory) network. The convolutional neural network extracts high-dimensional features; the convolutional LSTM network applies further convolution to the extracted high-dimensional features to extract features and long-term information. The convolutional LSTM network is a neural network structure composed of a convolutional long short-term recurrent network and a bypass.

Specifically, the hybrid CNN-LSTM network comprises, in order: a convolution block, a light weight block, a light weight block, a convolutional long short-term block, a light weight block, a convolution block, a convolutional long short-term block, a convolution block, a light weight block, and a convolution block. The light weight block is a lightweight structure formed by connecting a wide convolution, a depthwise separable convolution, and a pointwise convolution.

S4: input the images and labels processed in steps S1 and S2 into the recognition model of step S3, which outputs a feature matrix;

S5: compute CTC_loss (Connectionist Temporal Classification loss) on the feature matrix from step S4 using a gradient descent method, to obtain the loss function result. Preferably, the CTC_loss computation uses the Adam gradient descent algorithm.

S6: repeat steps S1 to S3 until the loss function result in step S5 is minimal, and take the text label sequence corresponding to the minimal loss as the prediction data. The prediction data is a binary (0/1) matrix in which the position of each 1 is the position of a character in the dictionary.

S7: decode the prediction data back into text according to the dictionary, to obtain the text recognition result for the test samples. Specifically:

The finally obtained feature matrix is decoded with a CTC decoder. The position of each 1 in the feature matrix is the position of a character in the dictionary, and the specific text recognition result is output after looking up the dictionary.
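The dictionary transcoding of S2 and its inverse in S7 can be pictured with a minimal sketch. The four-character dictionary and the function names `encode` and `decode` are hypothetical; the patent does not list the dictionary contents or name these routines, and the one-row-per-character layout is an assumption drawn from the statement that each 1 marks a dictionary position.

```python
# Hypothetical toy dictionary; the patent does not list its contents.
DICTIONARY = ["a", "b", "c", "d"]
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(DICTIONARY)}


def encode(text):
    """Turn each character into a 0/1 row with a single 1 at its dictionary position."""
    rows = []
    for ch in text:
        row = [0] * len(DICTIONARY)
        row[CHAR_TO_INDEX[ch]] = 1
        rows.append(row)
    return rows


def decode(rows):
    """Find the position of the 1 in each row and read the character back."""
    return "".join(DICTIONARY[row.index(1)] for row in rows)


label = encode("cab")
print(label)          # [[0, 0, 1, 0], [1, 0, 0, 0], [0, 1, 0, 0]]
print(decode(label))  # cab
```

Decoding is a plain dictionary lookup here; the CTC-specific handling of repeated and blank positions is not modelled in this sketch.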

Compared with the prior art, the present invention has the following advantages:

1) The use of depthwise separable convolution in the lightweight structure block reduces the number of parameters required for computation and simplifies the computation; the lightweight network can be used for mobile terminal development;

2) The present invention combines the convolutional long short-term structure with the lightweight structure block, so that a deeper network can be built to extract high-dimensional features and improve recognition;

3) The present invention uses the CNN and the convolutional LSTM network structure together. This design extracts data features effectively while also analyzing the long-range dependencies of the word or sentence to be recognized, which makes it more effective at recognizing long or difficult phrases and sentences.
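The parameter saving claimed in advantage 1) can be checked with simple arithmetic. The sketch below compares weight counts for a standard convolution and a depthwise separable one. The 3 x 3 kernel and the channel counts are illustrative assumptions, since the patent does not state kernel sizes, and biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k standard convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights in a k x k depthwise convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Illustrative values only: 3 x 3 kernel, 144 input channels, 144 output channels.
k, c_in, c_out = 3, 144, 144
std = standard_conv_params(k, c_in, c_out)
sep = separable_conv_params(k, c_in, c_out)
print(std, sep, round(std / sep, 1))  # 186624 22032 8.5
```

Under these assumed sizes the separable form needs roughly one eighth of the weights, which is the kind of reduction that makes a deeper network affordable on a mobile terminal.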

Brief description of the drawings

Fig. 1 is a schematic flow diagram of the convolutional long short-term block structure in the method of the present invention;

Fig. 2 is a schematic flow diagram of the light weight block structure in the method of the present invention;

Fig. 3 is a schematic diagram of the overall structure of the CNN-LSTM neural network in the method of the present invention.

Detailed description of the embodiments

The present invention is described in detail below with reference to the drawings and a specific embodiment. Obviously, the described embodiment is only one of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention, without creative effort, shall fall within the scope of protection of the present invention.

The present invention relates to a character recognition method based on a hybrid CNN-LSTM network, whose implementation comprises the following specific steps:

Step 1: input a picture containing text, convert the picture to grayscale, and normalize its size to (100, 32, 1). Divide the pictures into training samples and test samples, and transcode the text corresponding to each training sample picture into a binary-coded sequence according to a dictionary, to serve as the label.
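The preprocessing in Step 1 might look like the following dependency-free sketch. Several details are assumptions rather than anything the patent specifies: the luma weights 0.299, 0.587, 0.114 for grayscale conversion, scaling pixel values to the range [0, 1], nearest-neighbour resizing, and reading (100, 32, 1) as (width, height, channels). The function names are hypothetical.

```python
def to_gray(rgb_image):
    """Convert rows of (r, g, b) pixels (0-255) to grayscale values in [0, 1]."""
    return [[(0.299 * r + 0.587 * g + 0.114 * b) / 255.0 for (r, g, b) in row]
            for row in rgb_image]


def resize_nearest(gray, width, height):
    """Nearest-neighbour resize of a 2-D list to `height` rows by `width` columns."""
    src_h, src_w = len(gray), len(gray[0])
    return [[gray[y * src_h // height][x * src_w // width] for x in range(width)]
            for y in range(height)]


def preprocess(rgb_image, width=100, height=32):
    """Grayscale, resize, and add a trailing channel axis: shape (width, height, 1)."""
    resized = resize_nearest(to_gray(rgb_image), width, height)
    # Index as [x][y][0] so that the layout matches the (100, 32, 1) size in Step 1.
    return [[[resized[y][x]] for y in range(height)] for x in range(width)]


# A hypothetical 200 x 64 all-white input picture.
picture = [[(255, 255, 255)] * 200 for _ in range(64)]
sample = preprocess(picture)
print(len(sample), len(sample[0]), len(sample[0][0]))  # 100 32 1
```

Putting the width first keeps the axis that is later reduced to 25 positions as the leading dimension, which matches how the sizes evolve in the following steps.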

Step 2: input the grayscale, normalized training sample pictures and their labels into the hybrid CNN-LSTM network. The picture enters the first convolutional layer for convolution and nonlinear mapping, which performs preliminary feature extraction and reduces the image matrix to a three-dimensional matrix of size (50, 16, 32).

Step 3: feed the matrix from Step 2 into the first light weight block. The wide convolution coefficient here is 1, so the channel dimension is not raised; only a per-channel convolution and a pointwise convolution are performed, to extract features and reduce the matrix size to (25, 8, 16).

Step 4: feed the matrix from Step 3 into the second light weight block, which performs an expand-convolve-compress operation. The wide convolution coefficient is 6, so the channel dimension of the matrix is first raised by a factor of 6; convolution is then performed spatially within each channel; finally, a pointwise convolution compresses the matrix. The matrix size changes as (25, 8, 24) → (25, 8, 144) → (25, 8, 24).

Step 5: feed the matrix from Step 4 into the convolutional long short-term block to extract temporal dependency features; the matrix size is unchanged.

Step 6: feed the matrix from Step 5 into the third light weight block, which performs an expand-convolve-compress operation with a wide convolution coefficient of 6. The matrix size changes as (25, 8, 32) → (25, 8, 192) → (25, 8, 32).

Step 7: feed the matrix from Step 6 into a convolutional layer for convolution and nonlinear mapping, which also reduces the matrix size to (25, 4, 128).

Step 8: feed the matrix from Step 7 into the convolutional long short-term block to extract temporal dependency features; the matrix size is unchanged.

Step 9: feed the matrix from Step 8 into a convolutional layer; the matrix size becomes (25, 2, 256).

Step 10: feed the matrix from Step 9 into a light weight block; the matrix size becomes (25, 2, 192).

Step 11: feed the matrix from Step 10 into a convolution block; the matrix size becomes (25, 1, 512).

Step 12: compute CTC_loss on the matrix from Step 11 using a gradient descent method, to obtain the loss function result. Preferably, the CTC_loss computation uses the Adam gradient descent algorithm.

Step 13: repeat Step 1 to Step 12 until the loss function result in Step 12 is minimal, then end training.
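Steps 12 and 13 amount to iterating a gradient-based optimizer until the loss stops decreasing. As a rough illustration of the Adam update itself, the sketch below minimises a toy one-dimensional quadratic. The quadratic is only a stand-in for CTC_loss, and the learning rate, decay constants, and step count are common defaults chosen for illustration, not values taken from the patent.

```python
import math


def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimise a scalar function with Adam, given its gradient `grad`."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g          # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
        m_hat = m / (1 - beta1 ** t)             # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy stand-in for the loss: f(x) = (x - 3)^2, whose gradient is 2 (x - 3).
x_min = adam_minimize(lambda x: 2 * (x - 3), x0=0.0)
print(abs(x_min - 3) < 0.1)
```

In the actual method the parameter would be the full set of network weights and the gradient would come from backpropagating CTC_loss; the update rule per weight has the same form.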

Step 14: once training is finished, recognition begins. The matrix obtained in Step 11 is fed into the CTC decoder and decoded to obtain the result. The features obtained from the matrix of Step 11 form a binary (0/1) matrix; after decoding with the CTC decoder, the position of each 1 in the matrix is the position of a character in the dictionary, and the specific text recognition result can be output after looking up the dictionary.
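The patent does not say which CTC decoding strategy is used. One common choice is greedy (best-path) decoding, sketched below under stated assumptions: index 0 of the dictionary is reserved for the CTC blank, and the dictionary and the six-step score matrix are hypothetical examples.

```python
def ctc_greedy_decode(scores, dictionary, blank=0):
    """Best-path CTC decoding: arg-max per time step, merge repeats, drop blanks."""
    best_path = [max(range(len(step)), key=step.__getitem__) for step in scores]
    chars = []
    previous = None
    for index in best_path:
        if index != previous and index != blank:
            chars.append(dictionary[index])
        previous = index
    return "".join(chars)


# Hypothetical dictionary; index 0 is reserved for the CTC blank (an assumption).
DICTIONARY = ["<blank>", "a", "b", "c"]

# Six time steps of 0/1 scores, one row per step.
scores = [
    [0, 0, 0, 1],  # c
    [0, 0, 0, 1],  # c (repeat, merged)
    [1, 0, 0, 0],  # blank
    [0, 1, 0, 0],  # a
    [1, 0, 0, 0],  # blank
    [0, 0, 1, 0],  # b
]
print(ctc_greedy_decode(scores, DICTIONARY))  # cab
```

With the (25, 1, 512) output of Step 11, there would be 25 such time steps; a blank between two identical indices is what allows a doubled character to survive the merge.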

The above is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited to it. Any person skilled in the art can readily conceive of various equivalent modifications or substitutions within the technical scope disclosed by the present invention, and these modifications or substitutions shall be covered by the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of protection of the claims.

Claims

1. A character recognition method based on a hybrid CNN-LSTM network, characterized in that the method comprises the following steps:

1) obtaining a picture (w, h, n) containing text, converting the picture to grayscale, and normalizing it;

2) dividing the pictures into training samples and test samples, and transcoding the text corresponding to each training sample picture into a binary-coded sequence according to a dictionary, to serve as the label;

3) inputting the training sample pictures and labels processed in step 2) into the hybrid CNN-LSTM network for training, the recognition model being obtained once training is complete;

4) inputting the images and labels processed in steps 1) and 2) into the recognition model of step 3), which outputs a feature matrix;

5) computing CTC_loss on the feature matrix from step 4) using a gradient descent method, to obtain a loss function result;

6) repeating steps 1) to 3) until the loss function result in step 5) is minimal, and taking the text label sequence corresponding to the minimal loss as the prediction data;

7) decoding the prediction data back into text according to the dictionary, to obtain the text recognition result for the test samples.

2. The character recognition method based on a hybrid CNN-LSTM network according to claim 1, characterized in that the hybrid CNN-LSTM network comprises a convolutional neural network for extracting high-dimensional features, and a convolutional LSTM block for applying further convolution to the extracted high-dimensional features to extract features and long-term information.

3. The character recognition method based on a hybrid CNN-LSTM network according to claim 2, characterized in that the convolutional LSTM block is a neural network structure composed of a convolutional long short-term recurrent network and a bypass.

4. The character recognition method based on a hybrid CNN-LSTM network according to claim 3, characterized in that the hybrid CNN-LSTM network is a deep neural network in which the following are arranged in order: a convolution block, a light weight block, a light weight block, a convolutional long short-term block, a light weight block, a convolution block, a convolutional long short-term block, a convolution block, a light weight block, and a convolution block.

5. The character recognition method based on a hybrid CNN-LSTM network according to claim 4, characterized in that the light weight block is a lightweight structure formed by connecting a wide convolution, a depthwise separable convolution, and a pointwise convolution.

6. The character recognition method based on a hybrid CNN-LSTM network according to claim 1, characterized in that the prediction data is a binary (0/1) matrix in which the position of each 1 is the position of a character in the dictionary.

7. The character recognition method based on a hybrid CNN-LSTM network according to claim 6, characterized in that step 7) specifically comprises:

decoding the finally obtained feature matrix with a CTC decoder, the position of each 1 in the feature matrix being the position of a character in the dictionary, and outputting the specific text recognition result after looking up the dictionary.

8. The character recognition method based on a CNN-LSTM neural network according to claim 1, characterized in that the CTC_loss computation uses the Adam gradient descent algorithm to obtain the loss function result.