CN109977950A - A character recognition method based on a hybrid CNN-LSTM network

Info

Publication number
CN109977950A
Authority
CN
China
Prior art keywords
convolution
cnn
picture
text
block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910222217.XA
Other languages
Chinese (zh)
Inventor
袁三男
沈兆轩
刘虹
孙哲
刘志超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai University of Electric Power
University of Shanghai for Science and Technology
Original Assignee
Shanghai University of Electric Power
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai University of Electric Power
Priority to CN201910222217.XA
Publication of CN109977950A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/63 Scene text, e.g. street names
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Character Discrimination (AREA)

Abstract

The invention relates to a character recognition method based on a hybrid CNN-LSTM network, comprising the steps of: 1) obtaining pictures containing text, converting them to grayscale, and normalizing them; 2) dividing the pictures into training samples and test samples, and transcoding the text of each training picture, according to a dictionary, into a binary vector sequence that serves as its label; 3) feeding the processed training pictures and their labels into the hybrid CNN-LSTM network for training, yielding a recognition model; 4) feeding the processed images and labels into the recognition model, which outputs a feature matrix; 5) computing the CTC loss on the feature matrix using a gradient descent method to obtain the loss function result; 6) repeating steps 1) to 3) until the loss function result reaches its minimum, and taking the corresponding text label sequence as the prediction data; 7) transcoding the prediction data back into text according to the dictionary to obtain the text recognition result. Compared with the prior art, the invention simplifies computation and improves recognition performance.

Description

A character recognition method based on a hybrid CNN-LSTM network
Technical field
The invention relates to the fields of deep learning and character recognition, and more particularly to a character recognition method based on a hybrid CNN-LSTM network.
Background
Deep learning has developed rapidly in recent years and has achieved excellent results in fields such as speech recognition and text recognition. In the prior art, the neural networks commonly designed for text and speech recognition are usually shallow and cannot extract high-dimensional features well, which leads to poor recognition performance. Deep-learning-based network structures, in turn, usually require a very large amount of computing resources, which hinders development for mobile terminals.
Summary of the invention
The purpose of the invention is to overcome the above drawbacks of the prior art by providing a character recognition method based on a hybrid CNN-LSTM network.
The purpose of the invention can be achieved through the following technical solution:
A character recognition method based on a hybrid CNN-LSTM network, comprising the following steps:
S1: obtain pictures containing text, convert them to grayscale, and normalize them;
S2: divide the pictures into training samples and test samples, and transcode the text of each training picture, according to a dictionary, into a binary vector sequence that serves as its label;
S3: feed the training pictures processed in step S2 and their labels into the hybrid CNN-LSTM network for training; the recognition model is obtained once training is complete;
The hybrid CNN-LSTM network is a deep neural network comprising a convolutional neural network and a convolutional LSTM (Long Short-Term Memory) network. The convolutional neural network extracts high-dimensional features; the convolutional LSTM network applies further convolutions to those features to extract additional features and long-term information. The convolutional LSTM network is a neural network structure formed by a convolutional long short-term recurrent network and a bypass connection.
The hybrid CNN-LSTM network consists of the following blocks, arranged in order: convolution block, lightweight block, lightweight block, convolutional LSTM block, lightweight block, convolution block, convolutional LSTM block, convolution block, lightweight block, convolution block. Each lightweight block is a lightweight structure formed by connecting a wide convolution, a depthwise separable convolution, and a pointwise convolution.
S4: feed the images and labels processed in steps S1 and S2 into the recognition model of step S3, which outputs a feature matrix;
S5: compute the CTC loss (Connectionist Temporal Classification loss) on the feature matrix of step S4 using a gradient descent method to obtain the loss function result; preferably, the Adam gradient descent algorithm is used for the CTC loss calculation.
S6: repeat steps S1 to S3 until the loss function result of step S5 reaches its minimum, and take the text label sequence corresponding to the minimum loss as the prediction data; the prediction data is a binary (0/1) matrix in which the position of each 1 gives the position of a character in the dictionary.
S7: transcode the prediction data back into text according to the dictionary to obtain the text recognition result for the test samples. Specifically:
The final feature matrix is decoded with a CTC decoder; the position of each 1 in the feature matrix is the position of a character in the dictionary, and the specific text recognition result is output after looking up the dictionary.
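The label transcoding of S2 and the dictionary lookup of S7 can be illustrated with a minimal sketch. The toy dictionary, the reserved blank index, and the function names below are assumptions made for illustration; the patent does not specify them. The decoder shown is plain greedy CTC decoding (collapse repeats, drop blanks), which is one common choice and not necessarily the decoder used by the inventors.

```python
# Hypothetical toy dictionary; index 0 is reserved for the CTC blank (an assumption).
BLANK = 0
DICTIONARY = ["<blank>", "a", "b", "c", "d"]
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(DICTIONARY)}


def encode_label(text):
    """Transcode text into a sequence of one-hot (0/1) rows, one row per character."""
    rows = []
    for ch in text:
        row = [0] * len(DICTIONARY)
        row[CHAR_TO_INDEX[ch]] = 1
        rows.append(row)
    return rows


def ctc_greedy_decode(binary_matrix):
    """Decode a 0/1 matrix (one row per time step) back into text.

    The position of the 1 in each row is read as a dictionary index.
    Consecutive repeats are collapsed and blanks are dropped.
    """
    chars = []
    previous = None
    for row in binary_matrix:
        idx = row.index(1)
        if idx != previous and idx != BLANK:
            chars.append(DICTIONARY[idx])
        previous = idx
    return "".join(chars)
```

Under these assumptions, a repeated character such as the double "b" in "abba" survives decoding only when a blank step separates the two occurrences, which is why the blank entry is needed in the dictionary.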
Compared with the prior art, the invention has the following advantages:
1) the depthwise separable convolution used in the lightweight block reduces the number of parameters the computation requires and simplifies the calculation, so the lightweight network can be used for mobile-terminal development;
2) combining the convolutional LSTM structure with the lightweight block allows a deeper network to be built, which extracts high-dimensional features and improves recognition performance;
3) the invention uses the CNN and the convolutional LSTM network together; this design extracts data features effectively while also analyzing the long-range dependencies of the word or sentence to be recognized, making it more effective for long and difficult phrases and sentences.
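The parameter saving claimed in advantage 1) can be checked with simple arithmetic. The sketch below compares the weight count of a standard convolution with that of a depthwise separable convolution (depthwise plus pointwise). The 3x3 kernel size is an assumption, since the patent does not state kernel sizes; the 144 and 24 channel counts are taken from the second lightweight block of the embodiment. Bias terms are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (no bias)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise kernel x kernel convolution followed by a 1x1 pointwise convolution."""
    depthwise = kernel * kernel * c_in  # one kernel x kernel filter per input channel
    pointwise = c_in * c_out            # 1x1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    kernel, c_in, c_out = 3, 144, 24    # kernel size is assumed, channels from the embodiment
    print(standard_conv_params(kernel, c_in, c_out))
    print(depthwise_separable_params(kernel, c_in, c_out))
```

With these figures the standard convolution needs 31104 weights and the separable version 4752, roughly a 6.5-fold reduction for that single layer.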
Brief description of the drawings
Fig. 1 is a flow diagram of the convolutional LSTM block structure in the method of the invention;
Fig. 2 is a flow diagram of the lightweight block structure in the method of the invention;
Fig. 3 is a schematic diagram of the overall structure of the CNN-LSTM neural network in the method of the invention.
Detailed description of the embodiments
The invention is described in detail below with reference to the drawings and a specific embodiment. The described embodiment is only one of the embodiments of the invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the invention, without creative effort, fall within the scope of protection of the invention.
The invention relates to a character recognition method based on a hybrid CNN-LSTM network, implemented through the following specific steps:
Step 1: input pictures containing text, convert them to grayscale, and normalize their size to (100, 32, 1). Divide the pictures into training samples and test samples, and transcode the text of each training picture, according to the dictionary, into a binary vector sequence that serves as its label.
Step 2: feed the grayscale, normalized training pictures and their labels into the hybrid CNN-LSTM network. Each picture enters the first convolutional layer, which performs convolution and a nonlinear mapping, extracting preliminary features and reducing the image matrix to a three-dimensional matrix of size (50, 16, 32).
Step 3: feed the matrix from Step 2 into the first lightweight block. The wide-convolution coefficient here is 1, so the channel dimension is not expanded; only the per-channel (depthwise) convolution and the pointwise convolution are performed, extracting features and reducing the matrix size to (25, 8, 16).
Step 4: feed the matrix from Step 3 into the second lightweight block, which performs an expand-convolve-compress operation. The wide-convolution coefficient is 6, so the channel dimension is first expanded sixfold, convolution is then carried out spatially within each channel, and finally a pointwise convolution compresses the matrix. The matrix size changes as (25, 8, 24) → (25, 8, 144) → (25, 8, 24).
Step 5: feed the matrix from Step 4 into a convolutional LSTM block to extract time-dependent features; the matrix size is unchanged.
Step 6: feed the matrix from Step 5 into the third lightweight block, which performs an expand-convolve-compress operation with a wide-convolution coefficient of 6; the matrix size changes as (25, 8, 32) → (25, 8, 192) → (25, 8, 32).
Step 7: feed the matrix from Step 6 into a convolutional layer, which performs convolution and a nonlinear mapping while reducing the matrix size to (25, 4, 128).
Step 8: feed the matrix from Step 7 into a convolutional LSTM block to extract time-dependent features; the matrix size is unchanged.
Step 9: feed the matrix from Step 8 into a convolutional layer; the matrix size becomes (25, 2, 256).
Step 10: feed the matrix from Step 9 into a lightweight block; the matrix size becomes (25, 2, 192).
Step 11: feed the matrix from Step 10 into a convolution block; the matrix size becomes (25, 1, 512).
Step 12: compute the CTC loss on the matrix from Step 11 using a gradient descent method to obtain the loss function result. Preferably, the Adam gradient descent algorithm is used for the CTC loss calculation.
Step 13: repeat Steps 1 to 12 until the loss function result of Step 12 reaches its minimum, then end training.
Step 14: after training, recognition begins. The matrix obtained in Step 11 is fed into the CTC decoder and decoded to obtain the result. The features obtained from the Step 11 matrix form a binary (0/1) matrix; this matrix is decoded with the CTC decoder, the position of each 1 being the position of a character in the dictionary, and the specific text recognition result is output after looking up the dictionary.
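The shape bookkeeping of Steps 1 to 11 can be collected in one place with a short sketch. The stage outputs below are copied from the figures as stated in the steps, read as (width, height, channels) in line with the (w, h, n) notation of claim 1. The helper names are hypothetical, and treating the width axis as the time axis for CTC is an assumption, not something the patent states. No layer is actually built; the sketch only checks the block order, the sixfold channel expansion, and the final reshaping.

```python
# Stage outputs as (width, height, channels), copied from Steps 1 to 11.
STAGES = [
    ("input", "picture", (100, 32, 1)),
    ("step 2", "conv", (50, 16, 32)),
    ("step 3", "lightweight", (25, 8, 16)),
    ("step 4", "lightweight", (25, 8, 24)),
    ("step 5", "conv_lstm", (25, 8, 24)),
    ("step 6", "lightweight", (25, 8, 32)),
    ("step 7", "conv", (25, 4, 128)),
    ("step 8", "conv_lstm", (25, 4, 128)),
    ("step 9", "conv", (25, 2, 256)),
    ("step 10", "lightweight", (25, 2, 192)),
    ("step 11", "conv", (25, 1, 512)),
]


def block_order(stages):
    """Return the block kinds in network order, skipping the input picture."""
    return [kind for _, kind, _ in stages[1:]]


def expanded_channels(block_channels, coefficient):
    """Channel count after the wide convolution for a given coefficient."""
    return block_channels * coefficient


def to_sequence(shape):
    """Drop the height axis (must be 1) to get (time_steps, features).

    Reading the width axis as the time axis for CTC is an assumption.
    """
    width, height, channels = shape
    if height != 1:
        raise ValueError("height must be collapsed to 1 before CTC")
    return (width, channels)


if __name__ == "__main__":
    for name, kind, shape in STAGES:
        print(f"{name:8s} {kind:12s} {shape}")
    print("CTC input:", to_sequence(STAGES[-1][2]))
```

Read this way, the network hands the CTC stage 25 time steps of 512 features each, and the two convolutional LSTM stages leave the shape of their input unchanged, as Steps 5 and 8 state.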
The above is merely a specific embodiment of the invention, but the scope of protection of the invention is not limited to it. Any person skilled in the art can readily conceive of equivalent modifications or substitutions within the technical scope disclosed by the invention, and these modifications or substitutions shall fall within the scope of protection of the invention. The scope of protection of the invention shall therefore be determined by the scope of the claims.

Claims (8)

1. A character recognition method based on a hybrid CNN-LSTM network, characterized in that the method comprises the following steps:
1) obtain pictures (w, h, n) containing text, convert them to grayscale, and normalize them;
2) divide the pictures into training samples and test samples, and transcode the text of each training picture, according to a dictionary, into a binary vector sequence that serves as its label;
3) feed the training pictures processed in step 2) and their labels into the hybrid CNN-LSTM network for training; the recognition model is obtained once training is complete;
4) feed the images and labels processed in steps 1) and 2) into the recognition model of step 3), which outputs a feature matrix;
5) compute the CTC loss on the feature matrix of step 4) using a gradient descent method to obtain the loss function result;
6) repeat steps 1) to 3) until the loss function result of step 5) reaches its minimum, and take the text label sequence corresponding to the minimum loss as the prediction data;
7) transcode the prediction data back into text according to the dictionary to obtain the text recognition result for the test samples.
2. The character recognition method based on a hybrid CNN-LSTM network according to claim 1, characterized in that the hybrid CNN-LSTM network comprises a convolutional neural network for extracting high-dimensional features and a convolutional LSTM block that applies further convolutions to the extracted high-dimensional features to extract additional features and long-term information.
3. The character recognition method based on a hybrid CNN-LSTM network according to claim 2, characterized in that the convolutional LSTM block is a neural network structure formed by a convolutional long short-term recurrent network and a bypass connection.
4. The character recognition method based on a hybrid CNN-LSTM network according to claim 3, characterized in that the hybrid CNN-LSTM network is a deep neural network consisting of the following blocks arranged in order: convolution block, lightweight block, lightweight block, convolutional LSTM block, lightweight block, convolution block, convolutional LSTM block, convolution block, lightweight block, convolution block.
5. The character recognition method based on a hybrid CNN-LSTM network according to claim 4, characterized in that the lightweight block is a lightweight structure formed by connecting a wide convolution, a depthwise separable convolution, and a pointwise convolution.
6. The character recognition method based on a hybrid CNN-LSTM network according to claim 1, characterized in that the prediction data is a binary (0/1) matrix in which the position of each 1 is the position of a character in the dictionary.
7. The character recognition method based on a hybrid CNN-LSTM network according to claim 6, characterized in that step 7) specifically comprises:
decoding the final feature matrix with a CTC decoder, the position of each 1 in the feature matrix being the position of a character in the dictionary, and outputting the specific text recognition result after looking up the dictionary.
8. The character recognition method based on a CNN-LSTM neural network according to claim 1, characterized in that the Adam gradient descent algorithm is used for the CTC loss calculation to obtain the loss function result.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201910222217.XA | 2019-03-22 | 2019-03-22 | A character recognition method based on a hybrid CNN-LSTM network

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201910222217.XA | 2019-03-22 | 2019-03-22 | A character recognition method based on a hybrid CNN-LSTM network

Publications (1)

Publication Number | Publication Date
CN109977950A (en) | 2019-07-05

Family

ID=67080046

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201910222217.XA (Pending, CN109977950A (en)) | A character recognition method based on a hybrid CNN-LSTM network | 2019-03-22 | 2019-03-22

Country Status (1)

Country | Link
CN (1) | CN109977950A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107609549A (en) * 2017-09-20 2018-01-19 北京工业大学 The Method for text detection of certificate image under a kind of natural scene
CN108399419A (en) * 2018-01-25 2018-08-14 华南理工大学 Chinese text recognition methods in natural scene image based on two-dimentional Recursive Networks
CN108388896A (en) * 2018-02-09 2018-08-10 杭州雄迈集成电路技术有限公司 A kind of licence plate recognition method based on dynamic time sequence convolutional neural networks
CN108427953A (en) * 2018-02-26 2018-08-21 北京易达图灵科技有限公司 A kind of character recognition method and device
CN109447078A (en) * 2018-10-23 2019-03-08 四川大学 A kind of detection recognition method of natural scene image sensitivity text
CN109492679A (en) * 2018-10-24 2019-03-19 杭州电子科技大学 Based on attention mechanism and the character recognition method for being coupled chronological classification loss

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
江帆等: "基于CNN-GRNN模型的图像识别", 《计算机工程》 *
靳振伟: "基于CTPN的网店工商信息提取***的研究和实现", 《现代信息科技》 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674825A (en) * 2019-09-27 2020-01-10 安徽咪鼠科技有限公司 Character recognition method, device and system applied to intelligent voice mouse and storage medium
CN110674777A (en) * 2019-09-30 2020-01-10 电子科技大学 Optical character recognition method in patent text scene
CN111428718A (en) * 2020-03-30 2020-07-17 南京大学 Natural scene text recognition method based on image enhancement
CN113065352A (en) * 2020-06-29 2021-07-02 国网浙江省电力有限公司杭州供电公司 Operation content identification method for power grid dispatching work text
CN113065352B (en) * 2020-06-29 2022-07-19 国网浙江省电力有限公司杭州供电公司 Method for identifying operation content of power grid dispatching work text
CN112185543A (en) * 2020-09-04 2021-01-05 南京信息工程大学 Construction method of medical induction data flow classification model
CN112836702A (en) * 2021-01-04 2021-05-25 浙江大学 Text recognition method based on multi-scale feature extraction
CN113221871A (en) * 2021-05-31 2021-08-06 支付宝(杭州)信息技术有限公司 Character recognition method, device, equipment and medium
CN113221871B (en) * 2021-05-31 2024-02-02 支付宝(杭州)信息技术有限公司 Character recognition method, device, equipment and medium
CN114757969A (en) * 2022-04-08 2022-07-15 华南理工大学 Character and image writing track recovery method based on global tracking decoding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190705