CN111507348A - Character segmentation and identification method based on CTC deep neural network - Google Patents
- Publication number
- CN111507348A CN111507348A CN202010294624.4A CN202010294624A CN111507348A CN 111507348 A CN111507348 A CN 111507348A CN 202010294624 A CN202010294624 A CN 202010294624A CN 111507348 A CN111507348 A CN 111507348A
- Authority
- CN
- China
- Prior art keywords
- segmentation
- ctc
- recognition
- neural network
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
- Character Discrimination (AREA)
Abstract
The invention provides a character segmentation and recognition method based on a CTC deep neural network, which comprises the following steps: a1. extracting features from an input image by using a CNN; a2. carrying out CELL segmentation on the features extracted in a1, where the height and width of each CELL are fixed and the number of CELLs is determined by the image length; a3. directly carrying out segmentation classification on each CELL with determined features and outputting segmentation signals; a4. calculating the loss between the real segmentation signals and the segmentation signals output by the model using CTC LOSS, feeding back the loss and training the whole model; a5. segmenting the text using the segmentation signals output by a3 and carrying out CNN + softmax classification recognition on the single characters. The real segmentation signals are mapped from the annotated text, and the CTC LOSS automatically solves the text alignment problem.
Description
Technical Field
The invention relates to the technical field of character segmentation and recognition, in particular to a character segmentation and recognition method based on a CTC deep neural network.
Background
OCR (Optical Character Recognition) is an image processing technology for detecting, recognizing and structuring characters in images. Current OCR technology is divided into three modules: detection, recognition and structuring. Detection and recognition follow two frameworks: 1. the single-character detection and single-character recognition framework, in which the core task of the detection module is to detect each independent character region in the image, the recognition module is responsible for recognizing the characters in each cropped character-region image, and the basic framework of the existing recognition model is CNN + softmax; 2. the text-line detection and whole-line recognition framework, in which the core task of the detection module is to detect the text regions in the image, the recognition module is responsible for recognizing the text in the cropped text-region images, and the basic framework of the existing recognition model is CNN + LSTM + CTC.
The text-line detection method mainly comprises the following steps: I. extracting features from the picture; II. enumerating a large number of candidate rectangles that attempt to regress the corresponding objects; III. classifying the enumerated rectangles into two types, positive samples with large intersection with the targets and other negative samples; IV. cropping the positive samples from the feature map and then regressing the target boundaries according to the feature map. Text-line recognition uses a deep recurrent network for character-string recognition, combining CNN and RNN: image features are extracted by the CNN, the feature map is sliced transversely, a typical RNN structure, the LSTM recurrent network, then performs text inference, and finally a CTC loss function computes the difference between the predicted characters and the labels to complete end-to-end training.
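The transverse slicing step in the CNN + LSTM + CTC framework described above can be sketched as follows; the feature-map shape and dimensions below are illustrative assumptions, not values from the patent:

```python
import numpy as np

# Illustrative CNN output: (channels, height, width) = (64, 8, 25).
features = np.zeros((64, 8, 25))

# Transverse slicing: each width position becomes one timestep vector
# for the recurrent network (here 64 * 8 = 512 features per step).
sequence = [features[:, :, t].reshape(-1) for t in range(features.shape[2])]
print(len(sequence), sequence[0].shape)  # 25 timesteps of shape (512,)
```

Each of the 25 column slices then feeds one timestep of the LSTM, whose per-timestep outputs are scored with the CTC loss.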
The text information in the image content is structured based on templates and rule logic. The existing frameworks each have certain disadvantages. Under the 1st framework, detection requires bounding-box annotation of every character position, so the annotation cost is extremely high, and the structuring difficulty also rises greatly; for this reason, the usual target of a general detection task is to detect text lines rather than individual characters. Under the 2nd framework, the recognition module consumes a large amount of time. Aiming at the problems of the existing frameworks, a character segmentation and recognition method that optimizes the recognition framework and reduces the time consumption of the recognition module is urgently needed.
An effective solution to the problems in the related art has not been proposed yet.
Disclosure of Invention
The invention aims to provide a character segmentation and identification method based on a CTC deep neural network, so as to solve the problems in the background technology.
In order to achieve the purpose, the invention provides the following technical scheme:
a character segmentation and identification method based on a CTC deep neural network comprises the following steps:
a1. extracting features of the input image by using CNN;
a2. performing CELL segmentation on the features extracted in step a1, wherein the height and width of each CELL are fixed, and the number of CELLs is determined by the length of the image;
a3. directly carrying out segmentation classification on each CELL with determined features and outputting segmentation signals;
a4. calculating the loss between the real segmentation signals and the segmentation signals output by the model by using the CTC LOSS calculation formula, feeding back the loss and training the whole model;
a5. segmenting the text by using the segmentation signals output in step a3, and performing CNN + softmax classification recognition on the single characters.
Further, the real segmentation signal is mapped from the annotation text.
Further, the CTC LOSS can automatically solve the text alignment problem.
Further, the CTC LOSS calculation formula is as follows:

p(l|x) = Σ_{π ∈ B⁻¹(l)} p(π|x)    (formula 1)

where x is the feature sequence produced after the image is extracted using the CNN, l is the true signal, π represents a single correct alignment scheme, and B⁻¹(l) is the set of all alignment schemes that map onto l.
Further, the single correct alignment scheme is one of the set of all correct alignment schemes, and each single alignment scheme occurs in that set with a certain probability.
Further, the probability of the single alignment scheme is calculated as follows:

p(π|x) = ∏_{t=1}^{T} y^t_{π_t}    (formula 2)

where T is the number of timesteps (one per CELL) and y^t_{π_t} is the probability the model outputs at timestep t for the symbol π_t.
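A single alignment's probability is just the product over timesteps of the probability the model assigns to the aligned symbol. A minimal sketch, where the softmax outputs are made-up illustrative numbers:

```python
import numpy as np

def alignment_probability(y, pi):
    """p(pi|x): product over timesteps t of y[t, pi[t]], where y has
    shape (T, K) - a softmax distribution over K symbols per timestep."""
    return float(np.prod([y[t, k] for t, k in enumerate(pi)]))

# Toy per-timestep softmax outputs for T = 3 timesteps, K = 2 symbols.
y = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.7, 0.3]])
print(alignment_probability(y, [0, 1, 0]))  # 0.9 * 0.8 * 0.7 ≈ 0.504
```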
Compared with the prior art, the invention has the following beneficial effects: 1. the method greatly improves the speed of OCR recognition, and after the text is cut into single characters, recognition optimization can be targeted, so the final precision is improved; 2. the method improves the recognition framework by separating the recognition process into the two steps of character segmentation and single-character recognition, so each step can be optimized separately and pertinently; 3. the method has a unique concept, a novel idea and good operability.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. It is apparent that the drawings in the following description show only some embodiments of the present invention, and other drawings can be obtained from them by those skilled in the art without creative effort.
FIG. 1 is a process diagram of the CTC deep neural network-based text segmentation and recognition method of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. It is apparent that the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
The invention is further described with reference to the following drawings and detailed description:
as shown in fig. 1, the method for character segmentation and recognition based on CTC deep neural network includes the following steps:
a1. extracting features of the input image by using CNN;
a2. performing CELL segmentation on the features extracted in step a1, wherein the height and width of each CELL are fixed, and the number of CELLs is determined by the length of the image;
a3. directly carrying out segmentation classification on each CELL with determined features and outputting segmentation signals;
a4. calculating the loss between the real segmentation signals and the segmentation signals output by the model by using the CTC LOSS calculation formula, feeding back the loss and training the whole model;
a5. segmenting the text by using the segmentation signals output in step a3, and performing CNN + softmax classification recognition on the single characters.
The method fundamentally and greatly improves the speed of OCR recognition; after the text is cut into single characters, recognition optimization can be targeted, so the final precision is improved. At the same time, the recognition framework is improved by separating the recognition process into the two steps of character segmentation and single-character recognition, so that each step can be optimized separately and pertinently.
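The steps a1–a5 above can be sketched end to end. Everything here is a hypothetical stand-in: random "features" replace a trained CNN, and a mean-threshold cell classifier replaces a trained segmentation head — the sketch only shows how the pieces connect:

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_features(image):
    # a1. Stand-in for CNN feature extraction: (height, width, channels).
    h, w = image.shape[:2]
    return rng.standard_normal((h, w, 8))

def split_cells(features, cell_width=4):
    # a2. CELL segmentation: fixed cell size; count follows image length.
    n = features.shape[1] // cell_width
    return [features[:, i * cell_width:(i + 1) * cell_width] for i in range(n)]

def classify_cells(cells):
    # a3. Per-cell boundary classifier (placeholder rule standing in for
    # a trained layer): '1' = character boundary, '0' = interior.
    return "".join("1" if c.mean() > 0 else "0" for c in cells)

def split_characters(signal):
    # a5. Each run of '0' cells between two '1' cells is one character
    # region, to be recognized separately with CNN + softmax.
    spans, start = [], None
    for i, s in enumerate(signal):
        if s == "0" and start is None:
            start = i
        elif s == "1" and start is not None:
            spans.append((start, i))
            start = None
    return spans

image = rng.standard_normal((32, 40))
signal = classify_cells(split_cells(extract_features(image)))
print(signal, split_characters(signal))
```

Step a4 (the CTC LOSS) is what would train the cell classifier so that its output signal matches the real segmentation signal.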
According to the above, the real segmentation signal is mapped from the annotation text.
In accordance with the above, the CTC LOSS can automatically solve the text alignment problem.
According to the above, the CTC LOSS calculation formula is as follows:

p(l|x) = Σ_{π ∈ B⁻¹(l)} p(π|x)    (formula 1)

where x is the feature sequence produced after the image is extracted using the CNN, l is the true signal, and π represents a single correct alignment scheme.
According to the above, the single correct alignment scheme is one of the set of all correct alignment schemes, and each single alignment scheme occurs in that set with a certain probability.
According to the above, the probability of the single alignment scheme is calculated as follows: p(π|x) = ∏_{t=1}^{T} y^t_{π_t} (formula 2), where y^t_{π_t} is the probability the model outputs at timestep t for the symbol π_t.
It can be verified that the real segmentation signals are mapped from the annotated text: inputting annotated texts of one, two and four characters yields the real segmentation signals "101", "10101" and "101010101" respectively, where a "1" marks a character boundary and a "0" a character interior. The core role of CTC is to automatically solve the alignment problem, so the difference between the segmentation signal output by the model and the real signal mapped from the text length can be calculated.
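The text-to-signal mapping described above can be written directly; the rule (one "1" per boundary alternating with one "0" per character, inferred from the quoted examples) is an assumption about the original labeling scheme:

```python
def text_to_signal(text):
    """Map an annotated text of n characters to its real segmentation
    signal: n + 1 boundary '1's alternating with n interior '0's."""
    return "1" + "01" * len(text)

# One-, two- and four-character texts reproduce the signals quoted
# in the description: "101", "10101", "101010101".
for t in ("a", "ab", "abcd"):
    print(t, "->", text_to_signal(t))
```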
The word "state" is taken as an example to illustrate the associated definitions and computational logic:
by using the above method for character segmentation and recognition based on the CTC deep neural network, the CTC L OSS aims to maximize the probability value of formula 1, where x in formula 1 is a feature generated after an image is extracted by using CNN, L is a real signal, and pi represents a single correct alignment scheme, where all in formula 3 are correct alignment schemes.
The probability of a single correct alignment scheme is calculated using formula 2 above.
The loss between the real segmentation signal and the segmentation signal output by the model is calculated through formula 1, formula 2 and formula 3; the loss is fed back and the whole model is trained.
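Formulas 1–3 can be combined in a brute-force sketch: enumerate every alignment π, keep those the CTC collapse B maps to the true signal l (formula 3), take each one's product of per-timestep probabilities (formula 2), and sum them (formula 1). The outputs below are made-up toy numbers, and real implementations use dynamic programming rather than enumeration:

```python
import itertools
import math

def collapse(pi, blank=0):
    """CTC mapping B: merge repeated symbols, then remove blanks."""
    out, prev = [], None
    for s in pi:
        if s != prev and s != blank:
            out.append(s)
        prev = s
    return tuple(out)

def ctc_loss(y, label, blank=0):
    """-log p(l|x), with p(l|x) summed over every alignment pi such
    that collapse(pi) == label (exponential in T; illustration only)."""
    T, K = len(y), len(y[0])
    total = 0.0
    for pi in itertools.product(range(K), repeat=T):
        if collapse(pi, blank) == tuple(label):
            p = 1.0
            for t, k in enumerate(pi):
                p *= y[t][k]
            total += p
    return -math.log(total)

# Toy outputs: T = 3 timesteps over K = 2 symbols (blank and one label).
y = [[0.6, 0.4], [0.5, 0.5], [0.6, 0.4]]
print(ctc_loss(y, [1]))  # -log(0.74) ≈ 0.301
```

Minimizing this loss over the training set is what "feeding back the loss and training the whole model" amounts to in step a4.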
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that various changes, modifications and substitutions can be made without departing from the spirit and scope of the invention as defined by the appended claims. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (6)
1. The character segmentation and identification method based on the CTC deep neural network is characterized by comprising the following steps of:
a1. extracting features of the input image by using CNN;
a2. performing CELL segmentation on the features extracted in step (a1), wherein the height and width of each CELL are fixed, and the number of CELLs is determined by the length of the image;
a3. directly carrying out segmentation classification on each CELL with determined features and outputting segmentation signals;
a4. calculating the loss between the real segmentation signals and the segmentation signals output by the model by using the CTC LOSS calculation formula, feeding back the loss and training the whole model;
a5. segmenting the text by using the segmentation signals output in step (a3), and performing CNN + softmax classification recognition on the single characters.
2. The CTC deep neural network-based character segmentation and recognition method according to claim 1, wherein the real segmentation signal is mapped from the annotated text.
3. The CTC deep neural network-based character segmentation and recognition method of claim 1, wherein the CTC LOSS can automatically solve the text alignment problem.
5. The CTC deep neural network-based character segmentation and recognition method of claim 4, wherein the single correct alignment scheme is one of the set of all correct alignment schemes, each occurring in that set with a certain probability.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010294624.4A CN111507348A (en) | 2020-04-15 | 2020-04-15 | Character segmentation and identification method based on CTC deep neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111507348A true CN111507348A (en) | 2020-08-07 |
Family
ID=71870990
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010294624.4A Pending CN111507348A (en) | 2020-04-15 | 2020-04-15 | Character segmentation and identification method based on CTC deep neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111507348A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112381175A (en) * | 2020-12-05 | 2021-02-19 | 中国人民解放军32181部队 | Circuit board identification and analysis method based on image processing |
CN113537201A (en) * | 2021-09-16 | 2021-10-22 | 江西风向标教育科技有限公司 | Multi-dimensional hybrid OCR recognition method, device, equipment and storage medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3200123A1 (en) * | 2016-01-28 | 2017-08-02 | Siemens Aktiengesellschaft | Text recognition |
CN108960245A (en) * | 2018-07-13 | 2018-12-07 | 广东工业大学 | The detection of tire-mold character and recognition methods, device, equipment and storage medium |
CN109241894A (en) * | 2018-08-28 | 2019-01-18 | 南京安链数据科技有限公司 | A kind of specific aim ticket contents identifying system and method based on form locating and deep learning |
CN109993160A (en) * | 2019-02-18 | 2019-07-09 | 北京联合大学 | A kind of image flame detection and text and location recognition method and system |
US10388272B1 (en) * | 2018-12-04 | 2019-08-20 | Sorenson Ip Holdings, Llc | Training speech recognition systems using word sequences |
CN110175603A (en) * | 2019-04-01 | 2019-08-27 | 佛山缔乐视觉科技有限公司 | A kind of engraving character recognition methods, system and storage medium |
CN110766017A (en) * | 2019-10-22 | 2020-02-07 | 国网新疆电力有限公司信息通信公司 | Mobile terminal character recognition method and system based on deep learning |
CN110866530A (en) * | 2019-11-13 | 2020-03-06 | 云南大学 | Character image recognition method and device and electronic equipment |
Non-Patent Citations (1)
Title |
---|
张艺玮; 赵一嘉; 王馨悦; 董兰芳: "Chinese Recognition Combining Dense Neural Networks and Long Short-Term Memory Models" * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022147965A1 (en) | Arithmetic question marking system based on mixnet-yolov3 and convolutional recurrent neural network (crnn) | |
Zuo et al. | Natural scene text recognition based on encoder-decoder framework | |
CN113158808B (en) | Method, medium and equipment for Chinese ancient book character recognition, paragraph grouping and layout reconstruction | |
Rehman et al. | Performance analysis of character segmentation approach for cursive script recognition on benchmark database | |
CN104778470B (en) | Text detection based on component tree and Hough forest and recognition methods | |
CN112818951A (en) | Ticket identification method | |
EP3349124A1 (en) | Method and system for generating parsed document from digital document | |
CN113537227B (en) | Structured text recognition method and system | |
CN110413787B (en) | Text clustering method, device, terminal and storage medium | |
CN111507348A (en) | Character segmentation and identification method based on CTC deep neural network | |
CN111539417B (en) | Text recognition training optimization method based on deep neural network | |
CN109086772A (en) | A kind of recognition methods and system distorting adhesion character picture validation code | |
CN114187595A (en) | Document layout recognition method and system based on fusion of visual features and semantic features | |
CN117437647B (en) | Oracle character detection method based on deep learning and computer vision | |
CN114581932A (en) | Picture table line extraction model construction method and picture table extraction method | |
CN111832497B (en) | Text detection post-processing method based on geometric features | |
Karanje et al. | Survey on text detection, segmentation and recognition from a natural scene images | |
CN109284678A (en) | Guideboard method for recognizing semantics and system | |
US20230315799A1 (en) | Method and system for extracting information from input document comprising multi-format information | |
CN111581478A (en) | Cross-website general news acquisition method for specific subject | |
CN113761209B (en) | Text splicing method and device, electronic equipment and storage medium | |
CN112800259B (en) | Image generation method and system based on edge closure and commonality detection | |
CN114529894A (en) | Rapid scene text detection method fusing hole convolution | |
Fan et al. | BURSTS: A bottom-up approach for robust spotting of texts in scenes | |
Mosannafat et al. | Farsi text detection and localization in videos and images |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||