CN111507348A - Character segmentation and identification method based on CTC deep neural network - Google Patents
- Publication number
- CN111507348A CN111507348A CN202010294624.4A CN202010294624A CN111507348A CN 111507348 A CN111507348 A CN 111507348A CN 202010294624 A CN202010294624 A CN 202010294624A CN 111507348 A CN111507348 A CN 111507348A
- Authority
- CN
- China
- Prior art keywords
- segmentation
- ctc
- recognition
- neural network
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
- Character Discrimination (AREA)
Abstract
The invention provides a character segmentation and recognition method based on a CTC deep neural network, which comprises the following steps: a1. extracting features from an input image by using a CNN; a2. carrying out CELL segmentation on the features extracted in a1, where the height and width of each CELL are fixed and the number of CELLs is determined by the image length; a3. directly carrying out segmentation classification on each CELL with determined features and outputting segmentation signals; a4. calculating the loss between the real segmentation signals and the segmentation signals output by the model using CTC LOSS, feeding back the loss and training the whole model; a5. segmenting the text using the segmentation signals output by a3 and carrying out CNN + softmax classification recognition on the single characters. The real segmentation signals are mapped from the annotated text, and the CTC LOSS automatically solves the text alignment problem.
Description
Technical Field
The invention relates to the technical field of character segmentation and recognition, in particular to a character segmentation and recognition method based on a CTC deep neural network.
Background
OCR (Optical Character Recognition) is an image processing technology for detecting, recognizing and structuring characters in images. Current OCR technology is divided into three modules: detection, recognition and structuring. Detection and recognition follow two frameworks: 1. the single-character detection and single-character recognition framework, in which the core task of the detection module is to detect each independent character region in the image, the recognition module is responsible for recognizing the characters in each cropped character-region image, and the basic framework of the existing recognition model is CNN + softmax; 2. the text-line detection and whole-line recognition framework, in which the core task of the detection module is to detect the text regions in the image, the recognition module is responsible for recognizing the text in the cropped text-region images, and the basic framework of the existing recognition model is CNN + LSTM + CTC.
The text-line detection method mainly comprises the following steps: I. extracting features from the picture; II. enumerating a large number of candidate rectangles that attempt to regress the corresponding objects; III. classifying the enumerated rectangles into two types, positive samples with large intersection with the targets and other negative samples; IV. cropping the positive samples from the feature map and then regressing the target boundaries according to the feature map. Text-line recognition uses a deep recurrent network for character-string recognition, combining CNN and RNN: image features are extracted by the CNN, the feature map is sliced transversely, a typical RNN structure, the LSTM recurrent network, then performs text inference, and finally a CTC loss function computes the difference between the predicted characters and the labels to complete end-to-end training.
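The transverse slicing step in the CNN + LSTM + CTC framework described above can be sketched as follows; the feature-map shape and dimensions below are illustrative assumptions, not values from the patent:

```python
import numpy as np

# Illustrative CNN output: (channels, height, width) = (64, 8, 25).
features = np.zeros((64, 8, 25))

# Transverse slicing: each width position becomes one timestep vector
# for the recurrent network (here 64 * 8 = 512 features per step).
sequence = [features[:, :, t].reshape(-1) for t in range(features.shape[2])]
print(len(sequence), sequence[0].shape)  # 25 timesteps of shape (512,)
```

Each of the 25 column slices then feeds one timestep of the LSTM, whose per-timestep outputs are scored with the CTC loss.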
The text information in the image content is structured based on templates and rule logic. The existing frameworks each have certain disadvantages. Under the 1st framework, detection requires bounding-box annotation of every character position, so the annotation cost is extremely high, and the structuring difficulty also rises greatly; for this reason, the usual target of a general detection task is to detect text lines rather than individual characters. Under the 2nd framework, the recognition module consumes a large amount of time. Aiming at the problems of the existing frameworks, a character segmentation and recognition method that optimizes the recognition framework and reduces the time consumption of the recognition module is urgently needed.
An effective solution to the problems in the related art has not been proposed yet.
Disclosure of Invention
The invention aims to provide a character segmentation and identification method based on a CTC deep neural network, so as to solve the problems in the background technology.
In order to achieve the purpose, the invention provides the following technical scheme:
a character segmentation and identification method based on a CTC deep neural network comprises the following steps:
a1. extracting features of the input image by using CNN;
a2. performing CELL segmentation on the features extracted in step a1, wherein the height and width of each CELL are fixed, and the number of CELLs is determined by the length of the image;
a3. directly carrying out segmentation classification on each CELL with determined features and outputting segmentation signals;
a4. calculating the loss between the real segmentation signals and the segmentation signals output by the model by using the CTC LOSS calculation formula, feeding back the loss and training the whole model;
a5. segmenting the text by using the segmentation signals output in step a3, and performing CNN + softmax classification recognition on the single characters.
Further, the real segmentation signal is mapped from the annotation text.
Further, the CTC LOSS can automatically solve the text alignment problem.
Further, the CTC LOSS calculation formula is as follows:

p(l|x) = Σ_{π ∈ B⁻¹(l)} p(π|x)    (formula 1)

where x is the feature sequence produced after the image is extracted using the CNN, l is the true signal, π represents a single correct alignment scheme, and B⁻¹(l) is the set of all alignment schemes that map onto l.
Further, the single correct alignment scheme is one of the set of all correct alignment schemes, and each single alignment scheme occurs in that set with a certain probability.
Further, the probability of the single alignment scheme is calculated as follows:

p(π|x) = ∏_{t=1}^{T} y^t_{π_t}    (formula 2)

where T is the number of timesteps (one per CELL) and y^t_{π_t} is the probability the model outputs at timestep t for the symbol π_t.
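A single alignment's probability is just the product over timesteps of the probability the model assigns to the aligned symbol. A minimal sketch, where the softmax outputs are made-up illustrative numbers:

```python
import numpy as np

def alignment_probability(y, pi):
    """p(pi|x): product over timesteps t of y[t, pi[t]], where y has
    shape (T, K) - a softmax distribution over K symbols per timestep."""
    return float(np.prod([y[t, k] for t, k in enumerate(pi)]))

# Toy per-timestep softmax outputs for T = 3 timesteps, K = 2 symbols.
y = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.7, 0.3]])
print(alignment_probability(y, [0, 1, 0]))  # 0.9 * 0.8 * 0.7 ≈ 0.504
```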
Compared with the prior art, the invention has the following beneficial effects: 1. the method greatly improves the speed of OCR recognition, and after the text is cut into single characters, recognition optimization can be targeted, so the final precision is improved; 2. the method improves the recognition framework by separating the recognition process into the two steps of character segmentation and single-character recognition, so each step can be optimized separately and pertinently; 3. the method has a unique concept, a novel idea and good operability.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. It is apparent that the drawings in the following description show only some embodiments of the present invention, and other drawings can be obtained from them by those skilled in the art without creative effort.
FIG. 1 is a process diagram of the CTC deep neural network-based text segmentation and recognition method of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. It is apparent that the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
The invention is further described with reference to the following drawings and detailed description:
as shown in fig. 1, the method for character segmentation and recognition based on CTC deep neural network includes the following steps:
a1. extracting features of the input image by using CNN;
a2. performing CELL segmentation on the features extracted in step a1, wherein the height and width of each CELL are fixed, and the number of CELLs is determined by the length of the image;
a3. directly carrying out segmentation classification on each CELL with determined features and outputting segmentation signals;
a4. calculating the loss between the real segmentation signals and the segmentation signals output by the model by using the CTC LOSS calculation formula, feeding back the loss and training the whole model;
a5. segmenting the text by using the segmentation signals output in step a3, and performing CNN + softmax classification recognition on the single characters.
The method fundamentally and greatly improves the speed of OCR recognition; after the text is cut into single characters, recognition optimization can be targeted, so the final precision is improved. At the same time, the recognition framework is improved by separating the recognition process into the two steps of character segmentation and single-character recognition, so that each step can be optimized separately and pertinently.
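The steps a1–a5 above can be sketched end to end. Everything here is a hypothetical stand-in: random "features" replace a trained CNN, and a mean-threshold cell classifier replaces a trained segmentation head — the sketch only shows how the pieces connect:

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_features(image):
    # a1. Stand-in for CNN feature extraction: (height, width, channels).
    h, w = image.shape[:2]
    return rng.standard_normal((h, w, 8))

def split_cells(features, cell_width=4):
    # a2. CELL segmentation: fixed cell size; count follows image length.
    n = features.shape[1] // cell_width
    return [features[:, i * cell_width:(i + 1) * cell_width] for i in range(n)]

def classify_cells(cells):
    # a3. Per-cell boundary classifier (placeholder rule standing in for
    # a trained layer): '1' = character boundary, '0' = interior.
    return "".join("1" if c.mean() > 0 else "0" for c in cells)

def split_characters(signal):
    # a5. Each run of '0' cells between two '1' cells is one character
    # region, to be recognized separately with CNN + softmax.
    spans, start = [], None
    for i, s in enumerate(signal):
        if s == "0" and start is None:
            start = i
        elif s == "1" and start is not None:
            spans.append((start, i))
            start = None
    return spans

image = rng.standard_normal((32, 40))
signal = classify_cells(split_cells(extract_features(image)))
print(signal, split_characters(signal))
```

Step a4 (the CTC LOSS) is what would train the cell classifier so that its output signal matches the real segmentation signal.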
According to the above, the real segmentation signal is mapped from the annotation text.
In accordance with the above, the CTC LOSS can automatically solve the text alignment problem.
According to the above, the CTC LOSS calculation formula is as follows:

p(l|x) = Σ_{π ∈ B⁻¹(l)} p(π|x)    (formula 1)

where x is the feature sequence produced after the image is extracted using the CNN, l is the true signal, and π represents a single correct alignment scheme.
According to the above, the single correct alignment scheme is one of the set of all correct alignment schemes, and each single alignment scheme occurs in that set with a certain probability.
According to the above, the probability of the single alignment scheme is calculated as follows: p(π|x) = ∏_{t=1}^{T} y^t_{π_t} (formula 2), where y^t_{π_t} is the probability the model outputs at timestep t for the symbol π_t.
It can be verified that the real segmentation signals are mapped from the annotated text: inputting annotated texts of one, two and four characters yields the real segmentation signals "101", "10101" and "101010101" respectively, where a "1" marks a character boundary and a "0" a character interior. The core role of CTC is to automatically solve the alignment problem, so the difference between the segmentation signal output by the model and the real signal mapped from the text length can be calculated.
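The text-to-signal mapping described above can be written directly; the rule (one "1" per boundary alternating with one "0" per character, inferred from the quoted examples) is an assumption about the original labeling scheme:

```python
def text_to_signal(text):
    """Map an annotated text of n characters to its real segmentation
    signal: n + 1 boundary '1's alternating with n interior '0's."""
    return "1" + "01" * len(text)

# One-, two- and four-character texts reproduce the signals quoted
# in the description: "101", "10101", "101010101".
for t in ("a", "ab", "abcd"):
    print(t, "->", text_to_signal(t))
```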
The word "state" is taken as an example to illustrate the associated definitions and computational logic:
by using the above method for character segmentation and recognition based on the CTC deep neural network, the CTC L OSS aims to maximize the probability value of formula 1, where x in formula 1 is a feature generated after an image is extracted by using CNN, L is a real signal, and pi represents a single correct alignment scheme, where all in formula 3 are correct alignment schemes.
The probability of a single correct alignment scheme is calculated using formula 2 above.
The loss between the real segmentation signal and the segmentation signal output by the model is calculated through formula 1, formula 2 and formula 3; the loss is fed back and the whole model is trained.
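Formulas 1–3 can be combined in a brute-force sketch: enumerate every alignment π, keep those the CTC collapse B maps to the true signal l (formula 3), take each one's product of per-timestep probabilities (formula 2), and sum them (formula 1). The outputs below are made-up toy numbers, and real implementations use dynamic programming rather than enumeration:

```python
import itertools
import math

def collapse(pi, blank=0):
    """CTC mapping B: merge repeated symbols, then remove blanks."""
    out, prev = [], None
    for s in pi:
        if s != prev and s != blank:
            out.append(s)
        prev = s
    return tuple(out)

def ctc_loss(y, label, blank=0):
    """-log p(l|x), with p(l|x) summed over every alignment pi such
    that collapse(pi) == label (exponential in T; illustration only)."""
    T, K = len(y), len(y[0])
    total = 0.0
    for pi in itertools.product(range(K), repeat=T):
        if collapse(pi, blank) == tuple(label):
            p = 1.0
            for t, k in enumerate(pi):
                p *= y[t][k]
            total += p
    return -math.log(total)

# Toy outputs: T = 3 timesteps over K = 2 symbols (blank and one label).
y = [[0.6, 0.4], [0.5, 0.5], [0.6, 0.4]]
print(ctc_loss(y, [1]))  # -log(0.74) ≈ 0.301
```

Minimizing this loss over the training set is what "feeding back the loss and training the whole model" amounts to in step a4.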
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that various changes, modifications and substitutions can be made without departing from the spirit and scope of the invention as defined by the appended claims. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (6)
1. The character segmentation and identification method based on the CTC deep neural network is characterized by comprising the following steps of:
a1. extracting features of the input image by using CNN;
a2. performing CELL segmentation on the features extracted in step (a1), wherein the height and width of each CELL are fixed, and the number of CELLs is determined by the length of the image;
a3. directly carrying out segmentation classification on each CELL with determined features and outputting segmentation signals;
a4. calculating the loss between the real segmentation signals and the segmentation signals output by the model by using the CTC LOSS calculation formula, feeding back the loss and training the whole model;
a5. segmenting the text by using the segmentation signals output in step (a3), and performing CNN + softmax classification recognition on the single characters.
2. The CTC deep neural network-based character segmentation and recognition method according to claim 1, wherein the real segmentation signal is mapped from the annotated text.
3. The CTC deep neural network-based character segmentation and recognition method of claim 1, wherein the CTC LOSS can automatically solve the text alignment problem.
5. The CTC deep neural network-based character segmentation and recognition method of claim 4, wherein the single correct alignment scheme is one of the set of all correct alignment schemes, each occurring in that set with a certain probability.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010294624.4A CN111507348A (en) | 2020-04-15 | 2020-04-15 | Character segmentation and identification method based on CTC deep neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111507348A true CN111507348A (en) | 2020-08-07 |
Family
ID=71870990
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010294624.4A Pending CN111507348A (en) | 2020-04-15 | 2020-04-15 | Character segmentation and identification method based on CTC deep neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111507348A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112381175A (en) * | 2020-12-05 | 2021-02-19 | 中国人民解放军32181部队 | Circuit board identification and analysis method based on image processing |
CN113537201A (en) * | 2021-09-16 | 2021-10-22 | 江西风向标教育科技有限公司 | Multi-dimensional hybrid OCR recognition method, device, equipment and storage medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3200123A1 (en) * | 2016-01-28 | 2017-08-02 | Siemens Aktiengesellschaft | Text recognition |
CN108960245A (en) * | 2018-07-13 | 2018-12-07 | 广东工业大学 | The detection of tire-mold character and recognition methods, device, equipment and storage medium |
CN109241894A (en) * | 2018-08-28 | 2019-01-18 | 南京安链数据科技有限公司 | A kind of specific aim ticket contents identifying system and method based on form locating and deep learning |
CN109993160A (en) * | 2019-02-18 | 2019-07-09 | 北京联合大学 | A kind of image flame detection and text and location recognition method and system |
US10388272B1 (en) * | 2018-12-04 | 2019-08-20 | Sorenson Ip Holdings, Llc | Training speech recognition systems using word sequences |
CN110175603A (en) * | 2019-04-01 | 2019-08-27 | 佛山缔乐视觉科技有限公司 | A kind of engraving character recognition methods, system and storage medium |
CN110766017A (en) * | 2019-10-22 | 2020-02-07 | 国网新疆电力有限公司信息通信公司 | Mobile terminal character recognition method and system based on deep learning |
CN110866530A (en) * | 2019-11-13 | 2020-03-06 | 云南大学 | Character image recognition method and device and electronic equipment |
Non-Patent Citations (1)
Title |
---|
张艺玮; 赵一嘉; 王馨悦; 董兰芳: "Chinese Recognition Combining Dense Neural Networks and Long Short-Term Memory Models" * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022147965A1 (en) | Arithmetic question marking system based on mixnet-yolov3 and convolutional recurrent neural network (crnn) | |
Zuo et al. | Natural scene text recognition based on encoder-decoder framework | |
CN113158808B (en) | Method, medium and equipment for Chinese ancient book character recognition, paragraph grouping and layout reconstruction | |
Rehman et al. | Performance analysis of character segmentation approach for cursive script recognition on benchmark database | |
CN104778470B (en) | Text detection based on component tree and Hough forest and recognition methods | |
CN112818951A (en) | Ticket identification method | |
EP3349124A1 (en) | Method and system for generating parsed document from digital document | |
CN113537227B (en) | Structured text recognition method and system | |
CN110413787B (en) | Text clustering method, device, terminal and storage medium | |
CN111507348A (en) | Character segmentation and identification method based on CTC deep neural network | |
CN111539417B (en) | Text recognition training optimization method based on deep neural network | |
CN109086772A (en) | A kind of recognition methods and system distorting adhesion character picture validation code | |
CN114187595A (en) | Document layout recognition method and system based on fusion of visual features and semantic features | |
CN117437647B (en) | Oracle character detection method based on deep learning and computer vision | |
CN114581932A (en) | Picture table line extraction model construction method and picture table extraction method | |
CN111832497B (en) | Text detection post-processing method based on geometric features | |
Karanje et al. | Survey on text detection, segmentation and recognition from a natural scene images | |
CN109284678A (en) | Guideboard method for recognizing semantics and system | |
US20230315799A1 (en) | Method and system for extracting information from input document comprising multi-format information | |
CN111581478A (en) | Cross-website general news acquisition method for specific subject | |
CN113761209B (en) | Text splicing method and device, electronic equipment and storage medium | |
CN112800259B (en) | Image generation method and system based on edge closure and commonality detection | |
CN114529894A (en) | Rapid scene text detection method fusing hole convolution | |
Fan et al. | BURSTS: A bottom-up approach for robust spotting of texts in scenes | |
Mosannafat et al. | Farsi text detection and localization in videos and images |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||