CN110309769B - Method for segmenting character strings in picture


Info

Publication number: CN110309769B (granted 2021-06-15); earlier publication: CN110309769A (published 2019-10-08)
Application number: CN201910576925.3A (filed 2019-06-28; priority date 2019-06-28)
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 张春红, 胡铮, 邵文良
Assignee (current and original): Beijing University of Posts and Telecommunications
Legal status: Expired - Fee Related

Classifications

    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/24147: Distances to closest patterns, e.g. nearest neighbour classification
    • G06N3/044: Recurrent networks, e.g. Hopfield networks
    • G06N3/045: Combinations of networks
    • G06N3/084: Backpropagation, e.g. using gradient descent
    • G06V10/267: Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V20/62: Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V30/413: Classification of content, e.g. text, photographs or tables
    • G06V30/10: Character recognition


Abstract

The invention discloses a method for segmenting character strings in a picture, belonging to the field of computer vision. First, a number of character-string pictures are collected and divided into training samples and test samples, and each training sample is preprocessed to obtain a set of corresponding sub-pictures; each sub-picture of each training sample is then labeled as a sequence in the IOBES scheme. Next, a combined bidirectional long short-term memory neural network and conditional random field model is trained on the labeled samples for sequence labeling. At test time, a test sample is fed into the trained model to obtain the highest-scoring label sequence. Finally, the highest-scoring label sequence is used to place the segmentation lines that split the test sample. The invention avoids the manual threshold setting required when segmenting with rule-based algorithms such as the projection method, needs no other prior knowledge, and is easy to port.

Description

Method for segmenting character strings in picture
Technical Field
The invention belongs to the field of computer vision, relates to image page segmentation, and particularly relates to a method for segmenting character strings in a picture.
Background
Character string segmentation belongs to the field of text detection in computer vision. In most image text detection tasks, detection targets natural scenes, where it can be treated as an object detection task and handled with conventional object detection algorithms. However, the image text encountered in many text detection settings often occupies a single line in the image, as with license plate numbers, house numbers, or table text. In such tasks, characters are generally recognized by first detecting the text region and then recognizing the characters within it.
For the laboratory sheet table shown in fig. 1, detecting the characters in the table generally requires splitting the table into rows and then into columns. A common approach is to detect the characters directly with an object detection model such as SSD or Faster R-CNN. However, detection-based algorithms easily miss targets in character-dense tasks such as tables; moreover, the characters in a table are arranged with some regularity, so a simpler method can achieve a better result.
In the prior art, table text detection is generally realized by image page segmentation: a page segmentation algorithm repeatedly splits the picture to obtain a series of image regions containing characters. The most common page segmentation algorithm is the projection method, as disclosed in Document 1 (License plate character segmentation algorithm based on projection feature values [J]. Application Research of Computers, 2006, 23(7)), which is mainly used for routine horizontal segmentation tasks such as license plate character segmentation. The image is first binarized and normalized to black background with white characters; the sum of the pixel values in each row or column is then computed, and several thresholds chosen from prior knowledge are used to find reasonable split points.
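For illustration, a minimal sketch of this column-projection idea, assuming a NumPy array holding the binarized image (black background 0, white text 255); the threshold parameter stands in for the hand-tuned thresholds the method requires and is not a value from the text:

```python
import numpy as np

def projection_cut_columns(binary_img: np.ndarray, threshold: int = 0) -> list:
    """Return the column indices whose white-pixel count is <= threshold.

    Runs of such columns are the candidate split points between characters.
    """
    column_sums = (binary_img > 0).sum(axis=0)   # white pixels per column
    return [i for i, s in enumerate(column_sums) if s <= threshold]
```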
Although this method works well on images with a regular format and clear strokes, such as license plates, many OCR tasks involve text pictures with no regular layout, making it difficult to find split points with any fixed threshold. In a typical OCR task, many hand-crafted rules must be added, based on the characteristics of the data, to refine the segmentation result. Such methods usually require a large amount of prior knowledge and lead to very unwieldy systems. A general solution is therefore urgently needed: a machine learning method trained on a given data set, so that the model learns the characteristics of the split points automatically and the heavy labor cost of searching for prior rules is avoided.
Disclosure of Invention
Aiming at these problems, the invention adopts a sequence labeling approach: the rows and columns of the image are labeled so that a model can predict the segmentation lines of the character regions. This achieves higher accuracy than rule-based algorithms, yields a general model, removes threshold parameters, requires no prior knowledge, and demonstrates the value of introducing sequence labeling into the computer vision field. Specifically, the invention is a method for segmenting character strings in a picture.
The method comprises the following steps:
Step one, collecting a number of character-string pictures and dividing them into training samples and test samples;
Step two, preprocessing each training sample to obtain a set of sub-pictures corresponding to that sample;
The preprocessing is: first binarize the picture and scale it to a height of 25 pixels; then group every 5 adjacent columns of pixel points into one sub-picture, so that each sub-picture has dimension 5 × 25 = 125.
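A minimal sketch of this preprocessing, assuming OpenCV and NumPy are available; the Otsu binarization and the file-path input are illustrative choices, not fixed by the text:

```python
import cv2
import numpy as np

def preprocess(image_path: str) -> np.ndarray:
    """Binarize, scale to height 25, and cut into 5-column sub-pictures."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # Otsu's threshold as one way to get a black-background/white-text image.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    h, w = binary.shape
    binary = cv2.resize(binary, (max(5, round(w * 25 / h)), 25))
    n = binary.shape[1] // 5                 # number of complete sub-pictures
    # Each 25x5 slice is flattened to the 125-dimensional vector described above.
    return np.stack([binary[:, 5 * i:5 * (i + 1)].reshape(125) for i in range(n)])
```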
Step three, for each training sample, labeling each of its sub-pictures as a sequence in the IOBES scheme.
The IOBES labels are: if the sub-picture input is the beginning of a text region, it is labeled B; if it is inside a text region, it is labeled I; if it is the end of a text region, it is labeled E; if it forms a text region on its own, it is labeled S; and if it does not belong to any text region, it is labeled O.
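As one illustration, a hypothetical helper that derives these labels from a boolean mask indicating which sub-pictures overlap a ground-truth text region (how that mask is built from the annotations is assumed):

```python
def iobes_labels(is_text: list) -> list:
    """Map a per-sub-picture text/non-text mask to IOBES labels."""
    labels, n = [], len(is_text)
    for i, t in enumerate(is_text):
        if not t:
            labels.append("O")
            continue
        prev = i > 0 and is_text[i - 1]          # previous sub-picture is text
        nxt = i < n - 1 and is_text[i + 1]       # next sub-picture is text
        if not prev and not nxt:
            labels.append("S")                   # single-sub-picture region
        elif not prev:
            labels.append("B")                   # region begins here
        elif not nxt:
            labels.append("E")                   # region ends here
        else:
            labels.append("I")                   # strictly inside a region
    return labels

# iobes_labels([0, 1, 1, 1, 0, 1, 0]) -> ['O', 'B', 'I', 'E', 'O', 'S', 'O']
```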
Step four, training the combined bidirectional long short-term memory neural network and conditional random field model with the sequence-labeled training samples;
The specific steps are as follows:
Step 401, adopting a bidirectional long short-term memory network structure, so that the information of the units before and after the current unit is concatenated.
Step 402, for a training sample, feeding the pixel points of each sub-picture in the sample into the concatenated long short-term memory networks and outputting five probability values, one for each of the IOBES labels of the sub-picture;
Let the set of sub-picture input pixel points be $X = (x_1, x_2, \ldots, x_i, \ldots, x_n)$, where $x_i$ is the $i$-th pixel point of the sub-picture input. The probability values output by the concatenated long short-term memory networks are:

$$P_i = W\left[\overrightarrow{h_i};\,\overleftarrow{h_i}\right]$$

where $W$ is the fully connected layer whose five-dimensional output corresponds to IOBES, $\overleftarrow{h_i}$ is the value of the corresponding unit of the backward long short-term memory network, and $\overrightarrow{h_i}$ is the value of the corresponding unit of the forward long short-term memory network.
Step 403, adding a conditional random field model after the concatenated long short-term memory networks and computing a score for each training sample;
The $j$-th labeled training sample comprises $m$ sub-pictures in total, with label set $y = (y_1, y_2, \ldots, y_m)$;
First, the sum of the probabilities of transferring from the label $y_l$ of one sub-picture to the label $y_{l+1}$ of the next is computed:

$$\sum_{l=0}^{m} A_{y_l, y_{l+1}}$$

Then, the sum of the label probability values of all the sub-pictures is computed:

$$\sum_{l=1}^{m} P_{l, y_l}$$

where $P_{l, y_l}$ denotes the probability value that the $l$-th sub-picture has label $y_l$.
Finally, the score of the training sample is obtained; the calculation formula is:

$$s(X, y) = \sum_{l=0}^{m} A_{y_l, y_{l+1}} + \sum_{l=1}^{m} P_{l, y_l}$$

Step 404, setting the constraint condition for training the bidirectional long short-term memory network and conditional random field model;
The constraint condition is as follows: the label probability values of all sub-pictures in each training sample are passed through a softmax, ensuring that the probabilities sum to 1 and that the expression is differentiable:

$$p(y \mid X) = \frac{e^{s(X, y)}}{\sum_{\tilde{y} \in Y_X} e^{s(X, \tilde{y})}}$$

The exponent of $e^{s(X, y)}$ is the score of the correct label sequence annotated in the current training sample; $Y_X$ is the set of all possible label sequences, obtained by selecting one of the IOBES labels for each sub-picture and combining the labels of all the sub-pictures into a sequence.
Step 405, maximizing the log-likelihood of the correct label sequence, with the model parameters optimized by the back-propagation algorithm:

$$\log p(y \mid X) = s(X, y) - \log \sum_{\tilde{y} \in Y_X} e^{s(X, \tilde{y})}$$
Step five, inputting the test sample into the trained bidirectional long short-term memory network and conditional random field model to obtain the highest-scoring label sequence;
at test time, the highest-scoring label sequence is computed by the Viterbi algorithm.
Step six, using the highest-scoring label sequence to place the segmentation lines and splitting the test sample accordingly.
The process is as follows:
First, all sub-picture spans covered by B...I...E sequences are found, followed by the sub-pictures classified as a lone S.
Then, artificial rules are defined: when several I labels follow several O labels, rule correction converts the first I label into a B label; similarly, when several O labels follow several I labels, the last I label is converted into an E label.
Finally, the sub-pictures judged to be character regions are concatenated through this post-processing to obtain the detected character regions, completing character detection.
The invention has the following advantages:
The method for segmenting character strings in a picture applies the sequence labeling problem to image page segmentation, thereby avoiding the manual threshold setting required when segmenting with rule-based algorithms such as the projection method. Moreover, since a model performs the segmentation, only retraining is needed for a different data set; no other prior knowledge is required, and the method is easy to port.
Drawings
Fig. 1 shows a typical laboratory sheet table used for character string segmentation.
Fig. 2 is a flowchart of a method for segmenting a character string in a picture according to the present invention.
Fig. 3 is a picture with a line of text as employed in an embodiment of the present invention.
FIG. 4 is a schematic diagram of the bidirectional long-short term memory neural network and the model of the conditional random field for sequence labeling according to the present invention.
Detailed description of the preferred embodiments
The invention will be described in further detail below with reference to the drawings and examples.
The method performs image page segmentation by sequence labeling with a combined long short-term memory neural network and conditional random field model: the sequence labeling model finds the segmentation lines in the image and thereby segments the character strings in the picture. The application environment is as follows:
CPU: Intel(R) Xeon(R) CPU [email protected]
Memory: 32G
GPU: Nvidia TITAN Xp
Operating system: Ubuntu 16.04 LTS
Development language: Python
As shown in fig. 2, the method is divided into the following steps:
Step one, collecting a number of character-string pictures and dividing them into training samples and test samples;
Step two, preprocessing each training sample to obtain a set of sub-pictures corresponding to that sample;
The preprocessing is: first binarize the picture and scale it to a height of 25 pixels; then group every five adjacent columns of pixel points into one sub-picture, so that each sub-picture has dimension 5 × 25 = 125.
Step three, for each training sample, labeling each of its sub-pictures as a sequence in the IOBES scheme.
The IOBES labels are: if the sub-picture input is the beginning of a text region, it is labeled B; if it is inside a text region, it is labeled I; if it is the end of a text region, it is labeled E; if it forms a text region on its own, it is labeled S; and if it does not belong to any text region, it is labeled O.
Step four, training the combined bidirectional long short-term memory neural network and conditional random field model with the sequence-labeled training samples;
The specific steps are as follows:
Step 401, adopting a bidirectional long short-term memory network structure, so that the information of the units before and after the current unit is concatenated.
Step 402, for a training sample, feeding the pixel points of each sub-picture in the sample into the concatenated long short-term memory networks and outputting five probability values, one for each of the IOBES labels of the sub-picture;
Let the set of sub-picture input pixel points be $X = (x_1, x_2, \ldots, x_i, \ldots, x_n)$, where $x_i$ is the $i$-th pixel point of the sub-picture input. The picture input of each unit consists of the values of all pixel points in a picture area of width 5 and height 25. The probability values output by each unit of the front-and-back concatenated long short-term memory networks are:

$$P_i = W\left[\overrightarrow{h_i};\,\overleftarrow{h_i}\right]$$

where $W$ is a fully connected layer whose input is the vector formed by concatenating the two directions of the bidirectional long short-term memory network and whose five-dimensional output corresponds to the IOBES labels; $\overleftarrow{h_i}$ is the value of the corresponding unit of the backward long short-term memory network, and $\overrightarrow{h_i}$ is the value of the corresponding unit of the forward long short-term memory network. The network first computes the values of the corresponding units of the forward and backward long short-term memory networks, and then concatenates the two output values.
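A minimal PyTorch sketch of this bidirectional emission network; the 125-dimensional flattened sub-picture input and the 300 units per direction come from this text (see the parameter settings below), while the class name, single-layer choice, and other details are illustrative assumptions:

```python
import torch
import torch.nn as nn

class BiLSTMEmitter(nn.Module):
    """Bidirectional LSTM that emits the 5 IOBES scores per sub-picture."""

    def __init__(self, input_dim: int = 125, hidden: int = 300, n_tags: int = 5):
        super().__init__()
        # bidirectional=True concatenates the forward and backward unit values.
        self.lstm = nn.LSTM(input_dim, hidden, batch_first=True,
                            bidirectional=True)
        # The fully connected layer W reduces 2*hidden dimensions to 5 (IOBES).
        self.fc = nn.Linear(2 * hidden, n_tags)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, m, 125) sub-picture vectors -> P: (batch, m, 5) scores
        h, _ = self.lstm(x)
        return self.fc(h)
```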
Step 403, adding a conditional random field model after the concatenated long short-term memory networks and computing a score for each training sample;
The $j$-th labeled training sample comprises $m$ sub-pictures in total, with label set $y = (y_1, y_2, \ldots, y_m)$;
First, the sum of the probabilities of transferring from the label $y_l$ of one sub-picture to the label $y_{l+1}$ of the next is computed:

$$\sum_{l=0}^{m} A_{y_l, y_{l+1}}$$

Then, the sum of the label probability values of all the sub-pictures is computed:

$$\sum_{l=1}^{m} P_{l, y_l}$$

where $P_{l, y_l}$ denotes the probability value that the $l$-th sub-picture has label $y_l$.
Finally, the score of the training sample is obtained; the calculation formula is:

$$s(X, y) = \sum_{l=0}^{m} A_{y_l, y_{l+1}} + \sum_{l=1}^{m} P_{l, y_l}$$

Step 404, setting the constraint condition for training the bidirectional long short-term memory network and conditional random field model;
The constraint condition is as follows: the label probability values of all sub-pictures in each training sample are passed through a softmax, ensuring that the probabilities sum to 1 and that the expression is differentiable:

$$p(y \mid X) = \frac{e^{s(X, y)}}{\sum_{\tilde{y} \in Y_X} e^{s(X, \tilde{y})}}$$

The exponent of $e^{s(X, y)}$ is the score of the correct label sequence annotated in the current training sample; $Y_X$ is the set of all possible label sequences, obtained by selecting one of the IOBES labels for each sub-picture and combining the labels of all the sub-pictures into a sequence.
Step 405, maximizing the log-likelihood of the correct label sequence, with the model parameters optimized by the back-propagation algorithm:

$$\log p(y \mid X) = s(X, y) - \log \sum_{\tilde{y} \in Y_X} e^{s(X, \tilde{y})}$$
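For concreteness, a small sketch of the score $s(X, y)$ from step 403, assuming PyTorch tensors: P is the (m, 5) emission matrix produced by the network, and A is treated as a (7, 7) transition matrix with two extra start and end states so that the l = 0 and l = m transition terms are defined; that start/end convention is an assumption, not something the text specifies:

```python
import torch

def crf_score(P: torch.Tensor, y: list, A: torch.Tensor,
              start: int = 5, end: int = 6) -> torch.Tensor:
    """s(X, y): transition terms A[y_l, y_{l+1}] plus emission terms P[l, y_l]."""
    tags = [start] + list(y) + [end]
    trans = sum(A[tags[l], tags[l + 1]] for l in range(len(tags) - 1))
    emit = sum(P[l, y_l] for l, y_l in enumerate(y))
    return trans + emit
```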
Step five, during testing, inputting the test sample into the trained bidirectional long short-term memory network and conditional random field model to obtain the highest-scoring label sequence;
the highest-scoring label sequence is computed by the Viterbi algorithm.
Step six, using the highest-scoring label sequence to place the segmentation lines and splitting the test sample accordingly.
The process is as follows:
First, all sub-picture spans covered by B...I...E sequences are found, followed by the sub-pictures classified as a lone S.
Then, artificial rules are defined: when several I labels follow several O labels, rule correction converts the first I label into a B label; similarly, when several O labels follow several I labels, the last I label is converted into an E label.
Finally, the sub-pictures judged to be character regions are concatenated through this post-processing to obtain the detected character regions, completing character detection. A sketch of this rule correction and region extraction follows.
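A minimal sketch of this post-processing, assuming plain Python lists of label strings; the function name and the (first, last) span representation are illustrative:

```python
def correct_and_extract(labels: list) -> list:
    """Apply the two rule corrections, then collect B...E spans and lone S tags."""
    labels = list(labels)
    for i in range(1, len(labels)):
        if labels[i] == "I" and labels[i - 1] == "O":
            labels[i] = "B"          # I after O is illegal: a region must start with B
        if labels[i] == "O" and labels[i - 1] == "I":
            labels[i - 1] = "E"      # O after I is illegal: the region must end with E
    regions, start = [], None
    for i, tag in enumerate(labels):
        if tag == "S":
            regions.append((i, i))   # a single sub-picture forms a region
        elif tag == "B":
            start = i
        elif tag == "E" and start is not None:
            regions.append((start, i))
            start = None
    return regions                   # (first, last) sub-picture index per region

# correct_and_extract(['O', 'I', 'I', 'O', 'S']) -> [(1, 2), (4, 4)]
```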
Example:
As shown in fig. 3, the invention takes a picture containing one line of text as input. The picture is first preprocessed into a set of sub-pictures, and each group of input sub-pictures is given its IOBES label in sequence.
Image page segmentation is then realized with a sequence labeling model combining a bidirectional long short-term memory neural network and a conditional random field. Applying sequence labeling to image page segmentation converts character-string segmentation in an image into a sequence labeling problem, which is then solved by the neural network model.
The unidirectional long short-term memory network has the disadvantage that each unit can only obtain information from the preceding units, not the following ones. The invention therefore adopts a bidirectional architecture built from long short-term memory units, which were proposed by Hochreiter et al. in 1997 and more recently improved and popularized by Alex Graves. The earliest neural networks applied to sequence models were recurrent neural networks (RNNs), which pass the output of one network module on to the next, allowing information to persist; they perform well in many tasks, such as speech recognition and machine translation. To solve the long-range dependency problem of recurrent networks, Hochreiter et al. designed the long short-term memory network, which acquires long-distance information through a carefully designed gating system. The bidirectional structure integrates the information of the units before and after the current unit. After the long short-term memory network, each picture input already corresponds to an output value; a problem remains, however, in that the labels of the sequence are interrelated.
Under the IOBES labeling scheme, many label sequences are illegal: an I label cannot follow an O label, because a region must begin with a B label; and neither an O label nor a B label can follow an I label, because a region must end with an E label. The long short-term memory network alone cannot learn such relationships between labels, so the invention adds a conditional random field module after it to handle the dependencies between labels.
A conditional random field is a common model for sequence labeling that learns the interrelationships between labels.
As shown in fig. 4, in this experiment the conditional random field is used as follows. The set of each sub-picture's input pixel points is $X = (x_1, x_2, \ldots, x_i, \ldots, x_n)$, and the probability values output by each unit of the front-and-back concatenated long short-term memory networks are:

$$P_i = W\left[\overrightarrow{h_i};\,\overleftarrow{h_i}\right]$$

A conditional random field model is added after the concatenated long short-term memory networks, and the corresponding output labels form the set $y = (y_1, y_2, \ldots, y_m)$.
For each label sequence $y = (y_1, y_2, \ldots, y_m)$, a score is computed:

$$s(X, y) = \sum_{l=0}^{m} A_{y_l, y_{l+1}} + \sum_{l=1}^{m} P_{l, y_l}$$

A constraint is then set: the scores of all possible label sequences are passed through a softmax so that their probabilities sum to 1 and the expression is differentiable:

$$p(y \mid X) = \frac{e^{s(X, y)}}{\sum_{\tilde{y} \in Y_X} e^{s(X, \tilde{y})}}$$

During training, the log-likelihood of the correct label sequence is maximized:

$$\log p(y \mid X) = s(X, y) - \log \sum_{\tilde{y} \in Y_X} e^{s(X, \tilde{y})}$$

During testing, the highest-scoring label sequence is computed by the Viterbi algorithm and used as the output of the whole sequence labeling, as in the sketch below.
The post-processing first finds all sub-picture spans covered by B...I...E sequences, and then the sub-pictures corresponding to a lone S classification. In addition, because B, E and S labels are far fewer than I and O labels, the data are unbalanced, and some B and E labels may fail to be identified. Some manual rules are therefore defined to refine the result: when several I labels follow several O labels, rule correction converts the first I label into a B label; similarly, when several O labels follow several I labels, a rule converts the last I label into an E label.
The sub-pictures judged to be character regions are then concatenated through this post-processing to obtain the detected character regions, completing character detection.
Parameter settings:
In this text detection task, the number of long short-term memory units in each direction is 300. An Adam optimizer with a learning rate of 0.001 is used to train the network parameters.
Results:
The experiment uses a data set of laboratory sheet table pictures provided by a medical internet company, containing 500 pictures in total. The intersection over union (IoU) of each detected region and the real region is computed with a threshold of 0.8: a detected character region is considered correct when its IoU exceeds 0.8. The model is evaluated on precision, recall, and F1-score, where F1-score is the harmonic mean of precision and recall.
Model               Precision (%)   Recall (%)   F1-score
Faster R-CNN        85.73           87.26        86.45
SSD                 85.03           88.60        86.78
Projection method   88.25           88.66        88.45
Our model           91.23           91.78        91.50
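For reference, a small sketch of the IoU criterion used in this evaluation, assuming detected and ground-truth regions are given as axis-aligned boxes (x1, y1, x2, y2); the box representation is an assumption:

```python
def iou(a: tuple, b: tuple) -> float:
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

# A detection is counted as correct when iou(detected, ground_truth) > 0.8.
```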
The experimental results are compared with two image object detection models, Faster R-CNN and SSD, and with the rule-based segmentation algorithm using the projection method. The experiments show that the sequence-labeling-based image segmentation algorithm outperforms the other models on the table-region text detection task.

Claims (4)

1. A method for segmenting character strings in a picture, characterized by comprising the following steps:
step one, collecting a number of character-string pictures and dividing them into training samples and test samples;
step two, preprocessing each training sample to obtain a set of sub-pictures corresponding to that sample;
step three, for each training sample, labeling each of its sub-pictures as a sequence in the IOBES scheme;
step four, training the combined bidirectional long short-term memory neural network and conditional random field model with the sequence-labeled training samples;
the specific steps being:
step 401, adopting a bidirectional long short-term memory network structure, so that the information of the units before and after the current unit is concatenated;
step 402, for a training sample, feeding the pixel points of each sub-picture in the sample into the concatenated long short-term memory networks and outputting five probability values, one for each of the IOBES labels of the sub-picture;
letting the set of sub-picture input pixel points be $X = (x_1, x_2, \ldots, x_i, \ldots, x_n)$, where $x_i$ is the $i$-th pixel point of the sub-picture input; the probability values output by the concatenated long short-term memory networks being:

$$P_i = W\left[\overrightarrow{h_i};\,\overleftarrow{h_i}\right]$$

where $W$ is the fully connected layer whose five-dimensional output corresponds to IOBES, $\overleftarrow{h_i}$ is the value of the corresponding unit of the backward long short-term memory network, and $\overrightarrow{h_i}$ is the value of the corresponding unit of the forward long short-term memory network;
step 403, adding a conditional random field model after the concatenated long short-term memory networks and computing a score for each training sample;
specifically:
the $j$-th labeled training sample comprising $m$ sub-pictures in total, with label set $y = (y_1, y_2, \ldots, y_m)$;
first, computing the sum of the probabilities of transferring from the label $y_l$ of one sub-picture to the label $y_{l+1}$ of the next:

$$\sum_{l=0}^{m} A_{y_l, y_{l+1}}$$

then, computing the sum of the label probability values of all the sub-pictures:

$$\sum_{l=1}^{m} P_{l, y_l}$$

where $P_{l, y_l}$ denotes the probability value that the $l$-th sub-picture has label $y_l$;
finally, obtaining the score of the training sample, with the calculation formula:

$$s(X, y) = \sum_{l=0}^{m} A_{y_l, y_{l+1}} + \sum_{l=1}^{m} P_{l, y_l}$$

step 404, setting the constraint condition for training the bidirectional long short-term memory network and conditional random field model;
the constraint condition being: passing the label probability values of all sub-pictures in each training sample through a softmax, ensuring that the probabilities sum to 1 and that the expression is differentiable:

$$p(y \mid X) = \frac{e^{s(X, y)}}{\sum_{\tilde{y} \in Y_X} e^{s(X, \tilde{y})}}$$

the exponent of $e^{s(X, y)}$ being the score of the correct label sequence in the current training sample, and $Y_X$ being the set of all possible label sequences, obtained by selecting one of the IOBES labels for each sub-picture and combining the labels of all the sub-pictures into a sequence;
step 405, maximizing the log-likelihood of the correct label sequence, with the model parameters optimized by the back-propagation algorithm:

$$\log p(y \mid X) = s(X, y) - \log \sum_{\tilde{y} \in Y_X} e^{s(X, \tilde{y})}$$

step five, inputting the test sample into the trained bidirectional long short-term memory network and conditional random field model to obtain the highest-scoring label sequence;
during testing, the highest-scoring label sequence being computed by the Viterbi algorithm;
step six, using the highest-scoring label sequence to place the segmentation lines and splitting the test sample accordingly.
2. The method as claimed in claim 1, wherein the preprocessing in step two is: first binarizing the picture and scaling it to a height of 25 pixels; then grouping every 5 adjacent columns of pixel points into one sub-picture, so that each sub-picture has dimension 5 × 25 = 125.
3. The method according to claim 1, wherein the IOBES labeling in step three is as follows: if the sub-picture input is the beginning of a text region, it is labeled B; if it is inside a text region, it is labeled I; if it is the end of a text region, it is labeled E; if it forms a text region on its own, it is labeled S; and if it does not belong to any text region, it is labeled O.
4. The method for segmenting character strings in a picture as claimed in claim 1, wherein the process of step six is as follows:
first, finding all sub-picture spans covered by B...I...E sequences, and then the sub-pictures corresponding to a lone S classification;
then, defining artificial rules: when several I labels follow several O labels, rule correction converts the first I label into a B label; similarly, when several O labels follow several I labels, the last I label is converted into an E label;
finally, concatenating the sub-pictures judged to be character regions through this post-processing to obtain the detected character regions, completing character detection.


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant
CF01: Termination of patent right due to non-payment of annual fee
Granted publication date: 2021-06-15