CN102332097A - Method for segmenting complex background text images based on image segmentation

Info

Publication number
CN102332097A (application CN201110322549A; granted as CN102332097B)
Authority
CN
China
Prior art keywords
subgraph
image
polarity
segmentation
loss
Prior art date
Legal status: Granted
Application number
CN201110322549A
Other languages
Chinese (zh)
Other versions
CN102332097B (en)
Inventor
王春恒
史存召
肖柏华
周文
Current Assignee: Infan Technology (Beijing) Co., Ltd.
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN 201110322549 priority Critical patent/CN102332097B/en
Publication of CN102332097A publication Critical patent/CN102332097A/en
Application granted granted Critical
Publication of CN102332097B publication Critical patent/CN102332097B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Character Input (AREA)

Abstract

The invention discloses a graph-cut-based method for segmenting text images with complex backgrounds. The method comprises the following steps: 1) coarsely segmenting an original text block image into sub-images; 2) estimating the polarity of each sub-image to determine the polarity of the whole text block image; 3) according to the polarity of the text block image and the inherent characteristics of character strokes, automatically providing foreground and background points of high confidence as hard constraints for the graph cut; 4) applying corresponding soft constraints to each sub-image, propagating the hard constraints to the whole sub-image with the graph cut, and then segmenting the sub-image; and 5) merging the segmented sub-images to obtain the complete text segmentation image. The method adopts a split-and-merge strategy and has local spatial adaptivity, so it can segment complex-background text block images whose background is non-uniform; at the same time, it automatically provides the hard constraints for the graph cut and extends them to the whole sub-image in combination with the soft constraints, achieving a good segmentation effect on text images with complex backgrounds.

Description

A graph-cut-based method for segmenting text images with complex backgrounds
Technical field
The present invention relates to text image segmentation in the fields of pattern recognition and machine vision, and in particular to a graph-cut-based method for segmenting text images with complex backgrounds.
Background art
With the widespread use of image acquisition devices such as digital cameras, video cameras and high-speed scanners, the information contained in images is attracting increasing attention, yet it remains difficult for computers to understand image content. Text embedded in an image can supply important information that people need and is a great help in understanding the image. Enabling a computer to read the text in an image as a human does, namely automatic text detection and recognition, has drawn growing attention in recent years; it is extremely important for the storage, classification, understanding and retrieval of images and video, and has wide application and commercial value. In many cases the scene text in an image is even its principal and most critical information, so many researchers have studied methods for detecting text blocks in images. However, because text blocks in images often have very complex backgrounds and vary in illumination, character size, resolution and so on, feeding a detected text block directly into a traditional OCR recognition engine yields very poor recognition. Text block segmentation is therefore the key technology connecting text detection with recognition, and it is indispensable to the good performance of the overall system.
Most current text block segmentation methods fall into two categories: statistical thresholding methods and machine learning methods. Statistical thresholding methods compute a global or local threshold from the gray-level or color statistics of the image and binarize the text image accordingly; they work acceptably on traditional scanned documents or text blocks with simple backgrounds, but fail when the text and the background have similar brightness. Machine learning methods include unsupervised color clustering and various model learning approaches. Color clustering fails when the text and the background have similar colors; model selection can give satisfactory results if a proper model can be learned, yet it is difficult to learn a single model that can segment text blocks with arbitrarily complex backgrounds.
Statistical thresholding methods do not exploit the structural characteristics of character strokes, and the large number of training samples needed to learn a suitable model is hard to obtain. Text is, however, a special kind of object, so various object segmentation methods can be applied. Among these, interactive object segmentation has become increasingly popular, and the graph cut technique is widely used for it. Traditional interactive object segmentation requires the user to provide some labels; by exploiting the inherent characteristics of text, such labels can instead be supplied to the graph cut automatically, so that text can be segmented with graph cuts without user interaction.
Summary of the invention
The purpose of the present invention is to provide a graph-cut-based method for segmenting text images with complex backgrounds. The method adopts a split-and-merge strategy and has local spatial adaptivity, so it can handle complex-background text images whose background is non-uniform. At the same time, according to the inherent features of character strokes, it automatically provides labels as hard constraints for the graph cut; combined with soft constraints, these hard constraints are propagated to the whole sub-image, which is then segmented. The segmented sub-images are merged to form the complete text segmentation image.
To achieve the above purpose, the technical solution of the present invention is as follows:
A graph-cut-based method for segmenting text images with complex backgrounds, characterized by comprising the following steps:
Step 1: coarsely segmenting the original text block image into several sub-images;
Step 2: determining the polarity of the whole text block image by judging the polarity of each sub-image;
Step 3: according to the polarity of the text block image and the inherent features of character strokes, automatically providing foreground and background points of high confidence as hard constraints for the graph cut;
Step 4: applying corresponding soft constraints to each sub-image according to the obtained hard constraints, propagating the hard constraints to the whole sub-image with the graph cut, and thereby obtaining the optimal segmentation of the sub-image;
Step 5: merging the optimally segmented sub-images to obtain the complete text segmentation image.
The present invention adopts a split-and-merge strategy: the text image is first coarsely divided into sub-images, and each sub-image is then processed separately, so the method has local spatial adaptivity and can handle complex-background text images with non-uniform backgrounds. At the same time, according to the inherent features of character strokes, the method automatically provides labels as hard constraints for the graph cut and, in combination with soft constraints, propagates these hard constraints to the whole sub-image, which is then segmented. The method achieves a good segmentation effect on text images with complex backgrounds.
Description of drawings
Fig. 1 is the flow chart of the graph-cut-based method for segmenting complex-background text images proposed by the present invention.
Fig. 2 is a schematic diagram of the result of dividing a text image into sub-images in the present invention.
Fig. 3 is a schematic diagram of the hard constraint acquisition criterion and its result in the present invention.
Fig. 4 is a schematic diagram of a text image segmentation result according to an embodiment of the present invention.
Embodiments
To make the purpose, technical solution and advantages of the present invention clearer, the present invention is explained further below with reference to specific embodiments and the accompanying drawings.
Fig. 1 is the flow chart of an embodiment of the method of the invention. With reference to Fig. 1, the graph-cut-based method for segmenting complex-background text images proposed by the present invention comprises the following steps.
Step 1: coarsely segment the original text block image into several sub-images.
First, an original text block image is input and its edge image is computed. The edge image is then subjected to connected component analysis; connected components whose characteristics conform to those of characters are selected as "seed" sub-images, and the original text block image is coarsely divided into several sub-images according to these "seed" sub-images.
After the "seed" sub-images are found and the original image is coarsely segmented according to them, every region outside the seed sub-image regions must also be considered, because this step must guarantee the completeness of the information: the regions beyond the seed sub-images are forcibly partitioned as well, to guarantee that all text falls into some sub-image.
The result of coarsely segmenting the input text block image into sub-images is shown in Fig. 2.
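As an illustration of this coarse-segmentation step, the sketch below labels the connected components of a tiny binary edge map with a BFS flood fill and returns their bounding boxes as "seed" sub-image regions. The 4-connectivity, the toy edge map and all helper names are assumptions made for illustration; the patent additionally filters components by character-like properties and forcibly partitions the regions outside the seeds, which is omitted here.

```python
from collections import deque

def connected_components(edge):
    """4-connected component labeling of a 0/1 grid via BFS flood fill.
    Returns (label grid, list of bounding boxes (top, left, bottom, right))."""
    h, w = len(edge), len(edge[0])
    labels = [[0] * w for _ in range(h)]
    boxes = []
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if edge[sy][sx] and not labels[sy][sx]:
                next_label += 1
                top, left, bot, right = sy, sx, sy, sx
                labels[sy][sx] = next_label
                q = deque([(sy, sx)])
                while q:
                    y, x = q.popleft()
                    top, left = min(top, y), min(left, x)
                    bot, right = max(bot, y), max(right, x)
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and edge[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = next_label
                            q.append((ny, nx))
                boxes.append((top, left, bot, right))
    return labels, boxes

# Two separate edge blobs yield two "seed" sub-image boxes.
edge_map = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
_, seed_boxes = connected_components(edge_map)
```

In a full pipeline the boxes would then be filtered by character-like criteria (size, aspect ratio) before being used as seeds.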
Step 2: determine the polarity of the whole text block image from the polarity of each segmented sub-image.
First, each sub-image is binarized with a conventional method and the stroke width Stroke_width_origin of its text is measured; the stroke widths Stroke_width_dilate and Stroke_width_erode of the text after dilating and after eroding the binarized sub-image are also measured. The polarity of the sub-image is then judged by the following rule: if the strokes widen after dilation and thin after erosion, the polarity of the sub-image, i.e. its foreground, is 1, otherwise it is 0:

    polarity = 1 if Stroke_width_dilate > Stroke_width_origin and Stroke_width_erode < Stroke_width_origin; otherwise polarity = 0.

Here the foreground is the sub-image foreground; polarity 1 means white text on a black background, and vice versa. Dilation and erosion use the same structuring element. To make the stroke width statistic objective, i.e. valid for sub-images of either polarity, the stroke width is preferably measured on the edge image: the binarized sub-image is first converted to its edge image, and the stroke width is then measured on that edge image.
Then the polarities of the sub-images are tallied, and the polarity of the whole text block image is chosen by voting.
Specifically: if the number of sub-images with polarity 1 in a text block image is greater than the number with polarity 0, the polarity of the text block image is 1.
Once the polarity of the text block image is determined to be 1, all sub-images it contains are considered to have polarity 1.
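The polarity rule and the vote above can be sketched on a single scan line of a binarized sub-image as follows. The 3-element structuring element, the mean-run-length stroke-width statistic and the majority-vote helper are illustrative assumptions, not the patent's exact measurements (the patent measures stroke width on the edge image of the 2-D sub-image).

```python
def dilate1d(row):
    """Binary dilation of a 0/1 row with a 3-element structuring element."""
    return [int(any(row[max(0, i - 1):i + 2])) for i in range(len(row))]

def erode1d(row):
    """Binary erosion of a 0/1 row with the same structuring element."""
    return [int(all(row[max(0, i - 1):i + 2])) for i in range(len(row))]

def mean_run(row):
    """Mean length of runs of 1s: a crude stroke-width statistic."""
    runs, cur = [], 0
    for v in row + [0]:
        if v:
            cur += 1
        elif cur:
            runs.append(cur)
            cur = 0
    return sum(runs) / len(runs) if runs else 0.0

def polarity(row):
    """1 if the strokes widen under dilation AND thin under erosion, else 0."""
    w = mean_run(row)
    if mean_run(dilate1d(row)) > w and mean_run(erode1d(row)) < w:
        return 1
    return 0

def vote(polarities):
    """Text-block polarity: majority vote over the sub-image polarities."""
    return 1 if sum(polarities) > len(polarities) / 2 else 0
```

On a row with thin strokes such as `[0, 0, 1, 0, 0, 0, 1, 1, 0, 0]` the runs widen from a mean of 1.5 to 3.5 under dilation and vanish under erosion, so the polarity is 1; a solid row stays solid and gets polarity 0.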
Step 3: according to the polarity of the whole text block image and the inherent features of character strokes, automatically provide foreground and background points of high confidence as hard constraints for the graph cut.
First, consider the characteristics of character strokes: 1) the strokes of one character generally have the same stroke width; 2) one stroke segment generally has uniform color or brightness; 3) for legibility, the color or brightness of the points near a stroke generally differs from that of the stroke itself. Based on these characteristics, each sub-image is scanned horizontally and vertically to obtain the brightness variation waveform of each scan line.
Then, candidate foreground and background points are determined from the brightness variation waveform and the polarity of the text block image. For example, if the polarity of the text block image is 1, i.e. the foreground is bright, peaks whose width lies between 1 and 7 pixels are chosen as candidate strokes, i.e. foreground, and troughs as candidate background; conversely, if the polarity is 0, troughs meeting the same width condition are chosen as candidate foreground and peaks as candidate background. Preferably, only the stroke points whose brightness lies above the mean brightness of their peak are taken as candidate strokes, i.e. foreground; the rest are candidate background.
Finally, the candidate foreground and background points are clustered, and the foreground and background points closer to their cluster centers are taken as the hard constraint points for the graph cut.
This is because the closer a point is to a cluster center, the more likely it is to belong to the foreground or background, i.e. the higher its confidence.
Fig. 3 illustrates how high-confidence foreground and background pixels are obtained for a sub-image from the polarity of the text block image combined with the inherent features of the text. The leftmost image in Fig. 3 is the original image; the middle image shows, as an example, the brightness variation curves of four horizontal scan lines of that image. Since the polarity of the text block image is 1, the peaks marked with black arrows are selected, and the corresponding points in the original image are the candidate foreground; the remainder is candidate background. Candidate foreground and background points are obtained automatically by this principle, and the higher-confidence foreground and background points are then selected by clustering as hard constraints, as shown in the rightmost image of Fig. 3, where white marks the high-confidence foreground points, black marks the high-confidence background points, and the remaining pixels are shown in gray.
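A rough sketch of the candidate selection for one horizontal scan line, assuming polarity 1 (bright text): runs above the line's mean brightness whose width is at most 7 pixels become candidate foreground, the rest candidate background, and the points nearest the candidates' mean intensity stand in for the "points close to the cluster center". The mean-threshold peak detector and the single-center confidence ranking are simplifications of the clustering described above; all names are illustrative.

```python
def candidate_points(scanline, polarity, max_w=7):
    """Split a scanline at its mean into bright/dark runs; runs on the
    text-colored side with width 1..max_w are candidate strokes (foreground),
    the other side is candidate background. Returns (fore_idx, back_idx)."""
    mean = sum(scanline) / len(scanline)
    fore, back = [], []
    i, n = 0, len(scanline)
    while i < n:
        j = i
        bright = scanline[i] > mean
        while j < n and (scanline[j] > mean) == bright:
            j += 1
        run = list(range(i, j))
        is_text = bright if polarity == 1 else not bright
        if is_text:
            if 1 <= len(run) <= max_w:
                fore += run
        else:
            back += run
        i = j
    return fore, back

def high_confidence(indices, scanline, keep_frac=0.5):
    """Keep the points closest to the candidates' mean intensity
    (a stand-in for 'near the cluster center')."""
    if not indices:
        return []
    center = sum(scanline[i] for i in indices) / len(indices)
    ranked = sorted(indices, key=lambda i: abs(scanline[i] - center))
    return ranked[:max(1, int(len(ranked) * keep_frac))]

# Bright 1- and 2-pixel peaks on a dark line become candidate strokes.
scanline = [10, 10, 200, 210, 10, 10, 205, 10, 10]
fore, back = candidate_points(scanline, polarity=1)
```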
Step 4: according to the obtained hard constraints, apply the corresponding soft constraints to each sub-image produced by the coarse segmentation of step 1, propagate the hard constraints to the whole sub-image with the graph cut, and thereby obtain the optimal segmentation of the sub-image.
First, the soft constraints for the graph cut are set according to the hard constraints obtained in step 3. All pixels of the sub-image are taken as the nodes of the "graph", with the 8 neighboring pixels of each node as its neighborhood. Let P denote the set of nodes and L = {L_1, L_2, ..., L_p, ...} the segmentation labels of the nodes, with L_p = 1 for foreground and L_p = 0 otherwise. The soft constraint is expressed by the loss function E(L) of the graph cut, which comprises a region loss R(L) and a boundary loss B(L), as shown below:

    E(L) = λR(L) + B(L),

where λ reflects the relative weight of R(L) and B(L);

    R(L) = Σ_{p∈P} R_p(L_p),

with p a node of the graph;

    B(L) = Σ_{{p,q}∈N} B_{p,q} · δ(L_p, L_q),

where p and q are two adjacent nodes of the graph, N is the set of neighboring pixel pairs, B_{p,q} is the boundary loss of the adjacent pair, and δ(L_p, L_q) is an indicator function equal to 1 when p and q carry different labels and 0 otherwise:

    δ(L_p, L_q) = 1 if L_p ≠ L_q, 0 otherwise.
The region loss is the cost of assigning a pixel to the foreground or the background. The region loss R_p(L_p) of each pixel comprises two parts, R_p(L_p) = R_p(0) + R_p(1), where R_p(1) is the cost of classifying the pixel as foreground and R_p(0) the cost of classifying it as background. It can therefore be stipulated that if the color of a pixel is close to the foreground color, its foreground loss R_p(1) should be small and its background loss R_p(0) large.
The region loss is computed concretely as follows:
1) Cluster the foreground and background points obtained in step 3 separately, into n and m classes respectively; let the foreground cluster centers be Center{fore}_n and the background cluster centers Center{back}_m.
2) For a pixel p, compute its distance to each cluster center, Dist{fore}_k^p (k = 1, 2, ..., n) and Dist{back}_k^p (k = 1, 2, ..., m).
3) R_p(1) and R_p(0) can then be defined as:

    R_p(1) = min_k { Dist{fore}_k^p }, k = 1, 2, ..., n
    R_p(0) = min_k { Dist{back}_k^p }, k = 1, 2, ..., m.
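A minimal sketch of this region-loss definition, using scalar gray values in place of the patent's color features; the cluster-center values below are made up for illustration.

```python
def region_loss(pixel, fore_centers, back_centers):
    """R_p(1): distance to the nearest foreground cluster center;
    R_p(0): distance to the nearest background cluster center."""
    r1 = min(abs(pixel - c) for c in fore_centers)  # cost of label 1
    r0 = min(abs(pixel - c) for c in back_centers)  # cost of label 0
    return r1, r0

# A pixel near a foreground center is cheap to label foreground
# and expensive to label background.
r1, r0 = region_loss(200, fore_centers=[205, 190], back_centers=[10, 60])
```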
The boundary loss B(L) is the loss caused by discontinuity between neighboring pixels, i.e. the penalty for separating them: if two neighboring pixels have similar features, B_{p,q} should be large, and small otherwise.
B_{p,q} can be set to a decreasing function of the distance between the neighboring pixels p and q; here the following function is adopted:

    B_{p,q} = exp( -(color_p - color_q)^2 / (2σ^2) ),

where color_p and color_q are the R, G, B color features of pixels p and q respectively, and σ is a scale factor, set to 0.25.
Then, the max-flow/min-cut algorithm is used to find the best segmentation satisfying the hard constraints. That is, the max-flow/min-cut algorithm finds, under the hard constraints, the sub-image segmentation that minimizes the above loss function E(L) (the soft constraint); this is the optimal segmentation result of the sub-image.
In other words, the graph cut propagates the hard constraints obtained in step 3 to the whole sub-image through the soft constraints defined in step 4 (the boundary loss and the region loss); solving for the minimum of the loss function yields the minimum cut, i.e. the segmentation result.
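Because the example here is tiny, the minimization can be illustrated by exhaustive search: the sketch below evaluates E(L) = λ·Σ R_p(L_p) + Σ B_{p,q}·[L_p ≠ L_q] over all labelings of a 5-pixel chain that respect the hard constraints. This brute force is only a stand-in for what max-flow/min-cut computes efficiently on real graphs, and the 1-D chain replaces the patent's 8-neighborhood; intensities are scaled to [0, 1].

```python
from itertools import product
from math import exp

def boundary_loss(cp, cq, sigma=0.25):
    """B_{p,q} = exp(-(c_p - c_q)^2 / (2*sigma^2)):
    similar neighbors cost more to separate."""
    return exp(-((cp - cq) ** 2) / (2 * sigma ** 2))

def best_labeling(pix, r1, r0, hard, lam=1.0, sigma=0.25):
    """Exhaustive stand-in for max-flow/min-cut on a 1-D chain:
    minimize E(L) subject to the hard labels in `hard` (index -> label)."""
    n = len(pix)
    best, best_e = None, float("inf")
    for lab in product((0, 1), repeat=n):
        if any(lab[i] != v for i, v in hard.items()):
            continue  # violates a hard constraint
        e = lam * sum(r1[i] if lab[i] == 1 else r0[i] for i in range(n))
        e += sum(boundary_loss(pix[i], pix[i + 1], sigma)
                 for i in range(n - 1) if lab[i] != lab[i + 1])
        if e < best_e:
            best, best_e = lab, e
    return best

# Two bright pixels between dark ones; hard constraints pin pixel 0 to
# background and pixel 2 to foreground, and the cut spreads the labels.
pix = [0.1, 0.12, 0.9, 0.88, 0.1]
r1 = [abs(v - 0.9) for v in pix]   # cost of calling the pixel foreground
r0 = [abs(v - 0.1) for v in pix]   # cost of calling the pixel background
labels = best_labeling(pix, r1, r0, hard={0: 0, 2: 1})
```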
Step 5: merge the optimally segmented sub-images to obtain the complete text segmentation image.
The optimally segmented sub-images are merged: the segmented sub-images, white text on a black background, are stitched together into the final binary segmentation image, i.e. the text segmentation image.
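The merge step can be sketched as pasting each segmented sub-image back into a black canvas at its bounding-box position, OR-ing any overlaps. The canvas size, piece format and overlap rule are illustrative assumptions.

```python
def merge(canvas_h, canvas_w, pieces):
    """Paste binarized sub-images into one black canvas to form the final
    white-on-black text image. `pieces` is a list of ((top, left), bitmap)."""
    out = [[0] * canvas_w for _ in range(canvas_h)]
    for (top, left), bitmap in pieces:
        for dy, row in enumerate(bitmap):
            for dx, v in enumerate(row):
                out[top + dy][left + dx] |= v  # OR-in overlapping pixels
    return out

# Two segmented sub-images placed back at their original positions.
full = merge(2, 4, [((0, 0), [[1, 0]]), ((1, 2), [[1, 1]])])
```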
The result of segmenting the sub-images with the graph cut according to the hard constraints combined with the soft constraints, and then merging the optimally segmented sub-images, is shown in Fig. 4. The upper picture in Fig. 4 is the original input text block image; the lower picture is the complete text segmentation image obtained after segmenting the sub-images with the graph cut and merging the optimally segmented sub-images.
The above is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or substitution that a person familiar with the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. The scope of protection of the present invention shall therefore be determined by the scope of protection of the claims.

Claims (10)

1. A graph-cut-based method for segmenting text images with complex backgrounds, characterized by comprising the following steps:
Step 1: coarsely segmenting the original text block image into several sub-images;
Step 2: determining the polarity of the whole text block image by judging the polarity of each sub-image;
Step 3: according to the polarity of the text block image and the inherent features of character strokes, automatically providing foreground and background points of high confidence as hard constraints for the graph cut;
Step 4: applying corresponding soft constraints to each sub-image according to the obtained hard constraints, propagating the hard constraints to the whole sub-image with the graph cut, and thereby obtaining the optimal segmentation of the sub-image;
Step 5: merging the optimally segmented sub-images to obtain the complete text segmentation image.
2. the method for claim 1 is characterized in that, said step 1 is specially:
Asking for urtext piece edge of image image, edge image is carried out the connected domain analysis obtain " seed " subgraph, is several subgraphs according to said " seed " subgraph with the rough segmentation of urtext piece image.
3. The method of claim 2, characterized in that, when the original text block image is coarsely segmented into several sub-images according to the "seed" sub-images, to guarantee the completeness of the information, the regions beyond the "seed" sub-image regions are forcibly partitioned as well, to guarantee that all text falls into some sub-image.
4. the method for claim 1 is characterized in that, judges in the said step 2 that the polarity of subgraph is specially:
Each subgraph is carried out initial binaryzation, the stroke width of statistics subgraph literal, and subgraph expands and the stroke width of corrosion back literal, attenuates if the stroke of subgraph expansion back literal broadens, corrodes the back stroke, then the polarity of this subgraph is 1, otherwise is 0.
5. the method for claim 1 is characterized in that, confirms in the said step 2 that the polarity of whole text block image is specially:
According to the polarity of subgraph, through choosing the polarity of whole text block image in a vote.
6. the method for claim 1 is characterized in that, said step 3 specifically comprises:
According to the characteristic that character stroke had, level, each sub-graphs of vertical scanning obtain changing oscillogram corresponding to the brightness of each subgraph respectively;
The polarity that changes oscillogram and text block image according to brightness is confirmed candidate foreground point and background dot;
Cluster is carried out in candidate foreground point and background dot, get the hard constraint point that cuts as figure from the nearer prospect of cluster centre point, background dot.
7. the method for claim 1 is characterized in that, the loss function that the soft-constraint in the said step 4 is cut for figure, said loss function E (L) comprise area loss R (L) and border loss B (L):
E(L)=λR(L)+B(L),
Wherein, λ is the proportion relation between R (L) and the B (L).
8. The method of claim 7, characterized in that the region loss R(L) is the cost of assigning a pixel to the foreground or the background:

    R(L) = Σ_{p∈P} R_p(L_p),

where p is a node of the graph and L_p is the segmentation label of node p; the region loss R_p(L_p) of each pixel comprises two parts:

    R_p(L_p) = R_p(0) + R_p(1),

where R_p(1) is the cost of classifying the pixel as foreground and R_p(0) the cost of classifying it as background.
9. The method of claim 7, characterized in that the boundary loss B(L) is the loss caused by discontinuity between neighboring pixels:

    B(L) = Σ_{{p,q}∈N} B_{p,q} · δ(L_p, L_q),

where p and q are two adjacent nodes of the graph, N is the set of neighboring pixel pairs, B_{p,q} is the boundary loss of the adjacent pair, and δ(L_p, L_q) is an indicator function.
10. the method for claim 1; It is characterized in that; Cut hard constraint is propagated into whole subgraph with figure in the said step 4; And then the optimum segmentation that obtains subgraph is specially: use max-flow/minimal cut algorithm to obtain the subgraph segmentation result that makes that under the hard constraint of step 3 soft-constraint is minimum, be the optimum segmentation result of subgraph.
CN 201110322549 2011-10-21 2011-10-21 Method for segmenting complex background text images based on image segmentation Active CN102332097B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110322549 CN102332097B (en) 2011-10-21 2011-10-21 Method for segmenting complex background text images based on image segmentation


Publications (2)

Publication Number Publication Date
CN102332097A true CN102332097A (en) 2012-01-25
CN102332097B CN102332097B (en) 2013-06-26

Family

ID=45483866

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110322549 Active CN102332097B (en) 2011-10-21 2011-10-21 Method for segmenting complex background text images based on image segmentation

Country Status (1)

Country Link
CN (1) CN102332097B (en)


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1588431A (en) * 2004-07-02 2005-03-02 清华大学 Character extracting method from complecate background color image based on run-length adjacent map
CN101030257A (en) * 2007-04-13 2007-09-05 中国传媒大学 File-image cutting method based on Chinese characteristics


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JINGCHAO ZHOU, ET AL.: "An Extraction Method of Video Text in Complex Background", 《INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND MULTIMEDIA APPLICATIONS 2007》 *
XIN ZHANG, ET AL.: "A Combined Algorithm for Video Text Extraction", 《2010 SEVENTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (FSKD 2010)》 *
LI Minhua, et al.: "A Method for Extracting Text from Complex-Background Images Based on Conditional Random Fields", Pattern Recognition and Artificial Intelligence *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102855636A (en) * 2012-06-26 2013-01-02 北京工业大学 Optimization method for foreground segmentation problem
CN102855636B (en) * 2012-06-26 2015-01-14 北京工业大学 Optimization method for foreground segmentation problem
CN103218810A (en) * 2013-03-27 2013-07-24 华北电力大学 Semantic segmentation method for power tower/pole images
CN103310450B (en) * 2013-06-17 2016-12-28 北京工业大学 A kind of image partition method merging direct-connected commensurability bundle
CN103310450A (en) * 2013-06-17 2013-09-18 北京工业大学 Method for segmenting images by aid of fused direct connection constraints
CN103927533A (en) * 2014-04-11 2014-07-16 北京工业大学 Intelligent processing method for graphics and text information in early patent document scanning copy
CN103927533B (en) * 2014-04-11 2017-03-01 北京工业大学 The intelligent processing method of graph text information in a kind of scanned document for earlier patents
US9633444B2 (en) 2014-05-05 2017-04-25 Xiaomi Inc. Method and device for image segmentation
CN105160300A (en) * 2015-08-05 2015-12-16 山东科技大学 Text extraction method based on level set segmentation
CN105160300B (en) * 2015-08-05 2018-08-21 山东科技大学 A kind of text abstracting method based on level-set segmentation
CN108734712A (en) * 2017-04-18 2018-11-02 北京旷视科技有限公司 The method, apparatus and computer storage media of background segment
CN108734712B (en) * 2017-04-18 2020-12-25 北京旷视科技有限公司 Background segmentation method and device and computer storage medium
CN108171237A (en) * 2017-12-08 2018-06-15 众安信息技术服务有限公司 A kind of line of text image individual character cutting method and device

Also Published As

Publication number Publication date
CN102332097B (en) 2013-06-26

Similar Documents

Publication Publication Date Title
CN102332097B (en) Method for segmenting complex background text images based on image segmentation
CN101615252B (en) Method for extracting text information from adaptive images
CN104751187B (en) Meter reading automatic distinguishing method for image
CN111860348A (en) Deep learning-based weak supervision power drawing OCR recognition method
CN102663377B (en) Character recognition method based on template matching
US8655070B1 (en) Tree detection form aerial imagery
CN107093172B (en) Character detection method and system
CN101453575B (en) Video subtitle information extracting method
CN101334836B (en) License plate positioning method incorporating color, size and texture characteristic
CN101122953B (en) Picture words segmentation method
CN110598690B (en) End-to-end optical character detection and recognition method and system
CN111160205B (en) Method for uniformly detecting multiple embedded types of targets in traffic scene end-to-end
CN103761531B (en) The sparse coding license plate character recognition method of Shape-based interpolation contour feature
CN100565559C (en) Image text location method and device based on connected component and support vector machine
CN102176208B (en) Robust video fingerprint method based on three-dimensional space-time characteristics
CN102663382B (en) Video image character recognition method based on submesh characteristic adaptive weighting
CN101276461B (en) Method for increasing video text with edge characteristic
CN102629322B (en) Character feature extraction method based on stroke shape of boundary point and application thereof
CN102968635B (en) Image visual characteristic extraction method based on sparse coding
CN1312625C (en) Character extracting method from complecate background color image based on run-length adjacent map
CN104680127A (en) Gesture identification method and gesture identification system
CN105608456A (en) Multi-directional text detection method based on full convolution network
CN104809481A (en) Natural scene text detection method based on adaptive color clustering
CN103336961A (en) Interactive natural scene text detection method
CN105205488A (en) Harris angular point and stroke width based text region detection method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20190703

Address after: 100098 Beijing Haidian District Zhichun Road 56 West District 8 Floor Central 801-803

Patentee after: INFAN TECHNOLOGY (BEIJING) CO., LTD.

Address before: 100190 Zhongguancun East Road, Haidian District, Beijing

Patentee before: Institute of Automation, Chinese Academy of Sciences