CN107093172A - Character detection method and system - Google Patents

Character detection method and system

Info

Publication number
CN107093172A
CN107093172A (application CN201610091568.8A)
Authority
CN
China
Prior art keywords
image
connected component
bounding box
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610091568.8A
Other languages
Chinese (zh)
Other versions
CN107093172B (en)
Inventor
徐昆
郭晓威
黄飞跃
郑宇飞
张惜今
卢艺帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tsinghua University
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University and Tencent Technology Shenzhen Co Ltd
Priority to CN201610091568.8A (granted as CN107093172B)
Priority to PCT/CN2017/073407 (published as WO2017140233A1)
Publication of CN107093172A
Application granted
Publication of CN107093172B
Legal status: Active
Anticipated expiration


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 - Image acquisition modality
    • G06T 2207/10004 - Still image; Photographic image
    • G06T 2207/10008 - Still image; Photographic image from scanner, fax or copier
    • G06T 2207/30 - Subject of image; Context of image processing
    • G06T 2207/30176 - Document

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Character Input (AREA)

Abstract

The invention discloses a character detection method and system. The method includes: performing color reduction on each of the three color-channel images of a target image to obtain a color-reduced image, and converting the target image into a binary image; merging connected components having the same color in the color-reduced image, and merging connected components having the same color in the binary image; merging, in the vertical and horizontal directions respectively and in a connected manner, the connected components of each color channel of the three color channels of the color-reduced image and the connected components of the binary image, to obtain candidate text regions in the target image; and extracting a specific region at the position of each candidate text region on the target image, and determining whether the extracted specific region contains a text row or a text column based on a comparison between the probability that the extracted specific region contains a text region and a predetermined probability threshold. With the invention, text in an image can be detected accurately.

Description

Character detection method and system
Technical field
The present invention relates to text detection techniques for images, and more particularly to a character detection method and system.
Background art
A document image is a document in picture format, obtained by converting a paper document in some way (such as scanning) into picture format for electronic reading by the user. Typical examples of document images are Portable Document Format (PDF) images and DjVu images.
Current text detection techniques can detect the text in a document image (locating the text-carrying regions in the image) and perform text recognition based on the detected text-carrying regions.
Images in the general sense include not only document images but also non-document images (namely non-scanned-format images, such as images uploaded by users to online photo albums). These images may be Joint Photographic Experts Group (JPG) images, bitmap (BMP) images, Tagged Image File Format (TIFF) images, Graphics Interchange Format (GIF) images, Exchangeable Image File Format (EXIF) images, and so on.
If the text in non-document images can be recognized, accurate semantic information can be obtained, helping users search for and manage images. To recognize the text in non-scanned-format images, detecting whether an image contains text is a necessary first step. Current text detection techniques mostly rely on manually specified features to determine whether an image contains text, and are mostly aimed at detecting Latin characters. Because Chinese differs significantly from English in glyph structure, their accuracy when applied to Chinese in document images differs considerably from their accuracy when detecting English in document images, making it difficult to meet the demands of practical applications.
Summary of the invention
Embodiments of the present invention provide a character detection method and system that can accurately detect text in an image.
The technical solutions of the embodiments of the present invention are implemented as follows.
In a first aspect, an embodiment of the present invention provides a character detection method, the method including:
performing color reduction on each of the three color-channel images of a target image to obtain a color-reduced image, and converting the target image into a binary image;
merging connected components having the same color in the color-reduced image, and merging connected components having the same color in the binary image;
merging, in the vertical and horizontal directions respectively and in a connected manner, the connected components of each color channel of the three color channels of the color-reduced image and the connected components of the binary image, to obtain candidate text regions in the target image; and
extracting a specific region at the position of each candidate text region on the target image, and determining whether the extracted specific region contains a text row or a text column based on a comparison between the probability that the extracted specific region contains a text region and a predetermined probability threshold.
Preferably, performing color reduction on each of the three color-channel images of the target image to obtain the color-reduced image includes:
quantizing each of the red, green and blue channels of the target image into K levels to obtain K quantization intervals; and
mapping the luminance of each pixel of the target image in the three RGB channels into the quantization intervals of the corresponding channel, where K is an integer and 255 > K > 1.
Preferably, merging the connected components having the same color in the color-reduced image, and merging the connected components having the same color in the binary image, includes:
treating each pixel in the color-reduced image and in the binary image as an individual connected component, and performing the following processing with a union-find structure built over the pixels:
if a pixel has the same color as any of its 8 neighboring pixels, merging the connected components to which the two adjacent same-colored pixels belong into one connected component; and
checking the pixel area of each connected component; if the pixel area of a connected component is smaller than a pixel-area threshold, merging that connected component into an adjacent connected component and setting its color to the color of the component it is merged into.
Preferably, after merging the connected components having the same color in the color-reduced image and merging the connected components having the same color in the binary image, the method further includes:
discarding connected components in the color-reduced image and in the binary image that satisfy a preset feature, the preset feature including at least one of the following:
the area of the connected component is smaller than the pixel-area threshold;
any edge length of the connected component exceeds a first preset proportion of the corresponding image side length; and
any side length of the connected component exceeds a border-length threshold and the ratio of its pixel area to the area of its bounding box is below a proportion threshold.
Preferably, after merging the connected components having the same color in the color-reduced image and merging the connected components having the same color in the binary image, the method further includes:
merging connected components of each color channel of the color-reduced image into new connected components based on their positional relationships, and merging connected components of the binary image into new connected components based on their positional relationships, the merging including performing at least one of the following:
merging connected components whose distance is smaller than a distance threshold;
taking the maximum of the averaged length and width of any two connected components, and merging the two connected components if that maximum satisfies a preset condition;
merging connected components whose bounding boxes intersect and whose intersection satisfies a preset intersection feature; and
merging connected components whose bounding boxes are aligned and satisfy a preset alignment-merging rule.
Preferably, merging the connected components of each color channel of the three color channels of the color-reduced image and the connected components of the binary image in the vertical and horizontal directions respectively and in a connected manner, to obtain the candidate text regions in the target image, includes:
performing, based on a connection merging rule, different types of merging in sequence: a merge in the horizontal direction, a merge in the vertical direction, and a further merge in the horizontal direction; wherein the connection merging rule includes:
connecting two connected components that satisfy at least one of the following conditions into a new connected component:
the smaller of the center distance and the edge distance of the bounding boxes of the two connected components along a reference axis is smaller than a first preset proportion of the shorter of the two bounding boxes' side lengths along the reference axis;
the distance between the bounding boxes of the two connected components in the direction perpendicular to the reference axis is smaller than a second preset proportion of the shorter of the two bounding boxes' side lengths along that direction; and
the difference between the side lengths of the two bounding boxes along the reference axis is smaller than a third preset proportion of the shorter of the two bounding boxes' side lengths along the reference axis.
Preferably, extracting a specific region at the position of each candidate text region on the target image, and determining whether the extracted specific region contains a text row or a text column based on a comparison between the probability that the extracted specific region contains a text region and the predetermined probability threshold, includes:
extracting a specific region on the target image, sliding a window with a specific sliding-window step over the bounding boxes obtained by connection in the color-reduced image and in the binary image, feeding the windows into a convolutional neural network classifier for discrimination, and obtaining the probability that each window contains text;
averaging the probabilities of the windows to obtain the probability that the candidate text region contains a text row or a text column; and
if the resulting probability is greater than the preset probability threshold, determining that the specific region contains a text row or a text column.
In a second aspect, an embodiment of the present invention provides a character detection system, the system including:
a color-reduction and binarization unit, configured to perform color reduction on each of the three color-channel images of a target image to obtain a color-reduced image, and to convert the target image into a binary image;
a first merging unit, configured to merge connected components having the same color in the color-reduced image, and to merge connected components having the same color in the binary image;
a second merging unit, configured to merge, in the vertical and horizontal directions respectively and in a connected manner, the connected components of each color channel of the three color channels of the color-reduced image and the connected components of the binary image, to obtain candidate text regions in the target image; and
a judging unit, configured to extract a specific region at the position of each candidate text region on the target image, and to determine whether the extracted specific region contains a text row or a text column based on a comparison between the probability that the extracted specific region contains a text region and a predetermined probability threshold.
Preferably, the color-reduction and binarization unit is further configured to quantize each of the red, green and blue channels of the target image into K levels to obtain K quantization intervals, and
to map the luminance of each pixel of the target image in the three RGB channels into the quantization intervals of the corresponding channel, where K is an integer and 255 > K > 1.
Preferably, the first merging unit is further configured to treat each pixel in the color-reduced image and in the binary image as an individual connected component and to perform the following processing with a union-find structure built over the pixels:
the first merging unit is further configured to, if a pixel has the same color as any of its 8 neighboring pixels, merge the connected components to which the two adjacent same-colored pixels belong into one connected component; and
the first merging unit is further configured to check the pixel area of each connected component and, if the pixel area of a connected component is smaller than a pixel-area threshold, merge that connected component into an adjacent connected component and set its color to the color of the component it is merged into.
Preferably, the system further includes:
a discarding unit, configured to, after the first merging unit merges the connected components having the same color in the color-reduced image and merges the connected components having the same color in the binary image, discard connected components in the color-reduced image and in the binary image that satisfy a preset feature, the preset feature including at least one of the following:
discarding connected components whose area is smaller than the pixel-area threshold;
discarding connected components any edge length of which exceeds the first preset proportion of the corresponding image side length; and
discarding connected components any side length of which exceeds the border-length threshold and whose ratio of pixel area to bounding-box area is below the proportion threshold.
Preferably, the system further includes:
a fourth merging unit, configured to, after the first merging unit merges the connected components having the same color in the color-reduced image and merges the connected components having the same color in the binary image, merge connected components of each color channel of the color-reduced image into new connected components based on their positional relationships, and merge connected components of the binary image into new connected components based on their positional relationships;
wherein the fourth merging unit is further configured to perform at least one of the following:
merging connected components whose distance is smaller than a distance threshold;
taking the maximum of the averaged length and width of any two connected components, and merging the two connected components if that maximum satisfies a preset condition;
merging connected components whose bounding boxes intersect and whose intersection satisfies a preset intersection feature; and
merging connected components whose bounding boxes are aligned and satisfy a preset alignment-merging rule.
Preferably, the second merging unit is further configured to perform, based on the connection merging rule, different types of merging in sequence: a merge in the horizontal direction, a merge in the vertical direction, and a further merge in the horizontal direction; wherein the connection merging rule includes:
connecting two connected components that satisfy at least one of the following conditions into a new connected component:
the smaller of the center distance and the edge distance of the bounding boxes of the two connected components along a reference axis is smaller than a first preset proportion of the shorter of the two bounding boxes' side lengths along the reference axis;
the distance between the bounding boxes of the two connected components in the direction perpendicular to the reference axis is smaller than a second preset proportion of the shorter of the two bounding boxes' side lengths along that direction; and
the difference between the side lengths of the two bounding boxes along the reference axis is smaller than a third preset proportion of the shorter of the two bounding boxes' side lengths along the reference axis.
Preferably, the judging unit is further configured to extract a specific region on the target image, slide a window with a specific sliding-window step over the bounding boxes obtained by connection in the color-reduced image and in the binary image, feed the windows into a convolutional neural network classifier for discrimination, and obtain the probability that each window contains text;
the judging unit is further configured to average the probabilities of the windows to obtain the probability that the candidate text region contains a text row or a text column; and
the judging unit is further configured to determine that the specific region contains a text row or a text column if the resulting probability is greater than the preset probability threshold.
In the embodiments of the present invention, the image is segmented by color into connected components, the connected components are taken as bounding boxes potentially containing text, and a convolutional neural network with sliding windows is then used to verify the probability that each bounding box contains a text row (or text column); when the probability exceeds the predetermined probability threshold, the bounding box is determined to contain a text row (or text column). This processing applies to both document images and non-document images, and can detect text in images accurately.
Brief description of the drawings
Fig. 1 is a first schematic flowchart of a character detection method according to an embodiment of the present invention;
Fig. 2 is a second schematic flowchart of a character detection method according to an embodiment of the present invention;
Fig. 3 to Fig. 6 are schematic diagrams of detection results of the character detection method in embodiments of the present invention;
Fig. 7 and Fig. 8 are schematic diagrams of the convolutional neural network in embodiments of the present invention;
Fig. 9 is a schematic diagram of an optional structure of a character detection system according to an embodiment of the present invention.
Embodiment
Embodiments of the present invention provide a method and system for detecting text in images, including scanned-format images and non-scanned-format images. The images here include not only conventional scanned-format images such as PDF documents, but also non-document images such as Joint Photographic Experts Group (JPG) images, bitmap (BMP) images, Tagged Image File Format (TIFF) images, Graphics Interchange Format (GIF) images, and Exchangeable Image File Format (EXIF) images.
The character detection system described in the embodiments of the present invention locates the text-carrying regions in an image by implementing the character detection method. The image on which the system performs text detection may be a document image such as a PDF document, or a non-document image such as a JPG, BMP, TIFF, GIF or EXIF image. Typical image sources are screenshots taken on electronic devices (such as smartphones, tablet computers and notebook computers), scanned electronic versions of printed matter such as posters and magazines, and other digital images containing printed Chinese characters.
Referring to Fig. 1, in an embodiment of the present invention, in step 101 color reduction is performed on each of the three color-channel images of a target image to obtain a color-reduced image, and the target image is converted into a binary image; in step 102, connected components having the same color in the color-reduced image are merged, and connected components having the same color in the binary image are merged; in step 103, the connected components of each color channel of the three color channels of the color-reduced image and the connected components of the binary image are merged in the vertical and horizontal directions respectively in a connected manner, to obtain candidate text regions in the target image; in step 104, a specific region is extracted at the position of each candidate text region on the target image, and whether the extracted specific region contains a text row or a text column is determined based on a comparison between the probability that the extracted specific region contains a text region and a predetermined probability threshold.
It can be seen that the character detection system, through color segmentation and layering of the image, merging and filtering of connected components, and discrimination based on a deep convolutional neural network, locates text rows in images such as those shown in Fig. 3 to Fig. 6 (or text columns; the rows may be rows of Chinese characters, lines of letters, digits and symbols such as English text, or rows formed by any combination of Chinese characters, letters, digits and symbols), so that the text in the located lines can subsequently be recognized.
The present invention is described in further detail below with reference to the drawings and specific embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit the present invention.
Embodiment one
Referring to Fig. 2, the method by which the character detection system of this embodiment detects text includes the following steps.
Step 201: color reduction is performed on the target image to obtain the color-reduced image of the target image.
The target image to be detected is input, and each of the red, green and blue (RGB) channels of the target image is quantized into K levels (K is an integer and 255 > K > 1, for example K = 4); that is, the luminance range of each of the three RGB channels is divided (for example evenly) into K intervals (bins), so that the luminance levels 0-255 are reduced to levels 0 to K-1, and the luminance of each pixel of the target image in the three RGB channels is mapped into the bins of the corresponding channel. Since each of the three RGB channels has 256 luminance levels (0-255), the target image can have 256^3 colors; after the luminance of each channel is divided into K intervals, the target image has K^3 colors (fewer than 256^3), yielding the color-reduced image f1.
Taking K = 2 as an example, each channel has two luminance levels, 0 and 1, after quantization; that is, values 0-127 of the luminance range 0-255 of each channel are mapped to quantized luminance 0, and values 128-255 are mapped to quantized luminance 1. For instance, if a pixel of the target image has luminance (0, 122, 255) in the three RGB channels, its luminance after color reduction is (0, 0, 1). This luminance mapping is applied to every pixel of the target image, as sketched below.
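As an illustration of the channel-wise quantization of step 201, the following is a minimal Python sketch using NumPy and OpenCV; the function name, the uniform (evenly sized) bins, and the input path are assumptions for illustration, not part of the patent text.

```python
import cv2
import numpy as np

def color_reduce(image_bgr, k=4):
    """Quantize each color channel of an image into k evenly sized luminance bins.

    image_bgr: HxWx3 uint8 array (note OpenCV loads images in BGR order).
    Returns an HxWx3 array with values in 0..k-1, i.e. the color-reduced
    image f1 with at most k**3 distinct colors.
    """
    bin_width = 256.0 / k                              # width of one quantization interval
    reduced = (image_bgr.astype(np.float32) // bin_width).astype(np.uint8)
    return np.clip(reduced, 0, k - 1)

# Example: with k=2, per-channel luminance (0, 122, 255) maps to (0, 0, 1).
if __name__ == "__main__":
    img = cv2.imread("target.jpg")                     # hypothetical input path
    f1 = color_reduce(img, k=4)
```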
Text in images generally falls into two cases: 1) the text is monochromatic; 2) the luminance of the text differs significantly from that of the region surrounding the text. For these two cases, step 201 achieves the following technical effect: the text in the color-reduced image has one of the K^3 colors.
Step 202: local binarization is performed on the target image to obtain the binary image of the target image.
The target image is converted into a grayscale image (with only one gray channel), and locally adaptive binarization is applied to the grayscale image: the grayscale image is divided into N windows, and for each of the N windows the pixels within the window are split into two classes according to a single threshold T, yielding the binary image f2, where T is the Gaussian-weighted sum of a window of preset size (for example 25*25 pixels) centered on the pixel.
Text in images generally falls into two cases: 1) the text is monochromatic; 2) the luminance of the text differs significantly from that of the region surrounding the text. For these two cases, step 202 achieves the following technical effect: the text in the binary image is either black or white.
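A minimal sketch of the locally adaptive binarization of step 202, assuming OpenCV's Gaussian adaptive threshold is an acceptable stand-in for the Gaussian-weighted windowed threshold described above; the 25*25 window comes from the text, while the constant offset c=0 is an assumption.

```python
import cv2

def binarize(image_bgr, block_size=25, c=0):
    """Convert the target image to grayscale and apply locally adaptive
    Gaussian binarization, producing the binary image f2."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    f2 = cv2.adaptiveThreshold(
        gray, 255,
        cv2.ADAPTIVE_THRESH_GAUSSIAN_C,   # threshold T is a Gaussian-weighted sum over the window
        cv2.THRESH_BINARY,
        block_size,                       # 25x25 neighbourhood, as in the description
        c)
    return f2
```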
The pixels corresponding to text in the color-reduced image obtained in step 201 and in the binary image obtained in step 202 have the same color. In step 203, each pixel is treated as a connected component and connected components with the same color are merged, thereby connecting the text together.
Step 203: connected components in the color-reduced image and in the binary image are identified; connected components having the same color in the color-reduced image are merged, and connected components having the same color in the binary image are merged.
For the connected components of each of the three RGB color channels of the color-reduced image f1, and for the connected components of the binary image f2 (which has only one gray channel), the following processing is performed:
1) Each pixel is treated as an individual connected component (i.e., a connected subgraph, a concept from graph theory: each pixel of the image is regarded as a vertex of an undirected graph, each pair of adjacent pixels is regarded as an edge, and the whole image is regarded as an undirected graph).
2) A union-find (disjoint-set) structure is built; union-find is a classical algorithm for carrying out the merging of connected components efficiently.
3) Every pixel of the color-reduced image f1 and of the binary image f2 is traversed and the following processing is performed:
Traverse the pixels of the color-reduced image f1: for a given pixel, if the color of the pixel is the same as that of any of its 8 neighboring pixels (the 8 adjacent pixels above, below, left, right and at the four diagonal corners), where the color of a pixel in an RGB channel means its luminance in that channel and the color of a pixel in the grayscale image means its gray value, the connected components to which the two adjacent same-colored pixels belong are merged into one connected component. Then each connected component is traversed and its pixel area is checked: if the pixel area of connected component k (k ranging over the number of connected components) is smaller than a pixel-area threshold (4 pixels), connected component k is merged into an adjacent connected component, and the color of connected component k is set to the color of the component it is merged into.
For example, for pixel i in the color-reduced image f1 (I1 >= i >= 1, where I1 is the number of pixels in f1), consider its luminance in some channel X of the three RGB channels (channel X is any of the RGB channels; assume the R channel here). If pixel i and any pixel j among its 8 neighbors (the 8 adjacent pixels above, below, left, right and at the four diagonal corners) have the same luminance in that channel (the R channel of the foregoing assumption), the connected component containing pixel i and the connected component containing pixel j are merged into one connected component. Then each connected component is traversed and its pixel area is checked: if the pixel area of connected component k (k ranging over the number of connected components) is smaller than the threshold (4 pixels), connected component k is merged into an adjacent connected component, and the luminance of the pixels in connected component k is set to that of the component it is merged into.
As another example, for pixel i in the grayscale image of the target image (I2 >= i >= 1, where I2 is the number of pixels in the grayscale image), if the color (gray value) of pixel i is the same as that of a pixel j among its 8 neighbors (the 8 pixels above, below, left, right and at the four diagonal corners), the connected components containing the adjacent pixels i and j are merged into one connected component. Then each connected component is traversed and its pixel area is checked: if the pixel area of connected component k is smaller than the threshold (4 pixels), connected component k is merged into an adjacent connected component, and the gray value of the pixels in connected component k is set to that of the pixels in the component it is merged into.
Step 203 groups the pixels belonging to the same character (or, for Chinese characters, at least the same stroke) into a single connected component for subsequent processing; a sketch follows.
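The following is a minimal sketch of the union-find based 8-connectivity labelling of step 203 for a single channel (one RGB channel of f1, or the binary image f2). The absorption of components smaller than the 4-pixel threshold is omitted for brevity, and the class and function names are illustrative.

```python
import numpy as np

class UnionFind:
    """Classical disjoint-set structure used to merge pixels into connected components."""
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def connected_components_8(channel):
    """Merge 8-adjacent pixels of identical value in one channel and return a label map."""
    h, w = channel.shape
    uf = UnionFind(h * w)
    for y in range(h):
        for x in range(w):
            # visiting 4 of the 8 neighbours suffices when scanning top-left to bottom-right
            for dy, dx in ((0, -1), (-1, -1), (-1, 0), (-1, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and channel[ny, nx] == channel[y, x]:
                    uf.union(ny * w + nx, y * w + x)
    labels = np.array([uf.find(i) for i in range(h * w)]).reshape(h, w)
    return labels
```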
The subsequent step 204 discards connected components in the color-reduced image and in the binary image that satisfy a preset feature (the preset feature here corresponds to features of non-text regions of the image).
Step 204: after the connected components in the color-reduced image and in the binary image have been merged, connected components in the color-reduced image and in the binary image that satisfy a preset feature (the preset feature corresponding to features of non-text regions of the image) are discarded.
For the connected components of each color channel of the color-reduced image f1, and for the connected components of the binary image f2, at least one of the following is performed:
1) Discard connected components whose area is still smaller than the pixel-area threshold (for example 4 pixels); such components are considered not to carry text.
2) Discard connected components corresponding to the background color: components any edge length of which exceeds a first preset proportion (for example 0.8 times) of the corresponding image side length.
3) Discard connected components corresponding to frames or borders: components any side length of which exceeds a border-length threshold (for example 65 pixels) and whose ratio of pixel area to bounding-box area is below a proportion threshold (for example 0.22). The bounding box of a connected component is the smallest rectangle containing all of its pixels, with sides parallel to the image x and y axes (and therefore uniquely determined). A filtering sketch follows.
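A minimal sketch of the step 204 filter, using the example thresholds from the text (4 pixels, 0.8, 65 pixels, 0.22); the component object and its .area and .bbox attributes are illustrative assumptions.

```python
def keep_component(comp, img_h, img_w,
                   min_area=4, bg_ratio=0.8,
                   border_len=65, fill_ratio=0.22):
    """Return False for connected components matching the non-text features of step 204.

    comp is assumed to expose comp.area (pixel count) and comp.bbox = (x, y, w, h).
    """
    x, y, w, h = comp.bbox
    if comp.area < min_area:                               # tiny speckles
        return False
    if w > bg_ratio * img_w or h > bg_ratio * img_h:       # background-sized blobs
        return False
    if (w > border_len or h > border_len) and comp.area < fill_ratio * w * h:
        return False                                       # long, sparsely filled frames
    return True
```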
Optionally, considering that the image may contain characters with disconnected strokes, such as Chinese characters, step 205 may also be performed to join the disconnected strokes of characters in the image (such as the i and j in Chinese and English text) together.
Step 205: connected components of each color channel of the color-reduced image are merged into new connected components based on their positional relationships (such as distance and intersection), and connected components of the binary image are merged into new connected components based on their positional relationships (such as distance and intersection).
1) Merge connected components whose distance is smaller than a distance threshold (the distance being the Chebyshev distance d between the center points of the bounding boxes of the two connected components).
2) Take the maximum of the averaged length and width of the two connected components, denoted ms (ms = max((a1+b1)/2.0, (a2+b2)/2.0), where a1, b1 are the length and width of the bounding box of the first connected component and a2, b2 are the length and width of the bounding box of the second), and take 0.4ms as the distance threshold. Then, if a preset condition is met, for example 0.4ms < 1, or 1 < 0.4ms < 3 and d < 3, the two selected connected components are merged.
3) For the connected components of each of the three RGB channels of the color-reduced image f1, and for the connected components of the binary image f2, merge connected components whose bounding boxes intersect and whose intersection satisfies a preset intersection feature. For example, if the bounding boxes of two connected components intersect, the area of the intersection exceeds a preset 10% of the smaller of the two bounding-box areas, and the intersection area is less than 10% of the image area, the two intersecting connected components are merged.
4) Merge connected components whose bounding boxes are aligned and satisfy a preset alignment-merging rule (alignment meaning that the bounding boxes of the components are aligned in the horizontal or vertical direction, i.e.: 1) the bounding boxes of the two components have consistent heights and consistent vertical positions; or 2) the bounding boxes of the two components have consistent widths and consistent horizontal positions).
An example of the alignment-merging rule: after merging the aligned connected components, if the increase in area of the bounding box of the two components (namely the smallest bounding box containing both bounding boxes) relative to the sum of the two original bounding-box areas is below an area-increase proportion threshold (for example 10%), and the area of the merged bounding box is below a proportion threshold of the image area (for example 10%), the bounding boxes of the two connected components are merged. A sketch of the distance rule follows.
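A minimal sketch of the distance-based merge test of step 205. The Chebyshev distance and the 0.4ms threshold come from the text; since the threshold condition in the original is only partially specified, the final fallback comparison d < 0.4ms is an assumption and is marked as such.

```python
def chebyshev(c1, c2):
    """Chebyshev distance between two bounding-box centre points (x, y)."""
    return max(abs(c1[0] - c2[0]), abs(c1[1] - c2[1]))

def should_merge_by_distance(box1, box2):
    """Distance-based merge test of step 205; boxes are (x, y, w, h)."""
    x1, y1, w1, h1 = box1
    x2, y2, w2, h2 = box2
    c1 = (x1 + w1 / 2.0, y1 + h1 / 2.0)
    c2 = (x2 + w2 / 2.0, y2 + h2 / 2.0)
    d = chebyshev(c1, c2)
    ms = max((w1 + h1) / 2.0, (w2 + h2) / 2.0)   # larger of the two mean side lengths
    threshold = 0.4 * ms
    if threshold < 1:                            # very small components: merge
        return True
    if 1 < threshold < 3 and d < 3:              # medium components: need d < 3
        return True
    return d < threshold                         # assumption: general case uses 0.4ms as distance threshold
```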
Step 206: the connected components of each color channel of the three RGB channels of the color-reduced image f1, and the connected components of the binary image f2, are merged in the vertical and horizontal directions respectively in a connected manner, to obtain candidate text regions in the image (including text-row regions and text-column regions).
The purpose is to connect individual characters (such as Chinese characters) into text rows or columns: based on the connection merging rule (the horizontal merge and the vertical merge use the same connection merging rule, described below), the connected components are first merged in the horizontal direction, then merged once more in the vertical direction, and finally merged once more in the horizontal direction.
Because horizontally arranged text is generally more common in images than vertically arranged text, step 206 first merges the connected components in the horizontal direction, ensuring that horizontally arranged text is merged first and reducing the chance that horizontal text is mistakenly merged vertically; it then performs the vertical merge on connected components that do not satisfy the horizontal merging rule but do satisfy the vertical merging rule. Since the bounding boxes of the connected components may change during this process, producing new bounding-box pairs that satisfy the horizontal merging rule, the horizontal merge of connected components is performed one more time.
An example of the connection merging rule: two connected components whose bounding boxes satisfy at least one of the following conditions are connected into a new connected component (a sketch follows this list):
1) The smaller of the center distance of the two bounding boxes along the reference axis (the horizontal or vertical axis; the distance between the center coordinates of the two bounding boxes along the reference axis) and their edge distance (the distance between the edge coordinates of the two bounding boxes along the reference axis) is smaller than a first preset proportion (for example 0.15 times) of the shorter of the two bounding boxes' side lengths along the reference axis (the side parallel to the reference axis).
Because the coordinate ranges of the two bounding boxes along the reference axis may be either disjoint or partially overlapping, taking the smaller of the center distance and the edge distance characterizes most accurately the distance between the bounding boxes of the two connected components along the reference axis.
2) The distance between the bounding boxes of the two connected components in the direction perpendicular to the reference axis is smaller than a second preset proportion (for example twice) of the shorter of the two bounding boxes' side lengths perpendicular to the reference axis.
3) The difference between the side lengths of the bounding boxes of the two connected components along the reference axis is smaller than a third preset proportion (for example 30%) of the shorter of the two bounding boxes' side lengths along the reference axis.
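A minimal sketch of the connection merging rule for one reference axis. The 0.15 / 2.0 / 0.30 proportions are the examples from the text; treating the perpendicular "distance" as a centre offset and clamping a negative edge distance (overlap) to zero are assumptions; the "at least one condition" wording is implemented literally as a logical OR.

```python
def may_connect(box1, box2, p1=0.15, p2=2.0, p3=0.30, axis=0):
    """Connection merging rule of step 206 (axis=0: horizontal merge, axis=1: vertical).

    Boxes are (x, y, w, h); returns True if the two components may be connected.
    """
    x1, y1, w1, h1 = box1
    x2, y2, w2, h2 = box2
    # side lengths and positions along the reference axis / perpendicular to it
    along1, along2 = (w1, w2) if axis == 0 else (h1, h2)
    perp1, perp2 = (h1, h2) if axis == 0 else (w1, w2)
    lo1, lo2 = (x1, x2) if axis == 0 else (y1, y2)
    plo1, plo2 = (y1, y2) if axis == 0 else (x1, x2)

    centre_dist = abs((lo1 + along1 / 2.0) - (lo2 + along2 / 2.0))
    edge_dist = max(lo1, lo2) - min(lo1 + along1, lo2 + along2)   # negative means overlap
    gap_along = min(centre_dist, max(edge_dist, 0.0))
    gap_perp = abs((plo1 + perp1 / 2.0) - (plo2 + perp2 / 2.0))   # assumption: centre offset

    cond1 = gap_along < p1 * min(along1, along2)
    cond2 = gap_perp < p2 * min(perp1, perp2)
    cond3 = abs(along1 - along2) < p3 * min(along1, along2)
    return cond1 or cond2 or cond3
```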
Step 207: a specific region is extracted at the position, on the target image, of the bounding box corresponding to the connected components that have been connected together (namely the candidate text region containing a text row or text column); for each extracted specific region, whether the region contains a text row or a text column is determined based on the probability that the specific region contains a text row or text column.
The preceding steps 201 to 206 connect the color-reduced image f1 and the binary image f2 into bounding boxes, i.e., the new bounding box obtained as the union of the bounding boxes connected into one row; it is rectangular in shape and is a potential region containing a text row or text column (namely a candidate text region). A region of interest (ROI, i.e., the specific region mentioned above, the region to be processed, outlined on the target image I by a rectangle, circle, ellipse, irregular polygon, or the like) is extracted on the target image I. With a specific sliding-window step, for example using the shorter side length S of the region as the window side length and 0.5S as the sliding step, windows are slid over the region and fed into a pre-trained convolutional neural network (CNN) classifier for discrimination, giving the probability p_w that each window contains text. All p_w are averaged to obtain the probability p_l that the candidate text region is a text row (or text column); if p_l exceeds a preset probability threshold (0.5 is used), the region of interest is determined to contain a text row (or text column).
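A minimal sketch of the sliding-window CNN verification of step 207. The window side S = shorter ROI side, the 0.5S step, the 32*32 input size and the 0.5 threshold come from the text; the cnn_predict callable is a stand-in for the trained classifier, and the sketch assumes a horizontal text row (a vertical column would slide along y analogously).

```python
import cv2
import numpy as np

def verify_text_region(roi_gray, cnn_predict, prob_threshold=0.5):
    """Return (is_text, p_l) for one candidate text region (ROI).

    roi_gray: 2-D array for the ROI.
    cnn_predict: callable giving the probability that a 32x32 window contains text.
    """
    h, w = roi_gray.shape
    s = min(h, w)                         # window side length S
    step = max(1, int(0.5 * s))           # sliding-window step 0.5 * S
    probs = []
    for x in range(0, max(1, w - s + 1), step):
        window = cv2.resize(roi_gray[0:s, x:x + s], (32, 32))
        probs.append(cnn_predict(window))     # p_w for this window
    p_l = float(np.mean(probs))               # averaged probability for the row
    return p_l > prob_threshold, p_l
```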
Step 208: overlapping bounding boxes are merged into a single bounding box, which is output as a text-containing region.
Steps 201 to 204 ensure the positional accuracy of the bounding boxes (that is, of the potential text regions): even if a bounding box contains some other image element rather than a text row (or text column), that image element can be accurately discarded, and the probability-threshold filtering ensures that the bounding boxes that pass the filter all contain text rows (or text columns). Since the filtered bounding boxes already have accurate positions, non-maximum suppression is not needed; all overlapping bounding boxes are simply merged into one bounding box and output, as sketched below.
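A minimal sketch of the box merging of step 208, assuming (x1, y1, x2, y2) corner coordinates; a simple O(n^2) sweep is used, since no non-maximum suppression is required.

```python
def merge_overlapping_boxes(boxes):
    """Repeatedly merge any two overlapping boxes into their union until none overlap."""
    def overlaps(a, b):
        return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if overlaps(boxes[i], boxes[j]):
                    a, b = boxes[i], boxes[j]
                    boxes[j] = (min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3]))
                    del boxes[i]          # keep the union, drop the absorbed box
                    merged = True
                    break
            if merged:
                break
    return boxes
```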
Training of the convolutional neural network:
Chinese characters in the collected data (images containing text) are annotated; the output of step 206 (before CNN filtering) is then screened and the parts close to the annotations are selected. The bounding boxes are cut into sliding windows according to the method of step 207, the windows belonging to text and those not belonging to text are separated manually, and all windows are scaled to 32*32 pixels.
These windows are built into training and validation data, and the neural networks shown in Fig. 7 and Fig. 8 are trained; during training each sample is randomly cropped to 27*27 pixels around a random center and randomly flipped. Training uses stochastic gradient descent (SGD) with a batch size (batch_size) of 50, a weight decay (weight_decay) of 0.0005 and a momentum of 0.9; the learning rate is computed as lr = base_lr * (1 + 0.0001 * iter)^(-0.75), where iter is the iteration number, and base_lr is 0.001 for the first 100,000 iterations and 0.0001 thereafter.
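The learning-rate schedule above can be expressed directly in code; the following short sketch only restates the formula and the base_lr switch given in the text.

```python
def learning_rate(iteration):
    """Inverse-decay schedule: lr = base_lr * (1 + 0.0001 * iter) ** (-0.75),
    with base_lr = 0.001 for the first 100,000 iterations and 0.0001 afterwards."""
    base_lr = 0.001 if iteration < 100_000 else 0.0001
    return base_lr * (1 + 0.0001 * iteration) ** (-0.75)

# Example: learning_rate(0) == 0.001, and the rate decays smoothly as iter grows.
```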
An embodiment of the present invention provides a character detection system, referring to Fig. 9, including:
a color-reduction and binarization unit 100, configured to perform color reduction on each of the three color-channel images of a target image to obtain a color-reduced image, and to convert the target image into a binary image;
a first merging unit 200, configured to merge connected components having the same color in the color-reduced image, and to merge connected components having the same color in the binary image;
a second merging unit 300, configured to merge, in the vertical and horizontal directions respectively and in a connected manner, the connected components of each color channel of the three color channels of the color-reduced image and the connected components of the binary image, to obtain candidate text regions in the target image; and
a judging unit 400, configured to extract a specific region at the position of each candidate text region on the target image, and to determine whether the extracted specific region contains a text row or a text column based on a comparison between the probability that the extracted specific region contains a text region and a predetermined probability threshold.
Preferably, the color-reduction and binarization unit 100 is further configured to quantize each of the three RGB channels of the target image into K levels to obtain K quantization intervals, and
to map the luminance of each pixel of the target image in the three RGB channels into the quantization intervals of the corresponding channel, where K is an integer and 255 > K > 1.
Preferably, the first merging unit 200 is further configured to treat each pixel in the color-reduced image and in the binary image as an individual connected component and to perform the following processing with a union-find structure built over the pixels:
the first merging unit 200 is further configured to, if a pixel has the same color as any of its 8 neighboring pixels, merge the connected components to which the two adjacent same-colored pixels belong into one connected component; and
the first merging unit 200 is further configured to check the pixel area of each connected component and, if the pixel area of a connected component is smaller than a pixel-area threshold, merge that connected component into an adjacent connected component and set its color to the color of the component it is merged into.
Preferably, the system further includes:
a discarding unit 500, configured to, after the first merging unit 200 merges the connected components having the same color in the color-reduced image and merges the connected components having the same color in the binary image, discard connected components in the color-reduced image and in the binary image that satisfy a preset feature, the preset feature including at least one of the following:
discarding connected components whose area is smaller than the pixel-area threshold;
discarding connected components any edge length of which exceeds the first preset proportion of the corresponding image side length; and
discarding connected components any side length of which exceeds the border-length threshold and whose ratio of pixel area to bounding-box area is below the proportion threshold.
Preferably, the system further includes:
a third merging unit 600, configured to, after the first merging unit 200 merges the connected components having the same color in the color-reduced image and merges the connected components having the same color in the binary image, merge connected components of each color channel of the color-reduced image into new connected components based on their positional relationships, and merge connected components of the binary image into new connected components based on their positional relationships;
wherein the third merging unit 600 is further configured to perform at least one of the following:
merging connected components whose distance is smaller than a distance threshold;
taking the maximum of the averaged length and width of any two connected components, and merging the two connected components if that maximum satisfies a preset condition;
merging connected components whose bounding boxes intersect and whose intersection satisfies a preset intersection feature; and
merging connected components whose bounding boxes are aligned and satisfy a preset alignment-merging rule.
Preferably, the second merging unit 300 is further configured to perform, based on the connection merging rule, different types of merging in sequence: a merge in the horizontal direction, a merge in the vertical direction, and a further merge in the horizontal direction; wherein the connection merging rule includes:
connecting two connected components that satisfy at least one of the following conditions into a new connected component:
the smaller of the center distance and the edge distance of the bounding boxes of the two connected components along a reference axis is smaller than a first preset proportion of the shorter of the two bounding boxes' side lengths along the reference axis;
the distance between the bounding boxes of the two connected components in the direction perpendicular to the reference axis is smaller than a second preset proportion of the shorter of the two bounding boxes' side lengths along that direction; and
the difference between the side lengths of the two bounding boxes along the reference axis is smaller than a third preset proportion of the shorter of the two bounding boxes' side lengths along the reference axis.
Preferably, the judging unit 400 is further configured to extract a region of interest on the target image, slide a window with a specific sliding-window step over the bounding boxes obtained by connection in the color-reduced image and in the binary image, feed the windows into a convolutional neural network classifier for discrimination, and obtain the probability that each window contains text;
the judging unit 400 is further configured to average the probabilities of the windows to obtain the probability that the candidate text region contains a text row or a text column; and
the judging unit 400 is further configured to determine that the region of interest contains a text row or a text column if the resulting probability is greater than the preset probability threshold.
An embodiment of the present invention provides a computer storage medium storing executable instructions, the executable instructions being used to perform the character detection method shown in Fig. 1 or Fig. 2.
In summary, the embodiments of the present invention have the following beneficial effects:
The present invention proposes a method and system for detecting text in images, suitable for locating text such as printed Chinese characters in images in online photo albums; the output can serve as the input of a character recognition system, helping to produce accurate text recognition results.
Those skilled in the art will appreciate that all or part of the steps of the above method embodiments may be implemented by hardware related to program instructions; the aforementioned program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments; the aforementioned storage medium includes various media capable of storing program code, such as a removable storage device, a random access memory (RAM), a read-only memory (ROM), a magnetic disk, or an optical disc.
Alternatively, when the above integrated units of the present invention are implemented in the form of software function modules and sold or used as independent products, they may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present invention, in essence or in the part contributing to the related art, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a removable storage device, a RAM, a ROM, a magnetic disk, or an optical disc.
The foregoing is merely a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement readily conceivable by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the scope of the claims.

Claims (14)

1. A character detection method, characterized in that the method includes:
performing color reduction on each of the three color-channel images of a target image to obtain a color-reduced image, and converting the target image into a binary image;
merging connected components having the same color in the color-reduced image, and merging connected components having the same color in the binary image;
merging, in the vertical and horizontal directions respectively and in a connected manner, the connected components of each color channel of the three color channels of the color-reduced image and the connected components of the binary image, to obtain candidate text regions in the target image; and
extracting a specific region at the position of each candidate text region on the target image, and determining whether the extracted specific region contains a text row or a text column based on a comparison between the probability that the extracted specific region contains a text region and a predetermined probability threshold.
2. The method according to claim 1, characterized in that performing color reduction on each of the three color-channel images of the target image to obtain the color-reduced image includes:
quantizing each of the red, green and blue channels of the target image into K levels to obtain K quantization intervals; and
mapping the luminance of each pixel of the target image in the three RGB channels into the quantization intervals of the corresponding channel, where K is an integer and 255 > K > 1.
3. The method according to claim 1, characterized in that merging the connected components having the same color in the color-reduced image, and merging the connected components having the same color in the binary image, includes:
treating each pixel in the color-reduced image and in the binary image as an individual connected component, and performing the following processing with a union-find structure built over the pixels:
if a pixel has the same color as any of its 8 neighboring pixels, merging the connected components to which the two adjacent same-colored pixels belong into one connected component; and
checking the pixel area of each connected component; if the pixel area of a connected component is smaller than a pixel-area threshold, merging that connected component into an adjacent connected component and setting its color to the color of the component it is merged into.
4. The method according to claim 1, characterized in that, after merging the connected components having the same color in the color-reduced image and merging the connected components having the same color in the binary image, the method further includes:
discarding connected components in the color-reduced image and in the binary image that satisfy a preset feature, the preset feature including at least one of the following:
the area of the connected component is smaller than the pixel-area threshold;
any edge length of the connected component exceeds a first preset proportion of the corresponding image side length; and
any side length of the connected component exceeds a border-length threshold and the ratio of its pixel area to the area of its bounding box is below a proportion threshold.
5. The method according to claim 1, characterized in that after merging the connected blocks having the same color in the color-reduced image and merging the connected blocks having the same color in the binary image, the method further comprises:
merging the connected blocks of each color channel of the color-reduced image into new connected blocks based on their positional relationship, and merging the connected blocks in the binary image into new connected blocks based on their positional relationship; wherein the merging comprises performing at least one of the following:
merging connected blocks whose mutual distance is smaller than a distance threshold;
taking the maximum of the respective length-width averages of any two connected blocks, and merging the two selected connected blocks if that maximum satisfies a preset condition;
merging connected blocks whose bounding boxes intersect and whose intersection satisfies a preset crossing feature;
merging connected blocks whose bounding-box alignment satisfies a preset alignment-merge rule.
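A sketch of the first merge rule of claim 5 (mutual distance below a threshold), grouping bounding boxes with a small union-find over blocks; the gap metric and the threshold value are assumptions, and the other three rules are not shown.

```python
def bbox_gap(a, b):
    """Smallest axis-aligned gap between two bounding boxes (0 if they touch or overlap)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    dx = max(bx0 - ax1, ax0 - bx1, 0)
    dy = max(by0 - ay1, ay0 - by1, 0)
    return (dx * dx + dy * dy) ** 0.5

def merge_close_blocks(bboxes, dist_threshold=5.0):
    """Group bounding boxes whose gap is below the distance threshold."""
    parent = list(range(len(bboxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(bboxes)):
        for j in range(i + 1, len(bboxes)):
            if bbox_gap(bboxes[i], bboxes[j]) < dist_threshold:
                parent[find(j)] = find(i)
    groups = {}
    for i in range(len(bboxes)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())   # each group of indices becomes one new block
```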
6. The method according to claim 1, characterized in that merging, for the connected blocks of each of the three color channels of the color-reduced image and for the connected blocks in the binary image, the connected blocks in the vertical direction and in the horizontal direction respectively in a connected manner to obtain the candidate text regions in the target image comprises:
performing, according to different types of connection-merge rules in turn, a merge in the horizontal direction, a merge in the vertical direction, and a further merge in the horizontal direction; wherein the connection-merge rules comprise:
selecting two connected blocks that satisfy at least one of the following conditions and connecting them into a new connected block:
the minimum of the center distance and the edge distance of the bounding boxes of the two connected blocks along a reference axis is smaller than a first preset ratio of the smallest of the corresponding edge lengths, along the reference axis, of the bounding boxes of the two connected blocks;
the distance between the bounding boxes of the two connected blocks along the reference axis is smaller than a second preset ratio of the smallest of the edge lengths, along the reference axis, of the bounding boxes of the two connected blocks;
the difference between the edge lengths, along the reference axis, of the bounding boxes of the two connected blocks is smaller than a third preset ratio of the smallest of the corresponding edge lengths, along the reference axis, of the bounding boxes of the two connected blocks.
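A simplified sketch in the spirit of the merge conditions of claim 6, taking the horizontal direction as the merge direction; the exact axis conventions and the preset ratios of the claim are not reproduced, so the geometric tests below are an approximation rather than the claimed rules.

```python
def can_merge_horizontally(a, b, r1=0.5, r2=1.0, r3=0.5):
    """Approximate horizontal-merge test for two bounding boxes (x0, y0, x1, y1).

    The ratios r1-r3 stand in for the first/second/third preset ratios and are
    illustrative values only.
    """
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    ha, hb = ay1 - ay0, by1 - by0
    min_h = min(ha, hb)

    # 1. Vertical offset (centre distance or edge distance, whichever is smaller)
    #    small relative to the smaller box height.
    centre_dy = abs((ay0 + ay1) / 2 - (by0 + by1) / 2)
    edge_dy = min(abs(ay0 - by0), abs(ay1 - by1))
    cond1 = min(centre_dy, edge_dy) < r1 * min_h

    # 2. Horizontal gap small relative to the smaller box height.
    gap = max(bx0 - ax1, ax0 - bx1, 0)
    cond2 = gap < r2 * min_h

    # 3. Similar box heights.
    cond3 = abs(ha - hb) < r3 * min_h

    # The claim requires at least one condition to hold; a practical
    # implementation may instead require all three.
    return cond1 or cond2 or cond3
```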
7. The method according to any one of claims 1 to 6, characterized in that extracting the specific regions at the positions on the target image corresponding to the candidate text regions and determining, based on the comparison between the probability that an extracted specific region contains a text region and the preset probability threshold, whether the extracted specific region contains a text row or a text column comprises:
extracting a specific region on the target image, sliding a window with a specific stride over the bounding boxes obtained by connection in the color-reduced image and in the binary image, feeding the windowed bounding-box content into a convolutional-neural-network classifier for discrimination, and obtaining the probability that each sliding window contains text;
averaging the text probabilities of the sliding windows to obtain the probability that the candidate text region contains a text row or a text column;
determining, if the resulting probability is greater than the preset probability threshold, that the specific region contains a text row or a text column.
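A sketch of the verification step of claim 7: a window slides over a candidate region with a fixed stride, a classifier scores each window, and the window probabilities are averaged and compared against the threshold. `classify_window` stands in for the convolutional-neural-network classifier, whose architecture the claim does not specify.

```python
import numpy as np

def region_contains_text(image, bbox, classify_window,
                         win=32, stride=16, prob_threshold=0.5):
    """Decide whether the candidate region bounded by bbox contains a text row/column.

    classify_window(patch) -> float in [0, 1] is assumed to be a trained CNN
    classifier; it is a placeholder here.
    """
    x0, y0, x1, y1 = bbox
    probs = []
    for y in range(y0, max(y1 - win, y0) + 1, stride):
        for x in range(x0, max(x1 - win, x0) + 1, stride):
            patch = image[y:y + win, x:x + win]
            probs.append(classify_window(patch))
    if not probs:
        return False, 0.0
    mean_prob = float(np.mean(probs))
    return mean_prob > prob_threshold, mean_prob

# Example with a dummy classifier that always returns 0.7:
# contains, p = region_contains_text(img, (10, 10, 200, 60), lambda patch: 0.7)
```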
8. A text detection system, characterized in that the system comprises:
a color-reduction and binarization processing unit, configured to perform color-reduction processing on the image of each of the three color channels of a target image to obtain a color-reduced image, and to convert the target image into a binary image;
a first merging unit, configured to merge connected blocks having the same color in the color-reduced image, and to merge connected blocks having the same color in the binary image;
a second merging unit, configured to merge, for the connected blocks of each of the three color channels of the color-reduced image and for the connected blocks in the binary image, the connected blocks in the vertical direction and in the horizontal direction respectively in a connected manner, to obtain candidate text regions in the target image;
a determination unit, configured to extract specific regions at the positions on the target image that correspond to the candidate text regions, and to determine, based on a comparison between the probability that an extracted specific region contains a text region and a preset probability threshold, whether the extracted specific region contains a text row or a text column.
9. The system according to claim 8, characterized in that:
the color-reduction and binarization processing unit is further configured to quantize each of the red, green and blue channels of the target image into K levels to obtain K quantization intervals, and
to map the brightness of each pixel of the target image in each of the three RGB channels into the quantization interval of the corresponding channel, K being an integer with 255 > K > 1.
10. The system according to claim 8, characterized in that:
the first merging unit is further configured to take each pixel in the color-reduced image and in the binary image as an individual connected block, and to perform the following processing on a union-find (disjoint-set) structure built for the pixels:
the first merging unit is further configured to, if a pixel has the same color as any of its 8 adjacent pixels, merge the connected blocks to which the two adjacent pixels of the same color belong into one connected block;
the first merging unit is further configured to evaluate the pixel area of each connected block, and if the pixel area of a connected block is smaller than a pixel-area threshold, merge that connected block into an adjacent connected block and set its color to the color of the connected block into which it is merged.
11. The system according to claim 8, characterized in that the system further comprises:
a discarding unit, configured to discard, after the first merging unit merges the connected blocks having the same color in the color-reduced image and merges the connected blocks having the same color in the binary image, the connected blocks in the color-reduced image and in the binary image that satisfy a preset feature, the preset feature comprising at least one of:
a connected block whose area is smaller than a pixel-area threshold;
a connected block any of whose edge lengths exceeds a first preset ratio of the corresponding edge length of the image;
a connected block any of whose edge lengths exceeds an edge-length threshold and whose ratio of pixel area to bounding-box area is smaller than a ratio threshold.
12. The system according to claim 8, characterized in that the system further comprises:
a fourth merging unit, configured to merge, after the first merging unit merges the connected blocks having the same color in the color-reduced image and merges the connected blocks having the same color in the binary image, the connected blocks of each color channel of the color-reduced image into new connected blocks based on their positional relationship, and to merge the connected blocks in the binary image into new connected blocks based on their positional relationship;
wherein the fourth merging unit is further configured to perform at least one of the following:
merging connected blocks whose mutual distance is smaller than a distance threshold;
taking the maximum of the respective length-width averages of any two connected blocks, and merging the two selected connected blocks if that maximum satisfies a preset condition;
merging connected blocks whose bounding boxes intersect and whose intersection satisfies a preset crossing feature;
merging connected blocks whose bounding-box alignment satisfies a preset alignment-merge rule.
13. The system according to claim 8, characterized in that:
the second merging unit is further configured to perform, according to different types of connection-merge rules in turn, a merge in the horizontal direction, a merge in the vertical direction, and a further merge in the horizontal direction; wherein the connection-merge rules comprise:
selecting two connected blocks that satisfy at least one of the following conditions and connecting them into a new connected block:
the minimum of the center distance and the edge distance of the bounding boxes of the two connected blocks along a reference axis is smaller than a first preset ratio of the smallest of the corresponding edge lengths, along the reference axis, of the bounding boxes of the two connected blocks;
the distance between the bounding boxes of the two connected blocks along the reference axis is smaller than a second preset ratio of the smallest of the edge lengths, along the reference axis, of the bounding boxes of the two connected blocks;
the difference between the edge lengths, along the reference axis, of the bounding boxes of the two connected blocks is smaller than a third preset ratio of the smallest of the corresponding edge lengths, along the reference axis, of the bounding boxes of the two connected blocks.
14. The system according to any one of claims 8 to 13, characterized in that:
the determination unit is further configured to extract a specific region on the target image, slide a window with a specific stride over the bounding boxes obtained by connection in the color-reduced image and in the binary image, feed the windowed bounding-box content into a convolutional-neural-network classifier for discrimination, and obtain the probability that each sliding window contains text;
the determination unit is further configured to average the text probabilities of the sliding windows to obtain the probability that the candidate text region contains a text row or a text column;
the determination unit is further configured to determine, if the resulting probability is greater than the preset probability threshold, that the specific region contains a text row or a text column.
CN201610091568.8A 2016-02-18 2016-02-18 Character detection method and system Active CN107093172B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201610091568.8A CN107093172B (en) 2016-02-18 2016-02-18 Character detection method and system
PCT/CN2017/073407 WO2017140233A1 (en) 2016-02-18 2017-02-13 Text detection method and system, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610091568.8A CN107093172B (en) 2016-02-18 2016-02-18 Character detection method and system

Publications (2)

Publication Number Publication Date
CN107093172A true CN107093172A (en) 2017-08-25
CN107093172B CN107093172B (en) 2020-03-17

Family

ID=59625563

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610091568.8A Active CN107093172B (en) 2016-02-18 2016-02-18 Character detection method and system

Country Status (2)

Country Link
CN (1) CN107093172B (en)
WO (1) WO2017140233A1 (en)


Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111222368B (en) * 2018-11-26 2023-09-19 北京金山办公软件股份有限公司 Method and device for identifying document paragraphs and electronic equipment
CN111325199B (en) * 2018-12-14 2023-10-27 中移(杭州)信息技术有限公司 Text inclination angle detection method and device
CN111401110A (en) * 2019-01-03 2020-07-10 百度在线网络技术(北京)有限公司 Method and device for extracting information
CN110059685B (en) * 2019-04-26 2022-10-21 腾讯科技(深圳)有限公司 Character area detection method, device and storage medium
CN109977956B (en) * 2019-04-29 2022-11-18 腾讯科技(深圳)有限公司 Image processing method and device, electronic equipment and storage medium
CN111178346B (en) * 2019-11-22 2023-12-08 京东科技控股股份有限公司 Text region positioning method, text region positioning device, text region positioning equipment and storage medium
CN111062365B (en) * 2019-12-30 2023-05-26 上海肇观电子科技有限公司 Method, apparatus, chip circuit and computer readable storage medium for recognizing mixed typeset text
CN111681229B (en) * 2020-06-10 2023-04-18 创新奇智(上海)科技有限公司 Deep learning model training method, wearable clothes flaw identification method and wearable clothes flaw identification device
CN112650832B (en) * 2020-12-14 2022-09-06 中国电子科技集团公司第二十八研究所 Knowledge correlation network key node discovery method based on topology and literature characteristics
CN113806505B (en) * 2021-09-09 2024-04-16 科大讯飞股份有限公司 Element comparison method, device, electronic apparatus, and storage medium


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101615252B (en) * 2008-06-25 2012-07-04 中国科学院自动化研究所 Method for extracting text information from adaptive images
CN101763516B (en) * 2010-01-15 2012-02-29 南京航空航天大学 Character recognition method based on fitting functions
JP5826081B2 (en) * 2012-03-19 2015-12-02 株式会社Pfu Image processing apparatus, character recognition method, and computer program
CN103034856B (en) * 2012-12-18 2016-01-20 深圳深讯和科技有限公司 The method of character area and device in positioning image

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090148043A1 (en) * 2007-12-06 2009-06-11 International Business Machines Corporation Method for extracting text from a compound digital image
CN101398894A (en) * 2008-06-17 2009-04-01 浙江师范大学 Automobile license plate automatic recognition method and implementing device thereof
CN101447027A (en) * 2008-12-25 2009-06-03 东莞市微模式软件有限公司 Binaryzation method of magnetic code character area and application thereof
CN102136064A (en) * 2011-03-24 2011-07-27 成都四方信息技术有限公司 System for recognizing characters from image
CN103632159A (en) * 2012-08-23 2014-03-12 阿里巴巴集团控股有限公司 Method and system for training classifier and detecting text area in image
CN103839062A (en) * 2014-03-11 2014-06-04 东方网力科技股份有限公司 Image character positioning method and device

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108205676B (en) * 2017-11-22 2019-06-07 西安万像电子科技有限公司 The method and apparatus for extracting pictograph region
CN108205676A (en) * 2017-11-22 2018-06-26 西安万像电子科技有限公司 The method and apparatus for extracting pictograph region
CN109191539B (en) * 2018-07-20 2023-01-06 广东数相智能科技有限公司 Oil painting generation method and device based on image and computer readable storage medium
CN108989793A (en) * 2018-07-20 2018-12-11 深圳市华星光电技术有限公司 A kind of detection method and detection device of text pixel
CN109191539A (en) * 2018-07-20 2019-01-11 广东数相智能科技有限公司 Oil painting generation method, device and computer readable storage medium based on image
CN109389150A (en) * 2018-08-28 2019-02-26 东软集团股份有限公司 Image consistency comparison method, device, storage medium and electronic equipment
CN109815957A (en) * 2019-01-30 2019-05-28 邓悟 A kind of character recognition method based on color image under complex background
CN110058838A (en) * 2019-04-28 2019-07-26 腾讯科技(深圳)有限公司 Sound control method, device, computer readable storage medium and computer equipment
CN110058838B (en) * 2019-04-28 2021-03-16 腾讯科技(深圳)有限公司 Voice control method, device, computer readable storage medium and computer equipment
CN111369441A (en) * 2020-03-09 2020-07-03 稿定(厦门)科技有限公司 Word processing method, medium, device and apparatus
CN111369441B (en) * 2020-03-09 2022-11-15 稿定(厦门)科技有限公司 Word processing method, medium, device and apparatus
CN111340028A (en) * 2020-05-18 2020-06-26 创新奇智(北京)科技有限公司 Text positioning method and device, electronic equipment and storage medium
CN112149523A (en) * 2020-09-04 2020-12-29 开普云信息科技股份有限公司 Method and device for OCR recognition and picture extraction based on deep learning and co-searching algorithm, electronic equipment and storage medium
CN112149523B (en) * 2020-09-04 2021-05-28 开普云信息科技股份有限公司 Method and device for identifying and extracting pictures based on deep learning and parallel-searching algorithm
CN112418204A (en) * 2020-11-18 2021-02-26 杭州未名信科科技有限公司 Text recognition method, system and computer medium based on paper document

Also Published As

Publication number Publication date
CN107093172B (en) 2020-03-17
WO2017140233A1 (en) 2017-08-24

Similar Documents

Publication Publication Date Title
CN107093172A (en) character detecting method and system
Arévalo et al. Shadow detection in colour high‐resolution satellite images
CN103577475B (en) A kind of picture mechanized classification method, image processing method and its device
US9965871B1 (en) Multi-binarization image processing
KR100325384B1 (en) Character string extraction apparatus and pattern extraction apparatus
CN107590447A (en) A kind of caption recognition methods and device
CN105868758A (en) Method and device for detecting text area in image and electronic device
Attivissimo et al. An automatic reader of identity documents
CN110598566A (en) Image processing method, device, terminal and computer readable storage medium
CN106548169A (en) Fuzzy literal Enhancement Method and device based on deep neural network
Reina et al. Adaptive traffic road sign panels text extraction
CN103336961A (en) Interactive natural scene text detection method
CN108764352A (en) Duplicate pages content detection algorithm and device
CN110414506B (en) Bank card number automatic identification method based on data augmentation and convolution neural network
WO2015002719A1 (en) Method of improving contrast for text extraction and recognition applications
CN105719243A (en) Image processing device and method
CN110991403A (en) Document information fragmentation extraction method based on visual deep learning
CN112446259A (en) Image processing method, device, terminal and computer readable storage medium
CN112241730A (en) Form extraction method and system based on machine learning
CN111915635A (en) Test question analysis information generation method and system supporting self-examination paper marking
CN113392819B (en) Batch academic image automatic segmentation and labeling device and method
JP2005317042A (en) Image processor
KR20160146355A (en) Method and device for detecting text in image
JP5347793B2 (en) Character recognition device, character recognition program, and character recognition method
CN102682308B (en) Imaging processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant