CN109460763B - Text region extraction method based on multilevel text component positioning and growth - Google Patents


Info

Publication number
CN109460763B
Authority
CN
China
Prior art keywords
text
pixel
region
value
pixels
Prior art date
Legal status
Active
Application number
CN201811267160.7A
Other languages
Chinese (zh)
Other versions
CN109460763A (en)
Inventor
苏丰
丁文俊
汪洋
王雨阳
王岚
Current Assignee
Nanjing University
Original Assignee
Nanjing University
Priority date
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority claimed from application CN201811267160.7A
Publication of CN109460763A
Application granted
Publication of CN109460763B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/60: Type of objects
    • G06V20/62: Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/63: Scene text, e.g. street names
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267: Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/14: Image acquisition
    • G06V30/148: Segmentation of character regions
    • G06V30/158: Segmentation of character regions using character size, text spacings or pitch estimation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for extracting text regions from natural scene images based on multi-level text component positioning and growth. First, a grayscale or RGB color image is input. The MSER algorithm is run on the input image, and the SWT algorithm is then run inside each MSER, using the MSER boundary as the region edge, to obtain a stroke width value for each pixel of the extremum region. A stroke width histogram is computed within each extremum region, the pixel sets corresponding to the three stroke widths with the largest pixel counts are selected, and the pixels of those sets that pass an edge gradient difference angle feature check are taken as seed pixels. Starting from the seed pixels, a two-level growth process, within characters and between characters, is performed iteratively; the regions obtained after growth are then filtered by several text region features, and the resulting text regions are output. The method balances the precision and the recall of the extraction result, does not depend on a specific machine learning model, and is simple and easy to reproduce.

Description

Text region extraction method based on multi-level text component positioning and growth
Technical Field
The invention belongs to the field of image target detection, and relates to a text region extraction method based on multi-level text component positioning and growth in a natural scene image.
Background
Characters in natural scene images carry rich semantic information that is important for understanding the image and the scene, and they have significant value in applications such as image understanding, retrieval, classification, and annotation. However, because such characters often differ greatly in size, orientation, color, language, style, and other attributes, and are easily affected by illumination, occlusion, background, and other factors, accurately detecting text in natural scene images is a challenging task.
In general, text detection in natural scene images can be divided into two subtasks: first, extracting possible character candidate regions; second, merging candidate regions that belong to the same text line. The first step is crucial to the effective extraction of text: if the possible character candidate regions cannot be extracted accurately and completely, the subsequent merging into text lines can hardly produce a good result.
For the candidate-region extraction step, the two conventional connected-region-analysis algorithms in common use are the Maximum Stable Extremum Region (MSER) algorithm and the Stroke Width Transformation (SWT) algorithm. The MSER algorithm is based on the watershed method and focuses on the relative stability inside an extremum region; it is not designed specifically for the characteristics of text, and its extraction result depends on the specific thresholds chosen for parameters such as the rate of change of pixel gray values inside the region, so it is difficult to obtain both high precision and high recall. The SWT algorithm captures and fully exploits the parallelism of character stroke edges, but its reliability depends heavily on the quality of image edge pixels; when matching stroke edge pixels, it relies on a threshold on the difference between the gradient directions of the two pixels, and different threshold settings change the matching result and thus the final candidate-region extraction result. In practice, both classical methods usually adopt a fixed, single parameter threshold, and the extraction result is very sensitive to it. Since text in natural scene images varies greatly in appearance and quality, a high threshold improves the precision of the result but causes many text regions to be missed, while a low threshold increases recall but lowers the precision of the extracted regions.
A single, non-adaptive processing strategy therefore struggles with the complex and variable text found in natural scene images.
Chinese invention patent CN10756380.A provides a license plate detection and recognition method combining the MSER and SWT algorithms. In its license plate detection part, the input image is first grayed and contrast-enhanced; Canny edge detection and MSER region detection are then performed on the processed image, and the intersection of the dilated Canny edges with the original MSER regions yields candidate license plate regions. An SWT algorithm based on morphological processing is then run on the candidate regions to obtain the stroke widths of the characters, and the candidates are finally screened and aggregated by stroke width to obtain the license plate position. The method detects license plate characters well when the contrast between characters and background is strong and the character edge quality is high, but its heavy reliance on graying, edge detection, morphological operations, and similar processing limits the types of image text to which it applies, and it has difficulty handling the varied text objects and complex backgrounds of natural scene images.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides an adaptive method for extracting text regions from natural scene images based on multi-level text component positioning and growth. It adopts an easy-first, divide-and-conquer strategy that treats text objects with different detection conditions differently: at the granularity of multiple objects, it first extracts the more regular seed text components of the image under relatively strict detection conditions, and then, starting from the obtained seed components and their text features, relaxes the detection conditions to effectively extract the lower-quality text components, so as to obtain an extraction result with better precision and recall.
The invention specifically adopts the following technical scheme:
a method for extracting text regions in natural scene images based on multi-level text component positioning and growth is characterized by comprising the following steps:
the method comprises the following steps: inputting a gray or color natural scene image containing a text object;
step two: extracting text seed pixels in the input image by combining MSER, SWT and a region pixel gray value smoothing algorithm; the method specifically comprises the following steps:
step (21): running an MSER algorithm on the input image, and smoothing the obtained gray value of the pixels in each extremum region;
step (22): running an SWT algorithm in the extremum region to obtain a stroke width value of each pixel in the extremum region;
step (23): calculating a histogram of the stroke width of each pixel in each extreme value area, and selecting three histogram peak values with the largest number of pixels as candidate main stroke widths of the extreme value area;
step (24): calculating the edge gradient difference angle feature of the pixel set corresponding to each candidate main stroke width of each extremum region, and taking the pixel sets whose feature value is smaller than a given edge gradient difference angle feature threshold as the text seed pixels of the extremum region;
step three: based on the extracted text seed pixels, iteratively performing the in-character growth process, specifically: calculating the differences between the gray value, color value, and stroke width value of each pixel adjacent to the text seed pixels and the corresponding values of the text seed pixels; taking the adjacent pixels whose differences are smaller than specific thresholds as new text seed pixels obtained by growth; and iterating the growth process until the region edge is reached or no further growth is possible, obtaining a text pixel connected region;
step four: based on the text pixel connected regions obtained in step three, iteratively performing the inter-character growth process, specifically: selecting two text pixel connected regions whose center distance is smaller than a specific threshold; on the several lines connecting the corresponding quartering points of the sides of their minimum bounding rectangles, searching for a sufficient number of connected pixels whose gray value, color value, and stroke width value each differ from the corresponding parameter means of the two regions by less than specific thresholds; and taking those connected pixels as the text seed pixel set of a new text pixel connected region obtained by growth;
step five: repeating the iterative growth process of the third step and the fourth step based on the new text seed pixel obtained in the fourth step until a new text pixel connected region cannot be obtained;
step six: and filtering the finally obtained text pixel connected region based on the threshold values of the various features to remove non-text regions possibly contained in the text pixel connected region, and taking the filtered text region as a final extraction result.
The invention has the beneficial effects that:
the method for extracting the text region in the natural scene image based on multi-level text component positioning and growth has the following advantages:
1) the self-adaptive extraction strategy can give consideration to the precision and recall rate of the extraction result;
2) the iterative hierarchical growth strategy uses different parameter thresholds in the growth process of each level, thereby ensuring the pertinence, controllability and rationality of the growth steps.
3) The extraction method is independent of a specific machine learning model, and is simple and easy to reproduce;
drawings
FIG. 1: a method flow diagram of the present invention;
FIG. 2: the detection schematic diagram of the invention;
FIG. 3: calculating a schematic diagram of edge gradient difference angle features;
FIG. 4: schematic diagram of the growth of text pixel connected regions.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without any inventive step, are within the scope of the present invention.
As shown in fig. 1 and fig. 2, the method for extracting a text region in a natural scene image based on multi-level text component positioning and growth of the present invention includes the following steps:
step one, inputting a gray level or RGB color natural scene image I containing a text object;
step two, extracting text seed pixels in the input image I, comprising the following steps:
In step (21), the Maximum Stable Extremum Region (MSER) algorithm is run on the input image I, and the gray values of the pixels in each obtained extremum region are smoothed (see Description (1)). This improves the text edge quality, accelerates the computation of the subsequent SWT algorithm, and makes the stroke width values obtained by the SWT algorithm in the next step more stable. The smoothed gray value shared by the largest number of pixels is taken as the main gray value of the extremum region;
step (22) running an SWT algorithm in the area by taking the boundary pixel of the extremum area as an edge to obtain the stroke width value of each pixel in the extremum area;
step (23) calculating a histogram of the stroke width of each pixel in each extreme value area, and selecting three histogram peak values with the largest pixel number as candidate main stroke widths of the area;
Step (24): calculate the edge gradient difference angle feature (see Description (2)) of the pixel set corresponding to each candidate main stroke width in each extremum region; take the pixel sets whose feature value is smaller than a specific edge gradient difference angle feature threshold as the text seed pixels of the extremum region, and take the mean of the qualifying candidate main stroke widths as the main stroke width of the extremum region and of its text seed pixel set.
Step three: perform in-character growth starting from the text seed pixels. Compute the differences between the gray value, color value, and stroke width value of each pixel adjacent to the seed pixels and the corresponding values of the text seed pixel set; take the adjacent pixels whose differences are smaller than specific thresholds as new text seed pixels obtained by growth (see Description (3)); and iterate the growth process until the edge of the extremum region is reached or no further growth is possible. The grown and expanded text seed pixel set is finally taken as the text pixel connected region of the corresponding in-character component. Beyond the pixel stroke width value, this step further exploits the textual attributes reflected by the seed pixels obtained in step two, improving the robustness of the growth method against various kinds of interference, noise, and degradation.
Step four: starting from the text pixel connected regions obtained in step three, iteratively perform the inter-character growth process. Specifically, select two text pixel connected regions whose center distance is smaller than a specific threshold; on the several lines connecting the corresponding quartering points of the vertical sides of their minimum bounding rectangles, find a sufficient number of connected pixels whose gray value, color value, and stroke width value differ by less than specific thresholds from the means of the main gray values, color means, and main stroke widths of the two regions, and take these pixels as the text seed pixel set of a new text pixel connected region grown between the characters (see Description (4)). Compared with growing only along the line joining the center points of the two regions, searching the denser set of lines between the several quartering points covers all possible inter-character text pixel areas and copes better with characters of different sizes and with the in-character growth results, further improving the recall of text region extraction;
step five, repeatedly and iteratively executing the step three and the step four until a new text pixel connected region cannot be obtained;
Step six: filter the text pixel connected regions based on thresholds on several text region features, and output the filtered regions as the text region extraction result (see Description (5) for details).
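Steps one to six can be sketched as a top-level loop. This is a structural sketch only: `mser_regions`, `swt`, and the other callables are injected placeholders standing for the algorithms named above, not part of the patent.

```python
def extract_text_regions(image, mser_regions, swt, extract_seeds,
                         grow_within, grow_between, filter_regions):
    """Top-level flow of the method: seed extraction (step two),
    iterated within-/between-character growth (steps three to five),
    and feature-based filtering (step six). All heavy lifting is
    delegated to the injected callables."""
    seeds = []
    for region in mser_regions(image):          # step (21): MSER + gray smoothing
        widths = swt(image, region)             # step (22): per-pixel stroke width
        seeds.extend(extract_seeds(region, widths))   # steps (23)-(24)
    regions = [grow_within(image, s) for s in seeds]  # step three
    while True:                                 # steps four-five: iterate growth
        new_seeds = grow_between(image, regions)
        if not new_seeds:
            break
        regions.extend(grow_within(image, s) for s in new_seeds)
    return filter_regions(regions)              # step six
```

The callables can be swapped out independently, which mirrors the patent's separation of seed positioning, growth, and filtering.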
Description (1): the smoothing of the pixel gray values in an extremum region in step (21) of the invention proceeds as follows:
Let min be the lowest and max the highest gray value among the pixels of the extremum region. The gray value range of the region is divided into five equal parts, the i-th part covering
[min + (i - 1)(max - min)/5, min + i(max - min)/5], i = 1, ..., 5.
For a pixel P whose gray value falls within the i-th range, the gray value is set to
min + (2i - 1)(max - min)/10,
i.e. the intermediate gray value of the i-th range; the gray value with the largest number of pixels in the updated extremum region is taken as the main gray value of the region.
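The five-bin gray smoothing can be sketched in a few lines of NumPy, assuming five equal sub-ranges of [min, max]; the function name is illustrative, not from the patent.

```python
import numpy as np

def smooth_region_gray(gray_values):
    """Quantize the gray values of an extremum region into five equal
    sub-ranges of [min, max], map each pixel to the midpoint of its
    sub-range, and return the smoothed values plus the main gray value
    (the midpoint shared by the most pixels)."""
    g = np.asarray(gray_values, dtype=np.float64)
    lo, hi = g.min(), g.max()
    if hi == lo:                      # flat region: nothing to smooth
        return g.copy(), lo
    width = (hi - lo) / 5.0
    # bin index i in 0..4; the maximum value falls into the last bin
    idx = np.minimum(((g - lo) / width).astype(int), 4)
    smoothed = lo + (2 * idx + 1) * width / 2.0   # midpoint of each bin
    vals, counts = np.unique(smoothed, return_counts=True)
    main_gray = vals[np.argmax(counts)]
    return smoothed, main_gray
```

For example, with gray values [0, 10, 100] the range [0, 100] splits into bins of width 20; both 0 and 10 map to the first midpoint 10, and 100 maps to the last midpoint 90, so the main gray value is 10.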
Description (2): the edge gradient difference angle feature of the pixel set corresponding to each main stroke width in step (24) of the invention is calculated as follows:
As shown in FIG. 3, for each edge pixel p_i detected by the Canny edge detector within the Stroke Width Transformation (SWT) algorithm, a ray is emitted along the gradient direction of the gray value. If the ray passes through another edge pixel p_j, then the ray, and every pixel p_k between p_i and p_j, is assigned an angle value d_angle(k), equal to the difference in radians between the directions of the rays emitted from p_i and from p_j along their respective gradient directions. Since more than one ray may pass through the location of a pixel p_i in an extremum region, suppose m rays pass through it, the s-th ray has angle value d_s, and that ray contains N_s pixels inside the connected region; the final d_angle(i) value of pixel p_i is then the pixel-count-weighted mean
d_angle(i) = ( Σ_{s=1}^{m} N_s · d_s ) / ( Σ_{s=1}^{m} N_s ).
Further, for a pixel set containing N pixels, its D_angle feature value is the average of the d_angle values of all its pixels:
D_angle = (1/N) Σ_{i=1}^{N} d_angle(i).
In the present invention, the threshold of the D_angle feature value is set to 0.93; if the D_angle value of a pixel set is greater than this threshold, the set is considered a non-text region and discarded.
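The feature computation can be sketched as follows. The original formula image is not recoverable, so the per-pixel value is assumed here to be the pixel-count-weighted mean over the rays through that pixel (an assumption consistent with the d_s and N_s quantities named in the text); the 0.93 threshold is the one stated.

```python
def d_angle_pixel(rays):
    """Weighted mean angle difference for one pixel.
    `rays` is a list of (d_s, N_s) pairs: the radian difference of the
    s-th ray and the number of its pixels inside the connected region.
    Assumption: weighting by N_s; the patent image is not recoverable."""
    total = sum(n for _, n in rays)
    return sum(d * n for d, n in rays) / total

def D_angle(per_pixel_rays):
    """Mean d_angle over all N pixels of a pixel set."""
    vals = [d_angle_pixel(r) for r in per_pixel_rays]
    return sum(vals) / len(vals)

def is_text_candidate(per_pixel_rays, threshold=0.93):
    # sets whose D_angle exceeds the threshold are discarded as non-text
    return D_angle(per_pixel_rays) <= threshold
```

A low D_angle means the paired edge gradients are close to anti-parallel, which is the stroke-edge property the feature is meant to capture.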
Description (3): the in-character growth process from the text seed pixels in step three of the invention proceeds as follows:
Starting from any pixel in the text seed pixel set of an extremum region, search the image for pixels p_i that simultaneously satisfy the following four conditions:
1) pixel p_i is connected to some pixel in the text seed pixel set;
2) the stroke width value of pixel p_i differs from the main stroke width of the text seed pixel set by no more than 40% of the latter;
3) the gray value of pixel p_i differs from the mean gray value of the text seed pixel set by no more than 40% of the latter;
4) for an RGB color natural scene input image, the values of the three RGB color channels of pixel p_i differ from the means of the text seed pixel set over the three RGB channels by no more than 60%.
Each found pixel satisfying these conditions is added to the text seed pixel set as a new text seed pixel.
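The four conditions can be collected into a single per-pixel predicate. The 40%/40%/60% thresholds are the ones stated above; the dict-based pixel and statistics representation and the function name are illustrative.

```python
def can_grow(pixel, seed_stats, color=True):
    """pixel: dict with 'sw' (stroke width), 'gray', and optionally 'rgb'.
    seed_stats: dict with 'main_sw', 'mean_gray', and optionally
    'mean_rgb' for the current text seed pixel set. Connectivity
    (condition 1) is assumed to be checked by the caller during the
    flood-fill style growth."""
    if abs(pixel['sw'] - seed_stats['main_sw']) > 0.40 * seed_stats['main_sw']:
        return False                                    # condition 2
    if abs(pixel['gray'] - seed_stats['mean_gray']) > 0.40 * seed_stats['mean_gray']:
        return False                                    # condition 3
    if color:                                           # condition 4 (RGB input)
        for c, mc in zip(pixel['rgb'], seed_stats['mean_rgb']):
            if abs(c - mc) > 0.60 * mc:
                return False
    return True
```

In the iteration of step three, every neighbour of the current seed set that passes this predicate joins the set, and its own neighbours are examined in turn, until the region edge is reached or no pixel qualifies.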
Description (4): the inter-character growth process between text pixel connected regions in step four of the invention proceeds as follows:
1) As shown in FIG. 4, each text pixel connected region is represented by its minimum bounding rectangle; the maximum d of the length and width of the bounding rectangle is computed, the center of the region is represented by the center of the rectangle, and the quartering points of the vertical sides of the rectangle are marked;
2) other text pixel connected regions are searched for within the circular image range centered at the center of the region with radius 3d;
3) if one or more text pixel connected regions are found, then on the several lines connecting the corresponding quartering points of this region and another region, taken in order from near to far, pixels p_i satisfying the following conditions are searched for:
a) the stroke width value of pixel p_i differs by no more than 40% from the mean of the main stroke widths of the two regions;
b) the gray value of pixel p_i differs by no more than 40% from the mean of the gray values of the two regions;
c) for an RGB color natural scene input image, the values of the three RGB color channels of pixel p_i differ by no more than 60% from the RGB three-channel means of the two regions;
4) isolated pixels are removed from the set of pixels satisfying these conditions, and the remaining pixels are taken as the grown text seed pixel set of the new text pixel connected region.
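The quarter-point geometry can be sketched as below. This is a simplified sketch: the 3d search test is applied to region centers, lines are rasterized by plain sampling rather than a full Bresenham walk, and all helper names are illustrative.

```python
import math

def quarter_points(rect):
    """rect = (x, y, w, h): minimum bounding rectangle.
    Returns the three quartering points (at 1/4, 1/2, 3/4 of the height)
    on each vertical side, as (left_points, right_points)."""
    x, y, w, h = rect
    ys = [y + h * k / 4.0 for k in (1, 2, 3)]
    return [(x, t) for t in ys], [(x + w, t) for t in ys]

def within_search_radius(rect_a, rect_b):
    """Simplification: region B qualifies if its center lies in the
    circle of radius 3*d around A's center, d = max(width, height) of A."""
    xa, ya, wa, ha = rect_a
    xb, yb, wb, hb = rect_b
    ca = (xa + wa / 2.0, ya + ha / 2.0)
    cb = (xb + wb / 2.0, yb + hb / 2.0)
    return math.hypot(ca[0] - cb[0], ca[1] - cb[1]) <= 3 * max(wa, ha)

def line_pixels(p0, p1):
    """Integer pixel positions sampled along the segment p0 -> p1;
    these are the candidates tested against conditions a)-c)."""
    n = int(max(abs(p1[0] - p0[0]), abs(p1[1] - p0[1]))) + 1
    return [(round(p0[0] + (p1[0] - p0[0]) * t / max(n - 1, 1)),
             round(p0[1] + (p1[1] - p0[1]) * t / max(n - 1, 1)))
            for t in range(n)]
```

Checking the three quarter-point lines instead of only the center line is what lets the growth bridge characters whose bounding boxes differ in height.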
Description (5): in step six of the invention, the text pixel connected regions are filtered with thresholds on the following text region features to improve the precision of the extraction result; the features used for filtering and their thresholds are:
1) aspect ratio: the length-to-width ratio of the text pixel connected region; valid range [0.2, 2.0];
2) symmetry: the minimum bounding rectangle of the region is bisected into two sub-regions, once horizontally and once vertically; the number of text pixels in any horizontal or vertical sub-region must not exceed 80% of the total number of text pixels in the whole region;
3) stability: if three of the four neighbours of a text pixel do not belong to the text pixels, the pixel is judged an unstable noise pixel; the number of noise pixels in any sub-region must not exceed 50% of the total number of text pixels in the whole region;
4) pixel ratio of the main stroke width: the proportion of pixels whose stroke width equals the main stroke width among all pixels of the region; valid range [0.5, 0.9].
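The four filters combine into one predicate. The numeric ranges are the ones stated above; the region representation (a dict of precomputed counts) is illustrative.

```python
def passes_filters(region):
    """region: dict with 'w', 'h' (bounding box), 'n_pixels' (text
    pixels), 'max_half_pixels' (largest text-pixel count among the
    horizontal/vertical half sub-regions), 'n_noise' (pixels with at
    least three of four 4-neighbours outside the text pixels) and
    'n_main_sw' (pixels whose stroke width equals the main stroke width)."""
    aspect = region['w'] / region['h']
    if not 0.2 <= aspect <= 2.0:
        return False                                   # 1) aspect ratio
    if region['max_half_pixels'] > 0.80 * region['n_pixels']:
        return False                                   # 2) symmetry
    if region['n_noise'] > 0.50 * region['n_pixels']:
        return False                                   # 3) stability
    ratio = region['n_main_sw'] / region['n_pixels']
    return 0.5 <= ratio <= 0.9                         # 4) main-stroke ratio
```

Only regions passing all four checks are output as the final text region extraction result.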
The method for extracting text regions in natural scene images based on multi-level text component positioning and growth has the following innovations:
1) Adaptive text region extraction strategy: text in natural scene images is complex and variable in form. The overall strategy of the invention is to first locate relatively regular, high-quality seed text regions (i.e. text seed pixel sets) under relatively strict and sensitive detection conditions, and then, starting from the seed text regions, grow the lower-quality text regions under looser, more robust constraints. Because adaptive matching conditions are used at the different stages of the extraction process, both the precision and the recall of the result are served;
2) Determination of the seed text region: the different features used for locating text regions in the Maximum Stable Extremum Region (MSER) algorithm and the Stroke Width Transformation (SWT) algorithm are combined, and high-quality seed text regions are extracted from the image under relatively strict conditions as the basis of the later growth process; the introduction of region pixel gray value smoothing effectively improves the stability and efficiency of the extraction algorithm;
3) Multi-level iterative growth strategy: the in-character growth process focuses on growing text regions belonging to the same connected component as the text seed pixels, while the inter-character growth process focuses on growing text regions along the several possible paths between text pixel connected components. The two growth processes run iteratively, with text feature similarity as the growth condition, until no new text region grows, which improves the reliability of the method against various noise, interference, and degradation conditions.

Claims (6)

1. A method for extracting text regions in natural scene images based on multi-level text component positioning and growth is characterized by comprising the following steps:
the method comprises the following steps: inputting a gray or color natural scene image containing a text object;
step two: extracting text seed pixels in the input image by combining MSER, SWT and a region pixel gray value smoothing algorithm; the method specifically comprises the following steps:
step (21): running an MSER algorithm on the input image, and smoothing the obtained gray value of the pixels in each extremum region;
step (22): running an SWT algorithm in the extremum region to obtain a stroke width value of each pixel in the extremum region;
step (23): calculating a histogram of the stroke width of each pixel in each extreme value area, and selecting three histogram peak values with the largest number of pixels as candidate main stroke widths of the extreme value area;
step (24): calculating the edge gradient difference angle feature of the pixel set corresponding to each candidate main stroke width of each extremum region, and taking the pixel sets whose feature value is smaller than a given edge gradient difference angle feature threshold as the text seed pixels of the extremum region;
step three: based on the extracted text seed pixels, iteratively performing the in-character growth process, specifically: calculating the differences between the gray value, color value, and stroke width value of each pixel adjacent to the text seed pixels and the corresponding values of the text seed pixels; taking the adjacent pixels whose differences are smaller than specific thresholds as new text seed pixels obtained by growth; and iterating the growth process until the region edge is reached or no further growth is possible, obtaining a text pixel connected region;
step four: based on the text pixel connected regions obtained in step three, iteratively performing the inter-character growth process, specifically: selecting two text pixel connected regions whose center distance is smaller than a specific threshold; on the several lines connecting the corresponding quartering points of the sides of their minimum bounding rectangles, searching for a sufficient number of connected pixels whose gray value, color value, and stroke width value each differ from the corresponding parameter means of the two regions by less than specific thresholds; and taking those connected pixels as the text seed pixel set of a new text pixel connected region obtained by growth;
step five: repeating the iterative growth process of the third step and the fourth step based on the new text seed pixel obtained in the fourth step until a new text pixel connected region cannot be obtained;
step six: and filtering the finally obtained text pixel connected region based on the threshold values of the various features to remove non-text regions possibly contained in the text pixel connected region, and taking the filtered text region as a final extraction result.
2. The method of claim 1, wherein the step (21) of smoothing the gray values of the pixels in the extremum region comprises the steps of: letting min be the lowest and max the highest gray value among the pixels of the extremum region; dividing the gray value range of the extremum region into five equal parts, the i-th part covering
[min + (i - 1)(max - min)/5, min + i(max - min)/5], i = 1, ..., 5;
for a pixel P whose gray value falls within the i-th range, setting its gray value to
min + (2i - 1)(max - min)/10,
i.e. the intermediate gray value of the i-th range; and taking the gray value with the largest number of pixels in the processed extremum region as the main gray value of the extremum region.
3. The method for extracting text regions in natural scene images based on multi-level text component positioning and growing as claimed in claim 1, wherein the intra-character growth process of the text seed pixels in step three comprises the following steps:
starting from any pixel in the text seed pixel set of an extremum region, search the image for pixels p_i satisfying the following four conditions:
1) pixel p_i is connected to some pixel in the text seed pixel set;
2) the stroke width value of pixel p_i differs from the main stroke width of the text seed pixel set by no more than 40% of the latter;
3) the gray value of pixel p_i differs from the mean gray value of the text seed pixel set by no more than 40% of the latter;
4) for an RGB color natural scene input image, the values of pixel p_i on the three RGB color channels differ from the mean values of the text seed pixel set on the three RGB channels by no more than 60% of the latter;
add the found pixels satisfying these conditions to the text seed pixel set as new text seed pixels.
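Conditions 2) to 4) above can be checked per candidate pixel as below. The dict-based interface and all names are illustrative assumptions; connectivity (condition 1) is assumed to be enforced by the caller during the region scan:

```python
def admits_pixel(px, seed_stats, width_tol=0.40, gray_tol=0.40, color_tol=0.60):
    """Check the stroke-width, gray-value, and RGB conditions of claim 3
    for one candidate pixel, each relative to the seed-set statistic."""
    if abs(px["stroke_width"] - seed_stats["main_stroke_width"]) \
            > width_tol * seed_stats["main_stroke_width"]:
        return False                      # condition 2: stroke width
    if abs(px["gray"] - seed_stats["mean_gray"]) \
            > gray_tol * seed_stats["mean_gray"]:
        return False                      # condition 3: gray value
    for c, m in zip(px["rgb"], seed_stats["mean_rgb"]):
        if abs(c - m) > color_tol * m:
            return False                  # condition 4: each RGB channel
    return True
```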
4. The method for extracting text regions in natural scene images based on multi-level text component positioning and growing as claimed in claim 1, wherein the inter-character growth process between text pixel connected regions in step four comprises the following specific steps:
1) for each text pixel connected region, represent the region by its minimum bounding rectangle, compute the maximum value d of the length and width of the bounding rectangle, represent the center of the text pixel connected region by the center of the minimum bounding rectangle, and mark the quartering points of the vertical sides of the minimum bounding rectangle;
2) search for other text pixel connected regions within the circular image range centered at the center of the text pixel connected region with radius 3d;
3) if one or more text pixel connected regions are found, search, from near to far, on the lines connecting the corresponding quartering points of this text pixel connected region and each found region, for pixels p_i satisfying the following conditions:
a) the stroke width value of pixel p_i differs from the mean of the main stroke widths of the two regions by no more than 40% of the latter;
b) the gray value of pixel p_i differs from the mean gray value of the two regions by no more than 40% of the latter;
c) for an RGB color natural scene input image, the values of pixel p_i on the three RGB color channels differ from the mean values of the two regions on the three RGB channels by no more than 60% of the latter;
4) remove isolated pixels from the set of pixels satisfying the above conditions, and regard the remaining pixels as the text seed pixel set of the new text pixel connected region obtained by growth.
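The geometric part of the inter-character step can be illustrated as follows. This sketch assumes axis-aligned (x, y, w, h) rectangles with rect_a to the left of rect_b and only enumerates the integer pixel coordinates on the three quartering-point connecting lines, which would then be tested with the gray, color, and stroke-width conditions; all names are illustrative assumptions:

```python
def quarter_point_lines(rect_a, rect_b):
    """Pair the quartering points of the facing vertical sides of two
    bounding rectangles and return the pixel coordinates along each of
    the three connecting lines."""
    ax, ay, aw, ah = rect_a
    bx, by, bw, bh = rect_b
    # quartering points on rect_a's right side and rect_b's left side
    pts_a = [(ax + aw, ay + ah * k // 4) for k in (1, 2, 3)]
    pts_b = [(bx, by + bh * k // 4) for k in (1, 2, 3)]
    lines = []
    for (x0, y0), (x1, y1) in zip(pts_a, pts_b):
        n = max(abs(x1 - x0), abs(y1 - y0), 1)  # steps along the line
        line = [(x0 + (x1 - x0) * t // n, y0 + (y1 - y0) * t // n)
                for t in range(n + 1)]
        lines.append(line)
    return lines
```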
5. The method as claimed in claim 1, wherein in step six the text regions are filtered based on multiple features of the text region, including the aspect ratio, symmetry, stability, and the pixel ratio of the main stroke width.
6. The method for extracting text regions in natural scene images based on multi-level text component positioning and growing as claimed in claim 5, wherein the features used to filter the text regions are defined as follows:
1) aspect ratio: the ratio of the length to the width of the text pixel connected region; the valid value range is [0.2, 2.0];
2) symmetry: bisect the minimum bounding rectangle of the text pixel connected region into two sub-regions along the horizontal direction and along the vertical direction respectively; the number of text pixels in any sub-region in either direction must not exceed 80% of the total number of text pixels in the whole region;
3) stability: if three of the four neighboring pixels of a text pixel do not belong to text pixels, the pixel is judged to be an unstable noise pixel; the number of noise pixels in any sub-region must not exceed 50% of the total number of text pixels in the whole region;
4) pixel ratio of the main stroke width: the proportion of the pixels corresponding to the main stroke width in the text pixel connected region to the total number of pixels in the region; the valid value range is [0.5, 0.9].
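The four filters of claim 6 combine into a single accept/reject test per region. In this sketch the sub-region text-pixel and noise-pixel counts are assumed to be computed by the caller, and all argument names are illustrative assumptions:

```python
def keep_region(length, width, n_text, text_per_sub, noise_per_sub,
                n_main_stroke, n_total):
    """Apply the four claim-6 features to one text pixel connected
    region and return whether it survives filtering."""
    aspect = length / width
    if not 0.2 <= aspect <= 2.0:
        return False                               # 1) aspect ratio
    if any(c > 0.8 * n_text for c in text_per_sub):
        return False                               # 2) symmetry
    if any(c > 0.5 * n_text for c in noise_per_sub):
        return False                               # 3) stability
    return 0.5 <= n_main_stroke / n_total <= 0.9   # 4) main-stroke ratio
```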
CN201811267160.7A 2018-10-29 2018-10-29 Text region extraction method based on multilevel text component positioning and growth Active CN109460763B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811267160.7A CN109460763B (en) 2018-10-29 2018-10-29 Text region extraction method based on multilevel text component positioning and growth


Publications (2)

Publication Number Publication Date
CN109460763A CN109460763A (en) 2019-03-12
CN109460763B true CN109460763B (en) 2022-06-21

Family

ID=65608611

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811267160.7A Active CN109460763B (en) 2018-10-29 2018-10-29 Text region extraction method based on multilevel text component positioning and growth

Country Status (1)

Country Link
CN (1) CN109460763B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110188622B (en) * 2019-05-09 2021-08-06 新华三信息安全技术有限公司 Character positioning method and device and electronic equipment
CN112699712A (en) * 2019-10-22 2021-04-23 杭州海康威视数字技术股份有限公司 Document image region separation method and device and storage medium

Citations (1)

Publication number Priority date Publication date Assignee Title
CN108345850A (en) * 2018-01-23 2018-07-31 哈尔滨工业大学 The scene text detection method of the territorial classification of stroke feature transformation and deep learning based on super-pixel

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
SG10201510667SA (en) * 2012-06-27 2016-01-28 Agency Science Tech & Res Text detection devices and text detection methods

Patent Citations (1)

Publication number Priority date Publication date Assignee Title
CN108345850A (en) * 2018-01-23 2018-07-31 哈尔滨工业大学 The scene text detection method of the territorial classification of stroke feature transformation and deep learning based on super-pixel

Non-Patent Citations (2)

Title
Wenjun Ding, "Text Detection in Natural Scene Images by Hierarchical Localization and Growing of Textual Components", Proceedings of the IEEE International Conference on Multimedia and Expo (ICME) 2017, 2017-07-14, pp. 775-780. *
Feng Su, "Text Proposals Based on Windowed Maximally Stable Extremal Region for Scene Text Detection", 2017 14th IAPR International Conference on Document Analysis and Recognition, 2018-01-29, pp. 376-381. *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant