CN104156730A - Anti-noise Chinese character feature extraction method based on framework - Google Patents

Info

Publication number
CN104156730A
Authority
CN
China
Prior art keywords
point
classification
pca
end points
feature extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410360498.2A
Other languages
Chinese (zh)
Other versions
CN104156730B (en
Inventor
周元峰
朱东方
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Application filed by Shandong University
Priority to CN201410360498.2A
Publication of CN104156730A
Application granted
Publication of CN104156730B
Legal status: Active

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses an anti-noise, skeleton-based method for extracting Chinese character features. The method comprises the following steps: smoothing and denoising a grayscale text image and binarizing it; sampling the binarized image and converting it into a point cloud model; applying erosion to the original binarized image to obtain a coarse medial axis; performing PCA-based splitting along the medial axis; merging the split results and post-processing the merged point cloud; and fitting B-spline curves to the classified point cloud to obtain the skeleton. Because the character image is converted into a point cloud model, the influence of noise and similar factors on skeleton extraction is reduced. Fitting the skeleton with B-spline curves better preserves the features of the original character. The original character image is processed directly, without normalization preprocessing, which lowers the difficulty of skeleton extraction and improves efficiency.

Description

An anti-noise Chinese character feature extraction method based on the skeleton
Technical field
The present invention relates to the field of image processing and pattern recognition, and specifically to a robust method for automatically extracting skeleton-based features of Chinese characters.
Background art
Chinese character recognition is a branch of character recognition. Because the Chinese character set is huge and its glyphs are varied, Chinese characters cannot be handled by recognizers as simple as those for alphabetic scripts such as English, so Chinese character recognition has always been a comparatively difficult area of applied research. It is generally divided into printed and handwritten character recognition. Printed characters have been studied more; handwritten characters vary from writer to writer, so their recognition rate is lower.
Feature extraction is one of the most important stages of a Chinese character recognition system. Extracting good features across different shapes and styles is a current research focus. Traditionally, directional features have been widely used, but they require normalizing the character's orientation and constructing an elastic mesh, and they handle handwritten characters of varying shape poorly. Feature extraction based on directional features alone cannot meet the demands of practical use.
Another approach to feature extraction is based on the character skeleton. The skeleton strongly characterizes the shape and topology of a character, preserves its geometric features well, and also greatly reduces the cost of computation and of matching against a character library. Although the skeleton can represent character features, Chinese characters, handwritten ones in particular, vary strongly and are often of low quality, so extracting a high-quality skeleton remains an open problem. Many methods focus on extracting and processing the character contour, and others use morphological erosion; these methods do not handle low-quality characters affected by noise, sparseness, or broken strokes well.
Summary of the invention
To overcome the deficiencies of the prior art, the invention discloses an anti-noise, skeleton-based Chinese character feature extraction method. To cope with the variability of Chinese characters, low-quality ones in particular, the character is covered with a point cloud model. A point cloud is sparse and unconnected, which reduces the influence of noise on skeleton extraction. The skeleton is extracted from the point cloud: principal component analysis (PCA) drives a "split-merge" classification, and curves are finally fitted by squared distance minimization. This reduces the influence of noise and similar factors on skeleton extraction, classifies the strokes and fits curves to them in a reasonable way, and yields a comparatively smooth skeleton feature.
To achieve the above object, the specific scheme of the invention is as follows:
An anti-noise Chinese character feature extraction method based on the skeleton, comprising the following steps:
Step 1: preprocess the grayscale image of the text to be processed, including smoothing the grayscale image and binarizing it;
Step 2: down-sample the binarized image to generate point cloud model data;
Step 3: apply erosion to the binarized image to obtain a coarse medial axis point set;
Step 4: perform PCA splitting on this axis point set according to the splitting condition, obtaining the split result;
Step 5: merge the split result, and post-process the points at intersections after merging;
Step 6: fit B-spline curves to the point cloud data processed in step 5, obtaining the skeleton that serves as the Chinese character feature.
Step 1 specifically comprises:
The grayscale image of the scanned text is smoothed, and the smoothed image is then binarized into a black-and-white image in which white pixels are the background and black pixels are the foreground characters. Smoothing applies Gaussian smoothing over a neighborhood using the cvSmooth method of OpenCV.
Step 2 specifically comprises:
The binarized image is down-sampled. Sampling is performed only on black pixels, at a chosen sampling ratio, to convert the image into point cloud model data; the horizontal and vertical coordinates of each sampled pixel form one point of the cloud.
Step 3 specifically comprises:
Erosion kernels are used to apply the erosion set operation to the pixels of the binarized image until the erosion termination condition is reached, giving the final coarse medial axis point set;
The erosion termination condition is: in the binarized image the current point has eight neighbors; check whether any two of its neighboring black points are connected to each other. If they are not connected, the current point is an axis point; otherwise it is not.
In step 4, the splitting condition is:
The splitting condition is set using the angle α between the two local principal directions obtained by PCA on the point cloud subsets inside two adjacent local circles.
In step 4, PCA splitting specifically comprises:
Choose any unprocessed point from the coarse axis point set and compute its local principal direction (Main Local Direction, MLD). If comparing the current local principal direction Vi with the next local principal direction Vj gives -1 (that is, the angle between Vi and Vj is greater than θ, the predefined turning angle), the point is a turning point; otherwise it is not, and the search continues along the axis points for the next PCA center. If no further axis point is found, the current point is a turning point. Finally, the first and last PCA centers of any class i are marked as its start point x(i) and end point y(i), and the points between them are assigned to class i. The PCA centers Center(i) between the two end points, the radii R(i), and the local principal directions Vi are recorded for use in merging and fitting. Another point is then chosen from the coarse axis point set for PCA splitting, until all points have been processed. After a finite number of iterations every point of the coarse axis point set has been handled, giving the final split set.
Step 5 specifically comprises:
Let MaxRadius be the maximum PCA radius in the PCA process. Let x(i) and y(i) be the end points of a class i, with corresponding PCA radii Rx(i) and Ry(i), and let dist(x(i), y(j)) return the distance between any two end points. Merging happens only at the end points of a class: if the end points satisfy any one of the merging conditions, the merge is carried out.
The merging conditions are:
Condition one: for the two end points x(i) and y(i) of class i, if dist(x(i), y(i)) <= Rx(i) + Ry(i), the PCA centers of class i consist only of the two end points, that is, class i has 2 centers (other classes have at least 3), and an end point x(j) or y(j) of a second class j intersects both characteristic circles, then class i and class j satisfy the merging condition;
Condition two: for any classes i and j, take any two end points, one from each class, say the x end point of class i and the y end point of class j. If the angle between the vector Vij along the line joining x(i) and y(j) and the vector Vx of end point x is less than θ, the points at the end points of classes i and j admit a minimum spanning tree whose maximum step is no greater than RectSize/16 (RectSize is the Euclidean distance between the two farthest points of the cloud), and the angle between V(x(i)) and V(y(j)) lies in [0, θ] or in [180 - θ, 180], then class i and class j satisfy the merging condition, where θ is the predefined turning angle;
Condition three: for any classes i and j, take any two end points, one from each class, say the x end point of class i and the y end point of class j. If dist(x(i), y(j)) <= Rx(i) + Ry(j), y(j) does not intersect the x end point of its own class, x(i) does not intersect the y end point of its own class, neither end point intersects the characteristic circle of a PCA unit of any third class, and the points at the end points of classes i and j admit a minimum spanning tree whose maximum step is no greater than RectSize/16, then class i and class j satisfy the merging condition.
Step 6 specifically comprises:
Squared distance minimization (SDM) fitting is used. The center point set S_center(i) produced during PCA (the PCA centers, each covering the other points of the cloud within its radius) serves as the initial B-spline control points. The number and positions of the control points are adjusted and the B-spline curves are fitted iteratively with SDM; the resulting curves are the skeleton feature of the character. Compared with other iterative B-spline fitting methods, SDM iterates faster and converges more stably.
Erosion is applied to the binarized image to obtain the coarse medial axis point set. Morphological erosion of a binary image is a set operation on its pixels: applying a specific erosion kernel (also called a template) to the pixels makes the boundary shrink inward, and after a finite number of erosions the medial axis is obtained. The set definition of erosion is as follows:
In the above formula, S' is the pixel set after one erosion, S is the original pixel set, and φi is the set covered by the erosion kernel at position i. The operation in the formula means: if the number of intersections between the kernel at the current position and S equals c, return that intersection; otherwise return 0. Pti denotes the pixel of the binary image at point i. After a finite number of erosions the final medial axis is obtained.
PCA (principal component analysis) is a basic technique of multivariate analysis. The method proposed in this patent sets the splitting condition from the angle α between the two local principal directions obtained by PCA on the point sets inside two adjacent local circles; the mean center point computed during PCA is also used as a control point in the final B-spline fitting. In this patent, PCA applied to the point set inside a circle of radius R is called a PCA unit. A PCA unit comprises the characteristic radius R, the mean center point χ, the characteristic circle centered at χ with radius R, and the local direction V.
PCA on the point set inside a circle of radius R centered at a given point proceeds as follows:
(1) First compute the mean χ of the sample points, as in formula 3:
$$\chi = \frac{1}{N}\sum_{i=1}^{N} X_i \tag{3}$$
where Xi is a point inside the circle and N is the number of points inside the circle.
(2) Then compute the deviation matrix $C = X - \chi$ of the original vector matrix X from χ, and the covariance matrix $T = CC^{T}$.
(3) Finally compute the eigenvalues λ and eigenvectors M of the covariance matrix by SVD. The singular value decomposition (SVD) of the matrix C is:
$$C = M S V^{T} \tag{4}$$
where M holds the eigenvectors of the covariance matrix T, arranged as column vectors, S is the diagonal matrix produced by the decomposition, and $V^{T}$ is a square matrix; the column vectors of V are the eigenvectors of $C^{T}C$. PCA on the point cloud data yields the local principal direction (Main Local Direction, MLD); this direction V is the main factor in setting the splitting condition.
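A PCA unit can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `pca_unit` is hypothetical, points are 2D tuples, and the dominant eigenvector is obtained from the closed-form orientation of a symmetric 2×2 covariance matrix rather than from the SVD the text describes (the two give the same direction in 2D).

```python
import math

def pca_unit(points, center, radius):
    """Return (mean, principal_direction) for the points inside a circle.

    Hypothetical helper: 'points' is a list of (x, y) tuples.
    """
    cx, cy = center
    inside = [(x, y) for x, y in points
              if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2]
    if len(inside) < 2:
        raise ValueError("need at least two points inside the circle")
    n = len(inside)
    # Formula 3: mean of the sample points.
    mx = sum(x for x, _ in inside) / n
    my = sum(y for _, y in inside) / n
    # Entries of the 2x2 covariance matrix of the centred points.
    sxx = sum((x - mx) ** 2 for x, _ in inside) / n
    syy = sum((y - my) ** 2 for _, y in inside) / n
    sxy = sum((x - mx) * (y - my) for x, y in inside) / n
    # Assumption: closed-form orientation of the dominant eigenvector,
    # used here in place of an SVD routine.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (mx, my), (math.cos(angle), math.sin(angle))
```

For points lying on the line y = x the returned direction is parallel to (1, 1), up to normalization.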
The split result is merged and the points at intersections are post-processed. At every possible intersection, and wherever the target curve has large curvature, splitting may mark a turning point. Some of the classes at such high-curvature places and intersections may, however, belong to the same class, so "consistent" classes should be merged to reduce the number of classes. It is assumed that the ideal fitted curve length of any point cloud class should not be less than or equal to its mean width.
After the point cloud classes are merged, errors may remain because the cloud was split under different radii, so the point cloud at intersections needs post-processing. The method used is intersection relocation based on distance weights.
Beneficial effects of the invention:
The invention smooths and denoises the text image, converts it into a binary image, and down-samples the binary image into a point cloud model. PCA is carried out on this point cloud model, the cloud is classified by the "split-merge" operation, and B-spline curves are finally fitted on the basis of the classification. The fitted curves serve as the skeleton feature of the character for recognition and classification. The advantages are:
(1) Converting Chinese character image processing into point cloud processing reduces the influence of noise on skeleton extraction. Good results are also obtained when the gray level varies sharply.
(2) The fitted curves agree closely with the original point cloud and with the ideal curves, handle intersections well, and represent the skeleton feature of the character well.
(3) No position normalization of the character is required to extract a reasonable skeleton, and subsequent processing can identify feature points on the skeleton and classify and recognize the character.
Brief description of the drawings
Fig. 1 is the overall flow chart of Chinese character feature extraction and recognition based on the invention;
Fig. 2 is a schematic flow chart of the feature extraction of the invention;
Fig. 3 is an example of Chinese character skeleton feature extraction implemented according to the invention;
Fig. 4(a) is the X-shaped erosion kernel used in the erosion operation;
Fig. 4(b) is the cross-shaped erosion kernel used in the erosion operation;
Fig. 4(c) is the full eight-neighborhood kernel used in the erosion operation.
Embodiment:
The invention is described in detail below with reference to the drawings:
As shown in Fig. 1, the overall flow of Chinese character feature extraction and recognition based on the invention comprises the following steps:
A. Scan the text to be processed to obtain a grayscale image.
B. Preprocess the grayscale image (smoothing, binarization, and so on) to obtain a binary image.
C. Extract features from the image containing the characters to obtain a set of feature vectors.
D. Compare and match this feature vector set against a prior Chinese character feature library to recognize the characters.
E. Post-process the recognized characters to obtain the final text.
The invention focuses on how to extract the skeleton feature of a Chinese character quickly and effectively, and thereby represent the character's information and structural features. As shown in Fig. 2, the concrete steps are as follows:
Step 1: smooth and denoise the grayscale text image, and binarize it.
Step 2: down-sample the binarized image and convert it into a point cloud model.
Step 3: apply erosion to the original binarized image to obtain the coarse medial axis.
Step 4: perform PCA based on this axis to obtain the split result.
Step 5: merge the split result, and post-process the merged point cloud classes.
Step 6: fit B-spline curves to the classified point cloud to obtain the skeleton.
In step 1, the grayscale text image is smoothed, denoised, and binarized as follows:
The acquired grayscale image containing the text is Gaussian-smoothed over a 5×5 neighborhood with the cvSmooth method of OpenCV. The smoothed, denoised grayscale image is then converted into a binary image with a fixed threshold. The binary image contains only black and white pixels, and the black pixels mark the region covered by the text in the original grayscale image.
In step 2, the binarized image is down-sampled and converted into a point cloud model as follows:
The black pixels of the binary image obtained in step 1 are down-sampled. Different sampling ratios can be chosen, such as 1/5 or 1/8 of the pixels. The coordinates of the sampled black pixels become the point cloud model data, which characterizes the text well, as shown in Fig. 3.
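The down-sampling of step 2 might look like the following sketch. The text does not say how pixels are selected, so uniform random sampling is an assumption here, as are the function name `sample_point_cloud` and the pixel encoding (0 for black, 255 for white).

```python
import random

def sample_point_cloud(binary, ratio, seed=0):
    """Sample a fraction of the black pixels as (x, y) points.

    Assumptions: 'binary' is a list of rows, 0 = black (character),
    255 = white (background); selection is uniform random.
    """
    black = [(x, y) for y, row in enumerate(binary)
             for x, value in enumerate(row) if value == 0]
    if not black:
        return []
    count = max(1, round(len(black) * ratio))
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return rng.sample(black, count)
```

With a ratio of 1/5, an image containing ten black pixels yields a cloud of two points.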
In step 3, erosion is applied to the original binarized image to obtain the coarse medial axis as follows:
Specific erosion kernels (also called templates) are used to apply the erosion set operation to the pixels, giving a rough medial axis. Three erosion kernels, shown in Fig. 4(a) to Fig. 4(c), are used together; defined on the eight-neighborhood of the current point, they are the cross, the X shape, and the full eight-neighborhood. Erosion is repeated in the order ABC-ABC until the termination condition is reached. The advantage of using these three kernels is that they effectively remove the jagged boundaries of the binary image.
The erosion termination condition, that is, the criterion for an axis point (core point), is defined as follows: in the binary image the current point has eight neighbors; check whether any two of its neighboring black points are connected to each other (connection here includes diagonal connection). If they are not connected, the current point is an axis point; otherwise it is not. After a finite number of erosions the final medial axis is obtained.
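One possible reading of the axis point criterion is sketched below: the black pixels among the eight neighbors are grouped by mutual adjacency (edge or diagonal), and the current pixel is an axis point when they fall into more than one group. The name `is_axis_point`, the 0/1 pixel encoding, and the treatment of pixels with fewer than two black neighbors are assumptions of this sketch.

```python
def is_axis_point(img, r, c):
    """Decide whether interior pixel (r, c) is an axis point.

    Assumptions: 'img' is a list of rows of 0/1 with 1 = black, and
    (r, c) is not on the image border.
    """
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]
    black = [(dr, dc) for dr, dc in offsets if img[r + dr][c + dc] == 1]
    if len(black) < 2:
        return True  # assumption: end points and isolated pixels are kept
    # Flood fill over the black neighbours; two neighbours are linked when
    # they touch each other (edge or diagonal), not via the centre pixel.
    seen = {black[0]}
    stack = [black[0]]
    while stack:
        ar, ac = stack.pop()
        for br, bc in black:
            if (br, bc) not in seen and max(abs(ar - br), abs(ac - bc)) == 1:
                seen.add((br, bc))
                stack.append((br, bc))
    # Some neighbour was unreachable: removing the centre would disconnect them.
    return len(seen) < len(black)
```

On a one-pixel-wide horizontal line the middle pixel is an axis point, while the center of a solid 3×3 block is not and can still be eroded.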
In step 4, PCA is performed based on this axis and the split result is obtained as follows:
PCA is carried out at the skeleton points of the coarse skeleton eroded out in step 3. To estimate the PCA radius at a point, an initial radius R is given first. Then, in the circle of radius R centered at this axis point, if the ratio β = Σpti / Σptj of white pixels to black pixels (Σpti is the total number of white pixels and Σptj is the total number of black pixels) exceeds a threshold, for example β = 0.15, this radius R' is used as the PCA radius; otherwise the radius is increased. The center points used for PCA are taken from the axis points obtained in step 3. Once PCA can be run at any axis point, it can be judged whether two adjacent principal directions V agree. The search for the next PCA center Center(i+1) and the test of whether two principal directions Vi and Vj agree are defined below.
Define the axis point set S_axis, the current PCA center Center(i), the current PCA radius R(i), the current local principal direction vector Vi, and the candidate axis point set S_alt, any point of which is denoted Ptj. Define angle(Vi, Vj) as the angle between the vectors Vi and Vj, and define the threshold angle θ, where θ ∈ (0, 45). So that the local points handled by PCA cover the point cloud well, the distances dist(i, i+1) and dist(i, i-1) from the current center Center(i) to the next forward candidate Center(i+1) and the next reverse candidate Center(i-1) are both required to lie in [1, R(i) + 0.5*R(i) - 1].
Forward candidate point set:
$$S_{op\_alt} = \{\, Pt(x) \mid \mathrm{angle}(Pt(x) - Center(i),\ V_i) \in [0, \theta],\ Pt(x) \in S_{axis} \,\} \tag{5}$$
S_op_alt is the forward candidate point set, Pt(x) is a point of the candidate set, and angle(Pt(x) - Center(i), Vi) is the angle between Vi and the vector from the current PCA center Center(i) to Pt(x).
Reverse candidate point set:
$$S_{neg\_alt} = \{\, Pt(x) \mid \mathrm{angle}(Pt(x) - Center(i),\ V_i) \in [180 - \theta, 180],\ Pt(x) \in S_{axis} \,\} \tag{6}$$
S_neg_alt is the reverse candidate point set, and Pt(x) is a point of the candidate set.
Next forward center:
$$Center(i+1) = \arg\min_{Pt(x) \in S_{op\_alt}} \mathrm{angle}(Pt(x) - Center(i),\ V_i) \tag{7}$$
Next reverse center:
$$Center(i-1) = \arg\min_{Pt(x) \in S_{neg\_alt}} \mathrm{angle}(Pt(x) - Center(i),\ -V_i) \tag{8}$$
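Formulas 5 to 8 can be read as a search over the axis points for the candidate best aligned with Vi (forward) or with -Vi (reverse). The sketch below follows that reading; the names `angle_deg` and `next_centers` are hypothetical, and the admissible distance range is passed in as `d_min` and `d_max` rather than computed from R(i).

```python
import math

def angle_deg(u, v):
    """Angle between 2D vectors u and v in degrees, in [0, 180]."""
    dot = u[0] * v[0] + u[1] * v[1]
    cross = u[0] * v[1] - u[1] * v[0]
    return math.degrees(math.atan2(abs(cross), dot))

def next_centers(center, vi, axis_points, theta, d_min, d_max):
    """Return (forward, reverse) next centres; None where no candidate exists."""
    forward = reverse = None
    best_f = best_r = None
    for p in axis_points:
        d = (p[0] - center[0], p[1] - center[1])
        dist = math.hypot(d[0], d[1])
        if not (d_min <= dist <= d_max):
            continue  # outside the admissible distance range
        a = angle_deg(d, vi)
        if a <= theta and (best_f is None or a < best_f):
            forward, best_f = p, a          # formulas 5 and 7
        back = 180.0 - a                    # angle to -Vi
        if a >= 180.0 - theta and (best_r is None or back < best_r):
            reverse, best_r = p, back       # formulas 6 and 8
    return forward, reverse
```

For a center at the origin with Vi = (1, 0) and θ = 30, the point (2, 0) is chosen over (2, 1) as the forward center, and (-2, 0) becomes the reverse center.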
Processing function for any PCA center:
$$F(i,j) = \begin{cases} 1 & \text{if } \mathrm{angle}(V_i, V_j) \in [0, \theta] \\ 2 & \text{if } \mathrm{angle}(V_i, V_j) \in [180 - \theta, 180] \\ -1 & \text{otherwise} \end{cases} \tag{9}$$
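Formula 9 translates almost directly into code. In this sketch the function name `direction_relation` is hypothetical and angles are measured in degrees.

```python
import math

def direction_relation(vi, vj, theta):
    """Formula 9: 1 for same direction, 2 for opposite, -1 for a turn."""
    dot = vi[0] * vj[0] + vi[1] * vj[1]
    cross = vi[0] * vj[1] - vi[1] * vj[0]
    angle = math.degrees(math.atan2(abs(cross), dot))  # in [0, 180]
    if angle <= theta:
        return 1
    if angle >= 180.0 - theta:
        return 2
    return -1
```

Treating the two near-parallel cases separately reflects that a principal direction obtained from PCA has no preferred sign, so Vj may point either way along the same stroke.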
The algorithm for PCA processing and splitting of the axis points is as follows. Define a class counter m, initially 0.
Step 1. Increment m by 1, take any point from the unprocessed axis point set to start PCA processing, and label its class m.
Step 2. Process the current point according to formula 9. If the result is -1, the search for this class ends, processing in the current direction stops, and control returns to step 1. If the result is 1 or 2, go to step 3.
Step 3. Search recursively for the next point in both directions according to formulas 5, 6, 7, and 8. If a next point is found, go to step 2; otherwise stop searching in the current direction, end processing in that direction, and go to step 1.
After a finite number of iterations every point of the axis point set has been processed, giving the final split set.
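The splitting loop can be illustrated on a deliberately simplified case. The sketch assumes the PCA units have already been ordered along a single path and only their local principal directions are given, so the two-direction search of step 3 is replaced by a single forward pass; `relation` and `split_path` are hypothetical names.

```python
import math

def relation(vi, vj, theta):
    """Formula 9 on two direction vectors, angles in degrees."""
    dot = vi[0] * vj[0] + vi[1] * vj[1]
    cross = vi[0] * vj[1] - vi[1] * vj[0]
    angle = math.degrees(math.atan2(abs(cross), dot))
    if angle <= theta:
        return 1
    if angle >= 180.0 - theta:
        return 2
    return -1

def split_path(directions, theta):
    """Group consecutive PCA units of one ordered path into classes.

    Simplification: a single pass over an ordered path instead of the
    recursive search in both directions described in the text.
    """
    classes = []
    current = []
    for idx, v in enumerate(directions):
        if current and relation(directions[current[-1]], v, theta) == -1:
            classes.append(current)  # turning point: close the current class
            current = []
        current.append(idx)
    if current:
        classes.append(current)
    return classes
```

A path that runs horizontally and then turns vertical is split into two classes at the turn.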
In step 5, the split result is merged and the merged point cloud classes are post-processed as follows:
Splitting marks a turning point at every possible intersection and wherever the target curve has large curvature. Some of the classes at such high-curvature places and intersections may, however, belong to the same class, so "consistent" classes should be merged to reduce the number of classes. It is assumed that the ideal fitted curve length of any point cloud class should not be less than or equal to its mean width. Let MaxRadius be the maximum PCA radius in the PCA process. Let x(i) and y(i) be the end points of a class i, with corresponding PCA radii Rx(i) and Ry(i), and let dist(x(i), y(j)) return the distance between any two end points. Note that merging happens only at the end points of a class. "Consistency" is defined below.
Condition one: for the two end points x(i) and y(i) of class i, if dist(x(i), y(i)) <= Rx(i) + Ry(i), the PCA centers of class i consist only of the two end points, and an end point x(j) or y(j) of a second class j intersects both characteristic circles, then class i is consistent with class j.
Condition two: for any classes i and j, take any two end points, one from each class (here the x end point of class i and the y end point of class j). If the angle between the vector Vij along the line joining x(i) and y(j) and the vector Vx of end point x is less than θ (the predefined turning angle) and the points at the end points of classes i and j admit a minimum spanning tree whose maximum step is no greater than RectSize/16 (1), and angle(V(x(i)), V(y(j))) ∈ [0, θ] or ∈ [180 - θ, 180] (2), then class i is consistent with class j, where θ is the turning angle defined earlier.
Condition three: for any classes i and j, take any two end points, one from each class (here the x end point of class i and the y end point of class j). If dist(x(i), y(j)) <= Rx(i) + Ry(j) (3), y(j) does not intersect the x end point of its own class, x(i) does not intersect the y end point of its own class, neither end point intersects the characteristic circle of a PCA unit of any third class (4), and the points at the end points of classes i and j admit a minimum spanning tree whose maximum step is no greater than avgRectSize, then class i is consistent with class j.
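The continuity requirement in conditions two and three (a minimum spanning tree whose maximum step stays below a bound) can be checked as in the sketch below. Prim's algorithm is an implementation choice of this sketch, not something the text prescribes, and `mst_max_step` and `is_continuous` are hypothetical names; `rect_size` stands for RectSize and must be supplied by the caller.

```python
import math

def mst_max_step(points):
    """Longest edge of the Euclidean minimum spanning tree (Prim, O(n^2))."""
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known link from the tree to each point
    best[0] = 0.0
    longest = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        longest = max(longest, best[u])
        for w in range(n):
            if not in_tree[w]:
                d = math.dist(points[u], points[w])
                if d < best[w]:
                    best[w] = d
    return longest

def is_continuous(points, rect_size):
    """Continuity test with the RectSize/16 bound of condition two."""
    return mst_max_step(points) <= rect_size / 16
```

A large gap between two groups of points shows up as one long tree edge, so the test fails exactly when the end regions of the two classes are separated.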
A brief analysis of the sufficiency of the three merging conditions: because the maximum radius of a PCA unit is MaxRadius, any two class end points x(i) and y(j) to be tested for merging must satisfy the inequality:
$$\mathrm{dist}(x(i), y(j)) - R_x(i) - R_y(j) \le 2 \cdot MaxRadius \tag{10}$$
For condition one: by the assumption that the ideal fitted curve length of any point cloud class should not be less than its mean width, every point cloud class has no fewer than two PCA units. A class with fewer than two PCA units must therefore be an unreasonable class produced during splitting by noise on the axis, and needs to be merged with its nearest class.
Condition two merges classes that run in a consistent direction at an intersection but were cut apart by another class. For any two classes to be merged at an intersection, the angle between the MLDs at the end points of the two classes must be below the defined threshold (expression (2) in condition two), and the points of the two classes must be continuous (expression (1) in condition two).
Condition three merges classes that originally belonged to the same class but were split into several because the angle between the MLDs of adjacent PCA units exceeded the threshold. Expression (3) states that the PCA unit circles at the two end points intersect. Expression (4) states that neither class forms a loop with itself at the two end points and that neither class intersects a third class. The continuity of the points required in condition two must also hold.
In the conditions above, "consistency" means that two classes can be merged: if class i is consistent with class j, class i is absorbed into class j. Consistent classes are combined into one class by the merge operation.
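Once the consistent pairs are known, combining them into final classes is a standard disjoint-set computation. The sketch below assumes the pairs have already been found by the three conditions; using a union-find structure and the name `merge_classes` are choices of this sketch.

```python
def merge_classes(num_classes, consistent_pairs):
    """Return a final label per class after merging consistent pairs.

    'consistent_pairs' holds (i, j) tuples meaning class i is consistent
    with class j; deciding consistency is outside this sketch.
    """
    parent = list(range(num_classes))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i, j in consistent_pairs:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj  # class i is absorbed into class j
    return [find(k) for k in range(num_classes)]
```

Chains of consistency are resolved transitively: if class 0 is consistent with class 1 and class 1 with class 2, all three end up with the same label.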
The merged point cloud is post-processed as follows:
For any crossing point pt(k), compute its minimum distance $D_{pdk}$ to the initial B-spline curve of each class (i, j, and so on) to which the point belongs, and use the radii $R_i(k)$, $R_j(k)$, and so on of the PCA units nearest to the point to estimate the mean radius of each of those classes at pt(k). Suppose a point pt(k) belongs to two classes i and j, its minimum distances to the initial B-spline curves Curve(i) and Curve(j) are $D_{pdk}(i)$ and $D_{pdk}(j)$ respectively, and the radii of the nearest PCA center points belonging to classes i and j are $R_i(k)$ and $R_j(k)$. The weighted minimum distance ratio $\lambda_d$ is then:
$$\lambda_d = \frac{D_{pdk}(i)}{D_{pdk}(j)} \times \frac{R_j(k)}{R_i(k)} \qquad \text{(Formula 11)}$$
If $\lambda_d$ is less than the threshold ratio and greater than 1/ratio, the point belongs to both class i and class j; if $\lambda_d$ is greater than the threshold ratio, the point belongs to class j; if $\lambda_d$ is less than 1/ratio, the point belongs to class i.
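The assignment rule for a crossing point can be illustrated with a short Python sketch. The function names are hypothetical, and the sketch assumes that the threshold ratio is greater than 1, that all distances and radii are strictly positive, and that the "belongs to both classes" band is the interval from 1/ratio to ratio.

```python
def weighted_distance_ratio(d_i, d_j, r_i, r_j):
    """Formula 11: (D_pdk(i) / D_pdk(j)) * (R_j(k) / R_i(k))."""
    return (d_i / d_j) * (r_j / r_i)


def assign_crossing_point(d_i, d_j, r_i, r_j, ratio):
    """Return the set of classes ("i", "j") that a crossing point is assigned to.

    Assumptions (not stated in the source): ratio > 1 and all inputs > 0.
    """
    if ratio <= 1:
        raise ValueError("ratio is assumed to be greater than 1")
    if min(d_i, d_j, r_i, r_j) <= 0:
        raise ValueError("distances and radii are assumed to be positive")
    lam = weighted_distance_ratio(d_i, d_j, r_i, r_j)
    if lam > ratio:
        return {"j"}       # relatively much closer to Curve(j)
    if lam < 1 / ratio:
        return {"i"}       # relatively much closer to Curve(i)
    return {"i", "j"}      # ambiguous: keep the point in both classes
```

Dividing each distance by the local radius makes the comparison relative to stroke width, so a point near a thin stroke is not automatically claimed by a thicker stroke that crosses it.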
In step 6, B-spline curves are fitted to the classified point cloud to obtain the skeleton as the Chinese character feature, as follows:
B-spline curves are fitted to the classification results with the SDM (squared distance minimization) method. The center point set $S_{center}(i)$ produced during the PCA analysis is first taken as the initial control points, and the number and positions of the control points are adjusted. The B-spline curves are then fitted iteratively with SDM until the squared distance (SD) error is less than a threshold ε, or until, after several iterations, the change in the SD error is less than a threshold ζ. The final B-spline curves serve as the skeleton feature of the Chinese character.
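The two stopping criteria of the iterative fit can be sketched in Python as below. This shows only the control loop, not SDM itself: `step` is a hypothetical callable that performs one fitting iteration and returns the current SD error, `max_iter` is an added safety limit that the source does not mention, and the change in error is measured between consecutive iterations as an assumption.

```python
def iterate_until_converged(step, eps, zeta, max_iter=100):
    """Run step() until the SD error or its change falls below a threshold.

    step     -- hypothetical callable: one fitting iteration, returns the SD error
    eps      -- threshold on the SD error itself
    zeta     -- threshold on the change in SD error between iterations
    max_iter -- safety cap (an assumption, not part of the source)
    Returns (final_error, iterations_used, reason).
    """
    if max_iter < 1:
        raise ValueError("max_iter must be at least 1")
    prev = None
    for n in range(1, max_iter + 1):
        err = step()
        if err < eps:
            return err, n, "error below eps"
        if prev is not None and abs(prev - err) < zeta:
            return err, n, "error change below zeta"
        prev = err
    return err, max_iter, "max_iter reached"


# Usage with a stand-in for the fitting step that replays a fixed error sequence.
errors = iter([10.0, 5.0, 2.0, 0.5])
result = iterate_until_converged(lambda: next(errors), eps=1.0, zeta=0.01)
```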
Although the invention has been described in detail above with reference to the result figures and flowcharts, this description does not limit the scope of protection of the invention. Those skilled in the art may modify or vary the algorithm on the basis of the invention, and the results obtained still fall within the scope of protection of the invention.

Claims (10)

1. A skeleton-based anti-noise Chinese character feature extraction method, comprising the following steps:
Step 1: preprocess the gray-level image of the text to be processed, including smoothing the gray-level image and binarizing it;
Step 2: down-sample the binarized image to generate point cloud model data;
Step 3: apply an erosion operation to the binarized image to obtain a coarse medial-axis point set;
Step 4: perform PCA-based splitting on the medial-axis point cloud according to a splitting condition to obtain a splitting result;
Step 5: merge the splitting result, and post-process the points at the crossing points after merging;
Step 6: fit B-spline curves to the point cloud processed in step 5 to obtain the skeleton as the Chinese character feature.
2. The skeleton-based anti-noise Chinese character feature extraction method as claimed in claim 1, characterized in that step 1 specifically comprises:
Smoothing the gray-level image of the scanned text, then binarizing the smoothed image to convert it into a binary image containing only black and white, where white pixels are the background and black pixels are the foreground Chinese characters; the smoothing is a Gaussian smoothing of the neighborhood performed with the cvSmooth method of OpenCV.
3. The skeleton-based anti-noise Chinese character feature extraction method as claimed in claim 1, characterized in that step 2 specifically comprises:
Down-sampling the binarized image, with sampling performed only on black pixels; the image is converted into point cloud model data at a set sampling ratio, and the horizontal and vertical coordinates of each sampled pixel form the coordinates of one point of the point cloud data.
4. The skeleton-based anti-noise Chinese character feature extraction method as claimed in claim 1, characterized in that step 3 specifically comprises:
Eroding the pixel point set of the binarized image with an erosion kernel until the erosion termination condition is reached, to obtain the final coarse medial-axis point set.
5. The skeleton-based anti-noise Chinese character feature extraction method as claimed in claim 4, characterized in that the erosion termination condition is: in the binarized image, the current point has eight adjacent points around it; it is judged whether any two adjacent black points of the current point are connected to each other; if they are not connected, the current point is a medial-axis point; otherwise it is not a medial-axis point.
6. The skeleton-based anti-noise Chinese character feature extraction method as claimed in claim 1, characterized in that, in step 4, the splitting condition is:
The splitting condition is set using the angle α between the two local principal directions obtained by PCA analysis of the point clouds in two adjacent local circles.
7. The skeleton-based anti-noise Chinese character feature extraction method as claimed in claim 1, characterized in that, in step 4, the PCA-based splitting specifically comprises:
An unprocessed point is chosen arbitrarily from the coarse medial-axis point set and its local principal direction is computed. If the result computed from the current local principal direction Vi and the next local principal direction Vj is -1, the point is a turning point; otherwise it is not, and processing continues by searching along the medial-axis points for the next PCA center point. If no further medial-axis point can be found, the current point is a turning point. Finally, the first and last PCA center points of any class i are marked as the start point x(i) and the end point y(i), and the points between them are assigned to class i; the PCA center points Center(i), radii R(i), and local principal directions Vi between the two end points are recorded for use in merging and fitting. Another point is then chosen arbitrarily from the coarse medial-axis point set for PCA-based splitting, until all points have been processed. After a finite number of iterations, all points in the coarse medial-axis point set have been processed and the final splitting set is obtained.
8. The skeleton-based anti-noise Chinese character feature extraction method as claimed in claim 1, characterized in that step 5 specifically comprises:
Let MaxRadius denote the maximum PCA radius in the PCA analysis, let x(i) and y(i) denote the end points of a class i, with corresponding PCA radii Rx(i) and Ry(i), and let dist(x(i), y(j)) return the distance between any two end points. The merge operation occurs only at the end points of a class: it is judged whether an end point satisfies any one of the merging conditions, and when it does, the merge operation is completed.
9. The skeleton-based anti-noise Chinese character feature extraction method as claimed in claim 8, characterized in that the merging conditions comprise:
Condition one: for the two end points x(i) and y(i) of class i, if dist(x(i), y(i)) <= Rx(i) + Ry(i), the only PCA center points of class i are the two end points (that is, the number of center points is 2), and an end point x(j) or y(j) of a second class j intersects the two characteristic circles, then class i and class j satisfy the merging condition;
Condition two: for any class i and class j, take any two end points of the two classes, assumed to be the x end point of class i and the y end point of class j. If the angle between the vector Vij formed by the line connecting x(i) and y(j) and the vector Vx of end point x is less than θ, any two points at the end points of class i and class j have a minimum spanning tree whose maximum step is not greater than RectSize/16, and the angle between V(x(i)) and V(y(j)) lies in [0, θ] or in [180 - θ, 180], then class i and class j satisfy the merging condition, where θ is the previously defined turning angle;
Condition three: for any class i and class j, take any two end points of the two classes, assumed to be the x end point of class i and the y end point of class j. If dist(x(i), y(j)) <= Rx(i) + Ry(j), y(j) does not intersect the x end point of its own class, x(i) does not intersect the y end point of its own class, neither end point intersects a PCA unit characteristic circle of a third class, and any two points at the end points of class i and class j have a minimum spanning tree whose maximum step is not greater than avgRectSize, then class i and class j satisfy the merging condition.
10. The skeleton-based anti-noise Chinese character feature extraction method as claimed in claim 1, characterized in that step 6 specifically comprises:
Using the squared distance minimization (SDM) fitting method: the center point set $S_{center}(i)$ produced during the PCA analysis is first taken as the initial B-spline control points, the number and positions of the control points are adjusted, and the B-spline curves are fitted iteratively with the SDM method; the B-spline curves finally obtained serve as the skeleton feature of the Chinese character.
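As an illustration of the splitting condition in claim 6, the angle α between the local principal directions of two adjacent neighborhoods can be computed as in the following Python sketch. It is a simplified stand-in under stated assumptions: the function names are hypothetical, the points are 2-D, each neighborhood is given as a plain list of (x, y) tuples, and principal directions are treated as unsigned axes, so the angle is folded into [0, 90] degrees rather than the [0, 180] range that claim 9 uses.

```python
import math


def principal_direction(points):
    """Unit vector along the major axis of the 2-D covariance of `points`."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    # Closed-form orientation of the largest-eigenvalue axis of a 2x2 covariance.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (math.cos(theta), math.sin(theta))


def direction_angle_deg(u, v):
    """Angle in degrees between two unsigned directions, in [0, 90]."""
    dot = abs(u[0] * v[0] + u[1] * v[1])
    return math.degrees(math.acos(min(1.0, dot)))


# Usage: a horizontal neighborhood followed by a diagonal one.
horizontal = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
diagonal = [(2.0, 0.0), (3.0, 1.0), (4.0, 2.0)]
alpha = direction_angle_deg(principal_direction(horizontal),
                            principal_direction(diagonal))
```

A turning point would then be flagged when alpha exceeds the chosen turning angle; the threshold value itself is not given in this section.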
CN201410360498.2A 2014-07-25 2014-07-25 Skeleton-based anti-noise Chinese character feature extraction method Active CN104156730B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410360498.2A CN104156730B (en) 2014-07-25 2014-07-25 A kind of antinoise Research of Chinese Feature Extraction method based on skeleton

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410360498.2A CN104156730B (en) 2014-07-25 2014-07-25 A kind of antinoise Research of Chinese Feature Extraction method based on skeleton

Publications (2)

Publication Number Publication Date
CN104156730A true CN104156730A (en) 2014-11-19
CN104156730B CN104156730B (en) 2017-12-01

Family

ID=51882227

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410360498.2A Active CN104156730B (en) 2014-07-25 2014-07-25 Skeleton-based anti-noise Chinese character feature extraction method

Country Status (1)

Country Link
CN (1) CN104156730B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106780412A (en) * 2016-11-28 2017-05-31 西安精雕软件科技有限公司 Method for generating machining path by utilizing handwritten body skeleton line
CN108171144A (en) * 2017-12-26 2018-06-15 四川大学 Information processing method, device, electronic equipment and storage medium
CN109147469A (en) * 2018-07-09 2019-01-04 安徽慧视金瞳科技有限公司 Calligraphy practice method
CN109409211A (en) * 2018-09-11 2019-03-01 北京语言大学 Processing method and system for Chinese character skeleton stroke segments
CN109712147A (en) * 2018-12-19 2019-05-03 广东工业大学 Interference fringe center line fitting method based on Zhang-Suen image skeleton extraction
CN110246104A (en) * 2019-06-13 2019-09-17 大连民族大学 Chinese character image processing method
WO2022233307A1 (en) * 2021-05-07 2022-11-10 天津理工大学 Weeding robot based on crop stalk positioning, and weeding method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103186787A (en) * 2011-12-31 2013-07-03 廖志武 Low-quality Chinese character primary skeleton extraction algorithm based on point cloud model
CN103268631A (en) * 2013-05-23 2013-08-28 中国科学院深圳先进技术研究院 Method and device for extracting point cloud framework

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
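DONGFANG ZHU et al.: "Fitting multiple curves to point clouds with complicated topological structures", 《COMPUTER-AIDED DESIGN AND COMPUTER GRAPHICS, IEEE》 *
HUAIPING YANG et al.: "Control point adjustment for B-spline curve approximation", 《COMPUTER-AIDED DESIGN》 *
Hou Xianling (侯显玲): "Research on skeleton extraction of low-quality Chinese characters", 《China Master's Theses Full-text Database, Information Science and Technology》 *
Zhang Jing (张静): "Research and application of skeleton extraction algorithms", 《China Master's Theses Full-text Database, Information Science and Technology》 *
Huang Aimin (黄爱民): 《Fundamentals of Digital Image Processing and Analysis》, 31 August 2005 *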

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106780412A (en) * 2016-11-28 2017-05-31 西安精雕软件科技有限公司 Method for generating machining path by utilizing handwritten body skeleton line
CN106780412B (en) * 2016-11-28 2020-04-14 西安精雕软件科技有限公司 Method for generating machining path by utilizing handwritten body skeleton line
CN108171144A (en) * 2017-12-26 2018-06-15 四川大学 Information processing method, device, electronic equipment and storage medium
CN109147469A (en) * 2018-07-09 2019-01-04 安徽慧视金瞳科技有限公司 Calligraphy practice method
CN109409211A (en) * 2018-09-11 2019-03-01 北京语言大学 Processing method and system for Chinese character skeleton stroke segments
CN109712147A (en) * 2018-12-19 2019-05-03 广东工业大学 Interference fringe center line fitting method based on Zhang-Suen image skeleton extraction
CN110246104A (en) * 2019-06-13 2019-09-17 大连民族大学 Chinese character image processing method
CN110246104B (en) * 2019-06-13 2023-04-25 大连民族大学 Chinese character image processing method
WO2022233307A1 (en) * 2021-05-07 2022-11-10 天津理工大学 Weeding robot based on crop stalk positioning, and weeding method

Also Published As

Publication number Publication date
CN104156730B (en) 2017-12-01

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant