CN103218613A - Method and device for identifying handwritten form figures - Google Patents


Info

Publication number
CN103218613A
Authority
CN
China
Prior art keywords
image
partial derivative
digital
pixel
recognized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013101230858A
Other languages
Chinese (zh)
Other versions
CN103218613B (en)
Inventor
张莉
周伟达
王晓乾
何书萍
王邦军
杨季文
李凡长
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University
Priority to CN201310123085.8A priority Critical patent/CN103218613B/en
Publication of CN103218613A publication Critical patent/CN103218613A/en
Application granted granted Critical
Publication of CN103218613B publication Critical patent/CN103218613B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a method and device for recognizing handwritten digits. The method includes the following steps: determining an image to be recognized; determining at least three pixel features of a specific pixel point in the image to be recognized according to the gray values of the pixel points; determining the corresponding covariances of the image to be recognized from the at least three pixel features, one covariance per feature; calculating the distance between each covariance of the image to be recognized and the Lie group mean corresponding to each class of digit category label contained in a preset training image set; for each covariance, determining the digit category label corresponding to the minimum of its distances as a candidate digit category label; and determining the most frequent label among the candidate labels as the digit category label of the image to be recognized. The method can thereby effectively improve the recognition accuracy of handwritten digits.

Description

Handwritten digit recognition method and device
Technical Field
The invention relates to the technical field of handwritten digit recognition, and in particular to a handwritten digit recognition method and device.
Background
The handwriting of Arabic numerals, a symbol common throughout the world, appears frequently in fields such as postal systems, bank checks, and industrial applications. With the rapid development of computer technology and digital image processing technology, handwritten digit recognition has been widely applied and brings great convenience to people's work.
Since numbers in these fields often represent precise values, and small errors can have unpredictable consequences, a simple, efficient, and highly accurate handwritten digit recognition method has long been an important research direction.
With the popularization of machine learning technology, many physicists and chemists have begun to apply Lie group theory to data in related fields; accordingly, in the field of handwritten digit recognition, Lie group structured data has been widely used because of its good mathematical structure.
The Lie group mean classifier (LieMeans) is a simple and effective classification method proposed by J. A. Hartigan et al. in the article "A K-Means Clustering Algorithm". However, it selects a single covariance feature to perform classification, so the solution found by the gradient descent method is only a local minimum, not necessarily a global minimum, and its performance is not good enough on multi-class problems.
Therefore, the existing handwritten digit recognition method based on the Lie group mean classifier, which classifies with a single covariance feature, cannot fully utilize the spatial information of the image to be recognized, resulting in low recognition accuracy.
Disclosure of Invention
In order to solve the above technical problems, embodiments of the present invention provide a method and an apparatus for recognizing handwritten digits so as to improve recognition accuracy. The technical scheme is as follows:
in one aspect, an embodiment of the present invention provides a handwritten digit recognition method, including:
determining an image to be recognized, wherein the image to be recognized comprises a digital category label to be recognized in a handwritten form;
determining at least three pixel characteristics of a specific pixel point in the image to be identified according to the gray value of the pixel point;
respectively determining corresponding covariance of the image to be identified according to at least three pixel characteristics of the specific pixel point, wherein each pixel characteristic uniquely corresponds to one covariance;
respectively calculating the distance between each covariance of the image to be identified and the corresponding lie group mean value of each class of digital class label in a preset training image set; each training image in the training image set comprises a handwritten form of digital class labels, the digital class labels contained in the training image set relate to all digital classes, each class of digital class label in the training image set corresponds to at least three lie group mean values, and each covariance of the image to be recognized corresponds to one lie group mean value of each class of digital class label;
respectively determining a digital category label corresponding to the minimum distance in the plurality of distances determined for each covariance of the image to be identified as a standby digital category label;
and determining the digital category label with the largest number in the standby digital category labels as the digital category label to be identified.
The formulas used to determine the three pixel features of a specific pixel point in the image to be recognized are:

$$\varphi_1(I,x,y)=\left(x,\ y,\ I(x,y),\ \left|\tfrac{\partial}{\partial x}I(x,y)\right|,\ \left|\tfrac{\partial}{\partial y}I(x,y)\right|\right)^T$$

$$\varphi_2(I,x,y)=\left(I(x,y),\ \left|\tfrac{\partial}{\partial x}I(x,y)\right|,\ \left|\tfrac{\partial}{\partial y}I(x,y)\right|,\ \left|\tfrac{\partial^2}{\partial x^2}I(x,y)\right|,\ \left|\tfrac{\partial^2}{\partial y^2}I(x,y)\right|\right)^T$$

$$\varphi_3(I,x,y)=\left(x,\ y,\ I(x,y),\ \left|\tfrac{\partial}{\partial x}I(x,y)\right|,\ \left|\tfrac{\partial}{\partial y}I(x,y)\right|,\ \sqrt{\left|\tfrac{\partial}{\partial x}I(x,y)\right|^2+\left|\tfrac{\partial}{\partial y}I(x,y)\right|^2},\ \left|\tfrac{\partial^2}{\partial x^2}I(x,y)\right|,\ \left|\tfrac{\partial^2}{\partial y^2}I(x,y)\right|,\ \arctan\frac{\left|\tfrac{\partial}{\partial x}I(x,y)\right|}{\left|\tfrac{\partial}{\partial y}I(x,y)\right|}\right)^T$$

where $\varphi_j(I,x,y)$ (j = 1, 2, 3) is the j-th pixel feature of pixel point (x, y) of the image to be recognized, $I(x,y)$ is the gray value at pixel point (x, y), $\tfrac{\partial}{\partial x}I(x,y)$ and $\tfrac{\partial}{\partial y}I(x,y)$ are the first-order partial derivatives in the x and y directions at (x, y), $\tfrac{\partial^2}{\partial x^2}I(x,y)$ and $\tfrac{\partial^2}{\partial y^2}I(x,y)$ are the second-order partial derivatives in the x and y directions, $1\le x\le m$ where m is the number of row pixels of the image to be recognized, $1\le y\le n$ where n is the number of column pixels, and T denotes matrix transposition.
The formula used to determine the covariance of the image to be recognized corresponding to each of the three pixel features is:

$$C_j=\frac{1}{mn}\sum_{x=1}^{m}\sum_{y=1}^{n}\left(\varphi_j(I,x,y)-\bar{\varphi}_j(I)\right)\left(\varphi_j(I,x,y)-\bar{\varphi}_j(I)\right)^T,\quad j=1,2,3$$

where $C_j$ is the covariance corresponding to the j-th pixel feature, $\bar{\varphi}_j(I)$ is the mean of the j-th pixel feature over the image to be recognized, and T denotes matrix transposition.
The formula used to calculate the distance between each covariance of the image to be recognized and the corresponding Lie group mean of each class of digit category label in the preset training image set is:

$$d_k^j\left(C_j,m_k^j\right)=\sum_{i=1}^{d_j}\ln^2(\lambda_i),\quad k=1,\dots,c,\ j=1,2,3$$

where $m_k^j$ is the j-th Lie group mean corresponding to the k-th class of digit category label, c is the number of digit category classes, $\lambda_i$ is the i-th generalized eigenvalue of $C_j$ and $m_k^j$, and $d_j$ is the number of rows (or columns) of the covariance feature matrix.
The determining method of the mean value of the lie group of each type of digital category label included in the preset training image set comprises the following steps:
determining three pixel characteristics of each training image in the preset training image set according to the gray value of the pixel point;
determining corresponding covariance for each training image according to three pixel characteristics of the specific pixel point, wherein each pixel characteristic uniquely corresponds to one covariance;
and inputting the covariance corresponding to all the training images and related to the same pixel feature into a corresponding lie group mean classifier so as to determine the lie group mean value of each class of digital class labels and related to the pixel feature.
Wherein, the specific pixel point in the image to be identified comprises:
all pixel points in the image to be identified;
or,
and partial pixel points in the image to be recognized are pixel points of a handwriting area in the image to be recognized, and the handwriting area is a partial image area in the image to be recognized.
In another aspect, an embodiment of the present invention provides a handwritten digit recognition apparatus, including:
the device comprises a to-be-recognized image determining module, a recognition module and a recognition module, wherein the to-be-recognized image determining module is used for determining an image to be recognized, and the image to be recognized comprises a to-be-recognized digital category label in a handwritten form;
the pixel characteristic determining module is used for determining at least three pixel characteristics of a specific pixel point in the image to be identified according to the gray value of the pixel point;
the covariance determination module is used for respectively determining corresponding covariance of the image to be identified according to at least three pixel characteristics of the specific pixel point, wherein each pixel characteristic uniquely corresponds to one covariance;
the distance determining module is used for respectively calculating the distance between each covariance of the image to be identified and the corresponding lie group mean value of each type of digital class label in a preset training image set; each training image in the training image set comprises a handwritten form of digital class labels, the digital class labels contained in the training image set relate to all digital classes, each class of digital class label in the training image set corresponds to at least three lie group mean values, and each covariance of the image to be recognized corresponds to one lie group mean value of each class of digital class label;
a standby label determining module, configured to determine, as standby digital category labels, digital category labels corresponding to minimum distances among the distances determined for each covariance of the image to be identified, respectively;
and the to-be-identified label determining module is used for determining the digital category label with the largest number in the standby digital category labels as the to-be-identified digital category label.
In the scheme, at least three covariance characteristics are determined by using at least three pixel characteristics of a specific pixel point of an image to be recognized, and the category determination of handwritten figures is realized by using the determined at least three covariance characteristics. Therefore, compared with the mode of realizing classification by adopting a single covariance characteristic in the prior art, the scheme fully utilizes the spatial information of the image to be recognized, and therefore, the recognition accuracy of the handwritten form number can be effectively improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a first flowchart of a handwritten digit recognition method according to an embodiment of the present invention;
FIG. 2 is a second flowchart of a handwritten digit recognition method according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a handwritten digit recognition apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to improve the identification accuracy of handwritten numbers, the embodiment of the invention provides a method and a device for identifying handwritten numbers.
First, a method for recognizing handwritten numbers according to an embodiment of the present invention will be described.
As shown in fig. 1, a method of handwritten digit recognition may include:
s101, determining an image to be identified;
when handwritten numbers need to be recognized, an image to be recognized, which contains a digital category label to be recognized in a handwritten form, is determined first, and then subsequent processing is performed on the basis of the image to be recognized.
It should be noted that a digit category label is a specific number, and the labels fall into 10 classes: 0-9. The digit category label to be recognized may be any number from 0 to 9, and its handwritten form is not limited to a single style.
S102, determining at least three pixel characteristics of a specific pixel point in the image to be identified according to the gray value of the pixel point;
the stroke positions are beneficial to identifying numbers, and the gray value of the pixel point at the position with the stroke is different from other positions, so that the gray value of the pixel point can be used as important space information of the handwritten numbers to be identified. Therefore, after the image to be recognized is determined, in order to fully utilize the spatial information of the image to be recognized, at least three pixel characteristics of a specific pixel point in the image to be recognized can be determined according to the gray value of the pixel point.
It should be noted that, in order to ensure higher accuracy, a specific pixel point in the image to be recognized may include: and all the pixel points in the image to be identified. Further, in order to improve the processing efficiency on the premise of ensuring higher accuracy, the specific pixel points in the image to be recognized may include: and partial pixel points in the image to be recognized are pixel points of a handwriting area in the image to be recognized, and the handwriting area is a partial image area in the image to be recognized.
S103, respectively determining the corresponding covariance of the image to be identified according to at least three pixel characteristics of the specific pixel point;
wherein each pixel feature uniquely corresponds to one covariance.
After at least three pixel characteristics of a specific pixel point of the image to be recognized are determined, the covariance of the image to be recognized with respect to each pixel characteristic can be determined, and then the determined at least three covariances are utilized for subsequent processing.
S104, respectively calculating the distance between each covariance of the image to be recognized and the corresponding lie group mean value of each class of digital class label in a preset training image set;
each training image in the training image set comprises a handwritten digital class label, the digital class labels contained in the training image set relate to all digital classes, each class of digital class label in the training image set corresponds to at least three lie group mean values, and each covariance of the image to be recognized corresponds to one lie group mean value of each class of digital class label.
It will be appreciated that in practical applications, the number class labels included in the training image set relate to 10 numbers: 0 to 9; moreover, the number of the training images corresponding to each type of digital category label included in the preset training image set may be different or the same.
Further, it should be noted that the determining manner of the at least three lie group mean values corresponding to each class of digital class label in the training image set may include:
a. determining at least three pixel characteristics of each training image in the preset training image set according to the gray value of the pixel point;
b. determining corresponding covariance for each training image according to at least three pixel characteristics of the specific pixel point, wherein each pixel characteristic uniquely corresponds to one covariance;
c. and inputting the covariance corresponding to all the training images and related to the same pixel feature into a corresponding lie group mean classifier so as to determine the lie group mean value of each class of digital class labels and related to the pixel feature.
By the above method, each class of digital class label in the training image set corresponds to at least three lie group mean values.
S105, respectively determining a digital category label corresponding to the minimum distance in the plurality of distances determined for each covariance of the image to be identified as a standby digital category label;
and S106, determining the digital category label with the largest number in the spare digital category labels as the digital category label to be identified.
The smaller the distance between a covariance of the image to be recognized and the corresponding Lie group mean of a digit category label, the greater the probability that the label to be recognized is that label. Therefore, after the distances between each covariance of the image to be recognized and the corresponding Lie group means of every class of digit category label in the preset training image set are determined, the digit category label corresponding to the minimum distance among the distances determined for each covariance can be taken as a candidate digit category label, and the most frequent label among the candidates is then determined as the digit category label to be recognized.
In the scheme, at least three covariance characteristics are determined by using at least three pixel characteristics of a specific pixel point of an image to be recognized, and the category determination of handwritten figures is realized by using the determined at least three covariance characteristics. Therefore, compared with the mode of realizing classification by adopting a single covariance characteristic in the prior art, the scheme fully utilizes the spatial information of the image to be recognized, and therefore, the recognition accuracy of the handwritten form number can be effectively improved.
The method for recognizing handwritten numbers provided by the embodiment of the invention is described below by taking three covariance characteristics as an example.
As shown in fig. 2, a method of handwritten digit recognition may include:
s201, determining an image to be identified;
when handwritten numbers need to be recognized, an image to be recognized, which contains a digital category label to be recognized in a handwritten form, is determined first, and then subsequent processing is performed on the basis of the image to be recognized.
It should be noted that a digit category label is a specific number, and the labels fall into 10 classes: 0-9. The digit category label to be recognized may be any number from 0 to 9, and its handwritten form is not limited to a single style.
S202, determining three pixel characteristics of a specific pixel point in the image to be identified according to the gray value of the pixel point;
the stroke positions are beneficial to identifying numbers, and the gray value of the pixel point at the position with the stroke is different from other positions, so that the gray value of the pixel point can be used as important space information of the handwritten numbers to be identified. Therefore, after the image to be recognized is determined, in order to fully utilize the spatial information of the image to be recognized, three pixel characteristics of a specific pixel point in the image to be recognized can be determined according to the gray value of the pixel point.
It should be noted that, in order to ensure higher accuracy, a specific pixel point in the image to be recognized may include: and all the pixel points in the image to be identified. Further, in order to improve the processing efficiency on the premise of ensuring higher accuracy, the specific pixel points in the image to be recognized may include: and partial pixel points in the image to be recognized are pixel points of a handwriting area in the image to be recognized, and the handwriting area is a partial image area in the image to be recognized.
When determining the three pixel features of a specific pixel point in the image to be recognized, the formulas used are:

$$\varphi_1(I,x,y)=\left(x,\ y,\ I(x,y),\ \left|\tfrac{\partial}{\partial x}I(x,y)\right|,\ \left|\tfrac{\partial}{\partial y}I(x,y)\right|\right)^T\qquad(1)$$

$$\varphi_2(I,x,y)=\left(I(x,y),\ \left|\tfrac{\partial}{\partial x}I(x,y)\right|,\ \left|\tfrac{\partial}{\partial y}I(x,y)\right|,\ \left|\tfrac{\partial^2}{\partial x^2}I(x,y)\right|,\ \left|\tfrac{\partial^2}{\partial y^2}I(x,y)\right|\right)^T\qquad(2)$$

$$\varphi_3(I,x,y)=\left(x,\ y,\ I(x,y),\ \left|\tfrac{\partial}{\partial x}I(x,y)\right|,\ \left|\tfrac{\partial}{\partial y}I(x,y)\right|,\ \sqrt{\left|\tfrac{\partial}{\partial x}I(x,y)\right|^2+\left|\tfrac{\partial}{\partial y}I(x,y)\right|^2},\ \left|\tfrac{\partial^2}{\partial x^2}I(x,y)\right|,\ \left|\tfrac{\partial^2}{\partial y^2}I(x,y)\right|,\ \arctan\frac{\left|\tfrac{\partial}{\partial x}I(x,y)\right|}{\left|\tfrac{\partial}{\partial y}I(x,y)\right|}\right)^T\qquad(3)$$

where $\varphi_j(I,x,y)$ (j = 1, 2, 3) is the j-th pixel feature of pixel point (x, y) of the image to be recognized, $I(x,y)$ is the gray value at pixel point (x, y), $\tfrac{\partial}{\partial x}I(x,y)$ and $\tfrac{\partial}{\partial y}I(x,y)$ are the first-order partial derivatives in the x and y directions at (x, y), $\tfrac{\partial^2}{\partial x^2}I(x,y)$ and $\tfrac{\partial^2}{\partial y^2}I(x,y)$ are the second-order partial derivatives in the x and y directions, $1\le x\le m$ where m is the number of row pixels of the image to be recognized, $1\le y\le n$ where n is the number of column pixels, and T denotes matrix transposition.
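As a concrete illustration, formulas (1)-(3) can be computed with NumPy. This is a sketch under stated assumptions: the patent fixes no derivative discretization, so `np.gradient` is used; a small `eps` term (not in the original) guards the arctan ratio against division by zero; and the function name is illustrative.

```python
import numpy as np

def pixel_features(I):
    """Per-pixel features phi_1, phi_2, phi_3 of formulas (1)-(3).

    x indexes rows (1..m) and y indexes columns (1..n), matching the
    patent's convention. Returns arrays of shape (m*n, 5), (m*n, 5),
    (m*n, 9), one row per pixel point.
    """
    I = np.asarray(I, dtype=float)
    m, n = I.shape
    Ix = np.gradient(I, axis=0)      # first-order partial in x (rows)
    Iy = np.gradient(I, axis=1)      # first-order partial in y (columns)
    Ixx = np.gradient(Ix, axis=0)    # second-order partial in x
    Iyy = np.gradient(Iy, axis=1)    # second-order partial in y
    X, Y = np.meshgrid(np.arange(1, m + 1), np.arange(1, n + 1),
                       indexing="ij")
    aIx, aIy, eps = np.abs(Ix), np.abs(Iy), 1e-12
    phi1 = np.stack([X, Y, I, aIx, aIy], axis=-1).reshape(-1, 5)
    phi2 = np.stack([I, aIx, aIy, np.abs(Ixx), np.abs(Iyy)],
                    axis=-1).reshape(-1, 5)
    phi3 = np.stack([X, Y, I, aIx, aIy,
                     np.sqrt(aIx ** 2 + aIy ** 2),
                     np.abs(Ixx), np.abs(Iyy),
                     np.arctan(aIx / (aIy + eps))],
                    axis=-1).reshape(-1, 9)
    return phi1, phi2, phi3
```

For a 28 x 28 MNIST-sized image this yields 784 feature vectors per feature type, ready for the covariance computation of the next step.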
S203, respectively determining the corresponding covariance of the image to be identified according to the three pixel characteristics of the specific pixel point;
wherein each pixel feature uniquely corresponds to one covariance.
After three pixel characteristics of a specific pixel point of the image to be recognized are determined, the covariance of the image to be recognized about each pixel characteristic can be determined, and then the determined three covariances are utilized for subsequent processing.
The formula used to determine the covariance of the image to be recognized corresponding to each of the three pixel features is:

$$C_j=\frac{1}{mn}\sum_{x=1}^{m}\sum_{y=1}^{n}\left(\varphi_j(I,x,y)-\bar{\varphi}_j(I)\right)\left(\varphi_j(I,x,y)-\bar{\varphi}_j(I)\right)^T,\quad j=1,2,3\qquad(4)$$

where $C_j$ is the covariance corresponding to the j-th pixel feature, $\bar{\varphi}_j(I)$ is the mean of the j-th pixel feature over the image to be recognized, and T denotes matrix transposition.
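Formula (4) is the biased sample covariance of the per-pixel feature vectors; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def covariance_feature(phi):
    # phi: (m*n, d) array whose rows are phi_j(I, x, y) for every
    # specific pixel point. Implements formula (4):
    # C_j = (1/mn) * sum (phi - phi_bar)(phi - phi_bar)^T
    mu = phi.mean(axis=0)
    D = phi - mu
    return D.T @ D / phi.shape[0]
```

For a feature matrix `phi` of shape (m·n, d) this is identical to `np.cov(phi, rowvar=False, bias=True)`.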
S204, respectively calculating the distance between each covariance of the image to be recognized and the corresponding lie group mean value of each class of digital class label in a preset training image set;
each training image in the training image set comprises a handwritten digital class label, the digital class labels contained in the training image set relate to all digital classes, each class of digital class label in the training image set corresponds to three lie group mean values, and each covariance of the image to be recognized corresponds to one lie group mean value of each class of digital class label.
It will be appreciated that in practical applications, the number class labels included in the training image set relate to 10 numbers: 0 to 9; moreover, the number of the training images corresponding to each type of digital category label included in the preset training image set may be different or the same.
The formula used to calculate the distance between each covariance of the image to be recognized and the corresponding Lie group mean of each class of digit category label in the preset training image set is:

$$d_k^j\left(C_j,m_k^j\right)=\sum_{i=1}^{d_j}\ln^2(\lambda_i),\quad k=1,\dots,c,\ j=1,2,3\qquad(5)$$

where $m_k^j$ is the j-th Lie group mean corresponding to the k-th class of digit category label, c is the number of digit category classes, $\lambda_i$ is the i-th generalized eigenvalue of $C_j$ and $m_k^j$, and $d_j$ is the number of rows (or columns) of the covariance feature matrix.
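Formula (5) can be evaluated from the generalized eigenvalues of the pair of symmetric positive-definite matrices. The sketch below assumes the sum-of-squared-logs reading of the formula and adds a small ridge term (not in the patent) for numerical safety:

```python
import numpy as np
from scipy.linalg import eigh

def lie_distance(C, M, ridge=1e-6):
    """Distance of formula (5) between covariance C and Lie group
    mean M: the sum of squared logarithms of the generalized
    eigenvalues of (C, M). The small ridge keeps both matrices
    strictly positive definite."""
    d = C.shape[0]
    lam = eigh(C + ridge * np.eye(d), M + ridge * np.eye(d),
               eigvals_only=True)
    return float(np.sum(np.log(lam) ** 2))
```

Because the generalized eigenvalues of (M, C) are the reciprocals of those of (C, M), this quantity is symmetric in its arguments and vanishes when C = M.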
Further, the determining method of the mean of the lie groups of each type of digital category label included in the preset training image set may include:
a. determining three pixel characteristics of each training image in the preset training image set according to the gray value of the pixel point;
b. determining corresponding covariance for each training image according to three pixel characteristics of the specific pixel point, wherein each pixel characteristic uniquely corresponds to one covariance;
c. and inputting the covariance corresponding to all the training images and related to the same pixel feature into a corresponding lie group mean classifier so as to determine the lie group mean value of each class of digital class labels and related to the pixel feature.
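The steps above delegate the mean computation to a Lie group mean classifier without giving its formula in this excerpt. As a hedged stand-in, the log-Euclidean mean exp(mean_i log C_i), a common way to average symmetric positive-definite matrices, can be sketched as:

```python
import numpy as np

def _spd_log(C):
    # matrix logarithm of a symmetric positive-definite matrix
    w, V = np.linalg.eigh(C)
    return (V * np.log(w)) @ V.T

def _spd_exp(S):
    # matrix exponential of a symmetric matrix
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T

def lie_group_mean(covs):
    # Log-Euclidean mean of SPD covariance matrices -- one common
    # realization, used here only as a sketch of the unspecified
    # "Lie group mean classifier" averaging step.
    return _spd_exp(np.mean([_spd_log(C) for C in covs], axis=0))
```

For commuting matrices this reduces to the element-wise geometric mean of the eigenvalues, e.g. the mean of diag(1, 4) and diag(4, 1) is diag(2, 2).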
It should be noted that, for the training image, the calculation formulas for determining the three pixel features of the specific pixel point are calculation formulas (1), (2) and (3), and the calculation formula for determining the three covariances of the training image is calculation formula (4).
S205, respectively determining a digital category label corresponding to the minimum distance in the plurality of distances determined for each covariance of the image to be identified as a standby digital category label;
and S206, determining the digital category label with the largest number in the spare digital category labels as the digital category label to be identified.
The smaller the distance between a covariance of the image to be recognized and the corresponding Lie group mean of a digit category label, the greater the probability that the label to be recognized is that label. Therefore, after the distances between each covariance of the image to be recognized and the corresponding Lie group means of every class of digit category label in the preset training image set are determined, the digit category label corresponding to the minimum distance among the distances determined for each covariance can be taken as a candidate digit category label, and the most frequent label among the candidates is then determined as the digit category label to be recognized.
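Steps S204-S206 together amount to one nearest-mean vote per covariance followed by a majority decision. A self-contained sketch (the data layout and function names are assumptions, and the distance follows the sum-of-squared-logs reading of formula (5)):

```python
import numpy as np
from collections import Counter
from scipy.linalg import eigh

def lie_distance(C, M, ridge=1e-6):
    # formula (5): sum of squared logs of the generalized
    # eigenvalues of (C, M); ridge added for numerical safety
    d = C.shape[0]
    lam = eigh(C + ridge * np.eye(d), M + ridge * np.eye(d),
               eigvals_only=True)
    return float(np.sum(np.log(lam) ** 2))

def classify(covs, means):
    """covs: the (at least three) covariances C_j of the image to be
    recognized; means[k][j]: the j-th Lie group mean of digit class k.
    Each C_j votes for its nearest class (S204/S205); the majority
    label wins (S206), ties going to the earliest vote."""
    votes = []
    for j, C in enumerate(covs):
        dists = [lie_distance(C, means[k][j]) for k in range(len(means))]
        votes.append(int(np.argmin(dists)))
    return Counter(votes).most_common(1)[0][0]
```

With three covariances and ten classes, the final label is simply the class collecting at least two of the three votes, or the first-encountered class when all three votes differ.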
In the scheme, three covariance characteristics are determined by using three pixel characteristics of a specific pixel point of an image to be recognized, and the category determination of the handwritten form number is realized by using the determined three covariance characteristics. Therefore, compared with the mode of realizing classification by adopting a single covariance characteristic in the prior art, the scheme fully utilizes the spatial information of the image to be recognized, and therefore, the recognition accuracy of the handwritten form number can be effectively improved.
It should be noted that the preset training image set may be constructed by randomly drawing a number of training images for each digit category label from the MNIST handwritten digit data set. MNIST, a subset of the well-known U.S. data set NIST, is a common experimental data set for pattern recognition; it has a training set of 60,000 training images and a test set of 10,000 test images.
The training process corresponding to the handwritten digit recognition method based on three covariance characteristics provided by the embodiment of the invention is described as follows:
(1) Training image processing:
1) Determine a training image set $\{(I_i,l_i)\}_{i=1}^{N}$, where $I_i\in R^{m\times n}$ is the i-th training image, m and n are its row and column pixel counts, $l_i\in\{1,\dots,c\}$ is the digit category label of $I_i$ (i.e., which number $I_i$ represents), N is the total number of training images, and c is the number of digit category classes. Let m = n = 28 and N = 100c, and initialize i = 1.
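The sampling of N = 100c training images can be sketched as follows, assuming a pre-loaded MNIST-style `(images, labels)` array pair; the function name and seed are illustrative:

```python
import numpy as np

def sample_training_set(images, labels, per_class=100, seed=0):
    """Assemble the training set {(I_i, l_i)} with N = 100c images by
    drawing `per_class` images for each digit class 0..9, matching
    the setting m = n = 28, N = 100c above."""
    rng = np.random.default_rng(seed)
    idx = []
    for k in range(10):
        pool = np.flatnonzero(labels == k)   # indices of class k
        idx.extend(rng.choice(pool, size=per_class, replace=False))
    idx = np.asarray(idx)
    return images[idx], labels[idx]
```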
2) For the pixel point at position (x, y) in the training image I_i, extract the following three pixel features:

φ_1(I_i, x, y) = ( x, y, I_i(x, y), |∂I_i(x, y)/∂x|, |∂I_i(x, y)/∂y| )^T

φ_2(I_i, x, y) = ( I_i(x, y), |∂I_i(x, y)/∂x|, |∂I_i(x, y)/∂y|, |∂²I_i(x, y)/∂x²|, |∂²I_i(x, y)/∂y²| )^T

φ_3(I_i, x, y) = ( x, y, I_i(x, y), |∂I_i(x, y)/∂x|, |∂I_i(x, y)/∂y|, √(|∂I_i(x, y)/∂x|² + |∂I_i(x, y)/∂y|²), |∂²I_i(x, y)/∂x²|, |∂²I_i(x, y)/∂y²|, arctan(|∂I_i(x, y)/∂x| / |∂I_i(x, y)/∂y|) )^T

where φ_j(I_i, x, y) (j = 1, 2, 3) is the j-th pixel feature of the pixel point (x, y) of the training image I_i; I_i(x, y) is the gray value at the pixel point (x, y); ∂I_i(x, y)/∂x and ∂I_i(x, y)/∂y are the first-order partial derivatives in the x and y directions at the pixel point (x, y); ∂²I_i(x, y)/∂x² and ∂²I_i(x, y)/∂y² are the second-order partial derivatives in the x and y directions at the pixel point (x, y); 1 ≤ x ≤ 28, where m is the number of row pixels of the training image I_i; 1 ≤ y ≤ 28, where n is the number of column pixels of the training image I_i; and T denotes matrix transposition.
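The three per-pixel feature maps above can be sketched in Python/NumPy as follows. This is a sketch only: the patent does not fix the discrete derivative operator, so the use of central differences via `np.gradient` is an assumption, as is the small `eps` guarding the arctan against division by zero.

```python
import numpy as np

def pixel_features(img):
    """Per-pixel feature maps phi_1, phi_2, phi_3 for a grayscale image.

    Returns arrays of shape (m, n, 5), (m, n, 5), (m, n, 9).
    """
    m, n = img.shape
    Ix = np.abs(np.gradient(img, axis=0))   # |dI/dx|, x = row index
    Iy = np.abs(np.gradient(img, axis=1))   # |dI/dy|, y = column index
    Ixx = np.abs(np.gradient(np.gradient(img, axis=0), axis=0))  # |d2I/dx2|
    Iyy = np.abs(np.gradient(np.gradient(img, axis=1), axis=1))  # |d2I/dy2|
    # Pixel coordinates, 1-based as in the patent's 1 <= x <= m, 1 <= y <= n.
    xs, ys = np.meshgrid(np.arange(1, m + 1), np.arange(1, n + 1), indexing="ij")
    eps = 1e-12                              # assumption: avoid 0/0 in arctan
    phi1 = np.stack([xs, ys, img, Ix, Iy], axis=-1)
    phi2 = np.stack([img, Ix, Iy, Ixx, Iyy], axis=-1)
    phi3 = np.stack([xs, ys, img, Ix, Iy,
                     np.sqrt(Ix**2 + Iy**2), Ixx, Iyy,
                     np.arctan(Ix / (Iy + eps))], axis=-1)
    return phi1, phi2, phi3

f1, f2, f3 = pixel_features(np.random.default_rng(0).random((28, 28)))
```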
3) Determine the three covariances of the training image I_i from the three extracted pixel features according to the following formula:

C_i^j = (1/(mn)) Σ_{x=1}^{m} Σ_{y=1}^{n} ( φ_j(I_i, x, y) − φ̄_j(I_i) ) ( φ_j(I_i, x, y) − φ̄_j(I_i) )^T,  j = 1, 2, 3

where C_i^j is the covariance corresponding to the j-th pixel feature of the training image I_i, φ̄_j(I_i) is the mean value of the j-th pixel feature over the training image I_i, and T denotes matrix transposition.
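The covariance descriptor above is a direct outer-product average over all pixels; a minimal NumPy sketch (the function name is illustrative):

```python
import numpy as np

def covariance_descriptor(phi):
    """Covariance C^j of a per-pixel feature map, normalised by m*n.

    `phi` has shape (m, n, d); the result is the d x d covariance of the
    feature vectors over all m*n pixels.
    """
    m, n, d = phi.shape
    F = phi.reshape(m * n, d)
    mu = F.mean(axis=0)          # phi-bar: mean feature vector over the image
    D = F - mu
    return D.T @ D / (m * n)     # sum of outer products, divided by m*n

C = covariance_descriptor(np.random.default_rng(0).random((28, 28, 5)))
```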
4) If i = N, stop; otherwise set i = i + 1 and repeat steps 2) and 3).
5) Input the covariances C_i^j of all the training images into the j-th Lie group mean classifier to obtain the Lie group mean m_k^j of each class of digital class label, k = 1, …, c, j = 1, 2, 3.
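The patent does not spell out how the Lie group mean classifier computes m_k^j. One standard intrinsic mean for SPD covariance matrices is the log-Euclidean mean exp(mean(log C_i)); the sketch below implements that choice with NumPy only (the helper names are illustrative, and this is an assumption, not necessarily the patent's exact algorithm):

```python
import numpy as np

def _logm_spd(C):
    # Matrix logarithm of a symmetric positive-definite matrix via eigh.
    w, V = np.linalg.eigh(C)
    return (V * np.log(w)) @ V.T

def _expm_sym(S):
    # Matrix exponential of a symmetric matrix via eigh.
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T

def lie_group_mean(covs):
    """Log-Euclidean mean of SPD matrices: exp(mean(log(C_i)))."""
    return _expm_sym(np.mean([_logm_spd(C) for C in covs], axis=0))

# Mean of random SPD matrices of the form A A^T + I.
rng = np.random.default_rng(0)
covs = [(lambda A: A @ A.T + np.eye(5))(rng.random((5, 5))) for _ in range(10)]
m5 = lie_group_mean(covs)
```

The mean of identical matrices is recovered exactly, e.g. `lie_group_mean([2 * np.eye(3)] * 4)` gives 2·I.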
(2) Test image processing process:
1) Determine a test image I, where I ∈ R^{m×n};
2) Extract the pixel features φ_j(I, x, y), j = 1, 2, 3, from the pixel points (x, y) of the test image I, using the same formulas as for extracting the three pixel features from a training image;
3) Determine the three covariances C^j of the test image I from the extracted pixel features, using the same formula as for determining the three covariances of a training image.
(3) The recognition process comprises the following steps:
1) Calculate the distance between each covariance C^j of the test image and the Lie group mean m_k^j of each class of digital class label in the training image set, i.e.

d_k^j(C^j, m_k^j) = √( Σ_{i=1}^{d_j} ln²(λ_i) ),  k = 1, …, c, j = 1, 2, 3

where λ_i are the generalized eigenvalues of C^j and m_k^j, and d_j is the number of rows (or columns) of the j-th covariance matrix.
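This distance — the square root of the summed squared logarithms of the generalized eigenvalues of the two SPD matrices — can be computed with NumPy only via a Cholesky whitening of the mean matrix (the function name is illustrative):

```python
import numpy as np

def lie_group_distance(C, M):
    """sqrt(sum_i ln^2(lambda_i)), lambda_i the generalized eigenvalues of (C, M).

    For SPD M = L L^T, the generalized eigenvalues of (C, M) equal the
    ordinary eigenvalues of L^{-1} C L^{-T}.
    """
    L = np.linalg.cholesky(M)
    Linv = np.linalg.inv(L)
    lam = np.linalg.eigvalsh(Linv @ C @ Linv.T)  # generalized eigenvalues
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))

d0 = lie_group_distance(np.eye(4), np.eye(4))        # identical matrices -> 0
d1 = lie_group_distance(2.0 * np.eye(4), np.eye(4))  # lambda_i = 2 for all i
```

With C = 2I and M = I, all four generalized eigenvalues are 2, so the distance is √(4·ln²2) = 2 ln 2.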
2) The classification result of the j-th Lie group mean classifier on the test image is q_j, i.e.

q_j = argmin_{k=1,…,c} d_k^j(C^j, m_k^j)
3) Combine the results of the three Lie group mean classifiers according to the majority voting criterion, and output the category of the test image.
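The majority-voting step over the three per-classifier labels q_1, q_2, q_3 can be sketched as follows. The tie-breaking rule (first label reaching the top count wins) is an assumption the text leaves open:

```python
from collections import Counter

def majority_vote(votes):
    """Return the most frequent label among the classifier outputs.

    Ties are broken in favour of the first label reaching the top count
    (an assumption; the patent does not specify tie-breaking).
    """
    counts = Counter(votes)
    best = max(counts.values())
    for v in votes:
        if counts[v] == best:
            return v

label = majority_vote([1, 9, 1])  # two classifiers say 1, one says 9
```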
The effect of the invention can be verified by the following experiment:
Randomly select 100 training images for each type of digital class label from the training set, and randomly select 200 test images from the test set; repeat this sampling process 10 times, the final output being the average over the 10 runs. The comparison methods in the experiment are the Lie group mean classifier and the present invention.
The whole experimental procedure included three groups of experiments: binary classification of the digits (1, 9), three-class classification of the digits (1, 7, 9), and four-class classification of the digits (1, 2, 7, 9).
The experimental results are shown in Table 1. The Lie group mean classifier yields four results: one for each of the three covariances, and their average. As can be seen from Table 1, for the binary, three-class, and four-class tasks, the misrecognition rate of the present invention is smaller than that of the Lie group mean classifier, so the present invention has a better recognition effect.
TABLE 1
Through the above description of the method embodiments, those skilled in the art can clearly understand that the present invention can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present invention. And the aforementioned storage medium includes: various media that can store program codes, such as Read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and so on.
Corresponding to the above method embodiment, an embodiment of the present invention further provides a handwritten numeral recognition apparatus, as shown in fig. 3, which may include:
the to-be-recognized image determining module 110, configured to determine an image to be recognized, where the image to be recognized contains a handwritten digital category label to be recognized;
the pixel feature determining module 120, configured to determine at least three pixel features of specific pixel points in the image to be recognized according to the gray values of the pixel points;
the covariance determining module 130, configured to determine the corresponding covariances of the image to be recognized according to the at least three pixel features of the specific pixel points, where each pixel feature uniquely corresponds to one covariance;
the distance determining module 140, configured to calculate the distance between each covariance of the image to be recognized and the corresponding Lie group mean of each class of digital class label included in a preset training image set, where each training image in the training image set contains a handwritten digital class label, the digital class labels contained in the training image set cover all digital classes, each class of digital class label in the training image set corresponds to at least three Lie group means, and each covariance of the image to be recognized corresponds to one Lie group mean of each class of digital class label;
the spare label determining module 150, configured to determine, as a spare digital class label, the digital class label corresponding to the minimum distance among the distances determined for each covariance of the image to be recognized; and
the to-be-recognized label determining module 160, configured to determine the digital class label occurring most often among the spare digital class labels as the digital class label to be recognized.
In this scheme, at least three covariance features are determined from at least three pixel features of the specific pixel points of the image to be recognized, and the category of the handwritten digit is determined using these covariance features. Compared with the prior-art approach of classifying with a single covariance feature, the scheme makes fuller use of the spatial information of the image to be recognized, so the recognition accuracy of handwritten digits can be effectively improved.
The calculation formula according to which the pixel feature determining module 120 determines three pixel features of a specific pixel point in the image to be recognized may include:
φ_1(I, x, y) = ( x, y, I(x, y), |∂I(x, y)/∂x|, |∂I(x, y)/∂y| )^T

φ_2(I, x, y) = ( I(x, y), |∂I(x, y)/∂x|, |∂I(x, y)/∂y|, |∂²I(x, y)/∂x²|, |∂²I(x, y)/∂y²| )^T

φ_3(I, x, y) = ( x, y, I(x, y), |∂I(x, y)/∂x|, |∂I(x, y)/∂y|, √(|∂I(x, y)/∂x|² + |∂I(x, y)/∂y|²), |∂²I(x, y)/∂x²|, |∂²I(x, y)/∂y²|, arctan(|∂I(x, y)/∂x| / |∂I(x, y)/∂y|) )^T

where φ_j(I, x, y) (j = 1, 2, 3) is the j-th pixel feature of the pixel point (x, y) of the image to be recognized; I(x, y) is the gray value at the pixel point (x, y); ∂I(x, y)/∂x and ∂I(x, y)/∂y are the first-order partial derivatives in the x and y directions at the pixel point (x, y); ∂²I(x, y)/∂x² and ∂²I(x, y)/∂y² are the second-order partial derivatives in the x and y directions at the pixel point (x, y); 1 ≤ x ≤ m, where m is the number of row pixels of the image to be recognized; 1 ≤ y ≤ n, where n is the number of column pixels of the image to be recognized; and T denotes matrix transposition.
Correspondingly, the calculation formula according to which the covariance determination module 130 determines the respective covariance of the image to be identified according to the three pixel characteristics of the specific pixel point may include:
C^j = (1/(mn)) Σ_{x=1}^{m} Σ_{y=1}^{n} ( φ_j(I, x, y) − φ̄_j(I) ) ( φ_j(I, x, y) − φ̄_j(I) )^T,  j = 1, 2, 3

where C^j is the covariance corresponding to the j-th pixel feature, φ̄_j(I) is the mean value of the j-th pixel feature over the image to be recognized, and T denotes matrix transposition.
The calculation formula according to which the distance determining module 140 calculates the distance between each covariance of the image to be recognized and the corresponding Lie group mean of each class of digital class label included in the preset training image set may include:

d_k^j(C^j, m_k^j) = √( Σ_{i=1}^{d_j} ln²(λ_i) ),  k = 1, …, c, j = 1, 2, 3

where m_k^j is the j-th Lie group mean corresponding to the k-th class of digital class label, c is the number of classes of digital class labels, λ_i is a generalized eigenvalue of C^j and m_k^j, and d_j is the number of rows (or columns) of the covariance feature matrix.
For device or system embodiments, as they correspond substantially to method embodiments, reference may be made to the method embodiments for some of their descriptions. The above-described embodiments of the apparatus or system are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways without departing from the spirit and scope of the present application. The embodiments are exemplary only and should not be taken as limiting the scope of the application. For example, the division into units or sub-units is only a logical functional division; in actual implementation there may be other divisions, for example, multiple units or sub-units may be combined. In addition, various elements or components may be combined or integrated into another system, or some features may be omitted or not implemented.
Additionally, the systems, apparatus, and methods described, as well as the illustrations of various embodiments, may be combined or integrated with other systems, modules, techniques, or methods without departing from the scope of the present application. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The foregoing is directed to embodiments of the present invention, and it is understood that various modifications and improvements can be made by those skilled in the art without departing from the spirit of the invention.

Claims (7)

1. A method for handwritten digit recognition, comprising:
determining an image to be recognized, wherein the image to be recognized comprises a digital category label to be recognized in a handwritten form;
determining at least three pixel characteristics of a specific pixel point in the image to be identified according to the gray value of the pixel point;
respectively determining corresponding covariance of the image to be identified according to at least three pixel characteristics of the specific pixel point, wherein each pixel characteristic uniquely corresponds to one covariance;
respectively calculating the distance between each covariance of the image to be recognized and the corresponding Lie group mean of each class of digital class label included in a preset training image set; where each training image in the training image set contains a handwritten digital class label, the digital class labels contained in the training image set cover all digital classes, each class of digital class label in the training image set corresponds to at least three Lie group means, and each covariance of the image to be recognized corresponds to one Lie group mean of each class of digital class label;
respectively determining a digital category label corresponding to the minimum distance in the plurality of distances determined for each covariance of the image to be identified as a standby digital category label;
and determining the digital category label with the largest number in the standby digital category labels as the digital category label to be identified.
2. The method according to claim 1, wherein the calculation formula according to which the three pixel characteristics of the specific pixel point in the image to be recognized are determined comprises:
φ_1(I, x, y) = ( x, y, I(x, y), |∂I(x, y)/∂x|, |∂I(x, y)/∂y| )^T

φ_2(I, x, y) = ( I(x, y), |∂I(x, y)/∂x|, |∂I(x, y)/∂y|, |∂²I(x, y)/∂x²|, |∂²I(x, y)/∂y²| )^T

φ_3(I, x, y) = ( x, y, I(x, y), |∂I(x, y)/∂x|, |∂I(x, y)/∂y|, √(|∂I(x, y)/∂x|² + |∂I(x, y)/∂y|²), |∂²I(x, y)/∂x²|, |∂²I(x, y)/∂y²|, arctan(|∂I(x, y)/∂x| / |∂I(x, y)/∂y|) )^T

where φ_j(I, x, y) (j = 1, 2, 3) is the j-th pixel feature of the pixel point (x, y) of the image to be recognized; I(x, y) is the gray value at the pixel point (x, y); ∂I(x, y)/∂x and ∂I(x, y)/∂y are the first-order partial derivatives in the x and y directions at the pixel point (x, y); ∂²I(x, y)/∂x² and ∂²I(x, y)/∂y² are the second-order partial derivatives in the x and y directions at the pixel point (x, y); 1 ≤ x ≤ m, where m is the number of row pixels of the image to be recognized; 1 ≤ y ≤ n, where n is the number of column pixels of the image to be recognized; and T denotes matrix transposition.
3. The method of claim 2, wherein the calculation formula according to which the respective covariances of the image to be recognized are respectively determined according to the three pixel characteristics of the specific pixel point comprises:
C^j = (1/(mn)) Σ_{x=1}^{m} Σ_{y=1}^{n} ( φ_j(I, x, y) − φ̄_j(I) ) ( φ_j(I, x, y) − φ̄_j(I) )^T,  j = 1, 2, 3

where C^j is the covariance corresponding to the j-th pixel feature, φ̄_j(I) is the mean value of the j-th pixel feature over the image to be recognized, and T denotes matrix transposition.
4. The method of claim 3, wherein the calculation formula for calculating the distance between each covariance of the image to be recognized and the corresponding Lie group mean of each class of digital class label included in the preset training image set comprises:

d_k^j(C^j, m_k^j) = √( Σ_{i=1}^{d_j} ln²(λ_i) ),  k = 1, …, c, j = 1, 2, 3

where m_k^j is the j-th Lie group mean corresponding to the k-th class of digital class label, c is the number of classes of digital class labels, λ_i is a generalized eigenvalue of C^j and m_k^j, and d_j is the number of rows (or columns) of the covariance feature matrix.
5. The method of claim 4, wherein determining the Lie group mean of each class of digital class label included in the preset training image set comprises:
determining three pixel characteristics of each training image in the preset training image set according to the gray value of the pixel point;
determining corresponding covariance for each training image according to three pixel characteristics of the specific pixel point, wherein each pixel characteristic uniquely corresponds to one covariance;
and inputting the covariances of all the training images that relate to the same pixel feature into the corresponding Lie group mean classifier, so as to determine, for that pixel feature, the Lie group mean of each class of digital class label.
6. The method of claim 1, wherein the specific pixel points in the image to be recognized comprise:
all pixel points in the image to be identified;
or,
or some of the pixel points in the image to be recognized, namely the pixel points of a handwriting area in the image to be recognized, where the handwriting area is a partial image region of the image to be recognized.
7. A handwritten number recognition device, comprising:
the device comprises a to-be-recognized image determining module, a recognition module and a recognition module, wherein the to-be-recognized image determining module is used for determining an image to be recognized, and the image to be recognized comprises a to-be-recognized digital category label in a handwritten form;
the pixel characteristic determining module is used for determining at least three pixel characteristics of a specific pixel point in the image to be identified according to the gray value of the pixel point;
the covariance determination module is used for respectively determining corresponding covariance of the image to be identified according to at least three pixel characteristics of the specific pixel point, wherein each pixel characteristic uniquely corresponds to one covariance;
the distance determining module, configured to calculate the distance between each covariance of the image to be recognized and the corresponding Lie group mean of each class of digital class label included in a preset training image set; where each training image in the training image set contains a handwritten digital class label, the digital class labels contained in the training image set cover all digital classes, each class of digital class label in the training image set corresponds to at least three Lie group means, and each covariance of the image to be recognized corresponds to one Lie group mean of each class of digital class label;
a standby label determining module, configured to determine, as standby digital category labels, digital category labels corresponding to minimum distances among the distances determined for each covariance of the image to be identified, respectively;
and the to-be-identified label determining module is used for determining the digital category label with the largest number in the standby digital category labels as the to-be-identified digital category label.
CN201310123085.8A 2013-04-10 2013-04-10 Handwritten Numeral Recognition Method and device Active CN103218613B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310123085.8A CN103218613B (en) 2013-04-10 2013-04-10 Handwritten Numeral Recognition Method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310123085.8A CN103218613B (en) 2013-04-10 2013-04-10 Handwritten Numeral Recognition Method and device

Publications (2)

Publication Number Publication Date
CN103218613A true CN103218613A (en) 2013-07-24
CN103218613B CN103218613B (en) 2016-04-20

Family

ID=48816382

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310123085.8A Active CN103218613B (en) 2013-04-10 2013-04-10 Handwritten Numeral Recognition Method and device

Country Status (1)

Country Link
CN (1) CN103218613B (en)


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001059766A (en) * 1999-08-24 2001-03-06 Nippon Sharyo Seizo Kaisha Ltd Liquid-level meter of tank lorry
CN102722713A (en) * 2012-02-22 2012-10-10 苏州大学 Handwritten numeral recognition method based on lie group structure data and system thereof


Non-Patent Citations (1)

Title
SONG Yue-cong (宋曰聪) et al.: "A New Feature Extraction Scheme in Handwritten Digit Recognition Systems", Computer Science (《计算机科学》) *

Cited By (3)

Publication number Priority date Publication date Assignee Title
CN103400161A (en) * 2013-07-18 2013-11-20 苏州大学 Handwritten numeral recognition method and system
CN104866867A (en) * 2015-05-15 2015-08-26 浙江大学 Multi-national banknote serial number character identification method based on sorter
CN104866867B (en) * 2015-05-15 2017-12-05 浙江大学 A kind of multinational paper money sequence number character identifying method based on cleaning-sorting machine

Also Published As

Publication number Publication date
CN103218613B (en) 2016-04-20

Similar Documents

Publication Publication Date Title
CN103164701B (en) Handwritten Numeral Recognition Method and device
Lin et al. Masked face detection via a modified LeNet
Goodfellow et al. Multi-digit number recognition from street view imagery using deep convolutional neural networks
EP2808827B1 (en) System and method for OCR output verification
CN102156871B (en) Image classification method based on category correlated codebook and classifier voting strategy
Wang Pattern recognition, machine intelligence and biometrics
CN106570513A (en) Fault diagnosis method and apparatus for big data network system
CN106845358B (en) Method and system for recognizing image features of handwritten characters
CN102982349A (en) Image recognition method and device
CN102291392A (en) Hybrid intrusion detection method based on bagging algorithm
Chaabouni et al. Fractal and multi-fractal for arabic offline writer identification
Cruz et al. Feature representation selection based on classifier projection space and oracle analysis
Nasrollahi et al. Printed persian subword recognition using wavelet packet descriptors
CN106056074A (en) Single training sample face identification method based on area sparse
Roy et al. CNN based recognition of handwritten multilingual city names
Gou et al. Representation-based classification methods with enhanced linear reconstruction measures for face recognition
Camastra et al. Combining neural gas and learning vector quantization for cursive character recognition
Khalid et al. Tropical wood species recognition system based on multi-feature extractors and classifiers
CN103218613B (en) Handwritten Numeral Recognition Method and device
Wu et al. Handwritten digit classification using the mnist data set
Shayegan et al. A New Dataset Size Reduction Approach for PCA‐Based Classification in OCR Application
CN106033546A (en) Behavior classification method based on top-down learning
Fursov et al. Sequence embeddings help to identify fraudulent cases in healthcare insurance
Zapranis et al. Identification of the head-and-shoulders technical analysis pattern with neural networks
EP4290481A1 (en) Methods and systems for performing data capture

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant