CN111091128B - Character picture classification method and device and electronic equipment - Google Patents

Character picture classification method and device and electronic equipment

Info

Publication number
CN111091128B
Authority
CN
China
Prior art keywords
picture
character
classification
processed
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911314877.7A
Other languages
Chinese (zh)
Other versions
CN111091128A (en)
Inventor
薛亮
杨陆
张超
王晓宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Data Driven Technology Co ltd
Original Assignee
Beijing Data Driven Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Data Driven Technology Co ltd
Priority to CN201911314877.7A
Publication of CN111091128A
Application granted
Publication of CN111091128B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Character Discrimination (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a character picture classification method, a device and electronic equipment, wherein the method comprises the following steps: extracting the characteristics of the acquired character pictures to be processed to obtain characteristic data; matching the feature data with picture features in a preset sample library; if the picture features matched with the feature data do not exist in the preset sample library, determining the classification of the character pictures to be processed according to the similarity between the feature data and the picture features. According to the method, firstly, the characteristic data of the character pictures to be processed are matched with the picture characteristics in the preset sample library, and under the condition that the matching fails, the similarity matching is carried out based on the similarity between the characteristic data and the picture characteristics, so that the classification of the character pictures to be processed is determined.

Description

Character picture classification method and device and electronic equipment
Technical Field
The present invention relates to the field of image recognition technologies, and in particular, to a method and an apparatus for classifying character pictures, and an electronic device.
Background
In the related art, a character picture is generally classified by matching the feature data of the character picture against the feature data of the pictures in a preset character picture library; if feature data that completely matches the feature data of the character picture is found, the character corresponding to that matching feature data in the preset character picture library is determined as the classification of the character picture. However, when no completely matching feature data exists, the character picture is difficult to classify, so this approach has poor applicability.
Disclosure of Invention
The invention aims to provide a character picture classification method, a character picture classification device and electronic equipment, so as to improve the applicability of character classification.
In a first aspect, an embodiment of the present invention provides a method for classifying character pictures, where the method includes: extracting the characteristics of the acquired character pictures to be processed to obtain characteristic data; matching the feature data with picture features in a preset sample library; if the picture features matched with the feature data do not exist in the preset sample library, determining the classification of the character pictures to be processed according to the similarity between the feature data and the picture features.
In an optional embodiment, before the step of extracting the features of the obtained character picture to be processed to obtain the feature data, the method further includes: and carrying out normalization processing on the acquired character pictures to be processed to obtain the character pictures to be processed with the preset pixel number.
In an optional embodiment, the step of extracting the features of the obtained character picture to be processed to obtain feature data includes: performing binarization processing on the character picture to be processed; determining a characteristic value of each pixel point according to the pixel value of the character picture to be processed after binarization processing; and splicing the characteristic values of all the pixel points line by line to obtain the characteristic data of the character picture to be processed.
In an optional embodiment, the preset sample library includes a plurality of picture classifications, each picture classification includes a plurality of sample pictures, each sample picture corresponds to a picture feature, and the picture feature includes a feature value corresponding to each pixel point in the sample picture; the step of determining the classification of the character picture to be processed according to the similarity between the feature data and the picture features comprises the following steps: determining a similarity index corresponding to the preset sample library according to the picture features in each picture classification; calculating the similarity between the character picture to be processed and each picture classification according to the similarity index; and taking the picture classification with high similarity as the classification of the character picture to be processed.
In an optional embodiment, the feature value corresponding to each pixel point in the sample picture is a first value or a second value; the step of determining the similarity index corresponding to the preset sample library according to the picture characteristics in each picture classification comprises the following steps: for each picture classification in a preset sample library, the following steps are executed: for each pixel point, calculating the number of the first numerical value of the characteristic value and the number of the second numerical value of the characteristic value in the sample picture corresponding to the current picture classification; according to the number of the first values and the number of the second values, the probability that the characteristic value is the first value and the probability that the characteristic value is the second value on each pixel point are obtained; and determining the probability that the characteristic value is a first value and the probability that the characteristic value is a second value on each pixel point corresponding to each picture classification as a similarity index.
In an optional embodiment, the step of calculating the similarity between the character picture to be processed and the picture classification according to the similarity index includes: obtaining the score of the character picture to be processed on each picture classification according to the probability that the feature value on each pixel point corresponding to each picture classification is the first value and the probability that the feature value is the second value; the step of taking the picture classification with high similarity as the classification of the character picture to be processed comprises the following step: determining the picture classification with the highest score as the classification of the character picture to be processed.
In an optional embodiment, the step of obtaining the score of the character picture to be processed on each picture classification according to the probability that the feature value on each pixel point corresponding to each picture classification is the first value and the probability that the feature value is the second value includes: for each picture classification, the following steps are performed: determining the probability corresponding to the characteristic value of each pixel point in the character picture to be processed according to the probability of the characteristic value of each pixel point corresponding to the current picture classification being a first value and the probability of the characteristic value being a second value; and adding the probabilities corresponding to each pixel point to obtain the score of the character picture to be processed on the current picture classification.
In a second aspect, an embodiment of the present invention provides a device for classifying a character picture, including: the feature extraction module is used for extracting the features of the acquired character pictures to be processed to obtain feature data; the complete matching module is used for completely matching the characteristic data with the picture characteristics in the preset sample library; and the similarity matching module is used for determining the classification of the character pictures to be processed according to the similarity between the feature data and the picture features if the picture features which are completely matched with the feature data do not exist in the preset sample library.
In a third aspect, an embodiment of the present invention provides an electronic device, including a processor and a memory, where the memory stores machine executable instructions executable by the processor to implement the character picture classification method according to any one of the foregoing embodiments.
In a fourth aspect, embodiments of the present invention provide a machine-readable storage medium storing machine-executable instructions that, when invoked and executed by a processor, cause the processor to implement a method of classifying a character picture according to any one of the preceding embodiments.
The embodiment of the invention has the following beneficial effects:
The embodiment of the invention provides a character picture classification method, a character picture classification device and electronic equipment. Firstly, feature extraction is carried out on an acquired character picture to be processed to obtain feature data; the feature data is then matched with picture features in a preset sample library; if no picture feature matching the feature data exists in the preset sample library, the classification of the character picture to be processed is determined according to the similarity between the feature data and the picture features. In this method, the feature data of the character picture to be processed is first matched with the picture features in the preset sample library, and similarity matching based on the similarity between the feature data and the picture features is carried out when that matching fails, so as to determine the classification of the character picture to be processed.
Additional features and advantages of the invention will be set forth in the description which follows, or in part will be obvious from the description, or may be learned by practice of the invention.
In order to make the above objects, features and advantages of the present invention more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for classifying character pictures according to an embodiment of the present invention;
FIG. 2 is a flowchart of another method for classifying character pictures according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a device for classifying character pictures according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In consideration of the problem of poor applicability of a character picture classification method in the prior art, the embodiment of the invention provides a character picture classification method, a device and electronic equipment.
For the sake of understanding the present embodiment, first, a method for classifying character pictures disclosed in the present embodiment of the present invention is described in detail, as shown in fig. 1, and the method includes the following specific steps:
Step S102, extracting the features of the acquired character picture to be processed to obtain feature data.
The character picture to be processed can be a picture of a character area obtained from an image, and the image can be a shopping receipt or other images containing characters; the character picture can comprise a character, and the character can be Chinese characters, letters, numbers or other special marks and the like; in specific implementation, the characters in the image need to be divided to obtain a character picture corresponding to each character in the image.
In specific implementation, feature extraction can be performed according to pixel values of the character picture to be processed, and feature extraction can also be performed according to textures, colors and the like of the character picture to be processed, so that feature data is obtained, wherein the feature data is usually a feature value corresponding to each pixel point in the character picture to be processed.
Step S104, the characteristic data are matched with the picture characteristics in a preset sample library.
The preset sample library generally stores a large number of sample pictures and the picture feature corresponding to each sample picture, and each sample picture has a picture classification. The sample pictures generally cover English characters, numbers, symbols, Chinese characters and the like. The feature data of the character picture to be processed is matched with the picture features of the sample pictures; when a picture feature completely consistent with the feature data exists in the preset sample library, the picture classification to which the corresponding sample picture belongs is taken as the classification of the character picture to be processed.
Step S106, if the image features matched with the feature data do not exist in the preset sample library, determining the classification of the character images to be processed according to the similarity between the feature data and the image features.
When no picture features completely consistent with the feature data exist in the preset sample library, the feature data needs to be subjected to similarity matching, namely, the similarity of the feature data and each picture feature is calculated, and the picture classification of the sample picture corresponding to the picture feature with the highest similarity is used as the classification of the character picture to be processed. The similarity may be generally determined according to a difference between the feature value of each pixel in the feature data and the feature value of each pixel in the image feature, or may be determined according to the number of identical feature values in the feature value of each pixel in the feature data and the feature value of each pixel in the image feature.
According to the character picture classification method provided by the embodiment of the invention, firstly, feature extraction is carried out on the acquired character picture to be processed to obtain feature data; matching the feature data with picture features in a preset sample library; if the picture features matched with the feature data do not exist in the preset sample library, determining the classification of the character pictures to be processed according to the similarity between the feature data and the picture features. In the method, firstly, feature data of the character pictures to be processed are completely matched with picture features in a preset sample library, and then similarity matching is carried out based on similarity between the feature data and the picture features under the condition that the complete matching fails so as to determine classification of the character pictures to be processed.
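The two-stage flow of steps S102 to S106 can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes feature data are equal-length strings of 0s and 1s, and it uses the count of identical feature values (one of the similarity measures mentioned above) for the fallback stage. The function name `classify` and the toy library are hypothetical.

```python
def classify(feature, library):
    """Classify a feature string against a sample library.

    library: list of (picture_feature, classification) pairs, where every
    feature is an equal-length string of '0' and '1' (an assumption).
    """
    # Stage 1: complete matching against every stored picture feature.
    for sample_feature, label in library:
        if sample_feature == feature:
            return label

    # Stage 2: similarity matching, here the number of identical feature
    # values at the same pixel positions.
    def same_count(a, b):
        return sum(x == y for x, y in zip(a, b))

    _, best_label = max(library, key=lambda item: same_count(feature, item[0]))
    return best_label


# Hypothetical 4-pixel library, for illustration only.
library = [("1100", "A"), ("0011", "B"), ("1111", "C")]
exact = classify("0011", library)    # found by complete matching
nearest = classify("1000", library)  # no exact match, falls back to similarity
```

Because the fallback only runs when complete matching fails, the cheap equality test handles the common case and the per-pixel comparison is reserved for unseen pictures.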
The embodiment of the invention also provides another character picture classification method, which is implemented on the basis of the method of the above embodiment. This method mainly describes the specific process of extracting the features of the acquired character picture to be processed to obtain feature data (implemented by the following steps S204-S208), and the specific process of determining the classification of the character picture to be processed according to the similarity between the feature data and the picture features (implemented by the following steps S214-S218); as shown in fig. 2, the method comprises the following steps:
Step S202, obtaining a character picture to be processed.
After the character picture to be processed is obtained, it first needs to be normalized to obtain a character picture to be processed with the preset number of pixels. The preset number of pixels is generally determined by the size of the sample pictures in the preset sample library; for example, if the sample pictures are 32×32, the preset number of pixels is 32×32, that is, the character picture to be processed is normalized and scaled into a character picture of 32×32 pixels. In the steps below, the character picture to be processed is assumed by default to have the same size as the sample pictures in the preset sample library.
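The normalization step can be sketched as below. The text does not say how the scaling is performed, so nearest-neighbor sampling is used here purely as an assumption; the function name `normalize` and the representation of a picture as a 2D list of grayscale values are also illustrative.

```python
def normalize(picture, size=32):
    """Scale a 2D list of pixel values to size x size.

    Nearest-neighbor sampling is an assumption; any resizing method that
    yields the preset number of pixels would serve the same purpose.
    """
    src_h, src_w = len(picture), len(picture[0])
    return [
        [picture[r * src_h // size][c * src_w // size] for c in range(size)]
        for r in range(size)
    ]


# A 2x2 picture scaled up to 4x4: each source pixel becomes a 2x2 block.
small = [[0, 255], [255, 0]]
scaled = normalize(small, size=4)
```

With the default `size=32`, the output has 1024 pixels, which matches the 1024 feature values used in the examples of this description.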
Step S204, binarizing the character picture to be processed.
Binarization produces a binary image, in which each pixel has only two possible values or gray-level states; a binary image can be represented as a black-and-white or monochrome image. In specific implementation, the pixel value of the image area corresponding to the character in the character picture to be processed is set to 0, and the pixel value of the image area outside the character is set to 255, giving the binary image corresponding to the character picture to be processed; a pixel value of 0 corresponds to black, and a pixel value of 255 corresponds to white.
Step S206, determining the characteristic value of each pixel point according to the pixel value of the character picture to be processed after binarization processing.
Pixel points with pixel values of 0 and 255 in the binarized character picture to be processed are assigned different feature values, which may be numerical values or different patterns. For example, a pixel point with a pixel value of 0 may be set to 1 and a pixel point with a pixel value of 255 may be set to 0; that is, the black region of the binarized character picture to be processed is set to 1 and the white region is set to 0.
Step S208, the characteristic values of all the pixel points are spliced row by row, and the characteristic data of the character picture to be processed are obtained.
After the characteristic value of each pixel point of the character picture to be processed is obtained, the characteristic values on all the pixel points are extracted, and the characteristic matrix or the characteristic sequence (can also be a character string) is obtained by performing row-by-row splicing according to the position of each pixel point in the character picture to be processed, wherein the characteristic matrix or the characteristic sequence is the characteristic data of the character picture to be processed.
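Steps S204 to S208 can be sketched together as follows. The fixed gray threshold of 128 used to decide which pixels belong to the character is an assumption (the text does not say how the character area is identified), and `extract_feature` is a hypothetical name. The mapping of black to 1 and white to 0 follows the example above.

```python
def extract_feature(picture, threshold=128):
    """Turn a grayscale picture (2D list, values 0-255) into a '0'/'1' string.

    threshold is an assumed cut-off separating character pixels (dark)
    from background pixels (light).
    """
    rows = []
    for row in picture:
        # Step S204: binarize, character pixels -> 0 (black), others -> 255 (white).
        binary = [0 if value < threshold else 255 for value in row]
        # Step S206: feature value 1 for black pixels, 0 for white pixels.
        rows.append("".join("1" if value == 0 else "0" for value in binary))
    # Step S208: splice the feature values row by row into one sequence.
    return "".join(rows)


# Hypothetical 3x3 picture, for illustration only.
picture = [[10, 240, 250],
           [20, 30, 245],
           [250, 15, 255]]
feature = extract_feature(picture)
```

For a 32×32 picture the same function returns a 1024-character string, the form of feature data used in the later examples.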
Step S210, judging whether picture features matched with the feature data exist in a preset sample library; if so, step S212 is performed; if not, execute step S214; the preset sample library comprises a plurality of picture classifications, each picture classification comprises a plurality of sample pictures, each sample picture corresponds to a picture feature, and the picture feature comprises a feature value corresponding to each pixel point in the sample picture.
Each sample picture in the preset sample library is provided with a unique picture classification corresponding to the sample picture, each sample picture is corresponding to a picture feature, and the picture feature and the feature data are generally identical in structure and determination mode, so that the picture feature and the feature data have a certain corresponding relation. In some embodiments, hash operations may be performed on the image features and feature data during the complete matching process, so as to reduce the dimension of the data and save the memory space, for example, after hash operations are performed on 1024-bit feature data, 32-bit feature data may be obtained.
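The hash-based complete matching mentioned above can be sketched as follows. The text does not name a hash function; CRC-32 from Python's standard `zlib` module is used here only because it yields a 32-bit value, and the helper names are hypothetical.

```python
import zlib


def feature_hash(feature):
    """32-bit hash of a '0'/'1' feature string (CRC-32 is an assumed choice)."""
    return zlib.crc32(feature.encode("ascii"))


def build_hash_table(library):
    """Map each 32-bit hash to the (feature, classification) pairs that share it."""
    table = {}
    for feature, label in library:
        table.setdefault(feature_hash(feature), []).append((feature, label))
    return table


def exact_match(feature, table):
    """Return the classification of a completely matching sample, or None."""
    for sample_feature, label in table.get(feature_hash(feature), []):
        # Compare the full features to guard against hash collisions.
        if sample_feature == feature:
            return label
    return None


# Hypothetical library with 8-value features, for illustration only.
table = build_hash_table([("11000000", "A"), ("00110000", "B")])
hit = exact_match("00110000", table)
miss = exact_match("11110000", table)
```

Keeping the full feature next to each hash trades some of the memory saving for certainty that a hit is a true complete match; a variant that stores only the hashes would be smaller but could report a false match on a collision.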
Step S212, determining the picture classification corresponding to the matched picture feature as the classification of the character picture to be processed; end.
Step S214, according to the picture characteristics in each picture classification, determining the similarity index corresponding to the preset sample library.
The feature value corresponding to each pixel point in the sample picture is a first value or a second value; the first and second values may be set according to the needs of the developer, for example, the first value may be set to 1 and the second value may be set to 0. In specific implementation, step S214 may be implemented by the following steps 10-11:
Step 10, for each picture classification in the preset sample library, executing the following steps: for each pixel point, counting the number of feature values equal to the first value and the number of feature values equal to the second value in the sample pictures corresponding to the current picture classification; and obtaining, from the number of first values and the number of second values, the probability that the feature value is the first value and the probability that the feature value is the second value on each pixel point.
Step 11, determining the probability that the feature value is the first value and the probability that the feature value is the second value on each pixel point corresponding to each picture classification as the similarity index.
In specific implementation, assume that picture classification 1 in the preset sample library includes 10 sample pictures and that each sample picture has 1024 pixel points, so that each picture feature consists of 1024 feature values, with the first value set to 1 and the second value set to 0. The following is a schematic result of the picture features corresponding to the 10 sample pictures, where each line of character string represents the picture feature of one sample picture; each line should contain 1024 values, but only the first ten are shown due to limited space:
1100000000...
0001000000...
1111000000...
0110000000...
1000000000...
1100000000...
1100000000...
1100000000...
1100000000...
0000000000...
For the picture features of the sample pictures, the number of feature values equal to 1 and the number equal to 0 can be counted at each pixel point. For the 10 sample pictures above, 7 feature values are 1 and 3 are 0 at the first pixel point; 7 feature values are 1 and 3 are 0 at the second pixel point; 2 feature values are 1 and 8 are 0 at the third pixel point; and so on, until the numbers of 1s and 0s at all 1024 pixel points are obtained, giving statistical data in the form of Table 1.
TABLE 1
Feature value   1st bit   2nd bit   3rd bit   4th bit   5th bit   1024th bit
0               3         3         8         8         10        0
1               7         7         2         2         0         10
Based on the number of 0s and the number of 1s at each bit (each bit corresponding to one pixel point), the probability that the feature value is 0 and the probability that the feature value is 1 are calculated; the results are shown in Table 2 below:
TABLE 2
Feature value   1st bit   2nd bit   3rd bit   4th bit   5th bit   1024th bit
0               3/10      3/10      8/10      8/10      10/10     0/10
1               7/10      7/10      2/10      2/10      0/10      10/10
The calculation result is determined as the similarity index corresponding to picture classification 1. A similarity index is established in the same way for every picture classification in the preset sample library, which yields the similarity index corresponding to the preset sample library.
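Steps 10 and 11 can be sketched for a single picture classification as follows, using the first ten feature values of the 10 sample pictures listed above. The function name `build_class_index` and the representation of the index as a list of (probability of 0, probability of 1) pairs are assumptions made for illustration.

```python
def build_class_index(sample_features):
    """Per-pixel probabilities for one picture classification.

    Returns a list where entry i is (P(value is 0), P(value is 1)) at
    pixel i, estimated from the sample pictures of the classification.
    """
    count = len(sample_features)
    length = len(sample_features[0])
    index = []
    for i in range(length):
        ones = sum(feature[i] == "1" for feature in sample_features)
        zeros = count - ones
        index.append((zeros / count, ones / count))
    return index


# First ten feature values of the 10 sample pictures in picture classification 1.
samples = [
    "1100000000", "0001000000", "1111000000", "0110000000", "1000000000",
    "1100000000", "1100000000", "1100000000", "1100000000", "0000000000",
]
class_1_index = build_class_index(samples)
```

The first five entries of `class_1_index` correspond to the first five columns of Table 2. Repeating the call for every picture classification gives the similarity index of the whole sample library.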
Step S216, calculating the similarity between the character picture to be processed and the picture classification according to the similarity index.
The similarity between the character picture to be processed and each picture classification is obtained by calculating the feature data corresponding to the character picture to be processed against the similarity index file corresponding to each picture classification in the similarity index. In specific implementation, step S216 may be implemented as follows:
The score of the character picture to be processed on each picture classification is obtained according to the probability that the feature value on each pixel point corresponding to each picture classification is the first value and the probability that the feature value is the second value; the obtained score can represent the similarity between the character picture to be processed and the picture classification.
In a specific implementation, the score of the character picture to be processed on each picture class is calculated by the following steps 20-21:
Step 20, for each picture classification, performing the following steps: determining the probability corresponding to the feature value of each pixel point in the character picture to be processed according to the probability that the feature value of each pixel point corresponding to the current picture classification is the first value and the probability that the feature value is the second value.
Step 21, adding the probabilities corresponding to each pixel point to obtain the score of the character picture to be processed on the current picture classification.
For example, suppose the feature data of the character picture to be processed contains 1024 feature values, each being either the first value (for example, 1) or the second value (for example, 0); that is, the feature data is a 1024-bit character string of 0s and 1s, assumed here to be 1010111101 …. When calculating the score of the feature data on picture classification 1, the feature value of the first pixel point in the feature data is 1, so the probability that the feature value of the first pixel point in picture classification 1 is 1, namely 0.7, is looked up; the feature value of the second pixel point in the feature data is 0, so the probability that the feature value of the second pixel point in picture classification 1 is 0, namely 0.3, is looked up; and so on, giving for each pixel point in the feature data the corresponding probability value in picture classification 1. Adding the obtained probability values, 0.7+0.3+0.2+0.8+…, gives the score of the character picture to be processed on picture classification 1.
According to the above manner of calculating the score of the character picture to be processed on the picture classification 1, the score of the character picture to be processed on each picture classification in the preset sample library can be obtained.
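Steps 20 and 21 can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the names `score` and `classify`, and the choice to store only the probability of the value 1 for each pixel point (with the probability of 0 taken as its complement), are assumptions made for this example.

```python
from typing import Dict, List

# Assumed index layout: for each picture classification, one probability per
# pixel point that the feature value is 1. P(value == 0) is taken as 1 - P(1).
Index = Dict[str, List[float]]


def score(feature_data: str, p_one: List[float]) -> float:
    """Steps 20 and 21: look up one probability per pixel point, then add them."""
    if len(feature_data) != len(p_one):
        raise ValueError("feature data and index must cover the same pixel points")
    total = 0.0
    for bit, p in zip(feature_data, p_one):
        # Use P(value == 1) when the pixel is 1, otherwise P(value == 0).
        total += p if bit == "1" else 1.0 - p
    return total


def classify(feature_data: str, index: Index) -> str:
    """Return the picture classification with the highest score."""
    return max(index, key=lambda name: score(feature_data, index[name]))
```

With a four-pixel feature string `"1010"` and per-pixel probabilities `[0.7, 0.7, 0.2, 0.2]` for the value 1, the looked-up terms are 0.7, 0.3, 0.2 and 0.8, which matches the first four terms of the sum in the example above.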
In step S218, the picture classification with high similarity is taken as the classification of the character picture to be processed. In a specific implementation, the picture classification with the highest score can be determined as the classification of the character picture to be processed.
In some embodiments, when the first value of the feature value is set to 1 and the second value is set to 0, the feature values of every four pixel points in the picture feature can be combined into one binary number, giving a plurality of binary numbers for the picture feature. This reduces the number of calculations needed when computing picture similarity (that is, the calculation speed is improved by 4 times compared with the original). For each picture classification in the preset sample library, the following steps are executed: acquire the probability that the feature value of each pixel point corresponding to the current picture classification is 1 and the probability that it is 0; then, from the feature values of the 4 pixel points that make up each binary number, determine the probability corresponding to each value that the binary number can take, namely by adding the probabilities corresponding to the feature values of the 4 pixel points. The probabilities of each binary number taking each value, for each picture classification, are then determined as the similarity index.
For example, each binary digit takes one of the 2 values 0 and 1, and every four pixel points in the picture feature are combined into one binary number according to binary rules, that is, 0000 is 0, 0001 is 1, 0010 is 2, and so on up to 1111, which is 15. For the schematic picture features of the 10 sample pictures in picture classification 1 in the above embodiment, the first four pixel points can be combined into the first binary number, the 5th to 8th pixel points into the second binary number, and so on. For the first binary number, when it is 0000, the corresponding probability is the sum of the probabilities that the feature value of each of the first four pixel points is 0; when it is 0010, the corresponding probability is the sum of the probabilities that the first four pixel points take the feature values 0, 0, 1 and 0 respectively. In this way the probability value for each value of each binary number is computed, and from these probabilities a similarity index file for picture classification 1 is obtained. Building the index in the same way for all picture classifications in the preset sample library gives the similarity index corresponding to the preset sample library.
When the similarity index is built from binary numbers, the feature data of the character picture to be processed must also be converted into binary numbers in the same way, so that the picture classifications in the similarity index can be used to calculate the similarity with the feature data. When calculating the similarity, the probabilities corresponding to each binary number of the character picture to be processed are added to obtain the score of the character picture to be processed on the current picture classification, and the picture classification with the highest score is taken as the classification of the character picture to be processed.
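The four-pixel grouping can be sketched as a precomputed lookup table per group. The sketch below is an assumption-laden illustration, not the patent's index file format: the names `build_nibble_index` and `nibble_score` are hypothetical, the first pixel point of each group is treated as the most significant bit (so that 0010 means the third pixel point is 1), and only the probability of the value 1 is stored per pixel point.

```python
from typing import List


def build_nibble_index(p_one: List[float]) -> List[List[float]]:
    """For each group of 4 pixel points, precompute the summed probability
    of every possible 4-bit value (0000 through 1111)."""
    if len(p_one) % 4 != 0:
        raise ValueError("pixel count must be a multiple of 4")
    index = []
    for start in range(0, len(p_one), 4):
        group = p_one[start:start + 4]
        table = []
        for value in range(16):
            total = 0.0
            for offset, p in enumerate(group):
                # Most significant bit first: offset 0 is the first pixel point.
                bit = (value >> (3 - offset)) & 1
                total += p if bit else 1.0 - p
            table.append(total)
        index.append(table)
    return index


def nibble_score(feature_data: str, index: List[List[float]]) -> float:
    """Convert the feature data to 4-bit numbers and add one table entry per group."""
    if len(feature_data) != 4 * len(index):
        raise ValueError("feature data length must match the index")
    total = 0.0
    for group, table in enumerate(index):
        value = int(feature_data[4 * group:4 * group + 4], 2)
        total += table[value]
    return total
```

Scoring then needs one table lookup and one addition per four pixel points instead of four of each, which is where the reduction in calculations described above comes from. The resulting score equals the per-pixel sum, so the classification decision is unchanged.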
According to the character picture classification method, the feature data of the character picture to be processed is first matched against the picture features in the preset sample library; a similarity index corresponding to the preset sample library is then determined from the picture features in each picture classification; the similarity between the character picture to be processed and each picture classification is calculated from the similarity index; and the picture classification with high similarity is taken as the classification of the character picture to be processed. The method avoids the situation in which a picture cannot be classified when only complete matching is used, can determine a classification for every character picture, has strong applicability, is simple to operate, and classifies efficiently.
Corresponding to the above embodiment of the character picture classification method, the embodiment of the present invention further provides a character picture classifying device, as shown in fig. 3, where the device includes:
The feature extraction module 30 is configured to perform feature extraction on the obtained character picture to be processed, so as to obtain feature data.
And the complete matching module 31 is used for matching the feature data with the picture features in the preset sample library.
The similarity matching module 32 is configured to determine, if no picture feature matching the feature data exists in the preset sample library, a classification of the character picture to be processed according to a similarity between the feature data and the picture feature.
The character picture classifying device first performs feature extraction on the acquired character picture to be processed to obtain feature data; it then matches the feature data with the picture features in a preset sample library; if no picture feature matching the feature data exists in the preset sample library, it determines the classification of the character picture to be processed according to the similarity between the feature data and the picture features. In this approach, the feature data of the character picture to be processed is first matched with the picture features in the preset sample library, and only if that matching fails is similarity matching, based on the similarity between the feature data and the picture features, carried out to determine the classification of the character picture to be processed.
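The cooperation between the complete matching module and the similarity matching module can be sketched as below. The library layout (a mapping from classification name to the feature strings of its sample pictures), the function names, and the use of a callback for the similarity step are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional

# Assumed sample library layout: classification name -> feature strings of its samples.
SampleLibrary = Dict[str, List[str]]


def exact_match(feature_data: str, library: SampleLibrary) -> Optional[str]:
    """Complete matching: return the classification holding an identical feature."""
    for name, features in library.items():
        if feature_data in features:
            return name
    return None


def classify_with_fallback(
    feature_data: str,
    library: SampleLibrary,
    similarity_classify: Callable[[str], str],
) -> str:
    """Try complete matching first; fall back to similarity matching on failure."""
    matched = exact_match(feature_data, library)
    if matched is not None:
        return matched
    # No identical picture feature exists, so use the similarity-based classifier.
    return similarity_classify(feature_data)
```

Passing the similarity step as a callback keeps this sketch independent of how the similarity index is stored.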
Further, the device also comprises a normalization module configured to normalize the acquired character picture to be processed, so as to obtain a character picture to be processed with a preset number of pixels.
Further, the feature extraction module 30 is configured to: perform binarization processing on the character picture to be processed; determine the feature value of each pixel point according to the pixel values of the binarized character picture; and splice the feature values of all pixel points row by row to obtain the feature data of the character picture to be processed.
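The feature extraction just described can be sketched as follows. The fixed threshold of 128 and the mapping of dark pixels to the value 1 are assumptions for this example; the text above does not fix either choice, and `extract_features` is a hypothetical name.

```python
from typing import List


def extract_features(gray: List[List[int]], threshold: int = 128) -> str:
    """Binarize a grayscale picture and splice the feature values row by row."""
    bits = []
    for row in gray:
        for pixel in row:
            # Assumed mapping: dark pixels (character strokes) become 1, others 0.
            bits.append("1" if pixel < threshold else "0")
    return "".join(bits)
```

For a normalized picture of 32 by 32 pixels, this would yield a 1024-character string of 0s and 1s.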
Specifically, the preset sample library includes a plurality of picture classifications, each picture classification includes a plurality of sample pictures, each sample picture corresponds to a picture feature, and the picture feature includes a feature value corresponding to each pixel point in the sample picture. The above-mentioned similarity matching module 32 includes: an index determining unit, configured to determine a similarity index corresponding to the preset sample library according to the picture features in each picture classification; a similarity determining unit, configured to calculate the similarity between the character picture to be processed and each picture classification according to the similarity index; and a classification determining unit, configured to take the picture classification with high similarity as the classification of the character picture to be processed.
Further, the feature value corresponding to each pixel point in the sample picture is a first value or a second value. The above index determining unit is configured to execute the following steps for each picture classification in the preset sample library: for each pixel point, count, among the sample pictures of the current picture classification, the number whose feature value is the first value and the number whose feature value is the second value; from the number of first values and the number of second values, obtain the probability that the feature value at each pixel point is the first value and the probability that it is the second value; and determine, as the similarity index, the probability that the feature value is the first value and the probability that it is the second value at each pixel point of each picture classification.
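The counting performed by the index determining unit can be sketched for a single picture classification as follows. The function name `build_class_index` is hypothetical, the first and second values are assumed to be 1 and 0, and only the probability of the value 1 is returned because the probability of 0 is its complement.

```python
from typing import List


def build_class_index(samples: List[str]) -> List[float]:
    """For one picture classification, return P(feature value == 1) per pixel point."""
    if not samples:
        raise ValueError("a picture classification needs at least one sample picture")
    length = len(samples[0])
    if any(len(s) != length for s in samples):
        raise ValueError("all sample features must have the same length")
    n = len(samples)
    # Count the sample pictures whose feature value is 1 at each pixel position.
    ones = [sum(s[i] == "1" for s in samples) for i in range(length)]
    # Probability of 1 is the count of 1s divided by the number of samples.
    return [count / n for count in ones]
```

Running this once per picture classification and collecting the results by classification name would give an index for the whole sample library.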
Further, the similarity determining unit is configured to: obtain the score of the character picture to be processed on each picture classification according to the probability that the feature value at each pixel point corresponding to each picture classification is the first value and the probability that it is the second value. The above-mentioned classification determining unit is configured to determine the picture classification with the highest score as the classification of the character picture to be processed.
Further, the similarity determining unit is further configured to: for each picture classification, the following steps are performed: determining the probability corresponding to the characteristic value of each pixel point in the character picture to be processed according to the probability of the characteristic value of each pixel point corresponding to the current picture classification being a first value and the probability of the characteristic value being a second value; and adding the probabilities corresponding to each pixel point to obtain the score of the character picture to be processed on the current picture classification.
The character picture classifying device provided by the embodiment of the present invention has the same implementation principle and technical effects as the foregoing method embodiment. For brevity, where the device embodiment does not mention a detail, reference may be made to the corresponding content of the method embodiment.
The embodiment of the present invention further provides an electronic device, referring to fig. 4, where the electronic device includes a processor 101 and a memory 100, where the memory 100 stores machine executable instructions that can be executed by the processor 101, and the processor 101 executes the machine executable instructions to implement the above-mentioned character picture classification method.
Further, the electronic device shown in fig. 4 further includes a bus 102 and a communication interface 103, and the processor 101, the communication interface 103, and the memory 100 are connected through the bus 102.
The memory 100 may include a high-speed random access memory (RAM, Random Access Memory), and may further include a non-volatile memory, such as at least one magnetic disk memory. The communication connection between the system network element and at least one other network element is implemented through at least one communication interface 103 (which may be wired or wireless), and may use the Internet, a wide area network, a local area network, a metropolitan area network, and so on. The bus 102 may be an ISA bus, a PCI bus, an EISA bus, or the like. Buses may be divided into address buses, data buses, control buses, and so on. For ease of illustration, only one bidirectional arrow is shown in FIG. 4, but this does not mean that there is only one bus or only one type of bus.
The processor 101 may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 101 or by instructions in the form of software. The processor 101 may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU for short), a network processor (Network Processor, NP for short), and the like; it may also be a digital signal processor (Digital Signal Processor, DSP for short), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC for short), a field-programmable gate array (Field-Programmable Gate Array, FPGA for short) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory 100; the processor 101 reads the information in the memory 100 and, in combination with its hardware, performs the steps of the method of the foregoing embodiments.
The embodiment of the present invention also provides a machine-readable storage medium storing machine-executable instructions that, when called and executed by a processor, cause the processor to implement the above-mentioned character picture classification method. For the specific implementation, refer to the method embodiment, which is not repeated here.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus and/or the electronic device described above may refer to the corresponding process in the foregoing method embodiment, which is not described in detail herein.
Finally, it should be noted that the above examples are only specific embodiments of the present invention, intended to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing examples, those skilled in the art should understand that anyone skilled in the art may, within the technical scope disclosed by the present invention, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions of some of their technical features. Such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

1. A method for classifying character pictures, the method comprising:
extracting the characteristics of the acquired character pictures to be processed to obtain characteristic data;
matching the characteristic data with picture characteristics in a preset sample library;
if the picture features matched with the feature data do not exist in the preset sample library, determining the classification of the character pictures to be processed according to the similarity between the feature data and the picture features;
the preset sample library comprises a plurality of picture classifications, each picture classification comprises a plurality of sample pictures, each sample picture corresponds to the picture characteristic, and the picture characteristic comprises a characteristic value of each pixel point in the sample picture;
and determining the classification of the character picture to be processed according to the similarity between the feature data and the picture features, wherein the step comprises the following steps:
determining a similarity index corresponding to the preset sample library according to the picture characteristics in each picture classification;
calculating the similarity between the character picture to be processed and the picture classification according to the similarity index;
taking the picture classification with high similarity as the classification of the character picture to be processed;
the characteristic value corresponding to each pixel point in the sample picture is a first numerical value or a second numerical value;
according to the picture characteristics in each picture classification, determining a similarity index corresponding to the preset sample library, wherein the step comprises the following steps:
for each picture classification in the preset sample library, executing the following steps: for each pixel point, calculating the number of the characteristic values which are the first numerical values and the number of the characteristic values which are the second numerical values in the sample pictures corresponding to the current picture classification; according to the number of the first numerical values and the number of the second numerical values, the probability that the characteristic value is the first numerical value and the probability that the characteristic value is the second numerical value on each pixel point are obtained;
and determining the probability that the characteristic value is the first numerical value and the probability that the characteristic value is the second numerical value on each pixel point corresponding to each picture classification as the similarity index.
2. The method according to claim 1, wherein before the step of extracting the features from the acquired character image to be processed to obtain the feature data, the method further comprises:
and carrying out normalization processing on the obtained character pictures to be processed to obtain the character pictures to be processed with preset pixel numbers.
3. The method according to claim 1, wherein the step of extracting the features of the acquired character picture to be processed to obtain the feature data includes:
performing binarization processing on the character picture to be processed;
determining the characteristic value of each pixel point according to the pixel value of the character picture to be processed after binarization processing;
and splicing the characteristic values of all the pixel points line by line to obtain the characteristic data of the character picture to be processed.
4. The method according to claim 1, wherein the step of calculating the similarity of the character picture to be processed and the picture classification from the similarity index comprises:
obtaining the score of the character picture to be processed on each picture classification according to the probability that the characteristic value is the first value and the probability that the characteristic value is the second value on each pixel point corresponding to each picture classification;
the step of taking the picture classification with high similarity as the classification of the character picture to be processed comprises the following steps:
and determining the picture classification with the highest score as the classification of the character picture to be processed.
5. The method according to claim 4, wherein the step of obtaining the score of the character picture to be processed on each picture category according to the probability that the feature value is the first value and the probability that the feature value is the second value on each pixel point corresponding to each picture category comprises:
for each of the picture classifications, performing the steps of: determining the probability corresponding to the characteristic value of each pixel point in the character picture to be processed according to the probability of the characteristic value being the first value and the probability of the characteristic value being the second value on each pixel point corresponding to the current picture classification;
and adding the probabilities corresponding to each pixel point to obtain the score of the character picture to be processed on the current picture classification.
6. A character picture classifying apparatus, characterized in that the apparatus comprises:
the feature extraction module is used for extracting the features of the acquired character pictures to be processed to obtain feature data;
the complete matching module is used for matching the characteristic data with the picture characteristics in the preset sample library;
the similarity matching module is used for determining the classification of the character pictures to be processed according to the similarity between the characteristic data and the picture characteristics if the picture characteristics matched with the characteristic data do not exist in the preset sample library;
the preset sample library comprises a plurality of picture classifications, each picture classification comprises a plurality of sample pictures, each sample picture corresponds to the picture characteristic, and the picture characteristic comprises a characteristic value of each pixel point in the sample picture;
and determining the classification of the character picture to be processed according to the similarity between the feature data and the picture features, wherein the step comprises the following steps:
determining a similarity index corresponding to the preset sample library according to the picture characteristics in each picture classification;
calculating the similarity between the character picture to be processed and the picture classification according to the similarity index;
taking the picture classification with high similarity as the classification of the character picture to be processed;
the characteristic value corresponding to each pixel point in the sample picture is a first numerical value or a second numerical value;
according to the picture characteristics in each picture classification, determining a similarity index corresponding to the preset sample library, wherein the step comprises the following steps:
for each picture classification in the preset sample library, executing the following steps: for each pixel point, calculating the number of the characteristic values which are the first numerical values and the number of the characteristic values which are the second numerical values in the sample pictures corresponding to the current picture classification; according to the number of the first numerical values and the number of the second numerical values, the probability that the characteristic value is the first numerical value and the probability that the characteristic value is the second numerical value on each pixel point are obtained;
and determining the probability that the characteristic value is the first numerical value and the probability that the characteristic value is the second numerical value on each pixel point corresponding to each picture classification as the similarity index.
7. An electronic device comprising a processor and a memory, the memory storing machine executable instructions executable by the processor, the processor executing the machine executable instructions to implement the character picture classification method of any one of claims 1-5.
8. A machine-readable storage medium storing machine-executable instructions that, when invoked and executed by a processor, cause the processor to implement the character picture classification method of any one of claims 1-5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911314877.7A CN111091128B (en) 2019-12-18 2019-12-18 Character picture classification method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911314877.7A CN111091128B (en) 2019-12-18 2019-12-18 Character picture classification method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN111091128A (en) 2020-05-01
CN111091128B (en) 2023-09-22

Family

ID=70396496

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911314877.7A Active CN111091128B (en) 2019-12-18 2019-12-18 Character picture classification method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111091128B (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102122352A (en) * 2011-03-01 2011-07-13 西安电子科技大学 Characteristic value distribution statistical property-based polarized SAR image classification method
CN102880874A (en) * 2012-09-29 2013-01-16 重庆新媒农信科技有限公司 Character recognition method and character recognizer
CN104376260A (en) * 2014-11-20 2015-02-25 东华大学 Malicious code visualized analyzing method based on Shannon information entropy
CN105631449A (en) * 2015-12-21 2016-06-01 华为技术有限公司 Method, device and equipment for segmenting picture
CN106599940A (en) * 2016-11-25 2017-04-26 东软集团股份有限公司 Picture character identification method and apparatus thereof
CN106874909A (en) * 2017-01-18 2017-06-20 深圳怡化电脑股份有限公司 A kind of recognition methods of image character and its device
CN107239784A (en) * 2017-07-03 2017-10-10 福建中金在线信息科技有限公司 A kind of image identification method, device, electronic equipment and readable storage medium storing program for executing
CN107633209A (en) * 2017-08-17 2018-01-26 平安科技(深圳)有限公司 Electronic installation, the method and storage medium of dynamic video recognition of face
CN108108760A (en) * 2017-12-19 2018-06-01 山东大学 A kind of fast human face recognition
CN108563952A (en) * 2018-04-24 2018-09-21 腾讯科技(深圳)有限公司 Method for detecting virus, device and the storage medium of file
CN109543770A (en) * 2018-11-30 2019-03-29 合肥泰禾光电科技股份有限公司 Dot character recognition methods and device
CN109766893A (en) * 2019-01-09 2019-05-17 北京数衍科技有限公司 Picture character recognition methods suitable for receipt of doing shopping
CN110516100A (en) * 2019-08-29 2019-11-29 武汉纺织大学 A kind of calculation method of image similarity, system, storage medium and electronic equipment
CN110532413A (en) * 2019-07-22 2019-12-03 平安科技(深圳)有限公司 Information retrieval method, device based on picture match, computer equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8655029B2 (en) * 2012-04-10 2014-02-18 Seiko Epson Corporation Hash-based face recognition system

Also Published As

Publication number Publication date
CN111091128A (en) 2020-05-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant