CN112990175B - Method, device, computer equipment and storage medium for recognizing handwritten Chinese characters - Google Patents


Info

Publication number
CN112990175B
CN112990175B (granted from application CN202110357440.2A)
Authority
CN
China
Prior art keywords
image
features
text
handwritten Chinese
semantic information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110357440.2A
Other languages
Chinese (zh)
Other versions
CN112990175A (en)
Inventor
邱泰儒
姚旭峰
沈小勇
吕江波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Smartmore Technology Co Ltd
Shanghai Smartmore Technology Co Ltd
Original Assignee
Shenzhen Smartmore Technology Co Ltd
Shanghai Smartmore Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Smartmore Technology Co Ltd and Shanghai Smartmore Technology Co Ltd
Priority to CN202110357440.2A
Publication of CN112990175A
Application granted
Publication of CN112990175B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Character Discrimination (AREA)

Abstract

The application relates to a method, an apparatus, computer equipment and a storage medium for recognizing handwritten Chinese characters. The method comprises: acquiring an image to be recognized, the image containing handwritten Chinese characters; extracting target image features of the image to be recognized, the target image features representing the text features of the image; segmenting the target image features to obtain semantic information features of the handwritten Chinese characters; and determining a recognition result of the handwritten Chinese characters in the image according to those semantic information features. Because recognition is performed only on the semantic information features of the handwritten Chinese characters, rather than on features of their visual appearance, the recognition accuracy of handwritten Chinese characters is improved.

Description

Method, device, computer equipment and storage medium for recognizing handwritten Chinese characters
Technical Field
The present invention relates to the field of character recognition technology, and in particular to a method, an apparatus, a computer device, and a storage medium for recognizing handwritten Chinese characters.
Background
Text is one of the most important information carriers today; it is ubiquitous in daily life, office work, and teaching, Chinese characters being a prominent example.
At present, methods for recognizing handwritten Chinese characters generally cut each handwritten character out of a text line and then recognize the characters one by one to obtain the content of the whole line. However, recognition of handwritten Chinese characters is easily affected by the writer's individual style; if the text line is merely divided into isolated characters for recognition, recognition accuracy may be low.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a handwritten Chinese character recognition method, apparatus, computer device, and storage medium that can improve the recognition accuracy of handwritten Chinese characters.
A method for recognizing handwritten Chinese characters, the method comprising:
acquiring an image to be recognized, the image to be recognized containing handwritten Chinese characters;
extracting target image features of the image to be recognized, the target image features representing the text features of the image to be recognized;
segmenting the target image features to obtain semantic information features of the handwritten Chinese characters; and
determining a recognition result of the handwritten Chinese characters in the image to be recognized according to the semantic information features of the handwritten Chinese characters.
In one embodiment, extracting the target image features of the image to be recognized includes:
inputting the image to be recognized into a feature extraction model to obtain the image features output by neural network layers at least at two preset positions in the feature extraction model; and
aggregating the image features output by those neural network layers to obtain the target image features of the image to be recognized.
In one embodiment, before the target image features are segmented to obtain the semantic information features of the handwritten Chinese characters, the method further includes:
performing convolution processing on the target image features to obtain convolved target image features;
and segmenting the target image features to obtain the semantic information features of the handwritten Chinese characters includes:
segmenting the convolved target image features to obtain a first image feature and a second image feature, where the first image feature represents the semantic information features of the handwritten Chinese characters, the second image feature represents the font appearance features of the handwritten Chinese characters, and the two features have the same feature dimensions; and
identifying the first image feature as the semantic information features of the handwritten Chinese characters.
In one embodiment, determining the recognition result of the handwritten Chinese characters in the image to be recognized according to their semantic information features includes:
converting the semantic information features of the handwritten Chinese characters into corresponding text sequence features; and
obtaining the combination of the characters corresponding to each column of features in the text sequence features as the recognition result of the handwritten Chinese characters in the image to be recognized.
In one embodiment, obtaining the combination of the characters corresponding to each column of features in the text sequence features as the recognition result includes:
inputting the text sequence features into a pre-trained text prediction model to obtain the recognition result of the handwritten Chinese characters in the image to be recognized, where the pre-trained text prediction model obtains the character corresponding to each column of features in the text sequence features and combines those characters into the recognition result.
In one embodiment, the pre-trained text prediction model is trained by:
acquiring sample semantic information features and sample font appearance features, where the sample semantic information features comprise a first semantic information feature of a first text image, a second semantic information feature of a second text image, and a third semantic information feature of a third text image, and the sample font appearance features comprise a first font appearance feature of the first text image, a second font appearance feature of the second text image, and a third font appearance feature of the third text image; the second text image has the same text content as the first text image but a different writer, while the third text image has different text content from the first text image but the same writer;
inputting a first text sequence feature corresponding to the first semantic information feature into a text prediction model to be trained, to obtain a recognition result of the handwritten Chinese characters in the first text image;
obtaining a target loss value according to the sample semantic information features, the sample font appearance features, and the recognition result of the handwritten Chinese characters in the first text image; and
adjusting the model parameters of the text prediction model according to the target loss value, and repeating the training with the adjusted parameters until the target loss value obtained from the trained model is smaller than a preset threshold, at which point the trained text prediction model is taken as the pre-trained text prediction model.
In one embodiment, obtaining the target loss value according to the sample semantic information features, the sample font appearance features, and the recognition result of the handwritten Chinese characters in the first text image includes:
obtaining a first loss value according to the first, second, and third semantic information features;
obtaining a second loss value according to the first, second, and third font appearance features;
obtaining a third loss value according to the difference between the recognition result of the handwritten Chinese characters in the first text image and the actual text; and
obtaining the target loss value according to the first, second, and third loss values.
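The loss construction above can be sketched with triplet-style contrastive terms: the patent does not specify the exact loss functions, so the hinge form, the margin, and the equal weighting below are assumptions for illustration. Semantic features of images 1 and 2 (same text, different writers) are pulled together while image 3 (different text) is pushed away; font features of images 1 and 3 (same writer) are pulled together while image 2 (different writer) is pushed away.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss: pull anchor toward positive, push it from negative.
    The margin value is illustrative, not taken from the patent."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

def target_loss(sem1, sem2, sem3, font1, font2, font3, recog_loss):
    """Combine the three loss values described above:
    - first loss: semantic features of images 1 and 2 share text content
      (positive pair); image 3 has different text (negative);
    - second loss: font features of images 1 and 3 share the writer
      (positive pair); image 2 has a different writer (negative);
    - third loss (recog_loss): discrepancy between predicted and actual text,
      e.g. a cross-entropy value, passed in directly here."""
    l1 = triplet_loss(sem1, sem2, sem3)
    l2 = triplet_loss(font1, font3, font2)
    return l1 + l2 + recog_loss  # equal weighting assumed

rng = np.random.default_rng(2)
sem1, sem2, sem3 = (rng.standard_normal(8) for _ in range(3))
f1, f2, f3 = (rng.standard_normal(8) for _ in range(3))
print(target_loss(sem1, sem2, sem3, f1, f2, f3, recog_loss=0.5))
```

Because the first two terms are hinged at zero, the combined value is always at least the recognition loss; minimizing it jointly decouples what is written from how it is written.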
An apparatus for recognizing handwritten Chinese characters, the apparatus comprising:
an image acquisition module, configured to acquire an image to be recognized, the image containing handwritten Chinese characters;
a feature extraction module, configured to extract target image features of the image to be recognized, the target image features representing the text features of the image;
a feature segmentation module, configured to segment the target image features to obtain semantic information features of the handwritten Chinese characters; and
a character recognition module, configured to determine a recognition result of the handwritten Chinese characters in the image to be recognized according to their semantic information features.
A computer device comprising a memory storing a computer program and a processor that, when executing the computer program, performs the steps of:
acquiring an image to be recognized, the image containing handwritten Chinese characters;
extracting target image features of the image to be recognized, the target image features representing the text features of the image;
segmenting the target image features to obtain semantic information features of the handwritten Chinese characters; and
determining a recognition result of the handwritten Chinese characters in the image to be recognized according to their semantic information features.
A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of:
acquiring an image to be recognized, the image containing handwritten Chinese characters;
extracting target image features of the image to be recognized, the target image features representing the text features of the image;
segmenting the target image features to obtain semantic information features of the handwritten Chinese characters; and
determining a recognition result of the handwritten Chinese characters in the image to be recognized according to their semantic information features.
With the above method, apparatus, computer device, and storage medium, an image containing handwritten Chinese characters is acquired and its target image features, which represent the text features of the image, are extracted; the target image features are then segmented to obtain the semantic information features of the handwritten Chinese characters, and the recognition result is determined from those semantic features. Because recognition operates only on the semantic information features and disregards the appearance features of the handwriting, recognition accuracy is improved and the loss of accuracy caused by individual writing styles is avoided.
Drawings
FIG. 1 is a flow chart of a method for recognizing handwritten Chinese characters in one embodiment;
FIG. 2 is a flow diagram of training steps for a text prediction model in one embodiment;
FIG. 3 is a training schematic of a text prediction model in one embodiment;
FIG. 4 is a flow chart of a method for recognizing handwritten Chinese characters in yet another embodiment;
FIG. 5 is a block diagram of an apparatus for recognizing handwritten Chinese characters in one embodiment;
fig. 6 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
In one embodiment, as shown in FIG. 1, a method for recognizing handwritten Chinese characters is provided. The method is described here as applied to a server, which may be implemented as a stand-alone server or as a cluster of multiple servers; it will be appreciated that the method may also be applied to a terminal, or to a system comprising a terminal and a server and implemented through their interaction. In this embodiment, the method includes the following steps:
Step S101, acquiring an image to be recognized; the image to be recognized includes handwritten Chinese characters.
Here, the image to be recognized is an image containing a string of handwritten Chinese characters, such as an image of characters written by a real user; in practice, it may be uploaded from a terminal, retrieved from a network, or read from local storage.
A handwritten Chinese character is a character written by a real user. It should be noted that the handwritten Chinese characters referred to in this application form a text line composed of multiple individual handwritten characters.
Specifically, the terminal generates a character recognition request according to an image to be recognized including handwritten Chinese characters uploaded by a user, and sends the character recognition request to a corresponding server; and the server analyzes the received character recognition request to obtain an image to be recognized.
Of course, the server may also obtain an image to be recognized including the handwritten chinese characters from the local database, and recognize the image to be recognized.
Step S102, extracting target image features of the image to be recognized; the target image features are used to represent the text features of the image to be recognized.
The target image features are image features that represent the text features of the image to be recognized, specifically image features comprising both the depth feature information and the multi-scale feature information of the image.
Specifically, the server performs feature extraction on the image to be recognized through a preset target-image-feature extraction instruction to obtain the target image features, which serve as the text features of the image; the preset instruction is an instruction for extracting the target image features of the image to be recognized.
Alternatively, the server may input the image to be recognized into a feature extraction model, which convolves the image to obtain the target image features; the feature extraction model is a neural network model for extracting the target image features of the image to be recognized.
Step S103, segmenting the target image features to obtain semantic information features of the handwritten Chinese characters.
The semantic information features of the handwritten Chinese characters represent the content of the characters. It should be noted that, since the text features include both semantic information features and font appearance features, the target image features characterizing the text features of the image to be recognized contain the semantic information features; that is, the semantic information features of the handwritten Chinese characters can be obtained by segmenting the target image features.
Specifically, the server acquires a preset semantic-font decoupling instruction and, according to it, segments the semantic information features out of the target image features representing the text features of the image to be recognized, using them as the semantic information features of the handwritten Chinese characters. This makes it possible to determine the recognition result from the semantic information features alone, without considering the font appearance features, thereby improving recognition accuracy.
Step S104, determining the recognition result of the handwritten Chinese character in the image to be recognized according to the semantic information features of the handwritten Chinese character.
The recognition result of the handwritten Chinese characters in the image to be recognized refers to text content corresponding to the handwritten Chinese characters in the image to be recognized.
Specifically, the server converts semantic information features of the handwritten Chinese characters into corresponding text sequence features, the text sequence features are input into a text prediction model, and the text sequence features are processed through the text prediction model to obtain a recognition result of the handwritten Chinese characters in the image to be recognized. Therefore, the purpose of determining the recognition result of the handwritten Chinese characters in the image to be recognized according to the semantic information features of the handwritten Chinese characters in the image to be recognized is achieved, and the recognition is only carried out on the semantic information features of the handwritten Chinese characters in the image to be recognized, so that the recognition accuracy of the handwritten Chinese characters is improved.
For example, the server converts the semantic information features of the handwritten Chinese characters into text sequence features; maps each column of features in the text sequence features to a character; and combines the characters corresponding to the columns to obtain the text content of the handwritten Chinese characters, which serves as the recognition result of the handwritten Chinese characters in the image to be recognized.
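The column-by-column mapping just described can be sketched as a greedy per-column decoder. This is a minimal NumPy illustration, not the patent's text prediction model: the toy vocabulary, the linear classifier, and all sizes are assumptions.

```python
import numpy as np

VOCAB = ["手", "写", "汉", "字"]  # toy character table, purely illustrative

def decode_columns(seq_features, w_cls):
    """Map each column of the text-sequence features to character scores,
    take the argmax per column, and join the resulting characters."""
    logits = seq_features @ w_cls          # (T, D) @ (D, V) -> (T, V)
    ids = logits.argmax(axis=1)            # one character index per column
    return "".join(VOCAB[i] for i in ids)

rng = np.random.default_rng(1)
T, D = 4, 16                               # 4 columns of 16-dim features
seq = rng.standard_normal((T, D))
w = rng.standard_normal((D, len(VOCAB)))
print(decode_columns(seq, w))              # a 4-character string from VOCAB
```

In a real system the classifier weights would be learned and the vocabulary would cover thousands of Chinese characters, but the column-to-character-to-string flow is the same.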
In the above method for recognizing handwritten Chinese characters, an image containing handwritten Chinese characters is acquired and its target image features, which represent the text features of the image, are extracted; the target image features are then segmented to obtain the semantic information features of the handwritten Chinese characters, and the recognition result is determined from those semantic features. Because recognition operates only on the semantic information features and disregards the appearance features of the handwriting, recognition accuracy is improved and the loss of accuracy caused by individual writing styles is avoided.
In one embodiment, step S102 of extracting the target image features of the image to be recognized specifically includes: inputting the image to be recognized into a feature extraction model to obtain the image features output by neural network layers at least at two preset positions in the model; and aggregating those image features to obtain the target image features of the image to be recognized.
The feature extraction model is a neural network model comprising multiple neural network layers and is used to extract the image features of the image to be recognized. The neural network layers at the preset positions are two or more layers at specific positions; in a practical scenario, they are typically three consecutive layers.
Specifically, the server inputs the image to be recognized into the feature extraction model and takes the image features output by its third, fourth, and fifth neural network layers; the features from the third and fourth layers contain multi-scale feature information, while those from the fifth layer contain depth feature information. The outputs of the three layers are combined by a feature-concatenation operation, yielding image features that contain both multi-scale and depth information; these serve as the target image features representing the text features of the image to be recognized.
For example, the server inputs the image to be recognized into a residual network (ResNet-50) and uses an FPN (Feature Pyramid Network) to aggregate the image features output by the 3rd, 4th, and 5th residual modules, obtaining image features that contain both multi-scale and depth feature information as the target image features. Concretely, the features output by the 5th residual module are upsampled, then concatenated and convolved with the features from the 4th module; the result is upsampled again, then concatenated and convolved with the features from the 3rd module to obtain the target image features of the image to be recognized.
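The upsample-concatenate-convolve aggregation above can be sketched in a few lines of NumPy. This is a shape-level illustration only, assuming (channels, height, width) feature maps; the channel counts, random weights, and nearest-neighbour upsampling stand in for the actual ResNet-50/FPN configuration.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x spatial upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def conv1x1(x, w):
    """1x1 convolution as a per-pixel linear map over channels; w: (C_out, C_in)."""
    c, h, wd = x.shape
    return (w @ x.reshape(c, -1)).reshape(w.shape[0], h, wd)

def fpn_aggregate(c3, c4, c5, w54, w43):
    """Top-down aggregation in the spirit of the description: upsample the
    deepest map, concatenate with the next one, mix channels, and repeat."""
    p4 = conv1x1(np.concatenate([upsample2x(c5), c4], axis=0), w54)
    p3 = conv1x1(np.concatenate([upsample2x(p4), c3], axis=0), w43)
    return p3  # target image features: multi-scale + depth information

rng = np.random.default_rng(0)
# Toy stand-ins for the 3rd/4th/5th residual-stage outputs (channels, H, W).
c3 = rng.standard_normal((8, 16, 16))
c4 = rng.standard_normal((16, 8, 8))
c5 = rng.standard_normal((32, 4, 4))
w54 = rng.standard_normal((16, 48))   # mixes upsampled c5 (32ch) + c4 (16ch)
w43 = rng.standard_normal((8, 24))    # mixes upsampled p4 (16ch) + c3 (8ch)
feats = fpn_aggregate(c3, c4, c5, w54, w43)
print(feats.shape)  # (8, 16, 16)
```

The output keeps the spatial resolution of the shallowest (3rd-stage) map while folding in information from the deeper stages, which is what lets the target image features carry both multi-scale and depth cues.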
With the technical solution of this embodiment, inputting the image to be recognized into the feature extraction model yields target image features that contain both multi-scale and depth feature information, so the handwritten-character recognition result derived from them is more accurate, further improving recognition accuracy.
In one embodiment, the step S103, before performing segmentation processing on the target image feature to obtain the semantic information feature of the handwritten chinese character, further includes: carrying out convolution processing on the target image characteristics to obtain target image characteristics after the convolution processing; then, in the step S103, the segmentation process is performed on the target image feature to obtain the semantic information feature of the handwritten chinese character, which specifically includes: dividing the target image features after convolution processing to obtain a first image feature and a second image feature; the first image feature is used for representing semantic information features of the handwritten Chinese character, the second image feature is used for representing font appearance features of the handwritten Chinese character, and feature dimensions of the first image feature and the second image feature are the same; the first image feature is identified as a semantic information feature of the handwritten chinese character.
The convolution processing is performed on the target image features to reduce the dimension of the target image features. The font appearance characteristics of the handwritten Chinese characters are used for representing the appearance information of the handwritten Chinese characters.
Specifically, the server inputs the target image characteristics into a convolution network, and carries out convolution processing on the target image characteristics through the convolution network to obtain target image characteristics after the convolution processing; inputting the target image features after convolution processing into a semantic font decoupling network, and dividing the target image features after convolution processing through the semantic font decoupling network to obtain two groups of image features, namely a first image feature for representing semantic information features of the handwritten Chinese characters and a second image feature for representing font appearance features of the handwritten Chinese characters; finally, the first image feature is identified as a semantic information feature of the handwritten chinese character, and the second image feature is identified as a font appearance feature of the handwritten chinese character.
For example, the server performs dimension reduction on the target image features through a 1×1 convolution network to obtain dimension-reduced target image features, then halves the dimension-reduced target image features along the feature dimension to obtain a first image feature and a second image feature of equal dimensionality, representing the semantic information features and the font appearance features of the handwritten Chinese characters, respectively. For instance, a 1024-dimensional target image feature is split into two 512-dimensional image features.
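The halving step above can be sketched minimally as follows, assuming a channel-first feature layout; the array shapes and the random input are illustrative, not the patent's actual tensor sizes:

```python
import numpy as np

def decouple_features(feat):
    # Split evenly along the channel axis: the first half stands for
    # the semantic information features, the second half for the font
    # appearance features, each with the same feature dimension.
    c = feat.shape[0]
    assert c % 2 == 0, "channel count must be even to halve"
    return feat[: c // 2], feat[c // 2:]

# A 1024-dimensional target feature (after the 1x1 convolution)
# splits into two 512-dimensional groups of equal dimensionality.
feat = np.random.randn(1024, 8)          # (channels, spatial width)
semantic, font = decouple_features(feat)
print(semantic.shape, font.shape)        # (512, 8) (512, 8)
```

Only the semantic half is carried forward to recognition; the font half is used during training.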
In the technical scheme provided by this embodiment, the target image features are convolved and the semantic information features of the handwritten Chinese characters are segmented out of the convolved target image features. This allows the subsequent recognition result of the handwritten Chinese characters in the image to be recognized to be determined from the semantic information features alone, without considering the font appearance features, thereby improving recognition accuracy.
In one embodiment, step S104, determining the recognition result of the handwritten Chinese characters in the image to be recognized according to their semantic information features, specifically includes: converting the semantic information features of the handwritten Chinese characters into corresponding text sequence features; and combining the characters corresponding to each column of features in the text sequence features to obtain the recognition result of the handwritten Chinese characters in the image to be recognized.
The text sequence features consist of multiple columns of features.
Specifically, the server inputs the semantic information features of the handwritten Chinese characters into a bidirectional long short-term memory (BiLSTM) network, which converts them into the text sequence features of the handwritten Chinese characters; the server then queries the mapping between features and characters to obtain the character corresponding to each column of features, and combines these characters to obtain the recognition result of the handwritten Chinese characters in the image to be recognized.
For example, the server inputs the semantic information features of the handwritten Chinese characters into the BiLSTM network. A hidden state is first obtained through convolution, pooling, and similar operations; the hidden state is then iterated 35 times in the network, outputting one C-dimensional vector per step. Concatenating the 35 outputs yields a 35×C sequence feature, which serves as the text sequence features of the handwritten Chinese characters. Each column of the text sequence features is then mapped to its corresponding character to obtain the final recognition result.
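The 35-step iteration above can be illustrated with a toy recurrence; this is a stand-in for the BiLSTM, and the dimension C, the weight matrix, and the single-direction tanh update are all assumptions made for illustration only:

```python
import numpy as np

C = 256        # per-step output dimension (illustrative assumption)
STEPS = 35     # iteration count from the example above

rng = np.random.default_rng(0)
hidden = rng.standard_normal(C)            # state from conv/pooling
W = 0.01 * rng.standard_normal((C, C))     # toy recurrent weight

# Each iteration emits one C-dimensional vector; concatenating the
# 35 outputs yields the 35 x C text sequence feature.
outputs = []
for _ in range(STEPS):
    hidden = np.tanh(W @ hidden)
    outputs.append(hidden)
sequence_feature = np.stack(outputs)
print(sequence_feature.shape)              # (35, 256)
```

Each row of `sequence_feature` is one column feature that will later be mapped to a character.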
The technical scheme of this embodiment determines the recognition result of the handwritten Chinese characters in the image to be recognized from their semantic information features alone, which improves the recognition accuracy of the handwritten Chinese characters.
In one embodiment, combining the characters corresponding to each column of features in the text sequence features to obtain the recognition result of the handwritten Chinese characters in the image to be recognized specifically includes: inputting the text sequence features into a pre-trained text prediction model to obtain the recognition result of the handwritten Chinese characters in the image to be recognized; the pre-trained text prediction model obtains the character corresponding to each column of features in the text sequence features and combines these characters into the recognition result.
The pre-trained text prediction model is a model for predicting the text content of handwritten Chinese characters, such as an attention-based sequence prediction model.
Specifically, the server inputs the text sequence features into the pre-trained text prediction model, which maps each column of features to its corresponding character based on an attention mechanism. For example, for text sequence features comprising columns A, B, C, D, and E, column A corresponds to the character "美", column B to "丽", column C to "的", column D to "天", and column E to null; combining the characters corresponding to each column yields the combined text, such as "美丽的天" ("beautiful sky"), which serves as the recognition result of the handwritten Chinese characters in the image to be recognized.
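A minimal sketch of this column-by-column decoding, using the A–E example above; the charset, the score vectors, and the null handling are illustrative assumptions, not the patent's actual model outputs:

```python
def decode_columns(column_scores, charset):
    # Take the highest-scoring entry per column; drop null columns.
    out = []
    for scores in column_scores:
        idx = max(range(len(scores)), key=scores.__getitem__)
        if charset[idx] is not None:
            out.append(charset[idx])
    return "".join(out)

# Index 0 plays the role of the "null" prediction.
charset = [None, "美", "丽", "的", "天"]
cols = [
    [0.1, 0.8, 0.05, 0.03, 0.02],   # column A -> 美
    [0.1, 0.05, 0.8, 0.03, 0.02],   # column B -> 丽
    [0.1, 0.05, 0.03, 0.8, 0.02],   # column C -> 的
    [0.1, 0.05, 0.03, 0.02, 0.8],   # column D -> 天
    [0.9, 0.05, 0.03, 0.01, 0.01],  # column E -> null (dropped)
]
print(decode_columns(cols, charset))  # 美丽的天
```

In the actual model the per-column scores come from the attention-based prediction module rather than hand-written lists.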
The technical scheme of this embodiment recognizes only the text sequence features converted from the semantic information features of the handwritten Chinese characters, without considering their appearance features. This improves recognition accuracy and avoids the drop in accuracy caused by the influence of the writer's handwriting style during recognition.
In one embodiment, as shown in fig. 2, the method for recognizing handwritten Chinese characters provided in the present application further includes a training step for the text prediction model, which specifically includes the following steps:
step S201, obtaining sample semantic information features and sample font appearance features.
The sample semantic information features comprise a first semantic information feature of a first text image, a second semantic information feature of a second text image, and a third semantic information feature of a third text image; the sample font appearance features comprise a first font appearance feature of the first text image, a second font appearance feature of the second text image, and a third font appearance feature of the third text image. The second text image has the same text content as the first text image but a different writer; the third text image has different text content from the first text image but the same writer.
The first, second, and third text images all contain handwritten Chinese characters drawn from a data set of handwritten Chinese text lines, which is generated from a data set of single handwritten Chinese characters.
It should be noted that the difficulty of handwritten Chinese character recognition mainly lies in three aspects: excessive intra-class variation, insufficient inter-class variation, and a shortage of text line data. Regarding intra-class variation, handwritten data is collected from different writers whose font styles vary widely, so the same character can present a completely different appearance across writers' data. Regarding inter-class variation, Chinese character data differs from English data: there are more than 7,000 commonly used Chinese characters, many with very similar glyphs; at the same time, because of style differences among writers, two different characters with similar glyphs may exhibit only minimal differences in appearance in different writers' data. Regarding the shortage of text line data, unlike street-view text data, which can be collected in large quantities, handwritten Chinese data requires extensive manual writing and labeling; text-line data sets for handwritten Chinese character recognition are therefore harder to obtain and far fewer in number than street-view text data sets.
Step S202, inputting a first text sequence feature corresponding to the first semantic information feature into a text prediction model to be trained, and obtaining a recognition result of the handwritten Chinese characters in the first text image.
Step S203, obtaining a target loss value according to the sample semantic information features, the sample font appearance features and the recognition result of the handwritten Chinese characters in the first text image.
Specifically, obtaining the target loss value according to the sample semantic information features, the sample font appearance features, and the recognition result of the handwritten Chinese characters in the first text image includes: obtaining a first loss value from the first, second, and third semantic information features; obtaining a second loss value from the first, second, and third font appearance features; obtaining a third loss value from the difference between the recognition result of the handwritten Chinese characters in the first text image and their actual text; and obtaining the target loss value from the first, second, and third loss values.
For example, the server computes the first loss value from the first, second, and third semantic information features using a first loss function; computes the second loss value from the first, second, and third font appearance features using a second loss function; and computes the third loss value from the difference between the recognition result of the handwritten Chinese characters in the first text image and their actual text using a third loss function. The server then obtains a first product of the first loss value and its corresponding first coefficient, a second product of the second loss value and its corresponding second coefficient, and a third product of the third loss value and its corresponding third coefficient, and adds the three products to obtain the target loss value.
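A sketch of this loss combination, assuming the first and second loss functions are triplet-style losses (as the later network description suggests) and that the coefficients are plain scalars; the feature vectors, margin, and coefficient values are illustrative:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Pull the anchor toward the positive and push it away from the
    # negative by at least `margin` (hinge form of the triplet loss).
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

def target_loss(l_first, l_second, l_third, c1=0.1, c2=0.1, c3=1.0):
    # Weighted sum of the three loss values, per the embodiment;
    # the coefficients c1..c3 are assumed values.
    return c1 * l_first + c2 * l_second + c3 * l_third

a = np.array([1.0, 0.0])    # first (anchor) semantic feature
p = np.array([0.9, 0.1])    # second: same text, different writer
n = np.array([-1.0, 0.0])   # third: different text, same writer
l_first = triplet_loss(a, p, n)        # 0.0: anchor already far closer to p
print(target_loss(l_first, 0.5, 2.0))  # ~ 0.1*0.0 + 0.1*0.5 + 1.0*2.0
```

The same `triplet_loss` form serves for both the semantic branch and the font-appearance branch.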
Step S204, adjusting the model parameters of the text prediction model to be trained according to the target loss value, and repeatedly training the text prediction model with the adjusted parameters until the target loss value obtained from the trained text prediction model is smaller than a preset threshold; the trained text prediction model is then used as the pre-trained text prediction model.
Specifically, if the target loss value is not smaller than the preset threshold, the server adjusts the model parameters of the text prediction model to be trained according to the target loss value and repeats steps S201 to S203, repeatedly training the text prediction model with the adjusted parameters until the target loss value obtained from the trained text prediction model is smaller than the preset threshold, at which point training stops. Once the target loss value obtained from the trained text prediction model is smaller than the preset threshold, the trained text prediction model is taken as the pre-trained text prediction model.
For example, referring to fig. 3, fig. 3 shows a handwritten Chinese character recognition network based on semantic-font decoupling, composed of three networks: a feature extraction network, a sequence modeling network, and a text prediction network. Specifically, the server first obtains a triplet consisting of a text picture A to be recognized, a text picture P written by a different writer but containing the same text as A, and a text picture N written by the same writer but containing different text from A. The triplet is input into a weight-shared backbone network, such as a residual network, which performs feature extraction on each text picture to obtain a feature map containing its multi-scale features; this feature map corresponds to the text features of each picture. The text features of each picture are then input into the semantic-font decoupling module, where a 1×1 convolution network reduces their dimensionality and splits them into two groups of feature representations: the semantic information features and the font appearance features of each picture, i.e., of pictures A, P, and N respectively. One branch of the semantic information features of picture A is input into a bidirectional long short-term memory network (BiLSTM), which outputs the corresponding text sequence features for subsequent text recognition; the other branch is combined with the semantic information features of pictures P and N to compute a first loss value Lmem, and the font appearance features of picture A are combined with those of pictures P and N to compute a second loss value Lfont. The text sequence features of picture A are then input into an attention-based sequence prediction module, which automatically extracts the semantic information in the sequence features to produce the final recognition result; this result is combined with the character label of picture A to compute a third loss value Lrec, giving the final target loss value L = Lrec + λ1 × Lmem + λ2 × Lfont. Finally, the semantic-font-decoupled recognition network is trained according to the target loss value to obtain the trained handwritten Chinese character recognition network.
Further, after the pre-trained text prediction model is obtained, the model can be compressed by model pruning in order to reduce its parameter count and make it easier to deploy on mobile application platforms. This shrinks the model size, increases its running speed on the mobile platform, and enables fast recognition of handwritten Chinese characters.
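A minimal sketch of magnitude-based pruning, one common form of model pruning; the sparsity level, the toy weight matrix, and the NumPy implementation are illustrative assumptions, not the patent's actual compression procedure:

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    # Zero the smallest-magnitude weights so that roughly `sparsity`
    # of them are removed; zeroed weights need not be stored, which
    # shrinks the model for mobile deployment.
    flat = np.abs(weights).ravel()
    k = int(len(flat) * sparsity)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

w = np.array([[0.01, -0.50, 0.03],
              [0.80, -0.02, 0.40]])
pruned = magnitude_prune(w, sparsity=0.5)
print(pruned)   # the three smallest-magnitude entries become 0
```

In practice a pruned network is usually fine-tuned afterwards to recover accuracy, which matches the accuracy/speed trade-off reported below.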
In the technical scheme of this embodiment, the text prediction model is trained repeatedly, which improves the accuracy of the recognition results obtained from the trained model and hence the recognition accuracy of the handwritten Chinese characters.
In one embodiment, as shown in fig. 4, another method for recognizing handwritten Chinese characters is provided. Taking its application to a server as an example, the method includes the following steps:
step S401, obtaining an image to be identified; the image to be recognized includes handwritten Chinese characters.
Step S402, inputting the image to be identified into a feature extraction model to obtain image features output by a neural network layer at least two preset positions in the feature extraction model.
Step S403, the image features output by the neural network layers at least at two preset positions are aggregated to obtain target image features of the image to be identified.
Step S404, performing convolution processing on the target image features to obtain the target image features after the convolution processing.
Step S405, dividing the target image features after convolution processing to obtain a first image feature and a second image feature; the first image feature is used for representing semantic information features of the handwritten Chinese character, the second image feature is used for representing font appearance features of the handwritten Chinese character, and feature dimensions of the first image feature and the second image feature are the same.
Step S406, the first image feature is identified as the semantic information feature of the handwritten Chinese character.
Step S407, converting the semantic information features of the handwritten Chinese characters into corresponding text sequence features.
Step S408, inputting the text sequence characteristics into a pre-trained text prediction model to obtain a recognition result of handwritten Chinese characters in the image to be recognized; the pre-trained text prediction model is used for acquiring characters corresponding to each column of features in the text sequence features, and combining the characters corresponding to each column of features to obtain a recognition result of the handwritten Chinese characters in the image to be recognized.
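The steps S401–S408 above can be sketched end to end with stub stages; every function body here is a placeholder assumption (the real networks are trained models), and only the data flow mirrors the embodiment:

```python
import numpy as np

rng = np.random.default_rng(42)

def extract_and_aggregate(image):           # steps S402-S403 (stub)
    return rng.standard_normal((1024, 8))   # aggregated target features

def conv_reduce(feat):                      # step S404 (1x1 conv stand-in)
    return feat

def split_semantic(feat):                   # steps S405-S406
    return feat[: feat.shape[0] // 2]       # keep the first (semantic) half

def to_sequence(semantic):                  # step S407 (BiLSTM stand-in)
    return semantic.T                       # (columns, feature dim)

def predict_text(seq):                      # step S408 (prediction stand-in)
    return "<recognized text>"

image = rng.standard_normal((32, 128))      # grayscale text-line image
seq = to_sequence(split_semantic(conv_reduce(extract_and_aggregate(image))))
print(seq.shape, predict_text(seq))         # (8, 512) <recognized text>
```

The font-appearance half discarded by `split_semantic` is only needed at training time for the Lfont loss.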
In this method for recognizing handwritten Chinese characters, the recognition result of the handwritten Chinese characters in the image to be recognized is determined from their semantic information features alone, without considering their appearance features. This improves recognition accuracy and avoids the drop in accuracy caused by the influence of writers' handwriting styles during recognition.
In one embodiment, the present application further proposes a handwritten Chinese character recognition network based on semantic-font decoupling (Semantic-Font Decoupled Network, SFDN). The network incorporates a semantic-font decoupling module for decoupling the semantic information of the text characters themselves from the font information of different writers' styles, so that the model can recognize Chinese handwriting from different writers more robustly. In addition, a triplet loss function (triplet loss) is introduced during training to minimize the intra-class distance of the same character and maximize the inter-class distance of different characters, so that the model can more accurately distinguish hard-to-recognize handwriting. Finally, to achieve high running efficiency on mobile application platforms, the proposed network is compressed by model pruning, reducing the model size to one third of the original.
The above embodiment achieves the following technical effects: (1) addressing the intra-class and inter-class distance problems in handwritten Chinese character recognition, the semantic-font-decoupled recognition network decouples the semantic information and the font appearance information of the characters, reaching 82.11% accuracy on the handwritten Chinese character benchmark data set CASIA-HWDB, matching the current state of the art; (2) using a simpler system framework and pruning the model greatly increases the running speed on mobile application platforms, reaching 27.8 FPS while accuracy remains at 80.10%, i.e., real-time inference speed on mobile devices; (3) the proposed semantic-font decoupling module decouples the semantic information of text characters from the font information of different writers' styles and, combined with a triplet loss, improves the feature representation of handwritten Chinese character data, thereby improving the robustness of the model; (4) a data set of handwritten Chinese text lines, generated from a data set of single handwritten Chinese characters and consisting of a series of triplet data, is constructed and can be used for training and evaluating handwritten Chinese character recognition.
It should be understood that, although the steps in the flowcharts of figs. 1, 2, and 4 are shown sequentially as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in figs. 1, 2, and 4 may include sub-steps or stages that are not necessarily performed at the same time but may be performed at different times; these sub-steps or stages are not necessarily performed in sequence, but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 5, there is provided a recognition apparatus for handwritten Chinese characters, comprising: an image acquisition module 510, a feature extraction module 520, a feature segmentation module 530, and a character recognition module 540, wherein:
The image acquisition module 510 is configured to acquire an image to be recognized; the image to be recognized includes handwritten Chinese characters.
The feature extraction module 520 is configured to extract the target image features of the image to be recognized; the target image features represent the text features of the image to be recognized.
The feature segmentation module 530 is configured to perform segmentation processing on the target image features to obtain the semantic information features of the handwritten Chinese characters.
The character recognition module 540 is configured to determine the recognition result of the handwritten Chinese characters in the image to be recognized according to their semantic information features.
In one embodiment, the feature extraction module 520 is further configured to input the image to be recognized into a feature extraction model to obtain the image features output by neural network layers at at least two preset positions in the feature extraction model, and to aggregate the image features output by those neural network layers to obtain the target image features of the image to be recognized.
In one embodiment, the recognition apparatus for handwritten Chinese characters provided in the present application further includes a convolution processing module, configured to perform convolution processing on the target image features to obtain convolved target image features;
the feature segmentation module 530 is further configured to segment the convolved target image features to obtain a first image feature and a second image feature, where the first image feature represents the semantic information features of the handwritten Chinese characters, the second image feature represents the font appearance features of the handwritten Chinese characters, and the two features have the same feature dimension; and to take the first image feature as the semantic information features of the handwritten Chinese characters.
In one embodiment, the character recognition module 540 is further configured to convert the semantic information features of the handwritten Chinese characters into corresponding text sequence features, and to combine the characters corresponding to each column of features in the text sequence features to obtain the recognition result of the handwritten Chinese characters in the image to be recognized.
In one embodiment, the character recognition module 540 is further configured to input the text sequence features into a pre-trained text prediction model to obtain the recognition result of the handwritten Chinese characters in the image to be recognized; the pre-trained text prediction model obtains the character corresponding to each column of features in the text sequence features and combines these characters into the recognition result.
In one embodiment, the recognition apparatus for handwritten Chinese characters provided in the present application further includes a model training module, configured to: obtain sample semantic information features and sample font appearance features, where the sample semantic information features comprise a first semantic information feature of a first text image, a second semantic information feature of a second text image, and a third semantic information feature of a third text image, the sample font appearance features comprise a first font appearance feature of the first text image, a second font appearance feature of the second text image, and a third font appearance feature of the third text image, the second text image has the same text content as the first text image but a different writer, and the third text image has different text content from the first text image but the same writer; input the first text sequence features corresponding to the first semantic information feature into a text prediction model to be trained to obtain the recognition result of the handwritten Chinese characters in the first text image; obtain a target loss value according to the sample semantic information features, the sample font appearance features, and the recognition result of the handwritten Chinese characters in the first text image; and adjust the model parameters of the text prediction model to be trained according to the target loss value, repeatedly training the model with the adjusted parameters until the target loss value obtained from the trained text prediction model is smaller than a preset threshold, taking the trained text prediction model as the pre-trained text prediction model.
In one embodiment, the model training module is further configured to obtain a first loss value according to the first semantic information feature, the second semantic information feature, and the third semantic information feature; obtaining a second loss value according to the first font appearance characteristic, the second font appearance characteristic and the third font appearance characteristic; obtaining a third loss value according to the difference value between the recognition result of the handwritten Chinese character in the first text image and the actual result of the handwritten Chinese character; and obtaining a target loss value according to the first loss value, the second loss value and the third loss value.
For specific limitations on the recognition apparatus for handwritten Chinese characters, reference may be made to the above limitations on the method for recognizing handwritten Chinese characters, which are not repeated here. Each of the above modules in the recognition apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in or independent of a processor in the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server whose internal structure may be as shown in fig. 6. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system, a computer program, and a database, and the internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device stores data such as target image features, semantic information features, and recognition results. The network interface of the computer device communicates with external terminals through a network connection. The computer program, when executed by the processor, implements a method for recognizing handwritten Chinese characters.
It will be appreciated by those skilled in the art that the structure shown in fig. 6 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
In one embodiment, a computer program product or computer program is provided that includes computer instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the steps of the method embodiments described above.
Those skilled in the art will appreciate that all or part of the methods described above may be implemented by a computer program stored on a non-volatile computer-readable storage medium; when executed, the program may include the flows of the method embodiments described above. Any reference to memory, storage, a database, or another medium in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, or optical memory. Volatile memory may include Random Access Memory (RAM) or an external cache. By way of illustration and not limitation, RAM is available in many forms, such as Static Random Access Memory (SRAM) and Dynamic Random Access Memory (DRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations are described; however, any combination of these technical features that involves no contradiction should be considered within the scope of this description.
The above embodiments represent only a few implementations of the present application; their description is relatively specific and detailed, but should not therefore be construed as limiting the scope of the invention. It should be noted that those of ordinary skill in the art could make various modifications and improvements without departing from the spirit of the present application, all of which fall within its scope of protection. Accordingly, the scope of protection of the present application is determined by the appended claims.

Claims (10)

1. A method for recognizing handwritten Chinese characters, the method comprising:
acquiring an image to be identified; the image to be identified comprises handwritten Chinese characters;
extracting target image features of the image to be identified; the target image features are used for representing text features of the image to be identified;
dividing the target image features to obtain semantic information features of the handwritten Chinese characters;
determining a recognition result of the handwritten Chinese character in the image to be recognized according to the semantic information features of the handwritten Chinese character; the recognition result is obtained by processing with a pre-trained text prediction model, and the pre-trained text prediction model is obtained by training as follows:
acquiring sample semantic information features and sample font appearance features; the sample semantic information features comprise a first semantic information feature of a first text image, a second semantic information feature of a second text image, and a third semantic information feature of a third text image, and the sample font appearance features comprise a first font appearance feature of the first text image, a second font appearance feature of the second text image, and a third font appearance feature of the third text image; the text content of the second text image is identical to that of the first text image but is written by a different author; the text content of the third text image is different from that of the first text image but is written by the same author;
inputting a first text sequence feature corresponding to the first semantic information feature into a text prediction model to be trained to obtain a recognition result of a handwritten Chinese character in the first text image;
obtaining a first loss value according to the first semantic information feature, the second semantic information feature, and the third semantic information feature; obtaining a second loss value according to the first font appearance feature, the second font appearance feature, and the third font appearance feature; obtaining a third loss value according to the difference between the recognition result of the handwritten Chinese character in the first text image and the actual result of the handwritten Chinese character; and obtaining a target loss value according to the first loss value, the second loss value, and the third loss value;
and adjusting model parameters of the text prediction model to be trained according to the target loss value, and repeatedly training the text prediction model with the adjusted model parameters until the target loss value obtained from the trained text prediction model is smaller than a preset threshold, then taking the trained text prediction model as the pre-trained text prediction model.
2. The method of claim 1, wherein extracting the target image features of the image to be identified comprises:
inputting the image to be identified into a feature extraction model to obtain image features output by neural network layers at least two preset positions in the feature extraction model;
and carrying out aggregation processing on the image features output by the neural network layers at the at least two preset positions to obtain target image features of the image to be identified.
3. The method of claim 1, further comprising, before the segmenting of the target image features to obtain semantic information features of the handwritten Chinese character:
performing convolution processing on the target image features to obtain target image features after the convolution processing;
The step of dividing the target image feature to obtain semantic information features of the handwritten Chinese character comprises the following steps:
dividing the target image features after the convolution processing to obtain a first image feature and a second image feature; the first image features are used for representing semantic information features of the handwritten Chinese characters, the second image features are used for representing font appearance features of the handwritten Chinese characters, and feature dimensions of the first image features and the second image features are the same;
and taking the first image feature as the semantic information feature of the handwritten Chinese character.
4. The method according to claim 1, wherein determining the recognition result of the handwritten Chinese character in the image to be recognized according to the semantic information features of the handwritten Chinese character comprises:
converting the semantic information features of the handwritten Chinese character into corresponding text sequence features;
and obtaining the combination of characters corresponding to each column of features in the text sequence features as the recognition result of the handwritten Chinese character in the image to be recognized.
5. The method according to claim 4, wherein obtaining the combination of characters corresponding to each column of features in the text sequence features as the recognition result of the handwritten Chinese character in the image to be recognized comprises:
inputting the text sequence features into a pre-trained text prediction model to obtain the recognition result of the handwritten Chinese character in the image to be recognized; the pre-trained text prediction model is used to obtain the character corresponding to each column of features in the text sequence features and to combine these characters into the recognition result of the handwritten Chinese character in the image to be recognized.
6. An apparatus for recognizing handwritten Chinese characters, the apparatus comprising:
the image acquisition module is used for acquiring an image to be identified; the image to be identified comprises handwritten Chinese characters;
the feature extraction module is used for extracting target image features of the image to be identified; the target image features are used for representing text features of the image to be identified;
the feature segmentation module is used for carrying out segmentation processing on the target image features to obtain semantic information features of the handwritten Chinese characters;
the character recognition module is used for determining a recognition result of the handwritten Chinese character in the image to be recognized according to the semantic information characteristics of the handwritten Chinese character; the recognition result is obtained through processing a pre-trained text prediction model;
a model training module, configured to: acquire sample semantic information features and sample font appearance features, wherein the sample semantic information features comprise a first semantic information feature of a first text image, a second semantic information feature of a second text image, and a third semantic information feature of a third text image, the sample font appearance features comprise a first font appearance feature of the first text image, a second font appearance feature of the second text image, and a third font appearance feature of the third text image, the text content of the second text image is identical to that of the first text image but is written by a different author, and the text content of the third text image is different from that of the first text image but is written by the same author; input a first text sequence feature corresponding to the first semantic information feature into a text prediction model to be trained to obtain a recognition result of a handwritten Chinese character in the first text image; obtain a first loss value according to the first semantic information feature, the second semantic information feature, and the third semantic information feature; obtain a second loss value according to the first font appearance feature, the second font appearance feature, and the third font appearance feature; obtain a third loss value according to the difference between the recognition result of the handwritten Chinese character in the first text image and the actual result of the handwritten Chinese character; obtain a target loss value according to the first loss value, the second loss value, and the third loss value; and adjust model parameters of the text prediction model to be trained according to the target loss value, and repeatedly train the text prediction model with the adjusted model parameters until the target loss value obtained from the trained text prediction model is smaller than a preset threshold, then take the trained text prediction model as the pre-trained text prediction model.
7. The apparatus of claim 6, wherein the feature extraction module is further configured to: input the image to be identified into a feature extraction model to obtain image features output by neural network layers at at least two preset positions in the feature extraction model; and aggregate the image features output by the neural network layers at the at least two preset positions to obtain the target image features of the image to be identified.
8. The apparatus of claim 6, wherein the apparatus for recognizing handwritten Chinese characters further comprises a convolution processing module configured to perform convolution processing on the target image features to obtain convolved target image features;
the feature segmentation module is further configured to: divide the convolved target image features to obtain a first image feature and a second image feature, wherein the first image feature represents the semantic information feature of the handwritten Chinese character, the second image feature represents the font appearance feature of the handwritten Chinese character, and the two features have the same feature dimensions; and take the first image feature as the semantic information feature of the handwritten Chinese character.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 5 when the computer program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 5.
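The training objective in claim 1 combines three loss values: one pulling together semantic features of the same text written by different authors, one pulling together font appearance features of the same author writing different text, and a recognition loss against the ground-truth characters. The claim does not fix the functional form of the first two losses; the sketch below assumes triplet-style margin losses over toy feature vectors and a cross-entropy recognition loss. Function names, the margin value, and the use of NumPy are illustrative, not taken from the patent.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss: pull anchor toward positive, push it from negative."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def target_loss(sem1, sem2, sem3, font1, font2, font3, rec_pred, rec_true):
    """Combine the three loss values of claim 1 into one target loss."""
    # First loss: images 1 and 2 share text content (different authors),
    # so their semantic features should match; image 3 is the negative.
    l1 = triplet_loss(sem1, sem2, sem3)
    # Second loss: images 1 and 3 share an author (different text),
    # so their font appearance features should match; image 2 is the negative.
    l2 = triplet_loss(font1, font3, font2)
    # Third loss: cross-entropy between the predicted character distribution
    # and the one-hot actual characters of the first text image.
    l3 = -np.sum(rec_true * np.log(rec_pred + 1e-12))
    # Target loss: the claim combines the three values; a plain sum is assumed.
    return l1 + l2 + l3
```

In training, the target loss would be minimized by gradient descent on the model parameters, and the loop repeated until the loss falls below the preset threshold, as recited at the end of claim 1.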
CN202110357440.2A 2021-04-01 2021-04-01 Method, device, computer equipment and storage medium for recognizing handwritten Chinese characters Active CN112990175B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110357440.2A CN112990175B (en) 2021-04-01 2021-04-01 Method, device, computer equipment and storage medium for recognizing handwritten Chinese characters

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110357440.2A CN112990175B (en) 2021-04-01 2021-04-01 Method, device, computer equipment and storage medium for recognizing handwritten Chinese characters

Publications (2)

Publication Number Publication Date
CN112990175A CN112990175A (en) 2021-06-18
CN112990175B true CN112990175B (en) 2023-05-30

Family

ID=76338887

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110357440.2A Active CN112990175B (en) 2021-04-01 2021-04-01 Method, device, computer equipment and storage medium for recognizing handwritten Chinese characters

Country Status (1)

Country Link
CN (1) CN112990175B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112651323B (en) * 2020-12-22 2022-12-13 山东山大鸥玛软件股份有限公司 Chinese handwriting recognition method and system based on text line detection
CN113792741B (en) * 2021-09-17 2023-08-11 平安普惠企业管理有限公司 Character recognition method, device, equipment and storage medium
CN114140802B (en) * 2022-01-29 2022-04-29 北京易真学思教育科技有限公司 Text recognition method and device, electronic equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109993057A (en) * 2019-02-25 2019-07-09 平安科技(深圳)有限公司 Method for recognizing semantics, device, equipment and computer readable storage medium
CN111652332A (en) * 2020-06-09 2020-09-11 山东大学 Deep learning handwritten Chinese character recognition method and system based on two classifications

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109063720A (en) * 2018-06-04 2018-12-21 平安科技(深圳)有限公司 Handwritten word training sample acquisition methods, device, computer equipment and storage medium
CN110363194B (en) * 2019-06-17 2023-05-02 深圳壹账通智能科技有限公司 NLP-based intelligent examination paper reading method, device, equipment and storage medium
US11568623B2 (en) * 2019-08-22 2023-01-31 Canon Kabushiki Kaisha Image processing apparatus, image processing method, and storage medium
CN110942004A (en) * 2019-11-20 2020-03-31 深圳追一科技有限公司 Handwriting recognition method and device based on neural network model and electronic equipment
CN111275046B (en) * 2020-01-10 2024-04-16 鼎富智能科技有限公司 Character image recognition method and device, electronic equipment and storage medium
CN111242840A (en) * 2020-01-15 2020-06-05 上海眼控科技股份有限公司 Handwritten character generation method, apparatus, computer device and storage medium
CN111368841A (en) * 2020-02-28 2020-07-03 深圳前海微众银行股份有限公司 Text recognition method, device, equipment and storage medium


Also Published As

Publication number Publication date
CN112990175A (en) 2021-06-18

Similar Documents

Publication Publication Date Title
CN112990175B (en) Method, device, computer equipment and storage medium for recognizing handwritten Chinese characters
WO2020221298A1 (en) Text detection model training method and apparatus, text region determination method and apparatus, and text content determination method and apparatus
CN113301430B (en) Video clipping method, video clipping device, electronic equipment and storage medium
CN111709406B (en) Text line identification method and device, readable storage medium and electronic equipment
CN111582241A (en) Video subtitle recognition method, device, equipment and storage medium
CN110516541B (en) Text positioning method and device, computer readable storage medium and computer equipment
CN112347284B (en) Combined trademark image retrieval method
CN114596566B (en) Text recognition method and related device
CN111191649A (en) Method and equipment for identifying bent multi-line text image
CN112836702B (en) Text recognition method based on multi-scale feature extraction
CN112633423B (en) Training method of text recognition model, text recognition method, device and equipment
CN112966685B (en) Attack network training method and device for scene text recognition and related equipment
CN113836992A (en) Method for identifying label, method, device and equipment for training label identification model
CN115546506A (en) Image identification method and system based on double-pooling channel attention and cavity convolution
Akopyan et al. Text recognition on images from social media
CN114677695A (en) Table analysis method and device, computer equipment and storage medium
CN113537187A (en) Text recognition method and device, electronic equipment and readable storage medium
CN113780116A (en) Invoice classification method and device, computer equipment and storage medium
CN114579796B (en) Machine reading understanding method and device
CN112785601B (en) Image segmentation method, system, medium and electronic terminal
CN113554549B (en) Text image generation method, device, computer equipment and storage medium
CN113657145B (en) Fingerprint retrieval method based on sweat pore characteristics and neural network
CN114495134A (en) Text detection method, system and computer storage medium based on segmentation and regression
CN114241470A (en) Natural scene character detection method based on attention mechanism
CN113516148A (en) Image processing method, device and equipment based on artificial intelligence and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Qiu Tairu

Inventor after: Yao Xufeng

Inventor after: Shen Xiaoyong

Inventor after: Lv Jiangbo

Inventor before: Qiu Tairu

Inventor before: Yao Xufeng

Inventor before: Jia Jiaya

Inventor before: Shen Xiaoyong

Inventor before: Lv Jiangbo

GR01 Patent grant