CN117727053A - Multi-category Chinese character single sample font identification method - Google Patents


Info

Publication number
CN117727053A
CN117727053A (application number CN202410176517.XA)
Authority
CN
China
Prior art keywords
chinese character
font
sample
fonts
feature extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410176517.XA
Other languages
Chinese (zh)
Other versions
CN117727053B (en)
Inventor
闫飞
张华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest University of Science and Technology
Original Assignee
Southwest University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest University of Science and Technology filed Critical Southwest University of Science and Technology
Priority to CN202410176517.XA (granted as CN117727053B)
Publication of CN117727053A
Application granted
Publication of CN117727053B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Character Discrimination (AREA)

Abstract

The invention discloses a multi-category Chinese character single-sample font identification method, belonging to the technical field of font identification, which comprises the following steps: S1, constructing a Chinese character font data set and processing it into a corresponding font image sample set; S2, constructing a Chinese character font recognition model and training it with the font image sample set, the model adopting a twin metric network structure, introducing a coordinate attention module and using a multiple loss function; and S3, performing font recognition on the font to be recognized with the trained and tested Chinese character font recognition model. The method solves three problems of Chinese character font recognition models: extending recognition to new fonts, extracting effective features of similar fonts, and supporting handwritten fonts.

Description

Multi-category Chinese character single sample font identification method
Technical Field
The invention belongs to the technical field of font identification, and particularly relates to a multi-category Chinese character single sample font identification method.
Background
With the rapid development of digital technology, the maturation of the internet, digital media and video media has made visual elements account for a large share of information carriers, and text, as an important form of presentation, continues to evolve into diverse forms. At the same time, font copyright protection in China is becoming stricter, creating an urgent demand for automatic font identification.
The Chinese character set is huge, with complex components and strokes and numerous variations. Under the current national standard for Chinese fonts, one font set contains at least 6763 commonly used characters. Different Chinese characters contain a large number of structurally different components, and even different variants of the same component or stroke. Chinese fonts are therefore highly complex and demand considerable perceptual experience: designers usually need extensive practice to develop an accurate sense of fonts. Manual font identification involves a heavy workload and a low recognition rate, so an automatic font identification technique is urgently needed to replace it.
As an important component of text attributes, font style characterization and recognition are of great significance for font retrieval, font recommendation, font replacement, character recognition, font copyright protection, document analysis, document restoration and similar applications.
Existing approaches to font recognition fall broadly into two types: font recognition based on hand-designed features and font recognition based on deep learning. In the former, a feature extractor is designed manually from human experience to extract features of each character, and fonts are classified accordingly; however, manual feature design is prone to losing useful information, and designing a highly accurate feature descriptor requires careful engineering and extensive domain expertise. In the latter, a deep neural network automatically extracts font features and classifies fonts with them; however, deep convolutional models achieve good recognition only for the font classes they were trained on and cannot recognize new, untrained fonts. To extend recognition to a new font, the new font samples must be added to the original training data and the model retrained, which cannot keep up with the rapidly growing number of Chinese fonts.
Disclosure of Invention
In view of the above shortcomings of the prior art, the multi-category Chinese character single-sample font identification method provided by the invention solves three problems in conventional Chinese character font style recognition tasks: the inconvenience of extending recognition to new fonts, the difficulty of extracting effective features of similar fonts, and the incompatibility with handwritten fonts.
In order to achieve the aim of the invention, the invention adopts the following technical scheme: a multi-category Chinese character single sample font identification method comprises the following steps:
s1, constructing a Chinese character font data set and processing the Chinese character font data set into a corresponding font image sample set;
the Chinese character font data set comprises a data subset for performing font recognition training and testing on the Chinese character font recognition model and a data subset for performing single sample font recognition testing on the Chinese character font recognition model;
s2, constructing a Chinese character font recognition model, and training the Chinese character font recognition model by using a font image sample set;
the Chinese character font recognition model adopts a twin measurement network structure, introduces a coordinate attention module and adopts a multiple loss function;
and S3, performing font recognition on the font to be recognized with the trained and tested Chinese character font recognition model.
Further, in step S1, the data subset XIKE-CFS-1, used for font recognition training and testing of the Chinese character font recognition model, contains two similar fonts for each standard printing font category, together with soft-pen handwriting fonts and hard-pen handwriting fonts;
the Chinese character fonts in the data subset XIKE-CFS-2, used for the single-sample recognition test of the Chinese character font recognition model, include standard printing fonts, soft-pen handwriting fonts and hard-pen handwriting fonts.
Further, in step S2, the Chinese character font recognition model comprises two parallel feature extraction sub-networks followed, in sequence, by a feature metric sub-network, a third fully connected layer, a fourth fully connected layer and an output layer;
two feature extraction sub-networks form a twin measurement network structure of the Chinese character font recognition model, and a coordinate attention module is introduced into the feature extraction sub-networks;
and the feature measurement sub-network measures Euclidean distance between the output feature vectors of the two feature extraction sub-networks through a multiple loss function, and the Euclidean distance is used as a difference measurement between two input font samples.
Further, the feature extraction sub-network comprises a first feature extraction module, a coordinate attention module, a second feature extraction module, a third feature extraction module, a fourth feature extraction module, a global average pooling layer, a first full-connection layer and a second full-connection layer which are sequentially connected;
the first feature extraction module, the second feature extraction module, the third feature extraction module and the fourth feature extraction module comprise a convolution layer, a maximum pooling layer, a ReLU activation function, a batch normalization layer and a Dropout layer which are sequentially connected.
Further, the coordinate attention module comprises two parallel one-dimensional global average pooling layers, a multi-channel one-dimensional vector splicing layer, a convolution compression layer, an information coding layer, two parallel channel number adjusting layers, two parallel Sigmoid functions and a characteristic weighting layer which are sequentially connected, and the characteristic weighting layer is also connected with the input of the coordinate attention module.
Further, the multiple loss functions include a feature extraction loss function and a similarity discrimination loss function.
Further, the feature extraction loss function $L_1$ is:

$$L_1 = -\frac{1}{N}\sum_{n=1}^{N}\sum_{k=1}^{K}\left( y_{n,1,k}\log p_{n,1,k} + y_{n,2,k}\log p_{n,2,k} \right)$$

where $N$ is the number of sample pairs, $K$ is the number of categories, $y_{n,1,k}$ is the class label of sample 1 in the $n$-th sample pair, $p_{n,1,k}$ is the probability that sample 1 in the $n$-th sample pair is predicted by the feature extraction sub-network to belong to category $k$, $y_{n,2,k}$ is the class label of sample 2 in the $n$-th sample pair, and $p_{n,2,k}$ is the probability that sample 2 in the $n$-th sample pair is predicted by the feature extraction sub-network to belong to category $k$.
Further, the similarity discrimination loss function $L_2$ is:

$$L_2 = \frac{1}{N}\sum_{n=1}^{N}\left[\, y_n D_n^2 + (1 - y_n)\max(m - D_n,\, 0)^2 \,\right]$$

where $y_n$ indicates whether the sample pair belongs to the same class, $D_n$ is the Euclidean distance between the dimension-reduced feature vectors $G_W(X_1)$ and $G_W(X_2)$ of the sample pair, and $m$ is the distance threshold.
Further, the multiple loss function $L$ is:

$$L = L_2 + \lambda L_1$$

where $\lambda$ is a parameter controlling the weight of the feature extraction loss function in the multiple loss function. The beneficial effects of the invention are as follows:
(1) The Chinese character font recognition method provided by the invention solves three problems of Chinese character font recognition models: extending recognition to new fonts, extracting effective features of similar fonts, and supporting handwritten fonts.
(2) The Chinese character font recognition model SMFNet adopts a twin metric network structure; the coordinate attention module strengthens the model's extraction of the frame structure features of Chinese character fonts, and the multiple loss function improves the generalization of the model when extracting features of Chinese character font samples.
(3) The Chinese character font recognition model SMFNet provided by the invention obtains higher recognition accuracy in the Chinese character font data set, realizes single sample recognition of the Chinese character font style with high accuracy, and has higher recognition accuracy compared with the existing recognition model.
(4) The Chinese character font recognition model SMFNet provided by the invention supports single sample font recognition and has better new font recognition expansion capability.
Drawings
FIG. 1 is a flow chart of a method for recognizing single sample fonts of multi-category Chinese characters according to an embodiment of the present invention.
Fig. 2 is a block diagram of a Chinese character font recognition model according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of key region features between different fonts according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of differences in frame structure characteristics between different fonts according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of a coordinate attention module structure according to an embodiment of the present invention.
Fig. 6 is a schematic diagram showing the influence of different attention modules on model recognition accuracy according to an embodiment of the present invention.
Fig. 7 is a schematic diagram showing an influence of a position of a coordinate attention module after a convolution layer on model recognition accuracy according to an embodiment of the present invention.
Fig. 8 is a schematic diagram showing the influence of the number of coordinate attention module insertion on the model identification accuracy according to the embodiment of the present invention.
FIG. 9 is a schematic diagram of the influence of different values of the loss-weight parameter λ on the recognition accuracy of model A according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention is provided to help those skilled in the art understand the invention, but it should be understood that the invention is not limited to the scope of these embodiments; for those skilled in the art, any invention that makes use of the inventive concept falls within the protection of the spirit and scope of the invention as defined by the appended claims.
Example 1:
the embodiment of the invention provides a multi-category Chinese character single sample font identification method, as shown in figure 1, comprising the following steps:
s1, constructing a Chinese character font data set and processing the Chinese character font data set into a corresponding font image sample set;
the Chinese character font data set comprises a data subset for performing font recognition training and testing on the Chinese character font recognition model and a data subset for performing single sample font recognition testing on the Chinese character font recognition model;
s2, constructing a Chinese character font recognition model, and training the Chinese character font recognition model by using a font image sample set;
the Chinese character font recognition model adopts a twin measurement network structure, introduces a coordinate attention module and adopts a multiple loss function;
and S3, performing font recognition on the font to be recognized with the trained and tested Chinese character font recognition model.
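As an illustration of step S3, single-sample recognition can be carried out by embedding the sample to be recognized and one reference sample of every candidate font with the trained feature extraction sub-network and selecting the font whose reference lies at the smallest Euclidean distance. The sketch below is a minimal, hedged example of this procedure; the `encoder` module, the reference images and the font name list are assumptions for illustration and are not elements recited by the patent.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def recognize_one_shot(encoder, query_img, reference_imgs, font_names):
    """Single-sample recognition sketch: embed the query and one reference image
    per candidate font, then return the font with the smallest Euclidean distance."""
    encoder.eval()
    q = encoder(query_img.unsqueeze(0))               # (1, d) embedding of the query
    r = encoder(torch.stack(reference_imgs))          # (num_fonts, d) reference embeddings
    dists = F.pairwise_distance(q.expand_as(r), r)    # one distance per candidate font
    return font_names[int(torch.argmin(dists))]
```

Because only one reference image per font is needed, a new font can be added by supplying a single sample, without retraining, which is the extension capability the method aims at.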
In step S1 of the embodiment of the present invention, in order to give the Chinese character font recognition model the ability to recognize standard printing fonts, soft-pen handwriting fonts and hard-pen handwriting fonts, and to further optimize the model's extraction of similar font features, a Chinese character font data set XIKE-CFS is constructed, comprising a data subset XIKE-CFS-1 for font recognition training and testing of the Chinese character font recognition model and a data subset XIKE-CFS-2 for single-sample recognition testing of the Chinese character font recognition model.
In this embodiment, the chinese fonts in the data subset XIKE-CFS-1 include two similar fonts in each font class in the standard print font, a soft-pen handwriting font, and a hard-pen handwriting font; the chinese fonts in the data subset XIKE-CFS-2 include standard print fonts, soft-pen handwriting fonts, and hard-pen handwriting fonts.
In this embodiment, the Chinese character font data set XIKE-CFS contains 40 font classes with 1000 samples per class, 40000 samples in total; details are given in Table 1.
The data subset XIKE-CFS-1 comprises 7 pairs of similar standard printing fonts, 6 soft-pen handwriting fonts and 6 hard-pen handwriting fonts, 26 fonts in total. When building XIKE-CFS-1, three further common fonts (clerical script, YouYuan and Huawen XinWei) were added to the four standard printing fonts commonly used in China (regular script, Song, bold and imitation Song). To improve the model's ability to extract similar-font features, these 7 fonts were taken as base categories and 2 similar fonts were selected for each (bold, for example), giving 14 fonts that together form the standard printing font part; and to enable the font recognition model to recognize soft-pen and hard-pen handwriting fonts as well as standard printing fonts, 6 soft-pen handwriting fonts and 6 hard-pen handwriting fonts were added.
TABLE 1 XIKE-CFS Chinese font style dataset
In the embodiment of the invention, when processing the Chinese character font data set, since the Chinese character font recognition model is mainly intended to recognize the fonts of everyday Chinese characters, the XIKE-CFS data set randomly draws 1000 characters from the 3755 commonly used characters of the first-level character library and renders a corresponding font image sample for each according to the font file. In this embodiment the font sample images in XIKE-CFS are 100×100 and stored in JPG format. Because differences among Chinese fonts are mainly reflected in glyph shape features, the sample images are grayscale, with a black background and white characters; this choice of background and character colours is determined by the padding used during convolution in the subsequent feature extraction sub-network.
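For illustration only, the following sketch shows one way such 100×100 grayscale samples (black background, white glyph) could be rendered from a font file with Pillow. The font path, the character and the centering logic are placeholders and assumptions, not details taken from the patent.

```python
# Illustrative sketch only: render one Chinese character as a 100x100 grayscale
# sample with a black background and a white glyph, saved as JPG.
# The font file path and the character are hypothetical placeholders.
from PIL import Image, ImageDraw, ImageFont

def render_sample(char: str, font_path: str, size: int = 100) -> Image.Image:
    img = Image.new("L", (size, size), color=0)            # grayscale, black background
    draw = ImageDraw.Draw(img)
    font = ImageFont.truetype(font_path, int(size * 0.8))  # glyph height is an assumption
    left, top, right, bottom = draw.textbbox((0, 0), char, font=font)
    x = (size - (right - left)) / 2 - left                  # center the glyph horizontally
    y = (size - (bottom - top)) / 2 - top                   # center the glyph vertically
    draw.text((x, y), char, fill=255, font=font)            # white character
    return img

if __name__ == "__main__":
    render_sample("永", "SimSun.ttf", 100).save("sample.jpg", "JPEG")
```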
In step S2 of the embodiment of the present invention, as shown in fig. 2, the Chinese character font recognition model comprises two parallel feature extraction sub-networks followed, in sequence, by a feature metric sub-network, a third fully connected layer, a fourth fully connected layer and an output layer;
two feature extraction sub-networks form a twin measurement network structure of the Chinese character font recognition model, and a coordinate attention module is introduced into the feature extraction sub-networks;
and the feature measurement sub-network measures Euclidean distance between the output feature vectors of the two feature extraction sub-networks through a multiple loss function, and the Euclidean distance is used as a difference measurement between two input font samples.
Specifically, in this embodiment a twin metric network structure is selected as the basic framework of the Chinese character font recognition model: the two feature extraction sub-networks form the twin metric structure, have identical architectures and share weights. During training, a sample pair X1, X2 is fed into the two feature extraction sub-networks to obtain feature vectors GW(X1) and GW(X2), and the distance EW between the two feature vectors measures the similarity of the two samples and determines whether they belong to the same class. The twin metric network can compute the degree of difference between two Chinese character samples and supports single-sample learning, which facilitates extending the font recognition model to the recognition of new fonts.
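A minimal PyTorch sketch of this twin metric arrangement is given below: a single shared-weight encoder is applied to both samples of a pair, and the Euclidean distance between the two embeddings serves as the difference measure EW. The `encoder` argument stands for any feature extraction sub-network (one possible sketch appears further below); this wrapper is an illustrative assumption, not the patented implementation.

```python
import torch.nn as nn
import torch.nn.functional as F

class SiameseMetric(nn.Module):
    """Twin metric wrapper: the same encoder (shared weights) embeds both samples,
    and the Euclidean distance between the embeddings is the difference measure."""
    def __init__(self, encoder: nn.Module):
        super().__init__()
        self.encoder = encoder                 # reused for both inputs -> shared weights

    def forward(self, x1, x2):
        g1 = self.encoder(x1)                  # G_W(X1)
        g2 = self.encoder(x2)                  # G_W(X2)
        e_w = F.pairwise_distance(g1, g2)      # Euclidean distance E_W per pair
        return g1, g2, e_w
```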
In this embodiment, the feature extraction sub-network includes a first feature extraction module, a coordinate attention module, a second feature extraction module, a third feature extraction module, a fourth feature extraction module, a global average pooling layer, a first fully connected layer and a second fully connected layer, which are sequentially connected;
the first feature extraction module, the second feature extraction module, the third feature extraction module and the fourth feature extraction module comprise a convolution layer, a maximum pooling layer, a ReLU activation function, a batch normalization layer and a Dropout layer which are sequentially connected.
Specifically, in table 2, the first feature extraction module includes Conv1 and maxpool+relu+bn+dropout; the second feature extraction module comprises Conv2 and Maxpool+Relu+BN+Dropout; the third feature extraction module comprises Conv3 and Maxpool+Relu+BN+Dropout; the fourth feature extraction module comprises Conv4 and Maxpool+Relu+BN+Dropout; GAP is global average pooling layer; FC1 and FC2 are a first fully connected layer and a second fully connected layer, respectively.
TABLE 2 feature extraction sub-network architecture
Specifically, in this embodiment the two feature extraction sub-networks extract the font features: two Chinese character sample images of resolution 100×100 are fed into the two sub-networks respectively. After entering a sub-network, an image is first zero-padded to preserve the size of the feature output. After padding, the image passes through a convolution layer and then a max pooling layer; a ReLU activation function, batch normalization and Dropout are used in this process to increase the nonlinearity of the model, accelerate learning, suppress overfitting and improve stability. This structure is stacked four times, with slightly different channel numbers, numbers of convolution kernels, kernel sizes and strides each time. After the four convolutions the font image sample yields a 256-channel feature tensor; global average pooling turns it into a feature vector, and two fully connected layers further extract features to obtain the final font feature vector. The two font sample images thus yield two feature vectors of 26 values each; the Euclidean distance between the two vectors is computed and used as the measure of the difference between the two fonts.
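The sketch below illustrates a feature extraction sub-network of this general shape in PyTorch: four Conv, MaxPool, ReLU, BN, Dropout blocks, a slot for the coordinate attention module after the first block, global average pooling, and two fully connected layers ending in a 26-value font feature vector. The channel counts, kernel sizes and dropout rate are illustrative assumptions; the actual values are those of Table 2, which is not reproduced here.

```python
import torch
import torch.nn as nn

def conv_block(c_in: int, c_out: int, p_drop: float = 0.1) -> nn.Sequential:
    # One feature extraction module: Conv -> MaxPool -> ReLU -> BatchNorm -> Dropout.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),   # zero padding preserves size
        nn.MaxPool2d(2),
        nn.ReLU(inplace=True),
        nn.BatchNorm2d(c_out),
        nn.Dropout2d(p_drop),
    )

class FeatureExtractor(nn.Module):
    """Illustrative sub-network; channel counts and dropout rate are assumptions."""
    def __init__(self, n_features: int = 26, attention=None):
        super().__init__()
        self.block1 = conv_block(1, 32)
        self.attention = attention if attention is not None else nn.Identity()  # CA slot
        self.block2 = conv_block(32, 64)
        self.block3 = conv_block(64, 128)
        self.block4 = conv_block(128, 256)     # 256-channel feature tensor, as described
        self.gap = nn.AdaptiveAvgPool2d(1)     # global average pooling
        self.fc1 = nn.Linear(256, 128)
        self.fc2 = nn.Linear(128, n_features)  # final 26-value font feature vector

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.attention(self.block1(x))
        x = self.block4(self.block3(self.block2(x)))
        x = self.gap(x).flatten(1)
        return self.fc2(torch.relu(self.fc1(x)))
```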
In the embodiment of the invention, a coordinate attention module is introduced into the feature extraction sub-network. The coordinate attention mechanism avoids the loss of position information caused by two-dimensional global pooling: it decomposes channel attention into two parallel one-dimensional feature encoding processes and integrates spatial coordinate information into the generated attention maps, so that a lightweight network can learn long-range (macroscopic) dependencies without a large computational overhead. Because Chinese characters are pictographic, the positions of the components that make up a character and the way the components are arranged, i.e. the frame structure, are important to the feature differences between Chinese fonts, in addition to local key regions such as stroke start points, end points, turning points and crossing points. The key region features and the frame structure differences between different fonts are shown in figures 3-4. In this embodiment, the coordinate attention module strengthens the sub-network's learning of the long-range dependencies among the components of a Chinese character and thereby enhances the model's ability to extract the frame structure features of Chinese fonts.
In this embodiment, as shown in fig. 5, the coordinate attention module includes two parallel one-dimensional global average pooling layers, a multi-channel one-dimensional vector splicing layer, a convolution compression layer, an information encoding layer, two parallel channel number adjusting layers, two parallel Sigmoid functions, and a feature weighting layer that are sequentially connected, where the feature weighting layer is further connected to an input of the coordinate attention module. The convolution compression layer is a 1×1 convolution layer, the channel number adjustment layer is a 1×1 convolution layer, and the information coding layer comprises a BN layer and a Nonlinear layer Nonlinear.
In this embodiment, based on the coordinate attention module shown in fig. 5, for the input Chinese character font feature tensor the two one-dimensional global average pooling layers X Avg Pool (global average pooling along the X direction) and Y Avg Pool (global average pooling along the Y direction) aggregate the input features along the vertical and horizontal directions into two separate direction-aware feature maps, each capturing the dependency of the input feature map along one spatial direction, so that position information is preserved in the generated attention maps. The two multi-channel one-dimensional vectors are spliced by the concatenation layer (Concat) and the number of channels is compressed by a 1×1 convolution layer (Conv2d). The spatial information in the vertical and horizontal directions is then encoded by the BN layer and the nonlinear layer. The result is split back into vectors for the two directions, the number of channels of each is adjusted by a 1×1 convolution layer (Conv2d), and after the Sigmoid functions the two attention maps are multiplied with the original input Chinese character font feature tensor by the feature weighting layer (weight) and output, which strengthens the spatial position representation of the features and achieves an effective expression of the frame structure features of Chinese fonts.
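A hedged PyTorch sketch of a coordinate attention module following this data flow is shown below (directional average pooling, concatenation, 1×1 compression, BN plus non-linearity, splitting, 1×1 channel adjustment, Sigmoid gates, and re-weighting of the input). The reduction ratio and the choice of ReLU as the non-linearity are assumptions; the patent does not specify them.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Sketch of the coordinate attention data flow described above; the reduction
    ratio and the ReLU non-linearity are assumptions."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # average along the width (X direction)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # average along the height (Y direction)
        self.compress = nn.Conv2d(channels, mid, kernel_size=1)   # 1x1 channel compression
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)                          # encoding non-linearity
        self.expand_h = nn.Conv2d(mid, channels, kernel_size=1)   # channel-number adjustment
        self.expand_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        x_h = self.pool_h(x)                           # (n, c, h, 1): per-row statistics
        x_w = self.pool_w(x).permute(0, 1, 3, 2)       # (n, c, w, 1): per-column statistics
        y = torch.cat([x_h, x_w], dim=2)               # splice the two direction-aware maps
        y = self.act(self.bn(self.compress(y)))        # compress channels, then encode
        y_h, y_w = torch.split(y, [h, w], dim=2)       # re-divide into the two directions
        a_h = torch.sigmoid(self.expand_h(y_h))                        # (n, c, h, 1) gate
        a_w = torch.sigmoid(self.expand_w(y_w.permute(0, 1, 3, 2)))    # (n, c, 1, w) gate
        return x * a_h * a_w                           # re-weight the original input tensor
```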
In the embodiment of the invention, multiple loss functions of the Chinese character font recognition model comprise a feature extraction loss function and a similarity discrimination loss function; the multiple loss function can strengthen the capability of the model to extract font style features and difference features, so that the accuracy of model font identification is improved, and compared with a similar model adopting a single loss function, the multiple loss function can enable the model to better avoid overfitting.
In the embodiment, the feature extraction loss function adopts a cross-entropy loss to strengthen the font style feature extraction ability of the Chinese character font recognition model, so that the dimension-reduced feature vector of a sample better reflects the style of the sample font and the embedding space is optimized. In this embodiment, the feature extraction loss function $L_1$ is:

$$L_1 = -\frac{1}{N}\sum_{n=1}^{N}\sum_{k=1}^{K}\left( y_{n,1,k}\log p_{n,1,k} + y_{n,2,k}\log p_{n,2,k} \right)$$

where $N$ is the number of sample pairs, $K$ is the number of categories, $y_{n,1,k}$ is the class label of sample 1 in the $n$-th sample pair, $p_{n,1,k}$ is the probability that sample 1 in the $n$-th pair is predicted by the feature extraction sub-network to belong to category $k$, $y_{n,2,k}$ is the class label of sample 2 in the $n$-th pair, and $p_{n,2,k}$ is the probability that sample 2 in the $n$-th pair is predicted to belong to category $k$; $y_{n,1,k}$ and $y_{n,2,k}$ take the value 1 when the sample belongs to category $k$ and 0 otherwise.
In this embodiment, the similarity discrimination loss function adopts a contrastive loss to strengthen the learning of font difference features and to facilitate the subsequent similarity calculation for sample pairs; this loss function effectively handles the similarity relation of paired data samples in a twin network. Similarity is judged by computing the distance between the dimension-reduced feature vectors of a Chinese character font sample pair, and the similarity discrimination loss function expresses the degree of matching of the samples well. In this embodiment, the similarity discrimination loss function $L_2$ is:

$$L_2 = \frac{1}{N}\sum_{n=1}^{N}\left[\, y_n D_n^2 + (1 - y_n)\max(m - D_n,\, 0)^2 \,\right]$$

where $y_n$ indicates whether the $n$-th sample pair belongs to the same class, $D_n$ is the Euclidean distance between the dimension-reduced feature vectors $G_W(X_1)$ and $G_W(X_2)$ of the pair, and $m$ is the distance threshold. When the sample pair belongs to the same category $y_n = 1$; when it does not, $y_n = 0$. The distance threshold $m$ restricts the penalized Euclidean distance of samples from different classes to within $m$, so that overly easy negative pairs generate no loss and the learning efficiency on difficult samples is improved. The Euclidean distance $D_n$ is:

$$D_n = \sqrt{\sum_{i=1}^{d}\left( G_W(X_1)_i - G_W(X_2)_i \right)^2}$$

where $d$ is the feature dimension of the samples, $G_W(X_1)_i$ is the $i$-th value of the feature vector of one sample in the pair, and $G_W(X_2)_i$ is the corresponding value of the feature vector of the other sample.
In this embodiment, the multiple loss function $L$ is obtained from the feature extraction loss function and the similarity discrimination loss function as:

$$L = L_2 + \lambda L_1$$

where $\lambda$ is a parameter controlling the weight of the feature extraction loss function in the multiple loss function.
In this embodiment $\lambda$ defaults to 1. When training the Chinese character font recognition model, the feature extraction loss function and the similarity discrimination loss function are computed simultaneously, and the parameters are updated by back-propagation so that the recognition accuracy of the font recognition model gradually improves.
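The following sketch combines the two losses as described: a cross-entropy term over both samples of each pair (the feature extraction loss) and a contrastive term on the Euclidean distance of the pair's feature vectors (the similarity discrimination loss), summed with weight λ on the cross-entropy term. The per-batch averaging, the margin value and the assumption that the 26-way fully connected output provides the classification logits are illustrative choices, not details fixed by the patent.

```python
import torch
import torch.nn.functional as F

def multiple_loss(logits1, logits2, labels1, labels2, g1, g2, same_class,
                  margin: float = 1.0, lam: float = 1.0):
    """L = L2 (contrastive, similarity discrimination) + lam * L1 (cross-entropy,
    feature extraction). `logits*` are assumed classification outputs per sample,
    `g1`/`g2` the reduced feature vectors, `same_class` 1.0 for same-font pairs."""
    # Feature extraction loss L1: cross-entropy over both samples of each pair.
    l_feat = F.cross_entropy(logits1, labels1) + F.cross_entropy(logits2, labels2)
    # Similarity discrimination loss L2: contrastive loss on the Euclidean distance.
    d = F.pairwise_distance(g1, g2)
    l_sim = torch.mean(same_class * d.pow(2)
                       + (1.0 - same_class) * torch.clamp(margin - d, min=0.0).pow(2))
    return l_sim + lam * l_feat
```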
Example 2:
the embodiment of the invention provides an experimental and analytical example of the font identification method in the embodiment 1:
In this embodiment, the SMFNet Chinese character font recognition model is trained and tested with the XIKE-CFS-1 data set, with a training/testing split of 8:2, i.e. 800 randomly selected samples of each font are used for training and the remaining 200 for testing. To improve the generalization of the model, the training samples are randomly rotated by 0, 90, 180 or 270 degrees and then randomly flipped horizontally. During training the number of epochs is set to 500, the batch size to 32, and an Adam optimizer is used with a learning rate of 0.0005. Single-sample testing is performed with the XIKE-CFS-2 data set, which contains 14 Chinese fonts that do not appear in the training set; all 14000 of its samples are used for single-sample testing. In addition, to avoid chance results from a single experiment and to evaluate model performance objectively, every reported accuracy is the average over 100 repetitions of the test on the test set.
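A minimal sketch of this training configuration is shown below: random rotation among 0/90/180/270 degrees followed by a random horizontal flip, and an Adam optimizer with learning rate 0.0005, batch size 32 and 500 epochs. The placeholder model and the omission of pair sampling are assumptions made only so that the snippet runs standalone.

```python
import random
import torch
import torch.nn as nn
import torchvision.transforms as T
import torchvision.transforms.functional as TF

# Augmentation described above: random rotation among {0, 90, 180, 270} degrees,
# followed by a random horizontal flip, applied to the 100x100 grayscale samples.
augment = T.Compose([
    T.Lambda(lambda img: TF.rotate(img, random.choice([0, 90, 180, 270]))),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])

# Optimizer settings described above (Adam, lr = 0.0005, batch size 32, 500 epochs).
# `model` is only a placeholder so the snippet is self-contained; in practice it
# would be the siamese font recognition network.
model = nn.Linear(26, 26)
optimizer = torch.optim.Adam(model.parameters(), lr=0.0005)
BATCH_SIZE, EPOCHS = 32, 500
```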
1. Loss function comparison experiment:
To evaluate the influence of the multiple loss function on recognition accuracy, the recognition accuracy of the model trained with the contrastive loss alone is compared with that of the model trained with the multiple loss function (λ at its default value of 1), with no attention module inserted in the SMFNet Chinese character font recognition model; the results are shown in Table 3. As Table 3 shows, the multiple loss function clearly improves recognition accuracy: single-sample recognition accuracy improves by 1.34% and training-sample recognition accuracy by 0.73%, and the change of the loss value with epoch during training shows that the multiple-loss model converges faster.
TABLE 3 influence of different loss functions on model identification accuracy
2. Attention module comparative experiment:
To evaluate the influence of the coordinate attention module CA and the common lightweight attention modules SE and CBAM on the recognition performance of the SMFNet Chinese character font recognition model, each of the three attention modules is inserted at the BN of the fourth convolution layer of SMFNet, with the multiple loss function used and λ set to its default value of 1; the results are shown in fig. 6. The recognition accuracies of the three attention mechanisms on training samples are close: SE is highest at 98.20%, CBAM differs from SE by 0.05%, and CA differs from SE by 0.12%. For fonts the model has not been trained on, however, the single-sample recognition accuracy differs markedly: CA is highest at 87.32%, while SE and CBAM are close to each other but about 5 percentage points below CA. These results show that, for the SMFNet model, SE and CBAM perform similarly, whereas the coordinate attention module CA used in this embodiment has a clear advantage in single-sample recognition of untrained fonts.
3. Coordinate attention module CA insert position comparison experiment:
The feature extraction network in the SMFNet Chinese character font recognition model contains 4 convolution layers, each followed in turn by max pooling, ReLU and BN. To clarify the influence of the insertion position of the coordinate attention module on font recognition performance, the coordinate attention module is inserted at different positions after the first convolution layer of the SMFNet model and the change in recognition accuracy is compared. The model uses the multiple loss function with λ set to its default value of 1; the results are shown in fig. 7. The results show that when the coordinate attention module is inserted before the BN, the recognition accuracy on trained fonts is highest at 98.14%, but the single-sample recognition accuracy on untrained fonts is only 89.82%. When the module is inserted before the ReLU, the single-sample accuracy on untrained fonts is highest at 90.53%, nearly 1 percentage point higher than the other two positions, while the accuracy on trained fonts is 97.84%, only 0.3% below the before-BN model and thus quite close. Inserting the coordinate attention module before the ReLU therefore gives the model stronger generalization and facilitates extending SMFNet to the recognition of new fonts, and the insertion position of the coordinate attention module in SMFNet is determined to be before the ReLU.
4. Attention module insertion number comparison experiment:
To optimize the number of coordinate attention modules used in the SMFNet Chinese character font recognition model, the multiple loss function is used with λ set to its default value of 1, the coordinate attention module is inserted before the ReLU, and insertion-count comparison experiments are carried out. According to the structure of the 4 convolution layers, five models with inserted coordinate attention modules are compared. The number and location of the inserted modules are as follows: model A inserts a CA module at the 1st convolution layer; model B at the 4th convolution layer; model C at the 1st and 2nd convolution layers; model D at the 3rd and 4th convolution layers; and model E at the 1st, 2nd, 3rd and 4th convolution layers. As shown in fig. 8, model A has the best recognition performance: its single-sample recognition accuracy on untrained fonts is highest at 90.53%, nearly 3 percentage points above models B, D and E, while its accuracy on trained fonts is 97.84%, below the 98.11% of models D and E but only by 0.27%, which is very close. Models D and E reach the highest test accuracy of 98.11% on trained fonts, but their single-sample recognition rates on untrained fonts are lower, at 87.20% and 87.52%. Moreover, the accuracy changes across the models in fig. 8 show that as the number of coordinate attention modules increases, accuracy on trained fonts rises slightly while single-sample accuracy on untrained fonts drops rapidly, reflecting that too many coordinate attention modules reduce the generalization ability of the model. For the font recognition task, SMFNet therefore works better with a single coordinate attention module inserted at the first convolution layer, i.e. model A.
5. Multiple loss function optimization experiments:
To adjust the weight of the feature extraction loss function in the multiple loss function and achieve the best recognition performance of model A, λ is varied from 0.5 to 6.0 in steps of 0.5 and the recognition accuracy of model A is compared for the different values; the results are shown in fig. 9. With the multiple loss function, when λ = 4.5 the single-sample recognition accuracy of the model reaches its maximum of 92.84%, with 97.50% accuracy on trained samples; when λ = 0.5 the single-sample accuracy is 90.46% and the accuracy on trained samples reaches its maximum of 98.06%. Comparing the two, the single-sample accuracies differ by 2.38%, a relatively large gap, while the trained-sample accuracies differ by only 0.56%, which is quite close; the model therefore performs best at λ = 4.5. Fig. 9 also shows that as λ increases, the accuracy on trained data declines slowly while the single-sample accuracy generally rises and then falls. Thus λ controls the weight of the feature extraction loss function in the multiple loss function and adjusts the balance between the model's abilities to extract font style features and font difference features; a suitable value gives better generalization of font style and helps improve the single-sample recognition ability of the model.
6. Ablation experiment:
Based on the above analysis, the optimal parameters of the SMFNet Chinese character font recognition model can be determined: λ in the multiple loss function is 4.5, one coordinate attention module is used, and it is inserted before the ReLU of the 1st convolution layer. To further analyse the contributions of the multiple loss function and the coordinate attention module to model recognition, ablation experiments were carried out. SMFNet with only the contrastive loss and no attention module serves as the reference model and is compared with the models to which the multiple loss function and the coordinate attention module are added; the results are shown in Table 4. They show that using the multiple loss function alone improves the accuracy on training samples by 0.73% and the single-sample accuracy by 1.34%; using the coordinate attention module alone improves the training-sample accuracy only slightly and even reduces the single-sample accuracy. When the multiple loss function and the coordinate attention module are combined, model performance improves clearly: the training-sample accuracy rises by 0.49% and the single-sample accuracy by 2.9%. This shows that the multiple loss function and the coordinate attention module together improve the generalization of feature extraction and help the model extend its recognition to new fonts.
Table 4 ablation experiments
7. Comparative experiments with other methods:
To verify the performance of the SMFNet Chinese character font recognition model on other Chinese font recognition tasks, the public NCFS Chinese character font data set is used for training and the results are compared with common classification models. The NCFS data set covers Chinese ancient calligraphers' fonts, standard computer fonts and ancient Chinese texts, 18 fonts in total with nearly 1000 samples per font, randomly split into training and test sets at a ratio of 8:2. The recognition accuracies of the different models on the trained NCFS fonts are shown in Table 5. SwordNet has the best accuracy on trained fonts at 99.03%; SMFNet reaches 98.05%, about 1 percentage point lower, but SMFNet has the fewest parameters of all the models, SwordNet having roughly 19 times as many. SMFNet even has fewer parameters than classical lightweight network models, only about 1/5 of EfficientNet and 2/3 of ShuffleNet, yet it has a clear accuracy advantage over these two lightweight models, about 8 percentage points above EfficientNet's 89.89% and about 7 percentage points above ShuffleNet's 90.71%. In addition, SMFNet is the only one of the 8 models in the comparison that can perform single-sample recognition of untrained fonts, reaching 92.84% single-sample Chinese character font recognition accuracy on the XIKE-CFS-2 data set. The SMFNet model therefore outperforms common classification models in terms of parameter count and single-sample extension capability.
Table 5 results of performance comparisons with other models
The principles and embodiments of the present invention have been described above with reference to specific examples, which are intended only to help understand the method and core idea of the invention; meanwhile, since those skilled in the art may vary the specific embodiments and the scope of application according to the idea of the invention, the contents of this description should not be construed as limiting the invention.
Those of ordinary skill in the art will recognize that the embodiments described herein are intended to help the reader understand the principles of the invention, and that the scope of protection is not limited to these specific statements and embodiments. Those of ordinary skill in the art may make various other modifications and combinations according to the technical teachings disclosed herein without departing from the essence of the invention, and such modifications and combinations remain within the scope of protection of the invention.

Claims (9)

1. A multi-category Chinese character single sample font identification method is characterized by comprising the following steps:
s1, constructing a Chinese character font data set and processing the Chinese character font data set into a corresponding font image sample set;
the Chinese character font data set comprises a data subset for performing font recognition training and testing on the Chinese character font recognition model and a data subset for performing single sample font recognition testing on the Chinese character font recognition model;
s2, constructing a Chinese character font recognition model, and training the Chinese character font recognition model by using a font image sample set;
the Chinese character font recognition model adopts a twin measurement network structure, introduces a coordinate attention module and adopts a multiple loss function;
and S3, carrying out font recognition on the fonts to be recognized by using the training and testing completed Chinese character font recognition model.
2. The method according to claim 1, wherein in step S1, the Chinese character fonts in the data subset XIKE-CFS-1, used for font recognition training and testing of the Chinese character font recognition model, include two similar fonts for each standard printing font category, together with soft-pen handwriting fonts and hard-pen handwriting fonts;
the Chinese character fonts in the data subset XIKE-CFS-2, used for the single-sample recognition test of the Chinese character font recognition model, include standard printing fonts, soft-pen handwriting fonts and hard-pen handwriting fonts.
3. The method according to claim 2, wherein in step S2, the Chinese character font recognition model comprises two parallel feature extraction sub-networks followed, in sequence, by a feature metric sub-network, a third fully connected layer, a fourth fully connected layer and an output layer;
two feature extraction sub-networks form a twin measurement network structure of the Chinese character font recognition model, and a coordinate attention module is introduced into the feature extraction sub-networks;
and the feature measurement sub-network measures Euclidean distance between the output feature vectors of the two feature extraction sub-networks through a multiple loss function, and the Euclidean distance is used as a difference measurement between two input font samples.
4. The method for recognizing single-sample fonts of multi-category Chinese characters according to claim 3, wherein the feature extraction sub-network comprises a first feature extraction module, a coordinate attention module, a second feature extraction module, a third feature extraction module, a fourth feature extraction module, a global average pooling layer, a first full-connection layer and a second full-connection layer which are sequentially connected;
the first feature extraction module, the second feature extraction module, the third feature extraction module and the fourth feature extraction module comprise a convolution layer, a maximum pooling layer, a ReLU activation function, a batch normalization layer and a Dropout layer which are sequentially connected.
5. The method according to claim 4, wherein the coordinate attention module comprises two parallel one-dimensional global averaging pooling layers, a multi-channel one-dimensional vector stitching layer, a convolution compression layer, an information encoding layer, two parallel channel number adjusting layers, two parallel Sigmoid functions and a feature weighting layer which are sequentially connected, and the feature weighting layer is further connected with the input of the coordinate attention module.
6. The method for recognizing single-sample fonts of multi-class Chinese characters according to claim 4, wherein the multiple loss functions comprise feature extraction loss functions and similarity discrimination loss functions.
7. The method for recognizing single-sample fonts of multi-category Chinese characters according to claim 6, wherein the feature extraction loss function $L_1$ is:

$$L_1 = -\frac{1}{N}\sum_{n=1}^{N}\sum_{k=1}^{K}\left( y_{n,1,k}\log p_{n,1,k} + y_{n,2,k}\log p_{n,2,k} \right)$$

where $N$ is the number of sample pairs, $K$ is the number of categories, $y_{n,1,k}$ is the class label of sample 1 in the $n$-th sample pair, $p_{n,1,k}$ is the probability that sample 1 in the $n$-th sample pair is predicted by the feature extraction sub-network to belong to category $k$, $y_{n,2,k}$ is the class label of sample 2 in the $n$-th sample pair, and $p_{n,2,k}$ is the probability that sample 2 in the $n$-th sample pair is predicted by the feature extraction sub-network to belong to category $k$.
8. The method for recognizing single-sample fonts of multi-category Chinese characters according to claim 7, wherein the similarity discrimination loss function $L_2$ is:

$$L_2 = \frac{1}{N}\sum_{n=1}^{N}\left[\, y_n D_n^2 + (1 - y_n)\max(m - D_n,\, 0)^2 \,\right]$$

where $y_n$ indicates whether the sample pair belongs to the same class, $D_n$ is the Euclidean distance between the dimension-reduced feature vectors $G_W(X_1)$ and $G_W(X_2)$ of the sample pair, and $m$ is the distance threshold.
9. The method for recognizing single-sample fonts of multi-category Chinese characters according to claim 8, wherein the multiple loss function $L$ is:

$$L = L_2 + \lambda L_1$$

where $\lambda$ is a parameter controlling the weight of the feature extraction loss function in the multiple loss function.
CN202410176517.XA 2024-02-08 2024-02-08 Multi-category Chinese character single sample font identification method Active CN117727053B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410176517.XA CN117727053B (en) 2024-02-08 2024-02-08 Multi-category Chinese character single sample font identification method


Publications (2)

Publication Number Publication Date
CN117727053A true CN117727053A (en) 2024-03-19
CN117727053B CN117727053B (en) 2024-04-19

Family

ID=90207367

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410176517.XA Active CN117727053B (en) 2024-02-08 2024-02-08 Multi-category Chinese character single sample font identification method

Country Status (1)

Country Link
CN (1) CN117727053B (en)


Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200285916A1 (en) * 2019-03-06 2020-09-10 Adobe Inc. Tag-based font recognition by utilizing an implicit font classification attention neural network
CN112164094A (en) * 2020-09-22 2021-01-01 江南大学 Fast video target tracking method based on twin network
CN113128560A (en) * 2021-03-19 2021-07-16 西安理工大学 Attention module enhancement-based CNN regular script style classification method
CN116109667A (en) * 2021-11-10 2023-05-12 长沙理工大学 Single-target tracking method and system based on twin network
CN113903043A (en) * 2021-12-11 2022-01-07 绵阳职业技术学院 Method for identifying printed Chinese character font based on twin metric model
CN116486419A (en) * 2022-01-14 2023-07-25 中国科学院深圳先进技术研究院 Handwriting word recognition method based on twin convolutional neural network
CN114821390A (en) * 2022-03-17 2022-07-29 齐鲁工业大学 Twin network target tracking method and system based on attention and relationship detection
CN115470827A (en) * 2022-09-23 2022-12-13 山东省人工智能研究院 Antagonistic electrocardiosignal noise reduction method based on self-supervision learning and twin network
CN115937552A (en) * 2022-10-21 2023-04-07 华南理工大学 Image matching method based on fusion of manual features and depth features
CN116012307A (en) * 2022-12-14 2023-04-25 广州科盛隆纸箱包装机械有限公司 Corrugated case printing pattern color difference detection method, system and storage medium based on twin network
CN117218667A (en) * 2023-11-07 2023-12-12 华侨大学 Chinese character recognition method and system based on character roots

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
FEI YAN et al.: "SMFNet: One-Shot Recognition of Chinese Character Font Based on Siamese Metric Model", IEEE Access, vol. 12, 27 February 2024 (2024-02-27), pages 38473-38489
KIAN AHRABIAN et al.: "Usage of autoencoders and Siamese networks for online handwritten signature verification", Neural Computing and Applications, vol. 31, 7 November 2018 (2018-11-07), pages 9321, XP036922997, DOI: 10.1007/s00521-018-3844-z
LI LIU et al.: "Multi-loss Siamese Convolutional Neural Network for Chinese Calligraphy Style Classification", ICONIP 2021: Neural Information Processing, vol. 1517, 2 December 2021 (2021-12-02), page 425
CHENG Wenyan et al.: "Chinese calligraphy font and style classification with a multi-loss fusion network" (in Chinese), Journal of Image and Graphics (中国图象图形学报), vol. 28, no. 08, 16 August 2023 (2023-08-16), pages 2370-2381
JIANG Liubing et al.: "Single-sample learning based on an improved matching network" (in Chinese), 《***工程与电子技术》, vol. 41, no. 06, 22 March 2019 (2019-03-22), pages 1210-1217
YAN Fei et al.: "Research on a convolutional neural network model for printed Chinese character font recognition based on transfer learning" (in Chinese), Digital Printing (数字印刷), no. 02, 10 April 2021 (2021-04-10), pages 36-45

Also Published As

Publication number Publication date
CN117727053B (en) 2024-04-19

Similar Documents

Publication Publication Date Title
CN108288075B (en) A kind of lightweight small target detecting method improving SSD
CN108664996B (en) Ancient character recognition method and system based on deep learning
CN109753950B (en) Dynamic facial expression recognition method
Yuan et al. Chinese text in the wild
CN111652273B (en) Deep learning-based RGB-D image classification method
Zhang et al. Automatic discrimination of text and non-text natural images
Elhassan et al. DFT-MF: Enhanced deepfake detection using mouth movement and transfer learning
CN113139544A (en) Saliency target detection method based on multi-scale feature dynamic fusion
US20240161531A1 (en) Transformer-based multi-scale pedestrian re-identification method
CN104834891A (en) Method and system for filtering Chinese character image type spam
Zhou et al. Attention transfer network for nature image matting
CN113989577B (en) Image classification method and device
CN110188864B (en) Small sample learning method based on distribution representation and distribution measurement
CN115272692A (en) Small sample image classification method and system based on feature pyramid and feature fusion
CN111340032A (en) Character recognition method based on application scene in financial field
CN111553442B (en) Optimization method and system for classifier chain tag sequence
CN113076916A (en) Dynamic facial expression recognition method and system based on geometric feature weighted fusion
CN111242131B (en) Method, storage medium and device for identifying images in intelligent paper reading
CN117727053B (en) Multi-category Chinese character single sample font identification method
CN113435480B (en) Method for improving long tail distribution visual recognition capability through channel sequential switching and self-supervision
CN113903043B (en) Method for identifying printed Chinese character font based on twin metric model
CN112329389B (en) Chinese character stroke automatic extraction method based on semantic segmentation and tabu search
CN112330705B (en) Image binarization method based on deep learning semantic segmentation
CN111754459B (en) Dyeing fake image detection method based on statistical depth characteristics and electronic device
CN111209922B (en) Image color system style marking method, device, equipment and medium based on svm and opencv

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant