CN107220641B - Multi-language text classification method based on deep learning - Google Patents

Multi-language text classification method based on deep learning

Info

Publication number
CN107220641B
Authority
CN
China
Prior art keywords
text
neural network
training
picture
pixels
Prior art date
Legal status
Active
Application number
CN201610169483.7A
Other languages
Chinese (zh)
Other versions
CN107220641A (en)
Inventor
金连文
冯子勇
阳赵阳
孙俊
Current Assignee
South China University of Technology SCUT
Fujitsu Ltd
Original Assignee
South China University of Technology SCUT
Fujitsu Ltd
Priority date
Filing date
Publication date
Application filed by South China University of Technology SCUT, Fujitsu Ltd filed Critical South China University of Technology SCUT
Priority to CN201610169483.7A priority Critical patent/CN107220641B/en
Publication of CN107220641A publication Critical patent/CN107220641A/en
Application granted granted Critical
Publication of CN107220641B publication Critical patent/CN107220641B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multilingual text classification method based on deep learning. The method comprises the following steps: acquiring a multilingual text training image set; performing line segmentation, height normalization and binarization on the text images; increasing the complexity of the training image set to expand the sample space; designing a deep convolutional neural network and training it with the training image set; and cutting the text pictures to be classified, inputting the cut pictures into the designed deep convolutional neural network, averaging the probability distributions learned by the network, and outputting the classification result. By designing a deep convolutional neural network and learning features that discriminate between multilingual texts, the invention enables a computer to classify texts of different languages accurately.

Description

Multi-language text classification method based on deep learning
Technical Field
The invention belongs to the technical field of pattern recognition and artificial intelligence, and particularly relates to a method for classifying multilingual texts.
Background
Written characters break through the temporal and spatial limitations of spoken language, allowing people to pass on knowledge and intellectual wealth through paper documents, perfect education systems, raise the level of learning, develop science and technology, and build a civilized society.
With the rapid development of computer technology, document analysis technology is widely applied in daily life, for example in the storage and retrieval of paper documents. Digital documents have evolved from plain text to documents that mix text with pictures, handwriting with print, and multiple languages.
In real life, a large number of documents mix multiple languages. Characters of different languages in such documents, especially very similar ones such as Chinese and Japanese or English and Russian, are difficult to distinguish with conventional methods.
Convolutional neural networks are a type of artificial neural network and have become a research hotspot in speech analysis and image recognition. Their weight-sharing structure makes them more similar to biological neural networks, reduces the complexity of the network model, and reduces the number of weights. This advantage is more obvious when the network input is a multi-dimensional image: the image can be fed directly into the network, avoiding the complex feature extraction and data reconstruction of traditional recognition algorithms. A convolutional neural network is a multi-layer perceptron specifically designed to recognize two-dimensional shapes, and its structure is highly invariant to translation, scaling, tilting and other forms of deformation.
In recent decades, research on artificial neural networks, and convolutional neural networks in particular, has deepened and made great progress; such networks have successfully solved practical problems in speech analysis, image recognition and other fields that are difficult for modern computers, and have shown intelligent characteristics.
Disclosure of Invention
The invention aims to overcome the above technical problems in the prior art, and provides a multilingual text classification method based on deep learning that exploits the advantages of deep convolutional neural networks in image recognition.
The invention adopts the following technical scheme: a multilingual text classification method based on deep learning, comprising the following steps: (1) acquiring a multilingual text training image set; (2) segmenting image text lines, normalizing their height and binarizing them; (3) increasing the complexity of the training image set to expand the sample space; (4) designing a deep convolutional neural network and training it with the training image set; (5) cutting the text pictures to be classified, inputting the cut pictures into the designed deep convolutional neural network, averaging the probability distributions learned by the network, and outputting the classification result.
Preferably, the step (2) includes the steps of:
(21) if the training image set contains pictures with multiple lines of text, dividing each such picture into single-line text pictures and centering each single line in the vertical direction;
(22) normalizing the height of each single-line text picture to H pixels;
(23) converting the normalized picture into a grayscale image and performing locally adaptive binarization on it:
given a local window size, computing for each pixel a threshold from the statistics of the pixels in its local window according to the following formula, and binarizing the image:
(threshold formula, rendered as an image in the original: the per-pixel threshold is computed from the local mean e1, the mean adjustment parameter e2, the local standard deviation q1 and the standard deviation adjustment parameter q2 within the WS window)
wherein img is a pixel of the grayscale image to be processed; WS is the chosen local window size; e1 is the mean of the pixels in the local window; e2 is a mean adjustment parameter; q1 is the standard deviation of the pixels in the local window; q2 is a standard deviation adjustment parameter.
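For illustration, a minimal Python/NumPy sketch of this binarization step follows. Since the threshold formula itself is published only as an image, a Niblack-style combination of the scaled window mean and scaled window standard deviation is assumed here; the function name and default values are illustrative, not taken from the patent.

import numpy as np

def local_adaptive_binarize(img, ws=21, e2=0.9, q2=0.9):
    # img: 2-D uint8 grayscale array, 0 (black) to 255 (white).
    # For each pixel, compute the mean e1 and standard deviation q1 of the
    # ws x ws window around it, derive a threshold from e1, e2, q1, q2
    # (assumed form: e2 * e1 - q2 * q1), and binarize.
    h, w = img.shape
    out = np.full_like(img, 255)
    pad = ws // 2
    padded = np.pad(img.astype(np.float64), pad, mode="edge")
    for y in range(h):
        for x in range(w):
            win = padded[y:y + ws, x:x + ws]
            e1, q1 = win.mean(), win.std()   # local window statistics
            thresh = e2 * e1 - q2 * q1       # assumed threshold combination
            if img[y, x] < thresh:           # dark pixel -> text (0)
                out[y, x] = 0
    return out

This per-pixel loop is written for clarity, not speed; an integral-image formulation would compute the same window statistics in constant time per pixel.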
Preferably, the step (3) includes the steps of:
(31) cutting a text picture of height H and width greater than W into pictures of width W with step length ΔW, and enlarging the width of a text picture of height H and width less than W to W;
(32) setting the initial offset of the cutting to δW, where δW ranges from ΔW/3 to ΔW/2, and repeating step (31) to expand the sample space, so that a picture of height H and width greater than W is cut into N pictures of size W×H;
preferably, the parameter ranges of steps (31) and (32) are as follows: W ranges from 90 to 100 pixels, ΔW ranges from 20 to 30 pixels, and δW ranges from 7 to 15 pixels.
(33) adding noise to the cut pictures to further enlarge the sample space, and grouping each W×H picture cut from a picture of the training image set together with the NS pictures generated from it by adding noise, the group being denoted imgs_NS; these operations are sketched in code after the list below.
Preferably, the noise adding process of step (33) includes the following:
line interference: the number of lines ranges from 3 to 8 and their width from 1 to 5 pixels; the positions and angles at which the lines appear are generated randomly;
noise interference: the frequency of noise dots ranges from 0.05 to 0.2; their positions are generated randomly and follow a uniform distribution;
rotation: the rotation angle ranges from -15 to 15 degrees;
gaussian blur: the blur radius ranges from 2 to 5 pixels.
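As an illustration of steps (31) to (33), the following Python sketch cuts a height-normalized line image with a sliding window and applies the four kinds of noising with the ranges given above. Pillow and NumPy are an assumed implementation choice, and the function names are illustrative.

import random
import numpy as np
from PIL import Image, ImageDraw, ImageFilter

def cut_line_image(img, W=96, dW=24, offset=0):
    # Step (31)/(32): slide a W-wide window over the line image with stride
    # dW, starting at `offset`; lines narrower than W are stretched to W.
    w, h = img.size
    if w <= W:
        return [img.resize((W, h))]
    return [img.crop((x, 0, x + W, h))
            for x in range(offset, w - W + 1, dW)]

def add_noise(img, n_lines=(3, 8), line_w=(1, 5), dot_freq=(0.05, 0.2),
              angle=(-15, 15), blur=(2, 5)):
    # Step (33): line interference, noise dots, rotation, Gaussian blur,
    # with the parameter ranges given in the patent.
    img = img.convert("L")
    draw = ImageDraw.Draw(img)
    w, h = img.size
    for _ in range(random.randint(*n_lines)):        # random interfering lines
        draw.line([(random.randint(0, w), random.randint(0, h)),
                   (random.randint(0, w), random.randint(0, h))],
                  fill=0, width=random.randint(*line_w))
    arr = np.array(img)
    mask = np.random.rand(h, w) < random.uniform(*dot_freq)  # uniform noise dots
    arr[mask] = 0
    img = Image.fromarray(arr)
    img = img.rotate(random.uniform(*angle), fillcolor=255)  # -15 to 15 degrees
    return img.filter(ImageFilter.GaussianBlur(random.randint(*blur)))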
Preferably, the step (4) includes the steps of:
(41) designing a deep convolutional neural network:
Input(96x32)->50C5P2S1->ReLU->MP2->30C5P2S1->ReLU->MP2->20C5P2S1->ReLU->MP2->300N->ReLU->Dropout(0.5)->6N->Softmax/Output(6x1)
wherein Input(96x32) denotes the input layer, which receives input pictures of 96x32 pixels; 50C5P2S1 denotes a convolutional layer with kernel size 5x5, zero padding 2, stride 1 and 50 output feature maps, which extracts features from the input image; ReLU denotes a rectified linear activation layer, which rectifies the convolved features to speed up the learning of the neural network; MP2 denotes a max pooling layer with kernel size 2x2 and stride 2, which takes the maximum of the rectified features to reduce network complexity; 300N denotes a fully connected layer with 300 output dimensions, which learns weighted combinations of the features from the previous layer; Dropout(0.5) denotes a random suppression layer with a suppression ratio of 50%, which prevents the network from over-fitting the training samples and thereby losing classification ability; Softmax/Output(6x1) denotes that the output layer is a softmax layer whose output is the probability distribution of the input picture over the 6 classes;
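A sketch of this architecture in PyTorch follows; the patent names no framework, so the framework choice and the 1-channel NCHW input layout (N, 1, 32, 96) for the binarized 96x32 pictures are assumptions.

import torch
import torch.nn as nn

class MultiLangTextCNN(nn.Module):
    # Mirrors the layer string of step (41).
    def __init__(self, num_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 50, kernel_size=5, padding=2, stride=1),   # 50C5P2S1
            nn.ReLU(),
            nn.MaxPool2d(2),                                        # MP2
            nn.Conv2d(50, 30, kernel_size=5, padding=2, stride=1),  # 30C5P2S1
            nn.ReLU(),
            nn.MaxPool2d(2),                                        # MP2
            nn.Conv2d(30, 20, kernel_size=5, padding=2, stride=1),  # 20C5P2S1
            nn.ReLU(),
            nn.MaxPool2d(2),                                        # MP2
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(20 * 4 * 12, 300),  # 300N: 20 maps of 4x12 after pooling
            nn.ReLU(),
            nn.Dropout(0.5),              # Dropout(0.5)
            nn.Linear(300, num_classes),  # 6N
        )

    def forward(self, x):
        # Softmax/Output(6x1): per-class probabilities for each input picture
        return torch.softmax(self.classifier(self.features(x)), dim=1)

# probs = MultiLangTextCNN()(torch.rand(1, 1, 32, 96))  # shape (1, 6)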
(42) deep convolutional neural network training, the process is as follows:
(421) randomly drawing one image from each of B groups imgs_NS to form a batch of training samples imgs_B, B images in total, for training the neural network designed in step (41), where B takes the value 64, 100 or 256;
(422) the neural network designed in step (41) adopts an adaptive gradient descent method to adjust the learning rate of the network; the initial learning rate is set to base_lr, the penalty coefficient of the learned parameters is lambda, the maximum number of training iterations is max_iters, and the learning rate is updated as follows:
curr_lr = base_lr × (1 + gamma × iter)^(-power)
the learning rate determines the step length of each iteration as the neural network searches the training sample space for an optimal solution; base_lr determines the initial learning rate of the neural network, taking the value 0.01, 0.005 or 0.003; lambda prevents the neural network from over-fitting the training set samples, taking the value 0.01, 0.005, 0.003 or 0.001; max_iters is the number of learning iterations required for the classification accuracy of the neural network to reach the required threshold, ranging from 10000 to 30000; curr_lr is the current learning rate; gamma and power are learning rate adjustment parameters, ranging from 0.0001 to 0.0005 and from 0.70 to 0.80 respectively; iter is the current iteration number.
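This update rule coincides with the "inv" learning-rate policy familiar from Caffe. A direct Python transcription, with default values taken from the embodiment below:

def inv_learning_rate(it, base_lr=0.01, gamma=0.0001, power=0.75):
    # curr_lr = base_lr * (1 + gamma * iter) ** (-power)
    return base_lr * (1.0 + gamma * it) ** (-power)

# inv_learning_rate(0) == 0.01; the rate then decays smoothly with iterations.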
Preferably, the step (5) includes the steps of:
(51) for the text picture img_test to be classified, intercepting a total of N_test pictures imgs_WxH of size W×H in the manner of step (31);
(52) inputting the N_test pictures imgs_WxH into the deep convolutional neural network designed in step (4) to obtain N_test probability distributions over the possible classes; setting a threshold P_test, setting to 0 every probability value below P_test, then averaging the N_test probability distributions and outputting the class with the largest average probability as the final classification result.
Preferably, the parameters of step (52) are as follows: L is 6, and P_test ranges from 0.2 to 0.35.
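A minimal NumPy sketch of the thresholded probability averaging of step (52); the function name and the (N_test, L) array layout are illustrative:

import numpy as np

def classify_line(probs, p_test=0.3):
    # probs: per-crop probability distributions, shape (N_test, L).
    # Zero out entries below P_test, average over crops, return arg-max class.
    probs = np.asarray(probs, dtype=np.float64)
    probs[probs < p_test] = 0.0          # suppress low-confidence entries
    return int(probs.mean(axis=0).argmax())

# classify_line([[0.1, 0.7, 0.2, 0, 0, 0],
#                [0.05, 0.8, 0.15, 0, 0, 0]])  # -> 1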
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) the deep convolutional neural network avoids the complex feature extraction and data reconstruction of traditional recognition algorithms, learns feature representations that discriminate between multilingual texts well, and improves recognition accuracy.
(2) the deep convolutional neural network extracts good local features and is translation invariant, which improves the recognition performance and robustness of the invention.
(3) the algorithm has a high recognition rate and strong robustness: it effectively learns discriminative features of multilingual texts from the training image set, and the random suppression of nodes improves the generalization ability of the system, yielding better classification performance.
Drawings
FIG. 1 is a flow chart of a multi-lingual text classification method of the present invention;
FIG. 2 is a flow chart of the pretreatment of the present invention;
FIG. 3 is an example of a pre-processing procedure of the present invention;
FIG. 4 is a diagram of the deep convolutional neural network of the present invention;
FIG. 5 is a flowchart of the text image classification according to the present invention;
FIG. 6 is an example of the text picture classification process of the present invention.
Detailed Description
The present invention will be further described with reference to the following drawings and examples, but the embodiments of the present invention are not limited thereto.
Examples
Referring to FIG. 1, and taking the classification of Chinese, Japanese, English, Russian, Korean and Arabic text as an example, the multilingual text classification method of the present invention comprises the following steps:
(1) acquiring a multi-language text training image set;
In this step, text images in several different languages are acquired, mainly by taking screenshots of electronic documents, photographing paper documents and scanning paper documents, with roughly equal numbers of images per language; the collected images are divided into L classes by language, where L is 6 in this embodiment.
(2) Segmenting image text lines, normalizing height and binarizing;
the step (2) comprises the following steps:
(21) if a picture of the training set contains multiple lines of text, it is divided into single-line text pictures, each approximately centered in the vertical direction;
(22) the height of each single-line text picture is normalized to 32 pixels;
(23) the height-normalized picture is converted into a grayscale image, which undergoes locally adaptive binarization:
given a local window size, a threshold is computed for each pixel from the statistics of the pixels in its window according to the following formula, and the grayscale image is binarized:
(threshold formula, rendered as an image in the original: the same per-pixel threshold as in step (23) above, computed from e1, e2, q1 and q2 within the WS window)
img denotes a pixel of the grayscale image to be processed, each pixel ranging from 0 (black) to 255 (white); WS is the local window size, set to 21 pixels; e1 is the mean of the pixels in the window; e2 is the mean adjustment parameter, set to 0.9; q1 is the standard deviation of the pixels in the window; q2 is the standard deviation adjustment parameter, set to 0.9.
(3) The complexity of a training image set is increased, and the sample space is enlarged;
the step (3) comprises the following steps:
(31) a text picture with height 32 pixels and width greater than 96 pixels is cut, with a step length of 24 pixels, into pictures of width 96 pixels; a text picture with height 32 pixels and width less than 96 pixels is enlarged to a width of 96 pixels;
(32) the initial offset of the cutting is set to δW (ranging from 8 to 12 pixels) and step (31) is repeated to expand the sample space; a picture with height 32 pixels and width greater than 96 pixels is thus cut into N pictures of size 96x32;
(33) noise is added to the cut pictures to enlarge the sample space; each cut picture of size 96x32 and the NS pictures generated from it by adding noise are grouped together and denoted imgs_NS; the specific noising process comprises the following:
line interference: the number of lines ranges from 3 to 8 and their width from 1 to 5 pixels; the positions and angles at which the lines appear are generated randomly;
noise interference: the frequency of noise dots ranges from 0.05 to 0.2; their positions are generated randomly and follow a uniform distribution;
rotation: the rotation angle ranges from -15 to 15 degrees, a negative angle denoting counterclockwise rotation;
gaussian blur: the blur radius ranges from 2 to 5 pixels.
Steps (2) and (3) constitute the preprocessing stage of the present invention, as shown in FIG. 2; pictures after cutting and noising are shown in FIG. 3, where N is 3 and NS is 3.
(4) Designing a deep convolutional neural network, and training by using a training image set;
the step (4) comprises the following steps:
(41) a deep convolutional neural network is designed, here for classifying Chinese, Japanese, English, Korean, Russian and Arabic, as shown in FIG. 4:
Input(96x32)->50C5P2S1->ReLU->MP2->30C5P2S1->ReLU->MP2->20C5P2S1->ReLU->MP2->300N->ReLU->Dropout(0.5)->6N->Softmax/Output(6x1)
wherein Input(96x32) denotes the input layer, which receives input pictures of 96x32 pixels; 50C5P2S1 denotes a convolutional layer with kernel size 5x5, zero padding 2, stride 1 and 50 output feature maps, which extracts features from the input image; ReLU denotes a rectified linear activation layer, which rectifies the convolved features to speed up the learning of the neural network; MP2 denotes a max pooling layer with kernel size 2x2 and stride 2, which takes the maximum of the rectified features to reduce network complexity; 300N denotes a fully connected layer with 300 output dimensions, which learns weighted combinations of the features from the previous layer; Dropout(0.5) denotes a random suppression layer with a suppression ratio of 50%, which prevents the network from over-fitting the training samples and thereby losing classification ability; Softmax/Output(6x1) denotes that the output layer is a softmax layer whose output is the probability distribution of the input picture over the 6 classes;
(42) deep convolutional neural network training, the process comprises the following specific steps:
(421) one image is randomly drawn from each of 100 groups imgs_NS to form a batch of training samples imgs_B, 100 images in total, used to train the neural network designed in step (41);
(422) the neural network of step (41) is trained with an adaptive gradient descent method; the initial learning rate is set to base_lr, the penalty coefficient of the learned parameters to lambda, and the maximum number of training iterations to max_iters; the learning rate is updated as follows:
curr_lr = base_lr × (1 + gamma × iter)^(-power)
the learning rate determines the step length of each iteration as the neural network searches the training sample space for an optimal solution; base_lr determines the initial learning rate of the neural network and is set to 0.01; lambda prevents the neural network from over-fitting the training set samples and is set to 0.001; max_iters is the number of learning iterations required for the classification accuracy of the neural network to reach the required threshold and is set to 10000; curr_lr is the current learning rate; gamma and power are learning rate adjustment parameters, set to 0.0001 and 0.75 respectively; iter is the current iteration number.
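As an illustration of the training procedure of step (42), the following PyTorch sketch pairs plain SGD, with weight decay standing in for the parameter penalty lambda, with the inv schedule above. The patent's "adaptive gradient descent method" is not further specified, so this pairing is an assumption; the model could be the MultiLangTextCNN sketched earlier, and `loader` is any iterable of (images, labels) batches.

import torch
import torch.nn as nn

def train(model, loader, base_lr=0.01, lam=0.001, max_iters=10000,
          gamma=0.0001, power=0.75, device="cpu"):
    model.to(device).train()
    opt = torch.optim.SGD(model.parameters(), lr=base_lr, weight_decay=lam)
    sched = torch.optim.lr_scheduler.LambdaLR(
        opt, lambda it: (1.0 + gamma * it) ** (-power))  # inv policy
    loss_fn = nn.NLLLoss()   # the model outputs probabilities, so feed log-probs
    it = 0
    while it < max_iters:
        for imgs, labels in loader:            # one batch of imgs_B samples
            imgs, labels = imgs.to(device), labels.to(device)
            opt.zero_grad()
            loss = loss_fn(torch.log(model(imgs) + 1e-12), labels)
            loss.backward()
            opt.step()
            sched.step()   # curr_lr = base_lr * (1 + gamma * it) ** (-power)
            it += 1
            if it >= max_iters:
                break
    return model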
(5) The text picture to be classified is cut, the cut pictures are input into the deep convolutional neural network designed in step (4), the resulting probability distributions are averaged, and the classification result is output.
Step (5) comprises the following steps, as shown in fig. 5:
(51) the text picture is cut in the manner of step (31), yielding 3 pictures of size 96x32;
(52) the 3 intercepted pictures are input into the deep convolutional neural network designed in step (4), yielding 3 probability distributions over the possible classes; the threshold is set to 0.3, probability values below 0.3 are set to 0, the 3 probability distributions are averaged, and the class with the largest average probability is output as the final classification result.
In the example shown in FIG. 6, the text picture to be classified is a Japanese picture, and 3 pictures are obtained by cutting with a sliding window; the 3 cut pictures are input into the deep convolutional neural network designed by the invention, which yields for each picture the probabilities of Arabic, Chinese, English, Japanese, Korean and Russian; the means of the three probability distributions are computed, the mean Japanese probability is the largest, and Japanese is output as the classification result.
The embodiments of the present invention are not limited to the above-described embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and they are included in the scope of the present invention.

Claims (8)

1. A multilingual text classification method based on deep learning is characterized by comprising the following steps:
(1) acquiring a multi-language text training image set;
(2) segmenting image text lines, normalizing their height and binarizing them;
(3) increasing the complexity of the training image set and expanding the sample space;
the step (3) comprises the following steps:
(31) cutting a text picture of height H and width greater than W into pictures of width W with step length ΔW, and enlarging the width of a text picture of height H and width less than W to W;
(32) setting the initial offset of the cutting to δW, where δW ranges from ΔW/3 to ΔW/2, and repeating step (31) to expand the sample space, so that a picture of height H and width greater than W is cut into N pictures of size W×H;
(33) adding noise to the cut pictures to enlarge the sample space, and grouping each W×H picture cut from a picture of the training image set together with the NS pictures generated from it by adding noise, the group being denoted imgs_NS;
(4) Designing a deep convolutional neural network, and training by using a training image set;
(5) cutting the text picture to be classified, inputting the cut pictures into the designed deep convolutional neural network, averaging the probability distributions learned by the neural network, and outputting the classification result;
the step (5) comprises the following steps:
(51) for the text picture img_test to be classified, intercepting a total of N_test pictures imgs_WxH of size W×H in the manner of step (31);
(52) inputting the N_test pictures imgs_WxH into the deep convolutional neural network designed in step (4) to obtain N_test probability distributions over the possible classes; setting a threshold P_test, setting to 0 every probability value below P_test, then averaging the N_test probability distributions and outputting the class with the largest average probability as the final classification result.
2. The multilingual text-classification method of claim 1, wherein said step (2) comprises the steps of:
(21) if the training image set contains pictures with multiple lines of text, dividing each such picture into single-line text pictures and centering each single line in the vertical direction;
(22) normalizing the height of each single-line text picture to H pixels;
(23) converting the normalized picture into a grayscale image and performing locally adaptive binarization on it:
given a local window size, computing for each pixel a threshold from the statistics of the pixels in its local window according to the following formula, and binarizing the image:
(threshold formula, rendered as an image in the original: the per-pixel threshold is computed from e1, e2, q1 and q2 within the WS window)
wherein img is a pixel of the grayscale image to be processed; WS is the chosen local window size; e1 is the mean of the pixels in the local window; e2 is a mean adjustment parameter; q1 is the standard deviation of the pixels in the local window; q2 is a standard deviation adjustment parameter.
3. The multilingual text classification method of claim 2, wherein H ranges from 30 to 36 pixels, WS ranges from 18 to 22 pixels, e2 ranges from 0.85 to 0.95, and q2 ranges from 0.85 to 0.95.
4. The multilingual text classification method of claim 1, wherein W ranges from 90 to 100 pixels, ΔW ranges from 20 to 30 pixels, and δW ranges from 7 to 15 pixels.
5. The multilingual text-classification method of claim 1, wherein the noising process of step (33) comprises the following:
line interference: the number of lines ranges from 3 to 8 and their width from 1 to 5 pixels; the positions and angles at which the lines appear are generated randomly;
noise interference: the frequency of noise dots ranges from 0.05 to 0.2; their positions are generated randomly and follow a uniform distribution;
rotation: the rotation angle ranges from -15 to 15 degrees;
gaussian blur: the blur radius ranges from 2 to 5 pixels.
6. The multilingual text-classification method of claim 1, wherein said step (4) comprises the steps of:
(41) designing a deep convolutional neural network:
Input(96x32)->50C5P2S1->ReLU->MP2->30C5P2S1->ReLU->MP2->20C5P2S1->ReLU->MP2->300N->ReLU->Dropout(0.5)->6N->Softmax/Output(6x1)
wherein Input(96x32) denotes the input layer, which receives input pictures of 96x32 pixels; 50C5P2S1 denotes a convolutional layer with kernel size 5x5, zero padding 2, stride 1 and 50 output feature maps, which extracts features from the input image; ReLU denotes a rectified linear activation layer, which rectifies the convolved features to speed up the learning of the neural network; MP2 denotes a max pooling layer with kernel size 2x2 and stride 2, which takes the maximum of the rectified features to reduce network complexity; 300N denotes a fully connected layer with 300 output dimensions, which learns weighted combinations of the features from the previous layer; Dropout(0.5) denotes a random suppression layer with a suppression ratio of 50%, which prevents the network from over-fitting the training samples and thereby losing classification ability; Softmax/Output(6x1) denotes that the output layer is a softmax layer whose output is the probability distribution of the input picture over the 6 classes;
(42) deep convolutional neural network training, the process is as follows:
(421) randomly drawing one image from each of B groups imgs_NS to form a batch of training samples imgs_B, B images in total, for training the neural network designed in step (41), where B takes the value 64, 100 or 256;
(422) the neural network designed in step (41) adopts an adaptive gradient descent method to adjust the learning rate of the network; the initial learning rate is set to base_lr, the penalty coefficient of the learned parameters to lambda, and the maximum number of training iterations to max_iters; the learning rate is updated as follows:
curr_lr = base_lr × (1 + gamma × iter)^(-power)
the learning rate determines the step length of each iteration as the neural network searches the training sample space for an optimal solution; base_lr determines the initial learning rate of the neural network, taking the value 0.01, 0.005 or 0.003; lambda prevents the neural network from over-fitting the training set samples, taking the value 0.01, 0.005, 0.003 or 0.001; max_iters is the number of learning iterations required for the classification accuracy of the neural network to reach the required threshold, ranging from 10000 to 30000; curr_lr is the current learning rate; gamma and power are learning rate adjustment parameters, ranging from 0.0001 to 0.0005 and from 0.70 to 0.80 respectively; iter is the current iteration number.
7. The multilingual text classification method of claim 1, wherein step (1) is: obtaining text images in several different languages by taking screenshots of electronic documents, photographing paper documents or scanning paper documents, the number of images of each language being equal; and dividing the collected images into L classes by language.
8. The multilingual text classification method of claim 7, wherein L is 6 and P_test ranges from 0.2 to 0.35.
CN201610169483.7A 2016-03-22 2016-03-22 Multi-language text classification method based on deep learning Active CN107220641B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610169483.7A CN107220641B (en) 2016-03-22 2016-03-22 Multi-language text classification method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610169483.7A CN107220641B (en) 2016-03-22 2016-03-22 Multi-language text classification method based on deep learning

Publications (2)

Publication Number Publication Date
CN107220641A CN107220641A (en) 2017-09-29
CN107220641B true CN107220641B (en) 2020-06-26

Family

ID=59928347

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610169483.7A Active CN107220641B (en) 2016-03-22 2016-03-22 Multi-language text classification method based on deep learning

Country Status (1)

Country Link
CN (1) CN107220641B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108364066B (en) * 2017-11-30 2019-11-08 中国科学院计算技术研究所 Artificial neural network chip and its application method based on N-GRAM and WFST model
CN110796129A (en) * 2018-08-03 2020-02-14 珠海格力电器股份有限公司 Text line region detection method and device
CN109359695A (en) * 2018-10-26 2019-02-19 东莞理工学院 A kind of computer vision 0-O recognition methods based on deep learning
CN109376658B (en) * 2018-10-26 2022-03-08 信雅达科技股份有限公司 OCR method based on deep learning
CN109977762B (en) * 2019-02-01 2022-02-22 汉王科技股份有限公司 Text positioning method and device and text recognition method and device
CN109919037B (en) * 2019-02-01 2021-09-07 汉王科技股份有限公司 Text positioning method and device and text recognition method and device
CN109948615B (en) * 2019-03-26 2021-01-26 中国科学技术大学 Multi-language text detection and recognition system
CN111062264A (en) * 2019-11-27 2020-04-24 重庆邮电大学 Document object classification method based on dual-channel hybrid convolution network
CN112347262B (en) * 2021-01-11 2021-04-13 北京江融信科技有限公司 Text classification method and system, intention classification system and robot
US11790894B2 (en) * 2021-03-15 2023-10-17 Salesforce, Inc. Machine learning based models for automatic conversations in online systems

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1445715A (en) * 2002-03-15 2003-10-01 微软公司 System and method for mode recognising
JP2014049118A (en) * 2012-08-31 2014-03-17 Fujitsu Ltd Convolution neural network classifier system, training method for the same, classifying method, and usage
CN104102919A (en) * 2014-07-14 2014-10-15 同济大学 Image classification method capable of effectively preventing convolutional neural network from being overfit
CN105205448A (en) * 2015-08-11 2015-12-30 中国科学院自动化研究所 Character recognition model training method based on deep learning and recognition method thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Image Classification Based on Convolutional Neural Networks (基于卷积神经网络的图像分类); 李晓普; China Masters' Theses Full-text Database, Information Science and Technology; 2016-03-15; pp. 25-26 *
License Plate Location Using a Fully Convolutional Neural Network with Corner Regression in Complex Environments (复杂环境下基于角点回归的全卷积神经网络的车牌定位); 罗斌 et al.; Journal of Data Acquisition and Processing; 2016-01-30; vol. 31, no. 1, pp. 65-72 *

Also Published As

Publication number Publication date
CN107220641A (en) 2017-09-29

Similar Documents

Publication Publication Date Title
CN107220641B (en) Multi-language text classification method based on deep learning
Borisyuk et al. Rosetta: Large scale system for text detection and recognition in images
CN107622104B (en) Character image identification and marking method and system
Kamble et al. Handwritten Marathi character recognition using R-HOG Feature
CN106446896B (en) Character segmentation method and device and electronic equipment
US11790675B2 (en) Recognition of handwritten text via neural networks
Burie et al. ICFHR2016 competition on the analysis of handwritten text in images of balinese palm leaf manuscripts
CN109033978B (en) Error correction strategy-based CNN-SVM hybrid model gesture recognition method
Joshi et al. Deep learning based Gujarati handwritten character recognition
CN107220655A (en) A kind of hand-written, printed text sorting technique based on deep learning
Banumathi et al. Handwritten Tamil character recognition using artificial neural networks
Choudhury et al. Handwritten bengali numeral recognition using hog based feature extraction algorithm
Sampath et al. Decision tree and deep learning based probabilistic model for character recognition
Costa Filho et al. A fully automatic method for recognizing hand configurations of Brazilian sign language
CN113901952A (en) Print form and handwritten form separated character recognition method based on deep learning
Zhuang et al. A handwritten Chinese character recognition based on convolutional neural network and median filtering
Zhang et al. OCR with the Deep CNN Model for Ligature Script‐Based Languages like Manchu
Vinokurov Using a convolutional neural network to recognize text elements in poor quality scanned images
Kamble et al. Geometrical features extraction and knn based classification of handwritten marathi characters
Nandhini et al. Sign language recognition using convolutional neural network
Anggraeny et al. Texture feature local binary pattern for handwritten character recognition
US20220027662A1 (en) Optical character recognition using specialized confidence functions
Zhou et al. Morphological Feature Aware Multi-CNN Model for Multilingual Text Recognition.
Jameel et al. A REVIEW ON RECOGNITION OF HANDWRITTEN URDU CHARACTERS USING NEURAL NETWORKS.
Ajao et al. Yoruba handwriting word recognition quality evaluation of preprocessing attributes using information theory approach

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant