CN107220641B - Multi-language text classification method based on deep learning - Google Patents

Multi-language text classification method based on deep learning

Info

Publication number
CN107220641B
Authority
CN
China
Prior art keywords
text
neural network
training
picture
pixels
Prior art date
Legal status
Active
Application number
CN201610169483.7A
Other languages
Chinese (zh)
Other versions
CN107220641A (en)
Inventor
金连文
冯子勇
阳赵阳
孙俊
Current Assignee
South China University of Technology SCUT
Fujitsu Ltd
Original Assignee
South China University of Technology SCUT
Fujitsu Ltd
Priority date
Filing date
Publication date
Application filed by South China University of Technology SCUT, Fujitsu Ltd filed Critical South China University of Technology SCUT
Priority to CN201610169483.7A priority Critical patent/CN107220641B/en
Publication of CN107220641A publication Critical patent/CN107220641A/en
Application granted granted Critical
Publication of CN107220641B publication Critical patent/CN107220641B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multilingual text classification method based on deep learning. The method comprises the following steps: acquiring a multilingual text training image set; performing line segmentation, height normalization and binarization on the text images; increasing the complexity of the training image set to expand the sample space; designing a deep convolutional neural network and training it with the training image set; and cutting the text pictures to be classified, inputting the cut pictures into the designed deep convolutional neural network, averaging the probability distributions learned by the network, and outputting the classification result. By designing a deep convolutional neural network and learning features that discriminate between multilingual texts, the invention enables a computer to classify texts of different languages accurately.

Description

Multi-language text classification method based on deep learning
Technical Field
The invention belongs to the technical field of pattern recognition and artificial intelligence, and particularly relates to a method for classifying multilingual texts.
Background
Written characters break through the temporal and spatial limitations of spoken language, allowing people to pass on knowledge and intellectual wealth through paper documents, perfect education systems, raise the level of learning, develop science and technology, and build a civilized society.
With the rapid development of computer technology, document analysis technology is widely applied in daily life, for example in the storage and retrieval of paper documents. Digital documents have evolved from plain text to documents that mix text with pictures, handwriting with print, and multiple languages.
In real life, a large number of documents mix multiple languages. Characters of different languages in such documents, especially very similar ones such as Chinese and Japanese or English and Russian, are difficult to distinguish with conventional methods.
Convolutional neural networks are a type of artificial neural network and have become a research hotspot in speech analysis and image recognition. Their weight-sharing structure makes them more similar to biological neural networks, reduces the complexity of the network model, and reduces the number of weights. This advantage is more obvious when the network input is a multi-dimensional image: the image can be fed directly into the network, avoiding the complex feature extraction and data reconstruction of traditional recognition algorithms. A convolutional neural network is a multi-layer perceptron specifically designed to recognize two-dimensional shapes, and its structure is highly invariant to translation, scaling, tilting and other forms of deformation.
In recent decades, research on artificial neural networks, and convolutional neural networks in particular, has deepened and made great progress; such networks have successfully solved practical problems in speech analysis, image recognition and other fields that are difficult for modern computers, and have shown intelligent characteristics.
Disclosure of Invention
The invention aims to overcome the above technical problems in the prior art, and provides a multilingual text classification method based on deep learning that exploits the advantages of deep convolutional neural networks in image recognition.
The invention adopts the following technical scheme: a multilingual text classification method based on deep learning, comprising the following steps: (1) acquiring a multilingual text training image set; (2) segmenting image text lines, normalizing their height and binarizing them; (3) increasing the complexity of the training image set to expand the sample space; (4) designing a deep convolutional neural network and training it with the training image set; (5) cutting the text pictures to be classified, inputting the cut pictures into the designed deep convolutional neural network, averaging the probability distributions learned by the network, and outputting the classification result.
Preferably, the step (2) includes the steps of:
(21) if the training image set contains pictures with multiple lines of text, dividing each such picture into single-line text pictures and centering each single line in the vertical direction;
(22) normalizing the height of each single-line text picture to H pixels;
(23) converting the normalized picture into a grayscale image and performing locally adaptive binarization on it:
given a local window size, computing for each pixel a threshold from the statistics of the pixels in its local window according to the following formula, and binarizing the image:
(threshold formula, rendered as an image in the original: the per-pixel threshold is computed from the local mean e1, the mean adjustment parameter e2, the local standard deviation q1 and the standard deviation adjustment parameter q2 within the WS window)
wherein img is a pixel of the grayscale image to be processed; WS is the chosen local window size; e1 is the mean of the pixels in the local window; e2 is a mean adjustment parameter; q1 is the standard deviation of the pixels in the local window; q2 is a standard deviation adjustment parameter.
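For illustration, a minimal Python/NumPy sketch of this binarization step follows. Since the threshold formula itself is published only as an image, a Niblack-style combination of the scaled window mean and scaled window standard deviation is assumed here; the function name and default values are illustrative, not taken from the patent.

import numpy as np

def local_adaptive_binarize(img, ws=21, e2=0.9, q2=0.9):
    # img: 2-D uint8 grayscale array, 0 (black) to 255 (white).
    # For each pixel, compute the mean e1 and standard deviation q1 of the
    # ws x ws window around it, derive a threshold from e1, e2, q1, q2
    # (assumed form: e2 * e1 - q2 * q1), and binarize.
    h, w = img.shape
    out = np.full_like(img, 255)
    pad = ws // 2
    padded = np.pad(img.astype(np.float64), pad, mode="edge")
    for y in range(h):
        for x in range(w):
            win = padded[y:y + ws, x:x + ws]
            e1, q1 = win.mean(), win.std()   # local window statistics
            thresh = e2 * e1 - q2 * q1       # assumed threshold combination
            if img[y, x] < thresh:           # dark pixel -> text (0)
                out[y, x] = 0
    return out

This per-pixel loop is written for clarity, not speed; an integral-image formulation would compute the same window statistics in constant time per pixel.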
Preferably, the step (3) includes the steps of:
(31) cutting a text picture of height H and width greater than W into pictures of width W with step length ΔW, and enlarging the width of a text picture of height H and width less than W to W;
(32) setting the initial offset of the cutting to δW, where δW ranges from ΔW/3 to ΔW/2, and repeating step (31) to expand the sample space, so that a picture of height H and width greater than W is cut into N pictures of size W×H;
preferably, the parameter ranges of steps (31) and (32) are as follows: W ranges from 90 to 100 pixels, ΔW ranges from 20 to 30 pixels, and δW ranges from 7 to 15 pixels.
(33) adding noise to the cut pictures to further enlarge the sample space, and grouping each W×H picture cut from a picture of the training image set together with the NS pictures generated from it by adding noise, the group being denoted imgs_NS; these operations are sketched in code after the list below.
Preferably, the noise adding process of step (33) includes the following:
line interference: the number of lines ranges from 3 to 8 and their width from 1 to 5 pixels; the positions and angles at which the lines appear are generated randomly;
noise interference: the frequency of noise dots ranges from 0.05 to 0.2; their positions are generated randomly and follow a uniform distribution;
rotation: the rotation angle ranges from -15 to 15 degrees;
gaussian blur: the blur radius ranges from 2 to 5 pixels.
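As an illustration of steps (31) to (33), the following Python sketch cuts a height-normalized line image with a sliding window and applies the four kinds of noising with the ranges given above. Pillow and NumPy are an assumed implementation choice, and the function names are illustrative.

import random
import numpy as np
from PIL import Image, ImageDraw, ImageFilter

def cut_line_image(img, W=96, dW=24, offset=0):
    # Step (31)/(32): slide a W-wide window over the line image with stride
    # dW, starting at `offset`; lines narrower than W are stretched to W.
    w, h = img.size
    if w <= W:
        return [img.resize((W, h))]
    return [img.crop((x, 0, x + W, h))
            for x in range(offset, w - W + 1, dW)]

def add_noise(img, n_lines=(3, 8), line_w=(1, 5), dot_freq=(0.05, 0.2),
              angle=(-15, 15), blur=(2, 5)):
    # Step (33): line interference, noise dots, rotation, Gaussian blur,
    # with the parameter ranges given in the patent.
    img = img.convert("L")
    draw = ImageDraw.Draw(img)
    w, h = img.size
    for _ in range(random.randint(*n_lines)):        # random interfering lines
        draw.line([(random.randint(0, w), random.randint(0, h)),
                   (random.randint(0, w), random.randint(0, h))],
                  fill=0, width=random.randint(*line_w))
    arr = np.array(img)
    mask = np.random.rand(h, w) < random.uniform(*dot_freq)  # uniform noise dots
    arr[mask] = 0
    img = Image.fromarray(arr)
    img = img.rotate(random.uniform(*angle), fillcolor=255)  # -15 to 15 degrees
    return img.filter(ImageFilter.GaussianBlur(random.randint(*blur)))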
Preferably, the step (4) includes the steps of:
(41) designing a deep convolutional neural network:
Input(96x32)->50C5P2S1->ReLU->MP2->30C5P2S1->ReLU->MP2->20C5P2S1->ReLU->MP2->300N->ReLU->Dropout(0.5)->6N->Softmax/Output(6x1)
wherein Input(96x32) denotes the input layer, which receives input pictures of 96x32 pixels; 50C5P2S1 denotes a convolutional layer with kernel size 5x5, zero padding 2, stride 1 and 50 output feature maps, which extracts features from the input image; ReLU denotes a rectified linear activation layer, which rectifies the convolved features to speed up the learning of the neural network; MP2 denotes a max pooling layer with kernel size 2x2 and stride 2, which takes the maximum of the rectified features to reduce network complexity; 300N denotes a fully connected layer with 300 output dimensions, which learns weighted combinations of the features from the previous layer; Dropout(0.5) denotes a random suppression layer with a suppression ratio of 50%, which prevents the network from over-fitting the training samples and thereby losing classification ability; Softmax/Output(6x1) denotes that the output layer is a softmax layer whose output is the probability distribution of the input picture over the 6 classes;
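A sketch of this architecture in PyTorch follows; the patent names no framework, so the framework choice and the 1-channel NCHW input layout (N, 1, 32, 96) for the binarized 96x32 pictures are assumptions.

import torch
import torch.nn as nn

class MultiLangTextCNN(nn.Module):
    # Mirrors the layer string of step (41).
    def __init__(self, num_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 50, kernel_size=5, padding=2, stride=1),   # 50C5P2S1
            nn.ReLU(),
            nn.MaxPool2d(2),                                        # MP2
            nn.Conv2d(50, 30, kernel_size=5, padding=2, stride=1),  # 30C5P2S1
            nn.ReLU(),
            nn.MaxPool2d(2),                                        # MP2
            nn.Conv2d(30, 20, kernel_size=5, padding=2, stride=1),  # 20C5P2S1
            nn.ReLU(),
            nn.MaxPool2d(2),                                        # MP2
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(20 * 4 * 12, 300),  # 300N: 20 maps of 4x12 after pooling
            nn.ReLU(),
            nn.Dropout(0.5),              # Dropout(0.5)
            nn.Linear(300, num_classes),  # 6N
        )

    def forward(self, x):
        # Softmax/Output(6x1): per-class probabilities for each input picture
        return torch.softmax(self.classifier(self.features(x)), dim=1)

# probs = MultiLangTextCNN()(torch.rand(1, 1, 32, 96))  # shape (1, 6)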
(42) deep convolutional neural network training, the process is as follows:
(421) randomly drawing one image from each of B groups imgs_NS to form a batch of training samples imgs_B, B images in total, for training the neural network designed in step (41), where B takes the value 64, 100 or 256;
(422) the neural network designed in step (41) adopts an adaptive gradient descent method to adjust the learning rate of the network; the initial learning rate is set to base_lr, the penalty coefficient of the learned parameters is lambda, the maximum number of training iterations is max_iters, and the learning rate is updated as follows:
curr_lr = base_lr × (1 + gamma × iter)^(-power)
the learning rate determines the step length of each iteration as the neural network searches the training sample space for an optimal solution; base_lr determines the initial learning rate of the neural network, taking the value 0.01, 0.005 or 0.003; lambda prevents the neural network from over-fitting the training set samples, taking the value 0.01, 0.005, 0.003 or 0.001; max_iters is the number of learning iterations required for the classification accuracy of the neural network to reach the required threshold, ranging from 10000 to 30000; curr_lr is the current learning rate; gamma and power are learning rate adjustment parameters, ranging from 0.0001 to 0.0005 and from 0.70 to 0.80 respectively; iter is the current iteration number.
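This update rule coincides with the "inv" learning-rate policy familiar from Caffe. A direct Python transcription, with default values taken from the embodiment below:

def inv_learning_rate(it, base_lr=0.01, gamma=0.0001, power=0.75):
    # curr_lr = base_lr * (1 + gamma * iter) ** (-power)
    return base_lr * (1.0 + gamma * it) ** (-power)

# inv_learning_rate(0) == 0.01; the rate then decays smoothly with iterations.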
Preferably, the step (5) includes the steps of:
(51) for the text picture img_test to be classified, intercepting a total of N_test pictures imgs_WxH of size W×H in the manner of step (31);
(52) inputting the N_test pictures imgs_WxH into the deep convolutional neural network designed in step (4) to obtain N_test probability distributions over the possible classes; setting a threshold P_test, setting to 0 every probability value below P_test, then averaging the N_test probability distributions and outputting the class with the largest average probability as the final classification result.
Preferably, the parameters of step (52) are as follows: L is 6, and P_test ranges from 0.2 to 0.35.
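A minimal NumPy sketch of the thresholded probability averaging of step (52); the function name and the (N_test, L) array layout are illustrative:

import numpy as np

def classify_line(probs, p_test=0.3):
    # probs: per-crop probability distributions, shape (N_test, L).
    # Zero out entries below P_test, average over crops, return arg-max class.
    probs = np.asarray(probs, dtype=np.float64)
    probs[probs < p_test] = 0.0          # suppress low-confidence entries
    return int(probs.mean(axis=0).argmax())

# classify_line([[0.1, 0.7, 0.2, 0, 0, 0],
#                [0.05, 0.8, 0.15, 0, 0, 0]])  # -> 1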
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) the deep convolutional neural network avoids the complex feature extraction and data reconstruction of traditional recognition algorithms, learns feature representations that discriminate between multilingual texts well, and improves recognition accuracy.
(2) the deep convolutional neural network extracts good local features and is translation invariant, which improves the recognition performance and robustness of the invention.
(3) the algorithm has a high recognition rate and strong robustness: it effectively learns discriminative features of multilingual texts from the training image set, and the random suppression of nodes improves the generalization ability of the system, yielding better classification performance.
Drawings
FIG. 1 is a flow chart of a multi-lingual text classification method of the present invention;
FIG. 2 is a flow chart of the pretreatment of the present invention;
FIG. 3 is an example of a pre-processing procedure of the present invention;
FIG. 4 is a diagram of the deep convolutional neural network of the present invention;
FIG. 5 is a flowchart of the text image classification according to the present invention;
FIG. 6 is an example of the text picture classification process of the present invention.
Detailed Description
The present invention will be further described with reference to the following drawings and examples, but the embodiments of the present invention are not limited thereto.
Examples
Referring to FIG. 1, and taking the classification of Chinese, Japanese, English, Russian, Korean and Arabic text as an example, the multilingual text classification method of the present invention comprises the following steps:
(1) acquiring a multi-language text training image set;
In this step, text images in several different languages are acquired, mainly by taking screenshots of electronic documents, photographing paper documents and scanning paper documents, with roughly equal numbers of images per language; the collected images are divided into L classes by language, where L is 6 in this embodiment.
(2) Segmenting image text lines, normalizing height and binarizing;
the step (2) comprises the following steps:
(21) if a picture of the training set contains multiple lines of text, it is divided into single-line text pictures, each approximately centered in the vertical direction;
(22) the height of each single-line text picture is normalized to 32 pixels;
(23) the height-normalized picture is converted into a grayscale image, which undergoes locally adaptive binarization:
given a local window size, a threshold is computed for each pixel from the statistics of the pixels in its window according to the following formula, and the grayscale image is binarized:
(threshold formula, rendered as an image in the original: the same per-pixel threshold as in step (23) above, computed from e1, e2, q1 and q2 within the WS window)
img denotes a pixel of the grayscale image to be processed, each pixel ranging from 0 (black) to 255 (white); WS is the local window size, set to 21 pixels; e1 is the mean of the pixels in the window; e2 is the mean adjustment parameter, set to 0.9; q1 is the standard deviation of the pixels in the window; q2 is the standard deviation adjustment parameter, set to 0.9.
(3) The complexity of a training image set is increased, and the sample space is enlarged;
the step (3) comprises the following steps:
(31) a text picture with height 32 pixels and width greater than 96 pixels is cut, with a step length of 24 pixels, into pictures of width 96 pixels; a text picture with height 32 pixels and width less than 96 pixels is enlarged to a width of 96 pixels;
(32) the initial offset of the cutting is set to δW (ranging from 8 to 12 pixels) and step (31) is repeated to expand the sample space; a picture with height 32 pixels and width greater than 96 pixels is thus cut into N pictures of size 96x32;
(33) noise is added to the cut pictures to enlarge the sample space; each cut picture of size 96x32 and the NS pictures generated from it by adding noise are grouped together and denoted imgs_NS; the specific noising process comprises the following:
line interference: the number of lines ranges from 3 to 8 and their width from 1 to 5 pixels; the positions and angles at which the lines appear are generated randomly;
noise interference: the frequency of noise dots ranges from 0.05 to 0.2; their positions are generated randomly and follow a uniform distribution;
rotation: the rotation angle ranges from -15 to 15 degrees, a negative angle denoting counterclockwise rotation;
gaussian blur: the blur radius ranges from 2 to 5 pixels.
Steps (2) and (3) constitute the preprocessing stage of the present invention, as shown in FIG. 2; pictures after cutting and noising are shown in FIG. 3, where N is 3 and NS is 3.
(4) Designing a deep convolutional neural network, and training by using a training image set;
the step (4) comprises the following steps:
(41) a deep convolutional neural network is designed, here for classifying Chinese, Japanese, English, Korean, Russian and Arabic, as shown in FIG. 4:
Input(96x32)->50C5P2S1->ReLU->MP2->30C5P2S1->ReLU->MP2->20C5P2S1->ReLU->MP2->300N->ReLU->Dropout(0.5)->6N->Softmax/Output(6x1)
wherein Input(96x32) denotes the input layer, which receives input pictures of 96x32 pixels; 50C5P2S1 denotes a convolutional layer with kernel size 5x5, zero padding 2, stride 1 and 50 output feature maps, which extracts features from the input image; ReLU denotes a rectified linear activation layer, which rectifies the convolved features to speed up the learning of the neural network; MP2 denotes a max pooling layer with kernel size 2x2 and stride 2, which takes the maximum of the rectified features to reduce network complexity; 300N denotes a fully connected layer with 300 output dimensions, which learns weighted combinations of the features from the previous layer; Dropout(0.5) denotes a random suppression layer with a suppression ratio of 50%, which prevents the network from over-fitting the training samples and thereby losing classification ability; Softmax/Output(6x1) denotes that the output layer is a softmax layer whose output is the probability distribution of the input picture over the 6 classes;
(42) deep convolutional neural network training, the process comprises the following specific steps:
(421) one image is randomly drawn from each of 100 groups imgs_NS to form a batch of training samples imgs_B, 100 images in total, used to train the neural network designed in step (41);
(422) the neural network of step (41) is trained with an adaptive gradient descent method; the initial learning rate is set to base_lr, the penalty coefficient of the learned parameters to lambda, and the maximum number of training iterations to max_iters; the learning rate is updated as follows:
curr_lr = base_lr × (1 + gamma × iter)^(-power)
the learning rate determines the step length of each iteration as the neural network searches the training sample space for an optimal solution; base_lr determines the initial learning rate of the neural network and is set to 0.01; lambda prevents the neural network from over-fitting the training set samples and is set to 0.001; max_iters is the number of learning iterations required for the classification accuracy of the neural network to reach the required threshold and is set to 10000; curr_lr is the current learning rate; gamma and power are learning rate adjustment parameters, set to 0.0001 and 0.75 respectively; iter is the current iteration number.
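As an illustration of the training procedure of step (42), the following PyTorch sketch pairs plain SGD, with weight decay standing in for the parameter penalty lambda, with the inv schedule above. The patent's "adaptive gradient descent method" is not further specified, so this pairing is an assumption; the model could be the MultiLangTextCNN sketched earlier, and `loader` is any iterable of (images, labels) batches.

import torch
import torch.nn as nn

def train(model, loader, base_lr=0.01, lam=0.001, max_iters=10000,
          gamma=0.0001, power=0.75, device="cpu"):
    model.to(device).train()
    opt = torch.optim.SGD(model.parameters(), lr=base_lr, weight_decay=lam)
    sched = torch.optim.lr_scheduler.LambdaLR(
        opt, lambda it: (1.0 + gamma * it) ** (-power))  # inv policy
    loss_fn = nn.NLLLoss()   # the model outputs probabilities, so feed log-probs
    it = 0
    while it < max_iters:
        for imgs, labels in loader:            # one batch of imgs_B samples
            imgs, labels = imgs.to(device), labels.to(device)
            opt.zero_grad()
            loss = loss_fn(torch.log(model(imgs) + 1e-12), labels)
            loss.backward()
            opt.step()
            sched.step()   # curr_lr = base_lr * (1 + gamma * it) ** (-power)
            it += 1
            if it >= max_iters:
                break
    return model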
(5) The text picture to be classified is cut, the cut pictures are input into the deep convolutional neural network designed in step (4), the resulting probability distributions are averaged, and the classification result is output.
Step (5) comprises the following steps, as shown in fig. 5:
(51) the text picture is cut in the manner of step (31), yielding 3 pictures of size 96x32;
(52) the 3 intercepted pictures are input into the deep convolutional neural network designed in step (4), yielding 3 probability distributions over the possible classes; the threshold is set to 0.3, probability values below 0.3 are set to 0, the 3 probability distributions are averaged, and the class with the largest average probability is output as the final classification result.
In the example shown in FIG. 6, the text picture to be classified is a Japanese picture, and 3 pictures are obtained by cutting with a sliding window; the 3 cut pictures are input into the deep convolutional neural network designed by the invention, which yields for each picture the probabilities of Arabic, Chinese, English, Japanese, Korean and Russian; the means of the three probability distributions are computed, the mean Japanese probability is the largest, and Japanese is output as the classification result.
The embodiments of the present invention are not limited to the above-described embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and they are included in the scope of the present invention.

Claims (8)

1. A multilingual text classification method based on deep learning is characterized by comprising the following steps:
(1) acquiring a multi-language text training image set;
(2) segmenting image text lines, normalizing their height and binarizing them;
(3) increasing the complexity of the training image set and expanding the sample space;
the step (3) comprises the following steps:
(31) cutting a text picture of height H and width greater than W into pictures of width W with step length ΔW, and enlarging the width of a text picture of height H and width less than W to W;
(32) setting the initial offset of the cutting to δW, where δW ranges from ΔW/3 to ΔW/2, and repeating step (31) to expand the sample space, so that a picture of height H and width greater than W is cut into N pictures of size W×H;
(33) adding noise to the cut pictures to enlarge the sample space, and grouping each W×H picture cut from a picture of the training image set together with the NS pictures generated from it by adding noise, the group being denoted imgs_NS;
(4) Designing a deep convolutional neural network, and training by using a training image set;
(5) cutting the text picture to be classified, inputting the cut pictures into the designed deep convolutional neural network, averaging the probability distributions learned by the neural network, and outputting the classification result;
the step (5) comprises the following steps:
(51) for the text picture img_test to be classified, intercepting a total of N_test pictures imgs_WxH of size W×H in the manner of step (31);
(52) inputting the N_test pictures imgs_WxH into the deep convolutional neural network designed in step (4) to obtain N_test probability distributions over the possible classes; setting a threshold P_test, setting to 0 every probability value below P_test, then averaging the N_test probability distributions and outputting the class with the largest average probability as the final classification result.
2. The multilingual text-classification method of claim 1, wherein said step (2) comprises the steps of:
(21) if the training image set contains pictures with multiple lines of text, dividing each such picture into single-line text pictures and centering each single line in the vertical direction;
(22) normalizing the height of each single-line text picture to H pixels;
(23) converting the normalized picture into a grayscale image and performing locally adaptive binarization on it:
given a local window size, computing for each pixel a threshold from the statistics of the pixels in its local window according to the following formula, and binarizing the image:
(threshold formula, rendered as an image in the original: the per-pixel threshold is computed from e1, e2, q1 and q2 within the WS window)
wherein img is a pixel of the grayscale image to be processed; WS is the chosen local window size; e1 is the mean of the pixels in the local window; e2 is a mean adjustment parameter; q1 is the standard deviation of the pixels in the local window; q2 is a standard deviation adjustment parameter.
3. The multilingual text classification method of claim 2, wherein H ranges from 30 to 36 pixels, WS ranges from 18 to 22 pixels, e2 ranges from 0.85 to 0.95, and q2 ranges from 0.85 to 0.95.
4. The multilingual text classification method of claim 1, wherein W ranges from 90 to 100 pixels, ΔW ranges from 20 to 30 pixels, and δW ranges from 7 to 15 pixels.
5. The multilingual text-classification method of claim 1, wherein the noising process of step (33) comprises the following:
line interference: the number of lines ranges from 3 to 8 and their width from 1 to 5 pixels; the positions and angles at which the lines appear are generated randomly;
noise interference: the frequency of noise dots ranges from 0.05 to 0.2; their positions are generated randomly and follow a uniform distribution;
rotation: the rotation angle ranges from -15 to 15 degrees;
gaussian blur: the blur radius ranges from 2 to 5 pixels.
6. The multilingual text-classification method of claim 1, wherein said step (4) comprises the steps of:
(41) designing a deep convolutional neural network:
Input(96x32)->50C5P2S1->ReLU->MP2->30C5P2S1->ReLU->MP2->20C5P2S1->ReLU->MP2->300N->ReLU->Dropout(0.5)->6N->Softmax/Output(6x1)
wherein Input(96x32) denotes the input layer, which receives input pictures of 96x32 pixels; 50C5P2S1 denotes a convolutional layer with kernel size 5x5, zero padding 2, stride 1 and 50 output feature maps, which extracts features from the input image; ReLU denotes a rectified linear activation layer, which rectifies the convolved features to speed up the learning of the neural network; MP2 denotes a max pooling layer with kernel size 2x2 and stride 2, which takes the maximum of the rectified features to reduce network complexity; 300N denotes a fully connected layer with 300 output dimensions, which learns weighted combinations of the features from the previous layer; Dropout(0.5) denotes a random suppression layer with a suppression ratio of 50%, which prevents the network from over-fitting the training samples and thereby losing classification ability; Softmax/Output(6x1) denotes that the output layer is a softmax layer whose output is the probability distribution of the input picture over the 6 classes;
(42) deep convolutional neural network training, the process is as follows:
(421) randomly drawing one image from each of B groups imgs_NS to form a batch of training samples imgs_B, B images in total, for training the neural network designed in step (41), where B takes the value 64, 100 or 256;
(422) the neural network designed in step (41) adopts an adaptive gradient descent method to adjust the learning rate of the network; the initial learning rate is set to base_lr, the penalty coefficient of the learned parameters to lambda, and the maximum number of training iterations to max_iters; the learning rate is updated as follows:
curr_lr = base_lr × (1 + gamma × iter)^(-power)
the learning rate determines the step length of each iteration as the neural network searches the training sample space for an optimal solution; base_lr determines the initial learning rate of the neural network, taking the value 0.01, 0.005 or 0.003; lambda prevents the neural network from over-fitting the training set samples, taking the value 0.01, 0.005, 0.003 or 0.001; max_iters is the number of learning iterations required for the classification accuracy of the neural network to reach the required threshold, ranging from 10000 to 30000; curr_lr is the current learning rate; gamma and power are learning rate adjustment parameters, ranging from 0.0001 to 0.0005 and from 0.70 to 0.80 respectively; iter is the current iteration number.
7. The multilingual text classification method of claim 1, wherein step (1) is: obtaining text images in several different languages by taking screenshots of electronic documents, photographing paper documents or scanning paper documents, the number of images of each language being equal; and dividing the collected images into L classes by language.
8. The multilingual text classification method of claim 7, wherein L is 6 and P_test ranges from 0.2 to 0.35.
CN201610169483.7A 2016-03-22 2016-03-22 Multi-language text classification method based on deep learning Active CN107220641B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610169483.7A CN107220641B (en) 2016-03-22 2016-03-22 Multi-language text classification method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610169483.7A CN107220641B (en) 2016-03-22 2016-03-22 Multi-language text classification method based on deep learning

Publications (2)

Publication Number Publication Date
CN107220641A CN107220641A (en) 2017-09-29
CN107220641B true CN107220641B (en) 2020-06-26

Family

ID=59928347

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610169483.7A Active CN107220641B (en) 2016-03-22 2016-03-22 Multi-language text classification method based on deep learning

Country Status (1)

Country Link
CN (1) CN107220641B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108364066B (en) * 2017-11-30 2019-11-08 中国科学院计算技术研究所 Artificial neural network chip and its application method based on N-GRAM and WFST model
CN110796129A (en) * 2018-08-03 2020-02-14 珠海格力电器股份有限公司 Text line region detection method and device
CN109359695A (en) * 2018-10-26 2019-02-19 东莞理工学院 A kind of computer vision 0-O recognition methods based on deep learning
CN109376658B (en) * 2018-10-26 2022-03-08 信雅达科技股份有限公司 OCR method based on deep learning
CN109977762B (en) * 2019-02-01 2022-02-22 汉王科技股份有限公司 Text positioning method and device and text recognition method and device
CN109919037B (en) * 2019-02-01 2021-09-07 汉王科技股份有限公司 Text positioning method and device and text recognition method and device
CN109948615B (en) * 2019-03-26 2021-01-26 中国科学技术大学 Multi-language text detection and recognition system
CN111062264A (en) * 2019-11-27 2020-04-24 重庆邮电大学 Document object classification method based on dual-channel hybrid convolution network
CN112347262B (en) * 2021-01-11 2021-04-13 北京江融信科技有限公司 Text classification method and system, intention classification system and robot
US11790894B2 (en) * 2021-03-15 2023-10-17 Salesforce, Inc. Machine learning based models for automatic conversations in online systems

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1445715A (en) * 2002-03-15 2003-10-01 微软公司 System and method for mode recognising
JP2014049118A (en) * 2012-08-31 2014-03-17 Fujitsu Ltd Convolution neural network classifier system, training method for the same, classifying method, and usage
CN104102919A (en) * 2014-07-14 2014-10-15 同济大学 Image classification method capable of effectively preventing convolutional neural network from being overfit
CN105205448A (en) * 2015-08-11 2015-12-30 中国科学院自动化研究所 Character recognition model training method based on deep learning and recognition method thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Image Classification Based on Convolutional Neural Networks (基于卷积神经网络的图像分类); 李晓普; China Masters' Theses Full-text Database, Information Science and Technology; 2016-03-15; pp. 25-26 *
License Plate Location Using a Fully Convolutional Neural Network with Corner Regression in Complex Environments (复杂环境下基于角点回归的全卷积神经网络的车牌定位); 罗斌 et al.; Journal of Data Acquisition and Processing; 2016-01-30; vol. 31, no. 1, pp. 65-72 *

Also Published As

Publication number Publication date
CN107220641A (en) 2017-09-29

Similar Documents

Publication Publication Date Title
CN107220641B (en) Multi-language text classification method based on deep learning
Borisyuk et al. Rosetta: Large scale system for text detection and recognition in images
CN107622104B (en) Character image identification and marking method and system
Kamble et al. Handwritten Marathi character recognition using R-HOG Feature
CN106446896B (en) Character segmentation method and device and electronic equipment
US11790675B2 (en) Recognition of handwritten text via neural networks
Burie et al. ICFHR2016 competition on the analysis of handwritten text in images of balinese palm leaf manuscripts
CN109033978B (en) Error correction strategy-based CNN-SVM hybrid model gesture recognition method
Joshi et al. Deep learning based Gujarati handwritten character recognition
CN107220655A (en) A kind of hand-written, printed text sorting technique based on deep learning
Banumathi et al. Handwritten Tamil character recognition using artificial neural networks
Choudhury et al. Handwritten bengali numeral recognition using hog based feature extraction algorithm
Sampath et al. Decision tree and deep learning based probabilistic model for character recognition
Costa Filho et al. A fully automatic method for recognizing hand configurations of Brazilian sign language
CN113901952A (en) Print form and handwritten form separated character recognition method based on deep learning
Zhuang et al. A handwritten Chinese character recognition based on convolutional neural network and median filtering
Zhang et al. OCR with the Deep CNN Model for Ligature Script‐Based Languages like Manchu
Vinokurov Using a convolutional neural network to recognize text elements in poor quality scanned images
Kamble et al. Geometrical features extraction and knn based classification of handwritten marathi characters
Nandhini et al. Sign language recognition using convolutional neural network
Anggraeny et al. Texture feature local binary pattern for handwritten character recognition
US20220027662A1 (en) Optical character recognition using specialized confidence functions
Zhou et al. Morphological Feature Aware Multi-CNN Model for Multilingual Text Recognition.
Jameel et al. A REVIEW ON RECOGNITION OF HANDWRITTEN URDU CHARACTERS USING NEURAL NETWORKS.
Ajao et al. Yoruba handwriting word recognition quality evaluation of preprocessing attributes using information theory approach

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant