CN114153974A - Character-level text classification method based on capsule network - Google Patents

Character-level text classification method based on capsule network

Info

Publication number
CN114153974A
CN114153974A
Authority
CN
China
Prior art keywords
character
level
text
capsule
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202111489903.7A
Other languages
Chinese (zh)
Inventor
郭欣
吴玉佳
季萌
张璇
董雷
陈瑛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sanda University
Original Assignee
Sanda University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sanda University filed Critical Sanda University
Priority to CN202111489903.7A priority Critical patent/CN114153974A/en
Publication of CN114153974A publication Critical patent/CN114153974A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/232Orthographic correction, e.g. spell checking or vowelisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a character-level text classification method based on a capsule network, which comprises: extracting character-level text features with convolutional layers; reconstructing the text features to obtain capsule vectors; acquiring spatial hierarchical relationship information between documents using the capsule vectors; and importing the spatial hierarchical relationship information into a data set for test classification and outputting the classification result. The method addresses the problem that existing text classification models fail to exploit the spatial hierarchical relationship information among features, which limits further improvement of their performance.

Description

Character-level text classification method based on capsule network
Technical Field
The invention relates to the technical field of text classification, in particular to a character-level text classification method based on a capsule network.
Background
Text classification automatically assigns documents to user-defined categories. Depending on the type of document processed, tasks include topic classification (e.g., sports, finance) and sentiment classification (e.g., sentiment ratings in product or movie reviews). By the granularity of the processing object, classification can also be divided into sentence-level or document-level classification. A further variant is aspect-level text classification, which classifies several aspects of a document: a single review may, for example, simultaneously carry a sentiment rating and a service-quality rating. To complete a text classification task, the user first predefines the categories, then extracts text features and constructs a text classifier; the classifier assigns a given text to one or more of the categories. Because text classification is widely applied in spam filtering, automatic question answering, information retrieval, sentiment analysis and personalized recommendation, it has quickly become a hot research problem in the field of natural language processing.
Existing character-level text classification methods have achieved competitive experimental results; some of these models even approach the performance of word-level text classification models that rely on pre-training tools. However, they use only convolution to extract local character-level features and never obtain the spatial hierarchical relationship information of the original character features, even though that information may encode character order or position cues that are beneficial to classification. Existing character-level text classification models therefore still have room for improvement. In fact, using characters rather than words as the neural network input preserves the most original textual feature information. Yet a purely convolutional network, however well it extracts expressive local character-level features, does not consider the spatial hierarchical relationships among those features, and that missing information can affect the classification result. If, on top of highly discriminative local character-level text features, the spatial hierarchical relationship information of those features could also be obtained, the classification ability of the text classification model could be further enhanced.
Disclosure of Invention
This section summarizes some aspects of embodiments of the invention and briefly introduces some preferred embodiments. Some simplifications or omissions may be made in this section, the abstract and the title of the application to avoid obscuring their purpose; such simplifications or omissions are not intended to limit the scope of the invention.
The present invention has been made in view of the above-mentioned problems occurring in the prior art.
Therefore, the technical problem to be solved by the invention is that existing character-level convolutional neural network text classification models lack spatial hierarchical relationship information among features, which limits further improvement of their classification performance.
In order to solve the above technical problem, the invention provides the following technical scheme: a character-level text classification method based on a capsule network, comprising the following steps:
extracting character-level text features using convolutional layers;
reconstructing the text features to obtain capsule vectors;
acquiring spatial hierarchical relationship information between documents using the capsule vectors;
and importing the spatial hierarchical relationship information into a data set for test classification, and outputting the classification result.
As a preferred solution of the character-level text classification method based on the capsule network of the present invention, wherein: before extracting the text features, constructing a character-level capsule network model, which comprises the following steps:
using 5 convolutional layers to extract local text features, obtaining character-level text features;
the character-level text features are abstract local text features of similar character combinations.
As a preferred solution of the character-level text classification method based on the capsule network of the present invention, wherein: the model also includes 2 convolutional layers for learning the spatial hierarchical relationships between features;
the extracted high-level local character-level text features are reconstructed into capsule vector representations to obtain an initial capsule layer;
the initial capsule layer packs 16 local character-level text features from adjacent positions into one whole, i.e., a 16-dimensional capsule vector.
As a preferred solution of the character-level text classification method based on the capsule network of the present invention, wherein: further comprising communicating the initial capsule layer with the digital capsule layer;
performing iterative training by using a routing algorithm to enable the capsule vector to acquire spatial hierarchical relationship information;
and obtaining a classification result according to the length of the digital capsule layer.
As a preferred solution of the character-level text classification method based on the capsule network of the present invention, wherein: extracting the text features comprises:
defining an original character-level document D = {T_1, T_2, …, T_n} containing c categories;
the document comprises n sentences T_d (1 ≤ d ≤ n);
each sentence T_d = {i_1, i_2, …, i_L} contains L characters (character length L), and each character i_l ∈ R^M is represented by an M-dimensional vector;
character-level text features are extracted using a convolution kernel W ∈ R^(H×M), whose width is M and height is H.
As a preferred solution of the character-level text classification method based on the capsule network of the present invention, wherein: when the text features are extracted, input and output operations are performed; the input is an original text sentence T_d ∈ R^(L×M), with input feature length L and width M;
each character is assigned a one-hot code of length M.
As a preferred solution of the character-level text classification method based on the capsule network of the present invention, wherein: the output is the final classification result ŷ obtained through a function f: R^(L×M) → {1, 2, …, c}, i.e., ŷ = f(T_d).
As a preferred solution of the character-level text classification method based on the capsule network of the present invention, wherein: the selection of the original text sentence comprises the following steps:
selecting the 128 characters with the highest weight using a character selection algorithm;
embedding the characters to form an M-dimensional representation of length 1024;
and converting each sample into a matrix of size 1024 × M and feeding it into the character-level capsule network model.
The invention has the beneficial effects that: the character-level capsule network model classifies character-level text; fully extracting character-level text features improves the data extraction precision; the capsule vectors capture the spatial position hierarchy information of the text features well; and the 7 convolutional layers used to extract character-level text features greatly improve the classification performance of the model.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. It is obvious that the drawings described below show only some embodiments of the present invention, and that those of ordinary skill in the art can obtain other drawings from them without inventive effort. Wherein:
fig. 1 is a diagram illustrating a character-level capsule network model in a character-level text classification method based on a capsule network according to an embodiment of the present invention;
fig. 2 is a schematic diagram illustrating classification of partial word fragments in a character-level text classification method based on a capsule network according to an embodiment of the present invention;
fig. 3 is a frame diagram of reconstructing local features of a text into capsule vector representations in a character-level text classification method based on a capsule network according to an embodiment of the present invention;
FIG. 4 is a graph comparing the effect of different numbers of characters on the model performance in the character-level text classification method based on capsule network according to an embodiment of the present invention;
fig. 5 is a comparison graph of the influence of character-level text features extracted by different network layer numbers on the model performance in the character-level text classification method based on the capsule network according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, the present invention may be practiced in ways other than those specifically described herein, and those skilled in the art can make similar generalizations without departing from the spirit of the invention; the invention is therefore not limited to the specific embodiments disclosed below.
Next, the present invention will be described in detail with reference to the drawings. The drawings are only examples and should not limit the scope of the present invention.
Further, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
Example 1
Referring to fig. 1, 2 and 3, for a first embodiment of the present invention, there is provided a character-level text classification method based on a capsule network, including:
and S1, extracting text features at character level by utilizing the convolution layer. The steps to be described are as follows:
referring to fig. 1, fig. 1 is a diagram of a character-level capsule network model for text classification, the input of the model being a character-level text and the output being a category of the text.
In fig. 1, the model is functionally divided into three parts:
(1) 5 convolutional layers for extracting local text features; in this embodiment these yield character-level text features, i.e., abstract local text features of similar character combinations;
(2) 2 convolutional layers for learning the spatial hierarchical relations among features; the extracted high-level local character-level text features are reconstructed into capsule vector representations to obtain an initial capsule layer, which packs 16 local character-level text features from adjacent positions into one whole, i.e., a 16-dimensional capsule vector;
(3) communication between the initial capsule layer and the digital capsule layer; this embodiment does not use a fully connected layer but performs iterative training with a routing algorithm, so that the capsule vectors acquire spatial hierarchical relationship information and thus stronger category representation ability; finally, the classification result is obtained according to the lengths of the capsules in the digital capsule layer.
In the character-level capsule network model, an original character-level document D = {T_1, T_2, …, T_n} containing c categories is given; the document includes n sentences T_d (1 ≤ d ≤ n), each sentence T_d = {i_1, i_2, …, i_L} contains L characters (character length L), and each character i_l ∈ R^M is represented by an M-dimensional vector. A convolution kernel W ∈ R^(H×M) is used to extract character-level text features; the width of the kernel equals the character dimension M, since each character is indivisible, and its height is H. The character-level capsule network model CharCaps is accordingly defined as follows:
Input: the input of the model is an original text sentence T_d ∈ R^(L×M), with input feature length L and width M; one-hot coding assigns each character a code of length M.
Output: the whole training process of the model defines a function f: R^(L×M) → {1, 2, …, c}, and the output of the model is the final class ŷ. Thus, any text sentence T_d ∈ R^(L×M) can be passed through the function f to obtain the final classification result ŷ = f(T_d).
In this embodiment, L is generally set to 1024, meaning that 1024 characters already capture most of the text feature information of a sample; samples shorter than 1024 are zero-padded and samples longer than 1024 are truncated. The 128 characters with the highest weight are selected by a character selection algorithm and form the input alphabet of each sample: any text sentence T_d ∈ R^(L×M) is an M-dimensional embedding of length 1024 composed of at most 128 distinct characters. Finally, each sample is converted into a matrix of size 1024 × M and fed into the CharCaps model.
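As an illustration of this input construction, the following minimal Python sketch builds the 1024 × M one-hot matrix for one sample. The vocabulary shown (letters and digits only), the lower-casing and all names are illustrative assumptions; the embodiment selects its 128 characters with the weighting algorithm described later.

    import numpy as np

    L_MAX = 1024   # fixed sample length from the embodiment
    M = 128        # one-hot code length / embedding width
    VOCAB = list("abcdefghijklmnopqrstuvwxyz0123456789")  # plus the 92 selected special symbols in practice
    CHAR_TO_IDX = {ch: i for i, ch in enumerate(VOCAB)}

    def encode_sample(text):
        """Convert one raw text sample into a 1024 x M one-hot matrix."""
        mat = np.zeros((L_MAX, M), dtype=np.float32)
        for pos, ch in enumerate(text.lower()[:L_MAX]):  # truncate beyond 1024 characters
            idx = CHAR_TO_IDX.get(ch)
            if idx is not None:                          # characters outside the kept 128 are skipped
                mat[pos, idx] = 1.0
        return mat                                       # rows past the end of the text stay zero (padding)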
As shown in fig. 1, this embodiment provides 5 convolutional layers for extracting character-level text features. Each convolution kernel W ∈ R^(H×M) performs a convolution operation on the text sequence T_d = {i_1, i_2, …, i_L} to obtain a local text feature y_n as follows:
y_n = f(W · T + b)
where b ∈ R is a bias term and f is the rectified linear unit (ReLU) activation function producing the activation feature.
Using multiple convolution kernels of width M and different heights H (typically H ∈ {2, 3, 4, 5}), multiple features are obtained, forming a set of feature maps Y as follows:
Y = [y_1, y_2, …, y_n]
The feature map Y serves as the input of the next layer. This embodiment uses 5 convolutional layers to extract local character-level text features from the character sequence; through these 5 convolution operations, word-like text features and prefix/suffix features are extracted, and even fragments of words, including some special-symbol features.
Preferably, the stride of each convolutional layer is set to 1, and the network weights are randomly initialized from a normal distribution. To reduce over-fitting of the model, a Dropout operation is inserted between the convolutional layers; Dropout prevents co-adaptation among neurons by randomly discarding some neurons during forward propagation, and the Dropout ratio is set to 0.22, a value determined through repeated experiments. The character-level capsule network optimizes the model by back-propagating the acquired gradients, with a learning rate set to control the optimization process.
It will be appreciated that the goal of the feature extraction layers is to convert the original character-level text data, through multiple convolutional layers, into fixed character combinations. These combinations may have no actual natural-language meaning, but because training is end-to-end, the feature extraction layers can extract character or special-symbol combinations that are useful for classification.
In fact, the convolution operation is a mechanism that simulates biological visual processing of images and is good at extracting local features from raw data. Natural language can be regarded as information similar to an image: unprocessed character text plays the role of raw pixel data, so the richest original text feature information can be preserved. Character-level text features are good at recognizing misspellings, because word fragments can be extracted and some tokens can also be recognized; referring to fig. 2, extracting the part of speech of certain words is enough to judge their sentiment level.
Preferably, the character-level capsule network CharCaps performs better on large data sets, because the larger the data set, the more sufficient the extracted features: complete words, word fragments, roots and affixes, and special symbols. Character-level text features not only reduce information loss but also adapt better to shorthand or misspelled words.
Further, in this embodiment, the 5 convolutional layers above extract local character-level text features. Next, text features carrying spatial position relationship information are further extracted from them, these high-level text features are reconstructed into capsule vector representations, and finally a text classification model is constructed that does not depend on a pre-trained word vector tool and can process multiple languages and special symbols, improving the performance of the character-level text classification model.
Specifically, the method comprises the following steps:
s2: and reconstructing the text features to obtain capsule vectors. Wherein, it is required to be noted that:
The second part of the model framework is the capsule layer. The local features extracted by the first 5 convolutional layers are recombined once, and 2 further convolutional layers extract more abstract text features; after recombination, 16-dimensional vector representations are constructed, forming the initial capsule layer. The recombination process is shown in the dashed box of fig. 3.
In fig. 3, digital capsule layers are set according to the number of output categories. No fully connected layer or pooling layer is placed between the capsule layers in this embodiment; instead, a dynamic routing algorithm transmits data between the initial capsule layer and the digital capsule layer, iterating to adjust the network parameters for effective classification. The goal of the dynamic routing algorithm is to find a set of mappings between the layer-l capsules (e.g., the initial capsule layer) and the layer-(l+1) capsules (e.g., the digital capsule layer); for example, in fig. 3, 256 local features may be reconstructed as 16 vectors of 16 dimensions.
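The recombination itself is essentially a reshape, followed by the squashing nonlinearity applied in the routing step below. A minimal NumPy sketch under the stated shapes (256 local features regrouped into 16 capsules of 16 dimensions; names are illustrative assumptions) is:

    import numpy as np

    def to_primary_capsules(features, capsule_dim=16):
        """Regroup a flat feature vector (..., 256) into capsule vectors (..., 16, 16)."""
        n_caps = features.shape[-1] // capsule_dim       # 256 // 16 = 16 capsules
        return features.reshape(features.shape[:-1] + (n_caps, capsule_dim))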
S3: acquiring spatial hierarchical relationship information between the documents using the capsule vectors. It should be noted that:
The routing algorithm is essentially the communication between two capsule layers and mainly replaces a fully connected layer; in the last capsule layer, the length of each capsule expresses the probability that the corresponding category exists. The routing algorithm looks for coefficients A_ij. In the initial state every A_ij equals 1/m (m being the number of capsules in the next layer), meaning that each next-layer capsule starts as an equally weighted sum of the previous-layer capsules; the routing algorithm then seeks the most appropriate weight coefficients. A_ij is obtained from a variable B_ij, whose initial value is 0, by the following formula:
A_ij = exp(B_ij) / Σ_k exp(B_ik)
Obtaining the coefficient AijThen, an intermediate variable s is obtained by calculationjThe calculation method is as follows:
Figure BDA0003398818840000082
After s_j is obtained, the capsule v_j of the digital capsule layer is computed from s_j as follows:
v_j = (||s_j||^2 / (1 + ||s_j||^2)) · (s_j / ||s_j||)
A new coefficient B_ij is then computed according to the following formula, which completes one routing iteration:
B_ij = B_ij + W_ij · u_i · v_j
where W_ij is a fixed shared weight matrix.
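Putting the four formulas together, one routing pass can be sketched in NumPy as follows; the capsule counts and the precomputed prediction vectors u_hat (the products W_ij · u_i) are illustrative assumptions.

    import numpy as np

    def squash(s, eps=1e-9):
        """v = (||s||^2 / (1 + ||s||^2)) * (s / ||s||), applied per capsule."""
        sq_norm = np.sum(s ** 2, axis=-1, keepdims=True)
        return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

    def dynamic_routing(u_hat, n_iters=3):
        """u_hat: (n_in, n_out, d) prediction vectors W_ij . u_i; returns v: (n_out, d)."""
        n_in, n_out, _ = u_hat.shape
        B = np.zeros((n_in, n_out))                               # B_ij initialized to 0
        for _ in range(n_iters):                                  # about 3 iterations work best
            A = np.exp(B) / np.exp(B).sum(axis=1, keepdims=True)  # A_ij = softmax of B_ij over output capsules
            s = (A[:, :, None] * u_hat).sum(axis=0)               # s_j = sum_i A_ij * u_hat_ij
            v = squash(s)                                         # digital-capsule outputs v_j
            B = B + (u_hat * v[None, :, :]).sum(axis=-1)          # B_ij += (W_ij . u_i) . v_j
        return v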
Usually, about three routing iterations yield the best-performing character-level capsule network. The last layer of the capsule network is the class layer, where the length of each capsule represents the probability of the corresponding class.
The length of the digital capsule layer's output represents the probability that a certain class exists. The coefficients A_ij are updated by the routing algorithm, but the other parameters of the network and the shared weight matrix W_ij are updated according to a loss function, defined as follows:
Loss_k = E_k · max(0, m+ − ||V_c||)^2 + λ(1 − E_k) · max(0, ||V_c|| − m−)^2
where m+ = 0.9 is the upper boundary, m− = 0.1 is the lower boundary, ||V_c|| (an L2 norm) is the probability that the input sample is judged as class c, and E_k indicates whether the category exists: E_k = 1 if it exists, otherwise E_k = 0.
For absent categories, the loss is weighted down by λ so that the activity vector lengths of all digital capsules are not shrunk; λ is set to 0.5 in this embodiment, and the total loss is the sum of the losses of all digital capsules.
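A NumPy sketch of this margin loss, using m+ = 0.9, m− = 0.1 and λ = 0.5 as above (array shapes and names are illustrative assumptions), is:

    import numpy as np

    def margin_loss(v_norms, labels_onehot, m_pos=0.9, m_neg=0.1, lam=0.5):
        """v_norms: (batch, c) capsule lengths ||V_c||; labels_onehot: (batch, c) indicators E_k."""
        present = labels_onehot * np.maximum(0.0, m_pos - v_norms) ** 2
        absent = lam * (1.0 - labels_onehot) * np.maximum(0.0, v_norms - m_neg) ** 2
        return (present + absent).sum(axis=1).mean()     # total loss = sum over all digital capsules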
Preferably, the input alphabet for each sample in the character-level capsule network model consists of 128 characters: 26 English letters (case-insensitive), 10 digits, and 92 special symbols selected from 1140 candidates. The conventional 26 letters and 10 digits are the most common and must be retained as important characters. The remaining candidates include mathematical symbols, unit symbols, arrow symbols, Greek letters, Russian letters, Chinese characters, Korean symbols, currency symbols, and so on. In fact, a large number of these characters occur very rarely and contribute little to classification; therefore, this embodiment follows the TF-IDF algorithm, sorts the special symbols by weight, and selects the 92 special symbols with the highest weights from the 1140 candidates in descending order.
The sorting idea is as follows: if a character appears very frequently in documents of one class and rarely in documents of other classes, it is considered to have a high weight and good class-distinguishing ability, making it suitable for classification. Therefore, for each special symbol, both its occurrence frequency and its inverse document frequency are considered, and their product yields the comprehensive weight w of the symbol; symbols with high weights are the ones to be retained. The weight is computed as follows:
w = n · log(N / Q)
where n is the number of times a special symbol appears in the documents, N represents the total number of document categories in the corpus, and Q is the number of documents containing the special symbol plus 1; the +1 avoids a zero denominator.
Preferably, the comprehensive weight of each special symbol is obtained from the above formula, and the 92 special symbols with the highest weights are retained in this embodiment; together with the 26 English letters and 10 digits, the total number of characters over all samples does not exceed 128.
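The weighting and selection step can be sketched as follows; the corpus representation, all names, and the reading of N as the total document count (the description above says document categories) are assumptions, while the weight w = n · log(N / Q) and the top-92 selection follow the description.

    import math

    def select_symbols(documents, candidates, keep=92):
        """documents: list of document strings; candidates: iterable of special symbols."""
        N = len(documents)
        weights = {}
        for sym in candidates:
            n = sum(doc.count(sym) for doc in documents)       # occurrence frequency
            Q = sum(1 for doc in documents if sym in doc) + 1  # +1 avoids a zero denominator
            weights[sym] = n * math.log(N / Q)                 # w = frequency x inverse document frequency
        return sorted(weights, key=weights.get, reverse=True)[:keep]  # keep the highest-weight symbols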
S4: importing the spatial hierarchical relationship information into a data set for test classification, and outputting the classification result.
Preferably, the character-level capsule network model classifies character-level text; fully extracting character-level text features improves the data extraction precision; the capsule vectors capture the spatial position hierarchy information of the text features well; and the 7 convolutional layers used to extract character-level text features greatly improve the classification performance of the model.
Example 2
Referring to fig. 4 and 5, a second embodiment of the present invention, which is different from the first embodiment, provides an experiment of a character-level text classification method based on a capsule network, including:
in this example, the method proposed by the present invention was evaluated by applying the model proposed by the present invention (i.e., the capsule network model proposed in example 1 above) to 5 public large reference datasets.
Specifically, since the processing objects are characters, preprocessing uses a one-hot encoding method, and each character representation in a sample becomes a 128-dimensional vector.
In this experiment, accuracy is used as the evaluation metric. Previous research shows that character-level text classification models generally perform well on large data sets, so this embodiment uses 5 large data sets to verify the proposed method. They are described below:
AG News: this data set has 4 categories drawn from the AG news article corpus, with 30000 training samples and 1900 test samples per category.
Yelp Review: this data set has 5 categories drawn from Yelp reviews (the 2015 Yelp Dataset Challenge); the rating stars (1 to 5) serve as categories, with 130000 training samples and 10000 test samples per category.
Dbpedia: this data set has up to 14 categories; DBpedia is a crowd-sourced community aimed at extracting structured information from Wikipedia. 40000 training samples and 5000 test samples were randomly selected for each category.
Yahoo Answers: this data set has 10 categories from the Yahoo Answers corpus, each containing 140000 training samples and 6000 test samples.
Amazon Review: this data set has 5 categories from Amazon reviews; the complete data set contains 600000 training samples and 130000 test samples per category.
Table 1: summary statistics of the dataset.
Dataset         Classes   Type        Train     Test
AG_News         4         Topic       120000    7600
Yahoo_Answers   10        Question    1400000   60000
Dbpedia         14        Topic       560000    70000
Yelp_Review     5         Sentiment   650000    50000
Amazon_Review   5         Sentiment   3000000   650000
Table 1 gives the statistics of the 5 data sets: the first column lists the data set names, the second the number of categories, the third the task type, and the fourth and fifth the numbers of training and test samples. This embodiment uses these 5 large data sets to test the performance of the model.
In this experiment, to verify the performance of the proposed algorithm, several recent baseline algorithms were selected for comparison, including word-level models that use pre-training and character-level classification models without pre-training. They are introduced as follows:
CharCNN: a classical convolutional network structure for character-level text classification; a 9-layer neural network containing 6 convolutional layers, and the first deep learning method for the character-level text classification task.
Word-based CNN: uses a pre-trained word2vec word vector tool as external knowledge, with embedding size 300 and 6 convolutional layers to extract text features.
Word-based LSTM: a recurrent neural network model using a 300-dimensional pre-trained word2vec word vector tool as external knowledge.
Word-based Capsule: a word-based capsule network for text classification, using a 300-dimensional pre-trained word2vec word vector tool as external knowledge; its structure follows the basic capsule network.
Hierarchical: a word-level classification model based on a pre-trained model; it takes knowledge from an external dictionary and maps it into the word vector space, reducing model parameters.
MixText: a word-level classification model based on a pre-trained model; it creates augmented training samples by interpolating text in a hidden space.
Structural neural attention: uses external knowledge in the form of topic categories to assist classification by introducing a deep classifier based on neural attention.
GP-Dense: a character-level text classification model that uses evolutionary deep learning techniques to improve model performance.
The experimental configuration was as follows:
GPU server configuration used for the experiments: two Xeon 4210 CPUs, 128 GB of memory, and two NVIDIA GV100 GPUs with 32 GB × 2 = 64 GB of GPU memory. The proposed algorithm was implemented in Python under the Ubuntu 16.04 operating system, based on the TensorFlow 1.9.0 and Keras 2.0.9 frameworks. For training, one-hot encoding is used for preprocessing and the character vector dimension is 128; the related hyper-parameters are shown in table 2.
Table 2: hyper-parameters on large datasets.
Table 2 gives the hyper-parameters used in the experiments, where b: batch size; lr: learning rate; fn: number of filters; ks: kernel size; r: number of routing iterations; d: Dropout ratio of the Dropout layer; e: number of training iterations.
In the experiments, 128-dimensional character embedding vectors are initialized. The AG News data set is processed with batch size 256, the Yahoo Answers and Dbpedia data sets with batch size 16, and the Yelp Review and Amazon Review data sets with batch size 64. An Adam optimizer is used, with an initial learning rate of 0.0005 for the model trained on the Dbpedia data set and 0.0001 for the other data sets. In the capsule layer, the two convolutional layers both have kernel size 3 and 256 kernels, and 3 routing iterations achieve the best performance. The 5 preceding convolutional layers used to extract character-level text features have kernel sizes 5, 4, 3, 3 and 2, respectively, each followed by Dropout regularization with ratio 0.22. Because the model is complex and the data sets are large, a high iteration count would mean very long training times; the number of iterations is therefore set to 1000 on AG News, 200 on Yahoo Answers and Dbpedia, and 400 on Yelp Review and Amazon Review.
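A hedged sketch of this training configuration is shown below; "model" is assumed to be the assembled CharCaps network and margin_loss a Keras-compatible version of the loss defined earlier, while the batch sizes and learning rates follow the description above and everything else is illustrative.

    import tensorflow as tf

    BATCH_SIZE = {"ag_news": 256, "yahoo_answers": 16, "dbpedia": 16,
                  "yelp_review": 64, "amazon_review": 64}

    def train(model, x_train, y_train, dataset="ag_news", epochs=1000):
        lr = 0.0005 if dataset == "dbpedia" else 0.0001   # initial learning rates from the embodiment
        model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                      loss=margin_loss)                   # margin loss from the previous sketch
        model.fit(x_train, y_train,
                  batch_size=BATCH_SIZE[dataset], epochs=epochs)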
Test results:
To provide a fair comparison with competing models, a series of experiments was conducted against both traditional pre-trained word-level models and non-pre-trained character-level classification models, selecting models that provide comparable and competitive results and reporting the results faithfully without any selective model reporting.
Table 3: and (5) experimental results.
The accuracy of the proposed model and of the comparison methods is shown in table 3; on each data set, the best result is shown in bold red, the second-best in bold italics, and results obtained without a pre-trained model are underlined.
Consider first the word-based models, which can use external knowledge to supplement the text feature representation and improve performance. On the AG News data set, the proposed model CharCaps achieves the best performance; the second-best, MixText, uses a pre-trained model while the proposed model does not. On the Yahoo Answers and Dbpedia data sets, the best result belongs to MixText with its pre-trained model, but the proposed method obtains the second-best result without any pre-training. On the Yelp Review data set, the best result is GP-Dense with 61.05% accuracy, while the proposed model is second with 60.9%. On the Amazon Review data set, Word-based LSTM obtains the best performance: its pre-trained word2vec embeddings capture the associations between words, and since the data set is large, the LSTM can extract text features with contextual information, reaching an accuracy of 59.43%. The proposed model achieves the second-best result on Amazon Review, with an accuracy of 58.9%.
Next, consider the behavior of each model without a pre-trained model. As the last three columns of table 3 show, the proposed method obtains the best performance on 4 of the 5 data sets when no pre-training is used, while on Yelp Review its 60.9% accuracy approaches the optimal 61.05%. This shows that the proposed method is effective: even without a pre-training tool, it extracts highly expressive text features on most data sets and thereby improves model performance.
It should be noted that the number of characters has a certain influence on the results. The traditional convolutional character-level text classification model uses only 69 characters (26 letters, 10 digits and 33 fixed special symbols) as model input and ignores all other characters. To analyze the influence of different character counts on the classification results, this embodiment runs experiments with 7 settings: 34, 64, 96, 128, 160, 224 and 256 characters. The overall experimental results are shown in fig. 4.
Referring to fig. 4, selecting 128 characters is optimal on 3 data sets (AG News, Yahoo Answers and Yelp Review), while 160 characters are optimal on the Dbpedia data set at 98.9%; with 128 characters, Dbpedia is close to that optimum at 98.3%. A possible reason is that Dbpedia is a very large data set containing many special symbols such as Chinese characters, so 160 characters retain more information than 128, but even more symbols are unnecessary and do not further improve accuracy. Amazon Review is optimal at 96 characters, reaching 58.9%, and close to optimal at 128 characters with 58.2%; a possible reason is that although Amazon Review is very large, it contains few special symbols, most samples consisting mainly of the 26 English letters.
Therefore, the number of characters to select must be determined according to the problem background: different backgrounds carry different information, so the number of characters that need to be retained also differs.
Unlike methods based on a plain convolutional neural network, the proposed method uses no pooling layer and uses ReLU as the activation function; up to 5 convolutional layers serve as the character-level text feature extraction layers, 2 convolutional layers in the capsule part further extract higher-level text features, and these higher-level features are then reconstructed into capsule vector representations.
In fact, the number of convolutional layers has a certain influence on model performance. To analyze the influence of depth on extracting character-level text features, this embodiment runs experiments with 1 to 7 layers; the overall experimental results are shown in fig. 5.
Referring to fig. 5, 5 convolutional layers extract character-level text features best: on the Dbpedia data set, 6 layers achieve the optimal result of 98.9% (with 5 layers reaching 98.1%), while the other 4 data sets (AG News, Yahoo Answers, Yelp Review and Amazon Review) are all optimal with 5 convolutional layers.
Thus, too many convolutional layers make the features overly complex, while too few cannot extract good character-level text features or extract them insufficiently. This differs from word-level models, which usually need only 2 to 3 convolutional layers to extract good text features. With characters as the processing object, the network in effect first uses 2 to 3 layers to extract word-like or root/affix-like text features, then 2 to 3 further layers to extract more expressive combinations of them, and finally 2 convolutional layers reconstruct the features into capsule vector text features for extracting information with spatial hierarchical relationships.
It should be noted that the above-mentioned embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, which should be covered by the claims of the present invention.

Claims (8)

1. A character-level text classification method based on a capsule network, characterized by comprising the following steps:
extracting character-level text features using convolutional layers;
reconstructing the text features to obtain capsule vectors;
acquiring spatial hierarchical relationship information between documents using the capsule vectors;
and importing the spatial hierarchical relationship information into a data set for test classification, and outputting the classification result.
2. The capsule network-based character-level text classification method of claim 1, characterized in that: before extracting the text features, constructing a character-level capsule network model, which comprises the following steps:
using 5 convolutional layers to extract local text features, obtaining character-level text features;
the character-level text features are abstract local text features of similar character combinations.
3. The capsule network-based character-level text classification method of claim 2, characterized in that: the model also includes 2 convolutional layers for learning the spatial hierarchical relationships between features;
the extracted high-level local character-level text features are reconstructed into capsule vector representations to obtain an initial capsule layer;
the initial capsule layer packs 16 local character-level text features from adjacent positions into one whole, i.e., a 16-dimensional capsule vector.
4. The capsule network-based character-level text classification method of claim 3, characterized in that: further comprising communicating the initial capsule layer with the digital capsule layer;
performing iterative training by using a routing algorithm to enable the capsule vector to acquire spatial hierarchical relationship information;
and obtaining a classification result according to the length of the digital capsule layer.
5. The capsule network-based character-level text classification method according to claim 1 or 4, characterized in that: extracting the text features comprises:
defining an original character-level document D = {T_1, T_2, …, T_n} containing c categories;
the document comprises n sentences T_d (1 ≤ d ≤ n);
each sentence T_d = {i_1, i_2, …, i_L} contains L characters (character length L), and each character i_l ∈ R^M is represented by an M-dimensional vector;
character-level text features are extracted using a convolution kernel W ∈ R^(H×M), whose width is M and height is H.
6. The capsule network-based character-level text classification method of claim 5, wherein: when the text features are extracted, input and output operations are performed; the input is an original text sentence T_d ∈ R^(L×M), with input feature length L and width M;
each character is assigned a one-hot code of length M.
7. The character-level text classification method based on the capsule network according to any one of claims 4 to 6, characterized in that: the output is the final classification result ŷ obtained through a function f: R^(L×M) → {1, 2, …, c}, i.e., ŷ = f(T_d).
8. The capsule network-based character-level text classification method of claim 7, wherein: the selection of the original text sentence comprises the following steps:
selecting 128 characters with the highest weight by using a character selection algorithm;
embedding the characters to form an M-dimensional representation of length 1024;
and converting each sample into a matrix of size 1024 × M and feeding it into the character-level capsule network model.
CN202111489903.7A 2021-12-08 2021-12-08 Character-level text classification method based on capsule network Withdrawn CN114153974A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111489903.7A CN114153974A (en) 2021-12-08 2021-12-08 Character-level text classification method based on capsule network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111489903.7A CN114153974A (en) 2021-12-08 2021-12-08 Character-level text classification method based on capsule network

Publications (1)

Publication Number Publication Date
CN114153974A true CN114153974A (en) 2022-03-08

Family

ID=80453352

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111489903.7A Withdrawn CN114153974A (en) 2021-12-08 2021-12-08 Character-level text classification method based on capsule network

Country Status (1)

Country Link
CN (1) CN114153974A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115495572A (en) * 2022-08-01 2022-12-20 广州大学 Auxiliary management method for depressed mood based on composite mood analysis
CN115495572B (en) * 2022-08-01 2023-05-26 广州大学 Auxiliary management method for depressed emotion based on compound emotion analysis
CN116030454A (en) * 2023-03-28 2023-04-28 中南民族大学 Text recognition method and system based on capsule network and multi-language model
CN117252153A (en) * 2023-11-17 2023-12-19 之江实验室 Method, apparatus, and storage medium for processing rich text data for large language model
CN117252153B (en) * 2023-11-17 2024-02-02 之江实验室 Method, apparatus, and storage medium for processing rich text data for large language model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20220308