CN110781304A - Sentence coding method using word information clustering - Google Patents

Sentence coding method using word information clustering

Info

Publication number
CN110781304A
CN110781304A (application CN201911039124.XA)
Authority
CN
China
Prior art keywords
capsule
target
word
layer
original
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911039124.XA
Other languages
Chinese (zh)
Other versions
CN110781304B (en)
Inventor
曹杰
郭翔
王有权
申冬琴
李秀怡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunjing Business Intelligence Research Institute Nanjing Co Ltd
Nanjing University of Finance and Economics
Original Assignee
Yunjing Business Intelligence Research Institute Nanjing Co Ltd
Nanjing University of Finance and Economics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yunjing Business Intelligence Research Institute Nanjing Co Ltd and Nanjing University of Finance and Economics
Priority to CN201911039124.XA (CN110781304B)
Publication of CN110781304A
Application granted
Publication of CN110781304B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3347Query execution using vector based model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a sentence coding method using word information clustering. In one embodiment, each word in a sentence sequence of a specific length is mapped into a word vector space to obtain the word vector of each word; a coding vector is obtained for each word vector, and each coding vector is non-linearly squashed to obtain a capsule; the resulting capsules form an original capsule layer, and a capsule protocol algorithm extracts the semantic information of words with specific semantic features from the original capsule layer to form a first target capsule layer; the capsule protocol algorithm then performs information conversion on the first target capsules in the first target capsule layer to form a second target capsule layer whose number of capsules equals the number of classes. Because the capsule protocol algorithm transfers information according to the different requirements that the target capsules place on the original capsules, longer sentence features can be obtained and the accuracy of sentence classification is effectively improved.

Description

Sentence coding method using word information clustering
Technical Field
The invention relates to the technical field of information clustering, in particular to a sentence coding method utilizing word information clustering.
Background
Deep learning has made major breakthroughs in the natural language field by performing deep semantic modeling of text; however, learning to express high-quality features remains a great challenge. Approaches range from extracting local sequence features of a sentence with n-gram convolutions and extracting the important features of each local region with a max-pooling layer, to modeling the text sequence with an RNN. Compared with an RNN, convolution focuses more on extracting local sequence features, but, constrained by the n-gram width, it does not easily capture longer sentence features; an RNN can capture longer sentence features, but its extraction of sentence features is not as strong as convolution's.
Disclosure of Invention
In view of the above, embodiments of the present application provide a sentence encoding method using word information clustering.
In a first aspect, the present invention provides a sentence encoding method using word information clustering, including:
mapping each word in the sentence sequence with the specific length into a word vector space, and acquiring a word vector of each word;
acquiring a coding vector of each word vector and performing non-linear squashing on each coding vector to obtain a capsule;
obtaining a plurality of capsules to form an original capsule layer, extracting semantic information of words with specific semantic features from the original capsule layer by utilizing a capsule protocol algorithm, and forming a first target capsule layer;
and performing information conversion on the first target capsules in the first target capsule layer by utilizing a capsule protocol algorithm to form a second target capsule layer whose number of capsules equals the number of classes.
Optionally, obtaining the coding vector of each word in the word vector space includes: inputting the word vector of each word into a bi-directional LSTM (BiLSTM) model to obtain, respectively, the forward-propagated sentence sequence information h_i^f and the backward-propagated sequence information h_i^b, and then concatenating the two vectors to form the required coding vector h_i:

h_i^f = LSTM_f(x_i, h_{i-1}^f)
h_i^b = LSTM_b(x_i, h_{i+1}^b)
h_i = [h_i^f ; h_i^b]

Thus, the vector output formed by the BiLSTM encoding is:

H = [h_1, h_2, …, h_L].
Optionally, obtaining a plurality of capsules to form the original capsule layer comprises:

P = [p_1, p_2, …, p_L]
s_i = σ(w_s p_i + b_s)
k_i = tanh(w_k p_i + b_k)
u_i = s_i · k_i

where P denotes the set of original capsules formed by the coding layer, p_i denotes the i-th capsule of the original capsule layer, w_s denotes the contribution matrix parameter, b_s the bias parameter, and σ the sigmoid activation function; s_i = σ(w_s p_i + b_s) forms the supply gate of the original capsule i. w_k denotes the effective-value matrix of the original capsule and b_k the corresponding bias; k_i = tanh(w_k p_i + b_k) obtains the effective value of the original capsule i, and u_i = s_i · k_i forms the value that capsule i can contribute.
Optionally, the first target capsule comprises:

Y = [y_1, y_2, …, y_m]
n_j = σ(w_n y_j + b_n)
c_j = tanh(w_c y_j + b_c)
v_j = n_j · c_j

where Y denotes the set of first target capsules, y_j denotes the j-th capsule in the first target capsule layer, w_n denotes the demand matrix parameter, b_n the bias parameter, and σ the sigmoid activation function; n_j = σ(w_n y_j + b_n) forms the demand gate of the first target capsule j. w_c denotes the state matrix parameter and b_c the bias parameter; c_j = tanh(w_c y_j + b_c) forms the current state value of the first target capsule j, and v_j = n_j · c_j forms the content value required by capsule j in its current state.
Optionally, extracting the semantic information of words with specific semantic features from the original capsule layer by using the capsule protocol algorithm to form the first target capsule layer includes:

f_ij = u_i · v_j
F_ij = Σ_d f_ij[d]
a_ij = softmax(F_ij)
y'_j = c_j + Σ_i a_ij · u_i

where f_ij = u_i · v_j represents the similarity relationship between the information the original capsule i can provide and the information the first target capsule j requires; F_ij represents the magnitude of the calculated similarity; F_ij is normalized by a softmax function to form a_ij, which represents the amount of information converted from the original capsule i to the first target capsule j; and the initialized state value c_j of the first target capsule j and the values absorbed from each capsule in the original capsule layer are added to form the new state value of the first target capsule.
Optionally, the method further comprises:
representing the probability of the content characterized by each second target capsule by the vector length of the second target capsule; calculating the vector length of each second target capsule by using the L2 norm; and determining the final class of each second target capsule according to the vector length of each second target capsule.
Optionally, the method further comprises: calculating the text loss of the capsules in the classification layer using a margin loss function.
Optionally, the calculating the textual loss of the classification layer capsule using the interval loss function includes:
L e=T emax(0,m +-||v e||) 2+λ(1-T e)max(0,||v e||-m -) 2
L eis the loss value, T, of the e-th capsule of the classification layer eFor indicating the function, the value is 1 or 0, when in class e, T eIs 1, otherwise is 0, m +=0.9,λ=0.5,m -=0.1,m +Is an upper bound, m -Is the lower bound. With total loss of individual capsules in separate layersThe sum of the losses.
In one embodiment, the sentence coding method using word information clustering encodes the words with a BiLSTM network in a sequence-based manner, and uses the capsule protocol algorithm to transfer information according to the different requirements that the target capsules place on the original capsules; that is, the features of the high-level sentence capsules are formed from the features that each word capsule can provide. The sentence classification accuracy is thereby effectively improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flow chart of a sentence encoding method using word information clustering according to the present invention.
Detailed Description
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Inspired by the capsule network, a high-level capsule network is proposed together with an algorithm that guides the information transfer of the low-level capsule network: a BiLSTM network first encodes the words in a sequence-based manner, and the proposed capsule protocol algorithm then transfers information according to the different requirements that the high-level capsules place on the low-level capsules; that is, the features of each high-level capsule are formed from the features that each word capsule can provide.
FIG. 1 is a flow chart of the sentence encoding method using word information clustering according to the present invention. As shown in FIG. 1, the method comprises the following steps:
step S101: mapping each word in the sentence sequence with the specific length into a word vector space, and acquiring a word vector of each word;
In the word vector embedding layer, for a given sentence sequence of a specific length S = [w_1, w_2, w_3, …, w_L], each w_i is a symbol in one-hot representation, so the direct relationships between words cannot be computed and the representation cannot be applied directly to a neural network model. Therefore, the first step is to map each word into a d-dimensional word vector space, so that the relationships between words become available and the words can serve as the input of the neural network model:

X = [x_1, x_2, x_3, …, x_L]    (1)
the word vectors of the word vector space are generated by random initialization.
Step S102: in the word vector space, obtaining the coding vector of each word vector and performing non-linear squashing on each coding vector to obtain a capsule;
In the word vector space, each x_i is independent of the other words in the sentence X. In general, semantic understanding of a sentence requires the dependency relationships presented by each word in the sentence. To obtain the dependency relationships between the words, a bi-directional LSTM (BiLSTM) is adopted: each word x_i in the sentence X is input, and its forward-propagated sentence sequence information h_i^f and backward-propagated sequence information h_i^b are obtained respectively:

h_i^f = LSTM_f(x_i, h_{i-1}^f)    (2)
h_i^b = LSTM_b(x_i, h_{i+1}^b)    (3)

The two vectors are then concatenated in the coding layer to form the required coding vector h_i:

h_i = [h_i^f ; h_i^b]    (4)

Thus, the vector output formed by the BiLSTM encoding is:

H = [h_1, h_2, …, h_L]    (5)
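The BiLSTM encoding of equations (2)–(5) might be sketched as below; the hidden size and the use of PyTorch's nn.LSTM with bidirectional=True are assumptions made for illustration.

    import torch
    import torch.nn as nn

    embed_dim, hidden_dim, L = 128, 64, 23               # assumed sizes
    X = torch.randn(1, L, embed_dim)                     # word vectors from the embedding layer

    # bidirectional=True runs a forward and a backward LSTM and concatenates their
    # hidden states at each position, i.e. h_i = [h_i^f ; h_i^b] as in equation (4).
    bilstm = nn.LSTM(input_size=embed_dim, hidden_size=hidden_dim,
                     batch_first=True, bidirectional=True)

    H, _ = bilstm(X)                                     # (1, L, 2*hidden_dim): H = [h_1, ..., h_L]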
step S103: the method comprises the steps of obtaining a plurality of capsules to form an original capsule layer, extracting semantic information of words with specific semantic features from the original capsule layer by utilizing a capsule protocol algorithm, and forming a first target capsule layer.
Each h_i formed by the coding layer is passed through the non-linear squashing function of equation (6) (reproduced as an image in the original publication) to form a capsule p_i, yielding the original capsule layer P = [p_1, p_2, …, p_L]. The coding layer gives every word in the sentence a direct or indirect dependency relationship, and thus each p_i carries certain semantic information.
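A sketch of the squashing step, assuming the standard capsule-network squash non-linearity (the exact form of equation (6) is not reproduced in the text, so this formula is an assumption):

    import torch

    def squash(h, dim=-1, eps=1e-8):
        # Non-linearly squash each vector so its length lies in [0, 1)
        # while preserving its direction (assumed standard capsule squash).
        sq_norm = (h ** 2).sum(dim=dim, keepdim=True)
        return (sq_norm / (1.0 + sq_norm)) * h / torch.sqrt(sq_norm + eps)

    H = torch.randn(1, 23, 128)   # coding vectors h_1..h_L from the BiLSTM
    P = squash(H)                 # original capsule layer P = [p_1, ..., p_L]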
Sentences often contain words that are not useful for the final task, and these words form useless semantic information. Because such useless semantic information weakens the information of the important words, it needs to be removed in the low-level capsule layer. Specifically, the semantic information of words that are useless for the final task is removed from the original capsule layer; that is, the semantic information of the important words that contribute to the final task is extracted from the original capsule layer to form the target capsule layer.
The semantic information of words with specific semantic features is extracted from the original capsule layer with the capsule protocol algorithm to form the first target capsule layer. In particular, the coding layer forms the original set of capsules P:

P = [p_1, p_2, …, p_L]    (7)

Then the supply gate s_i of the original capsule i is formed:

s_i = σ(w_s p_i + b_s)    (8)

where p_i denotes the i-th capsule of the original capsule layer, w_s denotes the contribution matrix parameter, b_s the bias parameter, and σ the sigmoid activation function.

Further, the effective-value gate k_i of the original capsule i is formed:

k_i = tanh(w_k p_i + b_k)    (9)

where w_k denotes the effective-value matrix of the original capsule and b_k the bias value.

The supply gate s_i of the original capsule i and the effective-value gate k_i of the original capsule i are multiplied element-wise to form the contributable value u_i of capsule i:

u_i = s_i · k_i    (10)
For the first target capsules, a set of first target capsules Y is randomly generated in the initial state:

Y = [y_1, y_2, …, y_m]    (11)

Then the demand gate n_j of the first target capsule j is formed:

n_j = σ(w_n y_j + b_n)    (12)

where y_j denotes the j-th capsule in the first target capsule layer, w_n denotes the demand matrix parameter, b_n the bias parameter, and σ the sigmoid activation function.

Further, the current state value c_j of the first target capsule j is obtained:

c_j = tanh(w_c y_j + b_c)    (13)

where w_c denotes the state matrix parameter and b_c the bias parameter.

The demand gate n_j of the first target capsule j and its current state value c_j are multiplied element-wise to obtain the content value v_j required by the first target capsule j in its current state:

v_j = n_j · c_j    (14)
Multiplying formula (10) by formula (14) yields the information conversion function from the original capsule to the first target capsule:

f_ij = u_i · v_j    (15)

In formula (15), the contributable value of the original capsule is multiplied element-wise by the content value of the first target capsule, representing the similarity relationship between the information the original capsule i can provide and the information the first target capsule j requires.

Further, the magnitude of the similarity in formula (15) is calculated:

F_ij = Σ_d f_ij[d]    (16)

F_ij is normalized by a softmax function to form a_ij, which indicates the amount of information transferred from the original capsule i to the first target capsule j:

a_ij = softmax(F_ij)    (17)

Finally, the initialized state value c_j of the first target capsule j and the values absorbed from each capsule in the original capsule layer are added to form the new state value of the first target capsule:

y'_j = c_j + Σ_i a_ij · u_i    (18)
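Putting equations (7)–(18) together, one pass of the capsule protocol algorithm could be sketched as follows. This is a best-effort reading of the text: the exact forms of equations (16)–(18), the softmax axis, the parameter shapes and all names are assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CapsuleProtocol(nn.Module):
        # One protocol step from L original capsules to m target capsules (sketch).
        def __init__(self, dim, num_targets):
            super().__init__()
            self.w_s = nn.Linear(dim, dim)   # contribution matrix + bias, eq. (8)
            self.w_k = nn.Linear(dim, dim)   # effective-value matrix + bias, eq. (9)
            self.w_n = nn.Linear(dim, dim)   # demand matrix + bias, eq. (12)
            self.w_c = nn.Linear(dim, dim)   # state matrix + bias, eq. (13)
            self.y = nn.Parameter(torch.randn(num_targets, dim))  # initial targets, eq. (11)

        def forward(self, P):                                          # P: (L, dim)
            u = torch.sigmoid(self.w_s(P)) * torch.tanh(self.w_k(P))   # eqs. (8)-(10)
            c = torch.tanh(self.w_c(self.y))                           # eq. (13)
            v = torch.sigmoid(self.w_n(self.y)) * c                    # eqs. (12), (14)
            f = u.unsqueeze(1) * v.unsqueeze(0)                        # eq. (15): (L, m, dim)
            sim = f.sum(dim=-1)                                        # eq. (16), assumed sum over dims
            a = F.softmax(sim, dim=0)                                  # eq. (17), assumed softmax over i
            return c + torch.einsum('lm,ld->md', a, u)                 # eq. (18), assumed weighted sum

    layer = CapsuleProtocol(dim=128, num_targets=16)
    Y_new = layer(torch.randn(23, 128))    # 16 first target capsules of dimension 128

Applying a second CapsuleProtocol whose num_targets equals the number of classes would then give the second target capsule layer described in step S104 below.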
Step S104: performing information conversion on the first target capsules in the first target capsule layer by utilizing the capsule protocol algorithm to form a second target capsule layer whose number of capsules equals the number of classes;
In one possible embodiment, the capsule protocol algorithm is first applied to the m original capsules to form n first target capsules; the capsule protocol algorithm is then applied to the n first target capsules to form a second target capsule layer with L classes, where m, n and L are natural numbers greater than or equal to 1.
In one possible embodiment, the number of classifications L is set in advance.
In the classification layer, the vector length of each second target capsule is calculated, and this length represents the probability of the content characterized by that capsule. The class is determined from the vector lengths of the second target capsules: specifically, the vector length of each second target capsule is computed with the L2 norm, and the input is assigned to the class whose capsule has the largest vector length.
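For example, the class decision from the second target capsule layer might be computed as in the following sketch (the tensor layout is assumed):

    import torch

    # Second target capsule layer: one capsule vector per class, e.g. 4 classes of dimension 128.
    second_target = torch.randn(4, 128)

    lengths = second_target.norm(p=2, dim=-1)    # L2 norm (vector length) of each class capsule
    predicted_class = lengths.argmax().item()    # class whose capsule vector is longest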
In one possible embodiment, the text loss of each capsule in the classification layer is calculated using a margin loss function:

L_e = T_e · max(0, m+ − ||v_e||)² + λ(1 − T_e) · max(0, ||v_e|| − m−)²    (19)

where L_e is the loss value of the e-th capsule of the classification layer; T_e is an indicator that takes the value 1 when the sample belongs to class e and 0 otherwise; m+ = 0.9, λ = 0.5 and m− = 0.1, with m+ the upper bound and m− the lower bound. The total loss is the sum of the losses of the individual capsules of the classification layer.
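Equation (19) can be sketched in code as below, using the stated constants m+ = 0.9, m− = 0.1 and λ = 0.5; the batching and tensor shapes are assumptions.

    import torch

    def margin_loss(lengths, targets, m_pos=0.9, m_neg=0.1, lam=0.5):
        # lengths: (batch, num_classes) capsule vector lengths; targets: (batch,) class indices.
        T = torch.zeros_like(lengths).scatter_(1, targets.unsqueeze(1), 1.0)   # indicator T_e
        pos = T * torch.clamp(m_pos - lengths, min=0) ** 2
        neg = lam * (1 - T) * torch.clamp(lengths - m_neg, min=0) ** 2
        return (pos + neg).sum(dim=1).mean()     # sum over class capsules, average over the batch

    loss = margin_loss(torch.rand(8, 4), torch.randint(0, 4, (8,)))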
In one possible embodiment, the effect of the model proposed in the present application was evaluated by experiments on three common data sets.
The 3 common data sets include:
Subj: a subjectivity data set; the task is to classify a sentence as subjective or objective.
TREC: the TREC question data set; the task is to classify a question into 6 categories (about persons, locations, numerical information, etc.).
AG's news: news topic classification.
The model proposed in the present application was trained on the Subj, TREC and AG's news data sets as described in Table 1:
TABLE 1
Data set    C (number of classes)    l (sentence length)    train      test
Subj        2                        23                     9000       1000
TREC        6                        10                     5452       500
AG's        4                        233                    120000     7600
Using the trained model, the three public data sets (Subj, TREC, AG's news) were tested and the results shown in Table 2 were obtained:
TABLE 2
Data set    Accuracy
TREC        98.1%
AG's        91.4%
Subj        91.43%
Experiments were also performed on the three public data sets (Subj, TREC, AG's news) using other models, with results as shown in Table 3:
TABLE 3
[Table 3, reproduced as an image in the original publication, lists the classification accuracy of other models on the same data sets.]
Comparing the classification accuracy on the same common data sets with that of the models in Table 3 shows that the model provided by the present application achieves higher accuracy.
It should be noted that, in the experiments of the embodiments, the model proposed in the present application is not limited to classifying the three common data sets (Subj, TREC, AG's news); these three data sets are only one specific implementation in the embodiments of the application.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above-mentioned embodiments, objects, technical solutions and advantages of the present invention are further described in detail, it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present invention should be included in the scope of the present invention.

Claims (8)

1. A sentence encoding method using word information clustering, comprising:
mapping each word in the sentence sequence with the specific length into a word vector space, and acquiring a word vector of each word;
acquiring a coding vector of each word vector and performing non-linear squashing on each coding vector to obtain a capsule;
obtaining a plurality of capsules to form an original capsule layer, and extracting semantic information of words with specific semantic features from the original capsule layer by using a capsule protocol algorithm to form a first target capsule layer;
and performing information conversion on the first target capsules in the first target capsule layer by utilizing a capsule protocol algorithm to form a second target capsule layer whose number of capsules equals the number of classes.
2. The method of claim 1, wherein obtaining the coding vector of each word in the word vector space comprises: inputting the word vector of each word into a bi-directional LSTM (BiLSTM) model to obtain, respectively, the forward-propagated sentence sequence information h_i^f and the backward-propagated sequence information h_i^b, and then concatenating the two vectors to form the required coding vector h_i:

h_i^f = LSTM_f(x_i, h_{i-1}^f)
h_i^b = LSTM_b(x_i, h_{i+1}^b)
h_i = [h_i^f ; h_i^b]

Thus, the vector output formed by the BiLSTM encoding is:

H = [h_1, h_2, …, h_L].
3. The method of claim 1, wherein obtaining a plurality of capsules to form the original capsule layer comprises:

P = [p_1, p_2, …, p_L]
s_i = σ(w_s p_i + b_s)
k_i = tanh(w_k p_i + b_k)
u_i = s_i · k_i

where P denotes the set of original capsules formed by the coding layer, p_i denotes the i-th capsule of the original capsule layer, w_s denotes the contribution matrix parameter, b_s the bias parameter, and σ the sigmoid activation function; s_i = σ(w_s p_i + b_s) forms the supply gate of the original capsule i. w_k denotes the effective-value matrix of the original capsule and b_k the corresponding bias; k_i = tanh(w_k p_i + b_k) obtains the effective value of the original capsule i, and u_i = s_i · k_i forms the value that capsule i can contribute.
4. The method of claim 1, wherein the first target capsule comprises:

Y = [y_1, y_2, …, y_m]
n_j = σ(w_n y_j + b_n)
c_j = tanh(w_c y_j + b_c)
v_j = n_j · c_j

where Y denotes the set of first target capsules, y_j denotes the j-th capsule in the first target capsule layer, w_n denotes the demand matrix parameter, b_n the bias parameter, and σ the sigmoid activation function; n_j = σ(w_n y_j + b_n) forms the demand gate of the first target capsule j. w_c denotes the state matrix parameter and b_c the bias parameter; c_j = tanh(w_c y_j + b_c) forms the current state value of the first target capsule j, and v_j = n_j · c_j forms the content value required by capsule j in its current state.
5. The method of claim 1, wherein extracting semantic information of words with specific semantic features from the original capsule layer using a capsule protocol algorithm to form a first target capsule layer comprises:

f_ij = u_i · v_j
F_ij = Σ_d f_ij[d]
a_ij = softmax(F_ij)
y'_j = c_j + Σ_i a_ij · u_i

where f_ij = u_i · v_j represents the similarity relationship between the information the original capsule i can provide and the information the first target capsule j requires; F_ij represents the magnitude of the calculated similarity; F_ij is normalized by a softmax function to form a_ij, which represents the amount of information converted from the original capsule i to the first target capsule j; and the initialized state value c_j of the first target capsule j and the values absorbed from each capsule in the original capsule layer are added to form the new state value of the first target capsule.
6. The method of claim 1, further comprising:
representing the probability of the content characterized by each second target capsule by the vector length of the second target capsule; calculating the vector length of each second target capsule by using the L2 norm; and determining the final class of each second target capsule according to the vector length of each second target capsule.
7. The method of claim 1, further comprising: calculating the text loss of the capsules in the classification layer using a margin loss function.
8. The method of claim 7, wherein calculating the text loss of the classification-layer capsules using the margin loss function comprises:

L_e = T_e · max(0, m+ − ||v_e||)² + λ(1 − T_e) · max(0, ||v_e|| − m−)²

where L_e is the loss value of the e-th capsule of the classification layer; T_e is an indicator that takes the value 1 when the sample belongs to class e and 0 otherwise; m+ = 0.9, λ = 0.5 and m− = 0.1, with m+ the upper bound and m− the lower bound. The total loss is the sum of the losses of the individual capsules of the classification layer.
CN201911039124.XA 2019-10-29 2019-10-29 Sentence coding method using word information clustering Active CN110781304B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911039124.XA CN110781304B (en) 2019-10-29 2019-10-29 Sentence coding method using word information clustering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911039124.XA CN110781304B (en) 2019-10-29 2019-10-29 Sentence coding method using word information clustering

Publications (2)

Publication Number Publication Date
CN110781304A true CN110781304A (en) 2020-02-11
CN110781304B CN110781304B (en) 2023-09-26

Family

ID=69387404

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911039124.XA Active CN110781304B (en) 2019-10-29 2019-10-29 Sentence coding method using word information clustering

Country Status (1)

Country Link
CN (1) CN110781304B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109241283A (en) * 2018-08-08 2019-01-18 广东工业大学 A kind of file classification method based on multi-angle capsule network
CN109410917A (en) * 2018-09-26 2019-03-01 河海大学常州校区 Voice data classification method based on modified capsule network
CN109299262A (en) * 2018-10-09 2019-02-01 中山大学 A kind of text implication relation recognition methods for merging more granular informations
CN109410575A (en) * 2018-10-29 2019-03-01 北京航空航天大学 A kind of road network trend prediction method based on capsule network and the long Memory Neural Networks in short-term of nested type
CN110046249A (en) * 2019-03-11 2019-07-23 中国科学院深圳先进技术研究院 Training method, classification method, system, equipment and the storage medium of capsule network
CN110046671A (en) * 2019-04-24 2019-07-23 吉林大学 A kind of file classification method based on capsule network
CN110188195A (en) * 2019-04-29 2019-08-30 苏宁易购集团股份有限公司 A kind of text intension recognizing method, device and equipment based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GEOFFREY E. HINTON: "Dynamic Routing Between Capsules" *
YUSHI YAO: "Bi-directional LSTM Recurrent Neural Network for Chinese Word Segmentation" *

Also Published As

Publication number Publication date
CN110781304B (en) 2023-09-26

Similar Documents

Publication Publication Date Title
CN111368996B (en) Retraining projection network capable of transmitting natural language representation
US10824949B2 (en) Method and system for extracting information from graphs
CN110347835B (en) Text clustering method, electronic device and storage medium
US10824653B2 (en) Method and system for extracting information from graphs
EP3180742B1 (en) Generating and using a knowledge-enhanced model
CN107704625B (en) Method and device for field matching
CN109885824B (en) Hierarchical Chinese named entity recognition method, hierarchical Chinese named entity recognition device and readable storage medium
CN109471946B (en) Chinese text classification method and system
CN111221944B (en) Text intention recognition method, device, equipment and storage medium
CN110969020A (en) CNN and attention mechanism-based Chinese named entity identification method, system and medium
CN108108354B (en) Microblog user gender prediction method based on deep learning
CN105139237A (en) Information push method and apparatus
CN111602128A (en) Computer-implemented method and system for determining
CN112328742A (en) Training method and device based on artificial intelligence, computer equipment and storage medium
JP6738769B2 (en) Sentence pair classification device, sentence pair classification learning device, method, and program
CN112988963B (en) User intention prediction method, device, equipment and medium based on multi-flow nodes
CN111651986B (en) Event keyword extraction method, device, equipment and medium
CN111191457A (en) Natural language semantic recognition method and device, computer equipment and storage medium
CN113360654B (en) Text classification method, apparatus, electronic device and readable storage medium
CN110276396B (en) Image description generation method based on object saliency and cross-modal fusion features
WO2014073206A1 (en) Information-processing device and information-processing method
CN111145913A (en) Classification method, device and equipment based on multiple attention models
CN113705196A (en) Chinese open information extraction method and device based on graph neural network
CN111241843B (en) Semantic relation inference system and method based on composite neural network
CN113239668B (en) Keyword intelligent extraction method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant