CN110110323A - A kind of text sentiment classification method and device, computer readable storage medium - Google Patents
- Publication number
- CN110110323A (application CN201910285262.XA)
- Authority
- CN
- China
- Prior art keywords
- text
- feature extraction
- angle
- word
- layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
Abstract
This application discloses a text sentiment classification method and device, and a computer-readable storage medium. The method includes: obtaining the sentence context of a text through a pre-trained language model, the pre-trained language model being used to predict one or more words randomly masked in the text; randomly initializing an a×n angle embedding matrix, where a is the number of angles and n is the word-embedding dimension, and performing an attention-function calculation on the initialized angle embedding matrix and the sentence context to obtain a corrected angle embedding matrix; and performing classification on the corrected angle embedding matrix to obtain the sentiment of each angle of the text. By obtaining the sentence context through a pre-trained language model and computing an attention function over the angle embedding matrix and the sentence context, this application can handle multi-angle, multi-polarity sentiment analysis tasks without time-consuming feature extraction.
Description
Technical field
This application relates to, but is not limited to, the technical field of natural language processing (Natural Language Processing, NLP), and in particular to a text sentiment classification method and device, and a computer-readable storage medium.
Background technique
Sentiment analysis, sometimes called "opinion mining", is a vital task in NLP. Angle-based sentiment mining is a finer-grained form of sentiment analysis that can reveal deeper opinion tendencies.
Most popular sentiment classification methods determine the sentiment polarity of a whole sentence or article; for a given piece of text, they cannot determine the polarity of each individual angle at a finer granularity. The angle-based sentiment classification that does exist relies on syntactic analysis, linguistic feature extraction, or manually defined rules. Such approaches require a great deal of time to extract features and demand developers with a solid linguistic background.
Summary of the invention
This application provides a text sentiment classification method and device, and a computer-readable storage medium, capable of handling multi-angle, multi-polarity sentiment analysis tasks without time-consuming feature extraction.
This application provides a text sentiment classification method, comprising:
obtaining the sentence context of a text through a pre-trained language model, the pre-trained language model being used to predict one or more words randomly masked in the text;
randomly initializing an a×n angle embedding matrix, where a is the number of angles and n is the word-embedding dimension, and performing an attention-function calculation on the initialized angle embedding matrix and the sentence context to obtain a corrected angle embedding matrix;
performing classification on the corrected angle embedding matrix to obtain the sentiment of each angle of the text.
In an exemplary embodiment, the pre-trained language model includes an embedding layer, a feature extraction layer, and a prediction layer, wherein:
the embedding layer is configured to map each word in the text to a word vector, embed a position vector for each word vector, and output the sum of the word vector and the position vector to the feature extraction layer;
the feature extraction layer is configured to receive the output of the embedding layer, extract high-dimensional features of the text, and output the sentence context to the prediction layer;
the prediction layer is configured to predict the word at the masked position based on the received sentence context.
In an exemplary embodiment, the embedding layer includes a forward embedding layer and a backward embedding layer, and the feature extraction layer includes a forward feature extraction layer and a backward feature extraction layer, wherein:
the forward embedding layer is configured to map each word to the left of the masked position to a word vector, embed a position vector for each word vector, and output the sum of the word vector and the position vector to the forward feature extraction layer;
the forward feature extraction layer is configured to receive the output of the forward embedding layer, extract high-dimensional features of the text to the left of the masked position, and output the sentence context on the left of the masked position to the prediction layer;
the backward embedding layer is configured to map each word to the right of the masked position to a word vector, embed a position vector for each word vector, and output the sum of the word vector and the position vector to the backward feature extraction layer;
the backward feature extraction layer is configured to receive the output of the backward embedding layer, extract high-dimensional features of the text to the right of the masked position, and output the sentence context on the right of the masked position to the prediction layer;
the prediction layer is specifically configured to superimpose the sentence contexts output by the forward feature extraction layer and the backward feature extraction layer, and to predict the word at the masked position from the superimposed result.
In an exemplary embodiment, performing the attention-function calculation on the initialized angle embedding matrix and the sentence context comprises:
Attention(Q, s) = softmax(QKᵀ/√n)K
where Attention() is the attention function, s is a sentence, softmax() is the normalized exponential function, Q is the angle embedding matrix, K is the sentence context, and n is the word-embedding dimension.
This application also provides a computer-readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the steps of any of the text sentiment classification methods described above.
This application also provides a text sentiment classification device, including a processor and a memory, wherein the processor is configured to execute a program stored in the memory to implement the steps of any of the text sentiment classification methods described above.
This application also provides a text sentiment classification device, including a context acquisition module, an attention calculation module, and a classification module, wherein:
the context acquisition module is configured to obtain the sentence context of a text through a pre-trained language model, the pre-trained language model being used to predict one or more words randomly masked in the text;
the attention calculation module is configured to randomly initialize an a×n angle embedding matrix, where a is the number of angles and n is the word-embedding dimension, and to perform the attention-function calculation on the initialized angle embedding matrix and the sentence context to obtain a corrected angle embedding matrix;
the classification module is configured to perform classification on the corrected angle embedding matrix to obtain the sentiment of each angle of the text.
In an exemplary embodiment, the pre-trained language model includes an embedding layer, a feature extraction layer, and a prediction layer, wherein:
the embedding layer is configured to map each word in the text to a word vector, embed a position vector for each word vector, and output the sum of the word vector and the position vector to the feature extraction layer;
the feature extraction layer is configured to receive the output of the embedding layer, extract high-dimensional features of the text, and output the sentence context to the prediction layer;
the prediction layer is configured to predict the word at the masked position based on the received sentence context.
In an exemplary embodiment, the embedding layer includes a forward embedding layer and a backward embedding layer, and the feature extraction layer includes a forward feature extraction layer and a backward feature extraction layer, wherein:
the forward embedding layer is configured to map each word to the left of the masked position to a word vector, embed a position vector for each word vector, and output the sum of the word vector and the position vector to the forward feature extraction layer;
the forward feature extraction layer is configured to receive the output of the forward embedding layer, extract high-dimensional features of the text to the left of the masked position, and output the sentence context on the left of the masked position to the prediction layer;
the backward embedding layer is configured to map each word to the right of the masked position to a word vector, embed a position vector for each word vector, and output the sum of the word vector and the position vector to the backward feature extraction layer;
the backward feature extraction layer is configured to receive the output of the backward embedding layer, extract high-dimensional features of the text to the right of the masked position, and output the sentence context on the right of the masked position to the prediction layer;
the prediction layer is specifically configured to superimpose the sentence contexts output by the forward feature extraction layer and the backward feature extraction layer, and to predict the word at the masked position from the superimposed result.
In an exemplary embodiment, the attention calculation module performing the attention-function calculation on the initialized angle embedding matrix and the sentence context comprises:
Attention(Q, s) = softmax(QKᵀ/√n)K
where Attention() is the attention function, s is a sentence, softmax() is the normalized exponential function, Q is the angle embedding matrix, K is the sentence context, and n is the word-embedding dimension.
Compared with the related art, the text sentiment classification method and device and the computer-readable storage medium of this application obtain the sentence context of a text through a pre-trained language model and perform an attention-function calculation on the angle embedding matrix and the sentence context. They can handle multi-angle, multi-polarity sentiment analysis tasks and do not require time-consuming feature extraction.
Other features and advantages will be set forth in the following description and will, in part, become apparent from the specification or be understood by implementing this application. Other advantages of this application can be realized and obtained through the schemes described in the specification, the claims, and the drawings.
Brief description of the drawings
The drawings provide an understanding of the technical scheme of this application and constitute part of the specification. Together with the embodiments, they serve to explain the technical scheme of this application and do not limit it.
Fig. 1 is a flow diagram of a text sentiment classification method according to an embodiment of the present invention;
Fig. 2 is a structural diagram of a text sentiment classification device according to an embodiment of the present invention.
Detailed description of the embodiments
This application describes multiple embodiments, but the description is exemplary rather than restrictive, and it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the embodiments described herein. Although many possible feature combinations are shown in the drawings and discussed in the detailed description, many other combinations of the disclosed features are also possible. Unless specifically limited, any feature or element of any embodiment may be used in combination with, or may substitute for, any other feature or element of any other embodiment.
This application includes and contemplates combinations with features and elements known to those of ordinary skill in the art. The embodiments, features, and elements disclosed in this application may also be combined with any conventional features or elements to form a unique inventive scheme defined by the claims. Any feature or element of any embodiment may also be combined with features or elements from other inventive schemes to form another unique inventive scheme defined by the claims. It should therefore be understood that any of the features shown and/or discussed in this application may be implemented individually or in any suitable combination. Accordingly, the embodiments are not limited except in accordance with the appended claims and their equivalent replacements. Furthermore, various modifications and changes may be made within the scope of protection of the appended claims.
In addition, when describing representative embodiments, the specification may present a method and/or process as a particular sequence of steps. However, to the extent that the method or process does not depend on the particular order of steps described herein, the method or process should not be limited to that particular order. As those of ordinary skill in the art will appreciate, other step orders are also possible. Therefore, the particular order of steps set forth in the specification should not be construed as a limitation on the claims. Moreover, claims directed to the method and/or process should not be limited to performing their steps in the order written; those skilled in the art can readily appreciate that the order may be changed while remaining within the spirit and scope of the embodiments of this application.
Embodiment 1: text sentiment classification method
As shown in Fig. 1, a text sentiment classification method according to an embodiment of the present invention includes the following steps:
Step 101: obtain the sentence context of the text through a pre-trained language model, the pre-trained language model being used to predict one or more words randomly masked in the text;
It should be noted that deep learning has strong expressive power, but its inherent disadvantage is that it requires a large number of labeled training samples, and labeling many high-quality training samples is very time-consuming and costly. When training a neural network from scratch, the labeling task is daunting; in practice there is often no large labeled training set, nor the time and energy to label one manually, which greatly limits the model's learning ability. This application borrows the idea of transfer learning: a pre-trained language model is used to obtain a high-dimensional feature representation of the text, a model is then built on top of that representation, and the network is fine-tuned as a whole.
In one exemplary embodiment, the pre-trained language model includes an embedding layer, a feature extraction layer, and a prediction layer, wherein:
the embedding layer is configured to map each word in the text to a word vector, embed a position vector for each word vector, and output the sum of the word vector and the position vector to the feature extraction layer;
the feature extraction layer is configured to receive the output of the embedding layer, extract high-dimensional features of the text, and output the sentence context to the prediction layer;
the prediction layer is configured to predict the word at the masked position based on the received sentence context.
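The embedding layer's behavior can be sketched in a few lines. This is a minimal illustration, not the patent's actual implementation: `token_table` and `pos_table` are hypothetical stand-ins for learned embedding tables, represented here as plain-Python lookups.

```python
def embed(tokens, token_table, pos_table):
    """Sum each word's token vector with the position vector for its slot;
    the sums are what the embedding layer hands to the feature extraction layer."""
    summed = []
    for pos, tok in enumerate(tokens):
        word_vec = token_table[tok]   # Token Embedding lookup
        pos_vec = pos_table[pos]      # Position Embedding for this slot
        # element-wise sum of word vector and position vector
        summed.append([w + p for w, p in zip(word_vec, pos_vec)])
    return summed
```

In a trained model both tables are learned parameters; here they are fixed toy values purely to show the summation the text describes.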
In one exemplary embodiment, the embedding layer includes a forward embedding layer and a backward embedding layer, and the feature extraction layer includes a forward feature extraction layer and a backward feature extraction layer, wherein:
the forward embedding layer is configured to map each word to the left of the masked position to a word vector, embed a position vector for each word vector, and output the sum of the word vector and the position vector to the forward feature extraction layer;
the forward feature extraction layer is configured to receive the output of the forward embedding layer, extract high-dimensional features of the text to the left of the masked position, and output the sentence context on the left of the masked position to the prediction layer;
the backward embedding layer is configured to map each word to the right of the masked position to a word vector, embed a position vector for each word vector, and output the sum of the word vector and the position vector to the backward feature extraction layer;
the backward feature extraction layer is configured to receive the output of the backward embedding layer, extract high-dimensional features of the text to the right of the masked position, and output the sentence context on the right of the masked position to the prediction layer;
the prediction layer is specifically configured to superimpose the sentence contexts output by the forward feature extraction layer and the backward feature extraction layer, and to predict the word at the masked position from the superimposed result.
In one exemplary embodiment, the prediction layer includes a first fully connected layer and a first Softmax layer, wherein:
the first fully connected layer is configured to receive the sentence contexts output by the forward feature extraction layer and the backward feature extraction layer, and to output its result to the first Softmax layer;
the first Softmax layer is configured to receive the output of the first fully connected layer and to predict the word at the masked position in the input sentence.
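The first fully connected layer plus the first Softmax layer amount to scoring every candidate word from the masked position's context vector and normalizing the scores into probabilities. A minimal sketch under assumptions not stated in the text: `W` is a hypothetical n×|V| weight matrix and `vocab` a hypothetical candidate-word list.

```python
import math

def predict_masked(context_vec, W, vocab):
    """First fully connected layer + first Softmax layer, sketched:
    returns the most probable word for the masked position and the full
    probability distribution over the candidate vocabulary."""
    # fully connected layer: one logit per vocabulary word
    logits = [sum(c * W[i][j] for i, c in enumerate(context_vec))
              for j in range(len(vocab))]
    # Softmax layer: shift by the max for numerical stability, then normalize
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(vocab)), key=lambda j: probs[j])
    return vocab[best], probs
```

In practice the context vector would be the superimposed forward/backward sentence context and `W` a learned parameter; the toy values below only demonstrate the data flow.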
Step 102: randomly initialize an a×n angle embedding matrix, where a is the number of angles and n is the word-embedding dimension; perform the attention-function calculation on the initialized angle embedding matrix and the sentence context to obtain a corrected angle embedding matrix;
In one exemplary embodiment, performing the attention-function calculation on the initialized angle embedding matrix and the sentence context comprises:
Attention(Q, s) = softmax(QKᵀ/√n)K
where Attention() is the attention function, s is a sentence, softmax() is the normalized exponential function, Q is the angle embedding matrix, K is the sentence context, and n is the word-embedding dimension.
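Given the symbol definitions above, the attention calculation can be sketched directly. This assumes the standard scaled-dot-product form softmax(QKᵀ/√n)K, consistent with the definitions of Q, K, and n, but the exact formula in the original filing may differ; `attention` and `softmax` are illustrative helper names.

```python
import math

def softmax(row):
    """Normalized exponential function over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, n):
    """Q: a x n angle embedding matrix; K: L x n sentence context.
    Returns the corrected a x n angle embedding matrix."""
    # scores = Q K^T / sqrt(n): one row of scores per angle embedding
    scores = [[sum(q * k for q, k in zip(q_row, k_row)) / math.sqrt(n)
               for k_row in K] for q_row in Q]
    weights = [softmax(row) for row in scores]   # row-wise normalization
    # each corrected angle embedding is a weighted sum of context vectors in K
    return [[sum(w * K[j][d] for j, w in enumerate(w_row))
             for d in range(n)] for w_row in weights]
```

Each output row is a convex combination of the sentence-context vectors, which is how the randomly initialized angle embeddings get "corrected" toward the text.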
Step 103: perform classification on the corrected angle embedding matrix to obtain the sentiment of each angle of the text.
In one exemplary embodiment, step 103 specifically includes:
inputting the corrected angle embedding matrix into a second fully connected layer L(n, t), where t is the number of polarities, to obtain an a×t result matrix; the a×t result matrix is then input into a second Softmax layer to obtain the final prediction result.
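The second fully connected layer L(n, t) and the second Softmax layer of step 103 can be sketched as follows; the weight matrix `W` (n×t) and the polarity names are hypothetical stand-ins, since the text only fixes the shapes.

```python
import math

def classify_angles(corrected, W, polarities):
    """Map the corrected a x n angle embedding matrix through L(n, t) to an
    a x t result matrix, then take a Softmax per angle to pick a polarity."""
    results = []
    for row in corrected:                 # one corrected embedding per angle
        # second fully connected layer: n-dim row -> t polarity logits
        logits = [sum(x * W[i][j] for i, x in enumerate(row))
                  for j in range(len(polarities))]
        # second Softmax layer, then the most probable polarity for this angle
        m = max(logits)
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        results.append(polarities[max(range(len(probs)),
                                      key=lambda j: probs[j])])
    return results
```

With a = 5 angles and t = 3 polarities this yields one polarity per angle, matching the multi-angle, multi-polarity output the method targets.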
Embodiment 2: computer-readable storage medium
An embodiment of the present invention further provides a computer-readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the steps of any of the text sentiment classification methods described above.
Embodiment 3: text sentiment classification device
An embodiment of the present invention further provides a text sentiment classification device, including a processor and a memory, wherein the processor is configured to execute a program stored in the memory to implement the steps of any of the text sentiment classification methods described above.
Embodiment 4: text sentiment classification device
As shown in Fig. 2, an embodiment of the present invention further provides a text sentiment classification device, including a context acquisition module 201, an attention calculation module 202, and a classification module 203, wherein:
the context acquisition module 201 is configured to obtain the sentence context of a text through a pre-trained language model, the pre-trained language model being used to predict one or more words randomly masked in the text;
the attention calculation module 202 is configured to randomly initialize an a×n angle embedding matrix, where a is the number of angles and n is the word-embedding dimension, and to perform the attention-function calculation on the initialized angle embedding matrix and the sentence context to obtain a corrected angle embedding matrix;
the classification module 203 is configured to perform classification on the corrected angle embedding matrix to obtain the sentiment of each angle of the text.
In one exemplary embodiment, the pre-trained language model includes an embedding layer, a feature extraction layer, and a prediction layer, wherein:
the embedding layer is configured to map each word in the text to a word vector, embed a position vector for each word vector, and output the sum of the word vector and the position vector to the feature extraction layer;
the feature extraction layer is configured to receive the output of the embedding layer, extract high-dimensional features of the text, and output the sentence context to the prediction layer;
the prediction layer is configured to predict the word at the masked position based on the received sentence context.
In one exemplary embodiment, the embedding layer includes a forward embedding layer and a backward embedding layer, and the feature extraction layer includes a forward feature extraction layer and a backward feature extraction layer, wherein:
the forward embedding layer is configured to map each word to the left of the masked position to a word vector, embed a position vector for each word vector, and output the sum of the word vector and the position vector to the forward feature extraction layer;
the forward feature extraction layer is configured to receive the output of the forward embedding layer, extract high-dimensional features of the text to the left of the masked position, and output the sentence context on the left of the masked position to the prediction layer;
the backward embedding layer is configured to map each word to the right of the masked position to a word vector, embed a position vector for each word vector, and output the sum of the word vector and the position vector to the backward feature extraction layer;
the backward feature extraction layer is configured to receive the output of the backward embedding layer, extract high-dimensional features of the text to the right of the masked position, and output the sentence context on the right of the masked position to the prediction layer;
the prediction layer is specifically configured to superimpose the sentence contexts output by the forward feature extraction layer and the backward feature extraction layer, and to predict the word at the masked position from the superimposed result.
In one exemplary embodiment, the attention calculation module 202 performing the attention-function calculation on the initialized angle embedding matrix and the sentence context comprises:
Attention(Q, s) = softmax(QKᵀ/√n)K
where Attention() is the attention function, s is a sentence, softmax() is the normalized exponential function, Q is the angle embedding matrix, K is the sentence context, and n is the word-embedding dimension.
Embodiment 5: text sentiment classification method
After the training corpus is obtained, it must first be labeled. The specific labeling method is as follows:
(1) Determine the angle types and their number. For example, taking automotive data as an illustration, the predetermined number of angles is 5, and the angle types are five major categories: interior trim, power, cost performance, fuel consumption, and comfort.
(2) Label each training sample with respect to every angle; that is, every training sample is labeled for all angles, and if a sample does not mention a certain angle, that angle's polarity defaults to "general". For example:
a) Training sample 1: "Starting is sluggish and shifting from 2nd to 3rd gear drags a little; the rear space is passable and three people can sit without being crowded; the appearance and interior trim are the most satisfying, and handsome; fuel consumption is average."
b) Training sample 2: "Looks get full marks and the keys feel great, but the interior rattles, the dual-clutch is slow off the line, and the engine start-stop function cannot be disabled permanently and has to be switched off manually every time, which is not great. Fuel consumption is slightly high, but cost performance is still relatively high; after all, at this price one should not demand too much."
For the two training samples above, the labeling results are shown in Table 1:
Training sample | Interior trim | Power | Cost performance | Fuel consumption | Comfort
Training sample 1 | Positive | Negative | General | Negative | General
Training sample 2 | Positive | Negative | Positive | Negative | General
Table 1
(3) Screen the training samples: remove any sample whose polarity is "general" for every angle, to speed up model training.
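The screening step in (3) is a simple filter. A sketch under an assumed data layout (the text does not specify one): each sample is a dict whose `labels` field maps angle names to polarity strings.

```python
def filter_samples(samples):
    """Keep a training sample only if at least one angle's polarity is not
    'general'; drop samples that are 'general' for every angle."""
    return [s for s in samples
            if any(p != "general" for p in s["labels"].values())]
```

Dropping all-"general" samples shrinks the training set without losing any polarity signal, which is what speeds up training.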
The specific pre-training method is as follows:
I) Unsupervised pre-training on language samples:
To give the language model better feature extraction ability, one or more words in each training sample are randomly masked, and the language model is trained to predict them. For example:
Original training sample: "The car body has an abnormal noise, the ceiling also has an abnormal noise, and the dual-clutch start is slow."
Pre-training sample: "The car body has an abnormal noise, the ceiling also has an abnormal noise, and the dual-[MASK] start is slow."
Language model output: "clutch"
Note that any word may be masked, and the language model may predict word by word. The language model learns both left-to-right (Left-to-Right) and right-to-left (Right-to-Left) context representations to predict the masked word.
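The random masking step can be sketched as follows. Whitespace tokenisation and the function name are illustrative simplifications, since the disclosure does not fix a tokenizer.

```python
# Minimal sketch of the masking step: randomly replace one or more words with
# [MASK]; the language model is then trained to recover the original words.
import random

def mask_words(tokens, n_mask=1, rng=None):
    """Return the masked token list and a map {position: original word}."""
    rng = rng or random.Random(0)  # fixed seed for a reproducible illustration
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = ["[MASK]" if i in positions else t for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in positions}
    return masked, targets

tokens = "the car body has abnormal noise".split()
masked, targets = mask_words(tokens, n_mask=1)
# the pre-training objective is to predict `targets` given `masked`
```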
II) Language model definition:
To correctly predict the masked word, the language model needs to learn from both the left and right contexts. The two parts have identical structure but different parameters, and the vectors learned by the two parts are later added together. Correspondingly, the left part is called the forward representation network and the right part is called the backward representation network; the two network structures are identical. Note that all words to the right of the mask are invisible to the forward representation network, and all words to the left of the mask are invisible to the backward representation network.
The forward representation network consists of two parts. The first part is the forward embedding layer: this layer maps each word in the text to the left of the masked position to a word vector (Token Embedding) and embeds a position vector (Position Embedding) for each word vector, and the sum of the Token Embedding and the Position Embedding is used as the input of the second part. The second part is a feature extraction layer whose input is the output of the first part; this layer may be a Transformer encoder (Encoder), a convolutional neural network (Convolutional Neural Network, CNN), or a recurrent neural network (Recurrent Neural Network, RNN) such as a long short-term memory network (Long Short-Term Memory, LSTM) or a gated recurrent unit (Gated Recurrent Unit, GRU). The feature extraction layer can be stacked in multiple layers to improve feature extraction ability (stacking is possible as long as the input and output dimensions are identical, so that the n-dimensional output of the first layer can serve as the n-dimensional input of the second layer). In one exemplary embodiment, the second part of the present application uses the Encoder part of the Transformer with 8 layers.
Note that Google's Transformer model was originally designed for machine translation tasks. The Transformer remedies the slow training for which RNNs are most criticized, using the self-attention (Self-Attention) mechanism to achieve fast parallelism. Moreover, the Transformer can be made very deep, fully exploiting the characteristics of deep neural network (Deep Neural Network, DNN) models and improving model accuracy.
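The forward embedding layer just described (Token Embedding plus Position Embedding, summed per position, feeding a stackable feature extraction layer) can be sketched as follows. The vocabulary, the toy dimension n = 4, and the random embedding tables are assumptions for illustration only.

```python
# Sketch of the forward embedding layer: each word left of the mask gets a
# token embedding, a position embedding is added element-wise, and the summed
# vectors (all n-dimensional) feed the stackable feature extraction layer.
import random

n = 4  # word embedding dimension (the n in the text), toy value
rng = random.Random(0)

def table(rows, dim):
    """Random toy embedding table standing in for learned parameters."""
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(rows)]

vocab = {"the": 0, "car": 1, "has": 2, "noise": 3}
token_emb = table(len(vocab), n)   # Token Embedding table
pos_emb = table(16, n)             # Position Embedding table, max length 16 assumed

def embed(words):
    """Token Embedding + Position Embedding, summed at each position."""
    return [[t + p for t, p in zip(token_emb[vocab[w]], pos_emb[i])]
            for i, w in enumerate(words)]

x = embed(["the", "car", "has", "noise"])  # 4 vectors, each of dimension n
```

Because the input and output of each feature extraction layer are both n-dimensional, such layers can be stacked as the text notes.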
The backward representation network likewise consists of two parts. The first part is the backward embedding layer: this layer maps each word in the text to the right of the masked position to a word vector and embeds a position vector for each word vector, and the sum of the Token Embedding and the Position Embedding is used as the input of the second part. The second part is a feature extraction layer whose input is the output of the first part; this layer may be a Transformer Encoder, a CNN, or an RNN (LSTM, GRU), and can be stacked in multiple layers to improve feature extraction ability. In one exemplary embodiment, the second part of the present application uses the Encoder part of the Transformer with 8 layers.
In the third part, after the left and right sub-models output their sentence contexts, the two sentence contexts are added and fed to a fully connected layer, and a Softmax prediction is made for the word at the [MASK] position in the input sentence.
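The third part, adding the forward and backward sentence contexts and predicting the masked word through a fully connected layer with Softmax, can be sketched as follows. The toy dimensions and random weights are assumptions for illustration.

```python
# Sketch of the third part: the forward and backward contexts at the [MASK]
# position are added, passed through a fully connected layer, and a softmax
# over the vocabulary gives the prediction. Weights are toy random values.
import math
import random

n, vocab_size = 4, 6  # toy embedding dimension and vocabulary size
rng = random.Random(1)
W = [[rng.uniform(-1, 1) for _ in range(vocab_size)] for _ in range(n)]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict_mask(fwd_ctx, bwd_ctx):
    h = [f + b for f, b in zip(fwd_ctx, bwd_ctx)]  # add the two sentence contexts
    logits = [sum(h[i] * W[i][j] for i in range(n)) for j in range(vocab_size)]
    return softmax(logits)  # probability distribution over the vocabulary

probs = predict_mask([0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1])
```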
III) Model training:
The pre-trained language model is obtained by ordinary neural network model training.
The present application constructs an angle embedding matrix (Aspect Embedding) on top of the pre-trained language model and, in combination with an attention mechanism (Attention Mechanism), models all outputs of the language model. The detailed process is as follows:
The angle embedding matrix is constructed according to the number of angles. For example, with the 5 angles above, a 5*n matrix V is constructed, where n is the word embedding dimension. The angle embedding matrix and all outputs of the second part of the pre-trained model (i.e., the sentence context) are then passed through the attention function:
Attention(Q, K) = softmax(QK^T / √n) K
where Q is the angle embedding matrix, K is the sentence context, and n is the word embedding dimension.
The attention calculation used in the present application may be the scaled dot-product (Scaled Dot-Product) method. Each row of the angle embedding matrix is updated according to the above formula, yielding the modified angle embedding matrix, which is then input to a multi-label classifier. The classifier is a fully connected layer whose output dimension is the number of angle polarities. For example, each angle outputs three polarities, so the 5 angles yield a 5*3 output matrix in which each row represents the three polarities of one angle; Softmax then selects the optimal polarity.
For each angle, the pre-trained model is trained using cross entropy as the loss.
Specifically:
Assume the output of the pre-trained model is M, a p*n matrix, where p is the number of words in the sentence (for example, the Chinese sentence "我喜欢吃苹果。" ("I like eating apples.") has p = 7) and n is the word embedding dimension. Assume the angle embedding matrix is A, an a*n matrix, where a is the number of angles. Let F denote the Attention function (which may use the Scaled Dot-Product method); then the output obtained by feeding M and A into F is an a*n matrix. This is passed through a fully connected network L(n, t), where t is the number of polarities per angle, giving an a*t result matrix. In the present application t = 3. Finally, a Softmax layer produces the final prediction result.
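The shape bookkeeping of this paragraph (M of size p*n, A of size a*n, attention function F, fully connected layer L(n, t), then Softmax) can be checked with a small pure-Python sketch. The scaled dot-product form softmax(QK^T/√n)K and all weight values are illustrative assumptions.

```python
# Shape walk-through of the aspect-attention classifier: M is p*n (sentence
# context), A is a*n (angle embedding matrix), F is scaled dot-product
# attention, L(n, t) a fully connected layer with t = 3 polarities per angle.
import math
import random

p, n, a, t = 7, 4, 5, 3  # toy dimensions following the text
rng = random.Random(2)

def rand_mat(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

M, A = rand_mat(p, n), rand_mat(a, n)

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def attention(Q, K):
    """F: softmax(Q K^T / sqrt(n)) K, i.e. scaled dot-product attention."""
    out = []
    for q in Q:  # one attention-weighted context vector per angle row
        scores = softmax([sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(n)
                          for k in K])
        out.append([sum(w * k[j] for w, k in zip(scores, K)) for j in range(n)])
    return out

A_mod = attention(A, M)           # a*n modified angle embedding matrix
L = rand_mat(n, t)                # fully connected layer L(n, t)
logits = [[sum(row[i] * L[i][j] for i in range(n)) for j in range(t)]
          for row in A_mod]       # a*t result matrix
polarity = [max(range(t), key=lambda j, r=r: softmax(r)[j]) for r in logits]
```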
Those skilled in the art will appreciate that all or some of the steps of the methods disclosed above, and the functional modules/units in the systems and devices, may be implemented as software, firmware, hardware, and appropriate combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed jointly by several physical components. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage medium includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile discs (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
Claims (10)
1. A text sentiment classification method, comprising:
obtaining a sentence context of a text through a pre-trained language model, the pre-trained language model being used to predict one or more randomly masked words in the text;
randomly initializing an a*n angle embedding matrix, where a is the number of angles and n is the word embedding dimension, and performing an attention function calculation on the initialized angle embedding matrix and the sentence context to obtain a modified angle embedding matrix; and
performing classification discrimination according to the modified angle embedding matrix to obtain the sentiment of each angle of the text.
2. The text sentiment classification method according to claim 1, wherein the pre-trained language model comprises an embedding layer, a feature extraction layer, and a prediction layer, wherein:
the embedding layer is configured to map each word in the text to a word vector, embed a position vector for each word vector, and output the sum of the word vectors and the position vectors to the feature extraction layer;
the feature extraction layer is configured to receive the output of the embedding layer, extract high-dimensional features of the text, and output the sentence context to the prediction layer; and
the prediction layer is configured to predict the word at the masked position according to the received sentence context.
3. The text sentiment classification method according to claim 2, wherein the embedding layer comprises a forward embedding layer and a backward embedding layer, and the feature extraction layer comprises a forward feature extraction layer and a backward feature extraction layer, wherein:
the forward embedding layer is configured to map each word in the text to the left of the masked position to a word vector, embed a position vector for each word vector, and output the sum of the word vectors and the position vectors to the forward feature extraction layer;
the forward feature extraction layer is configured to receive the output of the forward embedding layer, extract high-dimensional features of the text to the left of the masked position, and output the sentence context to the left of the masked position to the prediction layer;
the backward embedding layer is configured to map each word in the text to the right of the masked position to a word vector, embed a position vector for each word vector, and output the sum of the word vectors and the position vectors to the backward feature extraction layer;
the backward feature extraction layer is configured to receive the output of the backward embedding layer, extract high-dimensional features of the text to the right of the masked position, and output the sentence context to the right of the masked position to the prediction layer; and
the prediction layer is specifically configured to superimpose the sentence contexts output by the forward feature extraction layer and the backward feature extraction layer, and predict the word at the masked position according to the superimposed result.
4. The text sentiment classification method according to claim 1, wherein performing the attention function calculation on the initialized angle embedding matrix and the sentence context comprises:
Attention(s) = softmax(QK^T / √n) K
wherein Attention() is the attention function, s is a sentence, softmax() is the normalized exponential function, Q is the angle embedding matrix, K is the sentence context, and n is the word embedding dimension.
5. A computer-readable storage medium, wherein the computer-readable storage medium stores one or more programs executable by one or more processors to implement the steps of the text sentiment classification method according to any one of claims 1 to 4.
6. A text sentiment classification device, comprising a processor and a memory, wherein the processor is configured to execute a program stored in the memory to implement the steps of the text sentiment classification method according to any one of claims 1 to 4.
7. A text sentiment classification device, comprising a context obtaining module, an attention calculation module, and a classification discrimination module, wherein:
the context obtaining module is configured to obtain a sentence context of a text through a pre-trained language model, the pre-trained language model being used to predict one or more randomly masked words in the text;
the attention calculation module is configured to randomly initialize an a*n angle embedding matrix, where a is the number of angles and n is the word embedding dimension, and perform an attention function calculation on the initialized angle embedding matrix and the sentence context to obtain a modified angle embedding matrix; and
the classification discrimination module is configured to perform classification discrimination according to the modified angle embedding matrix to obtain the sentiment of each angle of the text.
8. The text sentiment classification device according to claim 7, wherein the pre-trained language model comprises an embedding layer, a feature extraction layer, and a prediction layer, wherein:
the embedding layer is configured to map each word in the text to a word vector, embed a position vector for each word vector, and output the sum of the word vectors and the position vectors to the feature extraction layer;
the feature extraction layer is configured to receive the output of the embedding layer, extract high-dimensional features of the text, and output the sentence context to the prediction layer; and
the prediction layer is configured to predict the word at the masked position according to the received sentence context.
9. The text sentiment classification device according to claim 8, wherein the embedding layer comprises a forward embedding layer and a backward embedding layer, and the feature extraction layer comprises a forward feature extraction layer and a backward feature extraction layer, wherein:
the forward embedding layer is configured to map each word in the text to the left of the masked position to a word vector, embed a position vector for each word vector, and output the sum of the word vectors and the position vectors to the forward feature extraction layer;
the forward feature extraction layer is configured to receive the output of the forward embedding layer, extract high-dimensional features of the text to the left of the masked position, and output the sentence context to the left of the masked position to the prediction layer;
the backward embedding layer is configured to map each word in the text to the right of the masked position to a word vector, embed a position vector for each word vector, and output the sum of the word vectors and the position vectors to the backward feature extraction layer;
the backward feature extraction layer is configured to receive the output of the backward embedding layer, extract high-dimensional features of the text to the right of the masked position, and output the sentence context to the right of the masked position to the prediction layer; and
the prediction layer is specifically configured to superimpose the sentence contexts output by the forward feature extraction layer and the backward feature extraction layer, and predict the word at the masked position according to the superimposed result.
10. The text sentiment classification device according to claim 7, wherein the attention calculation module performing the attention function calculation on the initialized angle embedding matrix and the sentence context comprises:
Attention(s) = softmax(QK^T / √n) K
wherein Attention() is the attention function, s is a sentence, softmax() is the normalized exponential function, Q is the angle embedding matrix, K is the sentence context, and n is the word embedding dimension.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910285262.XA CN110110323B (en) | 2019-04-10 | 2019-04-10 | Text emotion classification method and device and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110110323A true CN110110323A (en) | 2019-08-09 |
CN110110323B CN110110323B (en) | 2022-11-11 |
Family
ID=67483800
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910285262.XA Expired - Fee Related CN110110323B (en) | 2019-04-10 | 2019-04-10 | Text emotion classification method and device and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110110323B (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110704622A (en) * | 2019-09-27 | 2020-01-17 | 北京明略软件***有限公司 | Text emotion classification method and device and electronic equipment |
CN110795537A (en) * | 2019-10-30 | 2020-02-14 | 秒针信息技术有限公司 | Method, device, equipment and medium for determining improvement strategy of target commodity |
CN110837733A (en) * | 2019-10-31 | 2020-02-25 | 创新工场(广州)人工智能研究有限公司 | Language model training method and system in self-reconstruction mode and computer readable medium |
CN111241304A (en) * | 2020-01-16 | 2020-06-05 | 平安科技(深圳)有限公司 | Answer generation method based on deep learning, electronic device and readable storage medium |
CN111274807A (en) * | 2020-02-03 | 2020-06-12 | 华为技术有限公司 | Text information processing method and device, computer equipment and readable storage medium |
CN111506702A (en) * | 2020-03-25 | 2020-08-07 | 北京万里红科技股份有限公司 | Knowledge distillation-based language model training method, text classification method and device |
CN111737994A (en) * | 2020-05-29 | 2020-10-02 | 北京百度网讯科技有限公司 | Method, device and equipment for obtaining word vector based on language model and storage medium |
CN112214576A (en) * | 2020-09-10 | 2021-01-12 | 深圳价值在线信息科技股份有限公司 | Public opinion analysis method, device, terminal equipment and computer readable storage medium |
CN112214601A (en) * | 2020-10-21 | 2021-01-12 | 厦门市美亚柏科信息股份有限公司 | Social short text sentiment classification method and device and storage medium |
CN113792143A (en) * | 2021-09-13 | 2021-12-14 | 中国科学院新疆理化技术研究所 | Capsule network-based multi-language emotion classification method, device, equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2973138A1 (en) * | 2014-01-10 | 2015-07-16 | Cluep Inc. | Systems, devices, and methods for automatic detection of feelings in text |
DK201670552A1 (en) * | 2016-06-11 | 2017-09-18 | Apple Inc | Data driven natural language event detection and classification |
CN108170681A (en) * | 2018-01-15 | 2018-06-15 | 中南大学 | Text emotion analysis method, system and computer readable storage medium |
CN109543039A (en) * | 2018-11-23 | 2019-03-29 | 中山大学 | A kind of natural language sentiment analysis method based on depth network |
Also Published As
Publication number | Publication date |
---|---|
CN110110323B (en) | 2022-11-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20221111 |