CN110502633A

CN110502633A - Network comment management method based on machine learning

Info

Publication number: CN110502633A
Application number: CN201910655834.9A
Authority: CN
Inventors: 丁甲; 常会友
Original assignee: Sun Yat Sen University
Current assignee: Sun Yat Sen University
Priority date: 2019-07-19
Filing date: 2019-07-19
Publication date: 2019-11-26

Abstract

The present invention relates to a network comment management method based on machine learning. Network comments are managed by a machine-learning method, which is efficient and objective. Because the LSTM network is trained continuously, the method can adapt to changes in the context of internet language, which keeps management accurate and makes the method more flexible.

Description

Network comment management method based on machine learning

Technical field

The present invention relates to the technical field of data processing, and more particularly to a network comment management method based on machine learning.

Background art

Currently, network comment management relies mainly on manual review or keyword blocking. However, manual review is highly subjective and inefficient, while keyword blocking is too rigid and handles comments inflexibly.

Summary of the invention

To overcome the technical deficiencies of the prior art, in which network comments are managed by manual review or keyword blocking and the management is therefore highly subjective, inefficient, and inflexible, the present invention provides a network comment management method based on machine learning.

To achieve the above object, the following technical solution is adopted:

A network comment management method based on machine learning comprises the following steps:

S1. Select a certain number of comments and assign each comment a class label by manual review;

S2. Map every Chinese character in the Chinese character vocabulary to a unique number, then use this correspondence to convert each comment into a sequence of numbers:

$$X_i = (token_1, token_2, \ldots, token_{Sen\_Length})$$

where each token is the number of one character of the comment and Sen_Length is the sequence length. Each token is then converted into a fixed-length vector using a word embedding table;

S3. Construct an LSTM network and feed the resulting word vectors into it in sequence order;

S4. Feed the outputs $H = (h_1, h_2, \ldots, h_n)$ of the LSTM network at each time step into an average pooling layer, which averages all of its inputs to obtain a new vector $h$;

S5. Apply Softmax classification to the vector produced by the preceding network to obtain the probability of each given class, and select and output the class with the highest probability as the classification result;

S6. Update the parameters of the LSTM network based on the classification results output in step S5 and the manually annotated labels, then return to step S1, until the training of the LSTM network satisfies a stop condition;

S7. Apply steps S1 to S5 to the comments that need to be managed, and process each comment according to the output classification result.
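Step S2 can be illustrated with a minimal Python sketch. It is not the patent's implementation and rests on several assumptions: the vocabulary is built from a hypothetical character string, token 0 is reserved for padding and unknown characters, comments are truncated or zero-padded to Sen_Length, and the embedding table is randomly initialized (in practice it would be learned or pretrained). The helper names `build_vocab`, `encode`, `make_embedding_table` and `embed` are hypothetical.

```python
import random


def build_vocab(chars):
    """Give each distinct character a unique number starting at 1 (0 is reserved, by assumption)."""
    vocab = {}
    for ch in chars:
        if ch not in vocab:
            vocab[ch] = len(vocab) + 1
    return vocab


def encode(comment, vocab, sen_length):
    """Convert a comment to X_i = (token_1, ..., token_Sen_Length).
    Unknown characters map to 0; short comments are padded with 0, long ones truncated."""
    tokens = [vocab.get(ch, 0) for ch in comment[:sen_length]]
    return tokens + [0] * (sen_length - len(tokens))


def make_embedding_table(vocab_size, dim, seed=0):
    """Hypothetical random table: row k is the fixed-length vector for token k."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size + 1)]


def embed(tokens, table):
    """Look up the embedding vector of every token."""
    return [table[t] for t in tokens]


vocab = build_vocab("这个产品很好不")
x = encode("产品很好", vocab, sen_length=6)
print(x)  # [3, 4, 5, 6, 0, 0]
vectors = embed(x, make_embedding_table(len(vocab), dim=4))
```

Fixing the length with padding lets every comment produce the same number of vectors; the padding scheme itself is an assumption, since the text only specifies the sequence form of $X_i$.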

Preferably, the annotation in step S1 is specifically as follows: a comment judged positive by manual review is labeled 1; a comment judged negative by manual review is labeled -1; and a comment judged to be a declarative sentence by manual review is labeled 0.

Preferably, the classification in step S5 is performed as follows:

$$S_i = \frac{\exp(e_i)}{\sum_{j=1}^{N} \exp(e_j)}, \quad i = 1, \ldots, N$$

where N is the number of classes;

S is the output classification vector, each component $S_i$ of which is the probability of the corresponding class;

E is the vector input to the Softmax classifier, and $e_i$ is its i-th component.
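The Softmax step and the selection of the most probable class can be sketched in Python as below. Assumptions: the input `e` already has one score per class (the text does not say how the pooled vector $h$ is mapped to N scores; a linear layer is a common choice), and the class order in `LABELS` is hypothetical, though its values follow the 1 / -1 / 0 annotation scheme. Subtracting the maximum score is a standard numerical-stability trick that does not change $S$.

```python
import math

# Hypothetical class order; the label values follow the annotation scheme
# (1 = positive, -1 = negative, 0 = declarative).
LABELS = [1, -1, 0]


def softmax(e):
    """S_i = exp(e_i) / sum_j exp(e_j). Subtracting max(e) avoids overflow
    and leaves the result unchanged."""
    m = max(e)
    exps = [math.exp(v - m) for v in e]
    total = sum(exps)
    return [v / total for v in exps]


def classify(e):
    """Return the label of the most probable class and the probability vector S."""
    s = softmax(e)
    best = max(range(len(s)), key=lambda k: s[k])
    return LABELS[best], s


label, probs = classify([2.0, 0.5, 1.0])
print(label)  # 1
```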

Preferably, in step S6 the parameters of the LSTM network are updated using the RMSProp algorithm.

Preferably, the training of the LSTM network in steps S1 to S6 is carried out periodically or aperiodically, and the comment data used in training are the comments collected at the current time.

Compared with the prior art, the beneficial effects of the present invention are:

The method provided by the invention manages network comments by machine learning, which is efficient and objective. Because the LSTM network is trained continuously, the method can adapt to changes in the context of internet language, which keeps management accurate and makes the method more flexible.

Description of the drawings

Fig. 1 is a flow diagram of the method.

Specific embodiment

The drawings are for illustration only and shall not be construed as limiting this patent;

The present invention is further described below with reference to the drawings and embodiments.

Embodiment 1

As shown in Fig. 1, the network comment management method based on machine learning comprises the following steps:

$$X_i = (token_1, token_2, \ldots, token_{Sen\_Length})$$

In this embodiment, a comment judged positive by manual review is labeled 1; a comment judged negative by manual review is labeled -1; and a comment judged to be a declarative sentence by manual review is labeled 0.

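In this embodiment, the classification in step S5 is performed as follows:

$$S_i = \frac{\exp(e_i)}{\sum_{j=1}^{N} \exp(e_j)}, \quad i = 1, \ldots, N$$

where N is the number of classes, $S_i$ is the probability of class i, and $e_i$ is the i-th component of the vector input to the Softmax classifier;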

In this embodiment, in step S6 the parameters of the LSTM network are updated using the RMSProp algorithm.

In this embodiment, the training of the LSTM network in steps S1 to S6 is carried out periodically or aperiodically, and the comment data used in training are the comments collected at the current time.

The core of the invention is the LSTM network, a type of RNN (Recurrent Neural Network) model. Building on the RNN, this type of network can process inputs arranged as a time series. In addition, the LSTM alleviates the gradient explosion and vanishing gradient problems that arise when training an RNN, so it performs well on longer sequences and is used in a range of natural language processing problems such as sentiment analysis.

Whereas an RNN passes only one state, $h_t$, to the next time step, an LSTM network passes two: $h_t$ and $C_t$, where $h_t$ is the hidden state at each time step and $C_t$ is the cell state.

Inside the cell, the first step is the forget gate layer, which determines how much of the previous information is discarded. It is computed by the following formula, where σ is the sigmoid function, $W_{layer}$ and $b_{layer}$ are the weights and bias of the corresponding neural network layer, $[h_{t-1}, x_t]$ is the concatenation of the previous hidden state and the current input, and $x_t$ is the input feature at time step t:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$

The value produced by the forget gate, between 0 and 1, determines how much is retained: 1 means fully retained and 0 means completely discarded.

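Next, the network determines what new information is stored in the cell state. This consists of two parts: the input gate $i_t$, which decides how much of each value to update, and the candidate state $C'_t$: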

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$

$$C'_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$

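The old cell state is then updated to the new cell state (⊙ denotes element-wise multiplication):

$$C_t = f_t \odot C_{t-1} + i_t \odot C'_t$$

The new cell state is thus jointly determined by the previous cell state $C_{t-1}$, the forget gate $f_t$, and the terms $i_t$ and $C'_t$.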
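As a single-component numerical illustration, using made-up values that do not come from the patent ($f_t = 0.5$, $C_{t-1} = 0.8$, $i_t = 0.9$, $C'_t = 0.2$), the cell update gives:

$$C_t = 0.5 \times 0.8 + 0.9 \times 0.2 = 0.40 + 0.18 = 0.58$$

Here half of the old memory is kept and most of the new candidate value is admitted.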

Finally, the output of the step is determined. This output is based on the current cell state:

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$

$$h_t = o_t \odot \tanh(C_t)$$
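The gate equations above, together with feeding vectors in sequence order (step S3) and average pooling (step S4), can be sketched in plain Python. This is an illustrative forward pass only, under assumptions not stated in the text: vectors are Python lists, each $W_{layer}$ is a hidden × (hidden + input) matrix, weights use a hypothetical small random initialization with zero biases, and $h_0 = C_0 = 0$. All helper names are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, b, v):
    """Return W·v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(p, h_prev, c_prev, x_t):
    """One time step: forget gate, input gate, candidate state, cell update, output."""
    z = h_prev + x_t  # list concatenation gives [h_{t-1}, x_t]
    f = [sigmoid(a) for a in affine(p["W_f"], p["b_f"], z)]
    i = [sigmoid(a) for a in affine(p["W_i"], p["b_i"], z)]
    c_cand = [math.tanh(a) for a in affine(p["W_C"], p["b_C"], z)]
    c = [ft * cp + it * cc for ft, cp, it, cc in zip(f, c_prev, i, c_cand)]
    o = [sigmoid(a) for a in affine(p["W_o"], p["b_o"], z)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c


def init_params(hidden, inp, seed=0):
    """Hypothetical initialization: small uniform weights, zero biases."""
    rng = random.Random(seed)
    p = {}
    for gate in ("f", "i", "C", "o"):
        p["W_" + gate] = [[rng.uniform(-0.1, 0.1) for _ in range(hidden + inp)]
                          for _ in range(hidden)]
        p["b_" + gate] = [0.0] * hidden
    return p


def run_lstm(p, xs, hidden):
    """Step S3: feed the vectors in sequence order and collect H = (h_1, ..., h_n)."""
    h, c = [0.0] * hidden, [0.0] * hidden
    H = []
    for x_t in xs:
        h, c = lstm_step(p, h, c, x_t)
        H.append(h)
    return H


def mean_pool(H):
    """Step S4: element-wise average of all h_t."""
    return [sum(col) / len(H) for col in zip(*H)]


params = init_params(hidden=3, inp=2)
H = run_lstm(params, [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]], hidden=3)
h = mean_pool(H)
print(len(H), len(h))  # 3 3
```

Training these weights requires gradients of the loss through every time step; a deep learning framework would normally provide them, and the sketch deliberately covers only the forward computation.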

During training, this embodiment uses the RMSProp algorithm for optimization. Neural network training is a non-convex problem, and RMSProp performs better than many other algorithms in the non-convex setting: it changes the gradient accumulation into an exponentially weighted moving average, thereby discarding history from the distant past. Empirically, RMSProp has proven to be an effective and practical optimization algorithm for deep neural networks. Its procedure is as follows:

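S1: Given the learning rate $\epsilon$, the decay rate $\rho$, the initial network parameters $\theta$, and a small constant $\delta$, typically $10^{-6}$, used to stabilize division by small numbers; initialize the accumulation variable $r = 0$.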

S2: Sample a minibatch of m samples $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$ from the training set, together with the corresponding labels $y^{(i)}$.

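S3: Compute the gradient, where f is the network and L is the loss function:

$$g \leftarrow \frac{1}{m} \nabla_\theta \sum_{i=1}^{m} L\big(f(x^{(i)}; \theta), y^{(i)}\big)$$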

S4: Accumulate the squared gradient:

$$r \leftarrow \rho r + (1 - \rho)\, g \odot g$$

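S5: Compute the parameter update (the square root and division are applied element-wise):

$$\Delta\theta = -\frac{\epsilon}{\sqrt{\delta + r}} \odot g$$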

S6: Apply the update:

$$\theta \leftarrow \theta + \Delta\theta$$

S7: If the stop condition has not been reached, return to S2.
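The RMSProp steps S1 to S7 can be sketched in Python on plain parameter lists. Assumptions: the caller supplies `grad_fn`, which stands in for the minibatch gradient of steps S2 and S3; a fixed step budget stands in for the unspecified stop condition; and the default hyperparameters and the toy quadratic objective are illustrative only, not values from the patent.

```python
import math


def rmsprop(grad_fn, theta, lr=0.01, rho=0.9, delta=1e-6, steps=1000):
    """RMSProp following steps S1-S7; grad_fn(theta) returns the minibatch gradient g."""
    r = [0.0] * len(theta)  # S1: squared-gradient accumulator starts at zero
    for _ in range(steps):  # S7: a fixed step budget stands in for the stop condition
        g = grad_fn(theta)  # S2-S3: gradient on a sampled minibatch
        r = [rho * ri + (1 - rho) * gi * gi for ri, gi in zip(r, g)]  # S4
        d = [-lr / math.sqrt(delta + ri) * gi for ri, gi in zip(r, g)]  # S5
        theta = [t + dt for t, dt in zip(theta, d)]  # S6
    return theta


# Toy objective (theta0 - 3)^2 + (theta1 + 1)^2 in place of the LSTM loss.
def grad(theta):
    return [2 * (theta[0] - 3), 2 * (theta[1] + 1)]


theta = rmsprop(grad, [0.0, 0.0])
print([round(t, 2) for t in theta])
```

Because $r$ tracks the recent squared gradient, each step has a magnitude of roughly $\epsilon$ regardless of the raw gradient scale, which is the per-parameter normalization that distinguishes RMSProp from plain gradient descent.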

Obviously, the above embodiment of the present invention is merely an example given to illustrate the present invention clearly, and is not a limitation on its embodiments. Those of ordinary skill in the art can make other changes or variations in different forms on the basis of the above description. It is neither necessary nor possible to exhaust all embodiments here. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.

Claims

1. A network comment management method based on machine learning, characterized by comprising the following steps:

S1. Select a certain number of comments and assign each comment a class label by manual review;

S2. Map every Chinese character in the Chinese character vocabulary to a unique number, then use this correspondence to convert each comment into a sequence of numbers:

$$X_i = (token_1, token_2, \ldots, token_{Sen\_Length})$$

where each token is the number of one character of the comment; each token is converted into a fixed-length vector using a word embedding table;

S4. Feed the outputs $H = (h_1, h_2, \ldots, h_n)$ of the LSTM network at each time step into an average pooling layer, which averages all of its inputs to obtain a new vector $h$;

S5. Apply Softmax classification to the vector produced by the preceding network to obtain the probability of each given class, and select and output the class with the highest probability as the classification result;

S7. Apply steps S1 to S5 to the comments that need to be managed, and process each comment according to the output classification result.

2. The network comment management method based on machine learning according to claim 1, characterized in that the annotation in step S1 is specifically: a comment judged positive by manual review is labeled 1; a comment judged negative by manual review is labeled -1; and a comment judged to be a declarative sentence by manual review is labeled 0.

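3. The network comment management method based on machine learning according to claim 1, characterized in that the classification in step S5 is performed as follows:

$$S_i = \frac{\exp(e_i)}{\sum_{j=1}^{N} \exp(e_j)}, \quad i = 1, \ldots, N$$

where N is the number of classes, $S_i$ is the probability of class i, and $e_i$ is the i-th component of the vector input to the Softmax classifier;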

4. The network comment management method based on machine learning according to claim 1, characterized in that in step S6 the parameters of the LSTM network are updated using the RMSProp algorithm.

5. The network comment management method based on machine learning according to claim 1, characterized in that the training of the LSTM network in steps S1 to S6 is carried out periodically or aperiodically, and the comment data used in training are the comments collected at the current time.