CN110502633A - Network comment management method based on machine learning - Google Patents


Info

Publication number
CN110502633A
Authority
CN
China
Prior art keywords
comment
network
classification
machine learning
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910655834.9A
Other languages
Chinese (zh)
Inventor
丁甲
常会友
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University
Priority to CN201910655834.9A
Publication of CN110502633A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present invention relates to a network comment management method based on machine learning. Managing network comments by machine learning is efficient and objective. Because the LSTM network is trained continually, the method can adjust to changes in the context of internet language, which keeps management accurate and makes the method more flexible.

Description

Network comment management method based on machine learning
Technical field
The present invention relates to the technical field of data processing, and more particularly to a network comment management method based on machine learning.
Background
At present, network comment management relies mainly on manual review or keyword blocking. However, manual review is highly subjective and inefficient, while keyword blocking is overly rigid and inflexible.
Summary of the invention
The prior art manages network comments by manual review or keyword blocking, which is highly subjective, inefficient, and inflexible. To overcome these technical defects, the present invention provides a network comment management method based on machine learning.
To achieve the above object, the following technical solution is adopted:
A network comment management method based on machine learning, comprising the following steps:
S1. Select a certain number of comments, and label each comment with a class by manual review;
S2. Map every Chinese character in the character vocabulary to a unique number, then use this mapping to convert each comment into a sequence of numbers:
$X_i = (token_1, token_2, \ldots, token_{Sen\_Length})$
where each token is one of the numbers making up the comment; then, using a word embedding table, convert each token into a fixed-length vector;
S3. Construct an LSTM network, and feed the resulting word vectors into the LSTM network in sequence order;
S4. Feed the outputs H = (h1, h2, ..., hn) produced by the LSTM network at each time step into an average pooling layer, which averages all of its inputs to obtain a new vector h;
S5. Classify the vector obtained from the preceding layers with the Softmax classification method to obtain the probability of each predefined class, then select the class with the highest probability and output it as the classification result;
S6. Update the parameters of the LSTM network based on the classification results output by step S5 and the manually assigned labels, then return to step S1, until the training of the LSTM network meets the stop condition;
S7. Apply the operations of steps S1 to S5 to the comments to be managed, and handle each comment according to the classification result output.
Preferably, the labeling in step S1 is as follows: a comment judged by manual review to be positive is labeled 1; a comment judged to be negative is labeled -1; a comment judged to be a declarative statement is labeled 0.
Preferably, step S5 performs the classification as follows:
$S_i = \frac{\exp(e_i)}{\sum_{j=1}^{N} \exp(e_j)}$
where N is the number of classes;
S is the output classification result vector, each component of which is the probability of the corresponding class;
E is the input vector of the Softmax classifier, and $e_i$ is a component of that vector.
Preferably, step S6 uses the RMSProp algorithm to update the parameters of the LSTM network.
Preferably, the LSTM training process of steps S1 to S6 is carried out periodically or at irregular intervals, and the comment data used for training is the comment data collected at the current time.
Compared with the prior art, the beneficial effects of the present invention are:
The method provided by the invention manages network comments by machine learning, which is efficient and objective. Because the LSTM network is trained continually, the method can adjust to changes in the context of internet language, which keeps management accurate and makes the method more flexible.
Brief description of the drawings
Fig. 1 is a flow diagram of the method.
Detailed description of the embodiments
The drawings are for illustrative purposes only and should not be construed as limiting this patent;
The present invention is further described below with reference to the drawings and embodiments.
Embodiment 1
As shown in Fig. 1, the network comment management method based on machine learning comprises the following steps:
S1. Select a certain number of comments, and label each comment with a class by manual review;
S2. Map every Chinese character in the character vocabulary to a unique number, then use this mapping to convert each comment into a sequence of numbers:
$X_i = (token_1, token_2, \ldots, token_{Sen\_Length})$
where each token is one of the numbers making up the comment; then, using a word embedding table, convert each token into a fixed-length vector;
S3. Construct an LSTM network, and feed the resulting word vectors into the LSTM network in sequence order;
S4. Feed the outputs H = (h1, h2, ..., hn) produced by the LSTM network at each time step into an average pooling layer, which averages all of its inputs to obtain a new vector h;
S5. Classify the vector obtained from the preceding layers with the Softmax classification method to obtain the probability of each predefined class, then select the class with the highest probability and output it as the classification result;
S6. Update the parameters of the LSTM network based on the classification results output by step S5 and the manually assigned labels, then return to step S1, until the training of the LSTM network meets the stop condition;
S7. Apply the operations of steps S1 to S5 to the comments to be managed, and handle each comment according to the classification result output.
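The data handling in steps S2 and S4 can be illustrated with a minimal Python sketch. The function names, the random embedding table, and the convention of reserving id 0 for unknown characters are assumptions made for illustration; the patent does not specify them. The LSTM itself is omitted here, so the pooling function is shown on a hand-written list of outputs H.

```python
import random

def build_vocab(comments):
    """Map every distinct character to a unique integer id (0 is reserved for unknown characters)."""
    vocab = {}
    for text in comments:
        for ch in text:
            if ch not in vocab:
                vocab[ch] = len(vocab) + 1
    return vocab

def encode(text, vocab):
    """Convert one comment into its number sequence X_i = (token_1, ..., token_Sen_Length)."""
    return [vocab.get(ch, 0) for ch in text]

def make_embedding_table(vocab_size, dim, seed=0):
    """Hypothetical embedding table: one fixed-length vector per token id (row 0 is the unknown token)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size + 1)]

def mean_pool(vectors):
    """Average equal-length vectors component-wise, as the pooling layer of step S4 does."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

comments = ["很好", "不好"]
vocab = build_vocab(comments)                    # one id per distinct character
table = make_embedding_table(len(vocab), dim=4)
tokens = encode("不好", vocab)                    # step S2: characters to numbers
embedded = [table[t] for t in tokens]            # step S2: numbers to fixed-length vectors
h = mean_pool([[1.0, 2.0], [3.0, 4.0]])          # step S4 on two stand-in LSTM outputs
```

In a trained system the embedding table would be learned together with the LSTM rather than drawn at random.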
In this embodiment, a comment judged by manual review to be positive is labeled 1; a comment judged to be negative is labeled -1; a comment judged to be a declarative statement is labeled 0.
In this embodiment, step S5 performs the classification as follows:
$S_i = \frac{\exp(e_i)}{\sum_{j=1}^{N} \exp(e_j)}$
where N is the number of classes;
S is the output classification result vector, each component of which is the probability of the corresponding class;
E is the input vector of the Softmax classifier, and $e_i$ is a component of that vector.
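As an illustration of step S5, the sketch below computes the Softmax probabilities for an input vector E and picks the most probable class. The ordering of the labels (-1, 0, 1) across the output components is an assumption, and subtracting the maximum before exponentiating is a common numerical safeguard that the patent does not mention; it leaves the probabilities unchanged.

```python
import math

# Assumed ordering of output components: negative, declarative, positive.
LABELS = [-1, 0, 1]

def softmax(e):
    """Return S with S_i = exp(e_i) / sum_j exp(e_j)."""
    m = max(e)  # shift by the maximum for numerical stability
    exps = [math.exp(x - m) for x in e]
    total = sum(exps)
    return [x / total for x in exps]

def classify(e, labels=LABELS):
    """Return the label with the highest Softmax probability, plus the full vector S."""
    s = softmax(e)
    best = max(range(len(s)), key=s.__getitem__)
    return labels[best], s

label, probs = classify([0.5, 2.0, -1.0])
```

With this input the second component is the largest, so the comment would be classified as a declarative statement under the assumed label ordering.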
In this embodiment, step S6 uses the RMSProp algorithm to update the parameters of the LSTM network.
In this embodiment, the LSTM training process of steps S1 to S6 is carried out periodically or at irregular intervals, and the comment data used for training is the comment data collected at the current time.
At the core of the invention is the LSTM network, a kind of RNN (Recurrent Neural Network) model. This type of network builds on the RNN's ability to process time-series input while also addressing the exploding and vanishing gradient problems that arise in RNN training. It therefore performs well on longer sequences and is used in a range of natural language processing problems such as sentiment analysis.
Whereas an RNN passes only a single state $h_t$ to the next time step, an LSTM network passes two: $h_t$ and $C_t$, where $h_t$ is the hidden state at each time step and $C_t$ is the cell state.
Inside the cell, the first step is the forget gate layer, which decides how much of the information from the previous time step to discard. It is implemented by the following formula, where σ is the Sigmoid function, $W_{layer}$ and $b_{layer}$ are the weights and biases of the corresponding neural network layer, and $x_t$ is the feature vector of the training sample:
$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$
The value produced by the forget gate layer determines how much is retained: 1 means keep completely, and 0 means discard completely.
Next, the network decides what new information to store in the cell state, which involves two parts:
$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$
$C'_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$
The old cell state is then updated to the new cell state:
$C_t = f_t \times C_{t-1} + i_t \times C'_t$
The new cell state is jointly determined by the previous cell state, the forget gate value, $i_t$, and $C'_t$.
Finally, the output of this step is determined. The output is based on the current cell state:
$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$
$h_t = o_t \times \tanh(C_t)$
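The gate equations above can be traced in a minimal pure-Python sketch of a single LSTM time step. The parameter dictionary layout, the helper names, and the use of plain lists instead of a tensor library are assumptions made for illustration; the output gate uses its own bias b_o.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, b, v):
    """Return W·v + b for a weight matrix W (list of rows) and a bias vector b."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step; p is a hypothetical dict holding Wf, bf, Wi, bi, Wc, bc, Wo, bo."""
    z = h_prev + x_t                                               # concatenation [h_{t-1}, x_t]
    f = [sigmoid(a) for a in affine(p["Wf"], p["bf"], z)]          # forget gate f_t
    i = [sigmoid(a) for a in affine(p["Wi"], p["bi"], z)]          # input gate i_t
    c_cand = [math.tanh(a) for a in affine(p["Wc"], p["bc"], z)]   # candidate state C'_t
    o = [sigmoid(a) for a in affine(p["Wo"], p["bo"], z)]          # output gate o_t
    c = [ft * cp + it * cc for ft, cp, it, cc in zip(f, c_prev, i, c_cand)]  # C_t
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]               # h_t
    return h, c

# Tiny example: hidden size 1, input size 1, all weights and biases zero.
zero = {name: [[0.0, 0.0]] for name in ("Wf", "Wi", "Wc", "Wo")}
zero.update({name: [0.0] for name in ("bf", "bi", "bc", "bo")})
h1, c1 = lstm_step([1.0], [0.0], [2.0], zero)
```

With all-zero parameters every gate equals 0.5 and the candidate state is 0, so the cell state is simply halved, which makes the data flow easy to check by hand.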
During training, this embodiment uses the RMSProp algorithm as the optimization method. Neural network training is a non-convex problem, and RMSProp performs better than other algorithms in the non-convex setting: it changes the gradient accumulation into an exponentially decaying moving average, so that history from the distant past is discarded. Empirically, RMSProp has proven to be an effective and practical optimization algorithm for deep learning networks. Its procedure is as follows:
S1: Given the learning rate $\epsilon$, the decay rate $\rho$, and the initial network parameters $\theta$; $\delta$ is a small constant, typically $10^{-6}$.
S2: Sample a minibatch of m examples {x(1), x(2), ..., x(m)} from the training set, with corresponding labels y(i).
S3: Compute the gradient:
$g \leftarrow \frac{1}{m} \nabla_\theta \sum_i L(f(x^{(i)}; \theta), y^{(i)})$
S4: Accumulate the squared gradient:
$r \leftarrow \rho r + (1 - \rho)\, g \odot g$
S5: Compute the parameter update:
$\Delta\theta = -\frac{\epsilon}{\sqrt{\delta + r}} \odot g$
S6: Apply the update:
$\theta \leftarrow \theta + \Delta\theta$
S7: If the stop condition has not been reached, return to S2.
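The update rule in S4 to S6 can be sketched in a few lines of Python. The default values for the learning rate and decay rate, the zero initialization of the accumulator r, and the toy objective f(θ) = θ² are assumptions chosen for illustration, not values given in the patent.

```python
import math

def rmsprop_step(theta, grad, r, lr=0.001, rho=0.9, delta=1e-6):
    """Apply one RMSProp update to parameter list theta; returns (new_theta, new_r)."""
    # S4: exponentially decaying average of the squared gradient
    new_r = [rho * ri + (1.0 - rho) * g * g for ri, g in zip(r, grad)]
    # S5 and S6: theta <- theta - lr / sqrt(delta + r) * g, element-wise
    new_theta = [t - lr / math.sqrt(delta + ri) * g
                 for t, ri, g in zip(theta, new_r, grad)]
    return new_theta, new_r

# Toy usage: minimize f(theta) = theta^2, whose gradient is 2 * theta.
theta, r = [1.0], [0.0]
for _ in range(2000):
    grad = [2.0 * theta[0]]
    theta, r = rmsprop_step(theta, grad, r, lr=0.01)
```

Because each step is scaled by the running magnitude of the gradient, the parameter moves toward the minimum at a roughly constant pace and then hovers close to zero.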
Obviously, the above embodiment is merely an example given to illustrate the present invention clearly, and does not limit the ways in which the invention may be implemented. Those of ordinary skill in the art can make other changes or variations in different forms on the basis of the above description. It is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the claims of the present invention.

Claims (5)

1. A network comment management method based on machine learning, characterized in that it comprises the following steps:
S1. Select a certain number of comments, and label each comment with a class by manual review;
S2. Map every Chinese character in the character vocabulary to a unique number, then use this mapping to convert each comment into a sequence of numbers:
$X_i = (token_1, token_2, \ldots, token_{Sen\_Length})$
where each token is one of the numbers making up the comment; then, using a word embedding table, convert each token into a fixed-length vector;
S3. Construct an LSTM network, and feed the resulting word vectors into the LSTM network in sequence order;
S4. Feed the outputs H = (h1, h2, ..., hn) produced by the LSTM network at each time step into an average pooling layer, which averages all of its inputs to obtain a new vector h;
S5. Classify the vector obtained from the preceding layers with the Softmax classification method to obtain the probability of each predefined class, then select the class with the highest probability and output it as the classification result;
S6. Update the parameters of the LSTM network based on the classification results output by step S5 and the manually assigned labels, then return to step S1, until the training of the LSTM network meets the stop condition;
S7. Apply the operations of steps S1 to S5 to the comments to be managed, and handle each comment according to the classification result output.
2. The network comment management method based on machine learning according to claim 1, characterized in that the labeling in step S1 is as follows: a comment judged by manual review to be positive is labeled 1; a comment judged to be negative is labeled -1; a comment judged to be a declarative statement is labeled 0.
3. The network comment management method based on machine learning according to claim 1, characterized in that step S5 performs the classification as follows:
$S_i = \frac{\exp(e_i)}{\sum_{j=1}^{N} \exp(e_j)}$
where N is the number of classes;
S is the output classification result vector, each component of which is the probability of the corresponding class;
E is the input vector of the Softmax classifier, and $e_i$ is a component of that vector.
4. The network comment management method based on machine learning according to claim 1, characterized in that step S6 uses the RMSProp algorithm to update the parameters of the LSTM network.
5. The network comment management method based on machine learning according to claim 1, characterized in that the LSTM training process of steps S1 to S6 is carried out periodically or at irregular intervals, and the comment data used for training is the comment data collected at the current time.
CN201910655834.9A 2019-07-19 2019-07-19 Network comment management method based on machine learning Pending CN110502633A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910655834.9A CN110502633A (en) 2019-07-19 2019-07-19 Network comment management method based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910655834.9A CN110502633A (en) 2019-07-19 2019-07-19 Network comment management method based on machine learning

Publications (1)

Publication Number Publication Date
CN110502633A true CN110502633A (en) 2019-11-26

Family

ID=68586235

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910655834.9A Pending CN110502633A (en) 2019-07-19 2019-07-19 Network comment management method based on machine learning

Country Status (1)

Country Link
CN (1) CN110502633A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105550269A (en) * 2015-12-10 2016-05-04 复旦大学 Product comment analyzing method and system with learning supervising function
CN106599933A (en) * 2016-12-26 2017-04-26 哈尔滨工业大学 Text emotion classification method based on the joint deep learning model
CN107944014A (en) * 2017-12-11 2018-04-20 河海大学 A kind of Chinese text sentiment analysis method based on deep learning
CN108108351A (en) * 2017-12-05 2018-06-01 华南理工大学 A kind of text sentiment classification method based on deep learning built-up pattern
CN108460089A (en) * 2018-01-23 2018-08-28 哈尔滨理工大学 Diverse characteristics based on Attention neural networks merge Chinese Text Categorization
CN108984724A (en) * 2018-07-10 2018-12-11 凯尔博特信息科技(昆山)有限公司 It indicates to improve particular community emotional semantic classification accuracy rate method using higher-dimension
CN109146166A (en) * 2018-08-09 2019-01-04 南京安链数据科技有限公司 A kind of personal share based on the marking of investor's content of the discussions slumps prediction model
CN109697232A (en) * 2018-12-28 2019-04-30 四川新网银行股份有限公司 A kind of Chinese text sentiment analysis method based on deep learning
CN110008338A (en) * 2019-03-04 2019-07-12 华南理工大学 A kind of electric business evaluation sentiment analysis method of fusion GAN and transfer learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
赵富 等: "融合词性的双注意力Bi-LSTM情感分析", 《计算机应用》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20191126)