CN110502633A - Network comment management method based on machine learning - Google Patents
- Publication number
- CN110502633A CN110502633A CN201910655834.9A CN201910655834A CN110502633A CN 110502633 A CN110502633 A CN 110502633A CN 201910655834 A CN201910655834 A CN 201910655834A CN 110502633 A CN110502633 A CN 110502633A
- Authority
- CN
- China
- Prior art keywords
- comment
- network
- classification
- machine learning
- vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/906—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
The present invention relates to a network comment management method based on machine learning. Network comments are managed by a machine-learning method, which is efficient and objective; and because training of the LSTM network is continuous, the method can adapt to changes in online-language context, guaranteeing the accuracy of management and offering high flexibility.
Description
Technical field
The present invention relates to the technical field of data processing, and more particularly to a network comment management method based on machine learning.
Background art
At present, network comment management relies mainly on manual review or keyword filtering. However, manual review is highly subjective and inefficient, while keyword filtering is rigid and handles comments inflexibly.
Summary of the invention
To overcome the technical deficiencies of the prior art, in which network comments are managed by manual review or keyword filtering, namely strong subjectivity, low processing efficiency, and inflexible handling, the present invention provides a network comment management method based on machine learning.
To achieve the above object of the invention, the adopted technical solution is as follows:
A network comment management method based on machine learning, comprising the following steps:
S1. Screen out a certain number of comments and label each comment with a class by manual review;
S2. Map every Chinese character in the character table to a unique number, then convert each comment into a number sequence according to this mapping:
X_i = (token_1, token_2, ..., token_{Sen_Length})
where each token is one number of the comment sequence; according to a word-embedding table, convert each token into a fixed-length vector;
S3. Construct an LSTM network and input the resulting word vectors into it in sequence order;
S4. Input the outputs H = (h_1, h_2, ..., h_n) of the LSTM network at each time step into an average pooling layer, which averages all inputs into a new vector h;
S5. Classify the vector produced by the preceding layers with the Softmax classification method to obtain the probability of each given class, and select the class with the largest probability as the classification result and output it;
S6. Update the parameters of the LSTM network based on the classification result output in step S5 and the manually assigned label, then return to step S1 until the training of the LSTM network meets the stop condition;
S7. Apply the operations of steps S1~S5 to the comments to be managed and process each comment according to the output classification result.
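Step S2 can be sketched in Python as follows. This is a minimal sketch: the character table, the example comment, the sequence length Sen_Length, and the embedding size are illustrative assumptions, not values from the patent.

```python
import numpy as np

# Hypothetical character table: each character maps to a unique integer id.
# In the method this table would cover the full Chinese character word table.
char_table = {ch: i + 1 for i, ch in enumerate("服务很好态度差")}

SEN_LENGTH = 10  # assumed fixed sequence length (Sen_Length)
EMBED_DIM = 8    # assumed embedding vector size

def comment_to_ids(comment, table, sen_length, pad_id=0):
    """Convert one comment into a fixed-length number sequence (step S2)."""
    ids = [table.get(ch, pad_id) for ch in comment]
    ids = ids[:sen_length] + [pad_id] * max(0, sen_length - len(ids))
    return np.array(ids)

# Word-embedding table: one fixed-length vector per token id (row 0 = padding).
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(char_table) + 1, EMBED_DIM))

ids = comment_to_ids("服务很好", char_table, SEN_LENGTH)
vectors = embedding_table[ids]  # shape (SEN_LENGTH, EMBED_DIM), fed to the LSTM in S3
```

The resulting `vectors` array is what step S3 would consume in sequence order.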
Preferably, the labeling of step S1 is as follows: a comment judged positive by manual review is labeled 1; a comment judged negative is labeled -1; a declarative sentence is labeled 0.
Preferably, the classification operation of step S5 proceeds as follows:
S_i = exp(e_i) / Σ_{j=1}^{N} exp(e_j)
where N is the number of classes; S is the output classification vector, each component of which is the probability of the corresponding class; e is the vector input to the Softmax classifier, and e_i is a component of that vector.
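A minimal numeric sketch of the Softmax operation described above; the three classes and the input vector e are hypothetical values, not data from the patent.

```python
import numpy as np

def softmax(e):
    """Softmax over the classifier input vector e (N = len(e) classes).
    S_i = exp(e_i) / sum_j exp(e_j); subtracting max(e) first is a
    standard trick for numerical stability and does not change the result."""
    z = np.exp(e - np.max(e))
    return z / z.sum()

# Hypothetical classifier input for N = 3 classes (positive, negative, declarative).
e = np.array([2.0, 0.5, 0.1])
S = softmax(e)
predicted = int(np.argmax(S))  # step S5: pick the most probable class
```

As in step S5, the class with the largest probability (here index 0) would be output as the classification result.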
Preferably, in step S6 the parameters of the LSTM network are updated with the RMSProp algorithm.
Preferably, the training process of steps S1~S6 is carried out periodically or aperiodically, using the comment data collected up to the current time.
Compared with the prior art, the beneficial effects of the present invention are: the provided method manages network comments by machine learning, which is efficient and objective; and because training of the LSTM network is continuous, the method can adapt to changes in online-language context, guaranteeing the accuracy of management and offering high flexibility.
Description of the drawings
Fig. 1 is a flow diagram of the method.
Specific embodiment
The accompanying drawings are for illustration only and shall not be construed as limiting the patent.
The present invention is further described below with reference to the drawings and embodiments.
Embodiment 1
As shown in Fig. 1, the network comment management method based on machine learning comprises the following steps:
S1. Screen out a certain number of comments and label each comment with a class by manual review;
S2. Map every Chinese character in the character table to a unique number, then convert each comment into a number sequence according to this mapping:
X_i = (token_1, token_2, ..., token_{Sen_Length})
where each token is one number of the comment sequence; according to a word-embedding table, convert each token into a fixed-length vector;
S3. Construct an LSTM network and input the resulting word vectors into it in sequence order;
S4. Input the outputs H = (h_1, h_2, ..., h_n) of the LSTM network at each time step into an average pooling layer, which averages all inputs into a new vector h;
S5. Classify the vector produced by the preceding layers with the Softmax classification method to obtain the probability of each given class, and select the class with the largest probability as the classification result and output it;
S6. Update the parameters of the LSTM network based on the classification result output in step S5 and the manually assigned label, then return to step S1 until the training of the LSTM network meets the stop condition;
S7. Apply the operations of steps S1~S5 to the comments to be managed and process each comment according to the output classification result.
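The average pooling of step S4 can be sketched as follows; the number of time steps n, the hidden size, and the random LSTM outputs are illustrative placeholders.

```python
import numpy as np

# Hypothetical LSTM outputs H = (h1, ..., hn): n time steps, hidden size d.
rng = np.random.default_rng(1)
n, d = 5, 4
H = rng.normal(size=(n, d))

# Average pooling layer (step S4): the new vector h is the elementwise
# mean of all time-step outputs, collapsing the sequence to one vector.
h = H.mean(axis=0)
```

The pooled vector `h` is what step S5 would pass to the Softmax classifier.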
In this embodiment, a comment judged positive by manual review is labeled 1; a comment judged negative is labeled -1; a declarative sentence is labeled 0.
In this embodiment, the classification operation of step S5 proceeds as follows:
S_i = exp(e_i) / Σ_{j=1}^{N} exp(e_j)
where N is the number of classes; S is the output classification vector, each component of which is the probability of the corresponding class; e is the vector input to the Softmax classifier, and e_i is a component of that vector.
In this embodiment, in step S6 the parameters of the LSTM network are updated with the RMSProp algorithm.
In this embodiment, the training process of steps S1~S6 is carried out periodically or aperiodically, using the comment data collected up to the current time.
The core of the invention is the LSTM network, a kind of RNN (Recurrent Neural Network) model. Networks of this type can, on the basis of an RNN, handle inputs arranged in a time series; moreover, LSTM solves the exploding- and vanishing-gradient problems of RNN training, so it performs well on longer time series and is used in a range of natural-language-processing problems such as sentiment analysis.
Whereas an RNN passes only one state h_t to the next time step, the LSTM network passes two states, h_t and c_t, where h_t is the hidden state at each time step and c_t is the cell state.
Within the cell body, the first step is the forget gate, which decides how much of the previous step's information to discard. It is computed by the following formula, where σ is the sigmoid function, W_layer and b_layer are the weight and bias of the corresponding network layer, and x_t is the training-sample feature:
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
The value produced by the forget gate determines the degree of retention: 1 means fully retained and 0 means fully discarded.
Next, the network decides what new information to store in the cell state, which consists of two parts:
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
C'_t = tanh(W_C · [h_{t-1}, x_t] + b_C)
The old cell state is then updated to the new one:
C_t = f_t × C_{t-1} + i_t × C'_t
The new cell state is determined jointly by the previous cell state, the forget-gate state, i_t, and C'_t.
Finally, the output value of this step is determined, based on the current cell state:
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
h_t = o_t × tanh(C_t)
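The gate equations above can be sketched as a single numpy cell step. This is a minimal sketch: the input and hidden dimensions and the random weights are illustrative, not the patent's trained parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_step(x_t, h_prev, C_prev, W, b):
    """One LSTM cell step following the equations in the text.
    W maps each gate name to its weight matrix over the concatenated
    vector [h_{t-1}, x_t]; b maps each gate name to its bias vector."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])     # forget gate f_t
    i_t = sigmoid(W["i"] @ z + b["i"])     # input gate i_t
    C_cand = np.tanh(W["C"] @ z + b["C"])  # candidate cell state C'_t
    C_t = f_t * C_prev + i_t * C_cand      # new cell state C_t
    o_t = sigmoid(W["o"] @ z + b["o"])     # output gate o_t
    h_t = o_t * np.tanh(C_t)               # new hidden state h_t
    return h_t, C_t

# Illustrative sizes and random parameters.
rng = np.random.default_rng(2)
d_in, d_hid = 3, 4
W = {k: rng.normal(size=(d_hid, d_hid + d_in)) for k in "fiCo"}
b = {k: np.zeros(d_hid) for k in "fiCo"}

h_t, C_t = lstm_cell_step(rng.normal(size=d_in), np.zeros(d_hid), np.zeros(d_hid), W, b)
```

Iterating this step over a comment's embedded token sequence yields the per-time-step outputs H used in step S4.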
In the training process, the optimization method used in this embodiment is the RMSProp algorithm. Since neural-network objectives are non-convex, and RMSProp performs better than other algorithms under non-convex conditions, the algorithm accumulates the gradient as an exponentially decaying moving average, thereby discarding the distant past; empirically, RMSProp has proven to be an effective and practical optimization algorithm for deep networks. Its procedure is as follows:
S1: Given the learning rate ε, decay rate ρ, and initial network parameters θ; δ is a small constant, typically 10^{-6}.
S2: Sample a minibatch of m examples {x^(1), x^(2), ..., x^(m)} from the training set, with corresponding labels y^(i).
S3: Compute the gradient: g ← (1/m) ∇_θ Σ_i L(f(x^(i); θ), y^(i))
S4: Accumulate the squared gradient: r ← ρr + (1-ρ) g⊙g
S5: Compute the parameter update: Δθ = -(ε / √(δ + r)) ⊙ g (applied element-wise)
S6: Apply the gradient update: θ ← θ + Δθ
S7: If the stop condition has not been reached, return to S2.
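A minimal sketch of one RMSProp update under the notation above; the learning rate, decay rate, and the toy quadratic objective are illustrative assumptions, not values from the patent.

```python
import numpy as np

def rmsprop_step(theta, grad, r, lr=0.01, rho=0.9, delta=1e-6):
    """One RMSProp update following steps S4-S6 above."""
    r = rho * r + (1.0 - rho) * grad * grad        # S4: accumulate squared gradient
    delta_theta = -lr / np.sqrt(delta + r) * grad  # S5: compute parameter update
    return theta + delta_theta, r                  # S6: apply gradient update

# Toy use: minimize f(theta) = theta^2 (gradient 2*theta), starting at theta = 5.
theta = np.array([5.0])
r = np.zeros(1)
for _ in range(2000):  # S7: loop until a stop condition (here a fixed step count)
    theta, r = rmsprop_step(theta, 2.0 * theta, r)
```

Because the step is scaled by 1/√(δ + r), the effective step size approaches the learning rate as r tracks the recent squared gradients, which is the exponentially decaying average described in the text.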
Obviously, the above embodiment is merely an example given to clearly illustrate the present invention and is not a limitation on its embodiments. For those of ordinary skill in the art, other variations or changes in different forms may be made on the basis of the above description. It is neither necessary nor possible to exhaust all embodiments here. Any modifications, equivalent replacements, and improvements made within the spirit and principle of the invention shall fall within the protection scope of the claims of the present invention.
Claims (5)
1. A network comment management method based on machine learning, characterized in that it comprises the following steps:
S1. Screen out a certain number of comments and label each comment with a class by manual review;
S2. Map every Chinese character in the character table to a unique number, then convert each comment into a number sequence according to this mapping:
X_i = (token_1, token_2, ..., token_{Sen_Length})
where each token is one number of the comment sequence; according to a word-embedding table, convert each token into a fixed-length vector;
S3. Construct an LSTM network and input the resulting word vectors into it in sequence order;
S4. Input the outputs H = (h_1, h_2, ..., h_n) of the LSTM network at each time step into an average pooling layer, which averages all inputs into a new vector h;
S5. Classify the vector produced by the preceding layers with the Softmax classification method to obtain the probability of each given class, and select the class with the largest probability as the classification result and output it;
S6. Update the parameters of the LSTM network based on the classification result output in step S5 and the manually assigned label, then return to step S1 until the training of the LSTM network meets the stop condition;
S7. Apply the operations of steps S1~S5 to the comments to be managed and process each comment according to the output classification result.
2. The network comment management method based on machine learning according to claim 1, characterized in that the labeling of step S1 is as follows: a comment judged positive by manual review is labeled 1; a comment judged negative is labeled -1; a declarative sentence is labeled 0.
3. The network comment management method based on machine learning according to claim 1, characterized in that the classification operation of step S5 proceeds as follows:
S_i = exp(e_i) / Σ_{j=1}^{N} exp(e_j)
where N is the number of classes; S is the output classification vector, each component of which is the probability of the corresponding class; e is the vector input to the Softmax classifier, and e_i is a component of that vector.
4. The network comment management method based on machine learning according to claim 1, characterized in that in step S6 the parameters of the LSTM network are updated with the RMSProp algorithm.
5. The network comment management method based on machine learning according to claim 1, characterized in that the training process of steps S1~S6 is carried out periodically or aperiodically, using the comment data collected up to the current time.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910655834.9A CN110502633A (en) | 2019-07-19 | 2019-07-19 | Network comment management method based on machine learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110502633A true CN110502633A (en) | 2019-11-26 |
Family
ID=68586235
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910655834.9A Pending CN110502633A (en) | 2019-07-19 | 2019-07-19 | Network comment management method based on machine learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110502633A (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105550269A (en) * | 2015-12-10 | 2016-05-04 | 复旦大学 | Product comment analyzing method and system with learning supervising function |
CN106599933A (en) * | 2016-12-26 | 2017-04-26 | 哈尔滨工业大学 | Text emotion classification method based on the joint deep learning model |
CN107944014A (en) * | 2017-12-11 | 2018-04-20 | 河海大学 | A kind of Chinese text sentiment analysis method based on deep learning |
CN108108351A (en) * | 2017-12-05 | 2018-06-01 | 华南理工大学 | A kind of text sentiment classification method based on deep learning built-up pattern |
CN108460089A (en) * | 2018-01-23 | 2018-08-28 | 哈尔滨理工大学 | Diverse characteristics based on Attention neural networks merge Chinese Text Categorization |
CN108984724A (en) * | 2018-07-10 | 2018-12-11 | 凯尔博特信息科技(昆山)有限公司 | It indicates to improve particular community emotional semantic classification accuracy rate method using higher-dimension |
CN109146166A (en) * | 2018-08-09 | 2019-01-04 | 南京安链数据科技有限公司 | A kind of personal share based on the marking of investor's content of the discussions slumps prediction model |
CN109697232A (en) * | 2018-12-28 | 2019-04-30 | 四川新网银行股份有限公司 | A kind of Chinese text sentiment analysis method based on deep learning |
CN110008338A (en) * | 2019-03-04 | 2019-07-12 | 华南理工大学 | A kind of electric business evaluation sentiment analysis method of fusion GAN and transfer learning |
Non-Patent Citations (1)
Title |
---|
ZHAO Fu et al.: "Dual-attention Bi-LSTM sentiment analysis incorporating part of speech", Journal of Computer Applications (《计算机应用》) * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191126 ||