CN107544961A - A sentiment analysis method, device and storage device for social media comments - Google Patents
A sentiment analysis method, device and storage device for social media comments
- Publication number
- CN107544961A
- Authority
- CN
- China
- Prior art keywords
- comment
- social media
- sentiment analysis
- designated
- analysis method
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a sentiment analysis method, device and storage device for social media comments. The method comprises the steps of: obtaining user comment information with a specific program; processing the obtained user comment information with a database and dividing it into a training set and a test set; preprocessing the training set and the test set separately and extracting feature words; setting sentiment classification grades and a threshold, and training a classifier on the training set using the Bayesian method; classifying the test set with the classifier and outputting the classification results; and deleting the comments whose sentiment classification grade in the classification results is below the threshold. The sentiment analysis device and the storage device for social media comments are used to implement this method. The invention can promptly discover user comments with a threatening orientation on social platforms and handle such threatening comments quickly and accurately.
Description
Technical field
The present invention relates to the field of network information processing, and in particular to a sentiment analysis method, device and storage device for social media comments.
Background
On news-oriented social media, users often leave their own comments on hot events, such as the recent "Sa De" (THAAD) incident and the current India incident. However, user comments may contain content that misleads the public or is threatening, and if such comments remain on a social media platform for a long time, they may cause unnecessary public panic. Promptly monitoring threatening or misleading content in user comments on social media platforms, and handling that content quickly and accurately, has therefore become an urgent problem.
Summary of the invention
To solve the above problems, the invention provides a sentiment analysis method, device and storage device for social media comments. The data to be processed is first collected with a Python crawler, the collected data is then preprocessed with a MySQL database, and finally a classifier is trained using Bayesian theory, which effectively solves the above problems.
The technical solution provided by the invention is a sentiment analysis method for social media comments, the method comprising the steps of: obtaining user comment information with a specific program; processing the obtained user comment information with a database and dividing it into a training set and a test set; preprocessing the training set and the test set separately and extracting feature words; setting sentiment classification grades and a threshold, and training a classifier on the training set using the Bayesian method; classifying the test set with the classifier and outputting the classification results; and deleting the comments whose sentiment classification grade in the classification results is below the threshold. A storage device stores instructions and data for implementing the sentiment analysis method for social media comments. A sentiment analysis device for social media comments comprises a processor and the storage device; the processor loads and executes the instructions and data in the storage device to implement the sentiment analysis method for social media comments.
The beneficial effects of the invention are as follows: the sentiment analysis method, device and storage device for social media comments provided by the invention can promptly discover user comments with a threatening orientation on social platforms. Once threatening comments are discovered, they can also be handled quickly and accurately, and the user ID behind each threatening comment can be pinpointed.
Brief description of the drawings
Fig. 1 is an overall flowchart of the sentiment analysis method for social media comments in an embodiment of the present invention;
Fig. 2 is a flowchart of preprocessing the training set and test set and extracting feature words in the present invention;
Fig. 3 is a schematic flowchart of classifier training in an embodiment of the present invention;
Fig. 4 is a schematic diagram of the operation of the hardware device in an embodiment of the present invention.
Embodiment
To make the objectives, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are further described below with reference to the accompanying drawings. The specific technical details mentioned below, such as the method and the device, are only intended to help the reader better understand the technical solution and do not mean that the present invention is limited to these details.
The embodiment of the present invention provides a sentiment analysis method, device and storage device for social media comments. Referring to Fig. 1, which is an overall flowchart of the sentiment analysis method for social media comments in an embodiment of the present invention, the method is implemented by a sentiment analysis device for social media comments, and its specific steps include:
S101: Obtain user comment information with a specific program. The specific program is a Python crawler. The program obtains the address of the server that the social media platform uses to store comments, sets a popularity rank threshold for news events, obtains comments according to the popularity rank threshold, and stores the obtained comments classified by news topic.
S102: Process the obtained user comment information with a database and divide it into a training set and a test set. The database is a MySQL database with 8 fields: the number of likes the comment received, numofzan; the time the comment was posted, createtime; the user name, username; the user ID, userid; the number of replies to the comment, replycount; the comment content, commenttext; the news topic ID, group_id; and the comment ID, onlyid. The onlyid field uniquely identifies a comment. SQL statements are used to deduplicate the comment data in the obtained user comments; the deduplicated comment data is denoted comment_nonrepetitive, and comment_nonrepetitive is divided into a training set and a test set.
S103: Preprocess the training set and the test set separately and extract feature words.
S104: Set sentiment classification grades and a threshold, and train a classifier on the training set using the Bayesian method.
S105: Classify the test set with the classifier and output the classification results.
S106: Delete the comments whose sentiment classification grade in the classification results is below the threshold.
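The collection and storage side of S101 and S102 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's code: the crawler itself is omitted, the topic popularity scores and the sample rows are hypothetical, Python's built-in `sqlite3` stands in for MySQL, deduplication keeps one row per `onlyid`, and the 75/25 random split is an assumed ratio because the source gives none.

```python
import random
import sqlite3
from collections import defaultdict

# Hypothetical crawled rows in the 8-field layout described in S102:
# (numofzan, createtime, username, userid, replycount, commenttext, group_id, onlyid)
ROWS = [
    (3, "2017-08-01 10:00", "alice", 101, 0, "text A", 1, "c1"),
    (3, "2017-08-01 10:00", "alice", 101, 0, "text A", 1, "c1"),  # crawled twice
    (0, "2017-08-01 11:00", "bob", 102, 2, "text B", 1, "c2"),
    (5, "2017-08-02 09:30", "carol", 103, 1, "text C", 2, "c3"),
]


def select_hot_topics(topic_heat, rank_threshold):
    """S101 sketch: keep the news topics ranked within the popularity threshold."""
    ranked = sorted(topic_heat, key=lambda g: (-topic_heat[g], g))
    return set(ranked[:rank_threshold])


def group_by_topic(rows, hot_topics):
    """Store comments of hot topics grouped by news topic (group_id, index 6)."""
    grouped = defaultdict(list)
    for row in rows:
        if row[6] in hot_topics:
            grouped[row[6]].append(row)
    return dict(grouped)


def load_and_dedupe(rows):
    """S102 sketch: load rows, then keep one row per onlyid as comment_nonrepetitive."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE comment_raw (
               numofzan INTEGER, createtime TEXT, username TEXT, userid INTEGER,
               replycount INTEGER, commenttext TEXT, group_id INTEGER, onlyid TEXT)"""
    )
    conn.executemany("INSERT INTO comment_raw VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    conn.execute(
        """CREATE TABLE comment_nonrepetitive AS
           SELECT * FROM comment_raw
           WHERE rowid IN (SELECT MIN(rowid) FROM comment_raw GROUP BY onlyid)"""
    )
    return conn


def split_train_test(conn, test_ratio=0.25, seed=0):
    """Randomly split the deduplicated onlyids into training and test sets."""
    ids = [r[0] for r in conn.execute(
        "SELECT onlyid FROM comment_nonrepetitive ORDER BY onlyid")]
    random.Random(seed).shuffle(ids)
    n_test = max(1, int(len(ids) * test_ratio))
    return ids[n_test:], ids[:n_test]
```

On a real MySQL server, the same deduplication could instead be enforced with a UNIQUE key on `onlyid`; the `rowid` trick above is specific to SQLite.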
Referring to Fig. 2, which is a flowchart of preprocessing the training set and test set and extracting feature words in the present invention, the process specifically includes:
S201: Remove the tag information of every comment in the training set and the comment content of forwarders, keeping only the comment written by the user ID itself.
S202: Denote the comments processed by the above step as comment_personal.
S203: Retain the onlyid field of every comment record.
S204: Perform efficient word-graph scanning based on a Trie structure.
S205: Generate all possible words formed by the Chinese characters in each comment.
S206: Build a directed acyclic graph (DAG), denoted comment_jieba.
Steps S204 to S206 above constitute jieba word segmentation of comment_personal.
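The word-graph scan of S204 to S206 can be illustrated with a small prefix dictionary and a DAG builder in the spirit of jieba's approach. This is a simplified sketch, not jieba's code: the four-word dictionary is a hypothetical toy, a flat prefix dictionary plays the role of the Trie, and word frequencies are ignored.

```python
def build_prefix_dict(words):
    """Flat prefix dictionary standing in for a Trie: words -> True, proper prefixes -> False."""
    prefix = {}
    for w in words:
        for end in range(1, len(w)):
            prefix.setdefault(w[:end], False)
        prefix[w] = True
    return prefix


def build_dag(sentence, prefix):
    """Map each start index k to the end indices i where sentence[k:i+1] is a word."""
    dag = {}
    n = len(sentence)
    for k in range(n):
        ends = []
        i = k
        frag = sentence[k]
        # Extend the fragment only while it is still a prefix of some word.
        while i < n and frag in prefix:
            if prefix[frag]:
                ends.append(i)
            i += 1
            frag = sentence[k:i + 1]
        if not ends:
            ends.append(k)  # fall back to the single character
        dag[k] = ends
    return dag


def candidate_words(sentence, dag):
    """List every candidate word the DAG allows at each start position."""
    return {k: [sentence[k:i + 1] for i in ends] for k, ends in dag.items()}


if __name__ == "__main__":
    prefix = build_prefix_dict(["北京", "北京大学", "大学", "学生"])
    dag = build_dag("北京大学生", prefix)
    print(dag)  # {0: [1, 3], 1: [1], 2: [3], 3: [4], 4: [4]}
```

Here `dag[k]` lists every end index `i` such that `sentence[k:i+1]` is a dictionary word, or the single character itself as a fallback. jieba additionally chooses the most probable path through this DAG using word frequencies; that step is not shown.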
S207: Remove the stop words in comment_jieba using the Harbin Institute of Technology stop-word list (stopword); denote the result comment_stopword.
S208: Determine the relative positions of degree adverbs and negation words in comment_stopword.
S209: If a negation word precedes a degree adverb, its weight is 0.5 times its original weight.
S210: If a negation word follows a degree adverb, its weight is 2 times its original weight.
S211: Sort the words in each single comment by weight in descending order.
S212: Select the top 7 words by weight as the feature words of the comment, denoted $w_i$, $i = 1, \dots, 7$.
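Steps S207 to S212 can be sketched as below. The stop-word, negation and degree-adverb lists are hypothetical excerpts (the HIT stop-word list is not reproduced), each token occurrence is assumed to start with a base weight of 1, and "before/after" is interpreted as the position of the negation word relative to the nearest degree adverb in the same comment; the source does not specify these details.

```python
from collections import defaultdict

# Hypothetical excerpts; the real HIT stop-word list and sentiment lexicons differ.
STOPWORDS = {"的", "了", "是"}
NEGATIONS = {"不", "没有"}
DEGREE_ADVERBS = {"很", "非常", "太"}


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """S207: drop stop words from a segmented comment."""
    return [t for t in tokens if t not in stopwords]


def weight_tokens(tokens):
    """S208-S210: assumed base weight of 1 per occurrence, adjusted for negations."""
    degree_pos = [i for i, t in enumerate(tokens) if t in DEGREE_ADVERBS]
    weights = defaultdict(float)
    for i, t in enumerate(tokens):
        factor = 1.0
        if t in NEGATIONS and degree_pos:
            # Compare against the nearest degree adverb in the comment.
            nearest = min(degree_pos, key=lambda j: abs(j - i))
            factor = 0.5 if i < nearest else 2.0
        weights[t] += factor
    return dict(weights)


def top_features(tokens, k=7):
    """S211-S212: sort by weight (descending, ties by word) and keep the top k words."""
    weights = weight_tokens(remove_stopwords(tokens))
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]
```

For example, in the segmented comment `["很", "不", "好"]` the negation follows the degree adverb, so "不" receives weight 2.0; in `["不", "很", "好"]` it precedes it and receives 0.5.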
Referring to Fig. 3, which is a schematic flowchart of classifier training in an embodiment of the present invention, the process specifically includes:
S301: Determine the category set $C = \{C_1, C_2, C_3, C_4, C_5\}$, where $C_1$ is extremely negative, $C_2$ negative, $C_3$ relatively negative, $C_4$ neutral and $C_5$ positive, and set the threshold to $C_1$.
S302: Count the conditional probability of each feature word under each category: $P(w_1 \mid C_1), P(w_2 \mid C_1), \dots, P(w_7 \mid C_1), \dots, P(w_1 \mid C_2), \dots, P(w_7 \mid C_2), \dots, P(w_1 \mid C_5), \dots, P(w_7 \mid C_5)$. For a comment with fewer than 7 feature words, pad the feature words with NULL and define $P(\mathrm{NULL} \mid C_i) = 1$.
S303: Count the probability of each category in the training set: $P(C_1)$, $P(C_2)$, $P(C_3)$, $P(C_4)$ and $P(C_5)$.
S304: A comment $X$ is assigned to class $C_i$ if $P(C_i \mid X) = \max\{P(C_1 \mid X), P(C_2 \mid X), \dots, P(C_5 \mid X)\}$. If the classification result of a comment is $C_1$, the comment is looked up by its onlyid and deleted promptly.
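S305: Compute $P(C_i \mid X)$ using Bayes' theorem. Assuming the feature words are conditionally independent of each other,
$$P(C_i \mid X) = \frac{P(C_i) \prod_{j=1}^{7} P(w_j \mid C_i)}{P(X)}$$
Since $P(X)$ is the same for every class, it can be ignored when comparing classes in S304.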
S306: Complete the construction of the classifier by applying the above steps to every comment in the training set.
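S301 to S306 describe a naive Bayes classifier. A minimal sketch follows, assuming multinomial word counts with add-one (Laplace) smoothing, which the source does not mention but which keeps an unseen word from driving the product to zero. Scores are compared in log space and P(X) is dropped because it is the same for every class. The training examples and the helper `flag_for_deletion` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# C1 = extremely negative, C2 = negative, C3 = relatively negative,
# C4 = neutral, C5 = positive; C1 is the deletion threshold.
CLASSES = ["C1", "C2", "C3", "C4", "C5"]
THRESHOLD = "C1"
NULL = None
K = 7  # number of feature words per comment


def pad(features, k=K):
    """Pad (or truncate) a feature list to exactly k entries with NULL."""
    return (list(features) + [NULL] * k)[:k]


class NaiveBayes:
    def fit(self, samples):
        """samples: list of (feature_words, class_label) pairs."""
        self.class_count = Counter(label for _, label in samples)
        self.word_count = defaultdict(Counter)
        vocab = set()
        for feats, label in samples:
            for w in pad(feats):
                if w is not NULL:
                    self.word_count[label][w] += 1
                    vocab.add(w)
        self.vocab_size = len(vocab)
        self.total = len(samples)
        return self

    def log_score(self, feats, c):
        """log P(C) + sum_j log P(w_j | C); P(X) is omitted."""
        if self.class_count[c] == 0:
            return -math.inf  # class never seen in training
        score = math.log(self.class_count[c] / self.total)
        denom = sum(self.word_count[c].values()) + self.vocab_size
        for w in pad(feats):
            if w is NULL:
                continue  # P(NULL | C) = 1 contributes log 1 = 0
            score += math.log((self.word_count[c][w] + 1) / denom)
        return score

    def predict(self, feats):
        """Return the class with the largest posterior (S304)."""
        return max(CLASSES, key=lambda c: self.log_score(feats, c))


def flag_for_deletion(comments, model):
    """comments: dict onlyid -> feature words. Return onlyids classified as C1."""
    return sorted(oid for oid, feats in comments.items()
                  if model.predict(feats) == THRESHOLD)


if __name__ == "__main__":
    train = [(["杀", "威胁"], "C1"), (["威胁", "炸"], "C1"),
             (["好", "支持"], "C5"), (["喜欢", "好"], "C5"), (["新闻"], "C4")]
    model = NaiveBayes().fit(train)
    print(flag_for_deletion({"c1": ["威胁"], "c2": ["好"]}, model))  # ['c1']
```

The returned onlyids are what S304 would then look up and delete, for example with an SQL `DELETE ... WHERE onlyid = ?` against comment_nonrepetitive.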
Referring to Fig. 4, which is a schematic diagram of the operation of the hardware device in an embodiment of the present invention, the hardware device specifically includes: a sentiment analysis device 401 for social media comments, a processor 402 and a storage device 403.
Sentiment analysis device 401 for social media comments: the sentiment analysis device 401 implements the sentiment analysis method for social media comments.
Processor 402: the processor 402 loads and executes the instructions and data in the storage device 403 to implement the sentiment analysis method for social media comments.
Storage device 403: the storage device 403 stores instructions and data for implementing the sentiment analysis method for social media comments.
The embodiments of the invention above explain all technical features in the claims of the present invention in detail.
Unlike the prior art, the embodiments of the present invention provide a sentiment analysis method, device and storage device for social media comments.
The foregoing are only preferred embodiments of the present invention and are not intended to limit the invention. Any modification, equivalent substitution, improvement and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (10)
1. A sentiment analysis method for social media comments, the method being implemented by a sentiment analysis device for social media comments, characterized in that it comprises the following steps: obtaining user comment information with a specific program; processing the obtained user comment information with a database and dividing the information into a training set and a test set; preprocessing the training set and the test set separately and extracting feature words; setting sentiment classification grades and a threshold, and training a classifier on the training set using the Bayesian method; classifying the test set with the classifier and outputting the classification results; and deleting the comments whose sentiment classification grade in the classification results is below the threshold.
2. The sentiment analysis method for social media comments according to claim 1, characterized in that the step of obtaining user comment information with the specific program specifically includes: the specific program is a Python crawler; the specific program obtains the address of the server that the social media platform uses to store comments; a popularity rank threshold is set for news events; comments are obtained according to the popularity rank threshold; and the obtained comments are stored classified by news topic.
3. The sentiment analysis method for social media comments according to claim 1, characterized in that the specific steps of processing the obtained user comment information with a database and dividing the information into a training set and a test set include: the database is a MySQL database; the MySQL database has 8 fields: the number of likes the comment received, numofzan; the posting time, createtime; the user name, username; the user ID, userid; the number of replies to the comment, replycount; the comment content, commenttext; the news topic ID, group_id; and the comment ID, onlyid; onlyid uniquely identifies a comment; SQL statements are used to deduplicate the comment data in the obtained user comments; the deduplicated comment data is denoted comment_nonrepetitive; and comment_nonrepetitive is divided into a training set and a test set.
4. The sentiment analysis method for social media comments according to claim 3, characterized in that the specific steps of preprocessing the training set and the test set separately and extracting feature words include: removing the tag information of every comment in the training set and the comment content of forwarders, keeping only the comment written by the user ID itself; denoting the comments processed by the above step as comment_personal; retaining the onlyid field of every comment record; performing jieba word segmentation on comment_personal, including performing efficient word-graph scanning based on a Trie structure, generating all possible words formed by the Chinese characters in each comment, and building a directed acyclic graph (DAG) denoted comment_jieba; removing the stop words in comment_jieba using the Harbin Institute of Technology stop-word list (stopword) and denoting the result comment_stopword; determining the relative positions of degree adverbs and negation words in comment_stopword; if a negation word precedes a degree adverb, setting its weight to 0.5 times its original weight; if a negation word follows a degree adverb, setting its weight to 2 times its original weight; sorting the words in each single comment by weight in descending order; and selecting the top 7 words by weight as the feature words of the comment, denoted $w_i$, $i = 1, \dots, 7$.
5. The sentiment analysis method for social media comments according to claim 4, characterized in that the specific steps of setting sentiment classification grades and a threshold and training a classifier on the training set using the Bayesian method include: determining the category set $C = \{C_1, C_2, C_3, C_4, C_5\}$, where $C_1$ is extremely negative, $C_2$ negative, $C_3$ relatively negative, $C_4$ neutral and $C_5$ positive, and setting the threshold to $C_1$; counting the conditional probability of each feature word under each category: $P(w_1 \mid C_1), P(w_2 \mid C_1), \dots, P(w_7 \mid C_1), \dots, P(w_1 \mid C_2), \dots, P(w_7 \mid C_2), \dots, P(w_1 \mid C_5), \dots, P(w_7 \mid C_5)$; counting the probability of each category in the training set: $P(C_1)$, $P(C_2)$, $P(C_3)$, $P(C_4)$ and $P(C_5)$; assigning a comment $X$ to class $C_i$ if $P(C_i \mid X) = \max\{P(C_1 \mid X), P(C_2 \mid X), \dots, P(C_5 \mid X)\}$; computing $P(C_i \mid X)$ using Bayes' theorem; and completing the construction of the classifier by applying the above steps to every comment in the training set.
6. The sentiment analysis method for social media comments according to claim 5, characterized in that, for a comment with fewer than 7 feature words, the feature words are padded with NULL and $P(\mathrm{NULL} \mid C_i) = 1$ is defined.
7. The sentiment analysis method for social media comments according to claim 5, characterized in that, if the classification result of a comment is $C_1$, the comment is looked up by its onlyid and deleted promptly.
8. The sentiment analysis method for social media comments according to claim 5, characterized in that computing $P(C_i \mid X)$ using Bayes' theorem specifically includes: assuming the feature words are conditionally independent of each other, then
$$P(C_i \mid X) = \frac{P(C_i) \prod_{j=1}^{7} P(w_j \mid C_i)}{P(X)}$$
9. A storage device, characterized in that the storage device stores instructions and data for implementing the sentiment analysis method for social media comments according to any one of claims 1 to 8.
10. A sentiment analysis device for social media comments, characterized in that it comprises a processor and the storage device; the processor loads and executes the instructions and data in the storage device to implement the sentiment analysis method for social media comments according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710756607.6A CN107544961A (en) | 2017-08-29 | 2017-08-29 | A kind of sentiment analysis method, equipment and its storage device of social media comment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107544961A (en) | 2018-01-05 |
Family
ID=60958235
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710756607.6A Pending CN107544961A (en) | 2017-08-29 | 2017-08-29 | A kind of sentiment analysis method, equipment and its storage device of social media comment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107544961A (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103116637A (en) * | 2013-02-08 | 2013-05-22 | 无锡南理工科技发展有限公司 | Text sentiment classification method facing Chinese Web comments |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108460010A (en) * | 2018-01-17 | 2018-08-28 | 南京邮电大学 | A kind of comprehensive grade model implementation method based on sentiment analysis |
CN108536673A (en) * | 2018-03-16 | 2018-09-14 | 数库(上海)科技有限公司 | Media event abstracting method and device |
CN110727763A (en) * | 2019-10-09 | 2020-01-24 | 南京邮电大学 | Method for identifying special ethnic group in social media propagation |
CN110727763B (en) * | 2019-10-09 | 2022-10-14 | 南京邮电大学 | Method for identifying special ethnic group in social media propagation |
CN113220823A (en) * | 2020-01-21 | 2021-08-06 | 北京中科闻歌科技股份有限公司 | Sentiment, topic and viewpoint analysis method for social media public language |
CN113220823B (en) * | 2020-01-21 | 2024-03-01 | 北京中科闻歌科技股份有限公司 | Method and device for analyzing emotion, topic and viewpoint of social media public language |
CN113158082A (en) * | 2021-05-13 | 2021-07-23 | 聂佼颖 | Artificial intelligence-based media content reality degree analysis method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107544961A (en) | A kind of sentiment analysis method, equipment and its storage device of social media comment | |
KR101612423B1 (en) | Disaster detecting system using social media | |
CN105005594B (en) | Abnormal microblog users recognition methods | |
CN104239539B (en) | A kind of micro-blog information filter method merged based on much information | |
Gharge et al. | An integrated approach for malicious tweets detection using NLP | |
Ning et al. | Spam message classification based on the Naïve Bayes classification algorithm | |
Faguo et al. | Research on short text classification algorithm based on statistics and rules | |
Chen et al. | Email Hoax Detection System Using Levenshtein Distance Method. | |
CN104268160A (en) | Evaluation object extraction method based on domain dictionary and semantic roles | |
CN103313248B (en) | Method and device for identifying junk information | |
CN103441924A (en) | Method and device for spam filtering based on short text | |
US20160080476A1 (en) | Meme discovery system | |
CN107633077B (en) | System and method for cleaning social media text data by multiple strategies | |
CN110134876B (en) | Network space population event sensing and detecting method based on crowd sensing sensor | |
CN105630890B (en) | New word discovery method and system based on intelligent Answer System conversation history | |
Wang | Learning to classify email: a survey | |
CN109471932A (en) | Rumour detection method, system and storage medium based on learning model | |
Atoum | Cyberbullying detection through sentiment analysis | |
Taylor et al. | Surfacing contextual hate speech words within social media | |
CN102663435A (en) | Junk image filtering method based on semi-supervision | |
Khan et al. | Text mining approach to detect spam in emails | |
CN103744964A (en) | Webpage classification method based on locality sensitive Hash function | |
Pillai et al. | Mobile Text Misinformation Detection Using Effective Information Retrieval Methods | |
CN108462624A (en) | A kind of recognition methods of spam, device and electronic equipment | |
Gautam et al. | A review on cyberstalking detection using machine learning techniques: Current trends and future direction |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180105 |