CN107544961A - Sentiment analysis method, device, and storage device for social media comments - Google Patents

Sentiment analysis method, device, and storage device for social media comments

Info

Publication number
CN107544961A
Authority
CN
China
Prior art keywords
comment
social media
sentiment analysis
designated
analysis method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710756607.6A
Other languages
Chinese (zh)
Inventor
任伟
种胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Geosciences
Original Assignee
China University of Geosciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Geosciences
Priority to CN201710756607.6A
Publication of CN107544961A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a sentiment analysis method, device, and storage device for social media comments. The method comprises the steps of: obtaining user comment information with a specific program; processing the obtained user comment information in a database and dividing it into a training set and a test set; preprocessing the training set and the test set separately and extracting feature words; setting sentiment classification levels and a threshold, and training a classifier on the training set using a Bayesian method; classifying the test set with the classifier and outputting the classification results; and deleting comments whose sentiment classification level in the classification results is below the threshold. The sentiment analysis device and the storage device are used to implement this method. The invention can promptly discover user comments on social platforms that are threatening in tone, and handle such comments quickly and accurately.

Description

Sentiment analysis method, device, and storage device for social media comments
Technical field
The present invention relates to the field of network information processing, and in particular to a sentiment analysis method, device, and storage device for social media comments.
Background
In news-oriented social media, users often leave their own comments on trending news stories, such as the recent "Sa De" event and the ongoing India event. However, users' comments may contain content that misleads the public or is threatening in nature. If such comments remain on a social media platform for a long time, they may cause unnecessary public panic. Promptly detecting threatening or misleading content in user comments on social media platforms, and handling that content quickly and accurately, has therefore become an urgent problem.
Summary of the invention
To solve the above problem, the invention provides a sentiment analysis method, device, and storage device for social media comments. The data to be processed is first collected with a Python crawler, the collected data is then preprocessed with the help of a MySQL database, and finally a classifier is trained using Bayesian theory, which effectively solves the above problem.
The technical solution provided by the invention is a sentiment analysis method for social media comments, comprising the steps of: obtaining user comment information with a specific program; processing the obtained user comment information in a database and dividing it into a training set and a test set; preprocessing the training set and the test set separately and extracting feature words; setting sentiment classification levels and a threshold, and training a classifier on the training set using a Bayesian method; classifying the test set with the classifier and outputting the classification results; and deleting comments whose sentiment classification level in the classification results is below the threshold. A storage device stores instructions and data for implementing the sentiment analysis method for social media comments. A sentiment analysis device for social media comments comprises a processor and the storage device; the processor loads and executes the instructions and data in the storage device to implement the sentiment analysis method for social media comments.
The beneficial effects of the invention are as follows: the invention provides a sentiment analysis method, device, and storage device for social media comments that can promptly discover user comments on social platforms that are threatening in tone. Once discovered, threatening user comments can be handled quickly and accurately, and the ID of each threatening comment can be pinpointed.
Brief description of the drawings
Fig. 1 is the overall flowchart of the sentiment analysis method for social media comments according to an embodiment of the invention;
Fig. 2 is a flowchart of preprocessing the training set and test set and extracting feature words according to the invention;
Fig. 3 is a schematic flowchart of classifier training according to an embodiment of the invention;
Fig. 4 is a schematic diagram of the operation of the hardware device according to an embodiment of the invention.
Detailed description
To make the objects, technical solutions, and advantages of the invention clearer, embodiments of the invention are further described below with reference to the accompanying drawings. The specific technical details mentioned below, such as methods and devices, are intended only to help the reader better understand the technical solution, and do not mean that the invention is limited to these details.
The embodiments of the invention provide a sentiment analysis method, device, and storage device for social media comments. Referring to Fig. 1, Fig. 1 is the overall flowchart of the sentiment analysis method for social media comments according to an embodiment of the invention. The method is implemented by a sentiment analysis device for social media comments, and the specific steps include:
S101: Obtain user comment information with a specific program. The specific program is a Python crawler. The specific program obtains the address of the server that the social media platform uses to store comments; a popularity ranking threshold is set for news events; comments are obtained according to the popularity ranking threshold; and the obtained comments are stored by news topic.
S102: Process the obtained user comment information in a database and divide it into a training set and a test set. The database is a MySQL database with 8 fields: the number of likes the comment received, denoted numofzan; the time the comment was posted, denoted createtime; the user name, denoted username; the user ID, denoted userid; the number of replies to the comment, denoted replycount; the comment content, denoted commenttext; the news topic ID, denoted group_id; and the comment ID, denoted onlyid. The onlyid is the unique identifier of a comment. SQL statements are used to deduplicate the comment data in the obtained user comments; the deduplicated comment data is denoted comment_nonrepetitive, and comment_nonrepetitive is divided into a training set and a test set.
S103: Preprocess the training set and the test set separately and extract feature words.
S104: Set the sentiment classification levels and a threshold, and train a classifier on the training set using a Bayesian method.
S105: Classify the test set with the classifier and output the classification results.
S106: Delete comments whose sentiment classification level in the classification results is below the threshold.
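The deduplication and split in S102 can be sketched as follows. This is a minimal illustration rather than the patent's implementation: it uses Python's built-in sqlite3 module as a stand-in for MySQL, assumes that two rows are duplicates when they share the same onlyid, and assumes an 80/20 split, since the source specifies neither the duplicate criterion nor the split ratio. The function name dedupe_and_split and the table name comment_raw are hypothetical.

```python
import sqlite3

# Column names follow the eight fields listed in S102.
FIELDS = ("numofzan", "createtime", "username", "userid",
          "replycount", "commenttext", "group_id", "onlyid")


def dedupe_and_split(rows, train_ratio=0.8):
    """Load raw comment rows, drop duplicates with SQL, and split the result."""
    conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
    conn.execute(
        "CREATE TABLE comment_raw ("
        "numofzan INTEGER, createtime TEXT, username TEXT, userid TEXT, "
        "replycount INTEGER, commenttext TEXT, group_id TEXT, onlyid TEXT)"
    )
    conn.executemany(
        "INSERT INTO comment_raw VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows
    )
    # Assumption: rows sharing an onlyid are duplicates; keep the first one.
    conn.execute(
        "CREATE TABLE comment_nonrepetitive AS "
        "SELECT * FROM comment_raw WHERE rowid IN "
        "(SELECT MIN(rowid) FROM comment_raw GROUP BY onlyid)"
    )
    cur = conn.execute("SELECT * FROM comment_nonrepetitive ORDER BY onlyid")
    unique = [dict(zip(FIELDS, r)) for r in cur.fetchall()]
    conn.close()
    cut = int(len(unique) * train_ratio)  # hypothetical 80/20 split
    return unique[:cut], unique[cut:]
```

With a real MySQL server the same SQL idea applies, but the connection code and the row identifier used for deduplication would differ.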
Referring to Fig. 2, Fig. 2 is a flowchart of preprocessing the training set and test set and extracting feature words according to the invention, which specifically includes:
S201: Remove the markup information of each comment in the training set and the comment content of reposters, keeping only the comment made by the user ID itself.
S202: Denote the comments processed by the above step as comment_personal.
S203: Retain the onlyid field of each comment record.
S204: Perform efficient word-graph scanning based on a Trie tree structure.
S205: Generate all possible word formations of the Chinese characters in the comment.
S206: Form a directed acyclic graph (DAG), denoted comment_jieba.
Steps S204 to S206 above constitute jieba word segmentation of comment_personal.
S207: Remove the stop words in comment_jieba using the Harbin Institute of Technology stop word list (stopword); the result is denoted comment_stopword.
S208: Determine the relative positions of degree adverbs and negation words in comment_stopword.
S209: If a negation word precedes a degree adverb, the weight of the negation word is set to 0.5 times its original weight.
S210: If a negation word follows a degree adverb, the weight of the negation word is set to 2 times its original weight.
S211: Sort the words in each individual comment by weight in descending order.
S212: Select the 7 words with the highest weights as the feature words of the comment, denoted wi, i = 1…7.
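Steps S207 to S212 can be sketched as below. The source does not say where the original word weights come from, nor how far apart a negation word and a degree adverb may be, so this sketch makes two assumptions: the base weights are supplied by the caller in a hypothetical base_weight dictionary (defaulting to 1.0), and each negation word is compared with the nearest degree adverb in the same comment. The input is assumed to be already segmented, for example by jieba; the function name extract_feature_words is hypothetical.

```python
def extract_feature_words(tokens, base_weight, stopwords,
                          negations, degree_adverbs, k=7):
    """Return up to k feature words of one comment, highest weight first."""
    # S207: drop stop words.
    words = [t for t in tokens if t not in stopwords]
    # S208: record the positions of degree adverbs.
    adverb_pos = [i for i, w in enumerate(words) if w in degree_adverbs]

    weights = {}
    for i, w in enumerate(words):
        weight = base_weight.get(w, 1.0)  # assumed default weight
        if w in negations and adverb_pos:
            # Assumption: compare with the nearest degree adverb.
            nearest = min(adverb_pos, key=lambda p: abs(p - i))
            # S209: negation before the adverb -> 0.5x; S210: after -> 2x.
            weight *= 0.5 if i < nearest else 2.0
        # Keep the largest weight if a word occurs more than once.
        weights[w] = max(weight, weights.get(w, 0.0))

    # S211 and S212: sort by weight in descending order and keep the top k.
    ranked = sorted(weights, key=lambda w: -weights[w])
    return ranked[:k]
```

A comment with fewer than k distinct words simply yields a shorter list; padding with NULL is handled at classification time in this sketch.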
Referring to Fig. 3, Fig. 3 is a schematic flowchart of classifier training according to an embodiment of the invention, which specifically includes:
S301: Determine the category set C = {C1: extremely negative, C2: negative, C3: somewhat negative, C4: neutral, C5: positive}, and set the threshold to C1.
S302: Compute the conditional probability of each feature word under each category: P(w1|C1), P(w2|C1), …, P(w7|C1), …, P(w1|C2), …, P(w7|C2), …, P(w1|C5), …, P(w7|C5). For comments with fewer than 7 feature words, pad the feature words with NULL and define P(NULL|Ci) = 1.
S303: Compute the probability of each category in the training set: P(C1), P(C2), P(C3), P(C4), and P(C5).
S304: A comment X is assigned to class Ci when P(Ci|X) = max{P(C1|X), P(C2|X), …, P(C5|X)}. If a comment is classified as C1, it is looked up by its onlyid and deleted promptly.
S305: Compute P(Ci|X) using Bayes' theorem, assuming that the feature words are conditionally independent of one another.
S306: Complete the construction of the classifier by applying the above steps to every comment in the training set.
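The training and classification steps S301 to S306 can be sketched as a small naive Bayes classifier. This is an illustration under stated assumptions, not the patent's code: it works with log-probabilities for numerical stability and applies add-one smoothing so that unseen words do not produce zero probabilities, neither of which the source mentions. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

THRESHOLD_CLASS = "C1"  # S301: comments classified as C1 are deleted
NULL = "NULL"           # padding token for comments with fewer than 7 words


def pad(features, k=7):
    """Pad a feature-word list to length k with NULL (S302)."""
    return list(features[:k]) + [NULL] * max(0, k - len(features))


def train(samples):
    """samples: iterable of (feature_words, label) pairs from the training set."""
    class_count = Counter()
    word_count = defaultdict(Counter)
    vocab = set()
    for features, label in samples:
        class_count[label] += 1
        for w in features:
            word_count[label][w] += 1
            vocab.add(w)
    total = sum(class_count.values())
    priors = {c: n / total for c, n in class_count.items()}  # S303
    return priors, word_count, vocab


def log_posterior(features, c, priors, word_count, vocab):
    """Unnormalized log P(c | X) under conditional independence (S305)."""
    n_c = sum(word_count[c].values())
    score = math.log(priors[c])
    for w in pad(features):
        if w == NULL:
            continue  # P(NULL | Ci) = 1 contributes log(1) = 0
        # Assumption: add-one smoothing, not described in the source.
        score += math.log((word_count[c][w] + 1) / (n_c + len(vocab)))
    return score


def classify(features, priors, word_count, vocab):
    """S304: pick the class with the largest posterior."""
    return max(priors,
               key=lambda c: log_posterior(features, c, priors, word_count, vocab))


def filter_comments(comments, priors, word_count, vocab):
    """comments maps onlyid -> feature words; drop those classified as C1."""
    return {oid: f for oid, f in comments.items()
            if classify(f, priors, word_count, vocab) != THRESHOLD_CLASS}
```

Because P(X) is the same for every class, it can be dropped when only the arg max over classes is needed, which is why the score above is left unnormalized.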
Referring to Fig. 4, Fig. 4 is a schematic diagram of the operation of the hardware device according to an embodiment of the invention. The hardware device comprises a sentiment analysis device 401 for social media comments, a processor 402, and a storage device 403.
Sentiment analysis device 401 for social media comments: the sentiment analysis device 401 implements the sentiment analysis method for social media comments.
Processor 402: the processor 402 loads and executes the instructions and data in the storage device 403 to implement the sentiment analysis method for social media comments.
Storage device 403: the storage device 403 stores the instructions and data used to implement the sentiment analysis method for social media comments.
By carrying out the embodiments of the invention, all technical features in the claims of the invention have been elaborated in detail.
In contrast to the prior art, the embodiments of the invention provide a sentiment analysis method, device, and storage device for social media comments.
The foregoing describes only preferred embodiments of the invention and is not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (10)

1. A sentiment analysis method for social media comments, the method being implemented by a sentiment analysis device for social media comments, characterized by comprising the following steps: obtaining user comment information with a specific program; processing the obtained user comment information in a database and dividing it into a training set and a test set; preprocessing the training set and the test set separately and extracting feature words; setting sentiment classification levels and a threshold, and training a classifier on the training set using a Bayesian method; classifying the test set with the classifier and outputting the classification results; and deleting comments whose sentiment classification level in the classification results is below the threshold.
2. The sentiment analysis method for social media comments according to claim 1, characterized in that the step of obtaining user comment information with the specific program specifically includes: the specific program is a Python crawler; the specific program obtains the address of the server that the social media platform uses to store comments; a popularity ranking threshold is set for news events; comments are obtained according to the popularity ranking threshold; and the obtained comments are stored by news topic.
3. The sentiment analysis method for social media comments according to claim 1, characterized in that the specific steps of processing the obtained user comment information in a database and dividing it into a training set and a test set include: the database is a MySQL database with 8 fields, namely the number of likes the comment received, denoted numofzan; the time the comment was posted, denoted createtime; the user name, denoted username; the user ID, denoted userid; the number of replies to the comment, denoted replycount; the comment content, denoted commenttext; the news topic ID, denoted group_id; and the comment ID, denoted onlyid; the onlyid is the unique identifier of a comment; SQL statements are used to deduplicate the comment data in the obtained user comments; the deduplicated comment data is denoted comment_nonrepetitive; and comment_nonrepetitive is divided into a training set and a test set.
4. The sentiment analysis method for social media comments according to claim 3, characterized in that the specific steps of preprocessing the training set and the test set separately and extracting feature words include: removing the markup information of each comment in the training set and the comment content of reposters, keeping only the comment made by the user ID itself; denoting the comments processed by the above step as comment_personal; retaining the onlyid field of each comment record; performing jieba word segmentation on comment_personal, including: performing efficient word-graph scanning based on a Trie tree structure, generating all possible word formations of the Chinese characters in the comment, and forming a directed acyclic graph (DAG), denoted comment_jieba; removing the stop words in comment_jieba using the Harbin Institute of Technology stop word list (stopword), the result being denoted comment_stopword; determining the relative positions of degree adverbs and negation words in comment_stopword; if a negation word precedes a degree adverb, setting the weight of the negation word to 0.5 times its original weight; if a negation word follows a degree adverb, setting the weight of the negation word to 2 times its original weight; sorting the words in each individual comment by weight in descending order; and selecting the 7 words with the highest weights as the feature words of the comment, denoted wi, i = 1…7.
5. The sentiment analysis method for social media comments according to claim 4, characterized in that the specific steps of setting the sentiment classification levels and threshold and training a classifier on the training set using a Bayesian method include: determining the category set C = {C1: extremely negative, C2: negative, C3: somewhat negative, C4: neutral, C5: positive}, and setting the threshold to C1; computing the conditional probability of each feature word under each category: P(w1|C1), P(w2|C1), …, P(w7|C1), …, P(w1|C2), …, P(w7|C2), …, P(w1|C5), …, P(w7|C5); computing the probability of each category in the training set: P(C1), P(C2), P(C3), P(C4), and P(C5); assigning a comment X to class Ci when P(Ci|X) = max{P(C1|X), P(C2|X), …, P(C5|X)}; computing P(Ci|X) using Bayes' theorem; and completing the construction of the classifier by applying the above steps to every comment in the training set.
6. The sentiment analysis method for social media comments according to claim 5, characterized in that, for comments with fewer than 7 feature words, the feature words are padded with NULL, and P(NULL|Ci) = 1 is defined.
7. The sentiment analysis method for social media comments according to claim 5, characterized in that, if a comment is classified as C1, it is looked up by its onlyid and deleted promptly.
8. The sentiment analysis method for social media comments according to claim 5, characterized in that computing P(Ci|X) using Bayes' theorem specifically includes: assuming that the feature words are conditionally independent of one another.
9. A storage device, characterized in that the storage device stores instructions and data for implementing the sentiment analysis method for social media comments according to any one of claims 1 to 8.
10. A sentiment analysis device for social media comments, characterized by comprising a processor and the storage device; the processor loads and executes the instructions and data in the storage device to implement the sentiment analysis method for social media comments according to any one of claims 1 to 8.
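Step S305 and claim 8 invoke the conditional independence of the feature words, but the resulting formula is not reproduced in this text. As a hedged reconstruction rather than the patent's own equation, the standard naive Bayes form under that assumption, using the symbols defined in S302 to S304, would be:

```latex
P(X \mid C_i) = \prod_{j=1}^{7} P(w_j \mid C_i),
\qquad
P(C_i \mid X) = \frac{P(C_i)\,\prod_{j=1}^{7} P(w_j \mid C_i)}{P(X)}
```

Since P(X) does not depend on the class, the comparison in S304 only needs the numerator.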

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710756607.6A 2017-08-29 2017-08-29 Sentiment analysis method, device, and storage device for social media comments

Publications (1)

Publication Number Publication Date
CN107544961A (en) 2018-01-05

Family

ID=60958235

Country Status (1)

Country Link
CN (1) CN107544961A (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103116637A (en) * 2013-02-08 2013-05-22 无锡南理工科技发展有限公司 Text sentiment classification method facing Chinese Web comments

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460010A (en) * 2018-01-17 2018-08-28 南京邮电大学 A kind of comprehensive grade model implementation method based on sentiment analysis
CN108536673A (en) * 2018-03-16 2018-09-14 数库(上海)科技有限公司 Media event abstracting method and device
CN110727763A (en) * 2019-10-09 2020-01-24 南京邮电大学 Method for identifying special ethnic group in social media propagation
CN110727763B (en) * 2019-10-09 2022-10-14 南京邮电大学 Method for identifying special ethnic group in social media propagation
CN113220823A (en) * 2020-01-21 2021-08-06 北京中科闻歌科技股份有限公司 Sentiment, topic and viewpoint analysis method for social media public language
CN113220823B (en) * 2020-01-21 2024-03-01 北京中科闻歌科技股份有限公司 Method and device for analyzing emotion, topic and viewpoint of social media public language
CN113158082A (en) * 2021-05-13 2021-07-23 聂佼颖 Artificial intelligence-based media content reality degree analysis method

Similar Documents

Publication Publication Date Title
CN107544961A (en) A kind of sentiment analysis method, equipment and its storage device of social media comment
KR101612423B1 (en) Disaster detecting system using social media
CN105005594B (en) Abnormal microblog users recognition methods
CN104239539B (en) A kind of micro-blog information filter method merged based on much information
Gharge et al. An integrated approach for malicious tweets detection using NLP
Ning et al. Spam message classification based on the Naïve Bayes classification algorithm
Faguo et al. Research on short text classification algorithm based on statistics and rules
Chen et al. Email Hoax Detection System Using Levenshtein Distance Method.
CN104268160A (en) Evaluation object extraction method based on domain dictionary and semantic roles
CN103313248B (en) Method and device for identifying junk information
CN103441924A (en) Method and device for spam filtering based on short text
US20160080476A1 (en) Meme discovery system
CN107633077B (en) System and method for cleaning social media text data by multiple strategies
CN110134876B (en) Network space population event sensing and detecting method based on crowd sensing sensor
CN105630890B (en) New word discovery method and system based on intelligent Answer System conversation history
Wang Learning to classify email: a survey
CN109471932A (en) Rumour detection method, system and storage medium based on learning model
Atoum Cyberbullying detection through sentiment analysis
Taylor et al. Surfacing contextual hate speech words within social media
CN102663435A (en) Junk image filtering method based on semi-supervision
Khan et al. Text mining approach to detect spam in emails
CN103744964A (en) Webpage classification method based on locality sensitive Hash function
Pillai et al. Mobile Text Misinformation Detection Using Effective Information Retrieval Methods
CN108462624A (en) A kind of recognition methods of spam, device and electronic equipment
Gautam et al. A review on cyberstalking detection using machine learning techniques: Current trends and future direction

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20180105)