CN110825876A - Movie comment viewpoint emotion tendency analysis method - Google Patents
- Publication number
- CN110825876A (application CN201911082409.1A)
- Authority
- CN
- China
- Prior art keywords
- comment
- emotion
- words
- viewpoint
- label
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Document Processing Apparatus (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a movie comment viewpoint emotion tendency analysis method, which comprises the following steps: crawling film description information and comment information for a plurality of films in each category from a film review website; preprocessing the collected film description information and comment information; formulating a plurality of comment viewpoint extraction rules, using them to obtain viewpoint words and emotion words from each comment sentence of the comment content, and storing all the viewpoint words and emotion words as a comment label word bank and a viewpoint emotion word bank, respectively; marking each comment sentence with a comment label category and an emotion tendency, either by keyword matching or manually; generating a comment viewpoint emotion analysis model consisting of a comment label classification model and a label emotion classification model; and, for a target film comment, automatically generating a comment label category label and an emotion tendency label using the comment viewpoint emotion analysis model. The method reflects users' emotional expression toward a film comprehensively and accurately.
Description
Technical Field
The invention relates to the technical field of information extraction and data mining, in particular to a movie comment viewpoint emotion orientation analysis method.
Background
In the era of internet big data, online comments have become commonplace and are the most direct channel through which consumers express their attitudes. For consumers, analyzing these comments yields an all-round evaluation of a product across multiple dimensions, which supports decision making. For merchants, it reveals consumer and market preferences, which helps improve service quality and customer loyalty. As internet media technology advances, the film entertainment industry, including cinemas and home entertainment, is growing vigorously; films have become an everyday entertainment choice, and their popularity generates a large volume of comments. Extracting subjective viewpoints from public comments and judging whether they lean positive or negative is an important problem in information extraction and mining within natural language processing. Film comments are also valuable for value transmission and for modeling the film and television environment, and analyzing them contributes to film and television research. Analyzing the emotional orientation of movie review viewpoints is therefore significant.
The methods commonly used to extract viewpoints from user comments are mainly unsupervised: rule-based extraction and clustering algorithms. Rule-based methods extract viewpoints using rules summarized manually from syntactic structure, but manually compiled rules cannot cover every way a viewpoint may be expressed, so the number of valid viewpoints they extract is limited. Clustering-based methods are simple but have low accuracy and struggle to generate reasonable, accurate comment labels.
At present, the common methods for comment emotion analysis are dictionary matching and classification algorithms. Dictionary-based methods depend entirely on the emotion dictionary and are limited by its size. Emotion classification algorithms are supervised: some training sets are derived by combining comment text with scores, while others are labeled manually, which consumes a large amount of labor.
In addition, comments in different industries focus on different things, so the appropriate emotion analysis differs slightly between them. Compared with online comments on e-commerce, restaurants or hotels, movie reviews contain relatively complex information about user experience, so existing emotion analysis and viewpoint extraction methods cannot be applied to them directly. Moreover, many studies of online comments treat viewpoint extraction and emotion classification as two separate modules. User comments on a product are often multidimensional, with praise for some dimensions and criticism of others, so classifying a whole comment as simply favorable (positive) or unfavorable (negative) is not accurate enough; analyzing emotion for each main viewpoint dimension the user raises is more practical. For example, for the comment "the acting in the movie is stunning, but the story is not good", the results (actor, positive) and (plot, negative) are more accurate.
Disclosure of Invention
The invention aims to provide a movie comment viewpoint emotion tendency analysis method which can comprehensively and accurately reflect emotion expression of a user on a movie.
The technical scheme for realizing the purpose is as follows:
a movie comment opinion sentiment tendency analysis method comprises the following steps:
step S1, crawling the film description information and comment information of a plurality of films of each category from the film evaluation website;
step S2, carrying out data preprocessing on the collected film description information and comment information;
step S3, formulating a plurality of comment viewpoint extraction rules, obtaining viewpoint words and sentiment words from each comment sentence of comment content of comment information by using the comment viewpoint extraction rules, and then respectively storing all the viewpoint words and sentiment words as a comment label word bank and a viewpoint sentiment word bank;
step S4, comment label category marking and emotion tendency marking are carried out on each comment sentence through keyword matching marking or manual marking;
step S5, generating a comment viewpoint emotion analysis model consisting of a comment label classification model and a label emotion classification model;
and step S6, automatically generating comment label category labels and emotion tendency labels by using the comment viewpoint emotion analysis model aiming at the target movie comment.
Preferably, in step S1, the categories of the movies include: romance, animation, action, science fiction, horror, comedy, and suspense;
the film description information comprises a film name, a director name, a lead actor name, a type and a total score;
the comment information includes: the commenter's nickname, the number of users who found the comment useful, comment time, comment content and score.
Preferably, the data preprocessing comprises:
integrating all the collected comment information to form a comment corpus;
removing repeated data in the comment corpus;
deleting data with missing comment content in the comment corpus;
converting all traditional Chinese characters in the comment corpus into simplified Chinese characters;
and acquiring the film name, the director name and the lead actor name from the collected description information of each film, storing them in a user-defined dictionary, and marking them with different symbols.
Preferably, the step S3 includes:
constructing a plurality of comment viewpoint extraction rules according to the dependency syntax structure, the part of speech among the words and the expression structure of viewpoint words and sentiment words in the comment viewpoints;
sentence segmentation, word segmentation, part of speech tagging and dependency syntactic analysis are carried out on the comment content in the comment corpus to obtain each comment sentence, whether the comment sentences match a certain comment viewpoint extraction rule or not is checked, if matching, viewpoint words and sentiment words are obtained,
and respectively storing all the acquired viewpoint words and emotion words as a comment label word library and a viewpoint emotion word library.
Preferably, the dependency syntax structure includes: a subject-verb structure, a verb-object structure, an attributive structure, an adverbial structure, a verb-complement structure and a coordinate structure;
the part of speech among the words comprises: a subject component, an object or object-like component, an attributive component, and a noun component; the object-like component refers to an indirect object or a structure that resembles an object;
the expression structure of the viewpoint words and the emotion words refers to: the subject component is a viewpoint word, and the object or object-like component is an emotion word; the attributive component is an emotion word, and the noun component it modifies is a viewpoint word.
Preferably, the step S4 includes:
acquiring a label category dictionary and an emotion dictionary;
and performing keyword matching marking on the comment sentences from which viewpoint words and emotion words were extracted in step S3: matching the acquired viewpoint words with the label category dictionary and the acquired emotion words with the emotion dictionary, and, if both matches succeed, marking the comment sentence with a label category label and an emotion tendency label; otherwise, carrying out manual label category marking and emotion tendency marking;
and performing manual label type marking and emotion tendency marking on the comment sentences of which the viewpoint words and the emotion words are not extracted in the step S3.
Preferably, the obtaining of the tag category dictionary includes:
respectively marking the film name, the director name and the actor name in the user-defined dictionary in the comment tag word library as 'film', 'director' and 'actor';
training each comment sentence through a word vector model to obtain a trained word vector model;
expressing the words in the comment label word library by using a trained word vector model, and clustering the words in the comment label word library into k categories by using a k-means clustering algorithm;
manually summarizing the popular viewpoints of movie reviews into 8 dimensions (director, photography, plot, actor, emotion, audio-visual effect, subject matter and impression), screening the words under each cluster, and retaining the related words to form a preliminary label category dictionary;
acquiring related words of the labeled category words in the preliminary label category dictionary by using the trained word vector model to expand the label category dictionary, removing repeated words in the dictionary, and generating a final label category dictionary;
the obtaining of the emotion dictionary refers to: firstly, collecting open-source positive and negative emotion dictionaries for sorting and merging, then counting word frequency in the viewpoint emotion word bank, reserving all words larger than a set threshold value, and then manually deleting words irrelevant to movie comment emotion to form an emotion dictionary.
Preferably, the step S5 includes:
respectively training and generating two preliminary comment label classification models and two preliminary label emotion classification models by utilizing the keyword matching marking data set and the manual marking data set;
weighting and fusing the two preliminary comment label classification models to generate a final comment label classification model;
and performing weighted fusion on the two primary label emotion classification models to generate a final label emotion classification model.
Preferably, the step of generating the preliminary comment tag classification model or the preliminary tag emotion classification model includes:
an up-sampling strategy is adopted for the keyword matching marking data set and the manual marking data set to carry out data balance;
dividing the keyword matched marking data set and the manually marked data set after the data balance into a training set and a testing set according to a preset proportion;
performing word segmentation on the corpus in the training set, removing stop words, extracting text features by adopting a TF-IDF algorithm, and calculating chi-square values of the features to perform feature dimension reduction;
and importing the data into a random forest classification model, and performing model training, storage and evaluation.
Preferably, the step S6 includes:
extracting viewpoint words and emotion words, if the viewpoint words and the emotion words can be obtained, performing keyword matching including label category matching and emotion word matching, and if the viewpoint words and the emotion words can be successfully matched, directly outputting label category marks and emotion tendency marks; otherwise, directly calling the comment tag classification model and/or the tag emotion classification model to perform tag class prediction and tag emotion prediction, setting two thresholds T1 and T2, and outputting a tag class mark and an emotion tendency mark if the tag class prediction probability P1 is greater than T1 and the tag emotion prediction probability P2 is greater than T2.
The invention has the beneficial effects that: the method and the device are used for processing text information with complex movie comment contents and emotional tendencies, and analyzing the emotional tendencies of movie comment data in a mode of combining various methods and various strategies, so that the emotional tendencies of audiences to certain aspects of a movie can be captured accurately.
Drawings
FIG. 1 is a flow chart of a movie reviews perspective emotional orientation analysis method of the present invention;
FIG. 2 is a flow chart of keyword matching marking in the present invention;
FIG. 3 is a schematic diagram of a review tag classification model fusion in the present invention;
FIG. 4 is a schematic diagram of label emotion classification model fusion in the present invention;
FIG. 5 is a schematic diagram of a classification model construction process according to the present invention;
FIG. 6 is a flow chart of the automatic generation of comment emotion tags in the present invention.
Detailed Description
The invention will be further explained with reference to the drawings.
Referring to fig. 1, the movie comment viewpoint emotion tendency analysis method of the present invention extracts comment viewpoints from movie review data and performs category marking and emotion tendency analysis on them, that is, it obtains the comment label category and its emotion tendency. It also constructs a comment viewpoint emotion analysis model that analyzes and classifies new movie review data and attaches category and emotion labels to it. The method comprises the following steps:
Step S1, data crawling: crawling, from a film review website, the film description information of a plurality of films in the romance, animation, action, science fiction, horror, comedy and suspense categories, together with the comment information of each film. The film description information includes the film name, director name, type, overall score and similar information. The comment information includes the commenter's nickname, the number of users who found the comment useful, comment time, comment content, score and similar information.
Step S2, performing data preprocessing on the movie description information and the comment information, including:
data integration: integrating all the collected comment information into a comment corpus;
data deduplication: removing duplicate data from the comment corpus;
missing value handling: deleting entries in the comment corpus whose comment content is missing;
traditional Chinese handling: converting all traditional Chinese characters in the comment corpus into simplified Chinese characters;
user-defined dictionary: acquiring the film name, director name and lead actor name from the collected film description information, storing them in a user-defined dictionary, and marking them with different symbols.
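The preprocessing steps of step S2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields (`movie`, `nickname`, `content`), the function name `preprocess_comments` and the toy character map are hypothetical, and a real traditional-to-simplified converter would be supplied by a dedicated library rather than the two-character table used here.

```python
from typing import Callable, Dict, Iterable, List


def preprocess_comments(
    records: Iterable[Dict],
    to_simplified: Callable[[str], str] = lambda text: text,
) -> List[Dict]:
    """Build a comment corpus: drop empty comments, normalise script, deduplicate."""
    seen = set()
    corpus = []
    for record in records:
        content = (record.get("content") or "").strip()
        if not content:                      # missing value handling
            continue
        content = to_simplified(content)     # traditional -> simplified
        key = (record.get("movie"), record.get("nickname"), content)
        if key in seen:                      # data deduplication
            continue
        seen.add(key)
        corpus.append({**record, "content": content})
    return corpus


# Toy converter covering two characters only (hypothetical stand-in).
TOY_MAP = str.maketrans({"電": "电", "劇": "剧"})

raw = [
    {"movie": "A", "nickname": "u1", "content": "電影不错"},
    {"movie": "A", "nickname": "u1", "content": "电影不错"},  # duplicate after conversion
    {"movie": "A", "nickname": "u2", "content": ""},          # empty content
    {"movie": "B", "nickname": "u3", "content": None},        # missing content
]
corpus = preprocess_comments(raw, lambda s: s.translate(TOY_MAP))
```

Converting the script before deduplicating is a design choice of this sketch: it lets a traditional and a simplified copy of the same comment collapse into one entry.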
Step S3, comment viewpoint extraction: a plurality of general comment viewpoint extraction rules are formulated according to the dependency syntax structures and parts of speech of modern Chinese, combined with the way viewpoint words and emotion words are expressed in actual comments. Sentence segmentation, word segmentation, part-of-speech tagging and dependency syntactic analysis are performed on the comment content in the comment corpus to obtain each comment sentence. Each sentence is then checked against the extraction rules; if it matches a rule, a (viewpoint word, emotion word) pair is obtained. Finally, all the acquired viewpoint words and emotion words are stored as a comment label word bank and a viewpoint emotion word bank, respectively.
The comment viewpoint extraction rules fall into two types according to the dependency syntax structure: a rule system with the subject-verb structure (SBV) as its core, and a rule system with the attributive structure (ATT) as its core. The syntactic relationships involved in the extraction rules are shown in table 1:
Relationship type | Tag | Description | Example |
Subject-verb structure | SBV | subject-verb | I sent her a bunch of flowers (I <-- sent) |
Verb-object structure | VOB | verb-object | I sent her a bunch of flowers (sent --> flowers) |
Attributive structure | ATT | attribute | Red apple (red <-- apple) |
Adverbial structure | ADV | adverbial | Very beautiful (very <-- beautiful) |
Verb-complement structure | CMP | complement | Finished the homework (do --> finish) |
Coordinate structure | COO | coordinate | Mountains and sea (mountains --> sea) |
TABLE 1
Further, the SBV-based rule system is mainly classified into 4 categories, as shown in table 2:
TABLE 2
As can be seen from Table 2, the SBV-based rules rely on a noun subject that is connected, directly or indirectly, to an object or an object-like structure (hereinafter, an indirect object or a structure resembling an object is referred to as an object-like component). The extracted subject component is the comment viewpoint word, and the extracted object-like component is the comment viewpoint emotion word.
The rules are not limited to the sentence structures listed in Table 2. They also consider whether the subject or the object-like component has a coordinate structure, and whether the object-like component is modified by an adverb, because negation words change the emotion. For example, for the movie comment "the movie and the plot are good", two (viewpoint word, emotion word) pairs can be extracted: (movie, good) and (plot, good). "The subject matter is rich and novel" yields (subject matter, rich) and (subject matter, novel). "The movie is not good" yields (movie, not good).
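As a rough illustration of how an SBV-centred rule with coordination and adverb handling might operate, the sketch below works on a dependency parse that is written out by hand. It covers only one assumed pattern (noun subject with adjective predicate), not the four rule categories of Table 2. The `Token` structure, the function name and the `LAD`/`HED` tags in the toy parses are assumptions for illustration; English words stand in for the Chinese tokens.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Token:
    idx: int   # 1-based position in the sentence
    word: str
    pos: str   # coarse part of speech: "n" noun, "a" adjective, "d" adverb, "c" conjunction
    head: int  # idx of the head token, 0 for the root
    rel: str   # dependency relation to the head


def extract_sbv_pairs(tokens: List[Token]) -> List[Tuple[str, str]]:
    """Return (viewpoint word, emotion word) pairs for noun-subject + adjective-predicate sentences."""
    by_idx = {t.idx: t for t in tokens}
    pairs = []
    for subj in tokens:
        pred = by_idx.get(subj.head)
        if subj.rel != "SBV" or subj.pos != "n" or pred is None or pred.pos != "a":
            continue
        # Coordinated subjects and predicates share the same relation.
        subjects = [subj] + [t for t in tokens if t.rel == "COO" and t.head == subj.idx]
        predicates = [pred] + [t for t in tokens if t.rel == "COO" and t.head == pred.idx]
        for s in subjects:
            for p in predicates:
                # Keep adverbs (including negation) attached to the emotion word.
                adverbs = [t.word for t in tokens if t.rel == "ADV" and t.head == p.idx]
                pairs.append((s.word, " ".join(adverbs + [p.word])))
    return pairs


# Hand-written toy parses, not the output of a real parser.
coordinated = [
    Token(1, "movie", "n", 4, "SBV"),
    Token(2, "and", "c", 3, "LAD"),
    Token(3, "plot", "n", 1, "COO"),
    Token(4, "good", "a", 0, "HED"),
]
negated = [
    Token(1, "movie", "n", 3, "SBV"),
    Token(2, "not", "d", 3, "ADV"),
    Token(3, "good", "a", 0, "HED"),
]
sbv_coordinated = extract_sbv_pairs(coordinated)
sbv_negated = extract_sbv_pairs(negated)
```

Here `sbv_coordinated` holds the pairs (movie, good) and (plot, good), and `sbv_negated` holds (movie, not good), matching the examples in the text.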
Further, the rule system with ATT as the core is also classified into 4 types, and the specific rules are shown in table 3.
TABLE 3
Because an attributive modifies, restricts and describes the quality and characteristics of a noun or pronoun, the attributive relation is essential to the comment viewpoint extraction rules. As seen from table 3, adjectives generally serve as the emotion words of a comment viewpoint, and the nouns they modify, or verbs used as nouns, serve as the viewpoint words. As with the SBV rules, these rules also need to consider coordinate structures among noun components and adjectives, and adverb components that modify adjectives. For example, in the example sentence "stiff and awkward performance" given in table 3, "stiff" is coordinate with "awkward", so two pairs can be extracted: (performance, stiff) and (performance, awkward). "A performance that is not lively" yields (performance, not lively).
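The ATT-centred rules can be illustrated in the same spirit. The sketch below handles a single assumed pattern, an adjective attributive modifying a noun, plus coordinated adjectives and adverbs on the adjective; the four rule types of Table 3 are not reproduced. The token layout, function name and toy parses are hypothetical, with English words standing in for the Chinese tokens.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    idx: int   # 1-based position in the sentence
    word: str
    pos: str   # "n" noun, "a" adjective, "d" adverb, "c" conjunction
    head: int  # idx of the head token, 0 for the root
    rel: str   # dependency relation to the head


def extract_att_pairs(tokens: List[Token]) -> List[Tuple[str, str]]:
    """Return (viewpoint word, emotion word) pairs for adjective attributives of nouns."""
    by_idx = {t.idx: t for t in tokens}
    pairs = []
    for mod in tokens:
        noun = by_idx.get(mod.head)
        if mod.rel != "ATT" or mod.pos != "a" or noun is None or noun.pos != "n":
            continue
        # Adjectives coordinated with the attributive modify the same noun.
        modifiers = [mod] + [
            t for t in tokens if t.rel == "COO" and t.head == mod.idx and t.pos == "a"
        ]
        for m in modifiers:
            adverbs = [t.word for t in tokens if t.rel == "ADV" and t.head == m.idx]
            pairs.append((noun.word, " ".join(adverbs + [m.word])))
    return pairs


# Hand-written toy parses, not the output of a real parser.
stiff_awkward = [
    Token(1, "stiff", "a", 4, "ATT"),
    Token(2, "and", "c", 3, "LAD"),
    Token(3, "awkward", "a", 1, "COO"),
    Token(4, "performance", "n", 0, "HED"),
]
not_lively = [
    Token(1, "not", "d", 2, "ADV"),
    Token(2, "lively", "a", 3, "ATT"),
    Token(3, "performance", "n", 0, "HED"),
]
att_coordinated = extract_att_pairs(stiff_awkward)
att_negated = extract_att_pairs(not_lively)
```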
Step S4, comment label category marking and emotion tendency marking, which is divided into keyword matching marking and manual marking. Keyword matching marking requires a label category dictionary and an emotion dictionary to be acquired before keywords are matched; the main process is shown in figure 2. The label category dictionary is acquired first, as follows:
1) Film proper noun substitution. The film names, director names and actor names from the user-defined dictionary that appear in the comment label word bank are marked as 'film', 'director' and 'actor' respectively, which classifies part of the words in the comment label word bank. For example, if actor names such as "Zhang San" and "Li Si" appear in the comment label word bank, the machine cannot tell by itself that they are actors; matching them against the actor names in the user-defined dictionary allows "Zhang San" and "Li Si" to be marked as "actor". Director names and film names are marked in the same way.
2) Word vector model training. The comment content in the comment corpus is segmented into words, stop words are removed, and the result is stored in a text file with one comment sentence per line and words separated by spaces. A word2vec (word vector) model is then trained on the processed comment content to obtain the word vector model;
3) Word clustering. The words in the comment label word bank are represented using the trained word vector model and clustered into k categories using the k-means clustering algorithm; the value of k is determined by observing the clustering results over multiple trials;
4) Summarizing the evaluation dimensions and screening the category dictionary. By manual summarization and screening, the popular viewpoints of film reviews are divided into 8 dimensions: director, photography, plot, actor, emotion, audio-visual effect, subject matter and impression. The words under each cluster are screened, and the related words are retained to form a label category dictionary;
5) Expanding the label category dictionary. The trained word vector model is used to acquire words related to the label category words, which expands the label category dictionary; repeated words are then removed to generate the final label category dictionary. To acquire related words, the similarity between words is calculated with the word vector model and compared with a set threshold; words whose similarity exceeds the threshold are considered related. The related-word results are screened manually to ensure the accuracy of the label category dictionary.
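The dictionary expansion in item 5) can be sketched as a thresholded similarity search. The example below assumes cosine similarity as the similarity measure and uses tiny hand-made two-dimensional vectors in place of a trained word2vec model; the function names, vectors and threshold value are all hypothetical.

```python
import math
from typing import Dict, List, Set


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_dictionary(
    seed: Dict[str, Set[str]],
    vectors: Dict[str, List[float]],
    threshold: float,
) -> Dict[str, Set[str]]:
    """Add every vocabulary word whose similarity to a seed word exceeds the threshold."""
    expanded = {}
    for category, words in seed.items():
        result = set(words)              # a set also removes repeated words
        for word in words:
            if word not in vectors:
                continue
            for candidate, vec in vectors.items():
                if cosine(vectors[word], vec) > threshold:
                    result.add(candidate)
        expanded[category] = result
    return expanded


# Toy vectors standing in for a trained word vector model.
toy_vectors = {
    "plot": [1.0, 0.1],
    "storyline": [0.9, 0.2],
    "actor": [0.1, 1.0],
    "cast": [0.2, 0.9],
    "popcorn": [-1.0, 0.0],
}
seed_dict = {"plot": {"plot"}, "actor": {"actor"}}
expanded_dict = expand_dictionary(seed_dict, toy_vectors, threshold=0.9)
```

With these toy vectors, "storyline" joins the plot category and "cast" joins the actor category, while "popcorn" is left out. The manual screening described in the text would still follow this automatic step.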
An example of the generated label category dictionary is shown in table 4:
TABLE 4
Next, the emotion dictionary is acquired. First, open-source positive and negative emotion dictionaries are collected, sorted and merged; the main sources are the HowNet dictionary and the open-source emotion dictionary from National Taiwan University. Only the positive and negative evaluation words are taken from the HowNet dictionary. Then word frequencies in the viewpoint emotion word bank are counted and all words whose frequency exceeds a set threshold are retained. Finally, words irrelevant to movie comment emotion are deleted manually, forming an emotion dictionary specific to movies.
Finally, keyword matching is performed. For comment sentences from which viewpoint words and emotion words were extracted during comment viewpoint extraction, the viewpoint words are matched against the label category dictionary and the emotion words against the emotion dictionary; if both match, the comment sentence is marked with (label category, emotion tendency). For example, for the comment "the story is not great", comment viewpoint extraction yields the pair (story, not great), and label category and emotion tendency marking then yields the label (plot, negative).
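Keyword matching marking amounts to two dictionary lookups that must both succeed. The sketch below assumes each dictionary maps a label (a category or a polarity) to a set of words; the dictionary contents and the function name `keyword_match` are hypothetical, and returning `None` stands for handing the sentence over to manual marking.

```python
from typing import Dict, Optional, Set, Tuple


def keyword_match(
    pair: Tuple[str, str],
    label_dict: Dict[str, Set[str]],
    emotion_dict: Dict[str, Set[str]],
) -> Optional[Tuple[str, str]]:
    """Map a (viewpoint word, emotion word) pair to (label category, emotion tendency)."""
    viewpoint, emotion = pair
    category = next((c for c, words in label_dict.items() if viewpoint in words), None)
    polarity = next((p for p, words in emotion_dict.items() if emotion in words), None)
    if category is None or polarity is None:
        return None  # at least one lookup failed: fall back to manual marking
    return category, polarity


# Toy dictionaries for illustration only.
label_dict = {"plot": {"story", "plot"}, "actor": {"acting", "actor"}}
emotion_dict = {"positive": {"good", "stunning"}, "negative": {"not great", "boring"}}

matched = keyword_match(("story", "not great"), label_dict, emotion_dict)
unmatched = keyword_match(("soundtrack", "good"), label_dict, emotion_dict)
```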
Manual marking covers two cases: sentences from which no viewpoint words and emotion words were extracted during comment viewpoint extraction, and sentences from which they were extracted but which could not be marked by keyword matching. In both cases, label category marking and emotion tendency marking are carried out manually.
Step S5, generating the comment viewpoint emotion analysis model, which consists of a comment label classification model and a label emotion classification model. The two classification models differ only in their class labels; the data processing and the classification algorithm are the same. There are two data sets: the data set marked by keyword matching and the data set marked manually. Each is used for training, generating 2 comment label classification models and 2 label emotion classification models. To improve the accuracy of emotion analysis, the 2 comment label classification models are fused by weighting to generate a new comment label classification model, and the 2 label emotion classification models are fused by weighting to generate a new label emotion classification model; see fig. 3 and fig. 4. In this embodiment, the weights of the model trained on keyword-marked data and the model trained on manually marked data are 0.4 and 0.6, respectively.
The probability used in comment viewpoint emotion analysis is calculated as follows:
$P_i = 0.4 \cdot P_{1i} + 0.6 \cdot P_{2i}$
where $P_i$ represents the probability that a given comment in the comment corpus belongs to category $i$, and $P_{1i}$ and $P_{2i}$ are the probabilities given by the model trained on keyword-marked data and the model trained on manually marked data, respectively. For the comment label classification model, $i$ takes the values 0 to 7, representing the 8 categories director, photography, plot, actor, emotion, audio-visual effect, subject matter and impression, respectively. For the label emotion classification model, $i$ takes the values 0 and 1, where 1 represents positive emotion and 0 represents negative emotion.
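The weighted fusion can be checked with a small numeric example. The sketch below applies the 0.4 and 0.6 weights of this embodiment to two made-up probability vectors for the label emotion classification model; the function name `fuse` and the input probabilities are hypothetical.

```python
from typing import List


def fuse(
    p_keyword: List[float],
    p_manual: List[float],
    w_keyword: float = 0.4,
    w_manual: float = 0.6,
) -> List[float]:
    """Per-category weighted sum of the two models' probabilities."""
    if len(p_keyword) != len(p_manual):
        raise ValueError("both models must score the same categories")
    return [w_keyword * a + w_manual * b for a, b in zip(p_keyword, p_manual)]


# Index 0 = negative emotion, index 1 = positive emotion.
p1 = [0.7, 0.3]   # model trained on keyword-marked data (made-up values)
p2 = [0.2, 0.8]   # model trained on manually marked data (made-up values)
fused = fuse(p1, p2)
predicted = max(range(len(fused)), key=fused.__getitem__)
```

Here the fused probabilities are approximately 0.40 for negative and 0.60 for positive, so the comment is predicted as positive even though the keyword-trained model alone leaned negative.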
The above-mentioned construction process of the classification model, see fig. 5, involves the following steps:
First, data balancing is performed. The classes in the classification data may be imbalanced, which strongly affects overall classification accuracy. The invention adopts an upsampling (oversampling) strategy, namely duplicating the samples of under-represented classes.
Second, the data set is partitioned. The shuffled data set is divided into a training set and a test set at a ratio of 8:2.
Then, feature extraction is performed. The corpus of the training set is segmented into words and stop words are removed; text features are extracted with the TF-IDF (term frequency-inverse document frequency) algorithm, and the chi-square value (CHI2, or χ²) of each feature is calculated. With a threshold K (K is an integer), the K features with the highest chi-square values are kept to achieve feature dimension reduction.
And finally, importing the data into a random forest classification model, and performing model training, storage and evaluation.
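The first two stages of this construction process, data balancing and data set partitioning, can be sketched with the standard library alone. The example below follows the order given in the text (balance, then shuffle and split 8:2); the function names and toy samples are hypothetical, and random duplication is one assumed way of copying minority-class samples. Feature extraction and the random forest are not shown; in practice they would typically come from a machine learning library.

```python
import random
from collections import Counter, defaultdict
from typing import List, Tuple

Sample = Tuple[str, int]  # (comment text, class label)


def upsample(samples: List[Sample], rng: random.Random) -> List[Sample]:
    """Duplicate minority-class samples until every class matches the largest class."""
    by_label = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


def split(samples: List[Sample], rng: random.Random, train_ratio: float = 0.8):
    """Shuffle a copy of the data and cut it into training and test sets."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
data = [("good plot", 1)] * 8 + [("dull plot", 0)] * 2
balanced = upsample(data, rng)
train, test = split(balanced, rng)
class_counts = Counter(label for _, label in balanced)
```

One consequence of balancing before splitting is that copies of the same minority sample can land in both the training and the test set, which is worth keeping in mind when reading the evaluation results.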
Step S6, automatic generation of comment emotion labels. After the comment viewpoint emotion analysis model is trained, new film comments can be marked automatically; the prediction process is shown in fig. 6. First, comment viewpoint extraction is performed to extract (viewpoint word, emotion word) pairs. If a pair is obtained, keyword matching is performed, including label category matching and emotion word matching; if both match, the result is output directly. Otherwise, the comment label classification model and/or the label emotion classification model are called directly to predict the label category and the label emotion. Two thresholds (T1 and T2) are set, and the (comment label category mark, emotion tendency mark) result is output if the label category prediction probability P1 is greater than T1 and the label emotion prediction probability P2 is greater than T2.
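The decision flow of step S6 can be sketched as a function that tries the rule-based path first and falls back to the classifiers. All five callables passed in below are hypothetical stand-ins (simple stubs rather than the trained models), and the threshold values are made up for the example.

```python
from typing import Callable, Optional, Tuple

Pair = Tuple[str, str]


def label_comment(
    sentence: str,
    extract: Callable[[str], Optional[Pair]],
    match: Callable[[Pair], Optional[Pair]],
    predict_category: Callable[[str], Tuple[str, float]],
    predict_emotion: Callable[[str], Tuple[str, float]],
    t1: float,
    t2: float,
) -> Optional[Pair]:
    """Return (label category, emotion tendency), or None if no confident label exists."""
    pair = extract(sentence)
    if pair is not None:
        matched = match(pair)
        if matched is not None:
            return matched                      # rule extraction + keyword matching succeeded
    category, p1 = predict_category(sentence)   # fall back to the classification models
    emotion, p2 = predict_emotion(sentence)
    if p1 > t1 and p2 > t2:
        return category, emotion
    return None


# Stub components for illustration only.
def stub_extract(s):
    return ("story", "dull") if "story" in s else None

def stub_match(pair):
    return ("plot", "negative") if pair == ("story", "dull") else None

def stub_category(s):
    return ("actor", 0.9)

def stub_emotion(s):
    return ("positive", 0.55)

rule_hit = label_comment("the story is dull", stub_extract, stub_match,
                         stub_category, stub_emotion, t1=0.6, t2=0.6)
model_hit = label_comment("great cast", stub_extract, stub_match,
                          stub_category, stub_emotion, t1=0.6, t2=0.5)
rejected = label_comment("great cast", stub_extract, stub_match,
                         stub_category, stub_emotion, t1=0.6, t2=0.6)
```

In this sketch `rule_hit` is ("plot", "negative"), `model_hit` is ("actor", "positive"), and `rejected` is `None` because the emotion probability 0.55 does not exceed the threshold 0.6.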
The above embodiments are provided only for illustrating the present invention and not for limiting the present invention, and those skilled in the art can make various changes and modifications without departing from the spirit and scope of the present invention, and therefore all equivalent technical solutions should also fall within the scope of the present invention, and should be defined by the claims.
Claims (10)
1. A movie comment viewpoint emotion tendentiousness analysis method is characterized by comprising the following steps:
step S1, crawling the film description information and comment information of a plurality of films of each category from the film evaluation website;
step S2, carrying out data preprocessing on the collected film description information and comment information;
step S3, formulating a plurality of comment viewpoint extraction rules, obtaining viewpoint words and sentiment words from each comment sentence of comment content of comment information by using the comment viewpoint extraction rules, and then respectively storing all the viewpoint words and sentiment words as a comment label word bank and a viewpoint sentiment word bank;
step S4, comment label category marking and emotion tendency marking are carried out on each comment sentence through keyword matching marking or manual marking;
step S5, generating a comment viewpoint emotion analysis model consisting of a comment label classification model and a label emotion classification model;
and step S6, automatically generating comment label category labels and emotion tendency labels by using the comment viewpoint emotion analysis model aiming at the target movie comment.
2. The method for analyzing emotional tendency of opinion of movie reviews according to claim 1, wherein in step S1, the categories of movies include: romance, animation, action, science fiction, horror, comedy, and suspense;
the film description information comprises a film name, a director name, a lead actor name, a type and a total score;
the comment information includes: the commenter's nickname, the number of "useful" votes for the comment, the comment time, the comment content and the score.
3. The method for analyzing emotional tendency of opinion of movie reviews according to claim 1, wherein the data preprocessing comprises:
integrating all the collected comment information to form a comment corpus;
removing repeated data in the comment corpus;
deleting data with missing comment content in the comment corpus;
converting all traditional Chinese characters in the comment corpus into simplified Chinese characters;
and acquiring the film name, the director name and the lead actor name from the collected description information of each film, storing the film names, the director names and the lead actor names in a user-defined dictionary, and marking the film names with different symbols.
4. The method for analyzing emotional tendency of opinion of movie reviews, according to claim 1, wherein said step S3 includes:
constructing a plurality of comment viewpoint extraction rules according to the dependency syntax structure, the part of speech among the words and the expression structure of viewpoint words and sentiment words in the comment viewpoints;
performing sentence segmentation, word segmentation, part-of-speech tagging and dependency syntactic analysis on the comment content in the comment corpus to obtain each comment sentence, checking whether each comment sentence matches any comment viewpoint extraction rule, and if so, obtaining the viewpoint words and sentiment words;
and respectively storing all the acquired viewpoint words and emotion words as a comment label word library and a viewpoint emotion word library.
5. The method of analyzing emotional tendency of opinion of movie reviews, according to claim 4, wherein the dependency syntax structure comprises: a subject-predicate structure, a verb-object structure, an attribute-head structure, an adverbial-head structure, a verb-complement structure and a coordinate structure;
the part of speech among the words comprises: a subject component, an object or quasi-object component, an attributive component, and a noun component; a quasi-object refers to an indirect object or an object-like structure;
the expression structure of the viewpoint words and the emotion words refers to: the subject component is a viewpoint word, and the object or quasi-object component is an emotion word; the attributive component is an emotion word, and the noun component modified by the attributive component is a viewpoint word.
6. The method for analyzing emotional tendency of opinion of movie reviews according to claim 3, wherein said step S4 includes:
acquiring a label category dictionary and an emotion dictionary;
and performing keyword matching marking on the comment sentences from which viewpoint words and emotion words can be extracted in step S3: matching the acquired viewpoint words against the label category dictionary and the acquired emotion words against the emotion dictionary, and, if both matches succeed, marking the comment sentence with a label category label and an emotion tendency label; otherwise, carrying out manual label category marking and emotion tendency marking;
and performing manual label type marking and emotion tendency marking on the comment sentences of which the viewpoint words and the emotion words are not extracted in the step S3.
7. The method for analyzing emotional tendency of opinion of movie reviews according to claim 6, wherein said obtaining a dictionary of tag categories comprises:
respectively marking the film name, the director name and the actor name in the user-defined dictionary in the comment tag word library as 'film', 'director' and 'actor';
training a word vector model on the comment sentences to obtain a trained word vector model;
expressing the words in the comment label word library by using a trained word vector model, and clustering the words in the comment label word library into k categories by using a k-means clustering algorithm;
manually summarizing the popular viewpoints of movie reviews into 8 dimensions, namely director, photography, scenario, actor, emotion, audio-visual effect, subject matter and impression, screening the words under each cluster, and retaining the relevant words to form a preliminary label category dictionary;
acquiring related words of the labeled category words in the preliminary label category dictionary by using the trained word vector model to expand the label category dictionary, removing repeated words in the dictionary, and generating a final label category dictionary;
the obtaining of the emotion dictionary refers to: firstly, collecting open-source positive and negative emotion dictionaries for sorting and merging, then counting word frequency in the viewpoint emotion word bank, reserving all words larger than a set threshold value, and then manually deleting words irrelevant to movie comment emotion to form an emotion dictionary.
8. The method for analyzing emotional tendency of opinion of movie reviews, according to claim 1, wherein said step S5 includes:
respectively training and generating two preliminary comment label classification models and two preliminary label emotion classification models by utilizing the keyword matching marking data set and the manual marking data set;
weighting and fusing the two preliminary comment label classification models to generate a final comment label classification model;
and performing weighted fusion on the two primary label emotion classification models to generate a final label emotion classification model.
9. The method for analyzing emotion tendentiousness of comment viewpoint of movie as claimed in claim 8, wherein said step of generating preliminary comment label classification model or preliminary label emotion classification model includes:
an up-sampling strategy is adopted for the keyword matching marking data set and the manual marking data set to carry out data balance;
dividing the keyword matched marking data set and the manually marked data set after the data balance into a training set and a testing set according to a preset proportion;
performing word segmentation on the corpus in the training set, removing stop words, extracting text features by adopting a TF-IDF algorithm, and calculating chi-square values of the features to perform feature dimension reduction;
and importing the data into a random forest classification model, and performing model training, storage and evaluation.
10. The method for analyzing emotional tendency of opinion of movie reviews according to claim 6, wherein said step S6 includes:
extracting viewpoint words and emotion words, if the viewpoint words and the emotion words can be obtained, performing keyword matching including label category matching and emotion word matching, and if the viewpoint words and the emotion words can be successfully matched, directly outputting label category marks and emotion tendency marks; otherwise, directly calling the comment tag classification model and/or the tag emotion classification model to perform tag class prediction and tag emotion prediction, setting two thresholds T1 and T2, and outputting a tag class mark and an emotion tendency mark if the tag class prediction probability P1 is greater than T1 and the tag emotion prediction probability P2 is greater than T2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911082409.1A CN110825876B (en) | 2019-11-07 | 2019-11-07 | Movie comment viewpoint emotion tendency analysis method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911082409.1A CN110825876B (en) | 2019-11-07 | 2019-11-07 | Movie comment viewpoint emotion tendency analysis method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110825876A (en) | 2020-02-21 |
CN110825876B (en) | 2022-07-15 |
Family
ID=69553492
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911082409.1A Active CN110825876B (en) | 2019-11-07 | 2019-11-07 | Movie comment viewpoint emotion tendency analysis method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110825876B (en) |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111565322A (en) * | 2020-05-14 | 2020-08-21 | 北京奇艺世纪科技有限公司 | User emotional tendency information obtaining method and device and electronic equipment |
CN111666767A (en) * | 2020-06-10 | 2020-09-15 | 创新奇智(上海)科技有限公司 | Data identification method and device, electronic equipment and storage medium |
CN111966944A (en) * | 2020-08-17 | 2020-11-20 | 中电科大数据研究院有限公司 | Model construction method for multi-level user comment security audit |
CN112115231A (en) * | 2020-09-17 | 2020-12-22 | 中国传媒大学 | Data processing method and device |
CN112215003A (en) * | 2020-11-09 | 2021-01-12 | 深圳市洪堡智慧餐饮科技有限公司 | Comment label extraction method based on albert pre-training model and kmean algorithm |
CN112214661A (en) * | 2020-10-12 | 2021-01-12 | 西华大学 | Emotional unstable user detection method for conventional video comments |
CN112527963A (en) * | 2020-12-17 | 2021-03-19 | 深圳市欢太科技有限公司 | Multi-label emotion classification method and device based on dictionary, equipment and storage medium |
CN112612873A (en) * | 2020-12-25 | 2021-04-06 | 上海德拓信息技术股份有限公司 | NLP technology-based centralized event mining method |
CN112651211A (en) * | 2020-12-11 | 2021-04-13 | 北京大米科技有限公司 | Label information determination method, device, server and storage medium |
CN113010689A (en) * | 2021-03-22 | 2021-06-22 | 平安科技(深圳)有限公司 | Buddhism knowledge discrimination method, device, equipment and storage medium |
CN113065052A (en) * | 2021-04-07 | 2021-07-02 | 顶象科技有限公司 | Method and device for analyzing authenticity of video comment, electronic equipment and storage medium |
CN113127640A (en) * | 2021-03-12 | 2021-07-16 | 嘉兴职业技术学院 | Malicious spam comment attack identification method based on natural language processing |
CN113312478A (en) * | 2021-04-25 | 2021-08-27 | 国家计算机网络与信息安全管理中心 | Viewpoint mining method and device based on reading understanding |
CN113505582A (en) * | 2021-05-25 | 2021-10-15 | 腾讯音乐娱乐科技(深圳)有限公司 | Music comment sentiment analysis method, equipment and medium |
CN113515663A (en) * | 2021-08-03 | 2021-10-19 | 广州酷狗计算机科技有限公司 | Comment information display method and device, electronic equipment and storage medium |
CN113536080A (en) * | 2021-07-20 | 2021-10-22 | 湖南快乐阳光互动娱乐传媒有限公司 | Data uploading method and device and electronic equipment |
CN113961725A (en) * | 2021-10-25 | 2022-01-21 | 北京明略软件***有限公司 | Automatic label labeling method, system, equipment and storage medium |
CN115392199A (en) * | 2022-08-22 | 2022-11-25 | 再惠(上海)网络科技有限公司 | Evaluation analysis and report generation method, device, electronic equipment and storage medium |
CN116644754A (en) * | 2023-05-31 | 2023-08-25 | 重庆邮电大学 | Internet financial product comment viewpoint extraction method based on big data |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103279460A (en) * | 2013-05-24 | 2013-09-04 | 北京尚友通达信息技术有限公司 | Method for analyzing and processing online shopping comments |
US20140201041A1 (en) * | 2013-01-11 | 2014-07-17 | Tagnetics, Inc. | Out of stock sensor |
CN104462487A (en) * | 2014-12-19 | 2015-03-25 | 南开大学 | Individualized online news comment mood forecast method capable of fusing multiple information sources |
CN105117428A (en) * | 2015-08-04 | 2015-12-02 | 电子科技大学 | Web comment sentiment analysis method based on word alignment model |
CN105354183A (en) * | 2015-10-19 | 2016-02-24 | Tcl集团股份有限公司 | Analytic method, apparatus and system for internet comments of household electrical appliance products |
CN106096664A (en) * | 2016-06-23 | 2016-11-09 | 广州云数信息科技有限公司 | A kind of sentiment analysis method based on social network data |
CN106156004A (en) * | 2016-07-04 | 2016-11-23 | 中国传媒大学 | The sentiment analysis system and method for film comment information based on term vector |
CN106407236A (en) * | 2015-08-03 | 2017-02-15 | 北京众荟信息技术有限公司 | An emotion tendency detection method for comment data |
CN106649519A (en) * | 2016-10-17 | 2017-05-10 | 北京邮电大学 | Method of digging and assessing product features |
CN108108433A (en) * | 2017-12-19 | 2018-06-01 | 杭州电子科技大学 | A kind of rule-based and the data network integration sentiment analysis method |
CN108108468A (en) * | 2017-12-29 | 2018-06-01 | 华中科技大学鄂州工业技术研究院 | A kind of short text sentiment analysis method and apparatus based on concept and text emotion |
CN108460010A (en) * | 2018-01-17 | 2018-08-28 | 南京邮电大学 | A kind of comprehensive grade model implementation method based on sentiment analysis |
CN109684647A (en) * | 2019-02-19 | 2019-04-26 | 东北林业大学 | Film comment sentiment analysis method and device |
Non-Patent Citations (1)
Title |
---|
王学贺 等: "基于Word2vec和多分类器的影评情感分类方法", 《宁夏大学学报(自然科学版)》 * |
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111565322A (en) * | 2020-05-14 | 2020-08-21 | 北京奇艺世纪科技有限公司 | User emotional tendency information obtaining method and device and electronic equipment |
CN111666767A (en) * | 2020-06-10 | 2020-09-15 | 创新奇智(上海)科技有限公司 | Data identification method and device, electronic equipment and storage medium |
CN111666767B (en) * | 2020-06-10 | 2023-07-18 | 创新奇智(上海)科技有限公司 | Data identification method and device, electronic equipment and storage medium |
CN111966944A (en) * | 2020-08-17 | 2020-11-20 | 中电科大数据研究院有限公司 | Model construction method for multi-level user comment security audit |
CN111966944B (en) * | 2020-08-17 | 2024-04-09 | 中电科大数据研究院有限公司 | Model construction method for multi-level user comment security audit |
CN112115231A (en) * | 2020-09-17 | 2020-12-22 | 中国传媒大学 | Data processing method and device |
CN112214661B (en) * | 2020-10-12 | 2022-04-08 | 西华大学 | Emotional unstable user detection method for conventional video comments |
CN112214661A (en) * | 2020-10-12 | 2021-01-12 | 西华大学 | Emotional unstable user detection method for conventional video comments |
CN112215003A (en) * | 2020-11-09 | 2021-01-12 | 深圳市洪堡智慧餐饮科技有限公司 | Comment label extraction method based on albert pre-training model and kmean algorithm |
CN112651211A (en) * | 2020-12-11 | 2021-04-13 | 北京大米科技有限公司 | Label information determination method, device, server and storage medium |
CN112527963B (en) * | 2020-12-17 | 2024-05-03 | 深圳市欢太科技有限公司 | Dictionary-based multi-label emotion classification method and device, equipment and storage medium |
CN112527963A (en) * | 2020-12-17 | 2021-03-19 | 深圳市欢太科技有限公司 | Multi-label emotion classification method and device based on dictionary, equipment and storage medium |
CN112612873A (en) * | 2020-12-25 | 2021-04-06 | 上海德拓信息技术股份有限公司 | NLP technology-based centralized event mining method |
CN112612873B (en) * | 2020-12-25 | 2023-07-07 | 上海德拓信息技术股份有限公司 | Centralized event mining method based on NLP technology |
CN113127640A (en) * | 2021-03-12 | 2021-07-16 | 嘉兴职业技术学院 | Malicious spam comment attack identification method based on natural language processing |
CN113010689A (en) * | 2021-03-22 | 2021-06-22 | 平安科技(深圳)有限公司 | Buddhism knowledge discrimination method, device, equipment and storage medium |
CN113065052A (en) * | 2021-04-07 | 2021-07-02 | 顶象科技有限公司 | Method and device for analyzing authenticity of video comment, electronic equipment and storage medium |
CN113312478A (en) * | 2021-04-25 | 2021-08-27 | 国家计算机网络与信息安全管理中心 | Viewpoint mining method and device based on reading understanding |
CN113312478B (en) * | 2021-04-25 | 2022-07-19 | 国家计算机网络与信息安全管理中心 | Viewpoint mining method and device based on reading understanding |
CN113505582A (en) * | 2021-05-25 | 2021-10-15 | 腾讯音乐娱乐科技(深圳)有限公司 | Music comment sentiment analysis method, equipment and medium |
CN113536080A (en) * | 2021-07-20 | 2021-10-22 | 湖南快乐阳光互动娱乐传媒有限公司 | Data uploading method and device and electronic equipment |
CN113515663A (en) * | 2021-08-03 | 2021-10-19 | 广州酷狗计算机科技有限公司 | Comment information display method and device, electronic equipment and storage medium |
CN113961725A (en) * | 2021-10-25 | 2022-01-21 | 北京明略软件***有限公司 | Automatic label labeling method, system, equipment and storage medium |
CN115392199A (en) * | 2022-08-22 | 2022-11-25 | 再惠(上海)网络科技有限公司 | Evaluation analysis and report generation method, device, electronic equipment and storage medium |
CN115392199B (en) * | 2022-08-22 | 2023-08-04 | 再惠(上海)网络科技有限公司 | Evaluation analysis and report generation method, device, electronic equipment and storage medium |
CN116644754A (en) * | 2023-05-31 | 2023-08-25 | 重庆邮电大学 | Internet financial product comment viewpoint extraction method based on big data |
CN116644754B (en) * | 2023-05-31 | 2024-04-16 | 金智东博(北京)教育科技股份有限公司 | Internet financial product comment viewpoint extraction method based on big data |
Also Published As
Publication number | Publication date |
---|---|
CN110825876B (en) | 2022-07-15 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN110825876B (en) | Movie comment viewpoint emotion tendency analysis method | |
Mazloom et al. | Multimodal popularity prediction of brand-related social media posts | |
Eirinaki et al. | Feature-based opinion mining and ranking | |
Basiri et al. | Sentence-level sentiment analysis in Persian | |
AU2011326430B2 (en) | Learning tags for video annotation using latent subtags | |
CN109800390B (en) | Method and device for calculating personalized emotion abstract | |
Lima et al. | Automatic sentiment analysis of Twitter messages | |
Singh et al. | Sentiment analysis of textual reviews; Evaluating machine learning, unsupervised and SentiWordNet approaches | |
Cataldi et al. | Good location, terrible food: detecting feature sentiment in user-generated reviews | |
WO2017013667A1 (en) | Method for product search using the user-weighted, attribute-based, sort-ordering and system thereof | |
Merler et al. | You are what you tweet… pic! gender prediction based on semantic analysis of social media images | |
CN106407420B (en) | Multimedia resource recommendation method and system | |
US10055741B2 (en) | Method and apparatus of matching an object to be displayed | |
CN108491512A (en) | The method of abstracting and device of headline | |
CN108460150A (en) | The processing method and processing device of headline | |
CN108399265A (en) | Real-time hot news providing method based on search and device | |
CN108363700A (en) | The method for evaluating quality and device of headline | |
Leopairote et al. | Software quality in use characteristic mining from customer reviews | |
Rani et al. | Study and comparision of vectorization techniques used in text classification | |
Grivolla et al. | A hybrid recommender combining user, item and interaction data | |
Yao et al. | Online deception detection refueled by real world data collection | |
Urriza et al. | Aspect-based sentiment analysis of user created game reviews | |
Dadoun et al. | Sentiment Classification Techniques Applied to Swedish Tweets Investigating the Effects of translation on Sentiments from Swedish into English | |
Li et al. | Confidence estimation and reputation analysis in aspect extraction | |
Clarizia et al. | Sentiment analysis in social networks: A methodology based on the latent dirichlet allocation approach |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||