CN110825876A - Movie comment viewpoint emotion tendency analysis method - Google Patents


Info

Publication number
CN110825876A
Authority
CN
China
Legal status
Granted
Application number
CN201911082409.1A
Other languages
Chinese (zh)
Other versions
CN110825876B (en)
Inventor
许青青
谢赟
韩欣
Current Assignee
Shanghai Tak Billiton Information Technology Ltd By Share Ltd
Original Assignee
Shanghai Tak Billiton Information Technology Ltd By Share Ltd
Application filed by Shanghai Tak Billiton Information Technology Ltd By Share Ltd
Priority to CN201911082409.1A
Publication of CN110825876A
Application granted
Publication of CN110825876B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Document Processing Apparatus (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for analyzing the sentiment orientation of opinions in movie reviews, comprising the following steps: crawling film description information and review information for multiple films in each genre from a film review website; preprocessing the collected film description information and review information; formulating a set of review-viewpoint extraction rules, using them to extract viewpoint words and sentiment words from each sentence of the review content, and storing all extracted viewpoint words and sentiment words as a review-tag lexicon and a viewpoint-sentiment lexicon, respectively; labeling each review sentence with a tag category and a sentiment orientation, either by keyword matching or manually; generating a review-viewpoint sentiment analysis model composed of a review-tag classification model and a tag-sentiment classification model; and, for a target film review, automatically generating its tag-category label and sentiment-orientation label with the review-viewpoint sentiment analysis model. The method reflects users' emotional responses to a film comprehensively and accurately.

Description

Movie comment viewpoint emotion tendency analysis method
Technical Field
The invention relates to the technical field of information extraction and data mining, and in particular to a method for analyzing the sentiment orientation of opinions in movie reviews.
Background
In the era of Internet big data, online reviews have become a form of word of mouth and are the most direct channel through which consumers express their attitudes. Analyzing consumer reviews yields an all-round evaluation of a product, so that users can understand the product along multiple dimensions, which supports their decision-making; merchants, in turn, can learn consumer preferences and market trends, improve service quality, and increase customer stickiness. With continuing innovation in Internet media technology, the film entertainment industry, including cinemas and home entertainment, is developing vigorously; films have become an everyday entertainment choice, and their popularity generates a large volume of review information. Extracting subjective opinions from public reviews and judging whether the public leans positive or negative is an important information extraction and mining problem in natural language processing. Film reviews also have value for spreading values and shaping the film and television environment, and mining and analyzing them contributes to deeper film and television research. Analyzing the sentiment orientation of opinions in movie reviews is therefore significant.
The methods commonly used to extract opinions from user reviews are mainly unsupervised rule extraction and clustering algorithms. Rule-based methods extract opinions using rules summarized manually from syntactic structures, but hand-written rules cannot cover every way a review expresses an opinion, so the number of valid opinions they extract is limited. Clustering-based methods are simple but have low accuracy and struggle to produce reasonable, accurate review tags.
At present, dictionary matching and classification algorithms are the methods commonly used for review sentiment analysis. Methods based on a sentiment dictionary depend entirely on that dictionary and are limited by its size. Sentiment classification algorithms are supervised: some training sets are built by combining review text with ratings, and others are labeled manually, which consumes a large amount of labor.
In addition, reviews in different industries have different focuses, so sentiment analysis has to be done somewhat differently. Compared with online reviews of e-commerce products, restaurants, hotels and the like, movie reviews describe more complex viewing experiences, so existing sentiment analysis and opinion extraction methods cannot be applied to them directly. Moreover, many studies of online reviews treat opinion extraction and sentiment classification as two separate modules. A user's review of a product is often multidimensional, praising some aspects and criticizing others, so simply classifying the overall sentiment as a good (positive) or bad (negative) review is clearly insufficient; analyzing the sentiment toward each main opinion dimension the user expresses is more practical. For example, for the review "the actors' performances are outstanding, but the story is poor", the results (actor, positive) and (plot, negative) are more accurate.
Disclosure of Invention
The invention aims to provide a method for analyzing the sentiment orientation of opinions in movie reviews that comprehensively and accurately reflects users' emotional responses to a film.
The technical solution for achieving this aim is as follows:
a method for analyzing the sentiment orientation of opinions in movie reviews comprises the following steps:
step S1, crawling film description information and review information for multiple films in each genre from a film review website;
step S2, preprocessing the collected film description information and review information;
step S3, formulating a set of review-viewpoint extraction rules, using them to extract viewpoint words and sentiment words from each sentence of the review content, and storing all extracted viewpoint words and sentiment words as a review-tag lexicon and a viewpoint-sentiment lexicon, respectively;
step S4, labeling each review sentence with a tag category and a sentiment orientation by keyword matching or manual labeling;
step S5, generating a review-viewpoint sentiment analysis model composed of a review-tag classification model and a tag-sentiment classification model;
and step S6, for a target film review, automatically generating a tag-category label and a sentiment-orientation label with the review-viewpoint sentiment analysis model.
Preferably, in step S1, the film genres include: romance, animation, action, science fiction, horror, comedy, and suspense;
the film description information comprises the film title, director, lead actors, genre, and overall rating;
the review information comprises: the reviewer's nickname, the number of "useful" votes the review received, the review time, the review content, and the rating.
Preferably, the data preprocessing comprises:
integrating all collected review information into a review corpus;
removing duplicate records from the review corpus;
deleting records whose review content is missing;
converting all traditional Chinese characters in the review corpus into simplified Chinese characters;
and extracting the film titles, director names and actor names from the collected description information of each film, storing them in a user-defined dictionary, and marking each type with a different tag.
Preferably, step S3 comprises:
constructing a set of review-viewpoint extraction rules from dependency syntax structures, the grammatical roles of words, and the way viewpoint words and sentiment words are expressed in review opinions;
performing sentence splitting, word segmentation, part-of-speech tagging and dependency parsing on the review content in the review corpus to obtain the individual review sentences, and checking whether each sentence matches one of the review-viewpoint extraction rules; if it matches, obtaining its viewpoint words and sentiment words;
and storing all obtained viewpoint words and sentiment words as a review-tag lexicon and a viewpoint-sentiment lexicon, respectively.
Preferably, the dependency syntax structures include: the subject-verb structure, the verb-object structure, the attribute structure, the adverbial structure, the verb-complement structure and the coordinate structure;
the grammatical roles of the words include: subject components, object or object-like components, attributive components, and noun components, where an object-like component refers to an indirect object or an object-like structure;
the expression structure of viewpoint words and sentiment words means: a subject component is a viewpoint word and an object or object-like component is a sentiment word; or an attributive component is a sentiment word and the noun component it modifies is a viewpoint word.
Preferably, step S4 comprises:
obtaining a tag-category dictionary and a sentiment dictionary;
performing keyword-matching labeling on the review sentences from which viewpoint words and sentiment words were extracted in step S3: matching the extracted viewpoint words against the tag-category dictionary and the extracted sentiment words against the sentiment dictionary; if both matches succeed, labeling the review sentence with a tag-category label and a sentiment-orientation label; otherwise, labeling the tag category and sentiment orientation manually;
and manually labeling the tag category and sentiment orientation of the review sentences from which no viewpoint words and sentiment words were extracted in step S3.
Preferably, obtaining the tag-category dictionary comprises:
replacing the film titles, director names and actor names from the user-defined dictionary that appear in the review-tag lexicon with the labels 'film', 'director' and 'actor', respectively;
training a word-vector model on the review sentences to obtain a trained word-vector model;
representing the words in the review-tag lexicon with the trained word-vector model and clustering them into k categories with the k-means clustering algorithm;
manually summarizing the common opinions in movie reviews into 8 dimensions (director, cinematography, plot, actor, emotion, audio-visual effects, subject matter and overall impression), screening the words in each cluster, and keeping the relevant words to form a preliminary tag-category dictionary;
using the trained word-vector model to find words related to the category words in the preliminary tag-category dictionary, adding them to expand the dictionary, removing duplicate words, and generating the final tag-category dictionary;
obtaining the sentiment dictionary means: first collecting open-source positive and negative sentiment dictionaries and sorting and merging them, then counting word frequencies in the viewpoint-sentiment lexicon and keeping all words whose frequency exceeds a set threshold, and finally manually deleting words unrelated to movie-review sentiment to form the sentiment dictionary.
Preferably, step S5 comprises:
training, on the keyword-matching-labeled dataset and the manually labeled dataset respectively, two preliminary review-tag classification models and two preliminary tag-sentiment classification models;
fusing the two preliminary review-tag classification models by weighting to generate the final review-tag classification model;
and fusing the two preliminary tag-sentiment classification models by weighting to generate the final tag-sentiment classification model.
Preferably, generating a preliminary review-tag classification model or a preliminary tag-sentiment classification model comprises:
balancing the keyword-matching-labeled dataset and the manually labeled dataset with an oversampling strategy;
dividing each balanced dataset into a training set and a test set according to a preset ratio;
segmenting the corpus in the training set into words, removing stop words, extracting text features with the TF-IDF algorithm, and computing the chi-square value of each feature for feature dimensionality reduction;
and feeding the data into a random forest classification model for model training, saving and evaluation.
Preferably, step S6 comprises:
extracting viewpoint words and sentiment words; if they can be obtained, performing keyword matching, comprising tag-category matching and sentiment-word matching, and, if both match successfully, outputting the tag-category label and sentiment-orientation label directly; otherwise, calling the review-tag classification model and/or the tag-sentiment classification model for tag-category prediction and tag-sentiment prediction with two thresholds T1 and T2, and outputting the tag-category label and sentiment-orientation label if the tag-category prediction probability P1 is greater than T1 and the tag-sentiment prediction probability P2 is greater than T2.
The beneficial effects of the invention are: the method handles the complex text and sentiment of movie reviews and analyzes the sentiment orientation of movie-review data by combining multiple methods and strategies, so that it can accurately capture audiences' sentiment toward specific aspects of a film.
Drawings
FIG. 1 is a flow chart of the method for analyzing the sentiment orientation of opinions in movie reviews according to the invention;
FIG. 2 is a flow chart of keyword-matching labeling in the invention;
FIG. 3 is a schematic diagram of the fusion of the review-tag classification models in the invention;
FIG. 4 is a schematic diagram of the fusion of the tag-sentiment classification models in the invention;
FIG. 5 is a schematic diagram of the classification model construction process in the invention;
FIG. 6 is a flow chart of the automatic generation of review sentiment labels in the invention.
Detailed Description
The invention will be further explained with reference to the drawings.
Referring to FIG. 1, the method of the invention for analyzing the sentiment orientation of opinions in movie reviews mainly extracts review opinions from movie-review data and performs tag classification and sentiment-orientation analysis on them, that is, it obtains the tag category of each opinion and its sentiment orientation. At the same time, it builds a review-viewpoint sentiment analysis model that classifies new movie-review data and attaches tag-category and sentiment labels to it. The method comprises the following steps:
Step S1, data crawling: film description information for multiple films in the romance, animation, action, science fiction, horror, comedy and suspense genres, and the review information for each film, are crawled from a film review website. The film description information includes the film title, director, genre, overall rating and similar fields. The review information of a film includes the reviewer's nickname, the number of "useful" votes, the review time, the review content, the rating and similar fields.
Step S2, preprocessing the film description information and review information, comprising:
data integration: integrating all collected review information into a review corpus;
data deduplication: removing duplicate records from the review corpus;
missing-value handling: deleting records whose review content is missing;
traditional-Chinese handling: converting all traditional Chinese characters in the review corpus into simplified Chinese;
and user-defined dictionary: extracting the film titles, director names and actor names from the collected film description information, storing them in a user-defined dictionary, and marking each type with a different tag.
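The preprocessing steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each review is assumed to be a dict with `nickname` and `content` fields, duplicates are assumed to be identified by the (nickname, content) pair, the `TRAD_TO_SIMP` table is a tiny hypothetical stand-in for a complete traditional-to-simplified converter, and the `n_film` / `n_director` / `n_actor` tags are hypothetical names for the "different tags".

```python
# Tiny, hypothetical traditional -> simplified table for illustration only;
# a real system would need a complete converter.
TRAD_TO_SIMP = str.maketrans({"劇": "剧", "錯": "错", "員": "员", "電": "电", "導": "导"})


def preprocess(review_batches):
    """Integrate crawled batches into one corpus, drop reviews with missing
    content, convert traditional characters, and remove duplicates."""
    seen = set()
    corpus = []
    for batch in review_batches:                      # data integration
        for review in batch:
            content = (review.get("content") or "").strip()
            if not content:                           # missing-value handling
                continue
            content = content.translate(TRAD_TO_SIMP)  # traditional -> simplified
            key = (review.get("nickname"), content)    # assumed duplicate key
            if key in seen:                           # de-duplication
                continue
            seen.add(key)
            corpus.append({**review, "content": content})
    return corpus


def build_user_dict(movies):
    """One 'word tag' line per film title, director and actor, using a
    distinct (hypothetical) tag for each entity type."""
    entries = set()
    for movie in movies:
        entries.add(f"{movie['title']} n_film")
        entries.add(f"{movie['director']} n_director")
        entries.update(f"{actor} n_actor" for actor in movie["actors"])
    return sorted(entries)
```

Converting characters before de-duplicating is a small deviation from the order listed above; it lets the same review posted in both scripts collapse into one record.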
Step S3, review-viewpoint extraction: a set of general review-viewpoint extraction rules is formulated from the dependency syntax structures of modern Chinese and the grammatical roles of words, combined with the way viewpoint words and sentiment words are expressed in actual review opinions. The review content in the review corpus undergoes sentence splitting, word segmentation, part-of-speech tagging, dependency parsing and similar operations to obtain the individual review sentences; each sentence is then checked against the extraction rules, and if it matches one of them, a (viewpoint word, sentiment word) pair is obtained. Finally, all obtained viewpoint words and sentiment words are stored as the review-tag lexicon and the viewpoint-sentiment lexicon, respectively.
The review-viewpoint extraction rules fall into two rule systems according to dependency syntax structure: one centered on the subject-verb structure (SBV) and one centered on the attribute structure (ATT). The syntactic relations involved in the extraction rules are shown in Table 1:
Relation type | Tag | Description | Example
Subject-verb | SBV | subject-verb | I send her a bunch of flowers (I <-- send)
Verb-object | VOB | verb-object | I send her a bunch of flowers (send --> flowers)
Attribute | ATT | attribute | red apple (red <-- apple)
Adverbial | ADV | adverbial | very beautiful (very <-- beautiful)
Verb-complement | CMP | complement | completed the work (do --> complete)
Coordinate | COO | coordinate | mountains and seas (mountain --> sea)
TABLE 1
Further, the SBV-centered rule system is divided into 4 categories, as shown in Table 2:
[Table 2: SBV-centered extraction rules (image not reproduced)]
TABLE 2
As Table 2 shows, the SBV-based rules mainly link a noun subject, directly or indirectly, to an object or an object-like structure (hereinafter, an indirect object or object-like structure is called an object-like component). The extracted subject component is the review viewpoint word, and the extracted object-like component is the corresponding sentiment word.
These rules do not only cover the sentence structures listed in Table 2; they also check whether the subject or the object-like component has a coordinate structure, and whether the object-like component is modified by an adverb, because negation words affect sentiment. For example, for the review "the film and the plot are good", the proposed rules extract two (viewpoint word, sentiment word) pairs, (film, good) and (plot, good); "the subject matter is rich and novel" yields the pairs (subject matter, rich) and (subject matter, novel); and "the film is not good" yields (film, not good).
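A simplified version of the SBV-centered rule can be sketched in Python. The four concrete rules of Table 2 are not reproduced here; the sketch assumes a single pattern (a noun subject attached by SBV to an adjective predicate), expands it over coordinate (COO) subjects and predicates, and prefixes negating adverbials, which is enough to reproduce the three examples above. The `Token` layout, the LTP-style part-of-speech tags and the `NEGATIONS` set are assumptions, and the parses in the usage check are written by hand rather than produced by a real dependency parser.

```python
from typing import NamedTuple


class Token(NamedTuple):
    word: str
    pos: str   # assumed LTP-style tags: n noun, a adjective, d adverb, c conjunction
    head: int  # 1-based index of the head token, 0 for the root
    rel: str   # dependency label: SBV, ADV, COO, HED, ...


NEGATIONS = {"not", "never"}  # hypothetical negation list


def children(tokens, idx, rel):
    """Indices of tokens attached to tokens[idx] with the given relation."""
    return [i for i, t in enumerate(tokens) if t.head == idx + 1 and t.rel == rel]


def sentiment_phrase(tokens, idx):
    """The sentiment word, prefixed by any negating adverbial (ADV) modifier."""
    negs = [tokens[i].word for i in children(tokens, idx, "ADV")
            if tokens[i].word in NEGATIONS]
    return " ".join(negs + [tokens[idx].word])


def extract_sbv_pairs(tokens):
    """Simplified SBV rule: noun subject -> adjective predicate, expanded over
    coordinate (COO) subjects and coordinate predicates."""
    pairs = []
    for i, tok in enumerate(tokens):
        if tok.rel != "SBV" or tok.pos != "n":
            continue
        pred = tok.head - 1
        if pred < 0 or tokens[pred].pos != "a":
            continue
        for subj in [i] + children(tokens, i, "COO"):
            for p in [pred] + children(tokens, pred, "COO"):
                pairs.append((tokens[subj].word, sentiment_phrase(tokens, p)))
    return pairs
```

For "the film is not good", hand-parsed as `[Token("film", "n", 3, "SBV"), Token("not", "d", 3, "ADV"), Token("good", "a", 0, "HED")]`, the function returns `[("film", "not good")]`.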
Further, the ATT-centered rule system is also divided into 4 types; the specific rules are shown in Table 3.
[Table 3: ATT-centered extraction rules (image not reproduced)]
TABLE 3
Because an attributive modifies, restricts and describes the qualities and characteristics of a noun or pronoun, the attribute relation is essential in the review-viewpoint extraction rules. As Table 3 shows, an adjective generally serves as the sentiment word of a review opinion, and the noun it modifies (or a verb used as a noun) serves as the viewpoint word. Similarly, these rules must also consider coordinate structures among noun components and among adjectives, as well as adverbs that modify the adjectives. For example, in the example phrase "stiff and awkward acting" given in Table 3, "stiff" is coordinate with "awkward", so two pairs, (acting, stiff) and (acting, awkward), can be extracted; from "lifeless acting" (literally "not lively acting"), the pair (acting, not lively) can be extracted.
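The ATT-centered rule admits a similar simplified sketch. As with the SBV sketch, the four concrete rules of Table 3 are not reproduced; the code assumes one pattern (an adjective attached by ATT to a noun), expands it over coordinate adjectives and nouns, and keeps negating adverbials on the adjective. `Token`, the tag names and `NEGATIONS` are assumptions, and the example parses are hand-written.

```python
from typing import NamedTuple


class Token(NamedTuple):
    word: str
    pos: str   # assumed LTP-style tags: n noun, a adjective, d adverb, c conjunction
    head: int  # 1-based index of the head token, 0 for the root
    rel: str   # dependency label: ATT, ADV, COO, HED, ...


NEGATIONS = {"not", "never"}  # hypothetical negation list


def children(tokens, idx, rel):
    """Indices of tokens attached to tokens[idx] with the given relation."""
    return [i for i, t in enumerate(tokens) if t.head == idx + 1 and t.rel == rel]


def extract_att_pairs(tokens):
    """Simplified ATT rule: an adjective attribute is the sentiment word and the
    noun it modifies is the viewpoint word; coordinate (COO) adjectives and nouns
    are expanded, and negating adverbials stay attached to their adjective."""
    pairs = []
    for i, tok in enumerate(tokens):
        if tok.rel != "ATT" or tok.pos != "a":
            continue
        noun = tok.head - 1
        if noun < 0 or tokens[noun].pos != "n":
            continue
        for adj in [i] + children(tokens, i, "COO"):
            negs = [tokens[j].word for j in children(tokens, adj, "ADV")
                    if tokens[j].word in NEGATIONS]
            for n in [noun] + children(tokens, noun, "COO"):
                pairs.append((tokens[n].word, " ".join(negs + [tokens[adj].word])))
    return pairs
```

The structural particle that links an attributive to its noun in Chinese is left out of the hand-written parses for brevity.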
Step S4, tag-category labeling and sentiment-orientation labeling of reviews, which is divided into keyword-matching labeling and manual labeling. Keyword-matching labeling requires a tag-category dictionary and a sentiment dictionary, followed by keyword matching; the main process is shown in FIG. 2. The tag-category dictionary is obtained first, through the following steps:
1) Film proper-noun substitution. Film titles, director names and actor names from the user-defined dictionary that appear in the review-tag lexicon are replaced with the labels 'film', 'director' and 'actor', respectively, which classifies part of the lexicon. For example, if the review-tag lexicon contains actor names such as "Zhang San" and "Li Si", the machine cannot tell on its own that they are actors; matching them against the actor names in the user-defined dictionary allows "Zhang San" and "Li Si" to be labeled 'actor'. Director names and film titles are labeled the same way.
2) Word-vector model training. The review content in the review corpus is segmented into words, stop words are removed, and the result is saved to a text file with one review sentence per line and words separated by spaces; a word2vec model is then trained on this processed text to obtain the word-vector model.
3) Word clustering. The words in the review-tag lexicon are represented with the trained word-vector model and clustered into k categories with the k-means clustering algorithm; the value of k is determined by inspecting the clustering results over multiple trials.
4) Summarizing evaluation dimensions and screening the category dictionary. The common opinions in movie reviews are manually summarized and screened into 8 dimensions: director, cinematography, plot, actor, emotion, audio-visual effects, subject matter and overall impression. The words in each cluster are screened, and the relevant words are kept to form the tag-category dictionary.
5) Expanding the tag-category dictionary. The trained word-vector model is used to find words related to the tag-category words, which are added to expand the dictionary; duplicate words are removed to generate the final tag-category dictionary. Related words are found by computing the similarity between words with the word-vector model and treating two words as related when their similarity exceeds a set threshold; the related words found this way are then screened manually to ensure the accuracy of the tag-category dictionary.
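Steps 1), 3) and 5) above can be sketched in plain Python. This is an illustrative sketch, not the patented implementation: `USER_DICT` and the word vectors are tiny hypothetical stand-ins for the step S2 user dictionary and the trained word2vec model, k-means is seeded with the first k words (a simplification; k-means++ seeding is more common), and cosine similarity is assumed as the similarity measure with a hypothetical threshold of 0.8.

```python
import math

# Hypothetical stand-in for the user-defined dictionary built in step S2.
USER_DICT = {"Zhang San": "actor", "Li Si": "actor", "Wang Wu": "director"}


def substitute_proper_nouns(tag_lexicon, user_dict):
    """Step 1): replace known names with 'film' / 'director' / 'actor',
    keeping first-seen order and dropping the resulting duplicates."""
    return list(dict.fromkeys(user_dict.get(w, w) for w in tag_lexicon))


def kmeans(vectors, k, iters=20):
    """Step 3): Lloyd's k-means over {word: vector}; the first k words seed the
    centroids. Returns the words of each cluster."""
    words = list(vectors)
    centroids = [list(vectors[w]) for w in words[:k]]
    assign = {}
    for _ in range(iters):
        new = {w: min(range(k), key=lambda c: math.dist(vectors[w], centroids[c]))
               for w in words}
        if new == assign:          # converged: assignments no longer change
            break
        assign = new
        for c in range(k):         # move each centroid to the mean of its members
            members = [vectors[w] for w in words if assign[w] == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    clusters = [[] for _ in range(k)]
    for w in words:
        clusters[assign[w]].append(w)
    return clusters


def cosine(u, v):
    return sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))


def expand_category_dict(category_dict, vectors, threshold=0.8):
    """Step 5): add vocabulary words whose cosine similarity to a seed word exceeds
    the threshold; a set per category removes duplicates. Manual review follows."""
    expanded = {}
    for category, seeds in category_dict.items():
        words = set(seeds)
        for seed in seeds:
            if seed in vectors:
                words.update(w for w, v in vectors.items()
                             if w != seed and cosine(vectors[seed], v) > threshold)
        expanded[category] = sorted(words)
    return expanded
```

With real vectors, gensim's `KeyedVectors.most_similar` could supply the similarity candidates instead of the exhaustive loop used here.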
An example of the generated tag-category dictionary is shown in Table 4:
[Table 4: example tag-category dictionary (image not reproduced)]
TABLE 4
Next, the sentiment dictionary is obtained. First, open-source positive and negative sentiment dictionaries are collected, mainly the HowNet sentiment dictionary and the open-source sentiment dictionary of National Taiwan University (NTUSD), and are sorted and merged; from the HowNet dictionary, only the positive and negative evaluation words are used. Then, word frequencies in the viewpoint-sentiment lexicon are counted, all words whose frequency exceeds a set threshold are kept, and some words unrelated to movie-review sentiment are deleted manually, forming a sentiment dictionary with movie-specific characteristics.
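One possible reading of this step is sketched below. The source does not say how polarity is assigned to frequent corpus words that are missing from the open-source lists, so this sketch, under that stated assumption, merges the open-source polarity lists (later lists win on conflicts), drops manually rejected words, and returns the remaining frequent corpus words separately for manual annotation. The polarity encoding (+1 / -1), the function name and its parameters are hypothetical.

```python
from collections import Counter


def build_sentiment_dict(open_source_dicts, sentiment_words, min_freq, irrelevant=()):
    """open_source_dicts: iterable of {word: +1 or -1};
    sentiment_words: sentiment words extracted in step S3, with repeats;
    irrelevant: words manually judged unrelated to movie-review sentiment.
    Returns (polarity dictionary, frequent words still needing a polarity)."""
    merged = {}
    for d in open_source_dicts:
        merged.update(d)                       # later dictionaries win on conflicts
    counts = Counter(sentiment_words)
    frequent = [w for w, c in counts.most_common() if c > min_freq]
    result = {w: p for w, p in merged.items() if w not in irrelevant}
    to_annotate = [w for w in frequent if w not in merged and w not in irrelevant]
    return result, to_annotate
```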
Finally, keyword matching. Keyword matching takes the review sentences from which viewpoint words and sentiment words were extracted during review-viewpoint extraction, matches the viewpoint word against the tag-category dictionary and the sentiment word against the sentiment dictionary, and, if both match successfully, labels the sentence with (tag category, sentiment orientation). For example, for the review "the plot is not good", viewpoint extraction yields the pair (plot, not good), and tag-category and sentiment-orientation labeling then yields (plot, negative).
Manual labeling covers two cases: sentences from which no viewpoint words and sentiment words were extracted during review-viewpoint extraction, and sentences from which they were extracted but which could not be labeled by keyword matching. In both cases, the tag category and sentiment orientation are labeled manually.
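The keyword-matching labeling can be sketched as a pair of dictionary lookups. The dictionary layouts ({category: [words]} and {word: +1 or -1}) are assumptions, and because the source does not specify how negation reaches the sentiment dictionary, inverting the polarity of a dictionary word preceded by "not " is a hypothetical choice made only so that the "(plot, not good)" example above resolves to (plot, negative).

```python
NEGATION_PREFIXES = ("not ",)  # hypothetical: a leading negation flips polarity


def keyword_match(pair, category_dict, sentiment_dict):
    """Return (tag category, 'positive' or 'negative'), or None when either the
    viewpoint word or the sentiment word fails to match; None means the sentence
    is sent to manual labeling."""
    viewpoint, sentiment = pair
    category = next((c for c, words in category_dict.items() if viewpoint in words), None)
    polarity = sentiment_dict.get(sentiment)
    if polarity is None:
        for prefix in NEGATION_PREFIXES:
            base = sentiment[len(prefix):]
            if sentiment.startswith(prefix) and base in sentiment_dict:
                polarity = -sentiment_dict[base]
                break
    if category is None or polarity is None:
        return None
    return category, "positive" if polarity > 0 else "negative"
```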
Step S5, generating the review-viewpoint sentiment analysis model, composed of a review-tag classification model and a tag-sentiment classification model. The two classification models differ only in their class labels; the overall data processing and classification algorithm are the same. There are two kinds of classification datasets, the keyword-matching-labeled dataset and the manually labeled dataset, and each is used to train one review-tag classification model and one tag-sentiment classification model, giving 2 of each. To improve the accuracy of sentiment analysis, the 2 review-tag classification models are fused by weighting into a new review-tag classification model, and the 2 tag-sentiment classification models are fused by weighting into a new tag-sentiment classification model (see FIG. 3 and FIG. 4). In this embodiment, the model trained on the keyword-matching-labeled data and the model trained on the manually labeled data have weights of 0.4 and 0.6, respectively.
The fused probability for review-viewpoint sentiment analysis is calculated as follows:
$$P_i = 0.4\,P_{1i} + 0.6\,P_{2i}$$
where $P_i$ is the probability that a given review in the review corpus belongs to class i, and $P_{1i}$ and $P_{2i}$ are the probabilities given by the model trained on the keyword-matching-labeled data and by the model trained on the manually labeled data, respectively. For the review-tag classification model, i takes the values 0 to 7, representing the 8 categories director, cinematography, plot, actor, emotion, audio-visual effects, subject matter and overall impression, respectively. For the tag-sentiment classification model, i takes the values 0 and 1, where 1 denotes positive sentiment and 0 denotes negative sentiment.
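The weighted fusion can be illustrated with a short sketch that applies the formula class by class and picks the most probable class. The weights 0.4 and 0.6 and the class order come from the text above; the function names, the English category names and the use of plain lists for the two models' probability outputs are assumptions.

```python
W_KEYWORD, W_MANUAL = 0.4, 0.6  # weights given in this embodiment

# Class order i = 0..7 for the review-tag classification model.
TAG_CATEGORIES = ["director", "cinematography", "plot", "actor",
                  "emotion", "audio-visual effects", "subject matter", "impression"]


def fuse(p_keyword, p_manual):
    """P_i = 0.4 * P1_i + 0.6 * P2_i for every class i."""
    return [W_KEYWORD * a + W_MANUAL * b for a, b in zip(p_keyword, p_manual)]


def predict(p_keyword, p_manual, labels):
    """Return the label with the highest fused probability and that probability."""
    fused = fuse(p_keyword, p_manual)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best], fused[best]
```

For the tag-sentiment model the labels would be `["negative", "positive"]` (i = 0 negative, i = 1 positive); with keyword-model output `[0.7, 0.3]` and manual-model output `[0.4, 0.6]`, the fused probabilities are about `[0.52, 0.48]`, so the prediction is negative.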
The construction process of the above classification models, shown in FIG. 5, involves the following steps:
First, data balancing. The classes in the classification data may be imbalanced, which strongly affects overall classification accuracy. The invention adopts an oversampling strategy, that is, the samples of minority classes are duplicated multiple times.
Second, dataset partitioning. The shuffled dataset is divided into a training set and a test set at a ratio of 8:2.
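Balancing and partitioning can be sketched as follows, assuming samples are (text, label) tuples. Minority classes are replicated in whole copies, plus a random subset for any remainder, until each class matches the largest one; the data are then shuffled and split 8:2. The function names and the fixed random seed are assumptions made for reproducibility.

```python
import random


def oversample(samples, seed=0):
    """Random oversampling: replicate minority-class samples until every class
    has as many samples as the largest class. samples: list of (text, label)."""
    rng = random.Random(seed)
    by_label = {}
    for s in samples:
        by_label.setdefault(s[1], []).append(s)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        copies, rest = divmod(target, len(group))
        balanced.extend(group * copies + rng.sample(group, rest))
    return balanced


def split_dataset(samples, ratio=0.8, seed=0):
    """Shuffle, then split into training and test sets (8:2 by default)."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * ratio)
    return shuffled[:cut], shuffled[cut:]
```

Because balancing happens before the split in the order described above, copies of the same sample can land in both sets and inflate test accuracy; splitting first and oversampling only the training set avoids this.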
Then, feature extraction. The training-set corpus is segmented into words and stop words are removed; text features are extracted with the TF-IDF (term frequency-inverse document frequency) algorithm, the chi-square value (CHI2, or χ²) of each feature is computed, and, with a threshold K (an integer), only the K features with the highest chi-square values are kept, which reduces the feature dimensionality.
Finally, the data are fed into a random forest classification model for model training, saving and evaluation.
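A dependency-free sketch of TF-IDF weighting followed by chi-square feature selection is given below. It assumes documents are already segmented into token lists with stop words removed, and it uses one common TF-IDF variant (raw count times ln(N / df) + 1); the exact variant used in the invention is not stated. The chi-square score compares, per class, the observed feature mass with the mass expected if the feature were independent of the class, following the observed/expected formulation of scikit-learn's `chi2`.

```python
import math
from collections import Counter


def tfidf(docs):
    """docs: list of token lists. Returns (vocabulary, rows) with
    weight = raw count * (ln(N / df) + 1)."""
    vocab = sorted({t for d in docs for t in d})
    df = Counter(t for d in docs for t in set(d))
    n = len(docs)
    rows = []
    for d in docs:
        tf = Counter(d)
        rows.append([tf[t] * (math.log(n / df[t]) + 1) for t in vocab])
    return vocab, rows


def chi2_scores(rows, labels):
    """Per-feature chi-square: sum over classes of (observed - expected)^2 / expected,
    where expected = total feature mass * class share of documents."""
    n = len(rows)
    scores = []
    for j in range(len(rows[0])):
        total = sum(r[j] for r in rows)
        score = 0.0
        for c in sorted(set(labels)):
            members = [r[j] for r, y in zip(rows, labels) if y == c]
            expected = total * len(members) / n
            if expected > 0:
                score += (sum(members) - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_top_k(vocab, scores, k):
    """Keep the K features with the highest chi-square values."""
    ranked = sorted(range(len(vocab)), key=lambda j: scores[j], reverse=True)
    return [vocab[j] for j in ranked[:k]]
```

In practice, scikit-learn's `TfidfVectorizer`, `SelectKBest` with `chi2`, and `RandomForestClassifier` cover this step and the random forest step that follows.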
Step S6, the comment emotion label is automatically generated. After the comment opinion emotion analysis model is trained, automatic marking of new film comments can be performed, and a specific emotion prediction process is described with reference to fig. 6. Firstly, comment viewpoint extraction and extraction (viewpoint words and emotion words) are carried out, if the (viewpoint words and emotion words) can be obtained, keyword matching including label category matching and emotion word matching is carried out, and if the keyword matching and the emotion word matching can be successfully matched, a result is directly output. Otherwise, directly calling the comment tag classification model and/or the tag emotion classification model to perform tag class prediction and tag emotion prediction, setting two thresholds (T1 and T2), and outputting (comment tag class mark and emotion tendency mark) if the tag class prediction probability P1 is greater than T1 and the tag emotion prediction probability P2 is greater than T2.
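The decision flow of step S6 can be sketched with the extraction, matching and classification steps passed in as callables, so the control logic stands alone. All callables and the default threshold values of 0.5 are hypothetical stand-ins; the source does not give values for T1 and T2. This sketch always calls both classifiers in the fallback branch.

```python
def auto_label(review, extract, match, tag_model, sentiment_model, t1=0.5, t2=0.5):
    """Step S6 decision flow. Hypothetical callables:
      extract(review)         -> (viewpoint, sentiment) or None
      match(pair)             -> (category, polarity) or None
      tag_model(review)       -> (category, probability P1)
      sentiment_model(review) -> (polarity, probability P2)
    Returns (category, polarity), or None when no confident label exists."""
    pair = extract(review)
    if pair is not None:
        matched = match(pair)
        if matched is not None:
            return matched                      # rule and dictionary path
    category, p1 = tag_model(review)            # fall back to the fused classifiers
    polarity, p2 = sentiment_model(review)
    if p1 > t1 and p2 > t2:
        return category, polarity
    return None                                 # below threshold: leave unlabeled
```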
The above embodiments are provided only to illustrate the present invention, not to limit it. Those skilled in the art can make various changes and modifications without departing from the spirit and scope of the present invention. Therefore, all equivalent technical solutions also fall within the scope of the present invention, which is defined by the claims.

Claims (10)

1. A movie comment viewpoint emotion tendency analysis method, characterized by comprising the following steps:
step S1, crawling the description information and comment information of a plurality of films in each category from a film review website;
step S2, preprocessing the collected film description information and comment information;
step S3, formulating a plurality of comment viewpoint extraction rules; using these rules to obtain viewpoint words and emotion words from each comment sentence of the comment content in the comment information; and storing all viewpoint words and emotion words as a comment label word bank and a viewpoint emotion word bank, respectively;
step S4, labeling each comment sentence with a comment label category and an emotion tendency by keyword-matching labeling or manual labeling;
step S5, generating a comment viewpoint emotion analysis model consisting of a comment label classification model and a label emotion classification model;
and step S6, using the comment viewpoint emotion analysis model to automatically generate comment label category labels and emotion tendency labels for target movie comments.
2. The movie comment viewpoint emotion tendency analysis method according to claim 1, wherein in step S1, the movie categories include: romance, animation, action, science fiction, horror, comedy, and suspense;
the film description information comprises the film name, the director name, the lead actor names, the genre, and the overall rating;
the comment information comprises: the commenter's nickname, the number of "useful" votes for the comment, the comment time, the comment content, and the rating.
3. The movie comment viewpoint emotion tendency analysis method according to claim 1, wherein the data preprocessing comprises:
integrating all the collected comment information to form a comment corpus;
removing duplicate data from the comment corpus;
deleting data with missing comment content in the comment corpus;
converting all traditional Chinese characters in the comment corpus into simplified Chinese characters;
and acquiring the film name, the director name, and the lead actor names from the collected description information of each film, storing them in a user-defined dictionary, and tagging film names, director names, and actor names with different symbols.
4. The movie comment viewpoint emotion tendency analysis method according to claim 1, wherein step S3 comprises:
constructing a plurality of comment viewpoint extraction rules according to the dependency syntax structure, the grammatical components of the words, and the expression structure of viewpoint words and emotion words in comment viewpoints;
performing sentence segmentation, word segmentation, part-of-speech tagging, and dependency parsing on the comment content in the comment corpus to obtain the comment sentences; checking whether each comment sentence matches one of the comment viewpoint extraction rules, and if so, obtaining its viewpoint word and emotion word;
and storing all acquired viewpoint words and emotion words as a comment label word bank and a viewpoint emotion word bank, respectively.
5. The movie comment viewpoint emotion tendency analysis method according to claim 4, wherein the dependency syntax structure comprises: the subject-predicate structure, the verb-object structure, the attribute-head structure, the adverbial-head structure, the verb-complement structure, and the coordinate structure;
the grammatical components of the words comprise: a subject component, an object or adverbial-object component, an attributive component, and a noun component, where an adverbial-object component refers to an indirect-object or adverbial-object structure;
the expression structure of viewpoint words and emotion words means one of two patterns. In the first, the subject component is a viewpoint word and the corresponding object or adverbial-object component is an emotion word. In the second, the attributive component is an emotion word and the noun component it modifies is a viewpoint word.
6. The movie comment viewpoint emotion tendency analysis method according to claim 3, wherein step S4 comprises:
acquiring a label category dictionary and an emotion dictionary;
performing keyword-matching labeling on the comment sentences from which viewpoint words and emotion words were extracted in step S3: matching each acquired viewpoint word against the label category dictionary and each acquired emotion word against the emotion dictionary; if both matches succeed, labeling the comment sentence with the label category and the emotion tendency; otherwise, labeling its label category and emotion tendency manually;
and manually labeling the label category and emotion tendency of the comment sentences from which no viewpoint words and emotion words were extracted in step S3.
7. The movie comment viewpoint emotion tendency analysis method according to claim 6, wherein obtaining the label category dictionary comprises:
in the comment label word bank, marking the film names, director names, and actor names from the user-defined dictionary as "film", "director", and "actor", respectively;
training a word vector model on the comment sentences to obtain a trained word vector model;
representing the words in the comment label word bank with the trained word vector model, and clustering them into k categories with the k-means clustering algorithm;
manually summarizing the common viewpoints of movie reviews into 8 dimensions: director, cinematography, plot, actor, emotion, audio-visual effect, subject matter, and overall impression; screening the words in each cluster and keeping the relevant ones to form a preliminary label category dictionary;
using the trained word vector model to retrieve words related to the category words in the preliminary label category dictionary, thereby expanding the dictionary, and removing duplicate words to generate the final label category dictionary;
obtaining the emotion dictionary as follows. First, open-source positive and negative emotion dictionaries are collected, sorted, and merged. Then the word frequencies in the viewpoint emotion word bank are counted, and all words whose frequency exceeds a set threshold are retained. Finally, words irrelevant to movie-comment emotion are deleted manually to form the emotion dictionary.
8. The movie comment viewpoint emotion tendency analysis method according to claim 1, wherein step S5 comprises:
training two preliminary comment label classification models and two preliminary label emotion classification models, one of each on the keyword-matching labeled data set and one of each on the manually labeled data set;
fusing the two preliminary comment label classification models by weighting to generate the final comment label classification model;
and fusing the two preliminary label emotion classification models by weighting to generate the final label emotion classification model.
9. The movie comment viewpoint emotion tendency analysis method according to claim 8, wherein generating a preliminary comment label classification model or a preliminary label emotion classification model comprises:
balancing the keyword-matching labeled data set and the manually labeled data set with an upsampling strategy;
dividing the balanced keyword-matching labeled data set and manually labeled data set into a training set and a test set according to a preset ratio;
performing word segmentation on the training-set corpus, removing stop words, extracting text features with the TF-IDF algorithm, and calculating the chi-square value of each feature for feature dimensionality reduction;
and importing the data into a random forest classification model for model training, saving, and evaluation.
10. The movie comment viewpoint emotion tendency analysis method according to claim 6, wherein step S6 comprises:
extracting viewpoint words and emotion words; if they are obtained, performing keyword matching comprising label category matching and emotion word matching; if both matches succeed, directly outputting the label category mark and the emotion tendency mark; otherwise, calling the comment label classification model and/or the label emotion classification model to predict the label category and the label emotion, setting two thresholds T1 and T2, and outputting the label category mark and the emotion tendency mark if the label category prediction probability P1 is greater than T1 and the label emotion prediction probability P2 is greater than T2.
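Claim 3's preprocessing can be sketched as follows. The five-character `T2S` table is only a stand-in for a real traditional-to-simplified converter; a library such as OpenCC would normally be used. The record layout, the exact-duplicate criterion, and the `nm_film`/`nm_director`/`nm_actor` tags are hypothetical.

```python
# Tiny stand-in for a full traditional-to-simplified conversion table.
T2S = {"電": "电", "導": "导", "員": "员", "劇": "剧", "戲": "戏"}


def to_simplified(text):
    return "".join(T2S.get(ch, ch) for ch in text)


def preprocess(comments):
    """Deduplicate records, drop empty comments, convert to simplified Chinese."""
    seen, cleaned = set(), []
    for c in comments:
        key = tuple(sorted(c.items()))
        if key in seen:  # remove exact duplicate records
            continue
        seen.add(key)
        content = (c.get("content") or "").strip()
        if not content:  # drop records whose comment content is missing
            continue
        cleaned.append({**c, "content": to_simplified(content)})
    return cleaned


def build_user_dict(films):
    """Map film, director, and actor names to hypothetical custom POS tags."""
    entries = {}
    for f in films:
        entries[f["title"]] = "nm_film"
        entries[f["director"]] = "nm_director"
        for actor in f["actors"]:
            entries[actor] = "nm_actor"
    return entries


raw = [
    {"nickname": "a", "content": "電影很好"},
    {"nickname": "a", "content": "電影很好"},
    {"nickname": "b", "content": ""},
    {"nickname": "c", "content": None},
]
corpus = preprocess(raw)
user_dict = build_user_dict([{"title": "流浪地球", "director": "郭帆", "actors": ["吴京"]}])
```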
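The extraction rules of claim 5 can be illustrated with a simplified rule matcher over a dependency parse. The parse is supplied by hand here. The token layout (1-based head indices, with 0 for the root) and the relation and POS labels (SBV, ATT, VOB, CMP; n, a, v) follow the conventions of parsers such as LTP, but they are assumptions. Only two rules are shown, in a simplified reading of the claim.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    word: str
    pos: str   # POS tag, e.g. "n" noun, "a" adjective, "v" verb
    head: int  # 1-based index of the head token, 0 for the root
    rel: str   # dependency relation, e.g. "SBV", "ATT", "VOB", "CMP"


def extract_opinions(tokens: List[Token]) -> List[Tuple[str, str]]:
    """Return (viewpoint word, emotion word) pairs found by two rules."""
    pairs = []
    for tok in tokens:
        if tok.head == 0:
            continue
        head = tokens[tok.head - 1]
        # Rule 1 (subject-predicate): noun subject = viewpoint word.
        if tok.rel == "SBV" and tok.pos.startswith("n"):
            if head.pos == "a":  # adjectival predicate is the emotion word
                pairs.append((tok.word, head.word))
            elif head.pos == "v":  # verb predicate: use its adjectival object/complement
                for dep in tokens:
                    if dep.head == tok.head and dep.rel in ("VOB", "CMP") and dep.pos == "a":
                        pairs.append((tok.word, dep.word))
        # Rule 2 (attribute-head): adjectival attribute = emotion word, modified noun = viewpoint word.
        elif tok.rel == "ATT" and tok.pos == "a" and head.pos.startswith("n"):
            pairs.append((head.word, tok.word))
    return pairs


s1 = [Token("剧情", "n", 3, "SBV"), Token("很", "d", 3, "ADV"), Token("精彩", "a", 0, "HED")]
s2 = [Token("精彩", "a", 3, "ATT"), Token("的", "u", 1, "RAD"), Token("剧情", "n", 0, "HED")]
s3 = [Token("演员", "n", 2, "SBV"), Token("演", "v", 0, "HED"),
      Token("得", "u", 2, "RAD"), Token("好", "a", 2, "CMP")]
```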
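Claim 6's keyword-matching labeling reduces to two dictionary lookups with a manual fallback. The tiny dictionaries are assumptions made for illustration. So is the rule of taking the first pair for which both lookups succeed.

```python
# Hypothetical miniature label category and emotion dictionaries.
LABEL_DICT = {"剧情": "plot", "情节": "plot", "演员": "actor", "配乐": "audio-visual effect"}
EMOTION_DICT = {"精彩": "positive", "好": "positive", "无聊": "negative"}


def keyword_label(pairs):
    """Return (category, polarity) for the first fully matched pair.

    None signals that the sentence must be labeled manually, either because
    no pair was extracted or because no pair matched both dictionaries.
    """
    for viewpoint, emotion in pairs:
        category = LABEL_DICT.get(viewpoint)
        polarity = EMOTION_DICT.get(emotion)
        if category is not None and polarity is not None:
            return category, polarity
    return None
```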
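The clustering and dictionary-expansion steps of claim 7 can be sketched with plain k-means and cosine similarity. The 2-D vectors are toy stand-ins for trained word2vec embeddings. The initialization (the first k vectors) is deliberately naive rather than k-means++. The 0.95 similarity threshold for expansion is a hypothetical parameter.

```python
import math


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, iters=20):
    """Plain k-means; returns the cluster index of each vector."""
    centroids = [list(v) for v in vectors[:k]]  # naive init: first k vectors
    assign = []
    for _ in range(iters):
        # Assignment step: nearest centroid for every vector.
        assign = [min(range(k), key=lambda j: squared_distance(v, centroids[j])) for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, a in zip(vectors, assign) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand(seed_words, word_vectors, threshold=0.95):
    """Add every word whose vector is cosine-close to a seed category word."""
    expanded = set(seed_words)
    for seed in seed_words:
        for word, vec in word_vectors.items():
            if cosine(word_vectors[seed], vec) >= threshold:
                expanded.add(word)
    return expanded


# Toy 2-D vectors standing in for trained word2vec embeddings.
word_vectors = {"剧情": [1.0, 0.1], "配乐": [0.1, 1.0], "情节": [0.9, 0.2], "音效": [0.2, 0.9]}
words = list(word_vectors)
clusters = kmeans([word_vectors[w] for w in words], k=2)
plot_words = expand(["剧情"], word_vectors)
```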
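Claim 8 does not specify how the two preliminary models are weighted. One plausible reading, assumed here, is a weighted average of their predicted class probabilities. The weight `w_kw=0.4` for the model trained on keyword-matching labels is hypothetical.

```python
def fuse(prob_kw, prob_manual, w_kw=0.4):
    """Weighted average of two {class: probability} dicts.

    w_kw weights the model trained on keyword-matching labels, and
    1 - w_kw weights the model trained on manual labels (hypothetical split).
    Returns (best class, fused probability).
    """
    classes = set(prob_kw) | set(prob_manual)
    fused = {c: w_kw * prob_kw.get(c, 0.0) + (1 - w_kw) * prob_manual.get(c, 0.0) for c in classes}
    best = max(fused, key=fused.get)
    return best, fused[best]


label, prob = fuse({"positive": 0.9, "negative": 0.1}, {"positive": 0.3, "negative": 0.7})
# positive: 0.4 * 0.9 + 0.6 * 0.3 = 0.54; negative: 0.4 * 0.1 + 0.6 * 0.7 = 0.46
```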
CN201911082409.1A 2019-11-07 2019-11-07 Movie comment viewpoint emotion tendency analysis method Active CN110825876B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911082409.1A CN110825876B (en) 2019-11-07 2019-11-07 Movie comment viewpoint emotion tendency analysis method

Publications (2)

Publication Number Publication Date
CN110825876A true CN110825876A (en) 2020-02-21
CN110825876B CN110825876B (en) 2022-07-15

Family

ID=69553492

Country Status (1)

Country Link
CN (1) CN110825876B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103279460A (en) * 2013-05-24 2013-09-04 北京尚友通达信息技术有限公司 Method for analyzing and processing online shopping comments
US20140201041A1 (en) * 2013-01-11 2014-07-17 Tagnetics, Inc. Out of stock sensor
CN104462487A (en) * 2014-12-19 2015-03-25 南开大学 Individualized online news comment mood forecast method capable of fusing multiple information sources
CN105117428A (en) * 2015-08-04 2015-12-02 电子科技大学 Web comment sentiment analysis method based on word alignment model
CN105354183A (en) * 2015-10-19 2016-02-24 Tcl集团股份有限公司 Analytic method, apparatus and system for internet comments of household electrical appliance products
CN106096664A (en) * 2016-06-23 2016-11-09 广州云数信息科技有限公司 A kind of sentiment analysis method based on social network data
CN106156004A (en) * 2016-07-04 2016-11-23 中国传媒大学 The sentiment analysis system and method for film comment information based on term vector
CN106407236A (en) * 2015-08-03 2017-02-15 北京众荟信息技术有限公司 An emotion tendency detection method for comment data
CN106649519A (en) * 2016-10-17 2017-05-10 北京邮电大学 Method of digging and assessing product features
CN108108433A (en) * 2017-12-19 2018-06-01 杭州电子科技大学 A kind of rule-based and the data network integration sentiment analysis method
CN108108468A (en) * 2017-12-29 2018-06-01 华中科技大学鄂州工业技术研究院 A kind of short text sentiment analysis method and apparatus based on concept and text emotion
CN108460010A (en) * 2018-01-17 2018-08-28 南京邮电大学 A kind of comprehensive grade model implementation method based on sentiment analysis
CN109684647A (en) * 2019-02-19 2019-04-26 东北林业大学 Film comment sentiment analysis method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
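Wang Xuehe et al.: "Movie review sentiment classification method based on Word2vec and multiple classifiers", Journal of Ningxia University (Natural Science Edition) *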

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111565322A (en) * 2020-05-14 2020-08-21 北京奇艺世纪科技有限公司 User emotional tendency information obtaining method and device and electronic equipment
CN111666767A (en) * 2020-06-10 2020-09-15 创新奇智(上海)科技有限公司 Data identification method and device, electronic equipment and storage medium
CN111666767B (en) * 2020-06-10 2023-07-18 创新奇智(上海)科技有限公司 Data identification method and device, electronic equipment and storage medium
CN111966944A (en) * 2020-08-17 2020-11-20 中电科大数据研究院有限公司 Model construction method for multi-level user comment security audit
CN111966944B (en) * 2020-08-17 2024-04-09 中电科大数据研究院有限公司 Model construction method for multi-level user comment security audit
CN112115231A (en) * 2020-09-17 2020-12-22 中国传媒大学 Data processing method and device
CN112214661B (en) * 2020-10-12 2022-04-08 西华大学 Emotional unstable user detection method for conventional video comments
CN112214661A (en) * 2020-10-12 2021-01-12 西华大学 Emotional unstable user detection method for conventional video comments
CN112215003A (en) * 2020-11-09 2021-01-12 深圳市洪堡智慧餐饮科技有限公司 Comment label extraction method based on albert pre-training model and kmean algorithm
CN112651211A (en) * 2020-12-11 2021-04-13 北京大米科技有限公司 Label information determination method, device, server and storage medium
CN112527963B (en) * 2020-12-17 2024-05-03 深圳市欢太科技有限公司 Dictionary-based multi-label emotion classification method and device, equipment and storage medium
CN112527963A (en) * 2020-12-17 2021-03-19 深圳市欢太科技有限公司 Multi-label emotion classification method and device based on dictionary, equipment and storage medium
CN112612873A (en) * 2020-12-25 2021-04-06 上海德拓信息技术股份有限公司 NLP technology-based centralized event mining method
CN112612873B (en) * 2020-12-25 2023-07-07 上海德拓信息技术股份有限公司 Centralized event mining method based on NLP technology
CN113127640A (en) * 2021-03-12 2021-07-16 嘉兴职业技术学院 Malicious spam comment attack identification method based on natural language processing
CN113010689A (en) * 2021-03-22 2021-06-22 平安科技(深圳)有限公司 Buddhism knowledge discrimination method, device, equipment and storage medium
CN113065052A (en) * 2021-04-07 2021-07-02 顶象科技有限公司 Method and device for analyzing authenticity of video comment, electronic equipment and storage medium
CN113312478A (en) * 2021-04-25 2021-08-27 国家计算机网络与信息安全管理中心 Viewpoint mining method and device based on reading understanding
CN113312478B (en) * 2021-04-25 2022-07-19 国家计算机网络与信息安全管理中心 Viewpoint mining method and device based on reading understanding
CN113505582A (en) * 2021-05-25 2021-10-15 腾讯音乐娱乐科技(深圳)有限公司 Music comment sentiment analysis method, equipment and medium
CN113536080A (en) * 2021-07-20 2021-10-22 湖南快乐阳光互动娱乐传媒有限公司 Data uploading method and device and electronic equipment
CN113515663A (en) * 2021-08-03 2021-10-19 广州酷狗计算机科技有限公司 Comment information display method and device, electronic equipment and storage medium
CN113961725A (en) * 2021-10-25 2022-01-21 北京明略软件系统有限公司 Automatic label labeling method, system, equipment and storage medium
CN115392199A (en) * 2022-08-22 2022-11-25 再惠(上海)网络科技有限公司 Evaluation analysis and report generation method, device, electronic equipment and storage medium
CN115392199B (en) * 2022-08-22 2023-08-04 再惠(上海)网络科技有限公司 Evaluation analysis and report generation method, device, electronic equipment and storage medium
CN116644754A (en) * 2023-05-31 2023-08-25 重庆邮电大学 Internet financial product comment viewpoint extraction method based on big data
CN116644754B (en) * 2023-05-31 2024-04-16 金智东博(北京)教育科技股份有限公司 Internet financial product comment viewpoint extraction method based on big data

Also Published As

Publication number Publication date
CN110825876B (en) 2022-07-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant