CN109829166B - Homestay customer opinion mining method based on character-level convolutional neural network - Google Patents

Homestay customer opinion mining method based on character-level convolutional neural network

Info

Publication number
CN109829166B
Authority
CN
China
Prior art keywords
character
text
neural network
comments
convolutional neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910117188.0A
Other languages
Chinese (zh)
Other versions
CN109829166A (en)
Inventor
杨有
张振
罗凌
余平
尚晋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Normal University
Original Assignee
Chongqing Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Normal University filed Critical Chongqing Normal University
Priority to CN201910117188.0A priority Critical patent/CN109829166B/en
Publication of CN109829166A publication Critical patent/CN109829166A/en
Application granted granted Critical
Publication of CN109829166B publication Critical patent/CN109829166B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a homestay customer opinion mining method based on a character-level convolutional neural network, comprising the following steps: constructing a web crawler, collecting all homestay comments and establishing a homestay dictionary; performing feature extraction and vectorization on the text with TF-IDF, followed by visual topic clustering; constructing a homestay topic dictionary; finding the number of corresponding evaluations in the sentence-split text; performing weakly supervised pre-classification based on naive Bayes; constructing a convolutional neural network with one-dimensional convolution kernels for feature extraction to obtain the emotion polarity; visualizing the emotion polarity; and verifying the model. The method can mine the emotions and user needs hidden in personalized comments from a large amount of noisy and false comment data, which supports the decision-making of enterprises, organizations and individual users.

Description

Homestay customer opinion mining method based on character-level convolutional neural network
Technical Field
The invention relates to the field of homestay customer opinion mining, and in particular to a homestay customer opinion mining method based on a character-level convolutional neural network.
Background
Customer opinion mining is the analysis of customer needs and opinions, and analyzing customer comments helps homestay services improve and iterate. Because homestay services are intangible, online comments from guests carry more influence than other kinds of information sources, so improving service quality through customer opinion mining is key to rapidly building a competitive advantage. There are two mainstream approaches to customer opinion mining. The first analyzes structured data, that is, it derives perceptible and effective attributes from structured data such as questionnaires, Likert scales and semantic differential scales. The second analyzes unstructured data, that is, it analyzes the characteristics of the data with natural language processing and visualization techniques. A large amount of opinion-bearing text can be obtained from review websites, forums, blogs and social media, and with the help of a sentiment analysis system this unstructured information can be converted automatically into structured data that captures opinions about products, services, brands, politics, people or other subjects.
Homestay comments are highly time-sensitive, independent in context and topic, clear in viewpoint, short, and casually worded. Existing customer opinion mining approaches fall short in efficiently mining the customer viewpoints and emotions hidden in the noise and cannot meet practical needs.
Disclosure of Invention
In view of these problems, the invention can mine the emotions and user needs hidden in personalized comments from a large amount of noisy and false comment data, which supports the decision-making of enterprises, organizations and individual users.
The invention provides a homestay customer opinion mining method based on a character-level convolutional neural network, comprising the following steps:
Step one: online homestay comment collection and preprocessing: constructing a web crawler, collecting all homestay comments, establishing a homestay dictionary, replacing punctuation marks with line-feed characters using the word tagging function of the open-source LTP toolkit from Harbin Institute of Technology, and decomposing the topic sentences in the comments to form a topic evaluation text;
Step two: topic clustering: performing feature extraction and vectorization on the topic evaluation text with TF-IDF, performing visual topic clustering on the homestay comments with pyLDAvis to obtain a visual clustering result, selecting an initial number of topics k according to the topic selection criterion of high intra-cluster similarity and low inter-cluster similarity to obtain an initial model, and calculating the relevance between terms and topics t;
Step three: constructing a homestay topic dictionary from the homestay standard document and the visual clustering result;
Step four: finding the corresponding evaluations in the sentence-split topic evaluation text by attribute word matching, and then counting the number of evaluations for each topic;
Step five: weakly supervised pre-classification based on naive Bayes: the web crawler automatically labels a portion of the original comments that have no follow-up comment; with k the number of keywords in a comment and j the number of categories, and two emotion classes to evaluate, the posterior probability of a comment is calculated from its word-frequency vector, and the pre-classification is considered successful when the output probability is greater than 0.5;
Step six: homestay comment sentiment analysis based on C-CNN-SA: taking the character-level unstructured comments as the original signal, de-duplicating the characters, sorting them in descending order of character frequency to establish a character table, vectorizing each comment by looking up the position ID of each character in the character table, constructing a convolutional neural network with one-dimensional convolution kernels for feature extraction, outputting through a softmax function to obtain the emotion polarity, and printing the parameters of the model with the Keras neural network tool;
Step seven: using the emotion polarity obtained from the features extracted by the one-dimensional convolutional neural network, performing emotion visualization, comparing customer opinion tendencies across multiple topics, and making targeted improvements based on the comparison so as to raise the overall satisfaction with the homestay;
Step eight: verifying the model: using ten-fold cross-validation, performing 10 experiments under equivalent conditions, and verifying the effectiveness of the model with the average test-set accuracy, average precision, average recall and average F value as evaluation indexes.
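Steps one and four can be illustrated with a minimal Python sketch. It does not use LTP; it simply splits a comment at punctuation marks (equivalent to replacing punctuation with line feeds and then splitting) and counts clauses per topic by substring matching. The topic dictionary `TOPIC_WORDS`, the function names and the sample comment are hypothetical and are not taken from the patent.

```python
import re
from collections import Counter

# Hypothetical topic dictionary; the patent's actual attribute words are in Table 1.
TOPIC_WORDS = {
    "service": ["owner", "staff"],
    "environment": ["room", "garden"],
    "traffic": ["subway", "station"],
}

def split_clauses(comment):
    """Split a comment into clause-level evaluations at punctuation marks."""
    parts = re.split(r"[,.;!?，。；！？\n]+", comment)
    return [p.strip() for p in parts if p.strip()]

def count_topics(clauses, topic_words):
    """Count the clauses that mention at least one attribute word of each topic."""
    counts = Counter()
    for clause in clauses:
        lowered = clause.lower()
        for topic, words in topic_words.items():
            if any(word in lowered for word in words):
                counts[topic] += 1
    return counts

clauses = split_clauses(
    "The owner is warm, the room is clean and tidy, the inn is near the station."
)
topic_counts = count_topics(clauses, TOPIC_WORDS)
```

Substring matching is the simplest possible form of attribute word matching; a real pipeline for Chinese text would match against the homestay topic dictionary built in step three.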
A further improvement is that the TF-IDF formula in step two is shown as formula (1):
[Formula (1): image in the original publication, not reproduced in this text]
The distribution of a feature item across different categories and within one category, and the position of a feature word in the text, affect how well the word discriminates texts: an entry appearing at different positions of a text document contributes differently to discrimination. The weight of a feature word is therefore calculated with the TF-IDF method, and the improved IDF of a word w in class c_t is shown as formula (2):
[Formula (2): image in the original publication, not reproduced in this text]
In formulas (1) and (2), N is the total number of text documents and T is the total number of entries; the number of text documents containing entry t is x, the number of text documents in class c_t containing entry t is y, and the number of text documents outside c_t containing entry t is k.
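Because the formula images are not reproduced in this text, the following Python sketch uses the textbook form of TF-IDF rather than the patent's exact formulas: tf is the term count divided by the number of terms in the document (taken here as T, which is an assumption), and idf is log(N / x). The improved class-aware IDF of formula (2) is not implemented. The function name and sample documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    tf  = count of the term in the document / number of terms in the document
    idf = log(N / x), with N documents in total and x documents containing the term
    This is the textbook form, assumed here; it is not the patent's formula (2).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

docs = [["room", "clean"], ["room", "noisy"], ["owner", "warm"]]
weights = tf_idf(docs)
```

A term that appears in every document gets weight 0 under this form, which is why common words contribute little to topic clustering.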
A further improvement is that the topic relevance in step two is calculated as shown in formula (3):
relevance(term_w | topic_t) = λ * p(w|t) + (1 - λ) * p(w|t) / p(w)    (3)
In formula (3), the relevance of a word to a topic is adjusted by the parameter λ. If λ is close to 1, words w that occur more frequently under topic t are more relevant to topic t; if λ is close to 0, words w that are more specific and unique to topic t are more relevant to it. The relevance of the domain word term_w to topic_t is thus changed by adjusting the value of λ.
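Formula (3) can be checked numerically with a short Python sketch. The function implements the formula exactly as written above; the probability values and the helper `rank_terms` are hypothetical and serve only to show how λ changes the ranking.

```python
def relevance(p_w_given_t, p_w, lam):
    """Formula (3): lam * p(w|t) + (1 - lam) * p(w|t) / p(w)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if p_w <= 0.0:
        raise ValueError("p_w must be positive")
    return lam * p_w_given_t + (1.0 - lam) * p_w_given_t / p_w

def rank_terms(topic_probs, corpus_probs, lam):
    """Sort terms by decreasing relevance to one topic."""
    return sorted(
        topic_probs,
        key=lambda w: relevance(topic_probs[w], corpus_probs[w], lam),
        reverse=True,
    )

# Hypothetical probabilities: "room" is frequent everywhere,
# "elevator" is rare overall but concentrated in this topic.
topic_probs = {"room": 0.30, "elevator": 0.10}
corpus_probs = {"room": 0.30, "elevator": 0.02}

frequent_first = rank_terms(topic_probs, corpus_probs, lam=1.0)
specific_first = rank_terms(topic_probs, corpus_probs, lam=0.0)
```

With λ = 1 the frequent word ranks first, and with λ = 0 the topic-specific word ranks first, which matches the behavior described for the two extremes.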
A further improvement is that, in step two, the value of k is chosen with reference to the homestay standard document: experiments start from k = 6 as the baseline and increase k step by step, and the topic attribute words are selected by reducing the overlap between topics and taking the smallest k at which the topics no longer cover one another as the number of topics.
A further improvement is that the output probability in step five is calculated as shown in formula (4):
[Formula (4): image in the original publication, not reproduced in this text]
To remove false comments and increase the accuracy of the sentiment analysis, the pre-classification serves as data cleaning. During pre-classification the labels 0 and 1 represent negative and positive respectively; texts with an output probability greater than 0.9 are taken as high-confidence positive texts, and texts with an output probability less than 0.1 are taken as high-confidence negative texts.
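The confidence filter can be sketched in Python as follows. Since formula (4) is not reproduced here, the posterior is computed with a standard two-class multinomial naive Bayes model with add-one smoothing, which is an assumption and not necessarily the patent's exact formula. The training texts, the whitespace tokenization and all function names are hypothetical; the sketch also assumes both classes occur in the training data.

```python
import math
from collections import Counter

def train_nb(texts, labels):
    """Collect per-class token counts for a two-class multinomial naive Bayes model."""
    counts = {0: Counter(), 1: Counter()}
    class_docs = Counter(labels)
    for text, label in zip(texts, labels):
        counts[label].update(text.split())
    vocab = set(counts[0]) | set(counts[1])
    return counts, class_docs, vocab

def positive_probability(text, model):
    """Posterior P(label = 1 | text) with add-one smoothing; unseen tokens are skipped."""
    counts, class_docs, vocab = model
    total_docs = sum(class_docs.values())
    log_scores = {}
    for label in (0, 1):
        total = sum(counts[label].values())
        score = math.log(class_docs[label] / total_docs)  # class prior
        for token in text.split():
            if token in vocab:
                score += math.log((counts[label][token] + 1) / (total + len(vocab)))
        log_scores[label] = score
    top = max(log_scores.values())  # subtract the max for numerical stability
    exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
    return exp_scores[1] / (exp_scores[0] + exp_scores[1])

def pre_classify(prob, high=0.9, low=0.1):
    """Return 1 or 0 for high-confidence texts, None for texts to be filtered out."""
    if prob > high:
        return 1
    if prob < low:
        return 0
    return None

model = train_nb(
    ["clean quiet warm", "warm clean friendly", "dirty noisy rude", "noisy dirty cold"],
    [1, 1, 0, 0],
)
kept_positive = pre_classify(positive_probability("clean warm quiet", model))
kept_negative = pre_classify(positive_probability("dirty noisy rude", model))
filtered_out = pre_classify(positive_probability("clean dirty", model))
```

Only texts labeled 1 or 0 by `pre_classify` would be passed on as cleaned training data; ambiguous texts return None and are dropped.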
A further improvement is that step six borrows the pixel-level processing scheme used in image processing. Assuming the dictionary size is n, a character table is established and each comment is vectorized using the IDs of its characters; a convolutional neural network (Conv) layer is then introduced for processing. At the input, an Embedding layer splices the character vectors of all characters of a sentence into a sentence matrix. The padding length is set to 200, which covers 99% of the text lengths, and "pre" (head) zero-padding is used, so that when a text is too short, zeros are filled in at the front. The character weights of the Embedding layer are set to be trained and updated. Feature extraction is then performed with the one-dimensional convolution kernel Convolution1D, followed by global max pooling and two fully connected layers. Finally, the softmax probability of the positive label is output as the emotion polarity, and the parameters of the model are printed with the Keras neural network tool.
A further improvement is that, in step six, since the character vectors correspond to the individual characters of a sentence, no word segmentation is performed.
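The character table and the head zero-padding described for step six can be sketched in plain Python. Two details are assumptions that the text does not state: ID 0 is reserved for padding, and texts longer than the target length keep their last characters. The function names and sample strings are hypothetical.

```python
from collections import Counter

def build_char_table(comments):
    """Map each distinct character to an ID, most frequent first.

    ID 0 is reserved for padding (an assumption); ties are broken by
    character order so the table is deterministic.
    """
    freq = Counter(ch for comment in comments for ch in comment)
    ordered = sorted(freq, key=lambda ch: (-freq[ch], ch))
    return {ch: index + 1 for index, ch in enumerate(ordered)}

def encode(comment, table, max_len=200):
    """Convert a comment to character IDs and pad with zeros at the front."""
    ids = [table[ch] for ch in comment if ch in table]
    ids = ids[-max_len:]  # assumed truncation rule: keep the last max_len characters
    return [0] * (max_len - len(ids)) + ids

table = build_char_table(["aab", "ac"])
padded = encode("cab", table, max_len=5)
truncated = encode("cab", table, max_len=2)
```

No word segmentation is involved: every character is looked up directly, which is what makes the approach character-level. The resulting fixed-length ID sequences are the kind of input an embedding layer expects.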
The beneficial effects of the invention are as follows. The method can mine the emotions and user needs hidden in personalized comments from a large amount of noisy and false comment data, which supports the decision-making of enterprises, organizations and individual users. It can mine customer satisfaction under each topic from a data-driven perspective, and the results can provide suggestions to homestay managers and regulators. To address the scarcity of domestic homestay corpora, the opinion mining algorithm is improved and a sentiment analysis algorithm with visual topic extraction and weakly supervised pre-training suited to homestay comments is proposed, so that implicit feature topics can be extracted from online homestay comments and their sentiment analyzed. Verifying the model confirms its effectiveness.
Drawings
FIG. 1 is a schematic flow chart of the present invention.
FIG. 2 is a schematic diagram of the LDA probabilistic model of the present invention.
FIG. 3 is a schematic diagram of the structure of the model of the method of the present invention.
FIG. 4 is a schematic diagram of the parameters of the method model of the present invention.
FIG. 5 is a schematic diagram of the homestay topic visualization in an embodiment of the present invention.
FIG. 6 is a schematic diagram of the comment ratios of the subjects in the embodiment of the present invention.
FIG. 7 is a diagram illustrating a service-emotion polarity distribution in an embodiment of the present invention.
FIG. 8 is a schematic diagram illustrating a customer opinion visualization under various topics in an embodiment of the present invention.
FIG. 9 is a diagram illustrating the experience-emotion polarity distribution in an embodiment of the present invention.
FIG. 10 is a diagram of feature-emotion polarity distribution in accordance with an embodiment of the present invention.
FIG. 11 is a schematic view of the facility-emotion polarity distribution in an embodiment of the present invention.
FIG. 12 is a schematic diagram of traffic-emotion polarity distribution in an embodiment of the present invention.
FIG. 13 is a schematic diagram of the price-emotion polarity distribution in an embodiment of the present invention.
FIG. 14 is a schematic diagram of the environment-emotion polarity distribution in an embodiment of the present invention.
FIG. 15 is a schematic diagram of a dining-emotion polarity distribution in an embodiment of the present invention.
Detailed Description
In order to make the technical means, objectives and functions of the invention easy to understand, the invention will be further described with reference to the following embodiments.
As shown in figs. 1 to 8, this embodiment provides a homestay customer opinion mining method based on a character-level convolutional neural network, comprising the following steps:
Step one: online homestay comment collection and preprocessing. A web crawler is constructed to collect the Chongqing homestay comment section of the travel website, gathering all homestay comments in that section posted between July 26, 2016 and July 26, 2018, and a homestay dictionary is established. About 100 topic attribute words are constructed, and a homestay with fewer than 100 comments would impair topic extraction, so only homestays rated by more than 100 commenting users are selected. The final cleaned corpus contains 81810 qualified comments in total, including 10000 unlabeled follow-up comments. After the homestay dictionary is established, punctuation marks are replaced with line-feed characters using the word tagging function of the open-source LTP toolkit from Harbin Institute of Technology, and the topic sentences in the comments are decomposed to form a text. For example, "the owner is warm, the room is clean and tidy, the inn is in the scenic area" is decomposed into three topic evaluations: "the owner is warm", "the room is clean and tidy" and "the inn is in the scenic area";
Step two: topic clustering. Feature extraction and vectorization are performed on the topic evaluation text with TF-IDF, and visual topic clustering is performed on the homestay comments with pyLDAvis to obtain a visual clustering result. An initial number of topics k is selected according to the topic selection criterion of high intra-cluster similarity and low inter-cluster similarity to obtain an initial model, and the relevance between terms and topics t is calculated. The TF-IDF formula is shown as formula (1):
[Formula (1): image in the original publication, not reproduced in this text]
The distribution of a feature item across different categories and within one category, and the position of a feature word in the text, affect how well the word discriminates texts: an entry appearing at different positions of a text document contributes differently to discrimination. The weight of a feature word is therefore calculated with the TF-IDF method, and the improved IDF of a word w in class c_t is shown as formula (2):
[Formula (2): image in the original publication, not reproduced in this text]
In formulas (1) and (2), N is the total number of text documents and T is the total number of entries; the number of text documents containing entry t is x, the number of text documents in class c_t containing entry t is y, and the number of text documents outside c_t containing entry t is k;
The topic relevance is calculated as shown in formula (3):
relevance(term_w | topic_t) = λ * p(w|t) + (1 - λ) * p(w|t) / p(w)    (3)
In formula (3), the relevance of a word to a topic is adjusted by the parameter λ. If λ is close to 1, words w that occur more frequently under topic t are more relevant to topic t; if λ is close to 0, words w that are more specific and unique to topic t are more relevant to it. The relevance of the domain word term_w to topic_t is thus changed by adjusting the value of λ. The circles on the left of fig. 5 represent the different topics, and the distance between circles represents the similarity between topics. With reference to the homestay standard document, experiments start from K = 6 and increase K step by step to select the topic attribute words. With K = 8 topics the circles overlap little and are evenly distributed, which gives the best result. Taking the eighth topic as an example, its topic words include surrounding environment, elevator, bedding, garden, table and road; these words, together with words such as independent sanitation, relate most strongly to the environment, so with reference to the homestay standard document the topic is summarized as the environment topic. Examining the topic words of each topic in the same way yields 8 topics in total;
The homestay topic dictionary is constructed with the help of the homestay standard document and the visual clustering; the resulting homestay topics and topic attribute words are shown in Table 1:
TABLE 1 Topic attribute word set
[Table 1: image in the original publication, not reproduced in this text]
The corresponding evaluations are found in the sentence-split text by attribute word matching, and the number of evaluations for each topic is counted. In the homestay comments, customer attention decreases in the order facilities, services, environment, traffic, catering, features, price and experience, with comparatively few comments on price and experience; the detailed result is shown in fig. 6;
Step three: constructing a homestay topic dictionary from the homestay standard document and the visual clustering result;
Step four: finding the corresponding evaluations in the sentence-split topic evaluation text by attribute word matching, and then counting the number of evaluations for each topic;
Step five: weakly supervised pre-classification based on naive Bayes. The web crawler automatically labels the original comments that have no follow-up comment. With k the number of keywords in a comment and j the number of categories, and two emotion classes to evaluate, the posterior probability of a comment is calculated from its word-frequency vector; the pre-classification is considered successful when the output probability is greater than 0.5. The output probability is calculated as shown in formula (4):
[Formula (4): image in the original publication, not reproduced in this text]
To remove false comments and increase the accuracy of the sentiment analysis, the pre-classification serves as data cleaning. During pre-classification the labels 0 and 1 represent negative and positive respectively; texts with an output probability greater than 0.9 are taken as high-confidence positive texts, and texts with an output probability less than 0.1 are taken as high-confidence negative texts;
Step six: homestay comment sentiment analysis based on C-CNN-SA. The character-level text is taken as the original signal; the characters are de-duplicated and sorted in descending order of character frequency to establish a character table, each comment is vectorized by looking up the position ID of each character in the character table, and a convolutional neural network with one-dimensional convolution kernels is constructed for feature extraction to obtain the emotion polarity. The design borrows the pixel-level processing scheme used in image processing: the input layer (InputLayer) receives the character vectors of each evaluation sentence, and the output layer outputs the emotion polarity using softmax. The model structure is shown in fig. 3. Assuming the dictionary size is n, a character table is established and each comment is vectorized using the IDs of its characters; a convolutional neural network (Conv) layer is then introduced for processing. After the input layer (InputLayer), the Embedding_1 layer splices the character vectors of all characters of a sentence into a sentence matrix. Statistics show that a character matrix length of 200 covers 99% of the text lengths (input: 200). For variable-length text, "pre" (head) zero-padding is used, so that a text shorter than 200 is filled with zeros at the front. The character weights of the Embedding layer are set to be trained and updated. Feature extraction is then performed with Convolution1D (conv1D_1), and the softmax probability of the positive label is finally output as the emotion polarity. The parameters of the model are printed with the Keras neural network tool; the specific parameters are shown in fig. 4. Since the character vectors correspond to the individual characters of a sentence, no word segmentation is performed;
In fig. 6, the abscissa represents the emotional tendency of the evaluations of the corresponding topic. The emotion score of each comment falls within [0,1], and the step size of the abscissa is set to 0.01. Scores closer to 1 (right part) represent stronger positive emotion, and scores closer to 0 (left part) represent stronger negative emotion; when the output probability is in the middle, the emotion can be considered neutral, with an emotion value of 0. The ordinate represents the number of comments on the corresponding topic;
Step seven: using the emotion polarity obtained from the features extracted by the one-dimensional convolutional neural network, emotion visualization is performed, customer opinion tendencies across multiple topics are compared, and targeted improvements are made on that basis to raise the overall satisfaction with the homestay. Summarizing the results of the statistical graph of fig. 6 gives the distribution of emotional tendency under each topic, forming a customer comment emotion polarity graph for each topic, 8 topic emotion tendency graphs in total, as shown in fig. 7. The sentiment analysis of the homestay customers by topic leads to the following conclusions. Chongqing homestays are rated highly on the topics "service", "traffic", "experience", "environment" and "price", with the graphs clearly skewed to the right. This is inseparable from Chongqing's service industry and its geographical position as a "city of mountains and water". Transport in Chongqing is convenient, its tourism index has risen steadily in recent years, and it attracts a large number of visitors from elsewhere, for whom the homestay experience is novel. As for price, Chongqing lies in the southwest, where consumption is slightly lower than in the eastern regions, so prices are attractive to customers. On "catering", "features" and "facilities", however, the negative emotion of visitors is stronger. This is probably because visitors from other regions are not used to the local Chongqing diet. In addition, homestays are generally located close to scenic spots and are numerous, and for cost reasons they invest little in facilities, so customers have many complaints; facilities could later be upgraded through cooperation with the scenic spots. An emotion polarity graph can only represent customer opinion tendencies under a single topic. The emotion polarity graphs are therefore analyzed in bins: emotion visualization is performed with a step size of 0.2 and the customer opinion tendencies under multiple topics are compared, with the abscissa representing the evaluation topic and the ordinate representing the proportion of each emotion level, so that customer opinions under multiple topics can be compared at the same time. The picture shown in this graph is consistent with the single-topic emotion polarity graphs: customers have many complaints about homestay facilities and catering, and satisfaction with them is low. Targeted improvements can then be made accordingly to raise the overall satisfaction with the homestay, as shown in fig. 8;
Step eight: verifying the model. Ten-fold cross-validation is used: 10 experiments are performed under equivalent conditions, and the average test-set accuracy, average precision, average recall and average F value are used as evaluation indexes to verify the effectiveness of the model. The training set consists of 36000 texts filtered by the weak classifier and manually screened, and the test set consists of 12000 manually labeled comments. Comparison experiments are run with four algorithms, decision tree (DT), naive Bayes (NB), support vector machine (SVM) and RNN (LSTM), with and without weakly supervised pre-training. C-CNN-SA denotes the model of this text, CNN-W denotes the character-level CNN without pre-classification by the weak classifier, CNN denotes the standard word-level CNN, CNN-S denotes the word-level CNN after stop-word removal, and C-RNN denotes the character-level LSTM. The evaluation results are shown in Table 2:
TABLE 2 Model evaluation data
[Table 2: image in the original publication, not reproduced in this text]
As Table 2 shows, adding the pre-processing step improves test-set accuracy by 2%. For sentiment classification, the improved model of this method achieves somewhat higher classification accuracy than the traditional word-level models. In short-text sentiment classification, character-level granularity is more accurate than word-level granularity; the likely reason is that for short texts, stop-word filtering may lose text information and degrade classification performance. The character-level text is taken as the original input signal and features are extracted directly with a one-dimensional convolutional neural network, so for short texts the word-level meaning of the language does not need to be considered, which simplifies the sentiment analysis engineering.
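The evaluation protocol of step eight can be sketched in plain Python. The sketch splits the data into ten contiguous folds and averages accuracy, precision, recall and F value over the folds; it does not shuffle or stratify, which the text does not specify. The callback `train_and_predict` is a hypothetical stand-in for any of the compared models, and all function names are illustrative.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F value with label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_value = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f": f_value}

def cross_validate(samples, labels, train_and_predict, k=10):
    """Average the four metrics over k folds.

    train_and_predict(train_x, train_y, test_x) must return predicted labels
    for test_x; it is a placeholder for the model under evaluation.
    """
    totals = {"accuracy": 0.0, "precision": 0.0, "recall": 0.0, "f": 0.0}
    for train, test in k_fold_indices(len(samples), k):
        predictions = train_and_predict(
            [samples[i] for i in train],
            [labels[i] for i in train],
            [samples[i] for i in test],
        )
        fold = binary_metrics([labels[i] for i in test], predictions)
        for name in totals:
            totals[name] += fold[name]
    return {name: value / k for name, value in totals.items()}
```

For example, `binary_metrics([1, 1, 0, 0], [1, 0, 1, 0])` gives 0.5 for all four metrics, since one of the two positives is found and one of the two positive predictions is correct.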
The method can mine the emotions and user needs hidden in personalized comments from a large amount of noisy and false comment data, which supports the decision-making of enterprises, organizations and individual users. It can mine customer satisfaction under each topic from a data-driven perspective, and the results can provide suggestions to homestay managers and regulators. To address the scarcity of homestay corpora, the opinion mining algorithm is improved and a sentiment analysis algorithm with visual topic extraction and weakly supervised pre-training suited to homestay comments is proposed, so that implicit feature topics can be extracted from online homestay comments and their sentiment analyzed. Verifying the model confirms its effectiveness.
The foregoing illustrates and describes the principles, general features, and advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are given by way of illustration of the principles of the present invention, but that various changes and modifications may be made without departing from the spirit and scope of the invention, and such changes and modifications are within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (7)

1. The character-level convolutional neural network-based opinion mining method for the residential customers is characterized by comprising the following steps of:
the method comprises the following steps: collecting and preprocessing on-line civil-residential comments, constructing a web crawler, collecting all the civil-residential comments, establishing a civil-residential dictionary, replacing punctuation marks by line-feed marks by using a word tagging function of a Hayada open source LTP, and decomposing subject sentences in the comments to form a subject evaluation text;
step two: topic clustering: using the TF-IDF formula
[Formula (1): image in the original publication, not reproduced in this text]
after feature extraction and vectorization of the topic evaluation text, visual topic clustering is performed on the homestay comments using pyLDAvis to obtain a visual clustering result; an initial number of topics k is selected according to the topic selection criteria of high intra-cluster similarity and low inter-cluster similarity to obtain an initial model, and the relevance between topics t is calculated;
in the formula, N is the total number of text documents and T is the total number of terms, where x is the number of text documents containing the term T;
step three: constructing a homestay topic dictionary using the homestay standard document and the visual clustering result;
step four: finding the corresponding evaluations in the sentence-split topic evaluation text by attribute-word matching, and then counting the number of evaluations for each topic;
step five: weakly supervised pre-classification based on naive Bayes: the web crawler automatically labels part of the original comments without additional evaluation; assuming k is the number of keywords in a comment and j is the number of categories, with two sentiment categories, the posterior probability of an evaluation is calculated from the word-frequency vectorization of the text, and the pre-classification is judged successful if the output probability is greater than 0.5;
step six: sentiment analysis of homestay comments based on C-CNN-SA: character-level unstructured comments are taken as the raw signal; the characters are deduplicated and sorted in descending order of character frequency to build a character table; the comments are vectorized by looking up each character's position ID in the character table; a convolutional neural network with one-dimensional convolution kernels is constructed for feature extraction, followed by sampling through a global max pooling layer and two fully connected layers; the sentiment polarity is output through a softmax function, and the model parameters are printed using the Keras neural network tool;
step seven: performing sentiment visualization on the sentiment polarity obtained from the features extracted by the one-dimensional-kernel convolutional neural network, comparing customer opinion tendencies across multiple topics, and making targeted improvements based on the comparison so as to improve overall guest satisfaction;
step eight: verifying the model: using ten-fold cross-validation, 10 experiments are performed under identical conditions, and the effectiveness of the model is verified using the average test-set accuracy, average precision, average recall, and average F value as evaluation indexes.
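As an illustration of the verification in step eight, the following Python sketch runs a ten-fold cross-validation and averages accuracy, precision, recall, and F value over the ten experiments. The function names and the `train_and_predict` callback are hypothetical and stand in for whatever classifier is being evaluated; the shuffling seed and the way the folds are split are assumptions that the claim does not specify. The sketch also assumes at least k samples.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels 0 (negative) / 1 (positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}


def cross_validate(texts, labels, train_and_predict, k=10, seed=0):
    """Run k experiments, each holding out one fold, and average the metrics."""
    folds = k_fold_indices(len(texts), k, seed)
    results = []
    for held_out in folds:
        test_set = set(held_out)
        train_idx = [i for i in range(len(texts)) if i not in test_set]
        # train_and_predict is a hypothetical callback: it fits on the
        # training split and returns predicted labels for the test texts.
        preds = train_and_predict(
            [texts[i] for i in train_idx], [labels[i] for i in train_idx],
            [texts[i] for i in held_out])
        results.append(binary_metrics([labels[i] for i in held_out], preds))
    return {name: mean(r[name] for r in results) for name in results[0]}
```

Each fold serves as the test set exactly once, so the averaged values correspond to the average test-set accuracy, precision, recall, and F value named in the claim.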
2. The homestay customer opinion mining method based on a character-level convolutional neural network according to claim 1, wherein:
the distribution of feature terms across categories and the positional factors of feature words contribute differently to distinguishing texts; when a term appears at different positions in a text document, its contribution to discrimination differs; the weight of a feature word is calculated using the TF-IDF method, and the improved IDF formula for a word w in category ct is shown as formula (2):
[Formula (2): image in the original publication, not reproduced in this text]
in formula (2), N is the total number of text documents, T is the total number of terms, x is the number of text documents containing the term T, y is the number of text documents containing the term in ct, and k is the number of text documents containing the term T outside ct.
3. The homestay customer opinion mining method based on a character-level convolutional neural network according to claim 1, wherein: the topic relevance calculation in step two is shown as formula (3):
relevance(term_w|topic_t)=λ*p(w|t)+(1-λ)*p(w|t)/p(w) (3)
in formula (3), the relevance of a word to a topic is adjusted by the parameter λ: if λ is close to 1, words w that appear more frequently under topic t are more relevant to topic t; if λ is close to 0, words w that are more specific and exclusive to topic t are more relevant to topic t; the relevance of the domain word term_w to topic_t is changed by adjusting the value of λ.
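The effect of λ in formula (3) can be seen in a small Python sketch that implements the formula exactly as written in the claim and ranks the terms of one topic. The function names and the probability values are hypothetical and chosen only for illustration.

```python
def relevance(p_w_given_t, p_w, lam):
    """Formula (3): lam * p(w|t) + (1 - lam) * p(w|t) / p(w)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if p_w <= 0.0:
        raise ValueError("p_w must be positive")
    return lam * p_w_given_t + (1.0 - lam) * p_w_given_t / p_w


def rank_terms(topic_probs, corpus_probs, lam, top_n=3):
    """Return the top_n terms of one topic sorted by decreasing relevance."""
    scored = {w: relevance(p, corpus_probs[w], lam) for w, p in topic_probs.items()}
    return sorted(scored, key=scored.get, reverse=True)[:top_n]


# Hypothetical probabilities for one topic, for illustration only.
topic_probs = {"room": 0.30, "clean": 0.20, "tatami": 0.05}   # p(w|t)
corpus_probs = {"room": 0.25, "clean": 0.10, "tatami": 0.006}  # p(w)

print(rank_terms(topic_probs, corpus_probs, lam=1.0))  # ranks by p(w|t) alone
print(rank_terms(topic_probs, corpus_probs, lam=0.0))  # ranks by p(w|t) / p(w)
```

With λ = 1 the frequent word "room" ranks first, while with λ = 0 the rare but topic-specific word "tatami" ranks first, which matches the behavior described for the two extremes of λ.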
4. The homestay customer opinion mining method based on a character-level convolutional neural network according to claim 1, wherein: in step two, the value of k is chosen with reference to the homestay standard document; experiments then start from k = 6 as a baseline and increase k sequentially, reducing the overlap among topics, and the smallest k at which topics are not covered by one another is taken as the number of topics, after which the topic attribute words are selected.
5. The homestay customer opinion mining method based on a character-level convolutional neural network according to claim 1, wherein: the output probability in step five is calculated as shown in formula (4):
[Formula (4): image in the original publication, not reproduced in this text]
in order to remove false comments and increase the accuracy of sentiment analysis, pre-classification is used as data cleaning; during pre-classification, labels 0 and 1 represent negative and positive respectively; texts with an output probability greater than 0.9 are taken as positive texts with higher confidence, and texts with an output probability less than 0.1 are taken as negative texts with higher confidence.
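The pre-classification used as data cleaning can be sketched in Python as follows. Since formula (4) is not reproduced in this text, the sketch uses a standard multinomial naive Bayes posterior with Laplace smoothing as a stand-in, which is an assumption and not necessarily the formula of the claim. The function names are hypothetical; only the labels 0 and 1 and the thresholds 0.9 and 0.1 come from the claim. Both classes are assumed to occur in the training data.

```python
import math
from collections import Counter


def train_naive_bayes(docs, labels):
    """Fit per-class token counts; docs are token lists, labels are 0 or 1."""
    counts = {0: Counter(), 1: Counter()}
    class_docs = Counter(labels)
    for tokens, label in zip(docs, labels):
        counts[label].update(tokens)
    vocab = set(counts[0]) | set(counts[1])
    return counts, class_docs, vocab


def positive_probability(tokens, model):
    """Posterior P(label = 1 | tokens) with Laplace smoothing (an assumption)."""
    counts, class_docs, vocab = model
    total_docs = sum(class_docs.values())
    log_joint = {}
    for label in (0, 1):
        total_tokens = sum(counts[label].values())
        score = math.log(class_docs[label] / total_docs)
        for tok in tokens:
            if tok in vocab:  # ignore tokens never seen in training
                score += math.log((counts[label][tok] + 1) / (total_tokens + len(vocab)))
        log_joint[label] = score
    # Normalise the two joint scores into a posterior probability.
    peak = max(log_joint.values())
    exp = {label: math.exp(s - peak) for label, s in log_joint.items()}
    return exp[1] / (exp[0] + exp[1])


def high_confidence_label(prob, upper=0.9, lower=0.1):
    """Return 1 or 0 for confident texts, None for texts to be discarded."""
    if prob > upper:
        return 1
    if prob < lower:
        return 0
    return None
```

Texts whose probability falls between the two thresholds return `None` and would be dropped from the training data, so that only the higher-confidence positive and negative texts are passed on.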
6. The homestay customer opinion mining method based on a character-level convolutional neural network according to claim 1, wherein: in step six, first, with reference to pixel-level processing schemes in image processing, and assuming the dictionary size is n, the comments are vectorized using character IDs by building a character table; a convolutional neural network layer is then introduced for processing: at the input layer, an Embedding layer assembles the character vectors of all characters in a sentence into a sentence matrix; the pad length is 200, which covers 99% of text lengths, and pre-padding with 0 is used, so that when a text is too short, zeros are filled in at the front; the character weights of the Embedding layer are set to be updated during training; a one-dimensional convolution kernel Convolution1D is then used for feature extraction, followed by sampling through a global max pooling layer and two fully connected layers; finally, the softmax probability of the positive label is output as the sentiment polarity, and the model parameters are printed using the Keras neural network tool.
7. The homestay customer opinion mining method based on a character-level convolutional neural network according to claim 6, wherein: in step six, the character vectors of all characters of a sentence are built from single characters, and no word segmentation is performed.
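The character-level vectorization described in claims 6 and 7 can be sketched in Python without any word segmentation. The function names are hypothetical. Reserving ID 0 for padding, starting character IDs at 1, breaking frequency ties by code point, and keeping the tail of over-long texts are assumptions; the claims only specify the descending-frequency character table, the pad length of 200, and zero-filling at the front.

```python
from collections import Counter


def build_char_table(comments):
    """Map each distinct character to an ID, most frequent first; 0 is padding."""
    freq = Counter(ch for text in comments for ch in text)
    # Sort by descending frequency; ties are broken by code point (an assumption).
    ordered = sorted(freq, key=lambda ch: (-freq[ch], ch))
    return {ch: idx for idx, ch in enumerate(ordered, start=1)}


def vectorize(text, char_table, max_len=200):
    """Turn a comment into a fixed-length ID sequence with zeros padded in front."""
    # Each character is looked up individually, so no word segmentation is needed.
    ids = [char_table[ch] for ch in text if ch in char_table]
    ids = ids[-max_len:]  # keep the tail if too long (truncation side is an assumption)
    return [0] * (max_len - len(ids)) + ids


# Hypothetical comments, for illustration only.
comments = ["房间很干净", "房间很吵"]
table = build_char_table(comments)
vector = vectorize("很干净", table)
print(len(vector), vector[-3:])
```

The resulting fixed-length ID sequences are the kind of input an Embedding layer expects, with the leading zeros marking the padded positions.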
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910117188.0A CN109829166B (en) 2019-02-15 2019-02-15 People and host customer opinion mining method based on character-level convolutional neural network

Publications (2)

Publication Number Publication Date
CN109829166A (en) 2019-05-31
CN109829166B (en) 2022-12-27

Family

ID=66862072

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
CN201910117188.0A Active CN109829166B (en) 2019-02-15 2019-02-15 People and host customer opinion mining method based on character-level convolutional neural network

Country Status (1)

Country Link
CN (1) CN109829166B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110347828B (en) * 2019-06-26 2022-03-15 西南交通大学 Subway passenger demand dynamic acquisition method and acquisition system thereof
CN110688451A (en) * 2019-08-15 2020-01-14 中国平安人寿保险股份有限公司 Evaluation information processing method, evaluation information processing device, computer device, and storage medium
CN110838287B (en) * 2019-10-16 2022-04-19 中国第一汽车股份有限公司 Corpus processing method and device of chat robot in vehicle-mounted environment and storage medium
CN111027553A (en) * 2019-12-23 2020-04-17 武汉唯理科技有限公司 Character recognition method for circular seal
CN111159409B (en) * 2019-12-31 2023-06-02 腾讯科技(深圳)有限公司 Text classification method, device, equipment and medium based on artificial intelligence
CN111309859B (en) * 2020-01-21 2023-07-07 上饶市中科院云计算中心大数据研究院 Scenic spot network public praise emotion analysis method and device
CN111445271A (en) * 2020-03-31 2020-07-24 携程计算机技术(上海)有限公司 Model generation method, and prediction method, system, device and medium for cheating hotel
CN112070856B (en) * 2020-09-16 2022-08-26 重庆师范大学 Limited angle C-arm CT image reconstruction method based on non-subsampled contourlet transform
CN112784776B (en) * 2021-01-26 2022-07-08 山西三友和智慧信息技术股份有限公司 BPD facial emotion recognition method based on improved residual error network
CN113778454B (en) * 2021-09-22 2024-02-20 重庆海云捷迅科技有限公司 Automatic evaluation method and system for artificial intelligent experiment platform
CN116385029B (en) * 2023-04-20 2024-01-30 深圳市天下房仓科技有限公司 Hotel bill detection method, system, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107038480A (en) * 2017-05-12 2017-08-11 东华大学 A kind of text sentiment classification method based on convolutional neural networks
CN107391483A (en) * 2017-07-13 2017-11-24 武汉大学 A kind of comment on commodity data sensibility classification method based on convolutional neural networks
CN108345587A (en) * 2018-02-14 2018-07-31 广州大学 A kind of the authenticity detection method and system of comment
CN109033089A (en) * 2018-09-06 2018-12-18 北京京东尚科信息技术有限公司 Sentiment analysis method and apparatus
CN109308317A (en) * 2018-09-07 2019-02-05 浪潮软件股份有限公司 A kind of hot spot word extracting method of the non-structured text based on cluster

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
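Sentiment Analysis on a Set of Movie Reviews Using Deep Learning Techniques; Koyel Chakraborty et al.; Social Network Analytics: Computational Research Methods and Techniques; 2018-12-31; full text *
User behavior inference based on naive Bayes web page classification; 秦鹏 et al.; 《沈阳工业大学学报》; 2018-01-31 (No. 01); full text *
Sentiment analysis of Chinese movie reviews based on deep learning; 周敬一 et al.; 《上海大学学报(自然科学版)》; 2018-10-31 (No. 05); full text *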

Also Published As

Publication number Publication date
CN109829166A (en) 2019-05-31

Similar Documents

Publication Publication Date Title
CN109829166B (en) People and host customer opinion mining method based on character-level convolutional neural network
Zheng et al. Identifying unreliable online hospitality reviews with biased user-given ratings: A deep learning forecasting approach
CN106649760A (en) Question type search work searching method and question type search work searching device based on deep questions and answers
CN106598944A (en) Civil aviation security public opinion emotion analysis method
Chang et al. Research on detection methods based on Doc2vec abnormal comments
CN110390018A (en) A kind of social networks comment generation method based on LSTM
CN103793503A (en) Opinion mining and classification method based on web texts
CN103177024A (en) Method and device of topic information show
CN112905739B (en) False comment detection model training method, detection method and electronic equipment
KR20120109943A (en) Emotion classification method for analysis of emotion immanent in sentence
CN110472203B (en) Article duplicate checking and detecting method, device, equipment and storage medium
CN108614855A (en) A kind of rumour recognition methods
CN105740382A (en) Aspect classification method for short comment texts
Lalata et al. A sentiment analysis model for faculty comment evaluation using ensemble machine learning algorithms
CN115329085A (en) Social robot classification method and system
Mozafari et al. Emotion detection by using similarity techniques
Martin et al. Are influential writers more objective? An analysis of emotionality in review comments
Abid et al. Semi-automatic classification and duplicate detection from human loss news corpus
CN110781300B (en) Tourism resource culture characteristic scoring algorithm based on Baidu encyclopedia knowledge graph
Tang et al. Evaluation of Chinese sentiment analysis APIs based on online reviews
Asha et al. Fake news detection using n-gram analysis and machine learning algorithms
Sai Ensemble machine learning models in predicting personality traits and insights using Myers-Briggs dataset
CN113934835A (en) Retrieval type reply dialogue method and system combining keywords and semantic understanding representation
CN113220964A (en) Opinion mining method based on short text in network communication field
Nguyen et al. Analyzing customer experience in hotel services using topic modeling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant