CN110705286A - Comment information-based data processing method and device - Google Patents

Comment information-based data processing method and device

Info

Publication number
CN110705286A
Authority
CN
China
Prior art keywords
word
words
constructed
emotion
scoring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910906324.4A
Other languages
Chinese (zh)
Inventor
蒋剑波
刘志锋
肖桂林
陈晓祥
翟红亮
李炜杰
吴思思
周苗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aoki Digital Technology Ltd By Share Ltd
Original Assignee
Aoki Digital Technology Ltd By Share Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aoki Digital Technology Ltd By Share Ltd
Priority to CN201910906324.4A
Publication of CN110705286A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]

Landscapes

  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a data processing method based on comment information, comprising the following steps: acquiring evaluation text data of a user through a development platform interface; performing word segmentation on the evaluation text data using a pre-constructed word segmentation word bank to obtain a word list; performing synonym conversion on the word list using a pre-constructed dimension word library; performing emotion word recognition on the synonym-converted word list using one or more pre-constructed emotion word banks, and scoring the emotion words according to preset weights for the different emotion words; and compiling statistics on the emotion word scores and collating them to obtain a comprehensive satisfaction score for the user's evaluation. By identifying and scoring the user's evaluation information along multiple dimensions using several word banks, the invention mines the relevant evaluation content from the evaluation information, so that a merchant can adjust its operations according to that content and improve service quality.

Description

Comment information-based data processing method and device
Technical Field
The invention relates to the technical field of data processing, in particular to a comment information-based data processing method and device.
Background
On existing large e-commerce platforms, a merchant can use the system's automatic evaluation reply function: after a customer evaluates a purchased commodity, the system replies automatically using reply text preset by the merchant. However, existing automatic reply systems rely only on a few reply sentences provided by the merchant and lack in-depth mining and analysis of the data. From the comments handled by such a system, the merchant cannot learn what the customer said about the service, and therefore cannot adjust its operations in response to the customer's evaluation to improve service quality.
Disclosure of Invention
The invention provides a data processing method and device based on comment information, which identify and score the evaluation information of a user along multiple dimensions using several word banks. This solves the technical problem in the prior art that a merchant cannot learn, from the comments it receives, what the customer said about the service. The relevant evaluation content is mined from the evaluation information, so that the merchant can adjust its operations according to that content and improve service quality.
In order to solve the above technical problem, an embodiment of the present invention provides a data processing method based on comment information, including:
acquiring evaluation text data of a user through a development platform interface;
performing word segmentation processing on the evaluation text data through a pre-constructed word segmentation word bank to obtain a word list;
carrying out synonym conversion on the word list through a pre-constructed dimension word library;
carrying out sentiment word recognition on the word list subjected to synonym conversion through one or more pre-constructed sentiment word banks, calculating according to preset weights of different sentiment words, and scoring the sentiment words;
and compiling statistics on the scores of the emotion words and collating them to obtain a comprehensive satisfaction score for the user's evaluation.
As a preferred scheme, the construction process of the word segmentation word bank comprises the following steps:
performing word graph scanning based on a Trie structure to generate a directed acyclic graph (DAG) of all possible words that the Chinese characters in a sentence can form; using dynamic programming to search for the maximum-probability path, and finding from the DAG the most probable segmentation based on word frequency;
handling unknown (out-of-vocabulary) words with an HMM model based on the word-forming capability of Chinese characters, decoded with the Viterbi algorithm; the word segmentation word bank also includes fixed word combinations that cannot be recognized because the sample size is small.
Preferably, after obtaining the word list, the method further includes:
and matching the word list against a pre-constructed stop word library to remove words in the comment text that are meaningless for semantic recognition, thereby improving the efficiency and recognition accuracy of the model.
As a preferred scheme, performing emotion word recognition on the synonym-converted word list using one or more pre-constructed emotion word banks, and scoring the emotion words according to preset weights for the different emotion words, comprises:
scoring the synonym-converted word list according to the emotional intensity of the emotion words, using a pre-constructed emotion word library;
archiving and scoring words of the same dimension in the synonym-converted word list, using a pre-constructed dimension emotion word library;
and identifying independent emotion words in the synonym-converted word list and scoring the comment text, using a pre-constructed independent emotion word library.
An embodiment of the present invention further provides a data processing apparatus based on comment information, including:
the data acquisition module is used for acquiring evaluation text data of a user through a development platform interface;
the text word segmentation module is used for performing word segmentation processing on the evaluation text data through a pre-constructed word segmentation word bank to obtain a word list;
the word conversion module is used for carrying out synonym conversion on the word list through a pre-constructed dimension word bank;
the recognition scoring module is used for recognizing the emotion words in the word list subjected to synonym conversion through one or more emotion word banks which are constructed in advance, calculating according to preset weights of different emotion words and scoring the emotion words;
and the statistics output module is used for compiling statistics on the scores of the emotion words and collating them to obtain a comprehensive satisfaction score for the user's evaluation.
As a preferred scheme, the construction process of the word segmentation word bank comprises the following steps:
performing word graph scanning based on a Trie structure to generate a directed acyclic graph (DAG) of all possible words that the Chinese characters in a sentence can form; using dynamic programming to search for the maximum-probability path, and finding from the DAG the most probable segmentation based on word frequency;
handling unknown (out-of-vocabulary) words with an HMM model based on the word-forming capability of Chinese characters, decoded with the Viterbi algorithm; the word segmentation word bank also includes fixed word combinations that cannot be recognized because the sample size is small.
Preferably, the comment information-based data processing apparatus further includes a stop word removing module, which is used, after the word list is obtained, for matching the word list against a pre-constructed stop word library to remove words in the comment text that are meaningless for semantic recognition, thereby improving the efficiency and recognition accuracy of the model.
Preferably, the identification scoring module includes: the emotion word unit, the dimension word unit and the independent word unit;
the emotion word unit is used for scoring the word list subjected to synonym conversion according to the emotion intensity of the emotion words through a pre-constructed emotion word library;
the dimension word unit is used for archiving and scoring words of the same dimension in the synonym-converted word list, using a pre-constructed dimension emotion word library;
and the independent word unit is used for identifying independent emotion words in the synonym-converted word list and scoring the comment text, using a pre-constructed independent emotion word library.
An embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program; wherein the computer program, when running, controls an apparatus in which the computer-readable storage medium is located to execute the comment information-based data processing method according to any one of the above items.
An embodiment of the present invention further provides a terminal device, which includes a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, where the processor, when executing the computer program, implements the comment information-based data processing method according to any one of the above items.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
The invention identifies and scores the evaluation information of the user along multiple dimensions using several word banks. This solves the technical problem in the prior art that a merchant cannot learn, from the comments it receives, what the customer said about the service. The relevant evaluation content is mined from the evaluation information, so that the merchant can adjust its operations according to that content and improve service quality.
Drawings
FIG. 1 is a flow chart of the steps of the comment information-based data processing method in an embodiment of the invention;
FIG. 2 is a flow chart of the steps of recognition and scoring using the emotion word library in an embodiment of the invention;
FIG. 3 is a flow chart of the steps of recognition and scoring using the independent emotion word library in an embodiment of the invention;
FIG. 4 is a schematic structural diagram of the comment information-based data processing device in an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, a preferred embodiment of the present invention provides a comment information-based data processing method, including:
S1, acquiring the evaluation text data of the user through the development platform interface;
The data is acquired by pulling it from the development platform interface.
S2, performing word segmentation processing on the evaluation text data through a pre-constructed word segmentation word bank to obtain a word list;
A word segmentation word bank is constructed and the comments are segmented based on it, that is, a complete comment text is split into a word list that can be processed. The word bank serves to segment certain specific words correctly during word segmentation, which avoids semantic misinterpretation caused by incorrect splitting that would otherwise reduce the accuracy of the output.
In this embodiment, the process of constructing the word segmentation word bank includes: performing word graph scanning based on a Trie structure to generate a directed acyclic graph (DAG) of all possible words that the Chinese characters in a sentence can form; using dynamic programming to search for the maximum-probability path, and finding from the DAG the most probable segmentation based on word frequency; handling unknown (out-of-vocabulary) words with an HMM model based on the word-forming capability of Chinese characters, decoded with the Viterbi algorithm; the word segmentation word bank also includes fixed word combinations that cannot be recognized because the sample size is small.
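The DAG construction and maximum-probability path search described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the word-frequency table `WORD_FREQ` and the sample sentence are hypothetical, and a plain dictionary lookup stands in for the Trie structure.

```python
import math

# Hypothetical toy word-frequency dictionary (not taken from the patent).
WORD_FREQ = {"质": 5, "量": 5, "质量": 50, "很": 30, "好": 40, "很好": 20}
TOTAL = sum(WORD_FREQ.values())


def build_dag(sentence):
    """Map each start index to the end indices of dictionary words starting there."""
    dag = {}
    for i in range(len(sentence)):
        ends = [j for j in range(i, len(sentence)) if sentence[i:j + 1] in WORD_FREQ]
        dag[i] = ends or [i]  # fall back to the single character
    return dag


def segment(sentence):
    """Choose the maximum log-probability path through the DAG, right to left."""
    n = len(sentence)
    dag = build_dag(sentence)
    route = {n: (0.0, 0)}  # index -> (best log-probability, end index of first word)
    for i in range(n - 1, -1, -1):
        route[i] = max(
            (math.log(WORD_FREQ.get(sentence[i:j + 1], 1) / TOTAL) + route[j + 1][0], j)
            for j in dag[i]
        )
    words, i = [], 0
    while i < n:
        j = route[i][1]
        words.append(sentence[i:j + 1])
        i = j + 1
    return words


words = segment("质量很好")
```

With this toy dictionary the two-word path scores higher than any path through single characters, because the frequencies of the longer words dominate.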
S3, carrying out synonym conversion on the word list through a pre-constructed dimension word library;
The category of the merchant is identified, the dimension word bank for that category is called, and synonym conversion is then performed on the comments based on the dimension word bank. The dimension word bank is a word library that converts synonymous words expressing a commodity dimension into a single canonical word; for example, "texture", "material" and "quality" are all converted into "quality".
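The category lookup and synonym conversion can be illustrated with a small Python sketch. The category name `"clothing"`, the library contents beyond the "texture"/"material" example, and the function name are assumptions made for illustration only.

```python
# Hypothetical dimension word libraries, keyed by merchant category.
DIMENSION_LIBRARIES = {
    "clothing": {
        "texture": "quality",
        "material": "quality",
        "courier": "logistics",
        "delivery": "logistics",
    },
}


def convert_synonyms(words, category):
    """Replace each word by its canonical dimension word, if the library has one."""
    library = DIMENSION_LIBRARIES.get(category, {})
    return [library.get(word, word) for word in words]


converted = convert_synonyms(["material", "very", "good"], "clothing")
```

Words that are not in the selected library, or comments from a category without a library, pass through unchanged.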
S4, carrying out sentiment word recognition on the word list subjected to synonym conversion through one or more pre-constructed sentiment word banks, calculating according to preset weights of different sentiment words, and scoring the sentiment words;
Emotion scoring is performed on the evaluation using word banks covering several kinds of emotion words, such as an emotion word library, an independent emotion word library and a dimension emotion word library.
S5, compiling statistics on the scores of the emotion words and collating them to obtain a comprehensive satisfaction score for the user's evaluation.
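The text leaves the exact weighting and aggregation rule of steps S4 and S5 open. The Python sketch below assumes one simple possibility, a weighted mean over per-dimension scores; the function name, the weights and the sample scores are hypothetical and not taken from the patent.

```python
def comprehensive_score(dimension_scores, weights=None):
    """Weighted mean of per-dimension scores (an assumed aggregation rule)."""
    weights = weights or {}
    total_weight = sum(weights.get(dim, 1.0) for dim in dimension_scores)
    if total_weight == 0:
        return None  # nothing was scored in this comment
    weighted_sum = sum(
        score * weights.get(dim, 1.0) for dim, score in dimension_scores.items()
    )
    return weighted_sum / total_weight


# Hypothetical scores for one comment.
scores = {"quality": 4, "logistics": 2}
overall = comprehensive_score(scores, weights={"quality": 2.0, "logistics": 1.0})
```

Dimensions without an explicit weight default to a weight of 1.0, so the function reduces to a plain average when no weights are supplied.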
In another embodiment, after obtaining the word list, the method further includes:
and matching the word list against a pre-constructed stop word library to remove words in the comment text that are meaningless for semantic recognition, thereby improving the efficiency and recognition accuracy of the model.
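Stop word removal amounts to a set-membership filter over the word list. A minimal Python sketch, assuming a hypothetical stop word library (a real one would be curated for the platform's comment text):

```python
# Hypothetical stop word library.
STOP_WORDS = {"the", "a", "is", "of", "and", ","}


def remove_stop_words(words, stop_words=STOP_WORDS):
    """Drop words that carry no meaning for semantic recognition."""
    return [word for word in words if word not in stop_words]


filtered = remove_stop_words(["the", "quality", "is", "good"])
```

Storing the library as a set keeps each lookup constant-time, which matters when large volumes of comments are processed.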
In another embodiment, performing emotion word recognition on the synonym-converted word list using one or more pre-constructed emotion word libraries, and scoring the emotion words according to preset weights for the different emotion words, includes:
S41, scoring the synonym-converted word list according to the emotional intensity of the emotion words, using a pre-constructed emotion word library;
The scoring criteria are: words expressing the customer's emotion are ranked by strength of tone and divided into several grades, where a high score represents high satisfaction and a low score represents low satisfaction.
S42, archiving and scoring words of the same dimension in the synonym-converted word list, using a pre-constructed dimension emotion word library; for example, "good look" represents "good look" and "cheap" represents "cheap price", and such words are archived, labeled and scored.
S43, identifying independent emotion words in the synonym-converted word list and scoring the comment text, using a pre-constructed independent emotion word library.
The present invention will be described in detail with reference to specific examples.
As shown in fig. 1, for a comment processing model, the processing flow of comment data according to the present invention is described as follows:
First, a word segmentation word bank is constructed, and the comments are segmented based on it, that is, a complete comment text is split into a word list that can be processed.
The word segmentation word bank is constructed as follows: efficient word graph scanning is implemented based on a Trie structure, generating a directed acyclic graph (DAG) of all possible words that the Chinese characters in a sentence can form; dynamic programming is used to search for the maximum-probability path and find the most probable segmentation based on word frequency; for unknown (out-of-vocabulary) words, an HMM model based on the word-forming capability of Chinese characters is used together with the Viterbi algorithm; finally, some fixed word combinations that cannot be recognized because the sample size is small are defined manually.
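The HMM step for unknown words can be sketched as Viterbi decoding over the four position tags B (begin), M (middle), E (end) and S (single-character word). The Python sketch below is an illustration under stated assumptions: every probability in `START`, `TRANS` and `EMIT` is a hypothetical toy value, whereas a real model would estimate them from a segmented corpus.

```python
import math

STATES = "BMES"   # Begin, Middle, End of a word, or Single-character word
NEG_INF = -1e9    # stands in for log(0)
UNSEEN = math.log(1e-3)  # assumed emission score for unseen characters

# Hypothetical toy parameters (log-probabilities).
START = {"B": math.log(0.6), "M": NEG_INF, "E": NEG_INF, "S": math.log(0.4)}
TRANS = {
    "B": {"M": math.log(0.3), "E": math.log(0.7)},
    "M": {"M": math.log(0.4), "E": math.log(0.6)},
    "E": {"B": math.log(0.5), "S": math.log(0.5)},
    "S": {"B": math.log(0.5), "S": math.log(0.5)},
}
EMIT = {
    "B": {"给": math.log(0.8)},
    "M": {},
    "E": {"力": math.log(0.8)},
    "S": {"很": math.log(0.7)},
}


def viterbi(text):
    """Return the most probable B/M/E/S tag sequence for a non-empty string."""
    score = [{s: START[s] + EMIT[s].get(text[0], UNSEEN) for s in STATES}]
    back = [{}]
    for t in range(1, len(text)):
        score.append({})
        back.append({})
        for s in STATES:
            prev = max(STATES, key=lambda p: score[t - 1][p] + TRANS[p].get(s, NEG_INF))
            score[t][s] = (score[t - 1][prev] + TRANS[prev].get(s, NEG_INF)
                           + EMIT[s].get(text[t], UNSEEN))
            back[t][s] = prev
    last = max("ES", key=lambda s: score[-1][s])  # a sentence ends on E or S
    tags = [last]
    for t in range(len(text) - 1, 0, -1):
        tags.append(back[t][tags[-1]])
    return tags[::-1]


def cut(text):
    """Split text into words at every E or S tag."""
    if not text:
        return []
    words, start = [], 0
    for i, tag in enumerate(viterbi(text)):
        if tag in "ES":
            words.append(text[start:i + 1])
            start = i + 1
    return words


words = cut("很给力")
```

Restricting the final tag to E or S ensures the decoded sequence never leaves a word open at the end of the sentence.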
The word segmentation word bank serves to segment specific words correctly during word segmentation (for example, internet slang such as "give strength", misspelled words such as "high fashion", brand names such as "Korean Du Shu house", idioms such as "fast thunder not well enough for masking ears", and words that contain an emotion word such as "out of business"), which avoids semantic misinterpretation caused by incorrect splitting that would otherwise reduce the accuracy of the output.
A stop word library is then constructed, and words in the comment text that are meaningless for semantic recognition are matched and removed, improving the efficiency and recognition accuracy of the model.
The category of the merchant is then identified, the dimension word bank for that category is called, and synonym conversion is performed on the comments based on the dimension word bank.
The dimension word bank is a word library that converts synonymous words expressing a commodity dimension into a single canonical word; for example, "texture", "material" and "quality" are all converted into "quality". (The dimension words of a commodity are divided into six major categories: "quality", "price", "service", "logistics", "style" and "other". The "other" dimension is defined per industry; for example, customers in the shoe industry care about "dimension", while customers in the appliance industry care about "function".)
After synonym conversion, emotion scoring is performed on the evaluation using word banks covering several kinds of emotion words, such as an emotion word library, an independent emotion word library and a dimension emotion word library.
An emotion word library is constructed and the emotion words in the comments are scored. The scoring criteria are as follows: words expressing the customer's emotion are ranked by strength of tone and divided into five grades, where a high score represents high satisfaction (maximum 5 points) and a low score represents low satisfaction (minimum 1 point); for example, "very satisfactory" scores 5 points, "satisfactory" 4 points, "general" 3 points, "unsatisfactory" 2 points and "very unsatisfactory" 1 point.
Opinion information is extracted from the comments according to the "noun + adjective" semantic structure (NA structure). Extensive observation and practice show that, in e-commerce comments, customers mostly express their opinion of a commodity with an NA structure, such as "good quality" or "fast logistics". Based on this regularity, the model determines the semantic structure from the segmented words of the comment text and their parts of speech, and then labels the comment by dimension using the dimension classification and the emotion word scores. For example, the comment "the quality is good, the express delivery is slow" is labeled "logistics 2 points, quality 4 points". The specific flow is shown in FIG. 2.
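The NA-structure labeling can be sketched in Python as a scan over adjacent (word, part-of-speech) pairs. This is a simplified illustration, not the flow of FIG. 2: the part-of-speech tags `"n"` and `"a"`, the `EMOTION_SCORES` table and the function name are assumptions, and the input is taken to be already segmented, tagged and synonym-converted.

```python
DIMENSIONS = {"quality", "price", "service", "logistics", "style"}
# Hypothetical emotion word scores on the 1-5 scale.
EMOTION_SCORES = {"good": 4, "fast": 4, "slow": 2}


def label_dimensions(tagged_words):
    """Label each dimension noun that is directly followed by a scored adjective."""
    labels = {}
    for (word, pos), (next_word, next_pos) in zip(tagged_words, tagged_words[1:]):
        is_dimension_noun = pos == "n" and word in DIMENSIONS
        is_scored_adjective = next_pos == "a" and next_word in EMOTION_SCORES
        if is_dimension_noun and is_scored_adjective:
            labels[word] = EMOTION_SCORES[next_word]
    return labels


# "the quality is good, the express delivery is slow" after stop word removal
# and synonym conversion (assumed preprocessing).
comment = [("quality", "n"), ("good", "a"), ("logistics", "n"), ("slow", "a")]
labels = label_dimensions(comment)
```

A production version would also need to handle degree words and negation between the noun and the adjective, which this sketch deliberately leaves out.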
A dimension emotion word library is constructed, in which, for example, "good look" represents "good look" and "cheap" represents "cheap price"; such words are archived, labeled and scored.
An independent emotion word library is constructed; the independent emotion words are identified and the comment text is labeled. The specific flow is shown in FIG. 3.
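How the three libraries might cooperate on one word list can be sketched in Python as follows. The library contents, the score given to "cheap", the treatment of independent emotion words as comment-level scores, and the lookup order are all assumptions for illustration; only the five-grade scale and the "cheap" to "price" mapping come from the text.

```python
# Five-grade emotion word library from the scoring criteria.
EMOTION_LIBRARY = {
    "very satisfactory": 5, "satisfactory": 4, "general": 3,
    "unsatisfactory": 2, "very unsatisfactory": 1,
}
# Hypothetical entries: word -> (dimension, score) and word -> comment-level score.
DIMENSION_EMOTION_LIBRARY = {"cheap": ("price", 4)}
INDEPENDENT_EMOTION_LIBRARY = {"disappointed": 2}


def score_word_list(words):
    """Return per-dimension scores and comment-level scores for a word list."""
    dimension_scores, comment_scores = {}, []
    for word in words:
        if word in DIMENSION_EMOTION_LIBRARY:
            dimension, score = DIMENSION_EMOTION_LIBRARY[word]
            dimension_scores.setdefault(dimension, []).append(score)
        elif word in INDEPENDENT_EMOTION_LIBRARY:
            comment_scores.append(INDEPENDENT_EMOTION_LIBRARY[word])
        elif word in EMOTION_LIBRARY:
            comment_scores.append(EMOTION_LIBRARY[word])
    return dimension_scores, comment_scores


result = score_word_list(["cheap", "satisfactory", "disappointed"])
```

Keeping dimension scores separate from comment-level scores lets a later aggregation step weight them differently.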
The invention targets the huge volume of user evaluation data on e-commerce platforms and builds a system that identifies user satisfaction from user evaluations based on semantic rules for the e-commerce context. It extracts the customers' comment information, archives and scores it by commodity dimension, and performs semantic analysis and word cloud analysis. By constructing a semantic model and dictionaries that define scores, the method archives commodity dimensions and quantitatively labels user emotion in e-commerce comment text, obtains the scores of the commodity in its different dimensions and the scores of user behaviors (repurchase and recommendation) from the comment data, and calculates a satisfaction index for the comments as a whole.
The invention has the advantages that:
1. Comment data is extracted and mined more deeply, and the points of information fed back by users are analyzed more comprehensively and accurately, whereas the prior art is coarse and neither intuitive nor accurate enough;
2. Emotion words are quantified directly, which makes the result more intuitive and makes it possible to compare comment data of different commodities or shops under the same rules, whereas the prior art simply divides emotion words into positive and negative, giving vague and general results;
3. For comment data of different categories, replacing the dimension words addresses the fact that users in different categories have different scenarios and points of attention, so the comment data is more detailed and has more reference value, whereas most of the prior art uses only one general dictionary and one general model, which does not suit some merchants;
4. The semantics of comment text in the e-commerce context are studied in depth, and word combinations that may cause ambiguity are converted under general rules, which greatly improves the accuracy of semantic interpretation, whereas the prior art often uses only one semantic rule and does not consider semantic ambiguity in the e-commerce context.
Referring to fig. 4, correspondingly, an embodiment of the present invention further provides a data processing apparatus based on comment information, including:
the data acquisition module is used for acquiring evaluation text data of a user through a development platform interface;
the text word segmentation module is used for performing word segmentation processing on the evaluation text data through a pre-constructed word segmentation word bank to obtain a word list;
In this embodiment, the process of constructing the word segmentation word bank includes: performing word graph scanning based on a Trie structure to generate a directed acyclic graph (DAG) of all possible words that the Chinese characters in a sentence can form; using dynamic programming to search for the maximum-probability path, and finding from the DAG the most probable segmentation based on word frequency; handling unknown (out-of-vocabulary) words with an HMM model based on the word-forming capability of Chinese characters, decoded with the Viterbi algorithm; the word segmentation word bank also includes fixed word combinations that cannot be recognized because the sample size is small.
The word conversion module is used for carrying out synonym conversion on the word list through a pre-constructed dimension word bank;
the recognition scoring module is used for recognizing the emotion words in the word list subjected to synonym conversion through one or more emotion word banks which are constructed in advance, calculating according to preset weights of different emotion words and scoring the emotion words;
and the statistics output module is used for compiling statistics on the scores of the emotion words and collating them to obtain a comprehensive satisfaction score for the user's evaluation.
In another embodiment, the comment information-based data processing apparatus further includes a stop word removing module, which is used, after the word list is obtained, for matching the word list against a pre-constructed stop word library to remove words in the comment text that are meaningless for semantic recognition, thereby improving the efficiency and recognition accuracy of the model.
In another embodiment, the identification scoring module comprises: the emotion word unit, the dimension word unit and the independent word unit;
the emotion word unit is used for scoring the word list subjected to synonym conversion according to the emotion intensity of the emotion words through a pre-constructed emotion word library;
the dimension word unit is used for archiving and scoring words of the same dimension in the synonym-converted word list, using a pre-constructed dimension emotion word library;
and the independent word unit is used for identifying independent emotion words in the synonym-converted word list and scoring the comment text, using a pre-constructed independent emotion word library.
Starting from the user's evaluation information, the method uses model processing to mine as much of the information value of e-commerce comment data as possible, accurately extracts the information for every dimension of target commodities in different categories, performs semantic analysis on user comments by quantifying emotion words, and calculates commodity satisfaction by combining the information fed back in the comments with the user's shopping experience information. The method can effectively reduce misinterpretation of comment semantics: a priority word segmentation dictionary is constructed, and segmentation follows the priority order of certain specific word combinations. The purchasing emotion of the user is analyzed in depth: an emotion word dictionary is constructed, user comments are scored using combinations of degree words and emotion words, and the unstructured data is thereby quantified. The shopping behavior of the user is predicted: a user behavior dictionary is constructed, semantic words in customer comments that relate to "repurchase", "recommendation" and "bad comment", or that carry an evaluation tendency or emotion, are archived, and the user's purchasing behavior is then predicted.
An embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program; when running, the computer program controls the device where the computer-readable storage medium is located to execute the comment information-based data processing method according to any one of the above embodiments.
An embodiment of the present invention further provides a terminal device, where the terminal device includes a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, and the processor, when executing the computer program, implements the comment information-based data processing method according to any of the above embodiments.
Preferably, the computer program may be divided into one or more modules/units (e.g., computer program) that are stored in the memory and executed by the processor to implement the invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used for describing the execution process of the computer program in the terminal device.
The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the terminal device and connects the various parts of the terminal device through various interfaces and lines.
The memory mainly includes a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function, and the like, and the data storage area may store related data and the like. In addition, the memory may be a high-speed random access memory, or a non-volatile memory such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash memory card (Flash Card), or another volatile solid-state memory device.
It should be noted that the terminal device may include, but is not limited to, a processor and a memory. Those skilled in the art will understand that the above is only an example of the terminal device and does not limit it; the terminal device may include more or fewer components, combine some components, or use different components.
The above-mentioned embodiments are provided to further explain the objects, technical solutions and advantages of the present invention in detail, and it should be understood that the above-mentioned embodiments are only examples of the present invention and are not intended to limit the scope of the present invention. It should be understood that any modifications, equivalents, improvements and the like, which come within the spirit and principle of the invention, may occur to those skilled in the art and are intended to be included within the scope of the invention.

Claims (10)

1. A data processing method based on comment information, characterized by comprising the following steps:
acquiring evaluation text data of a user through an open platform interface;
performing word segmentation processing on the evaluation text data through a pre-constructed word segmentation word bank to obtain a word list;
carrying out synonym conversion on the word list through a pre-constructed dimension word library;
carrying out sentiment word recognition on the word list subjected to synonym conversion through one or more pre-constructed sentiment word banks, calculating according to preset weights of different sentiment words, and scoring the sentiment words;
and counting the scores of the emotional words and collating them to obtain the user's comprehensive satisfaction score.
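The steps of claim 1 can be illustrated with a minimal sketch. Everything named here is hypothetical: the three small lexicons, the weights, and the function names are invented for illustration, and forward maximum matching is used only as a simple stand-in for the lexicon-based segmentation step. Averaging the per-comment scores is likewise one assumed way of forming a comprehensive score, not something the claim specifies.

```python
from statistics import mean

# Hypothetical toy lexicons; real ones would be built in advance.
SEGMENT_LEXICON = {"物流", "快递", "很快", "质量", "不错", "太差"}
SYNONYMS = {"快递": "物流"}  # dimension lexicon: map synonyms to one canonical word
SENTIMENT_WEIGHTS = {"很快": 1.0, "不错": 0.8, "太差": -1.0}  # preset weights


def segment(text, lexicon, max_len=4):
    """Forward maximum matching: take the longest lexicon word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def score_comment(text):
    """Segment, convert synonyms, then sum the weights of recognized sentiment words."""
    words = [SYNONYMS.get(w, w) for w in segment(text, SEGMENT_LEXICON)]
    score = sum(SENTIMENT_WEIGHTS.get(w, 0.0) for w in words)
    return score, words


def satisfaction(comments):
    """Aggregate per-comment scores into one overall figure (here: the mean)."""
    scores = [score_comment(c)[0] for c in comments]
    return mean(scores) if scores else 0.0
```

For example, `score_comment("快递很快质量不错")` first maps 快递 to 物流 and then adds the weights of 很快 and 不错.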
2. The comment information-based data processing method of claim 1, wherein the construction process of the word segmentation lexicon comprises:
performing word graph scanning based on a Trie structure to generate a directed acyclic graph (DAG) formed by all possible word formations of the Chinese characters in a sentence; searching for the maximum-probability path by dynamic programming, and finding in the directed acyclic graph the best segmentation combination based on word frequency;
for out-of-vocabulary words, adopting an HMM model based on the word-forming capability of Chinese characters, solved with the Viterbi algorithm;
the word segmentation lexicon further comprises fixed collocations that cannot otherwise be recognized because of the small sample size.
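The Trie scan, directed acyclic graph, and dynamic-programming search recited in claim 2 can be sketched as follows. The word frequencies are made up, the function names are hypothetical, and the HMM step for out-of-vocabulary words is left out; a character not found in the lexicon simply falls back to a single-character word with an assumed frequency of 1.

```python
import math

END = "#freq"  # marker key; cannot collide with single-character child keys


def build_trie(freqs):
    """Store every lexicon word in a nested-dict Trie, with its frequency at the end node."""
    root = {}
    for word, freq in freqs.items():
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = freq
    return root


def build_dag(sentence, trie):
    """dag[i] lists every end index j such that sentence[i:j+1] is a lexicon word."""
    dag, n = {}, len(sentence)
    for i in range(n):
        ends, node = [], trie
        for j in range(i, n):
            node = node.get(sentence[j])
            if node is None:
                break
            if END in node:
                ends.append(j)
        dag[i] = ends or [i]  # unknown character: single-character fallback
    return dag


def best_path(sentence, dag, freqs):
    """Right-to-left dynamic programming over the DAG, maximizing summed log frequency."""
    total = sum(freqs.values())
    n = len(sentence)
    route = {n: (0.0, n)}
    for i in range(n - 1, -1, -1):
        route[i] = max(
            (math.log(freqs.get(sentence[i:j + 1], 1) / total) + route[j + 1][0], j)
            for j in dag[i]
        )
    words, i = [], 0
    while i < n:
        j = route[i][1]
        words.append(sentence[i:j + 1])
        i = j + 1
    return words


def segment(sentence, freqs):
    return best_path(sentence, build_dag(sentence, build_trie(freqs)), freqs)
```

With the toy frequencies `{"物流": 50, "很": 100, "快": 60, "很快": 30, "物": 5, "流": 5}`, the path 物流 / 很快 has a higher total log probability than 物流 / 很 / 快, so it is the one selected.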
3. The comment-information-based data processing method of claim 1 further comprising, after the obtaining of the word list:
and matching the word list against a pre-constructed stop word library to remove words in the comment text that are meaningless for semantic recognition, thereby improving the efficiency and recognition accuracy of the model.
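Stop word removal as described in claim 3 amounts to filtering the word list against a set. A minimal sketch follows, with a made-up stop word set and a hypothetical function name:

```python
# Hypothetical stop word library: particles and punctuation that carry no sentiment.
STOP_WORDS = {"的", "了", "啊", "，", "。"}


def remove_stop_words(words, stop_words=STOP_WORDS):
    """Keep only the words that are not in the stop word library, preserving order."""
    return [w for w in words if w not in stop_words]
```

A set is used so that each membership check takes constant time on average, which matters when many comments are processed.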
4. The comment information-based data processing method of claim 1, wherein performing emotion word recognition on the synonym-converted word list through one or more pre-constructed emotion word libraries, calculating according to preset weights of different emotion words, and scoring the emotion words comprises:
scoring the synonym-converted word list according to the emotional intensity of the emotion words through a pre-constructed emotion word library;
grouping and scoring words of the same dimension in the synonym-converted word list through a pre-constructed dimension emotion word library;
and identifying independent emotion words in the synonym-converted word list through a pre-constructed independent emotion word library and scoring the comment text.
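One possible reading of the three scoring steps in claim 4 is sketched below. The claim does not say how an emotion word is attached to a dimension, so this sketch assumes that an emotion word belongs to the most recently seen dimension word, and that independent emotion words (and emotion words with no preceding dimension) count toward the comment as a whole. All lexicon entries, weights, and names are hypothetical.

```python
from collections import defaultdict

# Hypothetical lexicons with made-up intensity weights.
EMOTION_INTENSITY = {"很快": 2.0, "不错": 1.0, "一般": -0.5, "太差": -2.0}
DIMENSION_WORDS = {"物流", "质量", "服务"}
INDEPENDENT_EMOTION = {"满意": 2.0, "失望": -2.0}


def score_word_list(words):
    """Return (per-dimension scores, overall comment score) for a synonym-converted word list."""
    dimension_scores = defaultdict(float)
    overall = 0.0
    current_dimension = None
    for w in words:
        if w in DIMENSION_WORDS:
            current_dimension = w  # later emotion words are filed under this dimension
        elif w in INDEPENDENT_EMOTION:
            overall += INDEPENDENT_EMOTION[w]  # scores the comment text itself
        elif w in EMOTION_INTENSITY:
            if current_dimension is None:
                overall += EMOTION_INTENSITY[w]
            else:
                dimension_scores[current_dimension] += EMOTION_INTENSITY[w]
    return dict(dimension_scores), overall
```

Keeping per-dimension totals separate from the overall score lets a later statistics step report, for instance, that logistics is rated well while quality is rated poorly within the same comment.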
5. A comment information-based data processing apparatus, comprising:
the data acquisition module is used for acquiring evaluation text data of a user through an open platform interface;
the text word segmentation module is used for performing word segmentation processing on the evaluation text data through a pre-constructed word segmentation word bank to obtain a word list;
the word conversion module is used for carrying out synonym conversion on the word list through a pre-constructed dimension word bank;
the recognition scoring module is used for recognizing the emotion words in the word list subjected to synonym conversion through one or more emotion word banks which are constructed in advance, calculating according to preset weights of different emotion words and scoring the emotion words;
and the statistics output module is used for counting the scores of the emotional words and collating them to obtain the user's comprehensive satisfaction score.
6. The comment information-based data processing apparatus of claim 5, wherein the construction process of the word segmentation lexicon comprises:
performing word graph scanning based on a Trie structure to generate a directed acyclic graph (DAG) formed by all possible word formations of the Chinese characters in a sentence; searching for the maximum-probability path by dynamic programming, and finding in the directed acyclic graph the best segmentation combination based on word frequency;
for out-of-vocabulary words, adopting an HMM model based on the word-forming capability of Chinese characters, solved with the Viterbi algorithm;
the word segmentation lexicon further comprises fixed collocations that cannot otherwise be recognized because of the small sample size.
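The HMM step for out-of-vocabulary words can be sketched as Viterbi decoding over the four character-position tags commonly used for Chinese segmentation: B (begin), M (middle), E (end) and S (single-character word). The tag set is an assumption, since the claim does not name one, and all probabilities below are invented for illustration; a real model would estimate them from a segmented corpus.

```python
import math

STATES = "BMES"
NEG_INF = float("-inf")
UNSEEN = math.log(1e-6)  # floor for characters a state has never emitted


def log_table(table):
    """Convert a nested table of probabilities into log probabilities."""
    return {k: {c: math.log(p) for c, p in row.items()} for k, row in table.items()}


# Toy model parameters (hypothetical).
START = {"B": math.log(0.6), "M": NEG_INF, "E": NEG_INF, "S": math.log(0.4)}
TRANS = log_table({"B": {"M": 0.3, "E": 0.7}, "M": {"M": 0.3, "E": 0.7},
                   "E": {"B": 0.5, "S": 0.5}, "S": {"B": 0.5, "S": 0.5}})
EMIT = log_table({"B": {"性": 0.5, "高": 0.1}, "M": {"价": 0.6},
                  "E": {"比": 0.6, "高": 0.1}, "S": {"高": 0.5, "比": 0.1}})


def viterbi(chars):
    """Return the most probable B/M/E/S tag sequence for the characters."""
    if not chars:
        return []
    scores = [{s: START[s] + EMIT[s].get(chars[0], UNSEEN) for s in STATES}]
    back = [{}]
    for t in range(1, len(chars)):
        scores.append({})
        back.append({})
        for s in STATES:
            best, prev = max(
                (scores[t - 1][p] + TRANS[p].get(s, NEG_INF)
                 + EMIT[s].get(chars[t], UNSEEN), p)
                for p in STATES
            )
            scores[t][s], back[t][s] = best, prev
    last = max("ES", key=lambda s: scores[-1][s])  # a sentence must end a word
    tags = [last]
    for t in range(len(chars) - 1, 0, -1):
        tags.append(back[t][tags[-1]])
    return tags[::-1]


def segment_unknown(chars):
    """Cut the character string after every E or S tag."""
    words, buf = [], ""
    for ch, tag in zip(chars, viterbi(chars)):
        buf += ch
        if tag in "ES":
            words.append(buf)
            buf = ""
    return words + ([buf] if buf else [])
```

With these toy parameters, `segment_unknown("性价比高")` decodes the tags B, M, E, S, so the three-character word 性价比 is recovered even though it appears in no lexicon.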
7. The comment information-based data processing apparatus of claim 5, further comprising a stop word removal module, which is used for, after the word list is obtained, matching the word list against a pre-constructed stop word library to remove words in the comment text that are meaningless for semantic recognition, thereby improving the efficiency and recognition accuracy of the model.
8. The comment information-based data processing apparatus of claim 5, wherein the recognition scoring module comprises an emotion word unit, a dimension word unit and an independent word unit;
the emotion word unit is used for scoring the synonym-converted word list according to the emotional intensity of the emotion words through a pre-constructed emotion word library;
the dimension word unit is used for grouping and scoring words of the same dimension in the synonym-converted word list through a pre-constructed dimension emotion word library;
and the independent word unit is used for identifying independent emotion words in the synonym-converted word list through a pre-constructed independent emotion word library and scoring the comment text.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored computer program; wherein the computer program, when executed, controls the device in which the computer-readable storage medium is located to execute the comment information-based data processing method according to any one of claims 1 to 4.
10. A terminal device comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the comment information based data processing method according to any one of claims 1 to 4 when executing the computer program.
CN201910906324.4A 2019-09-24 2019-09-24 Comment information-based data processing method and device Pending CN110705286A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910906324.4A CN110705286A (en) 2019-09-24 2019-09-24 Comment information-based data processing method and device

Publications (1)

Publication Number Publication Date
CN110705286A true CN110705286A (en) 2020-01-17

Family

ID=69195775

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910906324.4A Pending CN110705286A (en) 2019-09-24 2019-09-24 Comment information-based data processing method and device

Country Status (1)

Country Link
CN (1) CN110705286A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150286627A1 (en) * 2014-04-03 2015-10-08 Adobe Systems Incorporated Contextual sentiment text analysis
CN107391493A (en) * 2017-08-04 2017-11-24 青木数字技术股份有限公司 A kind of public feelings information extracting method, device, terminal device and storage medium
CN109214008A (en) * 2018-09-28 2019-01-15 珠海中科先进技术研究院有限公司 A kind of sentiment analysis method and system based on keyword extraction

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111241834A (en) * 2020-01-20 2020-06-05 和宇健康科技股份有限公司 Medical care quality evaluation obtaining method, device, medium and terminal equipment
CN111340385A (en) * 2020-03-10 2020-06-26 深圳华侨城创新研究院有限公司 Scientific measuring method for measuring joy index of tourist attraction
CN112541077A (en) * 2020-11-26 2021-03-23 深圳供电局有限公司 Processing method and system for power grid user service evaluation
CN112541077B (en) * 2020-11-26 2023-11-17 深圳供电局有限公司 Processing method and system for power grid user service evaluation
CN112667780A (en) * 2020-12-31 2021-04-16 上海众源网络有限公司 Comment information generation method and device, electronic equipment and storage medium
CN112765963A (en) * 2020-12-31 2021-05-07 北京锐安科技有限公司 Sentence segmentation method and device, computer equipment and storage medium
CN112785335A (en) * 2021-01-21 2021-05-11 安徽商信政通信息技术股份有限公司 Data processing method and system for electronic government affair performance assessment system
CN112818682B (en) * 2021-01-22 2023-01-03 深圳大学 E-commerce data analysis method, equipment, device and computer-readable storage medium
CN112818682A (en) * 2021-01-22 2021-05-18 深圳大学 E-commerce data analysis method, equipment, device and computer-readable storage medium
CN113486662A (en) * 2021-07-19 2021-10-08 上汽通用五菱汽车股份有限公司 Text processing method, system and medium
WO2023015715A1 (en) * 2021-08-12 2023-02-16 惠州Tcl云创科技有限公司 User-comment-based data processing method and apparatus, and device and storage medium
CN113782123A (en) * 2021-08-16 2021-12-10 山西大学 Online medical patient satisfaction measuring method based on network data
CN114398911A (en) * 2022-01-24 2022-04-26 平安科技(深圳)有限公司 Emotion analysis method and device, computer equipment and storage medium
CN114925373A (en) * 2022-05-17 2022-08-19 南京航空航天大学 Method for automatically identifying vulnerability of privacy protection policy of mobile application based on user comment
CN114925373B (en) * 2022-05-17 2023-12-08 南京航空航天大学 Mobile application privacy protection policy vulnerability automatic identification method based on user comment
CN114840658A (en) * 2022-07-06 2022-08-02 浙江口碑网络技术有限公司 Evaluation reply method, electronic device, and computer storage medium
CN117350804A (en) * 2023-09-22 2024-01-05 深圳市小绿人网络信息技术有限公司 Information pushing method and system based on comprehensive credit integration system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200117