WO2018045910A1 - Sentiment tendency recognition method, object classification method, and data processing system - Google Patents

Sentiment tendency recognition method, object classification method, and data processing system

Info

Publication number
WO2018045910A1
Authority
WO
WIPO (PCT)
Prior art keywords
processed
short text
category
sentiment
feature
Prior art date
Application number
PCT/CN2017/100060
Other languages
English (en)
French (fr)
Inventor
潘林林
赵争超
林君
肖谦
张一昌
Original Assignee
阿里巴巴集团控股有限公司
潘林林
赵争超
林君
肖谦
张一昌
Priority date
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司, 潘林林, 赵争超, 林君, 肖谦, 张一昌 filed Critical 阿里巴巴集团控股有限公司
Publication of WO2018045910A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/30 Semantic analysis

Definitions

  • the present application relates to the field of data processing technologies, and in particular, to an emotional tendency recognition method, an object classification method, and a data processing system.
  • The same short text may correspond to different categories in different contexts. For example, taking clothing user evaluations as the objects, the first user writes "the color of the clothes is dim, just right", and the second user writes "the color of the clothes is dim, not bright". The two objects contain the same short text "the color of the clothes is dim". If classified by text alone, the two short texts would be grouped into one category, yet the two should correspond to different categories.
  • The "dim color of the clothes" in the first user evaluation corresponds to a positive emotion and should be placed in the first category; the "dim color of the clothes" in the second user evaluation corresponds to a negative emotion and should be placed in the second category. Therefore, it is currently common to use the sentiment tendency corresponding to a short text to determine the category of the object.
  • the specific implementation process can be:
  • The sentiment lexicon contains many positive words, such as "fits well", "large screen", "beautiful", "fast", and "appropriate".
  • The sentiment lexicon also contains many negative words, such as "ugly", "slow", "small screen", and so on.
  • The object to be processed is first split at punctuation marks; the text between two adjacent punctuation marks forms one short text, so that the object to be processed is divided into a plurality of short texts to be processed. For example, for the evaluation "the clothes fit well, mom likes them very much", splitting by punctuation yields two short texts: "the clothes fit well" and "mom likes them very much".
  • Each short text of the object to be processed is a short text to be processed.
  • FIG. 1 is a flowchart of a processor determining the sentiment tendency of a short text to be processed; the execution process specifically includes the following steps:
  • Step 1 The processor performs word segmentation on the short text to obtain the word segmentation result.
  • The short text to be processed is divided into several words, and these words constitute the word segmentation result.
  • the results obtained after the word segmentation are “clothing”, “very”, and “appropriate”.
  • the short text to be processed is “the mobile phone screen is large”, and the result of the word segmentation obtained after the word segmentation is “mobile phone”, “screen”, “very” and “large”.
  • Step 2 Match the word segmentation result with the sentiment lexicon according to the emotion matching rule.
  • Step 3 Determine the sentiment tendency corresponding to the short text to be processed.
  • The word segmentation result is matched against the sentiment lexicon according to the matching rules. If the emotional words in the word segmentation result correspond to a positive emotion and the result contains no negation word, it is determined that the short text corresponds to a positive emotion; if the emotional words in the word segmentation result correspond to a negative emotion and the result contains no negation word, it is determined that the short text corresponds to a negative emotion.
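  • As an illustration of the process in FIG. 1, the following is a minimal Python sketch of punctuation splitting and lexicon matching; the word lists, the whitespace tokenizer, and the helper names are illustrative assumptions rather than part of the original disclosure.

```python
import re

# Toy lexicons; a real system would use a full sentiment lexicon and a Chinese
# word segmenter (e.g. jieba) instead of whitespace splitting.
POSITIVE_WORDS = {"fit", "beautiful", "fast", "appropriate"}
NEGATIVE_WORDS = {"ugly", "slow", "small"}
NEGATION_WORDS = {"not", "no"}

def split_into_short_texts(evaluation):
    """Split an object (user evaluation) into short texts at punctuation marks."""
    return [t.strip() for t in re.split(r"[,.;!?，。；！？]", evaluation) if t.strip()]

def lexicon_sentiment(short_text):
    """FIG. 1, Steps 1-3: segment, match against the lexicons, decide the tendency."""
    tokens = short_text.lower().split()          # Step 1: word segmentation (toy)
    has_negation = any(t in NEGATION_WORDS for t in tokens)
    if any(t in POSITIVE_WORDS for t in tokens) and not has_negation:
        return "positive"                        # Steps 2-3: positive match, no negation
    if any(t in NEGATIVE_WORDS for t in tokens) and not has_negation:
        return "negative"
    return "unknown"                             # tendency cannot be determined

for st in split_into_short_texts("the clothes fit well, mom likes them very much"):
    print(st, "->", lexicon_sentiment(st))
# -> "the clothes fit well -> positive" and "mom likes them very much -> unknown"
```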
  • the processor can automatically perform the process shown in Figure 1 so that the emotional tendencies of the short text to be processed can be automatically determined.
  • the applicant of the present application found during the research that although the above automatic processing process can identify the emotional tendency of the short text to be processed to a certain extent, the emotional tendency of the short text to be processed obtained by the above processing may be inaccurate.
  • Taking Taobao as an example, since Taobao has many categories (such as clothing, electronic equipment, mother-and-baby products, etc.), each category has corresponding user evaluations. The applicant discovered during research that short texts containing the same emotional word under different categories may correspond to different emotional tendencies.
  • a short text is "large screen”, and the emotional tendency of the short text is positive emotion.
  • a short text is “large clothes”, and the emotional tendency of the short text is negative emotion.
  • Both short texts contain the same emotional word "large", yet the two short texts have different emotional tendencies.
  • When the processor in FIG. 1 automatically determines the sentiment tendency of a short text, it adopts the same processing method for all objects; that is, the existing process does not handle the sentiment tendency of short texts separately from the perspective of the object's category. As a result, the sentiment tendency determined for short texts in the prior art is inaccurate.
  • the present application provides a method for identifying an emotional tendency so that the emotional tendency of the short text to be processed can be accurately determined.
  • A method of identifying a sentiment tendency, including:
  • If all categories correspond to one sentiment estimation model: determining a feature set corresponding to the short text to be processed, wherein each feature in the feature set includes a word segment of the short text to be processed and a category identifier to which the short text to be processed belongs; performing sentiment degree estimation on the short text to be processed according to the pre-trained sentiment estimation model, in combination with the feature set of the short text to be processed; wherein
  • the sentiment estimation model includes: a model that outputs a positive sentiment degree and a negative sentiment degree, obtained by training a plurality of short text samples with sentiment tendencies under at least two categories; and determining, based on the positive sentiment degree and the negative sentiment degree corresponding to the short text to be processed, the sentiment tendency corresponding to the short text to be processed;
  • If each category corresponds to one sentiment estimation model: determining a feature set corresponding to the short text to be processed, wherein each feature in the feature set includes a word segment of the short text to be processed; performing sentiment degree estimation on the short text to be processed according to the sentiment estimation model corresponding to the category identifier, in combination with the feature set of the short text to be processed; wherein the sentiment estimation model is: a model that outputs a positive sentiment degree and a negative sentiment degree, obtained after training a plurality of short text samples with sentiment tendencies corresponding to the category identifier; and determining, based on the positive sentiment degree and the negative sentiment degree corresponding to the short text to be processed, the sentiment tendency corresponding to the short text to be processed.
  • the method further includes:
  • A method of identifying a sentiment tendency, including:
  • each feature in the feature set includes: a word segment of the short text to be processed and the category identifier to which the short text to be processed belongs;
  • sentiment degree estimation is performed on the short text to be processed; wherein the sentiment estimation model includes: a model that outputs a positive sentiment degree and a negative sentiment degree, obtained after training a number of short text samples with sentiment tendencies under at least two categories;
  • the determining the feature set corresponding to the short text to be processed includes:
  • a set of individual features is determined as a feature set of the short text to be processed.
  • the determining the feature set corresponding to the short text to be processed includes:
  • a set of each feature and the plurality of combined features is determined as a feature set of the short text to be processed.
  • the feature is combined by using the n-gram language model to obtain a plurality of combined features, including:
  • the features are combined by using a bigram (binary) language model to obtain a plurality of combined features.
  • the sentiment estimation of the short text to be processed includes:
  • the positive emotion degree and the negative emotion degree corresponding to the short text to be processed are output.
  • the determining the sentiment tendency corresponding to the to-be-processed short text based on the positive sentiment and the negative sentiment corresponding to the short text to be processed includes:
  • if the greater sentiment degree is greater than the preset confidence, it is determined that the sentiment tendency corresponding to the short text to be processed is consistent with the sentiment tendency of the greater sentiment degree.
  • the sentiment estimation model comprises:
  • a model that outputs the positive sentiment degree and the negative sentiment degree, obtained after training based on the feature sets of a plurality of short texts corresponding to the at least two category identifiers.
  • the method further includes:
  • A method of identifying a sentiment tendency, including:
  • each feature in the feature set includes: a word segment of the short text to be processed;
  • the sentiment estimation model is: a model that outputs a positive sentiment degree and a negative sentiment degree, obtained after training a number of short text samples with sentiment tendencies corresponding to the category identifier;
  • the determining the feature set corresponding to the short text to be processed includes:
  • a set of each word segment and the plurality of combined word segments is determined as the feature set of the short text to be processed, and one word segment corresponds to one feature.
  • the determining the feature set corresponding to the short text to be processed includes:
  • the word segmentation result is determined as a feature set of the short text to be processed, and one word segment corresponds to one feature.
  • the method further includes:
  • A sentiment tendency recognition system, comprising:
  • a data providing device for transmitting a plurality of objects
  • the processor is configured to receive a plurality of objects sent by the data providing device, construct an emotion estimation model according to short texts of the plurality of objects, and determine an emotional tendency of the short text to be processed by using the sentiment estimation model.
  • the processor is further configured to construct a correspondence between the sentiment estimation model and the category identifier to which the object belongs.
  • the system further comprises a receiving device
  • the processor is further configured to output an emotional tendency of the to-be-processed text
  • the receiving device is configured to receive an emotional tendency of the to-be-processed text.
  • A sentiment tendency recognition system, comprising:
  • a data providing device for transmitting a plurality of objects
  • a model construction device configured to receive a plurality of objects sent by the data providing device, construct an emotion estimation model according to short texts of the plurality of objects, and send the sentiment estimation model;
  • a processor configured to receive the sentiment estimation model, and use the sentiment estimation model to determine an emotional tendency of the short text to be processed.
  • the model construction device is further configured to construct a correspondence between the sentiment estimation model and the category identifier to which the object belongs, and send the correspondence to the processor.
  • the system further comprises a receiving device
  • the processor is further configured to output an emotional tendency of the to-be-processed text
  • the receiving device is configured to receive an emotional tendency of the to-be-processed text.
  • An object classification method including:
  • Determining feature information of the object to be processed wherein the feature information includes text feature information and image feature information, and the text feature information includes an emotional tendency of the short text;
  • performing category identification on the feature information of the object to be processed according to a pre-trained category recognition model, wherein the category recognition model is: a classifier for the first category and the second category, obtained after training according to the feature information of a plurality of object samples.
  • the feature information further includes:
  • feature information of a second subject to which the object is attached.
  • the classifying the feature information according to the pre-trained category recognition model comprises:
  • if the first category matching degree is greater than the second category matching degree, determining that the category of the object to be processed is the first category;
  • if the second category matching degree is greater than the first category matching degree, determining that the category of the object to be processed is the second category.
  • the method further includes:
  • the method further includes:
  • the object samples are derived from the object set, and satisfy a preset rule
  • the category recognition model is retrained based on the updated existing object samples.
  • a classification method for user evaluation including:
  • Determining feature information of the user evaluation to be processed, wherein the feature information includes text feature information of the user evaluation, image feature information of the user evaluation, feature information of the seller, and feature information of the buyer, and the text feature information includes the sentiment tendency of a short text;
  • the category recognition model is: a classifier for the first type of user evaluation and the second type of user evaluation, obtained after training according to the feature information of a plurality of user evaluation samples.
  • the method further includes:
  • the method further includes:
  • the category recognition model is retrained based on the updated existing user evaluation samples.
  • An object classification system comprising:
  • a data providing device for transmitting a plurality of objects
  • a processor configured to receive a plurality of objects sent by the data providing device, and obtain and output a class identification model of the first category and the second category according to the feature information of the objects; and used to determine feature information of the object to be processed;
  • the feature information includes text feature information and image feature information, and the text feature information includes an emotional tendency of the short text; and classifying the feature information of the object to be processed according to the category recognition model; Used to output objects of the first category;
  • a data receiving device configured to receive and use the object of the first category.
  • An object classification system comprising:
  • a data providing device for transmitting a plurality of objects
  • a model construction device configured to receive a plurality of objects sent by the data providing device, and obtain and output a category identification model of the first category and the second category according to the feature information of the plurality of objects, and send the category identification model;
  • a processor configured to receive the category identification model and determine feature information of the object to be processed; wherein the feature information includes text feature information and image feature information, and the text feature information includes the sentiment tendency of a short text; classify the feature information of the object to be processed according to the category identification model; and also output objects of the first category;
  • a data receiving device configured to receive and use the object of the first category.
  • the present application provides a method for identifying sentiment orientation.
  • The method uses a plurality of short texts with sentiment tendencies corresponding to categories as training samples, acquires the feature sets of the short texts for training, and obtains a sentiment estimation model. Since each feature contains a word segment of the short text and a category identifier, the sentiment estimation model of the present application fully considers the category to which the short text belongs. Therefore, the sentiment tendency of the short text to be processed determined based on the sentiment estimation model is also more accurate.
  • FIG. 1 is a flowchart of determining the sentiment tendency of a short text to be processed in the prior art;
  • FIGS. 2a-2b are schematic structural diagrams of a sentiment tendency recognition system according to an embodiment of the present application;
  • FIGS. 3a-3c are schematic diagrams of the correspondence between sentiment estimation models and categories provided by an embodiment of the present application;
  • FIGS. 4a-4c are flowcharts of constructing a sentiment estimation model provided by an embodiment of the present application;
  • FIG. 5 is a flowchart of still another method for constructing a sentiment estimation model according to an embodiment of the present application;
  • FIGS. 6a-6b are flowcharts of still another process for constructing a sentiment estimation model provided by an embodiment of the present application;
  • FIG. 7 is a flowchart of a method for identifying a sentiment tendency according to an embodiment of the present application;
  • FIG. 9 is a flowchart of a method for identifying a sentiment tendency according to an embodiment of the present application;
  • FIG. 10 is a flowchart of a method for identifying a sentiment tendency according to an embodiment of the present application;
  • FIGS. 11a-11b are flowcharts of a method for identifying a sentiment tendency according to an embodiment of the present application;
  • FIG. 12 is a flowchart of an object classification method according to an embodiment of the present application;
  • FIG. 13 is a flowchart of still another object classification method according to an embodiment of the present application;
  • FIG. 14 is a flowchart of still another object classification method according to an embodiment of the present application;
  • FIG. 15 is a flowchart of still another object classification method according to an embodiment of the present application;
  • FIG. 16 is a schematic structural diagram of an object classification system according to an embodiment of the present application;
  • FIG. 17 is a schematic structural diagram of still another object classification system according to an embodiment of the present application;
  • FIG. 18 is a flowchart of a scenario embodiment of an object classification method according to an embodiment of the present application.
  • The present application proposes a technical means of constructing a sentiment estimation model, and using the sentiment estimation model to estimate the positive sentiment degree and the negative sentiment degree corresponding to the short text to be processed.
  • the positive emotion degree is used to indicate the degree to which the short text to be processed belongs to positive emotion.
  • the negative emotion degree is used to indicate the degree to which the short text to be processed belongs to negative emotion.
  • the present invention provides an emotional tendency recognition system.
  • the recognition system of the sentiment orientation provided in FIG. 2a specifically includes: a data providing device 100, and a processor 200 connected to the data providing device 100.
  • the data providing device 100 is configured to send a number of objects to the processor 200.
  • the processor 200 is configured to construct an emotion estimation model according to short texts of several objects, and determine an emotional tendency of the short text to be processed by using the sentiment estimation model.
  • The present application also provides another sentiment tendency recognition system (see Figure 2b).
  • the recognition system of the sentiment orientation provided in FIG. 2b specifically includes: a data providing device 100, a model building device 300 connected to the data providing device, and a processor 200 connected to the model building device.
  • the model building device 300 can be a processing device with processing capabilities.
  • the data providing device 100 is configured to send a number of objects to the model building device 300.
  • the model construction device 300 is configured to construct a sentiment estimation model based on the short texts of the several objects, and
  • send the sentiment estimation model to the processor 200.
  • the processor 200 is configured to determine an emotional tendency of the short text to be processed by using the sentiment estimation model.
  • Both the processor 200 and the model construction device 300 can perform the process of constructing the sentiment estimation model, and the construction processes are consistent. Therefore, the processor 200 and the model construction device 300 are collectively referred to as the processing device, and the term "processing device" is used below to represent either of them in the description of constructing the sentiment estimation model.
  • a receiving device (not shown) connected to the processor may also be included in the system shown in Figures 2a and 2b.
  • After the processor determines the sentiment tendency of the short text to be processed,
  • the processor is further configured to output the sentiment tendency of the short text to be processed;
  • the receiving device is configured to receive the sentiment tendency of the short text to be processed, so that the receiving device can perform other processing using that sentiment tendency.
  • The process of constructing the sentiment estimation model is described below. Since the prior art does not consider the category of a short text when determining the sentiment tendency of the short text to be processed, the sentiment tendency determined in the prior art is not accurate. Therefore, the present application takes the category of the short text into account in the process of constructing the sentiment estimation model, so that the constructed sentiment estimation model can accurately determine the positive sentiment degree and the negative sentiment degree of the short text to be processed.
  • This application proposes three implementations for the processing device to construct the sentiment estimation model. See Figures 3a-3c for schematic diagrams of the correspondence between categories and sentiment estimation models in the three implementations.
  • In the first implementation, all categories correspond to one sentiment estimation model (see Figure 3a).
  • In the second implementation, each category corresponds to one sentiment estimation model (see Figure 3b).
  • The third implementation lies between the first and the second (see Figure 3c); assuming there are N categories, the third implementation builds M sentiment estimation models, where M is a non-zero natural number and 1 < M < N.
  • The first implementation: all categories correspond to one sentiment estimation model.
  • This implementation constructs one sentiment estimation model corresponding to all categories.
  • The process of constructing the sentiment estimation model corresponding to all categories includes the following steps:
  • Step S401 Determine a short text sample used to construct the sentiment estimation model.
  • the data providing device can send objects under various categories to the processing device, and the processing device can acquire multiple objects under each category.
  • the processing device can segment each object by punctuation, thereby dividing each object into a plurality of short texts.
  • For example, a user evaluation under the clothing category reads "the clothes fit well, mom likes them very much"; after splitting by punctuation, two short texts can be obtained: "the clothes fit well" and "mom likes them very much".
  • As another example, for a user evaluation under the electronic device category, "the screen of the mobile phone is large, the appearance is very beautiful", splitting by punctuation yields two short texts: "the screen of the mobile phone is large" and "the appearance is very beautiful".
  • The processing device can run the process shown in FIG. 1 on each short text. If the process shown in FIG. 1 determines that a short text corresponds to a positive emotion, the short text can be used to construct the sentiment estimation model, with positive emotion as its label.
  • If a short text can only be judged to belong to a positive emotion after manual confirmation, this indicates that the short text has no obvious emotional characteristics and is not suitable for constructing the sentiment estimation model; such a short text is therefore discarded.
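  • As a sketch of step S401 (reusing the hypothetical helpers from the earlier sketch), each sample object is split into short texts, each short text is labeled with the FIG. 1 process, and short texts whose tendency cannot be determined automatically are discarded:

```python
def collect_training_samples(objects_by_category):
    """Step S401 (sketch): gather (category_id, short_text, label) training triples.

    `objects_by_category` maps a category identifier (e.g. "16" for clothing)
    to the user evaluations under that category.
    """
    samples = []
    for category_id, evaluations in objects_by_category.items():
        for evaluation in evaluations:
            for short_text in split_into_short_texts(evaluation):
                label = lexicon_sentiment(short_text)   # FIG. 1 process
                if label == "unknown":
                    continue    # no obvious emotional feature: discard the short text
                samples.append((category_id, short_text, label))
    return samples
```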
  • Step S402 Determine a feature set corresponding to each short text.
  • After step S401, the word segmentation result of each short text can be obtained by using the process shown in FIG. 1 (see Step 1 in FIG. 1; details are not described herein again). Then, the feature set corresponding to each short text is further determined.
  • There are two implementation modes for this step. The difference between the two modes is that the feature set determined by the first mode includes combined features, while the feature set determined by the second mode does not.
  • Step 411: Acquire the category identifier corresponding to the target short text, and the word segmentation result obtained after performing the word segmentation operation on the target short text.
  • The processing device has already obtained the word segmentation result of the target short text. Since the target short text belongs to the same category as the object it was split from, the processing device can use the category identifier of that object as the category identifier of the target short text.
  • For example, suppose the target short text belongs to the clothing category and is "the clothes are large"; the word segmentation result corresponding to the target short text is "clothes", "very", and "large". If the identifier of the clothing category is "16", the category identifier corresponding to the target short text is "16".
  • the target short text belongs to the electronic device category, and the "screen is large” is taken as an example.
  • the word segmentation result corresponding to the target short text is "screen” "very” and “large”, and the electronic device category identifier is "10".
  • the corresponding category identifier of the target short text is "10".
  • Step 412: Combine each word segment with the category identifier to obtain each feature.
  • The present application combines each word segment with the category identifier to obtain each feature.
  • the feature contains the category identifier, and the identifiers of different categories are different, the feature can accurately distinguish the word segmentation of different categories. In this way, the sentiment estimation model obtained by the training can accurately distinguish the same participle under different categories.
  • the target short text "large clothes” is taken as an example, and the respective features corresponding to the target short text may be “clothes 16", “very 16” and “large 16".
  • each feature corresponding to the target short text may be “screen 10", “very 10” and “large 10".
  • the processing device can distinguish that the participles “big 16” and “big 10" are two different features, and the two features belong to different categories.
  • the combination of the word segmentation and the category identifier is after the word segmentation, the class object identifier, and the category identifier is in front and the word segment is in the back.
  • the word segmentation and the category identifier may also have other combinations, which are not limited herein.
  • Step 413 Perform n-ary combination on each feature to obtain several combined features.
  • each feature of each short text is combined using an n-gram language model.
  • n is a non-zero natural number
  • one element in the n-gram language model corresponds to a word segment in the short text.
  • the feature combination using the n-gram language model is specifically: adjacent n features are merged together, then adjacent n-1 features, and so on, until adjacent pairs of features are merged together.
  • Step 414 Determine each feature and a set of several combined features as a feature set of the target short text.
  • Taking feature combination with a bigram (binary) language model as an example, the feature set of the target short text finally obtained includes: "clothes 16", "very 16", "large 16", "clothes 16 very 16", and "very 16 large 16".
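  • A sketch of steps 411-414 under assumed helper names: each word segment is concatenated with the category identifier, and adjacent features are then merged by a bigram (n = 2) language model.

```python
def build_feature_set(word_segments, category_id, n=2):
    """Steps 411-414 (sketch): word segment + category identifier, then n-gram merging."""
    # Step 412: combine each word segment with the category identifier.
    features = ["{} {}".format(seg, category_id) for seg in word_segments]
    # Step 413: merge adjacent features, from n-grams down to pairs.
    combined = []
    for size in range(n, 1, -1):
        for i in range(len(features) - size + 1):
            combined.append(" ".join(features[i:i + size]))
    # Step 414: the feature set is the individual features plus the combined ones.
    return features + combined

print(build_feature_set(["clothes", "very", "large"], "16"))
# -> ['clothes 16', 'very 16', 'large 16', 'clothes 16 very 16', 'very 16 large 16']
```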
  • Step 421: Acquire the category identifier corresponding to the target short text, and the word segmentation result obtained by performing the word segmentation operation on the target short text.
  • Step 422: Combine each word segment with the category identifier to obtain each feature.
  • step S421 and step S422 in FIG. 4c are the same as step S411 and step S412 in FIG. 4b, and details are not described herein again.
  • Step 423 Determine a set of each feature as a feature set of the target short text.
  • The step of feature combination is absent in the execution of FIG. 4c, so the set of individual features determined in step S422 can be directly used as the feature set of the target short text.
  • The feature set of the target short text finally obtained after execution according to FIG. 4c includes: "clothes 16", "very 16", and "large 16".
  • Step S403: Determine the sentiment tendency of each feature in the feature set corresponding to each short text, and the positive sentiment degree and negative sentiment degree of each feature; use each feature, together with its corresponding sentiment tendency, positive sentiment degree, and negative sentiment degree, as input parameters of the sentiment estimation model.
  • In step S401, the sentiment tendency of each short text has already been determined. Because the sentiment tendency of each feature is consistent with the sentiment tendency of its short text, when a short text corresponds to a positive emotion, each feature in its feature set is determined to correspond to a positive emotion; when a short text corresponds to a negative emotion, each feature in its feature set is determined to correspond to a negative emotion.
  • Across all short text samples, the processing device obtains many occurrences of the same feature, and the sentiment tendencies corresponding to these occurrences may or may not be the same.
  • For each feature, the processing device can count the total number of its occurrences, the first number of occurrences corresponding to positive emotion, and the second number of occurrences corresponding to negative emotion.
  • The positive sentiment degree of the feature is determined according to the ratio of the first number to the total number, and the negative sentiment degree of the feature is determined according to the ratio of the second number to the total number.
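  • A sketch of the counting in step S403: for each feature, the positive sentiment degree is the share of its occurrences that carry a positive label, and the negative sentiment degree is the share that carry a negative label (the data layout is an assumption).

```python
from collections import defaultdict

def feature_sentiment_degrees(labeled_feature_sets):
    """Step S403 (sketch): labeled_feature_sets is an iterable of (feature_set, label)
    pairs, the label being the sentiment tendency of the short text the features
    came from. Returns {feature: (positive_degree, negative_degree)}."""
    total = defaultdict(int)       # total occurrences of each feature
    positive = defaultdict(int)    # occurrences labeled positive (the "first number")
    negative = defaultdict(int)    # occurrences labeled negative (the "second number")
    for feature_set, label in labeled_feature_sets:
        for feature in feature_set:
            total[feature] += 1
            if label == "positive":
                positive[feature] += 1
            else:
                negative[feature] += 1
    return {f: (positive[f] / total[f], negative[f] / total[f]) for f in total}
```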
  • Step S404: Perform training according to a preset classifier model to obtain the sentiment estimation model.
  • The preset classifier model may be a maximum entropy model, a support vector machine, a neural network algorithm, or the like. The training itself uses existing techniques and is not repeated here.
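  • As one concrete possibility (the text only names maximum entropy, SVM, and neural networks, and does not prescribe a toolkit), the features could be one-hot encoded and fed to a logistic regression classifier, which is the usual software form of a maximum entropy model; scikit-learn is an assumed choice here.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def train_sentiment_model(labeled_feature_sets):
    """Step S404 (sketch): train a maximum-entropy-style classifier.

    `labeled_feature_sets` is a list of (feature_set, label) pairs produced
    as in the earlier sketches."""
    X = [{feature: 1 for feature in fs} for fs, _ in labeled_feature_sets]
    y = [label for _, label in labeled_feature_sets]
    model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(X, y)
    return model  # model.predict_proba later yields the positive/negative degrees
```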
  • The following describes the second implementation, in which the processing device constructs one sentiment estimation model for each category.
  • Since each sentiment estimation model involves only one category, in the second implementation a word segment is equivalent to a feature,
  • so in the second implementation the word segments do not need to be combined with the category identifier.
  • the construction process of the sentiment estimation model corresponding to each category is consistent. Therefore, taking a target category as an example, the process of constructing the target sentiment estimation model corresponding to the target category is introduced in detail.
  • the process of constructing the target sentiment estimation model specifically includes the following steps:
  • Step S501: Determine the short text samples used to construct the target sentiment estimation model.
  • The specific execution process of step S501 is similar to that of step S401, and details are not described herein again.
  • Step S502 Determine a feature set corresponding to each short text.
  • After step S501, the word segmentation result of each short text can be obtained by using the process shown in FIG. 1 (see Step 1 in FIG. 1; details are not described herein again). Then, the feature set corresponding to each short text is further determined. There are two implementation modes for this step. The difference between the two modes is that the feature set determined by the first mode includes combined features, while the feature set determined by the second mode does not.
  • Step 601 Acquire a word segmentation result corresponding to the target short text, and each word segment corresponds to one feature.
  • Step 602 Perform n-ary combination on the respective features to obtain a plurality of combined features.
  • Step 603 Determine each feature and a set of several combined features as a feature set of the target short text.
  • The feature set of the target short text finally obtained in this embodiment includes: "clothes", "very", "large", "clothes very", and "very large".
  • Step 611 Acquire a word segmentation result corresponding to the target short text, and each word segment corresponds to one feature.
  • Step 612 Determine the word segmentation result as a feature set of the target short text.
  • The step of feature combination is absent in the execution of FIG. 6b, so the set of individual features determined in step S611 can be directly used as the feature set of the target short text.
  • the feature set of the target short text finally obtained after execution according to FIG. 6b includes: “clothing”, “very”, and "large”.
  • Step S503: Determine, under the target category, the sentiment tendency of each feature in the feature set corresponding to each short text, and the positive sentiment degree and negative sentiment degree of each feature; use each feature under the target category, together with its corresponding sentiment tendency, positive sentiment degree, and negative sentiment degree, as input parameters of the target sentiment estimation model.
  • When a short text corresponds to a positive emotion, each feature in its feature set is determined to correspond to a positive emotion; when a short text corresponds to a negative emotion, each feature in its feature set is determined to correspond to a negative emotion.
  • Step S504: Perform training according to a preset classifier model to obtain the target sentiment estimation model.
  • The preset classifier model may be a maximum entropy model, a support vector machine, a neural network algorithm, or the like. The training itself uses existing techniques and is not repeated here.
  • FIG. 5 shows the process of constructing a sentiment estimation model for a single category, and
  • FIG. 4 shows the process of constructing a sentiment estimation model for all categories. The processing steps of the two are similar; therefore, for the execution process of the embodiment of FIG. 5, reference may be made to the specific implementation of FIG. 4, and details are not described herein again.
  • In the second implementation, each category corresponds to one sentiment estimation model. Therefore, in order to avoid confusion, after constructing the sentiment estimation models, the processing device also constructs the correspondence between each sentiment estimation model and its category identifier, so that the processor can subsequently determine the sentiment estimation model corresponding to each category accurately.
  • In the third implementation, a sentiment estimation model corresponding to two or more categories may be included, and/or a sentiment estimation model corresponding to a single category.
  • For the construction process of a sentiment estimation model corresponding to two or more categories, reference may be made to the embodiment shown in FIG. 4;
  • for a sentiment estimation model corresponding to a single category, reference may be made to the embodiment shown in FIG. 5, and details are not described herein again.
  • In the system shown in FIG. 2a, after constructing the sentiment estimation model, the processor 200 can directly use the sentiment estimation model to determine the sentiment tendency of the short text to be processed.
  • In the system shown in FIG. 2b, the model building device 300 transmits the sentiment estimation model to the processor 200, so that the processor 200 determines the sentiment tendency of the short text to be processed using the sentiment estimation model.
  • The process by which the processor 200 determines the sentiment tendency of the short text to be processed based on the sentiment estimation model is described below. Since the sentiment estimation model has three different implementations, the execution process of the processor 200 differs accordingly; the recognition process is therefore described separately for each implementation of the sentiment estimation model.
  • For the first implementation, the processor 200 determines the sentiment tendency of the short text to be processed in the following manner.
  • a method for identifying an emotional tendency specifically includes the following steps:
  • Step S701 Determine a feature set corresponding to the short text to be processed, where each feature in the feature set includes: a word segmentation of the short text to be processed and a category identifier to which the to-be-processed text belongs.
  • If the first mode was used to determine feature sets when constructing the sentiment estimation model, the first mode is also used in this step to determine the feature set of the short text to be processed.
  • a first implementation manner of determining a feature set corresponding to a short text to be processed includes the following steps:
  • Step S801: Acquire the category identifier corresponding to the short text to be processed, and the word segmentation result obtained after performing the word segmentation operation on the short text to be processed.
  • Step S802: Combine each word segment in the word segmentation result with the category identifier to obtain each feature.
  • Step S803 performing n-ary combination on the respective features to obtain a plurality of combined features.
  • Step S804 Determine a set of each feature and a plurality of combined features as a feature set of the short text to be processed.
  • The execution process of FIG. 8a can refer to the execution process of FIG. 4b, and details are not described herein again.
  • If the second mode was used to determine feature sets when constructing the sentiment estimation model, the second mode is also used in this step to determine the feature set of the short text to be processed.
  • a second implementation manner of determining a feature set corresponding to the short text to be processed includes the following steps:
  • Step S811: Acquire the category identifier corresponding to the short text to be processed, and the word segmentation result obtained after performing the word segmentation operation on the short text to be processed.
  • Step S812: Combine each word segment in the word segmentation result with the category identifier to obtain each feature.
  • Step S813 Determine a set of each feature as a feature set of the short text to be processed.
  • The execution process of FIG. 8b can refer to the execution process of FIG. 4c, and details are not described herein again.
  • Step S702: Perform sentiment degree estimation on the short text to be processed according to the pre-trained sentiment estimation model, in combination with the feature set of the short text to be processed; wherein the sentiment estimation model includes: a model that outputs a positive sentiment degree and a negative sentiment degree, obtained by training a number of short text samples with sentiment tendencies under at least two categories.
  • the processor inputs the feature set to the sentiment estimation model, and the positive sentiment degree and the negative sentiment corresponding to the feature set are output after being estimated by the sentiment estimation model.
  • Step S703 Determine an emotional tendency corresponding to the short text to be processed based on the positive emotion degree and the negative emotion degree corresponding to the short text to be processed.
  • the sentiment tendency corresponding to the short text to be processed may also be outputted for use in other aspects.
  • After step S702 estimates the positive sentiment degree (the degree to which the short text to be processed belongs to positive emotion) and the negative sentiment degree (the degree to which it belongs to negative emotion), the two degrees may be compared in order to further determine the sentiment tendency of the short text to be processed. If the positive sentiment degree is greater than the negative sentiment degree, it is determined that the short text to be processed corresponds to a positive emotion; if the negative sentiment degree is greater than the positive sentiment degree, it is determined that the short text to be processed corresponds to a negative emotion.
  • In some cases, however, the positive sentiment degree and the negative sentiment degree are not much different.
  • For example, the probability value of positive emotion may be 0.51
  • and the probability value of negative emotion may be 0.49. Understandably, since the two values are very close, it is theoretically impossible to accurately determine
  • the sentiment tendency of the short text to be processed. If, in this case, the sentiment tendency of the short text to be processed is still determined in the above manner, an error may occur.
  • Therefore, the present application provides the following way to determine the sentiment tendency of the short text to be processed.
  • Step S901: Determine the greater of the positive sentiment degree and the negative sentiment degree.
  • Step S902 Determine whether the greater sentiment is greater than a pre-set confidence.
  • The preset confidence measures whether the greater sentiment degree is reliable enough. The greater sentiment degree is then compared against the preset confidence.
  • Step S903 If the greater sentiment degree is greater than the pre-set confidence, it is determined that the sentiment tendency corresponding to the to-be-processed short text is consistent with the sentiment tendency of the greater sentiment.
  • If the greater sentiment degree is greater than the preset confidence, the greater sentiment degree is considered sufficiently reliable. Therefore, the sentiment tendency of the short text to be processed can be accurately determined: it is consistent with the sentiment tendency of the greater sentiment degree.
  • If the greater sentiment degree is the positive sentiment degree, it is determined that the short text to be processed corresponds to a positive emotion; if the greater sentiment degree is the negative sentiment degree, it is determined that the short text to be processed corresponds to a negative emotion.
  • Step S904: If the greater sentiment degree is not greater than the preset confidence, perform other processing to determine the sentiment tendency of the short text to be processed.
  • If the greater sentiment degree is not greater than the preset confidence, the greater sentiment degree is considered insufficiently reliable, and the sentiment tendency of the short text to be processed cannot be accurately determined. For example, if the greater sentiment degree is 0.55 and the preset confidence is 0.7, the sentiment tendency of the short text to be processed cannot be accurately determined.
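  • A sketch of steps S901-S904; the 0.7 threshold follows the example above, and the return value for the uncertain case is an assumption.

```python
def decide_tendency(positive_degree, negative_degree, confidence=0.7):
    """Steps S901-S904 (sketch): take the greater degree and check it against the
    preset confidence before committing to a sentiment tendency."""
    greater = max(positive_degree, negative_degree)               # S901
    if greater > confidence:                                      # S902-S903
        return "positive" if positive_degree >= negative_degree else "negative"
    return "undetermined"   # S904: fall back to other processing (assumed label)

print(decide_tendency(0.51, 0.49))   # -> 'undetermined' (degrees too close)
print(decide_tendency(0.85, 0.15))   # -> 'positive'
```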
  • A receiving device (not shown) connected to the processor may also be included in the systems shown in Figures 2a and 2b. After the processor determines the sentiment tendency of the short text to be processed, the processor is further configured to output that sentiment tendency; the receiving device is configured to receive the sentiment tendency of the short text to be processed, so that the receiving device can make use of it.
  • For the second implementation, the processor 200 determines the sentiment tendency of the short text to be processed in the following manner.
  • a method for identifying an emotional tendency according to the present application specifically includes the following steps:
  • Step S1001 Determine a feature set and a category identifier corresponding to the short text to be processed.
  • If the first mode was used to determine feature sets when constructing the sentiment estimation model, the first mode is also used in this step to determine the feature set of the short text to be processed.
  • Step 1101 Acquire a word segmentation result obtained after performing the word segmentation operation on the short text to be processed.
  • Step 1102: Perform n-gram combination on the word segments by using an n-gram language model to obtain a plurality of combined word segments.
  • Step 1103: Determine the set of each word segment and the plurality of combined word segments as the feature set of the short text to be processed, with one word segment corresponding to one feature.
  • the execution process of FIG. 11a is similar to the execution process of FIG. 6a.
  • For the specific implementation process refer to the execution process of FIG. 6a, and details are not described herein again.
  • If the second mode was used to determine feature sets in the process of constructing the sentiment estimation model,
  • the second mode is also used here to determine the feature set of the short text to be processed.
  • Step 1111: Acquire the word segmentation result obtained after performing the word segmentation operation on the short text to be processed.
  • Step 1112 Determine the word segmentation result as a feature set of the short text to be processed, and one word segment corresponds to one feature.
  • the execution process in FIG. 11b is similar to the execution process in FIG. 6b.
  • For the specific execution process, refer to the execution process of FIG. 6b, and details are not described herein again.
  • Step S1002: Perform sentiment degree estimation on the short text to be processed based on the sentiment estimation model corresponding to the category identifier, in combination with the feature set of the short text to be processed; wherein the sentiment estimation model is a model that outputs a positive sentiment degree and a negative sentiment degree, obtained after training according to the feature sets of a plurality of short text samples corresponding to the category identifier.
  • a plurality of sentiment estimation models may be searched according to the category identifier, thereby determining an emotion estimation model corresponding to the category identifier.
  • the processor inputs the feature set to the sentiment estimation model, and the positive sentiment degree and the negative sentiment corresponding to the feature set are output after being estimated by the sentiment estimation model.
  • Step S1003 Determine an emotional tendency corresponding to the short text to be processed based on the positive emotion degree and the negative emotion degree corresponding to the short text to be processed.
  • the execution process of this step is the same as the execution process of step 703 of FIG. 7, and details are not described herein again.
  • a receiving device (not shown) connected to the processor may also be included.
  • after the processor determines the sentiment tendency corresponding to the short text to be processed,
  • the processor is further configured to output an emotional tendency of the to-be-processed text
  • the receiving device is configured to receive an emotional tendency of the to-be-processed text.
  • For the third implementation, the processor 200 pre-stores the correspondence between category identifiers and sentiment estimation models, and also pre-stores the correspondence between each category identifier and the manner in which its sentiment estimation model was constructed.
  • When the processor 200 receives a category identifier, it first determines the construction manner of the sentiment estimation model corresponding to that category identifier;
  • if the sentiment estimation model is constructed by using the first implementation manner, the sentiment tendency of the short text to be processed is determined adaptively according to the process corresponding to FIG. 4; that is, a feature set corresponding to the short text to be processed is determined, wherein each feature in the feature set includes a word segment of the short text to be processed and the category identifier to which the short text to be processed belongs; sentiment degree estimation is performed on the short text to be processed according to the pre-trained sentiment estimation model, in combination with the feature set of the short text to be processed, wherein the sentiment estimation model includes a model that outputs a positive sentiment degree and a negative sentiment degree, obtained after training a number of short text samples with sentiment tendencies under at least two categories; and the sentiment tendency corresponding to the short text to be processed is determined based on the positive sentiment degree and the negative sentiment degree corresponding to the short text to be processed.
  • If the sentiment estimation model is constructed by using the second implementation manner, the sentiment tendency of the short text to be processed is determined adaptively according to the process corresponding to FIG. 5; that is, a feature set corresponding to the short text to be processed is determined, wherein each feature in the feature set includes a word segment of the short text to be processed; sentiment degree estimation is performed on the short text to be processed according to the sentiment estimation model corresponding to the category identifier, in combination with the feature set of the short text to be processed, wherein the sentiment estimation model is a model that outputs a positive sentiment degree and a negative sentiment degree, obtained after training a number of short text samples with sentiment tendencies corresponding to the category identifier; and the sentiment tendency corresponding to the short text to be processed is determined based on the positive sentiment degree and the negative sentiment degree corresponding to the short text to be processed.
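  • A sketch of the dispatch in the third implementation; the mapping names are assumptions, `build_feature_set` is the hypothetical helper from the earlier sketch, and the models are assumed to expose a scikit-learn-style `predict_proba` over feature dictionaries.

```python
def estimate_degrees(word_segments, category_id, shared_model,
                     model_by_category, construction_mode_by_category):
    """Third implementation (sketch): choose the recognition flow by how the model
    for this category identifier was constructed."""
    mode = construction_mode_by_category[category_id]
    if mode == "shared":
        # First implementation: features carry the category identifier (FIG. 7 flow).
        features = build_feature_set(word_segments, category_id)
        model = shared_model
    else:
        # Second implementation: one model per category, plain word-segment features
        # (the simpler variant without combined features, cf. FIG. 6b / FIG. 11b).
        features = list(word_segments)
        model = model_by_category[category_id]
    proba = model.predict_proba([{f: 1 for f in features}])[0]
    classes = list(model.classes_)   # map probabilities back to the training labels
    return {"positive": proba[classes.index("positive")],
            "negative": proba[classes.index("negative")]}
```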
  • From FIG. 7 and FIG. 10, it can be seen that the present application has the following beneficial effects:
  • the present application provides a method for identifying an emotional tendency.
  • The method uses a plurality of short texts with sentiment tendencies for training and obtains a sentiment estimation model. Since each feature set contains the word segments of the short text and category identifiers, the sentiment estimation model of the present application fully considers the category to which the short text belongs. Therefore, the positive sentiment degree and the negative sentiment degree of the short text to be processed determined based on the sentiment estimation model are more accurate than in the prior art, and the sentiment tendency determined from them is also more accurate.
  • the maximum entropy model is taken as an example to describe the training process of constructing the sentiment estimation model in this application:
  • Matrix A contains each feature and the positive sentiment degree and negative sentiment degree corresponding to each feature.
  • Matrix B contains the two classification results: positive emotion and negative emotion.
  • a is used to denote a feature, and b is used to denote its sentiment tendency.
  • f_i(a, b) indicates the co-occurrence of the feature a and the sentiment tendency b.
  • p(b) denotes the probability that the sentiment tendency corresponding to a short text in the training samples is b, and
  • p(a|b) denotes the conditional probability of the feature a on the premise that the sentiment tendency of the short text is b.
  • The expectation of f_i(a, b) over the training samples should be consistent with the expectation of f_i(a, b) under the model.
  • The Lagrange multiplier method is used to find the optimal solution of the objective equation (2) under the constraint condition of formula (4).
  • The optimal solution is as follows:
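  • The objective (2), the constraint (4), and the closed form of the optimal solution are not reproduced in this text; the following is a standard maximum entropy formulation consistent with the surrounding description (a reconstruction, so the notation of the original filing may differ):

```latex
% Expectation constraint: the sample expectation of f_i(a,b) equals the model expectation.
\sum_{a,b} \tilde{P}(a,b)\, f_i(a,b) \;=\; \sum_{a,b} \tilde{P}(a)\, P_w(b \mid a)\, f_i(a,b)

% Optimal solution obtained with the Lagrange multiplier method:
P_w(b \mid a) \;=\; \frac{1}{Z_w(a)} \exp\Big(\sum_i w_i\, f_i(a,b)\Big),
\qquad
Z_w(a) \;=\; \sum_{b} \exp\Big(\sum_i w_i\, f_i(a,b)\Big)
```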
  • where w_i is the weight of the feature f_i.
  • the present application provides an object classification method.
  • the object can be classified by directly using the sentiment tendency of the short text of the object to be processed. Specifically, the following steps are included:
  • Step S1201 Determine short text information of the object to be processed, wherein the short text information includes an emotional tendency of the short text.
  • The processor can divide the object to be processed into a plurality of short texts by punctuation marks, and the sentiment tendency of each short text can be determined according to the process provided in FIG. 7 or FIG. 10 of the present application, so that the sentiment tendency of every short text in the object to be processed can be determined.
  • the short text information may further include: the number of short texts belonging to positive emotions among the objects to be processed, the number of short texts belonging to negative emotions, the proportion of positive short texts, the proportion of negative short texts, and the like.
  • Step S1202: Perform category identification on the short text information according to a pre-trained category recognition model; wherein the category recognition model is: a classifier for the first category and the second category, trained according to the short text information of a plurality of objects.
  • The category recognition model is obtained by training on the short text information of a plurality of objects in advance, yielding a classifier for the first category and the second category.
  • the short text information of several objects can be trained by using a maximum entropy model, a neural network algorithm, or a support vector machine to obtain a category recognition model.
  • the related technical means can adopt the training method in the prior art, and details are not described herein again.
  • The short text information of the object to be processed is input to the category recognition model, and after processing by the category recognition model, the category of the object to be processed can be determined.
  • In practice, an object may include an image in addition to text. Taking user evaluations in an e-commerce system as an example, a user evaluation may contain an image of the product in addition to the text (the textual user evaluation).
  • It can be understood that the object category determined from the short text information alone is inaccurate, because the image feature information of the object is not taken into consideration; similarly, the object category determined from the image feature information alone is also inaccurate, because the short text information of the object is not taken into account. Therefore, in this embodiment, the short text information and the image feature information are combined and used together to determine the object category, thereby improving the accuracy of the object category.
  • the present application further provides an object classification method, in which a plurality of features of an object to be processed are used to classify objects. As shown in FIG. 13, the following steps are specifically included:
  • Step S1301 Determine feature information corresponding to the object to be processed; wherein the feature information includes short text information and image feature information, and the short text information includes an emotional tendency of the short text.
  • The processor can divide the object to be processed into a plurality of short texts by using punctuation marks, and the sentiment tendency of each short text can be determined according to the process provided in FIG. 7 or FIG. 10 of the present application, so that the sentiment tendency of each short text in the object to be processed can be determined.
  • The short text information may further include: the number of short texts in the object to be processed that belong to positive sentiment, the number of short texts that belong to negative sentiment, the proportion of positive short texts, the proportion of negative short texts, and so on.
  • the processor can process the image to obtain image feature information.
  • The image feature information may include one or more of the following image features: the image width, the image height, the number of faces in the image, the number of sub-images contained in the image, whether the background of the image is a solid color, the proportion of the image occupied by text regions, the number of dominant colors in the salient region of the image, the number of dominant colors of the image, the image "psoriasis" score (a measure of spam-like overlays and watermarks), the quality score of the image subject, the probability score that the image shows a mannequin, the probability score that the image shows a real model, the probability score that the image shows product details, and so on.
  • Step S1302: Perform category recognition on the feature information according to the pre-trained category recognition model, where the category recognition model is a classifier for the first category and the second category, trained on the feature information of a plurality of objects.
  • The category recognition model is a classifier that outputs the first category and the second category, obtained by training in advance on the short text information and the image feature information of a plurality of objects.
  • the short text information of several objects can be trained by using a maximum entropy model, a neural network algorithm, or a support vector machine to obtain a category recognition model.
  • the related technical means can adopt the training method in the prior art, and details are not described herein again.
  • the short text of the object to be processed is sent to the category recognition model, thereby determining the category of the object to be processed.
  • It can be understood that the more kinds of features the feature information of the object to be processed contains, the more accurate the final result. Therefore, to further improve the accuracy of the category of the object to be processed, the feature information may further include: feature information of the first subject to which the object to be processed is attached; and/or feature information of the second subject to which the object to be processed is attached.
  • Other feature information may of course also be included; it is not enumerated one by one here.
  • For example, taking user evaluations as the objects, the feature information of the first subject to which the object to be processed is attached is specifically the feature information of the seller (the first subject) to which the product belongs, for example, the seller's credit rating, the seller's sales volume, and so on.
  • The feature information of the second subject to which the object to be processed is attached is specifically the feature information of the buyer (the second subject) to which the product belongs, for example, the buyer's credit rating, the number of non-default user evaluations the buyer has posted, the number of user evaluations with images the buyer has posted, and the proportion of the buyer's user evaluations that contain images.
  • After the short text information, the image feature information, and the other feature information are added, the feature information of the object contains multiple kinds of features.
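  • A sketch of merging the different kinds of feature information into a single numeric feature vector is given below; all field names and the flattening order are illustrative assumptions rather than a prescribed format.

```python
def build_feature_info(short_text_info, image_features, seller_features, buyer_features):
    """Flatten the different kinds of feature information into one numeric
    feature vector for the category recognition model."""
    vector = [
        short_text_info["positive_count"],
        short_text_info["negative_count"],
        short_text_info["positive_ratio"],
        image_features["width"],
        image_features["height"],
        image_features["face_count"],
        float(image_features["solid_background"]),
        seller_features["credit_level"],
        seller_features["sales_volume"],
        buyer_features["credit_level"],
        buyer_features["reviews_with_images"],
    ]
    return vector

example = build_feature_info(
    {"positive_count": 2, "negative_count": 0, "positive_ratio": 1.0},
    {"width": 800, "height": 800, "face_count": 1, "solid_background": True},
    {"credit_level": 4, "sales_volume": 1500},
    {"credit_level": 3, "reviews_with_images": 12},
)
print(example)
```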
  • In order to take the multiple kinds of feature information into account together, this embodiment proposes using a gradient boosting decision tree model to train the training samples and thereby obtain the category recognition model.
  • The gradient boosting decision tree model is a boosting method that uses decision trees as base learners.
  • The gradient boosting decision tree model includes multiple decision trees. Multiple trees are used because a single decision tree tends to overfit if it splits too much and thus loses generalization ability, while too few splits lead to insufficient learning.
  • The training process of the gradient boosting decision tree model is, briefly, as follows. First, an initial value F_0 is estimated; the initial value F_0 may be a random value or may be equal to 0, and the specific value may be determined according to the actual situation, which is not limited herein.
  • Then M iterations are performed to obtain M decision trees: (a) the estimates of the feature information of all training samples are updated using the previous gradient boosting decision tree; (b) a subset of samples is randomly selected from all training samples as the training samples for building the current tree; (c) based on the features contained in the samples, the information gain of each feature is computed, the feature with the largest information gain is chosen for the first split (the left branch representing the first category and the right branch the second category), the current gradient is computed, and the feature values of the samples are re-estimated in combination with the gradient; this step is repeated J times to obtain a decision tree with J levels of leaf nodes; (d) for each decision tree obtained, the accuracy of the training samples on that tree is computed and used as the weight of that tree.
  • The M decision trees are then linearly combined to obtain the final gradient boosting decision tree model, which can be expressed as an additive model over the decision trees:
  •   F(X) = F_0 + β_1·T_1(X) + β_2·T_2(X) + ... + β_i·T_i(X) + ... + β_M·T_M(X) ...... formula (6)
  • where F_0 is the initial value, T_i(X) represents the degree of matching between the feature information of the object to be processed and the i-th decision tree, β_i represents the weight of the i-th decision tree, and M represents the total number of decision trees.
  • By using multiple decision trees, the gradient boosting decision tree model aims to achieve good results in both training precision and generalization ability.
  • The gradient boosting decision tree model is a boosting algorithm and naturally embodies the idea of boosting: a series of weak classifiers are combined to form a strong classifier. Each decision tree is not required to learn too much; each tree learns a little, and the knowledge learned by all the decision trees is accumulated to form a powerful model.
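  • The following sketch trains such an additive tree ensemble with scikit-learn's GradientBoostingClassifier as an off-the-shelf stand-in for the gradient boosting decision tree model described above; the feature vectors, labels, and hyper-parameter values are toy assumptions.

```python
from sklearn.ensemble import GradientBoostingClassifier

# Toy feature vectors produced as above; labels: 1 = first category, 2 = second category
X_train = [
    [3, 0, 1.0, 800, 800, 1, 1.0, 4, 1500, 3, 12],
    [0, 3, 0.0, 200, 200, 0, 0.0, 1,   10, 1,  0],
    [2, 1, 0.7, 640, 480, 0, 1.0, 3,  300, 2,  5],
    [1, 2, 0.3, 320, 240, 0, 0.0, 2,   50, 1,  1],
]
y_train = [1, 2, 1, 2]

# M decision trees combined additively, each contributing a small correction
gbdt = GradientBoostingClassifier(n_estimators=50,   # M, the number of trees
                                  learning_rate=0.1,
                                  max_depth=3)       # J levels per tree
gbdt.fit(X_train, y_train)

print(gbdt.predict([[2, 0, 1.0, 800, 600, 1, 1.0, 4, 900, 2, 8]]))
```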
  • the application further provides an object classification method, as shown in FIG. 14 , which specifically includes the following steps:
  • Step S1401 Determine feature information corresponding to the object to be processed.
  • The feature information includes the short text information, the image feature information, the feature information of the first subject to which the object to be processed is attached, and the feature information of the second subject to which the object to be processed is attached.
  • the short text information includes an emotional tendency of short text.
  • Taking user evaluations as the objects, this step may be: determining the feature information of the user evaluation to be processed, where the feature information includes the text feature information of the user evaluation, the image feature information of the user evaluation, the feature information of the seller, and the feature information of the buyer, and the text feature information includes the sentiment tendency of the short texts.
  • Step S1402: Match the feature information against the pre-trained gradient boosting decision tree model.
  • Continuing with user evaluations as the objects, this step performs category recognition on the feature information of the user evaluation to be processed according to the pre-trained gradient boosting decision tree model, where the category recognition model is a classifier for the first type of user evaluation and the second type of user evaluation, obtained by training on the feature information of several user evaluation samples.
  • this step includes the following steps:
  • Step S1501: Input the feature information into the category recognition model, that is, the gradient boosting decision tree model.
  • The gradient boosting decision tree model has M trees; the feature information is matched against each of the M trees to obtain the category determined by each tree.
  • Step S1502 Determine a first category matching degree and a second category matching degree corresponding to the to-be-processed object.
  • the first category matching degree and the second category matching degree are determined according to the above formula 6.
  • The first category matching degree is F_1(X) = F_0 + β_1·T_1(X) + β_2·T_2(X) + ... + β_i·T_i(X) + ... + β_M·T_M(X), where T_i(X) represents the degree of matching between the feature information and the i-th tree, and β_i represents the weight corresponding to that tree. If a tree determines that the feature information corresponds to the first category, its weight term is β_i; if a tree determines that the feature information corresponds to the second category, its weight term is 0.
  • The second category matching degree F_2(X) = F_0 + β_1·T_1(X) + β_2·T_2(X) + ... + β_i·T_i(X) + ... + β_M·T_M(X) is computed in the same way, except that if a tree determines that the feature information corresponds to the second category its weight term is β_i, and if it determines that the feature information corresponds to the first category its weight term is 0.
  • Step S1503 Compare the first category matching degree and the second category matching degree. If the first category matching degree is greater than the second category matching degree, the process proceeds to step S1504; if the second category matching degree is greater than the first category matching degree, the process proceeds to step S1505.
  • Step S1504 Determine that the category of the object to be processed is the first category.
  • this step is to determine the category of the user evaluation to be processed as the first category.
  • The first category here is high-quality user evaluations, so this step determines the category of the user evaluation to be processed as a high-quality user evaluation.
  • Step S1505 Determine that the category of the object to be processed is the second category.
  • this step is to determine the category of the user evaluation to be processed as the second category.
  • The second category here is low-quality user evaluations, so this step determines the category of the user evaluation to be processed as a low-quality user evaluation.
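  • Steps S1501 to S1505 can be sketched as follows, simplifying each tree's matching degree T_i(X) to a 0/1 category vote and using hypothetical per-tree weights (for example, the per-tree accuracies computed during training):

```python
def classify(tree_votes, tree_weights, f0=0.0):
    """Accumulate the first and second category matching degrees according to
    formula (6) from per-tree category votes, then pick the larger one."""
    f1 = f0  # first category matching degree
    f2 = f0  # second category matching degree
    for vote, beta in zip(tree_votes, tree_weights):
        if vote == 1:
            f1 += beta   # tree assigns the first category: contributes beta to F1
        else:
            f2 += beta   # tree assigns the second category: contributes beta to F2
    return (1 if f1 > f2 else 2), f1, f2

# Hypothetical votes of M = 5 trees and their weights
votes = [1, 1, 2, 1, 2]
weights = [0.8, 0.7, 0.6, 0.9, 0.5]
print(classify(votes, weights))   # -> (1, 2.4, 1.1)
```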
  • In some embodiments, after it is determined that the object to be processed belongs to the first category, the object to be processed is added to an object set, and the objects in the object set are sent out.
  • The object set can be used by other devices. During use, it can be screened again to determine a number of better object samples, which are then sent back to the processor so that the processor can retrain the category recognition model with these better object samples, making the category recognition model more accurate. That is, the processor may receive a plurality of object samples derived from the object set, add them to the existing object samples used to train the category recognition model, and retrain the category recognition model based on the updated existing object samples.
  • Taking user evaluations as the objects, the process is: after it is determined that the user evaluation to be processed is a first-type user evaluation, the user evaluation to be processed is added to the first-type user evaluation set, and the first-type user evaluation set is sent out.
  • The first-type user evaluation set can then be used, and better user evaluations can be identified within it during use.
  • These better user evaluations can then be sent to the processing device so that the processing device retrains the category recognition model; in this way the system forms a closed loop.
  • That is, the processor receives a plurality of first-type user evaluations derived from the first-type user evaluation set, adds them to the existing user evaluation samples of the category recognition model, and retrains the category recognition model based on the updated existing user evaluation samples.
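  • A minimal sketch of this closed-loop update, assuming the curated evaluations are all labelled as the first category and the model is simply refit from scratch, might look like this:

```python
from sklearn.linear_model import LogisticRegression

def retrain_with_feedback(model_factory, existing_X, existing_y, curated_X):
    """Add curated first-category samples to the existing training samples
    and retrain the category recognition model on the updated sample set."""
    X = existing_X + curated_X
    y = existing_y + [1] * len(curated_X)   # 1 = first category
    return model_factory().fit(X, y)

# Example with the logistic-regression stand-in used earlier
model = retrain_with_feedback(LogisticRegression,
                              [[3, 0, 1.0, 0.0], [0, 3, 0.0, 1.0]], [1, 2],
                              [[2, 1, 0.67, 0.33]])
print(model.predict([[2, 0, 1.0, 0.0]]))
```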
  • an object classification system including:
  • the data providing device 100 is configured to send a plurality of objects.
  • The processor 200 is configured to receive the plurality of objects sent by the data providing device, train on the feature information of the plurality of objects to obtain a category recognition model that outputs the first category and the second category, and determine the feature information of the object to be processed.
  • The feature information includes text feature information and image feature information, and the text feature information includes the sentiment tendency of the short texts. The processor 200 performs category recognition on the feature information of the object to be processed according to the category recognition model, and is also used to output objects of the first category.
  • the data receiving device 400 is configured to receive and use the object of the first category.
  • In the course of using the objects, the data receiving device 400 may again determine a number of better object samples through screening and send these object samples back to the processor 200, so that the processor retrains the category recognition model with the better object samples, making the category recognition model more accurate.
  • an object classification system including:
  • the data providing device 100 is configured to send a plurality of objects.
  • The model construction device 300 is configured to receive the plurality of objects sent by the data providing device, train on the feature information of the plurality of objects to obtain a category recognition model that outputs the first category and the second category, and send the category recognition model.
  • The processor 200 is configured to receive the category recognition model and determine the feature information of the object to be processed, where the feature information includes text feature information and image feature information and the text feature information includes the sentiment tendency of the short texts; to perform category recognition on the feature information of the object to be processed according to the category recognition model; and to output objects of the first category.
  • the data receiving device 400 is configured to receive and use the object of the first category.
  • In the course of using the objects, the data receiving device 400 may again determine a number of better object samples through screening and send these object samples back to the processor 200, so that the processor retrains the category recognition model with the better object samples, making the category recognition model more accurate.
  • Short-text-based recognition techniques are relatively easy to implement, but they have a limitation: they do not pay attention to the image information published by buyers in user evaluations. In actual scenarios, such as apparel, users care not only about the text description in a user evaluation but also about the real appearance of the product, that is, the image feature information.
  • the recognition technique based on image features is effective, but it also has certain limitations.
  • The image-feature-based high-quality user evaluation recognition technology uses only the image information in a user evaluation for recognition and does not consider the buyer's experience after the purchase, that is, the short text information. It can therefore be seen that the short text information and the image feature information in a user evaluation are equally important.
  • In addition, the applicant has found that other features can also help determine high-quality user evaluations, for example, seller features and buyer features. Therefore, in this embodiment, all of the above features are used as the basis for determining whether a user evaluation is a high-quality user evaluation or a low-quality user evaluation.
  • To this end, this embodiment proposes a machine learning method based on the fusion of multiple features, namely the gradient boosting decision tree model, to train a number of training samples and thereby obtain the category recognition model.
  • FIG. 18 provides a flow chart of the present application for determining high-quality user evaluations.
  • The entire process of determining high-quality user evaluations can be seen clearly from the figure. It is mainly composed of the following parts: building the user evaluation library, determining the high-quality user evaluation set, using the high-quality user evaluation set, and iteratively updating the high-quality user evaluation recognition model.
  • A large number of user evaluations are obtained from the user evaluation server, and preprocessing rules are first used to filter out some low-quality user evaluations. The preprocessing rules can be requirements that the images and text of a high-quality user evaluation must satisfy; that is, the short texts and a small number of dimensions of the image features are used to filter the large number of user evaluations.
  • Specifically, the short texts in a high-quality user evaluation cannot all be negative; on this basis, if all the short texts in a user evaluation correspond to negative sentiment, the evaluation is determined not to be a high-quality user evaluation.
  • There are also basic requirements for the images in high-quality user evaluations: the resolution of the image reaches a preset resolution, the image is not a screenshot of a conversation, the proportion of obvious advertising slogans and watermarks in the image is less than a preset value, and so on.
  • User evaluations in the user evaluation server that satisfy the above short text requirements and image feature requirements are placed into the user evaluation library. User evaluations that do not satisfy the short text requirements and image feature requirements are judged to be non-high-quality user evaluations and are not placed into the user evaluation library.
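  • A sketch of such a preprocessing filter is shown below; the resolution and watermark thresholds are illustrative placeholders for the preset values.

```python
def passes_preprocessing(short_text_sentiments, images):
    """Pre-filter sketch: a user evaluation enters the user evaluation library
    only if not every short text is negative and at least one image meets the
    basic requirements."""
    if short_text_sentiments and all(s == "negative" for s in short_text_sentiments):
        return False                      # all short texts negative -> not high quality
    for img in images:
        if (img["width"] >= 500 and img["height"] >= 500    # preset resolution
                and not img["is_chat_screenshot"]
                and img["ad_watermark_ratio"] < 0.05):      # preset value
            return True
    return False

evaluation = {"sentiments": ["positive", "negative"],
              "images": [{"width": 800, "height": 800,
                          "is_chat_screenshot": False,
                          "ad_watermark_ratio": 0.01}]}
print(passes_preprocessing(evaluation["sentiments"], evaluation["images"]))
```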
  • Filtering with the preprocessing rules screens out some non-high-quality user evaluations, which not only reduces the number of times the high-quality user evaluation recognition model has to be applied, but also effectively filters out non-high-quality user evaluations and improves the prediction accuracy of the high-quality user evaluation recognition model.
  • the user evaluation in the user evaluation library is identified by the high-quality user evaluation recognition model, and if the recognition result is a high-quality user evaluation, it is placed in the high-quality user evaluation set.
  • the data receiving device can obtain high-quality user evaluation from the high-quality user evaluation set and use the high-quality evaluation in the actual application process.
  • In the course of using the high-quality user evaluations in the set, the data receiving device re-screens them according to preset criteria, thereby selecting the high-quality user evaluations that meet the preset criteria.
  • The high-quality user evaluations that meet the preset criteria are then sent to the processor or the model construction device, so that the processor or model construction device can iteratively update the high-quality user evaluation recognition model.
  • The high-quality user evaluation recognition model is retrained with the high-quality user evaluations that meet the preset criteria, so that the model can output high-quality user evaluations that satisfy user needs as far as possible.
  • Since the high-quality user evaluations selected from the high-quality user evaluation set all meet the preset rules of the seller or the operations personnel, these high-quality user evaluations are added back into the user evaluation library and used to update and optimize the high-quality user evaluation recognition model, so that the model better identifies high-quality user evaluations that meet user expectations.
  • Based on the above process, users no longer need to screen the original user evaluation library one by one; they only need to pick from the high-quality user evaluation set to quickly obtain the desired high-quality user evaluations, which effectively reduces labor costs.
  • At the same time, the high-quality user evaluation recognition model can be effectively and iteratively updated using the high-quality user evaluations provided by merchants, thereby further identifying high-quality user evaluations that meet merchant expectations.
  • The functions described in the methods of this embodiment, if implemented in the form of software functional units and sold or used as standalone products, can be stored in a computing-device-readable storage medium. Based on such understanding, the portion of the embodiments of the present application that contributes to the prior art, or a portion of the technical solution, may be embodied in the form of a software product stored in a storage medium and including a number of instructions for causing a computing device (which may be a personal computer, a server, a mobile computing device, a network device, or the like) to perform all or part of the steps of the methods described in the various embodiments of the present application.
  • The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.


Abstract

The present application provides a sentiment tendency recognition method, an object classification method, and a data processing system. The sentiment estimation model constructed in the sentiment tendency recognition method of the present application fully considers the category to which a short text belongs, so the sentiment tendency determined based on the sentiment estimation model is more accurate. In addition, in the object classification method provided by the present application, the text feature information, image feature information, and other feature information of an object are used together as the basis for classifying the object, so the object classification method provided by the present application can take the text feature information, image feature information, and other feature information into account, which improves classification accuracy.

Description

情感倾向的识别方法、对象分类方法及数据处理*** 技术领域
本申请涉及数据处理技术领域,尤其涉及情感倾向的识别方法、对象分类方法及数据处理***。
背景技术
目前,在很多技术领域都涉及对对象进行分类的问题,通常而言,依据对象的文本对对象进行分类,将对象分为两个类别:第一类别或第二类别。在对象的文本中,按标点符号可以将文本为多个短文本。
由于汉字的词义丰富,在不同的语境下相同的短文本可能对应不同的类别。例如,以对象为衣服用户评价文本为例,第一条用户评价为“衣服颜色暗淡,刚好”,第二条用户评价为“衣服颜色暗淡,不鲜亮”。上述两个对象具有相同的短文本“衣服颜色暗淡”。若按文本进行分类,则会将两个短文本归为一类,可是两者理应对应不同的类别。
可以看出在不同语境中,第一条用户评价中的“衣服颜色暗淡”对应正面情感,理应分为第一类别;第二条用户评价中的“衣服颜色暗淡”对应负面情感,理应分为第二类别。因此,目前通常利用短文本对应的情感倾向来确定对象的类别。
为了确定短文本的情感倾向,传统方式通常为人工查看并确定短文本的情感倾向。虽然人工标注确定短文本的情感倾向的准确率较高,但是效率较低,无法适用于批量短文本的处理。
发明内容
本申请的申请人在研究过程中发现:可以利用处理器自动识别短文本的情感倾向。具体实现过程可以为:
在处理器具体执行之前,先构建情感词库。情感词库包含很多正面词汇,例如,“衣服”“屏幕大”“漂亮”、“快速”、“合适”、“美丽”等,情感词库也包含很多负面词汇,例如,“衣服”“难看”、“慢速”、“屏幕小”等。
为了对待处理对象进行处理,首先对待处理对象按标点符号进行切分,相邻两个标点符号之间为一个短文本,从而将待处理对象切分为若干个待处理短文本。例如,以“衣服很合适,老妈很喜欢”为例,按照标点符号切分后,可以获得两个短文本“衣服很合适”和“老妈很喜欢”。待处理对象的每个短文本,均为待处理短文本。
参见图1,为处理器确定待处理短文本的情感倾向的流程图,执行过程具体包括以下步骤:
步骤1:处理器对待处理短文本进行分词,获得分词结果。
按照预设分词规则,将待处理短文本分为若干词语,若干词语均为分词结果。
例如,以待处理短文本为“衣服很合适”为例,在分词后获得的结果为“衣服”、“很”和“合适”。以待处理短文本为“手机屏幕很大”,则分词后获得的分词结果为“手机”、“屏幕”、“很”和“大”。
由于对待处理短文本进行分词,并不是本申请关注的重点,在此不再详细描述预设分词规则的具体实现方式。
步骤2:将分词结果与情感词库,按情感匹配规则进行匹配。
步骤3:确定与待处理短文本对应的情感倾向。
将分词结果、与情感词库和情感规则进行匹配,若分词结果中分词均对应正面情感且不包含否定词,则确定短文本对应正面情感。若分词结果中情感词均对应负面情感且不包含否定词,则确定短文本对应负面情感。
处理器可以自动执行图1所示的过程,从而可以自动确定待处理短文本的情感倾向。但是,本申请申请人在研究过程中发现:虽然上述自动处理过程在一定程度上可以识别待处理短文本的情感倾向,但是,上述处理过程获得的待处理短文本的情感倾向可能不准确。
例如,以对象为淘宝网上的用户评价为例,由于淘宝网上有很多类目(例如服饰类目、电子设备类目、母婴类目等),各个类目的物品均有相应的用户评价。申请人在研究过程中发现:在不同类目下包含相同情感词的短文本可能对应不同的情感倾向。
比如,在电子设备类目下、一个短文本为“屏幕很大”,该短文本的情感倾向为正面情感。在服饰类目下、一个短文本为“衣服很大”,该短文本的情感倾向为负面情感。从上述举例可以看出,在两个不同类目下、两个短文本均有“很大”,所以两个短文本包含相同的情感词,但是这两个短文本却具有不同的情感倾向。
由于上述图1中处理器自动确定短文本的情感倾向的过程中,处理器针对所有对象均采用同样的处理方式,即现有的处理过程没有从对象类目的角度、来分别处理短文本的情感倾向,所以,现有技术中确定短文本的情感倾向不准确。
因此,本申请提供一种情感倾向的识别方法,以便可以准确确定待处理短文本的情感倾向。
为了实现上述目的,本申请提供了以下技术特征:
一种情感倾向的识别方法,包括:
确定待处理短文本对应类目标识;其中,一个文本相邻两个标点符号之间文字称为短文本;
确定与所述类目标识对应的情感度估测模型的实现方式;
若所述情感度估测模型的实现方式为所有类目对应一个情感度估测模型,则确定待处理短文本对应的特征集合;其中,所述特征集合中每个特征包括:所述待处理短文本的分词和所述待处理短文本所属的类目标识;依据预先训练的情感度估测模型,结合待处理短文本的特征集合,对待处理短文本进行情感度估测;其中,所述情感度估测模型包括:依据至少两种类目的、带有情感倾向的若干个短文本样本训练后得到的、输出正面情感度和负面情感度的模型;基于所述待处理短文本对应的正面情感度和负面情感度,确定所述待处理短文本对应的情感倾向;
若所述情感度估测模型的实现方式为一个类目对应一个情感度估测模型,确定待处理短文本对应的特征集合;其中,所述特征集合中每个特征包括:所述待处理短文本的分词;依据与所述类目标识对应的情感度估测模型,结合待处理短文本的特征集合,对待处理短文本进行情感度估测;其中,所述情感度估测模型为:依据所述类目标识对应的、带有情感倾向的若干个短文本样本训练后得到的、输出正面情感度和负面情感度的模型;基于所述待 处理短文本对应的正面情感度和负面情感度,确定所述待处理短文本对应的情感倾向。
优选的,在确定所述待处理短文本对应的情感倾向后,还包括:
输出所述待处理短文本对应的情感倾向。
一种情感倾向的识别方法,包括:
确定待处理短文本对应的特征集合;其中,一个文本相邻两个标点符号之间的文字称为短文本;所述特征集合中每个特征包括:所述待处理短文本的分词和所述待处理短文本所属的类目标识;
依据预先训练的情感度估测模型,结合待处理短文本的特征集合,对待处理短文本进行情感度估测;其中,所述情感度估测模型包括:依据至少两种类目的、带有情感倾向的若干个短文本样本训练后得到的、输出正面情感度和负面情感度的模型;
基于所述待处理短文本对应的正面情感度和负面情感度,确定所述待处理短文本对应的情感倾向。
优选的,所述确定待处理短文本对应的特征集合,包括:
获取所述待处理短文本对应的类目标识,以及所述待处理短文本执行分词操作后获得的分词结果;
将所述分词结果中的各个分词和所述类目标识进行组合,获得各个特征;
将各个特征的集合,确定为所述待处理短文本的特征集合。
优选的,所述确定待处理短文本对应的特征集合,包括:
获取所述待处理短文本对应的类目标识,以及所述待处理短文本执行分词操作后获得的分词结果;
将所述分词结果中的各个分词和所述类目标识进行组合,获得各个特征;
利用n元语言模型对所述各个特征进行特征组合,获得若干个组合特征;
将各个特征和所述若干个组合特征的集合,确定为所述待处理短文本的特征集合。
优选的,所述利用n元语言模型对所述各个特征进行特征组合,获得若干个组合特征,包括:
利用二元语言模型对所述各个特征进行特征组合,获得若干个组合特征。
优选的,所述依据预先训练的情感度估测模型,结合待处理短文本的特征集合,对待处理短文本进行情感度估测,包括:
将所述特征集合输入至所述情感度估测模型;
由所述情感度估测模型估算后、输出待处理短文本对应的正面情感度和负面情感度。
优选的,所述基于所述待处理短文本对应的正面情感度和负面情感度,确定所述待处理短文本对应的情感倾向,包括:
确定所述正面情感度和所述负面情感度两者中的较大情感度;
判断所述较大情感度是否大于预设置信度;
若所述较大情感度大于预设置信度,则确定所述待处理短文本对应的情感倾向与所述较大情感度的情感倾向一致。
优选的,所述情感度估测模型包括:
利用最大熵模型,依据至少两个类目标识对应的若干个短文本的特征集合训练后得到的、输出正面情感度和负面情感度的模型。
优选的,在确定所述待处理短文本对应的情感倾向后,还包括:
输出所述待处理短文本对应的情感倾向。
一种情感倾向的识别方法,包括:
确定待处理短文本对应的特征集合和类目标识;其中,一个文本相邻两个标点符号之间的文字称为短文本;所述特征集合中每个特征包括:所述待处理短文本的分词;
依据与所述类目标识对应的情感度估测模型,结合待处理短文本的特征集合,对待处理短文本进行情感度估测;其中,所述情感度估测模型为:依据所述类目标识对应的、带有情感倾向的若干个短文本样本训练后得到的、输出正面情感度和负面情感度的模型;
基于所述待处理短文本对应的正面情感度和负面情感度,确定所述待处理短文本对应的情感倾向。
优选的,所述确定待处理短文本对应的特征集合,包括:
获取所述待处理短文本执行分词操作后获得的分词结果;
利用n元语言模型对各个分词进行分词组合,获得若干个组合分词;
将各个分词和若干个组合分词的集合,确定为所述待处理短文本的特征集合,一个分词对应一个特征。
优选的,所述确定待处理短文本对应的特征集合,包括:
获取所述待处理短文本执行分词操作后获得的分词结果;
将所述分词结果,确定为所述待处理短文本的特征集合,一个分词对应一个特征。
优选的,在确定所述待处理短文本对应的情感倾向后,还包括:
输出所述待处理短文本对应的情感倾向。
一种情感倾向的识别***,包括:
数据提供设备,用于发送若干个对象;
处理器,用于接收所述数据提供设备送的若干个对象,依据若干个对象的短文本构建情感度估测模型,并利用情感度估测模型确定待处理短文本的情感倾向。
优选的,所述处理器,还用于构建情感度估测模型与对象所属的类目标识的对应关系。
优选的,所述***还包括接收设备;
所述处理器,还用于输出所述待处理文本的情感倾向;
所述接收设备,用于接收所述待处理文本的情感倾向。
一种情感倾向的识别***,包括:
数据提供设备,用于发送若干个对象;
模型构建设备,用于接收所述数据提供设备送的若干个对象,依据若干个对象的短文本构建情感度估测模型,并发送所述情感度估测模型;
处理器,用于接收所述情感度估测模型,并利用情感度估测模型确定待处理短文本的情感倾向。
优选的,所述模型构建设备,还用于构建情感度估测模型与对象所属的类目标识的对应关系,并将对应关系发送至所述处理器。
优选的,所述***还包括接收设备;
所述处理器,还用于输出所述待处理文本的情感倾向;
所述接收设备,用于接收所述待处理文本的情感倾向。
一种对象分类方法,包括:
确定待处理对象的特征信息;其中,所述特征信息包括文本特征信息和图像特征信息,并且,所述文本特征信息包括短文本的情感倾向;
依据预先训练的类别识别模型,对所述待处理对象的特征信息进行类别识别;其中,所述类别识别模型为:依据若干对象样本的特征信息训练后得到的、第一类别和第二类别的分类器。
优选的,所述特征信息还包括:
构建所述对象的第一主体的特征信息;和/或,
所述对象所附属于第二主体的特征信息。
优选的,所述依据预先训练的类别识别模型,对所述特征信息进行类别识别,包括:
将所述特征信息输入至所述类别识别模型;确定所述待处理对象对应的第一类别匹配度和第二类别匹配度;
对所述第一类别匹配度和第二类别匹配度进行比较;
若第一类别匹配度大于第二类别匹配度,则确定所述待处理对象的类别为第一类别;
若第二类别匹配度大于第一类别匹配度,则确定所述待处理对象的类别为第二类别。
优选的,还包括:
在确定所述待处理对象为第一类别之后,将所述待处理对象添加至对象集合中;
发送所述对象集合中的对象。
优选的,还包括:
接收多个对象样本,所述对象样本来源于所述对象集合,且,满足预设规则;
将所述多个对象样本,添加至训练类别识别模型的已有对象样本中;
基于更新后的已有对象样本,重新训练类别识别模型。
一种用户评价的分类方法,包括:
确定待处理用户评价的特征信息;其中,所述特征信息包括用户评价的文本特征信息、用户评价的图像特征信息、卖家的特征信息和买家的特征信息,并且,所述文本特征信息包括短文本的情感倾向;
依据预先训练的梯度提升决策树模型,对所述待处理用户评价的特征信息进行类别识别;其中,所述类别识别模型为:依据若干用户评价样本的特征信息训练后得到的、第一类用户评价和第二类用户评价的分类器。
优选的,还包括:
在确定所述待处理用户评价为第一类用户评价之后,将所述待处理用户评价添加至第一类用户评价集合中;
发送所述第一类用户评价集合。
优选的,还包括:
接收多个第一类用户评价,所述第一类用户评价来源于所述第一类用户评价集合;
将所述多个第一类用户评价,添加至类别识别模型已有的用户评价样本中;
基于更新后的已有的用户评价样本,重新训练类别识别模型。
一种对象分类***,包括:
数据提供设备,用于发送若干个对象;
处理器,用于接收所述数据提供设备送的若干个对象,依据若干对象的特征信息训练后得到、输出第一类别和第二类别的类别识别模型;用于确定待处理对象的特征信息;其中,所述特征信息包括文本特征信息和图像特征信息,并且,所述文本特征信息包括短文本的情感倾向;依据所述类别识别模型,对所述待处理对象的特征信息进行类别识别;还用于输出第一类别的对象;
数据接收设备,用于接收并使用所述第一类别的对象。
一种对象分类***,包括:
数据提供设备,用于发送若干个对象;
模型构建设备,用于接收所述数据提供设备送的若干个对象,依据若干个对象的特征信息训练后得到、输出第一类别和第二类别的类别识别模型,并发送所述类别识别模型;
处理器,用于接收所述类别识别模型,并确定待处理对象的特征信息;其中,所述特征信息包括文本特征信息和图像特征信息,并且,所述文本特 征信息包括短文本的情感倾向;依据所述类别识别模型,对所述待处理对象的特征信息进行类别识别;还用于输出第一类别的对象;
数据接收设备,用于接收并使用所述第一类别的对象。
通过以上技术手段,可以实现以下有益效果:
本申请提供一种情感倾向的识别方法,本方法利用与类目对应的若干个带有情感倾向的短文本作为训练样本,获取短文本的特征集合进行训练,并获得情感度估测模型。由于每个特征包含短文本的分词和类目标识,所以,申请构建的情感度估测模型充分考虑了短文本所属的类目。因此,基于情感度估测模型确定出的待处理短文本的情感倾向也更加准确。
附图说明
为了更清楚地说明本申请实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1为现有技术确定待处理短文本的情感倾向的流程图;
图2a-2b为本申请实施例提供的情感倾向的识别***的结构示意图;
图3a-3c为本申请实施例提供的情感度估测模型与类目的对应关系的示意图;
图4a-4c为本申请实施例提供的构建情感度估测模型的流程图;
图5为本申请实施例提供的又一构建情感度估测模型的流程图;
图6a-6b为本申请实施例提供的又一构建情感度估测模型的流程图;
图7为本申请实施例提供的情感倾向的识别方法的流程图;
图8a-8b为本申请实施例提供的情感倾向的识别方法的流程图;
图9为本申请实施例提供的情感倾向的识别方法的流程图;
图10为本申请实施例提供的情感倾向的识别方法的流程图;
图11a-11b为本申请实施例提供的情感倾向的识别方法的流程图;
图12为本申请实施例提供的对象分类方法的流程图;
图13为本申请实施例提供的又一对象分类方法的流程图;
图14为本申请实施例提供的又一对象分类方法的流程图;
图15为本申请实施例提供的又一对象分类方法的流程图;
图16为本申请实施例提供的一种对象分类***的结构示意图;
图17为本申请实施例提供的又一种对象分类***的结构示意图;
图18为本申请实施例提供的对象分类方法的场景实施例的流程图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
为了准确确定待处理短文本的情感倾向,本申请提出构建情感度估测模型的技术手段,以利用情感度估测模型来估测待处理短文本对应的正面情感度和负面情感度。其中,正面情感度用于表示待处理短文本属于正面情感的程度,同理,负面情感度用于表示待处理短文本属于负面情感的程度。在确定正面情感度和负面情感度之后,可以进一步确定待处理短文本的情感倾向。
为了使本领域技术人员更加清楚了解本申请的应用场景,参见图2a或图2b,为本申请提供了情感倾向的识别***。
图2a提供的情感倾向的识别***具体包括:数据提供设备100,与数据提供设备100相连的处理器200。
其中,数据提供设备100用于向处理器200发送若干个对象。处理器200,用于依据若干个对象的短文本构建情感度估测模型,并利用情感度估测模型确定待处理短文本的情感倾向。
本申请还提供另一种情感倾向的识别***(参见图2b)。
图2b提供的情感倾向的识别***具体包括:数据提供设备100,与数据提供设备相连的模型构建设备300,与所述模型构建设备相连的处理器200。模型构建设备300可以为具有处理能力的处理设备。
其中,数据提供设备100用于向模型构建设备300发送若干个对象。模型构建设备300,用于依据若干个对象的短文本构建情感度估测模型,并将情 感度估测模型发送至处理器200。处理器200,用于利用情感度估测模型确定待处理短文本的情感倾向。
在图2a和图2b提出的情感倾向的识别***中,处理器200和模型构建设备300均可以执行构建情感度估测模型的过程,并且,两者构建情感度估测模型的过程是一致的。因此,将处理器200或模型构建设备300统称为处理设备,以便在下述介绍构建情感度估测模型的过程中,采用处理设备来统一表示处理器200或模型构建设备300。
在图2a和图2b所示的***中还可以包括与处理器相连的接收设备(图示中未示出)。在处理器确定待处理短文本的情感倾向后,处理器,还用于输出所述待处理文本的情感倾向;所述接收设备,用于接收所述待处理文本的情感倾向,以便接收设备可以利用待处理文本的情感倾向执行其它处理过程。
下面介绍构建情感度估测模型的过程。由于现有技术确定待处理短文本的情感倾向的过程中未考虑短文本的类目,所以现有技术中确定出的情感倾向不准确。因此,本申请在处理设备构建情感度估测模型的过程中考虑短文本的类目,以便构建的情感度估测模型可以准确确定出待处理短文本的正面情感度和负面情感度。
本申请提出处理设备构建情感度估测模型的三种实现方式,参见图3a-3c为三种实现方式中类目与情感度估测模型的示意图。
第一种实现方式:所有类目对应一个情感度估测模型(参见图3a)。第二种实现方式:每个类目对应一个情感度估测模型(参见图3b)。第三种实现方式:介于第一种实现方式和第二种实现方式之间的一种实现方式(参见图3c);假设有N个类目,则第三种实现方式可以构建M个情感度估测模型,其中,M为非零自然数,且,1<M<N。
下面详细介绍这三种实现方式的具体实现过程:
第一种实现方式:所有类目对应一个情感度估测模型。
为了准确确定各个类目下的短文本对应的情感倾向,本实现方式为所有类目构建一个对应的情感度估测模型。
参见图4a,为所有类目对应的情感度估测模型的过程,具体包括以下步骤:
步骤S401:确定用于构建情感度估测模型的短文本样本。
a)获取数据提供设备发送的各个类目下的若干个对象,并对每个对象进行切分,获得每个对象的短文本集合。
数据提供设备可以向处理设备发送各个类目下的对象,处理设备可以获取每个类目下的多个对象。为了方便后续处理,处理设备可以对每个对象按标点符号进行切分,从而将每个对象切分为多个短文本。
例如,以对象为淘宝用户评价为例,在服饰类目下的一个用户评价“衣服很合适,老妈很喜欢”,则按照标点符号切分后,可以获得两个短文本“衣服很合适”和“老妈很喜欢”。目标短文本。例如,在电子设备类目下的一个用户评价“手机屏幕很大,外观很漂亮”,则按照标点符号切分后,可以获得两个短文本“手机屏幕很大”和“外观很漂亮”。
b)在所有的短文本中筛选出用于构建情感度估测模型的短文本样本。
经过实验发现,图1所示的执行过程,确定一个短文本属于正面情感的准确率较高,确定一个短文本属于负面情感的准确率较低。
因此,在本步骤中处理设备可以将每个短文本执行如图1所示的过程,若按图1所示的过程,确定出一个短文本对应正面情感。那么,确定该短文本可以用于构建情感度估测模型,且,该短文本对应正面情感。
若按图1所示的过程,确定一个短文本对应负面情感。那么,再由人工进行进一步的确认。若一个短文本在人工确认后属于负面情感,则确定该短文本可以用于构建情感度估测模型,且,该短文本对应负面情感。
若一个短文本在人工确认后属于正面情感,则说明该短文本的特点不明显,不适合作为构建情感度估测模型的短文本。因此则丢弃该短文本。
步骤S402:确定每个短文本对应的特征集合。
在步骤S401使用图1所示的过程中可以获得每个短文本的分词结果(详见图1中步骤1,在此不再赘述)。然后,进一步确定每个短文本对应的特征集合。
本步骤可以有两种执行方式,两种方式的区别在于:第一种方式确定出的特征集合中包含组合特征,而第二种方式中确定出的特征集合不包含组合特征。
由于确定每个短文本对应的特征集合均是一致的,因此,以一个目标短文本为例,对确定目标短文本的特征集合的过程进行详细介绍。
参见图4b,为确定目标短文本的特征集合的第一种执行方式的具体过程:
步骤411:获取所述待处理短文本对应的类目标识,以及所述待处理短文本执行分词操作后获得的分词结果。
处理设备在步骤S301中已经获得目标短文本的分词结果。由于目标短文本与待处理对象的类目是一致的,因此,处理设备可以将待处理对象的类目标识,确定为目标短文本的类目标识。
以目标短文本属于服饰类目,且为“衣服很大”为例,目标短文本对应的分词结果为“衣服”“很”和“大”,假设服饰类目的标识为“16”,则目标短文本的对应的类目标识为“16”。
以目标短文本属于电子设备类目,且为“屏幕很大”为例,目标短文本对应的分词结果为“屏幕”“很”和“大”,假设电子设备类目的标识为“10”,则目标短文本的对应的类目标识为“10”。
步骤412:将各个分词和所述类目标识进行组合,获得各个特征。
由于不同类目下的短文本对应的分词可能是一致的,因此,为了充分考虑类目对短文本的影响,本申请将各个分词与类目进行组合,获得各个特征。
由于特征包含了类目标识,并且,不同类目的标识是不同的,所以采用特征可以准确区分不同类目的分词。这样,训练得到的情感度估测模型可以准确区分不同类目下的相同分词。
继续延续上述举例,以目标短文本“衣服很大”为例,则目标短文本对应的各个特征可以为“衣服16”“很16”和“大16”。以目标短文本“屏幕很大”为例,则目标短文本对应的各个特征可以为“屏幕10”“很10”和“大10”。站在特征角度,处理设备可以分辨出分词“大16”和“大10”是两个不同的特征,且两个特征属于不同的类目。
在本举例中,分词和类目标识的组合方式为分词在前、类目标标识后,还可以是类目标识在前、分词在后。当然,分词和类目标识还可以有其它组合方式,在此不做限定。
步骤413:对各个特征进行n元组合,获得若干个组合特征。
因为,通过研究过程中发现,一些特征具有固定搭配,例如“没有色差”、“没有掉色”、“没有起球”等等。对于这种固定搭配,由于两个词均是负面情感的词汇,但是两者叠加起来表达则为正面情感,所以这样的词汇若分开的话会造成一定的误判。因此,本实施例可以进行特征组合。
具体而言,为利用n元语言模型对每个短文本的各个特征进行组合。n为非零自然数,n元语言模型中的一个元对应短文本中的一个分词。n元语言模型进行特征组合具体为:将相邻的n个特征合并在一起,将n-1个特征合并在一起,直到将2个特征合并在一起。
以n=2为例,若目标短文本的各个特征为“衣服16”、“很16”和“大16”,则利用二元语言模型进行特征组合后,获得组合特征为“衣服16很16”和“很16大16”。
以n=3为例,若目标短文本的各个特征为“衣服16”、“很16”和“大16”,则进行三元语言模型进行特征组合后,获得组合特征为“衣服16很16大16”、“衣服16很16”和“很16大16”。
步骤414:将各个特征和若干个组合特征的集合,确定为所述目标短文本的特征集合。
延续上述实施例,则以二元语言模型进行特征组合为例,则最终获得的目标短文本的特征集合包括:“衣服16”、“很16”、“大16”、“衣服16很16”和“很16大16”。
参见图4c,为确定目标短文本的特征集合的第二种执行方式的具体过程:
步骤421:获取所述待处理短文本对应的类目标识,以及所述待处理短文本执行分词操作后获得的分词结果。
步骤422:将各个分词和所述类目标识进行组合,获得各个特征。
图4c中的步骤S421和步骤S422的执行过程与图4b中的步骤S411和步骤S412一致,在不再赘述。
步骤423:将各个特征的集合,确定为所述目标短文本的特征集合。
在图4c的执行过程中缺少进行特征组合的步骤,所以,可以直接将步骤S422中确定的各个特征的集合,确定为目标短文本的特征集合。
以目标短文本为“衣服很大”为例,则按图4c执行后最终获得的目标短文本的特征集合包括:“衣服16”、“很16”、“大16”。
接着返回图4a,进入步骤S403:确定各个短文本对应特征集合中每个特征的情感倾向,以及每个特征的正面情感度和负面情感度,并将各个特征以及各个特征对应的情感倾向、正面情感度和负面情感度,作为情感度估测模型的输入参数。
在步骤S401执行图1实施例的过程中,已经确定短文本的情感倾向。由于各个特征的情感倾向与短文本的情感倾向是一致的。因此,在短文本对应正面情感时,确定特征集合中每个特征对应正面情感;在短文本对应负面情感时,确定特征集合中每个特征对应负面情感。
以一个特征为例,对确定特征的正面情感度和负面情感度的过程进行详细介绍。处理设备可以获得很多数量的同一个特征,并且,该特征对应的情感倾向可能相同,可能不同。
因此,处理设备可以统计该特征的总数量,并统计属于正面情感的第一数量,以及属于负面情感的第二数量。依据第一数量与总数量的比例关系,确定该特征的正面情感度;依据第一数量与总数量的比例关系,确定该特征的负面情感度。
步骤S404:按照预设分类器模型进行训练,并获得训练后得到的情感度估测模型。
预设分类器模型可以包括最大熵模型、支持向量机、神经网络算法等等。有关训练过程已有相关技术手段,在此不再赘述。
下面介绍处理设备构建情感度估测模型的第二种实现方式,在第二种实现方式中为每个类目构建一个情感度估测模型,因此,由于每个情感度估测模型中只有一个类目,所以在第二种实现方式中分词即相当于特征,因此在第二种实现方式中无需将分词和类目标识进行组合。
由于每个类目对应的情感度估测模型的构建过程是一致的。因此,以一个目标类目为例,对构建目标类目对应的目标情感度估测模型的过程进行详细介绍。
参见图5,构建目标情感度估测模型的过程具体包括以下步骤:
步骤S501:确定构建目标情感度估测模型的短文本样本。
a)获取数据提供设备发送的目标类目下的若干个对象,并对每个对象进行切分,获得每个对象的短文本集合。
b)在所有的短文本中筛选出用于构建情感度估测模型的短文本。
步骤S501的具体执行过程与步骤S401的执行过程类似,在此不再赘述。
步骤S502:确定每个短文本对应的特征集合。
在步骤S501使用图1所示的过程中可以获得每个短文本的分词结果(详见图1中步骤1,在此不再赘述)。然后,进一步确定每个短文本对应的特征集合。本步骤可以有两种执行方式,两种方式的区别在于:第一种方式确定出的特征集合中包含组合特征,而第二种方式中确定出的特征集合不包含组合特征。
由于确定每个短文本对应的特征集合均是一致的,因此,以一个目标短文本为例,对确定目标短文本的特征集合的过程进行详细介绍。
参见图6a,为确定目标短文本的特征集合的第一种执行方式的具体过程:
步骤601:获取所述目标短文本对应的分词结果,每个分词对应一个特征。
步骤602:对所述各个特征进行n元组合,获得若干个组合特征。
步骤603:将各个特征和若干个组合特征的集合,确定为所述目标短文本的特征集合。
以待处理短文本为“衣服很大”,以二元语言模型进行特征组合为例,则本实施例最终获得的目标短文本的特征集合包括:“衣服”、“很”、“大”、“衣服很”和“很大”。
参见图6b,为确定目标短文本的特征集合的第二种执行方式的具体过程:
步骤611:获取所述目标短文本对应的分词结果,每个分词对应一个特征。
步骤612:将分词结果,确定为所述目标短文本的特征集合。
在图6b的执行过程中缺少进行特征组合的步骤,所以,可以直接将步骤S611中确定的各个特征的集合,确定为目标短文本的特征集合。
以目标短文本为“衣服很大”为例,则按图6b执行后最终获得的目标短文本的特征集合包括:“衣服”、“很”、“大”。
接着返回图5,进入步骤S503:确定目标类目下各个短文本对应特征集合中每个特征的情感倾向,以及每个特征的正面情感度和负面情感度,并将目标类目下各个特征以及各个特征对应的情感倾向、正面情感度和负面情感度,作为目标情感度估测模型的输入参数。
在步骤S501执行图1实施例的过程中,已经确定各个短文本的情感倾向。由于各个特征的情感倾向与短文本的情感倾向是一致的。因此,在短文本对应正面情感时,确定特征集合中每个特征对应正面情感;在短文本对应负面情感时,确定特征集合中每个特征对应负面情感。
步骤S504:按照预设分类器模型进行训练,并获得训练后得到的目标情感度估测模型。
预设分类器模型可以包括最大熵模型、支持向量机、神经网络算法等等。有关训练过程已有相关技术手段,在此不再赘述。
图5为构建一个类目的情感度估测模型,图3为构建所有类目的情感度估测模型的过程,两者的处理步骤很类似,因此,图5的实施例的执行过程,可以参考图4的具体执行过程,在此不再赘述。
在第二实现方式中,每个类目对应一个情感度估测模型。因此,为了避免混淆,处理设备在一个情感度估测模型构建完毕之后,还会构建情感度估测模型与类目标识之间的映射,以便后续处理器在使用时,可以准确确定与每个类目对应的情感度估测模型。
下面介绍处理设备构建情感度估测模型的第三种实现方式。
在第三种实现方式中,可以包括:两个或两个以上的类目对应的情感度估测模型,和/或,一个类目对应的情感度估测模型。两个或两个以上类目对应的情感估测模型的构建过程,可以参考图4所示的实施例。一个类目对应的情感度估测模型,可参考图5所示的实施例,在此不再赘述。
结合图2a和图2b,若上述构建情感度估测模型的处理设备为处理器200自身的情况下,处理器200完成情感度估测模型后,便可以直接使用,以利用情感度估测模型确定待处理短文本的情感倾向。
在处理设备为模型构建设备300的情况下,模型构建设备300会将情感度估测模型发送至处理器200,以便处理器200利用情感度估测模型确定待处理短文本的情感倾向。
下面介绍处理器200依据情感度估测模型确定待处理短文本的情感倾向的过程。由于情感度估测模型有三种不同的实现方式,在不同实现方式下,处理器200的执行过程也不尽相同,所以,下面分别介绍在情感度估测模型的不同实现方式下,处理器的执行过程。
第一种:
在情感度估测模型采用第一种实现方式(所有类目对应一个情感度估测模型)实现的情况下,处理器200采用以下方式来确定待处理的短文本的情感倾向。
参见图7,本申请一种情感倾向的识别方法,具体包括以下步骤:
步骤S701:确定待处理短文本对应的特征集合;其中,所述特征集合中每个特征包括:待处理短文本的分词和所述待处理文本所属的类目标识。
假设第一种实现方式在确定情感度估测模型的过程中,采用第一种执行方式确定短文本的特征集合;则在本步骤中也采用第一种执行方式确定待处理短文本特征集合。
参见图8a,确定待处理短文本对应的特征集合的第一种执行方式,具体包括以下步骤:
步骤S801:获取所述待处理短文本对应的类目标识,以及所述待处理短文本执行分词操作后获得的分词结果。
步骤S802:将分词结果中的各个分词和所述类目标识进行组合,获得各个特征。
步骤S803:对所述各个特征进行n元组合,获得若干个组合特征。
步骤S804:将各个特征和若干个组合特征的集合,确定为所述待处理短文本的特征集合。
图8a的执行过程可参见图4a的执行过程,在此不再赘述。
假设第一种实现方式在确定情感度估测模型的过程中,采用第二种执行方式确定短文本的特征集合;则在本步骤中也采用第二种执行方式确定待处理短文本的特征集合。
参见图8b,确定待处理短文本对应的特征集合的第二种执行方式,具体包括以下步骤:
步骤S811:获取所述待处理短文本对应的类目标识,以及所述待处理短文本执行分词操作后获得的分词结果。
步骤S812:将分词结果中的各个分词和所述类目标识进行组合,获得各个特征。
步骤S813:将各个特征的集合,确定为所述待处理短文本的特征集合。
图8b的执行过程可参见图4b的执行过程,在此不再赘述。
接着返回图7,步骤S702:依据预先训练的情感度估测模型,结合待处理短文本的特征集合,对待处理短文本进行情感度估测;其中,所述情感度估测模型包括:依据至少两个类目、带有情感倾向的若干个短文本样本训练后得到的、输出正面情感度和负面情感度的模型。
处理器将所述特征集合输入至所述情感度估测模型,由所述情感度估测模型估算后输出所述特征集合对应的正面情感度和负面情感度。
步骤S703:基于所述待处理短文本对应的正面情感度和负面情感度,确定所述待处理短文本对应的情感倾向。
在确定所述待处理短文本对应的情感倾向,还可以输出所述待处理短文本对应的情感倾向,以便进行其它方面的使用。
在步骤S702中估测得到待处理短文本属于正面情感的正面情感度,以及待处理文本属于负面情感的负面情感度之后,为了进一步确定待处理短文本的情感倾向,可以将正面情感度与负面情感度进行对比。若正面情感度大于负面情感度,则确定待处理短文本属于对应正面情感;若负面情感度大于正面情感度,则确定待处理短文本对应负面情感。
在一些情况下,正面情感度和负面情感度相差不大。以情感度采用概率表示为例,正面情感度的概率值为0.51,负面情感度的概率值为0.49。可以理解的是,由于正面情感度和负面情感度非常接近,所以理论上是无法准确确 定待处理短文本的情感倾向的。但是,在此情况下,仍然按照上段方式确定待处理短文本的情感倾向,则会出现误差。
因此,参见图9,本申请提供以下方式来待处理短文本的情感倾向。
步骤S901:确定所述正面情感度和所述负面情感度两者中的较大情感度。
将正面情感度和负面情感度进行对比,确定两者中的较大情感度。若正面情感度大于负面情感度,则确定正面情感度为较大情感度;若负面情感度大于正面情感度,则确定负面情感度为较大情感度。
步骤S902:判断所述较大情感度是否大于预设置信度。
为了判定较大情感度是否可信,本申请预先设定了预设置信度。预设置信度为确定较大情感度可信的程度。然后,判断较大情感度与预设置信度的大小。
步骤S903:若所述较大情感度大于预设置信度,则确定所述待处理短文本对应的情感倾向与所述较大情感度的情感倾向一致。
若较大情感度大于预设置信度,则确定较大情感度的可信度较高。因此,可以准确确定待处理短文本的情感倾向。此时,待处理短文本的情感倾向与较大情感度的情感倾向一致。
即,若较大情感度对应正面情感度,则确定待处理短文本属于对应正面情感;若较大情感度对应负面情感度,则确定待处理短文本对应负面情感。
假设较大情感度为0.8,预设置信度为0.7,则在此情况下,可以准确确定待处理短文本的情感倾向。
步骤S904:若所述较大情感度不大于预设置信度,则执行其它处理过程确定待处理文本的情感倾向。
若较大情感度不大于预设置信度,则确定较大情感度的可信度较低。因此,可以无法准确确定待处理短文本的情感倾向。假设较大情感度为0.55,预设置信度为0.7,则在此情况下,无法准确确定待处理短文本的情感倾向。
在此情况下,可以执行一些其它处理过程,以便进一步确定待处理短文本的情感倾向。此过程不是本申请的重点,在此不再赘述。
在图2a和图2b所示的***中还可以包括与处理器相连的接收设备(图示中未示出)。在处理器确定待处理短文本的情感倾向后,处理器,还用于 输出所述待处理文本的情感倾向;所述接收设备,用于接收所述待处理文本的情感倾向,以便接收设备可以利用待处理文本的情感倾向。
第二种:
在情感度估测模型采用第二种实现方式实现的情况下,处理器200采用以下方式来确定待处理的短文本的情感倾向。参见图10,本申请一种情感倾向的识别方法,具体包括以下步骤:
步骤S1001:确定待处理短文本对应的特征集合和类目标识。
假设第二种实现方式在确定情感度估测模型的过程中,采用第一种执行方式确定短文本的特征集合;则在本步骤中也采用第一种执行方式确定待处理短文本特征集合。
参见图11a,为确定待处理短文本的特征集合的第一种执行方式的具体过程:
步骤1101:获取所述待处理短文本执行分词操作后获得的分词结果。
步骤1102:利用n元语言模型对各个分词进行分词组合,获得若干个组合分词。
步骤1103:将各个分词和若干个组合分词的集合,确定为所述待处理短文本的特征集合,一个分词对应一个特征。
在图11a的执行过程与图6a的执行过程类似,具体执行过程可参见图6a的执行过程,在此不再赘述。
假设第二种实现方式在确定情感度估测模型的过程中,采用第二种执行方式确定短文本的特征集合;则在本步骤中也采用第二种执行方式确定待处理短文本特征集合。
参见图11b,为确定待处理短文本的特征集合的第二种执行方式的具体过程:
步骤1111:获取所述待处理短文本执行分词操作后获得的分词结果。
步骤1112:将所述分词结果,确定为所述待处理短文本的特征集合,一个分词对应一个特征。
在图11b的执行过程与图6b的执行过程类似,具体执行过程可参见图6a的执行过程,在此不再赘述。
接着返回图10,进入步骤S1002:依据与所述类目标识对应的情感度估测模型,结合待处理短文本的特征集合,对待处理短文本进行情感度估测;其中,所述情感度估测模型为:依据所述类目标识对应的、带有情感倾向的若干个短文本样本的特征集合训练后得到的、输出正面情感度和负面情感度的模型。
在第二种实现方式中,具有多个情感度估测模型。为了获得适用于待处理短文本的情感度估测模型,可以依据类目标识在多个情感度估测模型进行查找,从而确定与类目标识对应的情感度估测模型。
处理器将所述特征集合输入至所述情感度估测模型,由所述情感度估测模型估算后输出所述特征集合对应的正面情感度和负面情感度。
步骤S1003:基于所述待处理短文本对应的正面情感度和负面情感度,确定所述待处理短文本对应的情感倾向。本步骤的执行过程与图7的步骤703的执行过程一致,在此不再赘述。
在图2a和图2b所示的***中,还可以包括与处理器相连的接收设备(图示中未示出)。在处理器确定所述待处理短文本对应的情感倾向后,处理器,还用于输出所述待处理文本的情感倾向;所述接收设备,用于接收所述待处理文本的情感倾向。
在情感度估测模型采用第三种实现方式实现的情况下,处理器200会预先存储类目标识与情感度估测模型的对应关系,并预先构建每个类目标识和情感度估测模型的构建方式的对应关系。
若处理器200接收到一个类目标识后,首先判断与类目标识对应的情感度估测模型的构建方式;
若情感度估测模型采用第一种实现方式构建,则适应性的按图4所示的过程确定待处理短文本的情感倾向;即:确定待处理短文本对应的特征集合;其中,所述特征集合中每个特征包括:所述待处理短文本的分词和所述待处理短文本所属的类目标识;依据预先训练的情感度估测模型,结合待处理短文本的特征集合,对待处理短文本进行情感度估测;其中,所述情感度估测模型包括:依据至少两种类目的、带有情感倾向的若干个短文本样本训练后 得到的、输出正面情感度和负面情感度的模型;基于所述待处理短文本对应的正面情感度和负面情感度,确定所述待处理短文本对应的情感倾向。
若情感度估测模型采用第二种实现方式构建,则按适应性的按图5所示的过程确定待处理短文本的情感倾向。即:确定待处理短文本对应的特征集合;其中,所述特征集合中每个特征包括:所述待处理短文本的分词;依据与所述类目标识对应的情感度估测模型,结合待处理短文本的特征集合,对待处理短文本进行情感度估测;其中,所述情感度估测模型为:依据所述类目标识对应的、带有情感倾向的若干个短文本样本训练后得到的、输出正面情感度和负面情感度的模型;基于所述待处理短文本对应的正面情感度和负面情感度,确定所述待处理短文本对应的情感倾向。通过图7和图10所示的实施例,可以看出本申请具有以下有益效果:
本申请提供一种情感倾向的识别方法,本方法利用若干个带情感倾向的短文本进行训练,并获得情感度估测模型。由于每个特征集合包含短文本的分词和类目标识,所以,申请构建的情感度估测模型充分考虑了短文本所属的类目。因此,基于情感度估测模型确定出的待处理短文本的正面情感度和负面情感度,相对于现有技术而言更加准确。进而,利用正面情感度和负面情感度确定出的情感倾向也更加准确。
下面以最大熵模型为例,对本申请构建情感度估测模型的训练过程进行详细介绍:
首先构建两个矩阵:矩阵A和矩阵B,矩阵A中包含各个特征和各个特征对应的正面情感度和负面情感度。矩阵B包含两个分类结果:正面情感和负面情感。对于矩阵A中的任意个特征a,采用b表示其情感倾向。fi(a,b)表示(a,b)共同出现情况。
首先计算fi(a,b)在训练样本中的期望,由于训练模型中没有变量,所以在计算完毕后该期望值为一个常数。具体计算公式如下所示:
Figure PCTCN2017100060-appb-000001
其中,
Figure PCTCN2017100060-appb-000002
表示fi(a,b)在训练样本i中的期望,
Figure PCTCN2017100060-appb-000003
表示fi(a,b)在训练样本的经验概率分布。
fi(a,b)在模型中的概率分布的公式如下:
Figure PCTCN2017100060-appb-000004
其中,
Figure PCTCN2017100060-appb-000005
表示训练样本中短文本对应的情感倾向是b的概率,p(a|b)表示短文本的情感倾向是b的前提下,特征a的条件概率。
则fi(a,b)在最大熵模型中的计算公式为:
Figure PCTCN2017100060-appb-000006
在最大熵模型中,fi(a,b)在训练样本中的期望,与fi(a,b)在模型中的期望应该是一致的。即:
Figure PCTCN2017100060-appb-000007
采用拉格朗日乘子法,在满足公式(4)的约束条件下求解目标方程(2)的最优解,最优解如下所示:
Figure PCTCN2017100060-appb-000008
其中,
Figure PCTCN2017100060-appb-000009
为归一化因子,使得
Figure PCTCN2017100060-appb-000010
wi为特征fi的权重。
将公式(5)代入到公式(1)中从而获得最大熵模型的训练的结果,也即情感度估测模型。
如图12所示,本申请提供了一种对象分类方法。应用于处理器中,在本实施例中,可以直接利用待处理对象的短文本的情感倾向来对对象进行分类。具体包括以下步骤:
步骤S1201:确定待处理对象的短文本信息,其中,所述短文本信息包括短文本的情感倾向。
处理器可以利用标点符号将待处理对象分为若干个短文本,每个短文本可以按照本申请图7或图10提供的过程确定其情感倾向,从而可以确定出待处理对象中每个短文本的情感倾向。此外,短文本信息还可以包括:待处理对象中属于正面情感的短文本数量、属于负面情感的短文本数量、正面短文本的所占比例、负面短文本的所占比例等等。
步骤S1202:依据预先训练的类别识别模型,对所述短文本信息进行类别识别;其中,所述类别识别特征模型为:依据若干对象的短文本信息训练得到的、第一类别和第二类别的分类器。
类别识别模型为预先利用若干个对象的短文本信息训练后,得到的输出第一类别和第二类别的分类器。具体而言,可以利用最大熵模型、神经网络算法或者支持向量机等分类模型,对若干个对象的短文本信息进行训练,从而获得类别识别模型。相关技术手段,可以采用现有技术中的训练方式,在此不再赘述。
在获得待处理对象的短文本信息后,将待处理对象的短文本输入至类别识别模型,类别识别模型处理后,可以确定待处理对象的类别。
在实际过程中发现,针对一个对象而言,对象除了包括文本之外还可以包括图像。以对象为电商***的用户评价为例,用户评价中除了具有文本(字符用户评价)之外,还可以具有商品的图像。
可以理解的是,单独通过对象的短文本信息确定出的对象类别不准确,因为并没有考虑到对象的图像特征信息;同理,单独采用对象的图像特征信息确定出的对象类别也不准确,因为并没有考虑到对象的短文本信息。因此,本实施例将短文本信息和图像特征信息进行合并,采用短文本信息和图像特征信息一并确定对象类别,从而提高对象类别的准确率。
本申请又提供了一种对象分类方法,在本实施例中利用待处理对象的多个特征来对对象进行分类。如图13所示,具体包括以下步骤:
步骤S1301:确定与待处理对象对应的特征信息;其中,所述特征信息包括短文本信息和图像特征信息,并且,所述短文本信息包括短文本的情感倾向。
处理器可以利用标点符号将待处理对象分为若干个短文本,每个短文本可以按照本申请图7或图10提供的过程确定其情感倾向,从而可以确定出待处理对象中每个短文本的情感倾向。此外,短文本信息还可以包括:待处理对象中属于正面情感的短文本数量、属于负面情感的短文本数量、正面短文本的所占比例、负面短文本的所占比例等等。
处理器可以对图像进行处理,从而获得图像特征信息。图像特征信息可以包括下述图像特征中的一个或多个:图像宽度、图像高度、图像中人脸个数、图像包含的子图的个数、图像的背景是否是纯色、图像包含文字区域占比是多少、图像显著区域主颜色个数、图像主颜色个数、图像牛皮癣分数、图像主体质量分数、图像是假人模特的概率得分、图像中是真人模特的概率得分、图像展示的是商品细节的概率得分等等。
步骤S1302:依据预先训练的类别识别模型,对所述特征信息进行类别识别;其中,所述类别识别特征模型为:依据若干对象的特征信息训练得到的、第一类别和第二类别的分类器。
类别识别模型为预先利用若干个对象的短文本信息和图像特征信息训练后,得到的输出第一类别和第二类别的分类器。具体而言,可以利用最大熵模型、神经网络算法或者支持向量机等分类模型,对若干个对象的短文本信息进行训练,从而获得类别识别模型。相关技术手段,可以采用现有技术中的训练方式,在此不再赘述。
在获得待处理对象的短文本信息后,将待处理对象的短文本发送至类别识别模型,从而确定待处理对象的类别。
可以理解的是,待处理对象的特征信息中的特征种类越多,则最终获得的结果越准确。所以,为了进一步提高待处理对象的类别的准确率,特征信息还可以包括:所述待处理对象所附属于第一主体的特征信息;和/或,所述待处理对象所附属于第二主体的特征信息。当然还可以包括其它特征信息,在此不再一一列举。
例如,以用户评价为例,所述待处理对象所附属于第一主体的特征信息具体为:商品的所附属于卖家(第一主体)特征信息,例如,卖家的信用等级、卖家的销售量等。所述待处理对象所附属于第二主体的特征信息具体为:商品的所附属于买家(第二主体)特征信息,例如,买家的信用等级、发布非默认的用户评价数据量、发布带图的用户评价数量、发布带图的用户评价占比。
在特征信息中增加短文本信息、图像特征信息以及其它特征信息后,对象的特征信息便会具有多个特征信息。为了综合考虑多个特征信息,本实施 例提出采用梯度提升决策树模型对若干个训练样本进行训练,从而获得类别识别模型。
梯度提升决策树模型是以决策树为基函数的提升方法。梯度提升决策树模型包括多棵决策树,之所以采用多棵决策树是考虑对于单棵决策树会因为过度***而造成过拟合,失去泛化能力;如果***太少,又会造成学习不够充分。
下面介绍梯度提升决策树模型的训练过程:
第一,估计初值F0
初值F0可以是一个随机的数值,也可以等于0,具体数值可以根据实际情况而定,在此不做限定。
第二,按照下述方式迭代M次,获得M棵决策树
A)利用上一梯度提升决策树更新全部训练样本对应多个特征信息的估计值。
B)从所有训练样本中随机选择部分样本,作为本次构建决策树的训练样本。
C)根据样本所包含的特征,计算每种特征的信息增益,选择信息增益最大的特征进行第一次划分,左侧代表第一类别,右侧代表第二类别。计算本次的梯度,结合梯度重新估计样本的特征信息的特征值。
将上段步骤重复J次,得到J层叶子节点的决策树。
D)根据获得M棵决策树,计算训练样本在该棵决策树上的准确率,将准确率作为该棵决策树的权重。
第三,将M棵决策树进行线性组合,得到最终的梯度提升决策树模型。
梯度提升决策树模型包括多棵决策树,可以表示为多棵决策树的加法模型:F(X)=F01T1(X)+β2T2(X)+…βiTi(X)…+βMTM(X)……公式(6)
其中,F0是一个初值,Ti(X)表示待处理对象的特征信息与一个决策树的匹配度,βi表示一个决策树的权重,M表示决策树的总数量。
梯度提升决策树模型使用多棵决策树正是希望能够在训练精度和泛化能力两个方面都达到较好的结果。梯度提升决策树模型作为一种boosting算法,梯度提升决策树模型自然包含boosting的思想:将一系列弱分类器组合起来, 构成一个强分类器。它不要求每棵决策树学到太多的东西,每颗树都学一点知识,然后将每个决策树学到的知识累加起来构成一个强大的模型。
本申请又提供了一种对象分类方法,如图14所示,具体包括以下步骤:
步骤S1401:确定与待处理对象对应的特征信息。
其中,所述特征信息包括短文本信息、图像特征信息、待处理对象所附属于第一主体的特征信息、所述待处理对象所附属于第二主体的特征信息。并且,所述短文本信息包括短文本的情感倾向。
以对象对用户评价为例,则本步骤可以为:确定待处理用户评价的特征信息;其中,所述特征信息包括用户评价的文本特征信息、用户评价的图像特征信息、卖家的特征信息和买家的特征信息,并且,所述文本特征信息包括短文本的情感倾向。
步骤S1402:将所述特征信息与预先训练的梯度提升决策树模型进行识别。
继续以对象为用户评价为例,则本步骤为依据预先训练的梯度提升决策树模型,对所述待处理用户评价的特征信息进行类别识别;其中,所述类别识别模型为:依据若干用户评价样本的特征信息训练后得到的、第一类用户评价和第二类用户评价的分类器。
如图15所示,具体而言本步骤包括以下步骤:
步骤S1501:将所述特征信息输入至所述类别识别模型,也即梯度提升决策树模型。
梯度提成决策树模型有M棵树,将特征信息分别与M棵树进行匹配,从而获得与每棵树匹配后确定的类别。
步骤S1502:确定所述待处理对象对应的第一类别匹配度和第二类别匹配度。
按上述公式6确定第一类别匹配度和第二类别匹配度。
第一类别匹配度F1(X)=F01T1(X)+β2T2(X)+…βiTi(X)…+βMTM(X)。其中,Ti(X)表示特征信息与一棵树的匹配度,βi表示该树对应的权重。若一棵树确定特征信息对应第一类别,则权重为βi;若一棵树确定特征信息对应第二类别,则权重为0。
第二类别匹配度F2(X)=F01T1(X)+β2T2(X)+…βiTi(X)…+βMTM(X)。其中,Ti(X)表示特征信息与一棵树的匹配度,βi表示该树对应的权重。若一棵树确定特征信息对应第二类别,则权重为βi;若一棵树确定特征信息对应第一类别,则权重为0。
步骤S1503:对所述第一类别匹配度和第二类别匹配度进行比较。若第一类别匹配度大于第二类别匹配度,进入步骤S1504;若第二类别匹配度大于第一类别匹配度,则进入步骤S1505。
步骤S1504:确定所述待处理对象的类别为第一类别。
继续以对象为用户评价为例,则本步骤为确定待处理用户评价的类别为第一类别。第一类别为优质用户评价,那么本步骤即为确定待处理用户评价的类别为优质用户评价。步骤S1505:确定所述待处理对象的类别为第二类别。
继续以对象为用户评价为例,则本步骤为确定待处理用户评价的类别为第二类别。第二类别为劣质用户评价,那么本步骤即为确定待处理用户评价的类别为劣质用户评价。
在确定所述待处理对象为第一类别之后,将所述待处理对象添加至对象集合中;发送所述对象集合中的对象。对象集合可以被其它设备使用,在使用过程中,可以再次经过筛选确定出多个更优的对象样本,然后将对象样本再发送至处理器,以便处理器利用更优的对象样本,重新训练类别识别模型,以便类别识别模型更加准确。即,处理器可以接收多个对象样本,所述对象样本来源于所述对象集合;将所述多个对象样本,添加至训练类别识别模型的已有对象样本中;基于更新后的已有对象样本,重新训练类别识别模型。
继续以对象为用户评价为例,则本过程为:在确定所述待处理用户评价为第一类用户评价之后,将所述待处理用户评价添加至第一类用户评价集合中;发送所述第一类用户评价集合。第一用户评价集合可以对用户进行使用,在使用过程中可以在第一类用户评价集合中确定出更优的用户评价。然后,可以将更优的用户评价发送至处理设备,以便处理设备重新训练类别识别模型。即本***可以形成闭环***。
即,处理器接收多个第一类用户评价,所述第一类用户评价来源于所述第一类用户评价集合;将所述多个第一类用户评价,添加至类别识别模型已 有的用户评价样本中;基于更新后的已有的用户评价样本,重新训练类别识别模型。
参见图16,本申请提供了一种对象分类***,包括:
数据提供设备100,用于发送若干个对象。
处理器200,用于接收所述数据提供设备送的若干个对象,依据若干对象的特征信息训练后得到、输出第一类别和第二类别的类别识别模型;用于确定待处理对象的特征信息;其中,所述特征信息包括文本特征信息和图像特征信息,并且,所述文本特征信息包括短文本的情感倾向;依据所述类别识别模型,对所述待处理对象的特征信息进行类别识别;还用于输出第一类别的对象。
数据接收设备400,用于接收并使用所述第一类别的对象。
数据接收设备400在使用对象集合的过程中,可以再次经过筛选确定出多个更优的对象样本,然后将对象样本再发送至处理器200,以便处理器利用更优的对象样本,重新训练类别识别模型,以便类别识别模型更加准确。
参见图17,本申请还提供了一种对象分类***,包括:
数据提供设备100,用于发送若干个对象。
模型构建设备300,用于接收所述数据提供设备送的若干个对象,依据若干个对象的特征信息训练后得到、输出第一类别和第二类别的类别识别模型,并发送所述类别识别模型。
处理器200,用于接收所述类别识别模型,并确定待处理对象的特征信息;其中,所述特征信息包括文本特征信息和图像特征信息,并且,所述文本特征信息包括短文本的情感倾向;依据所述类别识别模型,对所述待处理对象的特征信息进行类别识别;还用于输出第一类别的对象。
数据接收设备400,用于接收并使用所述第一类别的对象。
数据接收设备400在使用对象集合的过程中,可以再次经过筛选确定出多个更优的对象样本,然后将对象样本再发送至处理器200,以便处理器利用更优的对象样本,重新训练类别识别模型,以便类别识别模型更加准确。
下面以一个具体场景实施例,来详细描述对象分类方法。
在电商***中有很多用户评价,如何从众多用户评价中筛选出优质用户评价,是本实施例所要解决的问题。由于电商***中用户评价数量和种类繁多,商家需要花费很多时间找出店铺中的优质用户评价,这无形中需要花费巨大的人力成本。目前在优质用户评价识别领域,工业界常用的技术主要有两种:第一种,基于短文本的识别技术;第二种,基于图像特征的识别技术。
基于短文本的识别技术相对比较容易实现,但是存在着一些局限性:不关注用户评价中买家发布的图像信息。在实际场景中,比如服饰类,用户不单单关心用户评价中的文字描述部分,还关心商品真实的样子,即图像特征信息。
基于图像特征的识别技术效果显著,但也有一定的局限性。基于图像特征的优质用户评价识别技术仅仅利用用户评价中的图像信息进行识别,并不关心已购买者具体购买后的心得体会,即短文本信息。因此,可以看出用户评价中的短文本信息和图像特征信息同样重要。
此外,申请人发现还有一些其它特征对确定优质用户评价,可以起到辅助作用。例如,卖家特征和买家特征。因此,本实施例将以上特征均作为确定用户评价为优质用户评价或劣质用户评价的依据。为此,本实施例提出基于多种特征融合的机器学习方法,即梯度提升决策树模型,来训练若干个训练样本,从而获得类别识别模型。
如图18所示,为本申请提供确定优质用户评价的流程图。从图中可以清晰地整个确定优质用户评价的过程。主要由三部分组成:
(1)构建用户评价库
在用户评价服务器中获取大量的用户评价,首先利用预处理规则过滤掉一部分劣质用户评价。预处理规则可以为:优质用户评价中图像和文本所需要满足的一些要求,即使用短文本和图像特征中少量维度的特征对大量用户评价进行过滤。
具体而言为,优质用户评价中的短文本不能均为负面情感,基于此,若用户评价中的短文本均对应负面情感,则判定为非优质用户评价。对于优质用户评价中的图像也有基本要求,图像的分辨率达到预设分辨率、图像为非对话截屏、图像中的明显广告宣传语以及水印占比小于预设值,等等。
将用户评价服务器中满足上述短文本要求和图像特征要求的用户评价,将其放入用户评价库中。针对不满足短文本要求和图像特征要求的用户评价,则将这些用户评价判定为优质用户评价,不放入用户评价库中。
通过预处理规则的过滤可以过滤出一些非优质用户评价,这样不仅能够减少优质用户评价识别模型的使用次数,而且,还可以有效地过滤掉非优质用户评价,提升优质用户评价识别模型预测的准确率。
(2)确定优质用户评价集合
利用优质用户评价识别模型对用户评价库中用户评价进行识别,若识别结果为优质用户评价,则放入到优质用户评价集合中。
(3)使用优质用户评价集合。
数据接收设备可以从优质用户评价集合中获取优质用户评价,并在实际应用过程中使用优质评价。数据接收设备在使用优质用户评价集合中优质用户评价的过程中,会根据预先设定准则重新对优质评价集合中的优质用户评价进行筛选,从而筛选出符合预先设定准则的优质用户评价。然后,将符合预先设定准则的优质用户评价发送至处理器或模型构建设备,以便处理器或模型构建设备对优质用户评价识别模型进行迭代更新。
(4)优质用户评价识别模型的迭代更新。
利用符合预先设定准则的优质用户评价,重新对优质用户评价识别模型进行训练,以便优质用户评价识别模型能够尽可能的输出满足用户需求的优质用户评价。
由于在优质用户评价集合中挑选出的优质用户评价,均满足卖家或运行人员的预设规则,所以将这些优质用户评价重新加入用户评价库中,重新对优质用户评价识别模型的更新优化,以便优质用户评价识别模型更好地识别出满足用户期望的优质用户评价。
基于上述过程可以发现:本实施例中用户可以不再需要从原始用户评价库中一条一条去筛选,只需要在优质用户评价集合中进行挑选就能快速期望的优质用户评价,有效地降低人力成本。与此同时,优质用户评价模型能够有效地利用商家提供的优质用户评价进行迭代更新,从而进一步识别出满足商家期望的优质用户评价。
本实施例方法所述的功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算设备可读取存储介质中。基于这样的理解,本申请实施例对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该软件产品存储在一个存储介质中,包括若干指令用以使得一台计算设备(可以是个人计算机,服务器,移动计算设备或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。
本说明书中各个实施例采用递进的方式描述,每个实施例重点说明的都是与其它实施例的不同之处,各个实施例之间相同或相似部分互相参见即可。
对所公开的实施例的上述说明,使本领域专业技术人员能够实现或使用本申请。对这些实施例的多种修改对本领域的专业技术人员来说将是显而易见的,本文中所定义的一般原理可以在不脱离本申请的精神或范围的情况下,在其它实施例中实现。因此,本申请将不会被限制于本文所示的这些实施例,而是要符合与本文所公开的原理和新颖特点相一致的最宽的范围。

Claims (30)

  1. 一种情感倾向的识别方法,其特征在于,包括:
    确定待处理短文本对应类目标识;其中,一个文本相邻两个标点符号之间文字称为短文本;
    确定与所述类目标识对应的情感度估测模型的实现方式;
    若所述情感度估测模型的实现方式为所有类目对应一个情感度估测模型,则确定待处理短文本对应的特征集合;其中,所述特征集合中每个特征包括:所述待处理短文本的分词和所述待处理短文本所属的类目标识;依据预先训练的情感度估测模型,结合待处理短文本的特征集合,对待处理短文本进行情感度估测;其中,所述情感度估测模型包括:依据至少两种类目的、带有情感倾向的若干个短文本样本训练后得到的、输出正面情感度和负面情感度的模型;基于所述待处理短文本对应的正面情感度和负面情感度,确定所述待处理短文本对应的情感倾向;
    若所述情感度估测模型的实现方式为一个类目对应一个情感度估测模型,确定待处理短文本对应的特征集合;其中,所述特征集合中每个特征包括:所述待处理短文本的分词;依据与所述类目标识对应的情感度估测模型,结合待处理短文本的特征集合,对待处理短文本进行情感度估测;其中,所述情感度估测模型为:依据所述类目标识对应的、带有情感倾向的若干个短文本样本训练后得到的、输出正面情感度和负面情感度的模型;基于所述待处理短文本对应的正面情感度和负面情感度,确定所述待处理短文本对应的情感倾向。
  2. 如权利要求1所述的方法,其特征在于,在确定所述待处理短文本对应的情感倾向后,还包括:
    输出所述待处理短文本对应的情感倾向。
  3. 一种情感倾向的识别方法,其特征在于,包括:
    确定待处理短文本对应的特征集合;其中,一个文本相邻两个标点符号之间的文字称为短文本;所述特征集合中每个特征包括:所述待处理短文本的分词和所述待处理短文本所属的类目标识;
    依据预先训练的情感度估测模型,结合待处理短文本的特征集合,对待处理短文本进行情感度估测;其中,所述情感度估测模型包括:依据至少两种类目的、带有情感倾向的若干个短文本样本训练后得到的、输出正面情感度和负面情感度的模型;
    基于所述待处理短文本对应的正面情感度和负面情感度,确定所述待处理短文本对应的情感倾向。
  4. 如权利要求3所述的方法,其特征在于,所述确定待处理短文本对应的特征集合,包括:
    获取所述待处理短文本对应的类目标识,以及所述待处理短文本执行分词操作后获得的分词结果;
    将所述分词结果中的各个分词和所述类目标识进行组合,获得各个特征;
    将各个特征的集合,确定为所述待处理短文本的特征集合。
  5. 如权利要求3所述的方法,其特征在于,所述确定待处理短文本对应的特征集合,包括:
    获取所述待处理短文本对应的类目标识,以及所述待处理短文本执行分词操作后获得的分词结果;
    将所述分词结果中的各个分词和所述类目标识进行组合,获得各个特征;
    利用n元语言模型对所述各个特征进行特征组合,获得若干个组合特征;
    将各个特征和所述若干个组合特征的集合,确定为所述待处理短文本的特征集合。
  6. 如权利要求5所述的方法,其特征在于,所述利用n元语言模型对所述各个特征进行特征组合,获得若干个组合特征,包括:
    利用二元语言模型对所述各个特征进行特征组合,获得若干个组合特征。
  7. 如权利要求3所述的方法,其特征在于,所述依据预先训练的情感度估测模型,结合待处理短文本的特征集合,对待处理短文本进行情感度估测,包括:
    将所述特征集合输入至所述情感度估测模型;
    由所述情感度估测模型估算后、输出待处理短文本对应的正面情感度和负面情感度。
  8. 如权利要求3所述的方法,其特征在于,所述基于所述待处理短文本对应的正面情感度和负面情感度,确定所述待处理短文本对应的情感倾向,包括:
    确定所述正面情感度和所述负面情感度两者中的较大情感度;
    判断所述较大情感度是否大于预设置信度;
    若所述较大情感度大于预设置信度,则确定所述待处理短文本对应的情感倾向与所述较大情感度的情感倾向一致。
  9. 如权利要求3所述的方法,其特征在于,所述情感度估测模型包括:
    利用最大熵模型,依据至少两个类目标识对应的若干个短文本的特征集合训练后得到的、输出正面情感度和负面情感度的模型。
  10. 如权利要求3所述的方法,其特征在于,在确定所述待处理短文本对应的情感倾向后,还包括:
    输出所述待处理短文本对应的情感倾向。
  11. 一种情感倾向的识别方法,其特征在于,包括:
    确定待处理短文本对应的特征集合和类目标识;其中,一个文本相邻两个标点符号之间的文字称为短文本;所述特征集合中每个特征包括:所述待处理短文本的分词;
    依据与所述类目标识对应的情感度估测模型,结合待处理短文本的特征集合,对待处理短文本进行情感度估测;其中,所述情感度估测模型为:依据所述类目标识对应的、带有情感倾向的若干个短文本样本训练后得到的、输出正面情感度和负面情感度的模型;
    基于所述待处理短文本对应的正面情感度和负面情感度,确定所述待处理短文本对应的情感倾向。
  12. 如权利要求11所述的方法,其特征在于,所述确定待处理短文本对应的特征集合,包括:
    获取所述待处理短文本执行分词操作后获得的分词结果;
    利用n元语言模型对各个分词进行分词组合,获得若干个组合分词;
    将各个分词和若干个组合分词的集合,确定为所述待处理短文本的特征集合,一个分词对应一个特征。
  13. 如权利要求11所述的方法,其特征在于,所述确定待处理短文本对应的特征集合,包括:
    获取所述待处理短文本执行分词操作后获得的分词结果;
    将所述分词结果,确定为所述待处理短文本的特征集合,一个分词对应一个特征。
  14. 如权利要求11所述的方法,其特征在于,在确定所述待处理短文本对应的情感倾向后,还包括:
    输出所述待处理短文本对应的情感倾向。
  15. 一种情感倾向的识别***,其特征在于,包括:
    数据提供设备,用于发送若干个对象;
    处理器,用于接收所述数据提供设备送的若干个对象,依据若干个对象的短文本构建情感度估测模型,并利用情感度估测模型确定待处理短文本的情感倾向。
  16. 如权利要求15所述的***,其特征在于,
    所述处理器,还用于构建情感度估测模型与对象所属的类目标识的对应关系。
  17. 如权利要求15所述的***,其特征在于,所述***还包括接收设备;
    所述处理器,还用于输出所述待处理文本的情感倾向;
    所述接收设备,用于接收所述待处理文本的情感倾向。
  18. 一种情感倾向的识别***,其特征在于,包括:
    数据提供设备,用于发送若干个对象;
    模型构建设备,用于接收所述数据提供设备送的若干个对象,依据若干个对象的短文本构建情感度估测模型,并发送所述情感度估测模型;
    处理器,用于接收所述情感度估测模型,并利用情感度估测模型确定待处理短文本的情感倾向。
  19. 如权利要求18所述的***,其特征在于,
    所述模型构建设备,还用于构建情感度估测模型与对象所属的类目标识的对应关系,并将对应关系发送至所述处理器。
  20. 如权利要求18所述的***,其特征在于,所述***还包括接收设备;
    所述处理器,还用于输出所述待处理文本的情感倾向;
    所述接收设备,用于接收所述待处理文本的情感倾向。
  21. 一种对象分类方法,其特征在于,包括:
    确定待处理对象的特征信息;其中,所述特征信息包括文本特征信息和图像特征信息,并且,所述文本特征信息包括短文本的情感倾向;
    依据预先训练的类别识别模型,对所述待处理对象的特征信息进行类别识别;其中,所述类别识别模型为:依据若干对象样本的特征信息训练后得到的、第一类别和第二类别的分类器。
  22. 如权利要求21所述的方法,其特征在于,所述特征信息还包括:
    构建所述对象的第一主体的特征信息;和/或,
    所述对象所附属于第二主体的特征信息。
  23. 如权利要求21所述的方法,其特征在于,所述依据预先训练的类别识别模型,对所述特征信息进行类别识别,包括:
    将所述特征信息输入至所述类别识别模型;确定所述待处理对象对应的第一类别匹配度和第二类别匹配度;
    对所述第一类别匹配度和第二类别匹配度进行比较;
    若第一类别匹配度大于第二类别匹配度,则确定所述待处理对象的类别为第一类别;
    若第二类别匹配度大于第一类别匹配度,则确定所述待处理对象的类别为第二类别。
  24. 如权利要求23所述的方法,其特征在于,还包括:
    在确定所述待处理对象为第一类别之后,将所述待处理对象添加至对象集合中;
    发送所述对象集合中的对象。
  25. 如权利要求24所述的方法,其特征在于,还包括:
    接收多个对象样本,所述对象样本来源于所述对象集合,且,满足预设规则;
    将所述多个对象样本,添加至训练类别识别模型的已有对象样本中;
    基于更新后的已有对象样本,重新训练类别识别模型。
  26. 一种用户评价的分类方法,其特征在于,包括:
    确定待处理用户评价的特征信息;其中,所述特征信息包括用户评价的文本特征信息、用户评价的图像特征信息、卖家的特征信息和买家的特征信息,并且,所述文本特征信息包括短文本的情感倾向;
    依据预先训练的梯度提升决策树模型,对所述待处理用户评价的特征信息进行类别识别;其中,所述类别识别模型为:依据若干用户评价样本的特征信息训练后得到的、第一类用户评价和第二类用户评价的分类器。
  27. 如权利要求26所述的方法,其特征在于,还包括:
    在确定所述待处理用户评价为第一类用户评价之后,将所述待处理用户评价添加至第一类用户评价集合中;
    发送所述第一类用户评价集合。
  28. 如权利要求26所述的方法,其特征在于,还包括:
    接收多个第一类用户评价,所述第一类用户评价来源于所述第一类用户评价集合;
    将所述多个第一类用户评价,添加至类别识别模型已有的用户评价样本中;
    基于更新后的已有的用户评价样本,重新训练类别识别模型。
  29. 一种对象分类***,其特征在于,包括:
    数据提供设备,用于发送若干个对象;
    处理器,用于接收所述数据提供设备送的若干个对象,依据若干对象的特征信息训练后得到、输出第一类别和第二类别的类别识别模型;用于确定待处理对象的特征信息;其中,所述特征信息包括文本特征信息和图像特征信息,并且,所述文本特征信息包括短文本的情感倾向;依据所述类别识别模型,对所述待处理对象的特征信息进行类别识别;还用于输出第一类别的对象;
    数据接收设备,用于接收并使用所述第一类别的对象。
  30. 一种对象分类***,其特征在于,包括:
    数据提供设备,用于发送若干个对象;
    模型构建设备,用于接收所述数据提供设备送的若干个对象,依据若干个对象的特征信息训练后得到、输出第一类别和第二类别的类别识别模型,并发送所述类别识别模型;
    处理器,用于接收所述类别识别模型,并确定待处理对象的特征信息;其中,所述特征信息包括文本特征信息和图像特征信息,并且,所述文本特征信息包括短文本的情感倾向;依据所述类别识别模型,对所述待处理对象的特征信息进行类别识别;还用于输出第一类别的对象;
    数据接收设备,用于接收并使用所述第一类别的对象。
PCT/CN2017/100060 2016-09-09 2017-08-31 情感倾向的识别方法、对象分类方法及数据处理*** WO2018045910A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610812853.4 2016-09-09
CN201610812853.4A CN107807914A (zh) 2016-09-09 2016-09-09 情感倾向的识别方法、对象分类方法及数据处理***

Publications (1)

Publication Number Publication Date
WO2018045910A1 true WO2018045910A1 (zh) 2018-03-15

Family

ID=61562512

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/100060 WO2018045910A1 (zh) 2016-09-09 2017-08-31 情感倾向的识别方法、对象分类方法及数据处理***

Country Status (3)

Country Link
CN (1) CN107807914A (zh)
TW (1) TW201812615A (zh)
WO (1) WO2018045910A1 (zh)


Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109036570B (zh) * 2018-05-31 2021-08-31 云知声智能科技股份有限公司 超声科非病历内容的过滤方法及***
CN109299782B (zh) * 2018-08-02 2021-11-12 奇安信科技集团股份有限公司 一种基于深度学习模型的数据处理方法及装置
CN110929026B (zh) * 2018-09-19 2023-04-25 阿里巴巴集团控股有限公司 一种异常文本识别方法、装置、计算设备及介质
CN109492226B (zh) * 2018-11-10 2023-03-24 上海五节数据科技有限公司 一种提高情感倾向占比低文本预断准确率的方法
CN109871807B (zh) * 2019-02-21 2023-02-10 百度在线网络技术(北京)有限公司 人脸图像处理方法和装置
CN110032645B (zh) * 2019-04-17 2021-02-09 携程旅游信息技术(上海)有限公司 文本情感识别方法、***、设备以及介质
CN110427519A (zh) * 2019-07-31 2019-11-08 腾讯科技(深圳)有限公司 视频的处理方法及装置
CN110516416B (zh) * 2019-08-06 2021-08-06 咪咕文化科技有限公司 身份验证方法、验证端和客户端

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102968408A (zh) * 2012-11-23 2013-03-13 西安电子科技大学 识别用户评论的实体特征方法
CN103365867A (zh) * 2012-03-29 2013-10-23 腾讯科技(深圳)有限公司 一种对用户评价进行情感分析的方法和装置
CN103455562A (zh) * 2013-08-13 2013-12-18 西安建筑科技大学 一种文本倾向性分析方法及基于该方法的商品评论倾向判别器
CN105005560A (zh) * 2015-08-26 2015-10-28 苏州大学张家港工业技术研究院 一种基于最大熵模型的评价类型情绪分类方法及***

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101510254A (zh) * 2009-03-25 2009-08-19 北京中星微电子有限公司 一种图像分析中更新性别分类器的方法及性别分类器
US20110251973A1 (en) * 2010-04-08 2011-10-13 Microsoft Corporation Deriving statement from product or service reviews
CN102682124B (zh) * 2012-05-16 2014-07-09 苏州大学 一种文本的情感分类方法及装置
CN105095181B (zh) * 2014-05-19 2017-12-29 株式会社理光 垃圾评论检测方法及设备
CN105069072B (zh) * 2015-07-30 2018-08-21 天津大学 基于情感分析的混合用户评分信息推荐方法及其推荐装置
CN105550269A (zh) * 2015-12-10 2016-05-04 复旦大学 一种有监督学习的产品评论分析方法及***


Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109271627B (zh) * 2018-09-03 2023-09-05 深圳市腾讯网络信息技术有限公司 文本分析方法、装置、计算机设备和存储介质
CN109271627A (zh) * 2018-09-03 2019-01-25 深圳市腾讯网络信息技术有限公司 文本分析方法、装置、计算机设备和存储介质
CN109344257A (zh) * 2018-10-24 2019-02-15 平安科技(深圳)有限公司 文本情感识别方法及装置、电子设备、存储介质
CN109344257B (zh) * 2018-10-24 2024-05-24 平安科技(深圳)有限公司 文本情感识别方法及装置、电子设备、存储介质
CN109684627A (zh) * 2018-11-16 2019-04-26 北京奇虎科技有限公司 一种文本分类方法及装置
US20230342549A1 (en) * 2019-09-20 2023-10-26 Nippon Telegraph And Telephone Corporation Learning apparatus, estimation apparatus, methods and programs for the same
CN111506733A (zh) * 2020-05-29 2020-08-07 广东太平洋互联网信息服务有限公司 对象画像的生成方法、装置、计算机设备和存储介质
CN111506733B (zh) * 2020-05-29 2022-06-28 广东太平洋互联网信息服务有限公司 对象画像的生成方法、装置、计算机设备和存储介质
CN112069311A (zh) * 2020-08-04 2020-12-11 北京声智科技有限公司 一种文本提取方法、装置、设备及介质
CN112069311B (zh) * 2020-08-04 2024-06-11 北京声智科技有限公司 一种文本提取方法、装置、设备及介质
CN113450010A (zh) * 2021-07-07 2021-09-28 中国工商银行股份有限公司 数据对象的评价结果的确定方法、装置和服务器
CN114443849A (zh) * 2022-02-09 2022-05-06 北京百度网讯科技有限公司 一种标注样本选取方法、装置、电子设备和存储介质
CN114443849B (zh) * 2022-02-09 2023-10-27 北京百度网讯科技有限公司 一种标注样本选取方法、装置、电子设备和存储介质
US11907668B2 (en) 2022-02-09 2024-02-20 Beijing Baidu Netcom Science Technology Co., Ltd. Method for selecting annotated sample, apparatus, electronic device and storage medium

Also Published As

Publication number Publication date
TW201812615A (zh) 2018-04-01
CN107807914A (zh) 2018-03-16

Similar Documents

Publication Publication Date Title
WO2018045910A1 (zh) 情感倾向的识别方法、对象分类方法及数据处理***
US20200210396A1 (en) Image and Text Data Hierarchical Classifiers
JP6862579B2 (ja) 画像特徴の取得
CN108363804B (zh) 基于用户聚类的局部模型加权融合Top-N电影推荐方法
Kao et al. Visual aesthetic quality assessment with a regression model
US10810494B2 (en) Systems, methods, and computer program products for extending, augmenting and enhancing searching and sorting capabilities by learning and adding concepts on the fly
CN107357793B (zh) 信息推荐方法和装置
CN107832663A (zh) 一种基于量子理论的多模态情感分析方法
CN110245257B (zh) 推送信息的生成方法及装置
CN107818084B (zh) 一种融合点评配图的情感分析方法
CN108763214B (zh) 一种针对商品评论的情感词典自动构建方法
WO2018176913A1 (zh) 搜索方法、装置及非临时性计算机可读存储介质
CN114998602B (zh) 基于低置信度样本对比损失的域适应学习方法及***
Hidru et al. EquiNMF: Graph regularized multiview nonnegative matrix factorization
CN112884542A (zh) 商品推荐方法和装置
CN108733652B (zh) 基于机器学习的影评情感倾向性分析的测试方法
CN109948702A (zh) 一种基于卷积神经网络的服装分类和推荐模型
CN113627151A (zh) 跨模态数据的匹配方法、装置、设备及介质
CN110569495A (zh) 一种基于用户评论的情感倾向分类方法、装置及存储介质
CN109727091A (zh) 基于对话机器人的产品推荐方法、装置、介质及服务器
CN113762005A (zh) 特征选择模型的训练、对象分类方法、装置、设备及介质
CN108804416B (zh) 基于机器学习的影评情感倾向性分析的训练方法
CN111797622A (zh) 用于生成属性信息的方法和装置
Ramayanti et al. Text classification on dataset of marine and fisheries sciences domain using random forest classifier
CN117015789A (zh) 基于sns文本的用户的装修风格分析模型提供装置及方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17848083

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17848083

Country of ref document: EP

Kind code of ref document: A1