CN104598644A - User preference label mining method and device - Google Patents

User preference label mining method and device

Info

Publication number
CN104598644A
CN104598644A (application CN201510076723.4A)
Authority
CN
China
Prior art keywords
emotion
main body
text
word
video
Prior art date
Legal status: Granted
Application number
CN201510076723.4A
Other languages
Chinese (zh)
Other versions
CN104598644B (en)
Inventor
孙晓
Current Assignee
Tencent Technology Shenzhen Co Ltd
Hefei University of Technology
Original Assignee
Tencent Technology Shenzhen Co Ltd
Hefei University of Technology
Priority date
Filing date
Publication date
Application filed by Tencent Technology (Shenzhen) Co Ltd and Hefei University of Technology
Priority to CN201510076723.4A
Publication of CN104598644A
Application granted
Publication of CN104598644B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 - Details of database functions independent of the retrieved data types
    • G06F16/95 - Retrieval from the web
    • G06F16/953 - Querying, e.g. by the use of web search engines
    • G06F16/9535 - Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a user preference label mining method comprising the following steps: acquiring a text and the corresponding audio and/or face video; performing word segmentation on the text to obtain a word sequence; extracting an emotion subject and an emotion object from the word sequence; extracting, from the text, a text feature vector of the emotional tendency of the emotion subject toward the emotion object; extracting, from the audio, an audio feature vector of that emotional tendency, and/or extracting, from the face video, a video feature vector of that emotional tendency; judging the emotional tendency of the emotion subject toward the emotion object with a trained emotional tendency discrimination model according to the text feature vector and the audio and/or video feature vector; and generating a preference label of the emotion subject for the emotion object. With this method, user preference labels that more accurately reflect a user's true preferences can be mined. The invention further provides a user preference label mining device.

Description

User preference label mining method and device
Technical field
The present invention relates to the field of data mining, and in particular to a user preference label mining method and device.
Background
A common user tag is a character, word, phrase, or short sentence that reflects a user characteristic. A user preference label is a user tag that reflects a user's preferences or emotional tendencies.
Existing Internet applications increasingly emphasize personalized services, recommending products and social information suited to each user in order to improve the hit rate of pushed information and user stickiness. How to mine users' points of interest and analyze users' emotional tendencies so as to generate user preference labels is a problem that many Internet applications hope to solve.
In conventional techniques, a user's preference labels are generally generated from keywords that other users in a social network define for that user. However, owing to interpersonal and individual subjective factors, the generated user preference labels do not necessarily reflect the user's true preferences and emotional tendencies.
Summary of the invention
Based on this, it is necessary to provide a user preference label mining method and device capable of mining labels that accurately reflect a user's true preferences.
A user preference label mining method comprises the following steps:
obtaining a text to be analyzed and the face video and/or audio corresponding to the text;
performing word segmentation on the text to obtain the word sequence forming the text;
extracting the words serving as the emotion subject and the emotion object from the word sequence;
extracting, from the text, a text feature vector characterizing the emotional tendency of the emotion subject toward the emotion object;
extracting, from the face video, a video feature vector characterizing the emotional tendency of the emotion subject toward the emotion object, and/or extracting, from the audio, an audio feature vector characterizing the emotional tendency of the emotion subject toward the emotion object;
judging the emotional tendency of the emotion subject toward the emotion object by means of a trained emotional tendency discrimination model, according to the text feature vector and the video feature vector and/or audio feature vector; and
generating a preference label of the emotion subject for the emotion object according to the obtained emotional tendency.
A user preference label mining device comprises:
a raw data acquisition module, configured to obtain a text to be analyzed and the audio and/or face video corresponding to the text;
a word segmentation module, configured to perform word segmentation on the text to obtain the word sequence forming the text;
a subject and object extraction module, configured to extract the words serving as the emotion subject and the emotion object from the word sequence;
a text feature extraction module, configured to extract, from the text, a text feature vector characterizing the emotional tendency of the emotion subject toward the emotion object;
an audio and video feature extraction module, configured to extract, from the audio, an audio feature vector characterizing the emotional tendency of the emotion subject toward the emotion object, and/or extract, from the face video, a video feature vector characterizing the emotional tendency of the emotion subject toward the emotion object;
an emotional tendency judgment module, configured to judge the emotional tendency of the emotion subject toward the emotion object by means of a trained emotional tendency discrimination model, according to the text feature vector and the video feature vector and/or audio feature vector; and
a label generation module, configured to generate a preference label of the emotion subject for the emotion object according to the obtained emotional tendency.
In the above user preference label mining method and device, the emotion subject and the emotion object are extracted from the text, and a text feature vector and a video feature vector and/or audio feature vector characterizing the emotional tendency of the emotion subject toward the emotion object are extracted; the emotional tendency of the emotion subject toward the emotion object is then judged by a trained emotional tendency discrimination model according to these feature vectors, and a preference label of the emotion subject for the emotion object is generated according to the obtained emotional tendency. On the one hand, the emotion subject and the emotion object are extracted automatically; on the other hand, the emotional tendency is obtained by combining the text feature vector with the video feature vector and/or audio feature vector, which yields a more accurate emotional tendency. Therefore, user preference labels can not only be mined more intelligently, but also reflect the user's true preferences more accurately.
Brief description of the drawings
FIG. 1 is a block diagram of part of the structure of a device that runs the user preference label mining method described in this application, in one embodiment;
FIG. 2 is a schematic flowchart of the user preference label mining method in one embodiment;
FIG. 3A is a schematic flowchart of step S208 of FIG. 2 in one embodiment;
FIG. 3B is a schematic flowchart of the step of extracting the emotion word feature of the text in one embodiment;
FIG. 4 is a schematic flowchart of step S210 of FIG. 2 in one embodiment;
FIG. 5 is a schematic flowchart of step S212 of FIG. 2 in one embodiment;
FIG. 6 is a structural schematic diagram of the user preference label mining device in one embodiment;
FIG. 7 is a structural schematic diagram of the user preference label mining device in another embodiment.
Detailed description of embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain the present invention and are not intended to limit it.
FIG. 1 is a block diagram of part of the structure of a device that runs the user preference label mining method described in this application, in one embodiment. As shown in FIG. 1, the device comprises a processor, a storage medium, a memory, and a network interface connected by a system bus. The network interface is used for network communication. The storage medium stores an operating system, a database, and software instructions for implementing the user preference label mining method described in this application; the database stores the text to be analyzed, the corresponding face video and/or audio, the user preference labels generated when the method is carried out, and other data used in the process. The memory is used for caching data. The processor coordinates the components and executes the software instructions to carry out the user preference label mining method described in this application. The structure shown in FIG. 1 is merely a block diagram of the part of the structure relevant to the solution of this application and does not limit the device to which the solution is applied; a specific device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
As shown in FIG. 2, in one embodiment, a user preference label mining method comprises the following steps:
Step S202: obtain a text to be analyzed and the audio and face video corresponding to the text.
In this specification, the audio corresponding to a text is audio obtained by recording the speech that corresponds to the text, i.e. speech that expresses the words of the text; the face video corresponding to the text is a video obtained by continuously capturing face images of the speaking action that produces that speech. The audio and the face video corresponding to the text are recorded synchronously from the same speaking action. In parts of the text below they are also referred to as the audio and its corresponding face video.
In one embodiment, interaction data (including text, audio, and video, etc.) recorded during a user's social activity and sent from one software communication client to another may be received; the audio and the corresponding face video are extracted from this interaction data, and the text corresponding to the audio is further recognized as the text to be analyzed by speech recognition technology. Because the interaction data sent from one software communication client to another is relayed through a server, the user preference label mining method described in this application may be performed by the server, so that when the server receives interaction data sent by a software communication client, it can perform step S202 according to that interaction data.
Step S204: perform word segmentation on the text to obtain the word sequence forming the text.
In one embodiment, an existing word segmentation tool may be used to segment the text into words, and these words are arranged in order of their positions in the text to form the word sequence.
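For illustration only, the following is a minimal sketch of this step in Python, assuming the open-source jieba segmenter as the existing word segmentation tool (the patent does not name a specific tool):

```python
import jieba

text = "我喜欢吃苹果。"                  # "I like eating apples."
word_sequence = list(jieba.cut(text))    # words kept in their original order
print(word_sequence)                     # e.g. ['我', '喜欢', '吃', '苹果', '。']
```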
Step S206: extract the words serving as the emotion subject and the emotion object from the word sequence.
In one embodiment, a trained emotion subject and emotion object discrimination model may be used to extract the word serving as the emotion subject and the word serving as the emotion object from the word sequence. This discrimination model is trained on a large number of word-sequence corpora in which the emotion subject and the emotion object have been annotated.
In one embodiment, an existing part-of-speech tagging tool may be used to tag the part of speech of each word in the word sequence, and a trained conditional random field (CRF) model may then be used to extract the words serving as the emotion subject and the emotion object from the part-of-speech-tagged word sequence. The conditional random field model can be trained on a large number of word-sequence corpora in which the parts of speech of the words and the emotion subject and emotion object have been annotated.
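A hedged sketch of how such an extraction could be set up as sequence labeling with a CRF follows, using the sklearn-crfsuite package; the feature template, tag set, and toy annotated corpus are illustrative assumptions, not the patent's actual training data:

```python
import sklearn_crfsuite

def token_features(words, pos_tags, i):
    # Simple per-token features over the part-of-speech-tagged word sequence.
    return {
        "word": words[i],
        "pos": pos_tags[i],
        "prev_word": words[i - 1] if i > 0 else "<BOS>",
        "next_word": words[i + 1] if i + 1 < len(words) else "<EOS>",
    }

def sentence_features(words, pos_tags):
    return [token_features(words, pos_tags, i) for i in range(len(words))]

# Toy annotated corpus: SUBJ marks the emotion subject, OBJ the emotion object.
train_words = [["我", "喜欢", "吃", "苹果", "。"]]
train_pos   = [["r",  "v",    "v",  "n",   "w"]]
train_tags  = [["SUBJ", "O",  "O",  "OBJ", "O"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
crf.fit([sentence_features(w, p) for w, p in zip(train_words, train_pos)], train_tags)

pred = crf.predict([sentence_features(train_words[0], train_pos[0])])[0]
emotion_subject = [w for w, t in zip(train_words[0], pred) if t == "SUBJ"]
emotion_object  = [w for w, t in zip(train_words[0], pred) if t == "OBJ"]
```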
Step S208: extract, from the text, a text feature vector characterizing the emotional tendency of the emotion subject toward the emotion object.
In one embodiment, the text feature vector characterizing the emotional tendency of the emotion subject toward the emotion object comprises the following feature components: the emotion word feature of the text, plus zero or more of the following features: the word vector of the emotion object, the punctuation feature of the text, and the word vector of the conjunction in the text.
As shown in FIG. 3A, step S208 may comprise the following steps:
Step S320: extract the emotion word feature of the text, and obtain zero or more of the following features: the word vector of the emotion object, the punctuation feature of the text, and the word vector of the conjunction in the text.
In one embodiment, the emotion word feature of the text comprises the part of speech of the emotion word of the text, the word vector of the emotion word of the text, the mean of the word vectors of the degree words preceding the emotion word of the text, and the number of negation words preceding the emotion word of the text.
Here, the mean of the word vectors of the degree words preceding the emotion word of the text is the mean of the word vectors of all degree words that precede the emotion word in the text. The word vector of a word is a vector that characterizes the word's semantics, and the distance between the word vectors of two words (for example, cosine similarity or Euclidean distance) can be used to characterize the semantic similarity of the two words.
As shown in FIG. 3B, in one embodiment, the step of extracting the emotion word feature of the text comprises the following steps:
Step S322: match the words in the word sequence forming the text against a preset emotion lexicon, to obtain the emotion word in the word sequence, its position in the word sequence, and the degree words and negation words located before it.
Degree words are words that express degree, for example "very", "a bit", "extremely"; negation words are words that express negation, for example "not", "no".
The preset emotion lexicon may be an emotion lexicon such as the HowNet emotion lexicon. The preset emotion lexicon contains words annotated as emotion words, words annotated as degree words, and words annotated as negation words.
If a word in the word sequence of the text is annotated as an emotion word in the preset emotion lexicon, that word is marked as the emotion word of the text; correspondingly, if a word preceding the emotion word in the word sequence is annotated as a degree word or a negation word in the preset emotion lexicon, that word is marked as a degree word or a negation word preceding the emotion word of the text.
Step S324: obtain the part of speech of the emotion word of the text; obtain the word vector of the emotion word of the text; obtain the word vectors of the degree words preceding the emotion word and compute their mean, yielding the mean word vector of the degree words preceding the emotion word of the text; and count all negation words preceding the emotion word, yielding the number of negation words preceding the emotion word of the text.
In one embodiment, an existing part-of-speech tagging tool may be used to tag the part of speech of the emotion word.
In one embodiment, an existing word-vector tool, for example the word2vec tool, may be used to obtain the word vector of the emotion word and the word vectors of the degree words. For example, 20-dimensional word vectors of the emotion word and of each degree word may be obtained with the word2vec tool.
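A minimal sketch of obtaining such word vectors with gensim's word2vec implementation follows; the 20-dimensional setting matches the example above, while the toy corpus and the choice of degree and negation words are illustrative assumptions:

```python
import numpy as np
from gensim.models import Word2Vec

# Toy corpus standing in for a real training corpus.
corpus = [["我", "喜欢", "吃", "苹果"], ["我", "非常", "不", "喜欢", "下雨"]]
w2v = Word2Vec(sentences=corpus, vector_size=20, window=2, min_count=1)

emotion_word   = "喜欢"
degree_words   = ["非常"]   # degree words found before the emotion word
negation_count = 1          # number of negation words before it ("不")

emotion_word_vec = w2v.wv[emotion_word]          # 20-dimensional word vector of the emotion word
degree_mean = (np.mean([w2v.wv[w] for w in degree_words], axis=0)
               if degree_words else np.zeros(20, dtype=np.float32))
```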
Step S326: form the emotion word feature of the text from the part of speech of the emotion word of the text, the word vector of the emotion word of the text, the mean word vector of the degree words preceding the emotion word of the text, and the number of negation words preceding the emotion word of the text.
In one embodiment, an existing word-vector tool may be used to obtain the word vector of the emotion object, and this word vector can characterize the semantics of the emotion object.
Different punctuation marks correspond to different emotions; an exclamation mark, for example, may intensify the emotion. In one embodiment, the feature value corresponding to the punctuation mark in the text may be looked up in a preset table of punctuation feature values and used as the punctuation feature of the text. In this preset table, the feature value corresponding to each punctuation mark characterizes the semantics of that punctuation mark, and the feature value may take the form of a numeric vector consisting of a single number.
Different conjunctions can express different emotions; "but", for example, indicates a semantic turn. In one embodiment, an existing word-vector tool may be used to obtain the word vector of the conjunction in the text, and this word vector can characterize the semantics of the conjunction.
Step S340: form the text feature vector from the emotion word feature of the text and the zero or more features obtained.
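Continuing the sketch above, the components could be concatenated into the text feature vector as follows; the numeric part-of-speech encoding and the punctuation feature-value table are illustrative assumptions, since the patent only requires that such encodings exist:

```python
import numpy as np

POS_CODES    = {"v": 1.0, "a": 2.0, "n": 3.0}              # assumed numeric encoding of parts of speech
PUNCT_VALUES = {"!": 2.0, "?": 1.5, "。": 1.0, ".": 1.0}    # assumed punctuation feature-value table

emotion_word_pos = "v"   # part of speech of "喜欢" (verb), e.g. from a POS tagger

emotion_word_feature = np.concatenate([
    [POS_CODES.get(emotion_word_pos, 0.0)],  # part of speech of the emotion word
    emotion_word_vec,                        # word vector of the emotion word (sketch above)
    degree_mean,                             # mean word vector of the preceding degree words
    [negation_count],                        # number of preceding negation words
])

object_vec      = w2v.wv["苹果"]                            # word vector of the emotion object
punct_value     = np.array([PUNCT_VALUES.get("。", 0.0)])   # punctuation feature value of the text
conjunction_vec = np.zeros(20)                              # no conjunction in this sentence

text_feature_vector = np.concatenate(
    [emotion_word_feature, object_vec, punct_value, conjunction_vec])
```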
Step S210: extract, from the audio, an audio feature vector characterizing the emotional tendency of the emotion subject toward the emotion object.
As shown in FIG. 4, in one embodiment, step S210 comprises the following steps:
Step S402: extract the Mel-frequency cepstral coefficients (MFCCs) of the audio.
Step S404: cut the audio into the audio segments corresponding to the words in the word sequence of the text.
In one embodiment, speech recognition technology is used to identify, within the audio, the audio segment corresponding to each word in the word sequence of the text, and the audio is cut into these segments.
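A minimal sketch of this cutting step follows, assuming that a speech-recognition or forced-alignment tool has already produced (start, end) times in seconds for each word; the file name and timings are placeholders:

```python
import librosa

waveform, sr = librosa.load("utterance.wav", sr=16000)

# Assumed per-word timings for "我 / 喜欢 / 吃 / 苹果", in seconds.
word_times = [(0.00, 0.30), (0.30, 0.75), (0.75, 1.00), (1.00, 1.60)]

audio_segments = [waveform[int(start * sr): int(end * sr)] for start, end in word_times]
```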
Step S406: extract the pitch of each audio segment, and compute the mean of the pitches of the audio segments.
In one embodiment, the pitch of each audio segment may be extracted with an algorithm based on AMDF-weighted autocorrelation (AWAC).
Step S408: extract the intensity of each audio segment, and compute the mean of the intensities of the audio segments.
In one embodiment, the intensity of each audio segment may be extracted with a method based on the Fast Fourier Transform (FFT).
Step S410: form the audio feature vector from the MFCCs of the audio, the mean of the pitches of the audio segments, the pitch of the audio segment corresponding to the emotion subject, the mean of the intensities of the audio segments, and the intensity of the audio segment corresponding to the emotion subject.
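Continuing from the sketch above, the following is a hedged sketch of assembling the audio feature vector. Where the patent suggests an AWAC pitch tracker and an FFT-based intensity measure, librosa's YIN estimator and RMS energy are used here as stand-in algorithms, which changes the exact estimators but not the shape of the feature vector:

```python
import numpy as np
import librosa

mfcc = librosa.feature.mfcc(y=waveform, sr=sr, n_mfcc=13).mean(axis=1)  # utterance-level MFCCs

def segment_pitch(segment, sr):
    f0 = librosa.yin(segment, fmin=80, fmax=400, sr=sr)   # frame-wise fundamental frequency
    return float(np.mean(f0))

def segment_intensity(segment):
    return float(librosa.feature.rms(y=segment).mean())   # frame-wise RMS energy, averaged

pitches     = [segment_pitch(s, sr) for s in audio_segments]
intensities = [segment_intensity(s) for s in audio_segments]

subject_index = 0   # index of the emotion subject's segment ("我"), assumed known

audio_feature_vector = np.concatenate([
    mfcc,
    [np.mean(pitches), pitches[subject_index]],
    [np.mean(intensities), intensities[subject_index]],
])
```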
Step S212: extract, from the face video, a video feature vector characterizing the emotional tendency of the emotion subject toward the emotion object.
As shown in FIG. 5, in one embodiment, step S212 comprises the following steps:
Step S502: split the face video into the video segments corresponding to the words in the word sequence of the text.
In one embodiment, the audio corresponding to the face video may be obtained; this audio is recorded while the user performs the actions captured in the face video. Speech recognition technology is used to identify, within this audio, the audio segment corresponding to each word in the word sequence of the text; the duration of each audio segment is then determined, and the face video is split according to these durations into the video segments corresponding to the audio segments, thereby obtaining the video segment corresponding to each word in the word sequence of the text.
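A minimal sketch of this splitting step, reusing the per-word timings from the audio sketch above; OpenCV is an assumed choice of video reader and the file name is a placeholder:

```python
import cv2

cap = cv2.VideoCapture("face_video.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)

frames = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frames.append(frame)
cap.release()

# One list of frames per word, aligned with the audio segments.
video_segments = [frames[int(start * fps): int(end * fps)] for start, end in word_times]
```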
Step S504: compute the facial feature point mean of each video segment, and compute the mean of the facial feature point means of the video segments.
Facial feature points are points in a face image that characterize facial features. In one embodiment, the facial feature points are facial SIFT feature points. SIFT, the Scale-Invariant Feature Transform, is a descriptor used in image processing; it is scale-invariant and can detect keypoints in an image. Such keypoints can describe local features of the face and are therefore referred to as facial SIFT feature points.
In one embodiment, a facial feature point is represented by a numeric vector. A video segment contains multiple frames of face images, and a face image contains multiple facial feature points. The facial feature point mean of a face image is the mean of the numeric vectors of the facial feature points it contains, and the facial feature point mean of a video segment is the mean of the facial feature point means of the face images it contains.
In one embodiment, the numeric vectors of the facial feature points in the face images of each video segment may be extracted; the facial feature point mean of each face image is computed as the mean of the numeric vectors of the facial feature points it contains, and the facial feature point mean of each video segment is then computed as the mean of the facial feature point means of the face images it contains.
Step S506: obtain the facial feature point maximum of the video segment corresponding to the emotion subject.
In one embodiment, the facial feature point maximum of each face image in the video segment corresponding to the emotion subject may be obtained, and the largest of these maxima is taken as the facial feature point maximum of the video segment. The facial feature point maximum of a face image is the largest numeric vector among the numeric vectors of the facial feature points it contains; if the spatial length represented by one numeric vector is greater than that represented by another, the former is considered larger.
Step S508: obtain the facial feature point minimum of the video segment corresponding to the emotion subject.
The facial feature point minimum of the video segment corresponding to the emotion subject may be obtained in the same way as in step S506.
Step S510: form the video feature vector from the mean of the facial feature point means of the video segments, together with the facial feature point mean, facial feature point maximum, and facial feature point minimum of the video segment corresponding to the emotion subject.
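A hedged sketch of assembling the video feature vector with OpenCV SIFT descriptors follows, continuing from the sketches above. Face detection and cropping are omitted, each frame's feature is derived from the descriptors of its SIFT keypoints, and element-wise maxima and minima are used here as a simplification of the patent's norm-based comparison of feature-point vectors:

```python
import cv2
import numpy as np

sift = cv2.SIFT_create()

def frame_stats(frame):
    # Mean / max / min over the 128-dimensional SIFT descriptors of one frame.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    _, descriptors = sift.detectAndCompute(gray, None)
    if descriptors is None:
        descriptors = np.zeros((1, 128), dtype=np.float32)
    return descriptors.mean(axis=0), descriptors.max(axis=0), descriptors.min(axis=0)

def segment_stats(segment):
    means, maxs, mins = zip(*(frame_stats(f) for f in segment))
    return np.mean(means, axis=0), np.max(maxs, axis=0), np.min(mins, axis=0)

segment_means = [segment_stats(seg)[0] for seg in video_segments]
subject_mean, subject_max, subject_min = segment_stats(video_segments[subject_index])

video_feature_vector = np.concatenate([
    np.mean(segment_means, axis=0),   # mean of the per-segment feature-point means
    subject_mean, subject_max, subject_min,
])
```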
Step S214: according to the text feature vector, the audio feature vector, and the video feature vector, use a trained emotional tendency discrimination model to judge the emotional tendency of the emotion subject toward the emotion object.
In one embodiment, the emotional tendency discrimination model is trained on a large number of data samples annotated with emotional tendencies, each data sample comprising the text feature vector of a text, the audio feature vector of the audio corresponding to that text, and the video feature vector of the face video corresponding to that text.
In one embodiment, the emotional tendency discrimination model is a support vector machine (SVM) model for discriminating the emotional tendency of the emotion subject in a data object toward the emotion object, where a data object comprises a text, the audio corresponding to the text, and the video corresponding to the text.
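A hedged sketch of such a classifier using scikit-learn's SVC follows; the randomly generated data merely stands in for the annotated sample corpus, and the feature dimensionality is an arbitrary assumption:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
feature_dim = 200                              # assumed total dimensionality of the
                                               # concatenated text + audio + video features
X_train = rng.normal(size=(90, feature_dim))   # toy annotated samples
y_train = rng.integers(0, 3, size=90)          # 0 = negative, 1 = neutral, 2 = positive

sentiment_model = SVC(kernel="rbf", C=1.0)
sentiment_model.fit(X_train, y_train)

x_new = rng.normal(size=(1, feature_dim))      # a new concatenated feature vector
orientation = int(sentiment_model.predict(x_new)[0])
```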
Step S216: generate a preference label of the emotion subject for the emotion object according to the obtained emotional tendency.
In one embodiment, the emotional tendency categories that the emotional tendency discrimination model can discriminate include, but are not limited to: positive, neutral, and negative.
In one embodiment, if the emotional tendency of the emotion subject toward the emotion object is positive or negative, a corresponding positive or negative preference label is generated; if it is neutral, no preference label is generated. For example, if the emotional tendency is positive, a preference label of the form "[emotion subject] likes [emotion object]" may be generated; if it is negative, a preference label of the form "[emotion subject] does not like [emotion object]" may be generated; and so on. When the emotion subject is unambiguous, it may be omitted from the preference label.
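A minimal sketch of this label-generation rule; the English label templates and the class encoding (matching the SVM sketch above) are illustrative:

```python
def make_preference_label(subject, obj, orientation):
    # orientation: 0 = negative, 1 = neutral, 2 = positive (as in the SVM sketch above)
    if orientation == 2:
        return f"{subject} likes {obj}"
    if orientation == 0:
        return f"{subject} does not like {obj}"
    return None   # neutral: no preference label is generated

print(make_preference_label("user_42", "apples", 2))   # "user_42 likes apples"
```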
In one embodiment, after step S206 the user preference label mining method further comprises the following step: judge whether the emotion subject is the user corresponding to the audio and face video; if so, proceed to step S208; if not, terminate.
The user corresponding to the audio is the speaker whose voice is recorded in the audio, and the user corresponding to the face video is the owner of the face captured in the face video. In one embodiment, it may be judged whether the emotion subject is a first-person pronoun; if so, the emotion subject is judged to be the user corresponding to the audio and face video; otherwise it is judged not to be.
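A minimal sketch of this check; the list of first-person pronouns is an illustrative assumption:

```python
FIRST_PERSON_PRONOUNS = {"我", "我们", "咱", "咱们", "俺"}

def subject_is_recording_user(emotion_subject_word):
    # Only proceed to feature extraction when the emotion subject refers to the speaker.
    return emotion_subject_word in FIRST_PERSON_PRONOUNS
```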
Because a speaker's comments on other people's preferences are not necessarily accurate, in this embodiment the subsequent steps of generating a user preference label are performed only when the user corresponding to the audio and face video is the emotion subject, so the mined user preference labels are more accurate.
In one embodiment, after step S206 the user preference label mining method further comprises the following step: if the word sequence of the text lacks a word serving as the emotion subject, take the user corresponding to the audio and face video as the emotion subject.
In one embodiment, the correspondence between each audio and face video pair and its user is stored in advance, and the user corresponding to the audio and face video obtained in step S202 can be looked up in this prestored correspondence.
In one embodiment, the audio and face video are recorded by audio/video recording software, and the user logged in to the recording software during recording can be obtained as the user corresponding to the audio and face video. In one embodiment, the audio/video recording function is integrated into a software communication client; the audio and face video are recorded by the software communication client during the user's social activity, and the user logged in to that software communication client during recording can be obtained as the user corresponding to the audio and face video.
In one embodiment, based on any of the embodiments above, a method that disregards the audio feature vector and generates the user preference label only from the text feature vector and the video feature vector, and a method that disregards the video feature vector and generates the user preference label only from the text feature vector and the audio feature vector, both fall within the scope of protection of this application. These two methods can be implemented on the basis of any of the embodiments above by removing the influence of the factor that is not considered (the audio feature vector or the video feature vector) from the corresponding data processing. For example, the two methods below generate the user preference label from, respectively, the text feature vector of a text together with the audio feature vector of its corresponding audio, and the text feature vector of a text together with the video feature vector of its corresponding face video:
A user preference label mining method comprises the following steps: obtaining a text to be analyzed and the audio corresponding to the text; performing word segmentation on the text to obtain the word sequence forming the text; extracting the words serving as the emotion subject and the emotion object from the word sequence; extracting, from the text, a text feature vector characterizing the emotional tendency of the emotion subject toward the emotion object; extracting, from the audio, an audio feature vector characterizing the emotional tendency of the emotion subject toward the emotion object; judging the emotional tendency of the emotion subject toward the emotion object by means of a trained emotional tendency discrimination model, according to the text feature vector and the audio feature vector; and generating a preference label of the emotion subject for the emotion object according to the obtained emotional tendency. The emotional tendency discrimination model is trained on a large number of data samples annotated with emotional tendencies, each data sample comprising the text feature vector of a text and the audio feature vector of the audio corresponding to that text.
A user preference label mining method comprises the following steps: obtaining a text to be analyzed and the face video corresponding to the text; performing word segmentation on the text to obtain the word sequence forming the text; extracting the words serving as the emotion subject and the emotion object from the word sequence; extracting, from the text, a text feature vector characterizing the emotional tendency of the emotion subject toward the emotion object; extracting, from the face video, a video feature vector characterizing the emotional tendency of the emotion subject toward the emotion object; judging the emotional tendency of the emotion subject toward the emotion object by means of a trained emotional tendency discrimination model, according to the text feature vector and the video feature vector; and generating a preference label of the emotion subject for the emotion object according to the obtained emotional tendency. The emotional tendency discrimination model is trained on a large number of data samples annotated with emotional tendencies, each data sample comprising the text feature vector of a text and the video feature vector of the face video corresponding to that text.
A concrete example is given below to illustrate the user preference label mining method. Suppose a user says a sentence whose corresponding text, denoted T, is "I like eating apples."; the audio corresponding to the sentence is denoted S, and the corresponding face video is denoted F:
(1) Obtain the text T, the audio S, and the face video F.
(2) Perform word segmentation on T to obtain the word sequence: I / like / eating / apples.
(3) Extract from the word sequence the word "I" as the emotion subject and the word "apples" as the emotion object.
(4) Extract the text feature vector of T, which includes:
extracting the emotion word "like" in T and then its emotion word feature, which comprises the part of speech of the emotion word, the word vector of the emotion word, the mean word vector of the degree words preceding the emotion word, and the number of negation words preceding the emotion word; the part of speech of "like" in T is verb, and there are no degree words or negation words before "like";
extracting the word vector of the emotion object "apples";
extracting the punctuation feature of T; the punctuation mark in T is ".";
extracting the word vector of the conjunction in T; there is no conjunction in T.
The text feature vector is formed from the emotion word feature, the word vector of the emotion object, the punctuation feature, and the word vector of the conjunction.
(5) Extract the audio feature vector of S, which includes:
extracting the MFCCs of S;
cutting S into the audio segments corresponding to the words in the word sequence "I like eating apples": s1, s2, s3, s4;
extracting the pitches of s1, s2, s3, and s4, and computing the mean of these pitches;
extracting the intensities of s1, s2, s3, and s4, and computing the mean of these intensities.
The audio feature vector is formed from the MFCCs of the audio, the mean of the pitches of s1-s4, the pitch of s4, the mean of the intensities of s1-s4, and the intensity of s4.
(6) Extract the video feature vector of F, which includes:
splitting F into the video segments corresponding to the words in the word sequence "I like eating apples": f1, f2, f3, f4;
computing the facial feature point means of f1, f2, f3, and f4, and computing the mean of these facial feature point means;
obtaining the facial feature point maximum of f4;
obtaining the facial feature point minimum of f4.
The video feature vector is formed from the mean of the facial feature point means of f1-f4, together with the facial feature point mean, facial feature point maximum, and facial feature point minimum of f4.
(7) According to the text feature vector, the audio feature vector, and the video feature vector, use the trained emotional tendency discrimination model to judge the emotional tendency of the emotion subject "I" toward the emotion object "apples".
(8) Generate the preference label of the emotion subject for the emotion object, for example "I like apples".
As shown in FIG. 6, in one embodiment, a user preference label mining device comprises a raw data acquisition module 602, a word segmentation module 604, a subject and object extraction module 606, a text feature extraction module 608, an audio and video feature extraction module 610, an emotional tendency judgment module 612, and a label generation module 614, wherein:
The raw data acquisition module 602 is configured to obtain a text to be analyzed and the audio and face video corresponding to the text.
In one embodiment, the raw data acquisition module 602 may receive interaction data (including text, audio, and video, etc.) recorded during a user's social activity and sent from one software communication client to another, extract the audio and the corresponding face video from this interaction data, and further recognize the text corresponding to the audio as the text to be analyzed by speech recognition technology. Because the interaction data sent from one software communication client to another is relayed through a server, the user preference label mining device described in this application may be located in the server, so that when the server receives interaction data sent by a software communication client, the raw data acquisition module 602 can obtain the text to be analyzed and the corresponding audio and face video from that interaction data.
The word segmentation module 604 is configured to perform word segmentation on the text to obtain the word sequence forming the text.
In one embodiment, the word segmentation module 604 may use an existing word segmentation tool to segment the text into words, and arrange these words in order of their positions in the text to form the word sequence.
The subject and object extraction module 606 is configured to extract the words serving as the emotion subject and the emotion object from the word sequence.
In one embodiment, the subject and object extraction module 606 may use a trained emotion subject and emotion object discrimination model to extract the word serving as the emotion subject and the word serving as the emotion object from the word sequence. This discrimination model is trained on a large number of word-sequence corpora in which the emotion subject and the emotion object have been annotated.
In one embodiment, the subject and object extraction module 606 may use an existing part-of-speech tagging tool to tag the part of speech of each word in the word sequence, and may then use a trained conditional random field (CRF) model to extract the words serving as the emotion subject and the emotion object from the part-of-speech-tagged word sequence. The conditional random field model can be trained on a large number of word-sequence corpora in which the parts of speech of the words and the emotion subject and emotion object have been annotated.
The text feature extraction module 608 is configured to extract, from the text, a text feature vector characterizing the emotional tendency of the emotion subject toward the emotion object.
In one embodiment, the text feature vector characterizing the emotional tendency of the emotion subject toward the emotion object comprises the following feature components: the emotion word feature of the text, plus zero or more of the following features: the word vector of the emotion object, the punctuation feature of the text, and the word vector of the conjunction in the text.
The text feature extraction module 608 is configured to extract the emotion word feature of the text and to obtain zero or more of the following features: the word vector of the emotion object, the punctuation feature of the text, and the word vector of the conjunction in the text; and to form the text feature vector from the emotion word feature of the text and the zero or more features obtained.
In one embodiment, the emotion word feature of the text comprises the part of speech of the emotion word of the text, the word vector of the emotion word of the text, the mean of the word vectors of the degree words preceding the emotion word of the text, and the number of negation words preceding the emotion word of the text.
Here, the mean of the word vectors of the degree words preceding the emotion word of the text is the mean of the word vectors of all degree words that precede the emotion word in the text. The word vector of a word is a vector that characterizes the word's semantics, and the distance between the word vectors of two words (for example, cosine similarity or Euclidean distance) can be used to characterize the semantic similarity of the two words.
In one embodiment, the process by which the text feature extraction module 608 extracts the emotion word feature of the text comprises:
(1) matching the words in the word sequence forming the text against a preset emotion lexicon, to obtain the emotion word in the word sequence, its position in the word sequence, and the degree words and negation words located before it.
The preset emotion lexicon may be an emotion lexicon such as the HowNet emotion lexicon. The preset emotion lexicon contains words annotated as emotion words, words annotated as degree words, and words annotated as negation words.
If a word in the word sequence of the text is annotated as an emotion word in the preset emotion lexicon, the text feature extraction module 608 marks that word as the emotion word of the text; correspondingly, if a word preceding the emotion word in the word sequence is annotated as a degree word or a negation word in the preset emotion lexicon, the text feature extraction module 608 marks that word as a degree word or a negation word preceding the emotion word of the text.
(2) obtaining the part of speech of the emotion word of the text; obtaining the word vector of the emotion word of the text; obtaining the word vectors of the degree words preceding the emotion word and computing their mean, yielding the mean word vector of the degree words preceding the emotion word of the text; and counting all negation words preceding the emotion word, yielding the number of negation words preceding the emotion word of the text.
In one embodiment, the text feature extraction module 608 may use an existing part-of-speech tagging tool to tag the part of speech of the emotion word.
In one embodiment, the text feature extraction module 608 may use an existing word-vector tool, for example the word2vec tool, to obtain the word vector of the emotion word and the word vectors of the degree words. For example, 20-dimensional word vectors of the emotion word and of each degree word may be obtained with the word2vec tool.
(3) forming the emotion word feature of the text from the part of speech of the emotion word of the text, the word vector of the emotion word of the text, the mean word vector of the degree words preceding the emotion word of the text, and the number of negation words preceding the emotion word of the text.
In one embodiment, the text feature extraction module 608 may use an existing word-vector tool to obtain the word vector of the emotion object, and this word vector can characterize the semantics of the emotion object.
Different punctuation marks correspond to different emotions; an exclamation mark, for example, may intensify the emotion. In one embodiment, the text feature extraction module 608 may look up the feature value corresponding to the punctuation mark in the text in a preset table of punctuation feature values and use it as the punctuation feature of the text. In this preset table, the feature value corresponding to each punctuation mark characterizes the semantics of that punctuation mark, and the feature value may take the form of a numeric vector consisting of a single number.
Different conjunctions can express different emotions; "but", for example, indicates a semantic turn. In one embodiment, the text feature extraction module 608 may use an existing word-vector tool to obtain the word vector of the conjunction in the text, and this word vector can characterize the semantics of the conjunction.
The audio and video feature extraction module 610 is configured to extract, from the audio, an audio feature vector characterizing the emotional tendency of the emotion subject toward the emotion object.
In one embodiment, the process by which the audio and video feature extraction module 610 extracts this audio feature vector comprises:
(1) extracting the Mel-frequency cepstral coefficients (MFCCs) of the audio;
(2) cutting the audio into the audio segments corresponding to the words in the word sequence of the text;
In one embodiment, the audio and video feature extraction module 610 uses speech recognition technology to identify, within the audio, the audio segment corresponding to each word in the word sequence of the text, and cuts the audio into these segments.
(3) extracting the pitch of each audio segment and computing the mean of the pitches of the audio segments;
In one embodiment, the audio and video feature extraction module 610 may extract the pitch of each audio segment with an algorithm based on AMDF-weighted autocorrelation (AWAC).
(4) extracting the intensity of each audio segment and computing the mean of the intensities of the audio segments;
In one embodiment, the audio and video feature extraction module 610 may extract the intensity of each audio segment with a method based on the Fast Fourier Transform (FFT).
(5) forming the audio feature vector from the MFCCs of the audio, the mean of the pitches of the audio segments, the pitch of the audio segment corresponding to the emotion subject, the mean of the intensities of the audio segments, and the intensity of the audio segment corresponding to the emotion subject.
The audio and video feature extraction module 610 is further configured to extract, from the face video, a video feature vector characterizing the emotional tendency of the emotion subject toward the emotion object.
In one embodiment, the process by which the audio and video feature extraction module 610 extracts this video feature vector comprises:
(1) splitting the face video into the video segments corresponding to the words in the word sequence of the text;
In one embodiment, the audio and video feature extraction module 610 may obtain the audio corresponding to the face video; this audio is recorded while the user performs the actions captured in the face video. The module identifies, by speech recognition technology, the audio segment within this audio corresponding to each word in the word sequence of the text, determines the duration of each audio segment, and splits the face video according to these durations into the video segments corresponding to the audio segments, thereby obtaining the video segment corresponding to each word in the word sequence of the text.
(2) computing the facial feature point mean of each video segment, and computing the mean of the facial feature point means of the video segments;
Facial feature points are points in a face image that characterize facial features. In one embodiment, the facial feature points are facial SIFT feature points. SIFT, the Scale-Invariant Feature Transform, is a descriptor used in image processing; it is scale-invariant and can detect keypoints in an image. Such keypoints can describe local features of the face and are therefore referred to as facial SIFT feature points.
In one embodiment, a facial feature point is represented by a numeric vector. A video segment contains multiple frames of face images, and a face image contains multiple facial feature points. The facial feature point mean of a face image is the mean of the numeric vectors of the facial feature points it contains, and the facial feature point mean of a video segment is the mean of the facial feature point means of the face images it contains.
In one embodiment, the audio and video feature extraction module 610 may extract the numeric vectors of the facial feature points in the face images of each video segment, compute the facial feature point mean of each face image as the mean of the numeric vectors of the facial feature points it contains, and then compute the facial feature point mean of each video segment as the mean of the facial feature point means of the face images it contains.
(3) obtaining the facial feature point maximum of the video segment corresponding to the emotion subject;
In one embodiment, the audio and video feature extraction module 610 may obtain the facial feature point maximum of each face image in the video segment corresponding to the emotion subject, and take the largest of these maxima as the facial feature point maximum of the video segment. The facial feature point maximum of a face image is the largest numeric vector among the numeric vectors of the facial feature points it contains; if the spatial length represented by one numeric vector is greater than that represented by another, the former is considered larger.
(4) obtaining the facial feature point minimum of the video segment corresponding to the emotion subject;
(5) forming the video feature vector from the mean of the facial feature point means of the video segments, together with the facial feature point mean, facial feature point maximum, and facial feature point minimum of the video segment corresponding to the emotion subject.
The emotional tendency judgment module 612 is configured to judge the emotional tendency of the emotion subject toward the emotion object by means of a trained emotional tendency discrimination model, according to the text feature vector, the audio feature vector, and the video feature vector.
In one embodiment, the emotional tendency discrimination model is trained on a large number of data samples annotated with emotional tendencies, each data sample comprising the text feature vector of a text, the audio feature vector of the audio corresponding to that text, and the video feature vector of the face video corresponding to that text.
In one embodiment, the emotional tendency discrimination model is a support vector machine (SVM) model for discriminating the emotional tendency of the emotion subject in a data object toward the emotion object, where a data object comprises a text, the audio corresponding to the text, and the video corresponding to the text.
The label generation module 614 is configured to generate a preference label of the emotion subject for the emotion object according to the obtained emotional tendency.
In one embodiment, the emotional tendency categories that the emotional tendency discrimination model can discriminate include, but are not limited to: positive, neutral, and negative.
In one embodiment, if the emotional tendency of the emotion subject toward the emotion object is positive or negative, the label generation module 614 generates a corresponding positive or negative preference label; if it is neutral, the label generation module 614 does not generate a preference label. For example, if the emotional tendency is positive, the label generation module 614 may generate a preference label of the form "[emotion subject] likes [emotion object]"; if it is negative, it may generate a preference label of the form "[emotion subject] does not like [emotion object]"; and so on. When the emotion subject is unambiguous, it may be omitted from the preference label.
As shown in FIG. 7, in one embodiment, the user preference label mining device further comprises an emotion subject judgment module 702, configured to judge whether the emotion subject extracted by the subject and object extraction module 606 is the user corresponding to the audio and face video; if so, the text feature extraction module 608 is started; if not, processing terminates.
The user corresponding to the audio is the speaker whose voice is recorded in the audio, and the user corresponding to the face video is the owner of the face captured in the face video. In one embodiment, the emotion subject judgment module 702 may judge whether the emotion subject is a first-person pronoun; if so, the emotion subject is judged to be the user corresponding to the audio and face video; otherwise it is judged not to be.
Because a speaker's comments on other people's preferences are not necessarily accurate, in this embodiment the subsequent steps of generating a user preference label are performed only when the user corresponding to the audio and face video is the emotion subject, so the mined user preference labels are more accurate.
In one embodiment, the subject and object extraction module 606 is further configured to take the user corresponding to the audio and face video as the emotion subject if the word sequence of the text lacks a word serving as the emotion subject.
In one embodiment, the correspondence between each audio and face video pair and its user is stored in advance, and the subject and object extraction module 606 can look up the user corresponding to the obtained audio and face video in this prestored correspondence.
In one embodiment, the audio and face video are recorded by audio/video recording software, and the subject and object extraction module 606 can obtain the user logged in to the recording software during recording as the user corresponding to the audio and face video. In one embodiment, the audio/video recording function is integrated into a software communication client; the audio and face video are recorded by the software communication client during the user's social activity, and the subject and object extraction module 606 can obtain the user logged in to that software communication client during recording as the user corresponding to the audio and face video.
In one embodiment, based on any of the embodiments above, a device that generates user preference labels according to only the text feature vector and the video feature vector, without considering the audio feature vector, and a device that generates user preference labels according to only the text feature vector and the audio feature vector, without considering the video feature vector, both fall within the scope of protection of this application. For example, the two devices below generate user preference labels according to, respectively, the text feature vector of a text together with the audio feature vector of the corresponding audio, and the text feature vector of a text together with the video feature vector of the corresponding facial video:
A user preference label mining device comprises: a raw data acquisition module, configured to obtain a text to be analyzed and the audio corresponding to the text; a word segmentation module, configured to perform word segmentation on the text to obtain the word sequence forming the text; a main body and object extraction module, configured to extract, from the word sequence, the words serving as the emotion main body and the emotion object; a text feature extraction module, configured to extract, from the text, a text feature vector characterizing the sentiment orientation of the emotion main body toward the emotion object; an audio and video feature extraction module, configured to extract, from the audio, an audio feature vector characterizing the sentiment orientation of the emotion main body toward the emotion object; a sentiment orientation judgment module, configured to judge the sentiment orientation of the emotion main body toward the emotion object according to the text feature vector and the audio feature vector, using a trained sentiment orientation discrimination model; and a tag generation module, configured to generate a preference label of the emotion main body for the emotion object according to the obtained sentiment orientation. The sentiment orientation discrimination model is trained on a large number of data samples annotated with sentiment orientation, each data sample comprising the text feature vector of a text and the audio feature vector of the audio corresponding to that text.
A user preference label mining device comprises: a raw data acquisition module, configured to obtain a text to be analyzed and the facial video corresponding to the text; a word segmentation module, configured to perform word segmentation on the text to obtain the word sequence forming the text; a main body and object extraction module, configured to extract, from the word sequence, the words serving as the emotion main body and the emotion object; a text feature extraction module, configured to extract, from the text, a text feature vector characterizing the sentiment orientation of the emotion main body toward the emotion object; an audio and video feature extraction module, configured to extract, from the facial video, a video feature vector characterizing the sentiment orientation of the emotion main body toward the emotion object; a sentiment orientation judgment module, configured to judge the sentiment orientation of the emotion main body toward the emotion object according to the text feature vector and the video feature vector, using a trained sentiment orientation discrimination model; and a tag generation module, configured to generate a preference label of the emotion main body for the emotion object according to the obtained sentiment orientation. The sentiment orientation discrimination model is trained on a large number of data samples annotated with sentiment orientation, each data sample comprising the text feature vector of a text and the video feature vector of the facial video corresponding to that text.
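A rough sketch of how the modules of the first device above (the text + audio variant) could be wired together is shown below. Every callable passed in is a stand-in for the corresponding module, and a scikit-learn-style classifier interface is assumed; only the data flow follows the description.

```python
from typing import Callable, List, Optional, Tuple

class PreferenceLabelMiner:
    def __init__(self,
                 segment_words: Callable[[str], List[str]],             # word segmentation module
                 extract_pair: Callable[[List[str]], Tuple[str, str]],  # main body / object extraction
                 text_features: Callable[[str, str, str], List[float]],
                 audio_features: Callable[[bytes, List[str], str], List[float]],
                 sentiment_model):                                      # trained discrimination model
        self.segment_words = segment_words
        self.extract_pair = extract_pair
        self.text_features = text_features
        self.audio_features = audio_features
        self.sentiment_model = sentiment_model

    def mine(self, text: str, audio: bytes) -> Optional[str]:
        words = self.segment_words(text)
        main_body, emotion_object = self.extract_pair(words)
        # Concatenate text and audio feature vectors for the joint judgment.
        x = (self.text_features(text, main_body, emotion_object)
             + self.audio_features(audio, words, main_body))
        orientation = self.sentiment_model.predict([x])[0]
        if orientation == "neutral":
            return None
        verb = "likes" if orientation == "positive" else "does not like"
        return f"{main_body} {verb} {emotion_object}"
```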
With the user preference label mining method and device described above, the emotion main body and the emotion object are extracted from the text, a text feature vector and a video feature vector and/or an audio feature vector characterizing the sentiment orientation of the emotion main body toward the emotion object are extracted, the trained sentiment orientation discrimination model judges that sentiment orientation from these feature vectors, and a preference label of the emotion main body for the emotion object is then generated from the obtained orientation. On the one hand, the method and device extract the emotion main body and emotion object automatically; on the other hand, they obtain the sentiment orientation by combining the text feature vector with the video feature vector and/or the audio feature vector, which yields a more accurate result. They can therefore not only mine user preference labels more intelligently, but also mine labels that reflect users' real preferences more accurately.
The embodiments above express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art can make several variations and improvements without departing from the concept of the present invention, and these all fall within the protection scope of the present invention. The protection scope of this patent shall therefore be subject to the appended claims.

Claims (14)

1. A user preference label mining method, comprising the following steps:
obtaining a text to be analyzed and an audio and/or a facial video corresponding to the text;
performing word segmentation on the text to obtain a word sequence forming the text;
extracting, from the word sequence, words serving as an emotion main body and an emotion object;
extracting, from the text, a text feature vector characterizing the sentiment orientation of the emotion main body toward the emotion object;
extracting, from the audio, an audio feature vector characterizing the sentiment orientation of the emotion main body toward the emotion object, and/or extracting, from the facial video, a video feature vector characterizing the sentiment orientation of the emotion main body toward the emotion object;
judging the sentiment orientation of the emotion main body toward the emotion object according to the text feature vector and the video feature vector and/or the audio feature vector, using a trained sentiment orientation discrimination model;
generating a preference label of the emotion main body for the emotion object according to the obtained sentiment orientation.
2. The user preference label mining method according to claim 1, wherein the step of extracting, from the word sequence, the words serving as the emotion main body and the emotion object comprises:
extracting the word serving as the emotion main body and the word serving as the emotion object from the word sequence by using a trained emotion main body and emotion object discrimination model.
3. The user preference label mining method according to claim 1, wherein the step of extracting, from the text, the text feature vector characterizing the sentiment orientation of the emotion main body toward the emotion object comprises:
extracting an emotion word feature of the text, and obtaining zero to multiple of the following features: a word vector of the emotion object, a punctuation mark feature of the text, and word vectors of the conjunctions in the text; and forming the text feature vector from the emotion word feature and the zero to multiple obtained features.
4. The user preference label mining method according to claim 1, wherein the step of extracting, from the audio, the audio feature vector characterizing the sentiment orientation of the emotion main body toward the emotion object comprises:
extracting Mel-frequency cepstral coefficients of the audio;
cutting the audio into audio segments corresponding to the words in the word sequence;
extracting the fundamental tone of each audio segment, and calculating the average of the fundamental tones of the audio segments;
extracting the intensity of each audio segment, and calculating the average of the intensities of the audio segments;
forming the audio feature vector from the Mel-frequency cepstral coefficients, the average of the fundamental tones of the audio segments, the fundamental tone of the audio segment corresponding to the emotion main body, the average of the intensities of the audio segments, and the intensity of the audio segment corresponding to the emotion main body.
5. The user preference label mining method according to claim 1, wherein the step of extracting, from the facial video, the video feature vector characterizing the sentiment orientation of the emotion main body toward the emotion object comprises:
segmenting the facial video into video segments corresponding to the words in the word sequence;
calculating a face feature point average for each video segment, and calculating the average of the face feature point averages;
obtaining the face feature point maximum of the video segment corresponding to the emotion main body;
obtaining the face feature point minimum of the video segment corresponding to the emotion main body;
forming the video feature vector from the average of the face feature point averages and the face feature point average, the face feature point maximum, and the face feature point minimum of the video segment corresponding to the emotion main body.
6. The user preference label mining method according to claim 1, further comprising, after the step of extracting, from the word sequence, the words serving as the emotion main body and the emotion object, the following step:
judging whether the emotion main body is the user corresponding to the audio and/or the facial video; if so, entering the step of extracting, from the text, the text feature vector characterizing the sentiment orientation of the emotion main body toward the emotion object; if not, ending.
7. The user preference label mining method according to claim 1, further comprising the following step:
if the word sequence lacks a word serving as the emotion main body, taking the user corresponding to the audio and/or the facial video as the emotion main body.
8. A user preference label mining device, comprising:
a raw data acquisition module, configured to obtain a text to be analyzed and an audio and/or a facial video corresponding to the text;
a word segmentation module, configured to perform word segmentation on the text to obtain a word sequence forming the text;
a main body and object extraction module, configured to extract, from the word sequence, words serving as an emotion main body and an emotion object;
a text feature extraction module, configured to extract, from the text, a text feature vector characterizing the sentiment orientation of the emotion main body toward the emotion object;
an audio and video feature extraction module, configured to extract, from the audio, an audio feature vector characterizing the sentiment orientation of the emotion main body toward the emotion object, and/or to extract, from the facial video, a video feature vector characterizing the sentiment orientation of the emotion main body toward the emotion object;
a sentiment orientation judgment module, configured to judge the sentiment orientation of the emotion main body toward the emotion object according to the text feature vector and the video feature vector and/or the audio feature vector, using a trained sentiment orientation discrimination model;
a tag generation module, configured to generate a preference label of the emotion main body for the emotion object according to the obtained sentiment orientation.
9. The user preference label mining device according to claim 8, wherein the main body and object extraction module is configured to extract the word serving as the emotion main body and the word serving as the emotion object from the word sequence by using a trained emotion main body and emotion object discrimination model.
10. The user preference label mining device according to claim 8, wherein the text feature extraction module is configured to extract an emotion word feature of the text, to obtain zero to multiple of the following features: a word vector of the emotion object, a punctuation mark feature of the text, and word vectors of the conjunctions in the text, and to form the text feature vector from the emotion word feature and the zero to multiple obtained features.
11. The user preference label mining device according to claim 8, wherein the process by which the audio and video feature extraction module extracts, from the audio, the audio feature vector characterizing the sentiment orientation of the emotion main body toward the emotion object comprises: extracting Mel-frequency cepstral coefficients of the audio; cutting the audio into audio segments corresponding to the words in the word sequence; extracting the fundamental tone of each audio segment and calculating the average of the fundamental tones; extracting the intensity of each audio segment and calculating the average of the intensities; and forming the audio feature vector from the Mel-frequency cepstral coefficients, the average of the fundamental tones of the audio segments, the fundamental tone of the audio segment corresponding to the emotion main body, the average of the intensities of the audio segments, and the intensity of the audio segment corresponding to the emotion main body.
12. The user preference label mining device according to claim 8, wherein the process by which the audio and video feature extraction module extracts, from the facial video, the video feature vector characterizing the sentiment orientation of the emotion main body toward the emotion object comprises: segmenting the facial video into video segments corresponding to the words in the word sequence; calculating a face feature point average for each video segment and calculating the average of the face feature point averages; obtaining the face feature point maximum of the video segment corresponding to the emotion main body; obtaining the face feature point minimum of the video segment corresponding to the emotion main body; and forming the video feature vector from the average of the face feature point averages and the face feature point average, the face feature point maximum, and the face feature point minimum of the video segment corresponding to the emotion main body.
13. The user preference label mining device according to claim 8, further comprising:
an emotion main body judgment module, configured to judge whether the emotion main body is the user corresponding to the audio and/or the facial video, and, if so, to start the text feature extraction module, and, if not, to end processing.
14. The user preference label mining device according to claim 8, wherein the main body and object extraction module is further configured to, if the word sequence lacks a word serving as the emotion main body, take the user corresponding to the audio and/or the facial video as the emotion main body.
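Editorial illustration (not part of the claims): the sketch below shows one way the audio features of claims 4 and 11 and the video features of claims 5 and 12 could be assembled. librosa is assumed for MFCC, pitch, and an RMS-based intensity proxy; face feature points per frame are assumed to come from an external landmark detector; and the word-to-time alignment (word_spans) is assumed to be available from elsewhere. Time-averaging the MFCC matrix is also an assumption, since the claims fix no pooling.

```python
import numpy as np
import librosa

def audio_feature_vector(y, sr, word_spans, main_body_index, n_mfcc=13):
    """y, sr: audio signal and sample rate; word_spans: (start_s, end_s) per word.
    Assumes each word segment is long enough for pitch estimation."""
    # Mel-frequency cepstral coefficients of the whole audio, averaged over time.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

    pitches, intensities = [], []
    for start, end in word_spans:
        seg = y[int(start * sr):int(end * sr)]
        f0 = librosa.yin(seg, fmin=65, fmax=400, sr=sr)          # fundamental tone
        pitches.append(float(np.mean(f0)))
        intensities.append(float(librosa.feature.rms(y=seg).mean()))  # intensity proxy

    return np.concatenate([
        mfcc,
        [np.mean(pitches), pitches[main_body_index],          # mean pitch, main body's pitch
         np.mean(intensities), intensities[main_body_index]]  # mean intensity, main body's intensity
    ])

def video_feature_vector(landmarks_per_word, main_body_index):
    """landmarks_per_word: one array of shape (frames, points, 2) per word,
    i.e. one entry per video segment of the facial video."""
    segment_means = [float(seg.mean()) for seg in landmarks_per_word]
    main_seg = landmarks_per_word[main_body_index]
    return np.array([
        np.mean(segment_means),   # average of the per-segment feature point averages
        main_seg.mean(),          # feature point average of the main body's segment
        main_seg.max(),           # feature point maximum of that segment
        main_seg.min(),           # feature point minimum of that segment
    ])
```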
CN201510076723.4A 2015-02-12 2015-02-12 Favorite label mining method and device Active CN104598644B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510076723.4A CN104598644B (en) 2015-02-12 2015-02-12 Favorite label mining method and device

Publications (2)

Publication Number Publication Date
CN104598644A true CN104598644A (en) 2015-05-06
CN104598644B CN104598644B (en) 2020-10-30

Family

ID=53124429

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510076723.4A Active CN104598644B (en) 2015-02-12 2015-02-12 Favorite label mining method and device

Country Status (1)

Country Link
CN (1) CN104598644B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140007149A1 (en) * 2012-07-02 2014-01-02 Wistron Corp. System, apparatus and method for multimedia evaluation
CN103714071A (en) * 2012-09-29 2014-04-09 株式会社日立制作所 Label emotional tendency quantifying method and label emotional tendency quantifying system
CN104239373A (en) * 2013-06-24 2014-12-24 腾讯科技(深圳)有限公司 Document tag adding method and document tag adding device
CN104063427A (en) * 2014-06-06 2014-09-24 北京搜狗科技发展有限公司 Expression input method and device based on semantic understanding
CN104200804A (en) * 2014-09-19 2014-12-10 合肥工业大学 Various-information coupling emotion recognition method for human-computer interaction

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
陈国青等: "《中国信息***研究 新兴技术背景下的机遇与挑战》", 30 November 2011 *
马刚: "《基于语义的Web数据挖掘》", 31 January 2014 *

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106294797A (en) * 2016-08-15 2017-01-04 北京聚爱聊网络科技有限公司 A kind of generation method and apparatus of video gene
CN106294797B (en) * 2016-08-15 2019-10-18 北京数码视讯科技股份有限公司 A kind of generation method and device of video gene
CN106503805A (en) * 2016-11-14 2017-03-15 合肥工业大学 A kind of bimodal based on machine learning everybody talk with sentiment analysis system and method
CN106503805B (en) * 2016-11-14 2019-01-29 合肥工业大学 A kind of bimodal based on machine learning everybody talk with sentiment analysis method
CN106844330A (en) * 2016-11-15 2017-06-13 平安科技(深圳)有限公司 The analysis method and device of article emotion
CN106844330B (en) * 2016-11-15 2018-04-20 平安科技(深圳)有限公司 The analysis method and device of article emotion
CN107330001A (en) * 2017-06-09 2017-11-07 国政通科技股份有限公司 The creation method and system of a kind of diversification label
CN108305643B (en) * 2017-06-30 2019-12-06 腾讯科技(深圳)有限公司 Method and device for determining emotion information
CN108305643A (en) * 2017-06-30 2018-07-20 腾讯科技(深圳)有限公司 The determination method and apparatus of emotion information
CN107491435A (en) * 2017-08-14 2017-12-19 深圳狗尾草智能科技有限公司 Method and device based on Computer Automatic Recognition user feeling
CN107491435B (en) * 2017-08-14 2021-02-26 苏州狗尾草智能科技有限公司 Method and device for automatically identifying user emotion based on computer
CN108831450A (en) * 2018-03-30 2018-11-16 杭州鸟瞰智能科技股份有限公司 A kind of virtual robot man-machine interaction method based on user emotion identification
CN108777804B (en) * 2018-05-30 2021-07-27 腾讯科技(深圳)有限公司 Media playing method and device
CN108777804A (en) * 2018-05-30 2018-11-09 腾讯科技(深圳)有限公司 media playing method and device
CN108810625A (en) * 2018-06-07 2018-11-13 腾讯科技(深圳)有限公司 A kind of control method for playing back of multi-medium data, device and terminal
CN108810577A (en) * 2018-06-15 2018-11-13 深圳市茁壮网络股份有限公司 A kind of construction method, device and the electronic equipment of user's portrait
CN108810577B (en) * 2018-06-15 2021-02-09 深圳市茁壮网络股份有限公司 User portrait construction method and device and electronic equipment
CN109119076A (en) * 2018-08-02 2019-01-01 重庆柚瓣家科技有限公司 A kind of old man user exchanges the collection system and method for habit
CN109165283A (en) * 2018-08-20 2019-01-08 北京智能管家科技有限公司 Resource recommendation method, device, equipment and storage medium
CN109165283B (en) * 2018-08-20 2021-12-28 北京如布科技有限公司 Resource recommendation method, device, equipment and storage medium
CN110334182A (en) * 2019-06-24 2019-10-15 中国南方电网有限责任公司 Online service method with speech emotion recognition
CN110428807A (en) * 2019-08-15 2019-11-08 三星电子(中国)研发中心 A kind of audio recognition method based on deep learning, system and device
CN111222011A (en) * 2020-01-06 2020-06-02 腾讯科技(深圳)有限公司 Video vector determination method and device
CN111222011B (en) * 2020-01-06 2023-11-14 腾讯科技(深圳)有限公司 Video vector determining method and device

Also Published As

Publication number Publication date
CN104598644B (en) 2020-10-30

Similar Documents

Publication Publication Date Title
CN104598644A (en) User fond label mining method and device
US11315546B2 (en) Computerized system and method for formatted transcription of multimedia content
CN107507612B (en) Voiceprint recognition method and device
CN107492379B (en) Voiceprint creating and registering method and device
CN106156365B (en) A kind of generation method and device of knowledge mapping
CN107481720B (en) Explicit voiceprint recognition method and device
CN106980624B (en) Text data processing method and device
US9230547B2 (en) Metadata extraction of non-transcribed video and audio streams
WO2018045646A1 (en) Artificial intelligence-based method and device for human-machine interaction
US11848009B2 (en) Adaptive interface in a voice-activated network
US10108707B1 (en) Data ingestion pipeline
CN111128223A (en) Text information-based auxiliary speaker separation method and related device
JP5496863B2 (en) Emotion estimation apparatus, method, program, and recording medium
CN108304424B (en) Text keyword extraction method and text keyword extraction device
US20230089308A1 (en) Speaker-Turn-Based Online Speaker Diarization with Constrained Spectral Clustering
CN109583401A (en) Question searching method capable of automatically generating answers and user equipment
US20170270701A1 (en) Image processing device, animation display method and computer readable medium
Al-Azani et al. Enhanced video analytics for sentiment analysis based on fusing textual, auditory and visual information
CN109325124A (en) A kind of sensibility classification method, device, server and storage medium
CN111586469A (en) Bullet screen display method and device and electronic equipment
CN112233680A (en) Speaker role identification method and device, electronic equipment and storage medium
Chakroun et al. New approach for short utterance speaker identification
CN117441165A (en) Reducing bias in generating language models
JP6087704B2 (en) Communication service providing apparatus, communication service providing method, and program
CN115705705A (en) Video identification method, device, server and storage medium based on machine learning

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant