CN104598644A - User preference label mining method and device - Google Patents

User preference label mining method and device

Info

Publication number
CN104598644A
CN104598644A (application CN201510076723.4A)
Authority
CN
China
Prior art keywords
emotion
main body
text
word
video
Prior art date
Legal status: Granted
Application number
CN201510076723.4A
Other languages
Chinese (zh)
Other versions
CN104598644B (en)
Inventor
孙晓
Current Assignee
Tencent Technology Shenzhen Co Ltd
Hefei University of Technology
Original Assignee
Tencent Technology Shenzhen Co Ltd
Hefei University of Technology
Priority date
Filing date
Publication date
Application filed by Tencent Technology (Shenzhen) Co Ltd and Hefei University of Technology
Priority to CN201510076723.4A
Publication of CN104598644A
Application granted
Publication of CN104598644B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 - Details of database functions independent of the retrieved data types
    • G06F16/95 - Retrieval from the web
    • G06F16/953 - Querying, e.g. by the use of web search engines
    • G06F16/9535 - Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a user preference label mining method comprising the following steps: acquiring a text and the corresponding audio and/or face video; performing word segmentation on the text to obtain a word sequence; extracting an emotion subject and an emotion object from the word sequence; extracting, from the text, a text feature vector of the emotional tendency of the emotion subject toward the emotion object; extracting, from the audio, an audio feature vector of that emotional tendency, and/or extracting, from the face video, a video feature vector of that emotional tendency; judging the emotional tendency of the emotion subject toward the emotion object with a trained emotional tendency discrimination model according to the text feature vector and the audio and/or video feature vector; and generating a preference label of the emotion subject for the emotion object. With this method, user preference labels that more accurately reflect a user's true preferences can be mined. The invention further provides a user preference label mining device.

Description

User preference label mining method and device
Technical field
The present invention relates to the field of data mining, and in particular to a user preference label mining method and device.
Background
A common user tag is a character, word, phrase, or short sentence that reflects a user characteristic. A user preference label is a user tag that reflects a user's preferences or emotional tendencies.
Existing Internet applications increasingly emphasize personalized services, recommending products and social information suited to each user in order to improve the hit rate of pushed information and user stickiness. How to mine users' points of interest and analyze users' emotional tendencies so as to generate user preference labels is a problem that many Internet applications hope to solve.
In conventional techniques, a user's preference labels are generally generated from keywords that other users in a social network define for that user. However, owing to interpersonal and individual subjective factors, the generated user preference labels do not necessarily reflect the user's true preferences and emotional tendencies.
Summary of the invention
Based on this, it is necessary to provide a user preference label mining method and device capable of mining labels that accurately reflect a user's true preferences.
A user preference label mining method comprises the following steps:
obtaining a text to be analyzed and the face video and/or audio corresponding to the text;
performing word segmentation on the text to obtain the word sequence forming the text;
extracting the words serving as the emotion subject and the emotion object from the word sequence;
extracting, from the text, a text feature vector characterizing the emotional tendency of the emotion subject toward the emotion object;
extracting, from the face video, a video feature vector characterizing the emotional tendency of the emotion subject toward the emotion object, and/or extracting, from the audio, an audio feature vector characterizing the emotional tendency of the emotion subject toward the emotion object;
judging the emotional tendency of the emotion subject toward the emotion object by means of a trained emotional tendency discrimination model, according to the text feature vector and the video feature vector and/or audio feature vector; and
generating a preference label of the emotion subject for the emotion object according to the obtained emotional tendency.
A user preference label mining device comprises:
a raw data acquisition module, configured to obtain a text to be analyzed and the audio and/or face video corresponding to the text;
a word segmentation module, configured to perform word segmentation on the text to obtain the word sequence forming the text;
a subject and object extraction module, configured to extract the words serving as the emotion subject and the emotion object from the word sequence;
a text feature extraction module, configured to extract, from the text, a text feature vector characterizing the emotional tendency of the emotion subject toward the emotion object;
an audio and video feature extraction module, configured to extract, from the audio, an audio feature vector characterizing the emotional tendency of the emotion subject toward the emotion object, and/or extract, from the face video, a video feature vector characterizing the emotional tendency of the emotion subject toward the emotion object;
an emotional tendency judgment module, configured to judge the emotional tendency of the emotion subject toward the emotion object by means of a trained emotional tendency discrimination model, according to the text feature vector and the video feature vector and/or audio feature vector; and
a label generation module, configured to generate a preference label of the emotion subject for the emotion object according to the obtained emotional tendency.
In the above user preference label mining method and device, the emotion subject and the emotion object are extracted from the text, and a text feature vector and a video feature vector and/or audio feature vector characterizing the emotional tendency of the emotion subject toward the emotion object are extracted; the emotional tendency of the emotion subject toward the emotion object is then judged by a trained emotional tendency discrimination model according to these feature vectors, and a preference label of the emotion subject for the emotion object is generated according to the obtained emotional tendency. On the one hand, the emotion subject and the emotion object are extracted automatically; on the other hand, the emotional tendency is obtained by combining the text feature vector with the video feature vector and/or audio feature vector, which yields a more accurate emotional tendency. Therefore, user preference labels can not only be mined more intelligently, but also reflect the user's true preferences more accurately.
Brief description of the drawings
FIG. 1 is a block diagram of part of the structure of a device that runs the user preference label mining method described in this application, in one embodiment;
FIG. 2 is a schematic flowchart of the user preference label mining method in one embodiment;
FIG. 3A is a schematic flowchart of step S208 of FIG. 2 in one embodiment;
FIG. 3B is a schematic flowchart of the step of extracting the emotion word feature of the text in one embodiment;
FIG. 4 is a schematic flowchart of step S210 of FIG. 2 in one embodiment;
FIG. 5 is a schematic flowchart of step S212 of FIG. 2 in one embodiment;
FIG. 6 is a structural schematic diagram of the user preference label mining device in one embodiment;
FIG. 7 is a structural schematic diagram of the user preference label mining device in another embodiment.
Detailed description of embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain the present invention and are not intended to limit it.
FIG. 1 is a block diagram of part of the structure of a device that runs the user preference label mining method described in this application, in one embodiment. As shown in FIG. 1, the device comprises a processor, a storage medium, a memory, and a network interface connected by a system bus. The network interface is used for network communication. The storage medium stores an operating system, a database, and software instructions for implementing the user preference label mining method described in this application; the database stores the text to be analyzed, the corresponding face video and/or audio, the user preference labels generated when the method is carried out, and other data used in the process. The memory is used for caching data. The processor coordinates the components and executes the software instructions to carry out the user preference label mining method described in this application. The structure shown in FIG. 1 is merely a block diagram of the part of the structure relevant to the solution of this application and does not limit the device to which the solution is applied; a specific device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
As shown in FIG. 2, in one embodiment, a user preference label mining method comprises the following steps:
Step S202: obtain a text to be analyzed and the audio and face video corresponding to the text.
In this specification, the audio corresponding to a text is audio obtained by recording the speech that corresponds to the text, i.e. speech that expresses the words of the text; the face video corresponding to the text is a video obtained by continuously capturing face images of the speaking action that produces that speech. The audio and the face video corresponding to the text are recorded synchronously from the same speaking action. In parts of the text below they are also referred to as the audio and its corresponding face video.
In one embodiment, interaction data (including text, audio, and video, etc.) recorded during a user's social activity and sent from one software communication client to another may be received; the audio and the corresponding face video are extracted from this interaction data, and the text corresponding to the audio is further recognized as the text to be analyzed by speech recognition technology. Because the interaction data sent from one software communication client to another is relayed through a server, the user preference label mining method described in this application may be performed by the server, so that when the server receives interaction data sent by a software communication client, it can perform step S202 according to that interaction data.
Step S204: perform word segmentation on the text to obtain the word sequence forming the text.
In one embodiment, an existing word segmentation tool may be used to segment the text into words, and these words are arranged in order of their positions in the text to form the word sequence.
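For illustration only, the following is a minimal sketch of this step in Python, assuming the open-source jieba segmenter as the existing word segmentation tool (the patent does not name a specific tool):

```python
import jieba

text = "我喜欢吃苹果。"                  # "I like eating apples."
word_sequence = list(jieba.cut(text))    # words kept in their original order
print(word_sequence)                     # e.g. ['我', '喜欢', '吃', '苹果', '。']
```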
Step S206: extract the words serving as the emotion subject and the emotion object from the word sequence.
In one embodiment, a trained emotion subject and emotion object discrimination model may be used to extract the word serving as the emotion subject and the word serving as the emotion object from the word sequence. This discrimination model is trained on a large number of word-sequence corpora in which the emotion subject and the emotion object have been annotated.
In one embodiment, an existing part-of-speech tagging tool may be used to tag the part of speech of each word in the word sequence, and a trained conditional random field (CRF) model may then be used to extract the words serving as the emotion subject and the emotion object from the part-of-speech-tagged word sequence. The conditional random field model can be trained on a large number of word-sequence corpora in which the parts of speech of the words and the emotion subject and emotion object have been annotated.
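A hedged sketch of how such an extraction could be set up as sequence labeling with a CRF follows, using the sklearn-crfsuite package; the feature template, tag set, and toy annotated corpus are illustrative assumptions, not the patent's actual training data:

```python
import sklearn_crfsuite

def token_features(words, pos_tags, i):
    # Simple per-token features over the part-of-speech-tagged word sequence.
    return {
        "word": words[i],
        "pos": pos_tags[i],
        "prev_word": words[i - 1] if i > 0 else "<BOS>",
        "next_word": words[i + 1] if i + 1 < len(words) else "<EOS>",
    }

def sentence_features(words, pos_tags):
    return [token_features(words, pos_tags, i) for i in range(len(words))]

# Toy annotated corpus: SUBJ marks the emotion subject, OBJ the emotion object.
train_words = [["我", "喜欢", "吃", "苹果", "。"]]
train_pos   = [["r",  "v",    "v",  "n",   "w"]]
train_tags  = [["SUBJ", "O",  "O",  "OBJ", "O"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
crf.fit([sentence_features(w, p) for w, p in zip(train_words, train_pos)], train_tags)

pred = crf.predict([sentence_features(train_words[0], train_pos[0])])[0]
emotion_subject = [w for w, t in zip(train_words[0], pred) if t == "SUBJ"]
emotion_object  = [w for w, t in zip(train_words[0], pred) if t == "OBJ"]
```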
Step S208: extract, from the text, a text feature vector characterizing the emotional tendency of the emotion subject toward the emotion object.
In one embodiment, the text feature vector characterizing the emotional tendency of the emotion subject toward the emotion object comprises the following feature components: the emotion word feature of the text, plus zero or more of the following features: the word vector of the emotion object, the punctuation feature of the text, and the word vector of the conjunction in the text.
As shown in FIG. 3A, step S208 may comprise the following steps:
Step S320: extract the emotion word feature of the text, and obtain zero or more of the following features: the word vector of the emotion object, the punctuation feature of the text, and the word vector of the conjunction in the text.
In one embodiment, the emotion word feature of the text comprises the part of speech of the emotion word of the text, the word vector of the emotion word of the text, the mean of the word vectors of the degree words preceding the emotion word of the text, and the number of negation words preceding the emotion word of the text.
Here, the mean of the word vectors of the degree words preceding the emotion word of the text is the mean of the word vectors of all degree words that precede the emotion word in the text. The word vector of a word is a vector that characterizes the word's semantics, and the distance between the word vectors of two words (for example, cosine similarity or Euclidean distance) can be used to characterize the semantic similarity of the two words.
As shown in FIG. 3B, in one embodiment, the step of extracting the emotion word feature of the text comprises the following steps:
Step S322: match the words in the word sequence forming the text against a preset emotion lexicon, to obtain the emotion word in the word sequence, its position in the word sequence, and the degree words and negation words located before it.
Degree words are words that express degree, for example "very", "a bit", "extremely"; negation words are words that express negation, for example "not", "no".
The preset emotion lexicon may be an emotion lexicon such as the HowNet emotion lexicon. The preset emotion lexicon contains words annotated as emotion words, words annotated as degree words, and words annotated as negation words.
If a word in the word sequence of the text is annotated as an emotion word in the preset emotion lexicon, that word is marked as the emotion word of the text; correspondingly, if a word preceding the emotion word in the word sequence is annotated as a degree word or a negation word in the preset emotion lexicon, that word is marked as a degree word or a negation word preceding the emotion word of the text.
Step S324: obtain the part of speech of the emotion word of the text; obtain the word vector of the emotion word of the text; obtain the word vectors of the degree words preceding the emotion word and compute their mean, yielding the mean word vector of the degree words preceding the emotion word of the text; and count all negation words preceding the emotion word, yielding the number of negation words preceding the emotion word of the text.
In one embodiment, an existing part-of-speech tagging tool may be used to tag the part of speech of the emotion word.
In one embodiment, an existing word-vector tool, for example the word2vec tool, may be used to obtain the word vector of the emotion word and the word vectors of the degree words. For example, 20-dimensional word vectors of the emotion word and of each degree word may be obtained with the word2vec tool.
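A minimal sketch of obtaining such word vectors with gensim's word2vec implementation follows; the 20-dimensional setting matches the example above, while the toy corpus and the choice of degree and negation words are illustrative assumptions:

```python
import numpy as np
from gensim.models import Word2Vec

# Toy corpus standing in for a real training corpus.
corpus = [["我", "喜欢", "吃", "苹果"], ["我", "非常", "不", "喜欢", "下雨"]]
w2v = Word2Vec(sentences=corpus, vector_size=20, window=2, min_count=1)

emotion_word   = "喜欢"
degree_words   = ["非常"]   # degree words found before the emotion word
negation_count = 1          # number of negation words before it ("不")

emotion_word_vec = w2v.wv[emotion_word]          # 20-dimensional word vector of the emotion word
degree_mean = (np.mean([w2v.wv[w] for w in degree_words], axis=0)
               if degree_words else np.zeros(20, dtype=np.float32))
```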
Step S326: form the emotion word feature of the text from the part of speech of the emotion word of the text, the word vector of the emotion word of the text, the mean word vector of the degree words preceding the emotion word of the text, and the number of negation words preceding the emotion word of the text.
In one embodiment, an existing word-vector tool may be used to obtain the word vector of the emotion object, and this word vector can characterize the semantics of the emotion object.
Different punctuation marks correspond to different emotions; an exclamation mark, for example, may intensify the emotion. In one embodiment, the feature value corresponding to the punctuation mark in the text may be looked up in a preset table of punctuation feature values and used as the punctuation feature of the text. In this preset table, the feature value corresponding to each punctuation mark characterizes the semantics of that punctuation mark, and the feature value may take the form of a numeric vector consisting of a single number.
Different conjunctions can express different emotions; "but", for example, indicates a semantic turn. In one embodiment, an existing word-vector tool may be used to obtain the word vector of the conjunction in the text, and this word vector can characterize the semantics of the conjunction.
Step S340: form the text feature vector from the emotion word feature of the text and the zero or more features obtained.
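Continuing the sketch above, the components could be concatenated into the text feature vector as follows; the numeric part-of-speech encoding and the punctuation feature-value table are illustrative assumptions, since the patent only requires that such encodings exist:

```python
import numpy as np

POS_CODES    = {"v": 1.0, "a": 2.0, "n": 3.0}              # assumed numeric encoding of parts of speech
PUNCT_VALUES = {"!": 2.0, "?": 1.5, "。": 1.0, ".": 1.0}    # assumed punctuation feature-value table

emotion_word_pos = "v"   # part of speech of "喜欢" (verb), e.g. from a POS tagger

emotion_word_feature = np.concatenate([
    [POS_CODES.get(emotion_word_pos, 0.0)],  # part of speech of the emotion word
    emotion_word_vec,                        # word vector of the emotion word (sketch above)
    degree_mean,                             # mean word vector of the preceding degree words
    [negation_count],                        # number of preceding negation words
])

object_vec      = w2v.wv["苹果"]                            # word vector of the emotion object
punct_value     = np.array([PUNCT_VALUES.get("。", 0.0)])   # punctuation feature value of the text
conjunction_vec = np.zeros(20)                              # no conjunction in this sentence

text_feature_vector = np.concatenate(
    [emotion_word_feature, object_vec, punct_value, conjunction_vec])
```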
Step S210: extract, from the audio, an audio feature vector characterizing the emotional tendency of the emotion subject toward the emotion object.
As shown in FIG. 4, in one embodiment, step S210 comprises the following steps:
Step S402: extract the Mel-frequency cepstral coefficients (MFCCs) of the audio.
Step S404: cut the audio into the audio segments corresponding to the words in the word sequence of the text.
In one embodiment, speech recognition technology is used to identify, within the audio, the audio segment corresponding to each word in the word sequence of the text, and the audio is cut into these segments.
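A minimal sketch of this cutting step follows, assuming that a speech-recognition or forced-alignment tool has already produced (start, end) times in seconds for each word; the file name and timings are placeholders:

```python
import librosa

waveform, sr = librosa.load("utterance.wav", sr=16000)

# Assumed per-word timings for "我 / 喜欢 / 吃 / 苹果", in seconds.
word_times = [(0.00, 0.30), (0.30, 0.75), (0.75, 1.00), (1.00, 1.60)]

audio_segments = [waveform[int(start * sr): int(end * sr)] for start, end in word_times]
```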
Step S406: extract the pitch of each audio segment, and compute the mean of the pitches of the audio segments.
In one embodiment, the pitch of each audio segment may be extracted with an algorithm based on AMDF-weighted autocorrelation (AWAC).
Step S408: extract the intensity of each audio segment, and compute the mean of the intensities of the audio segments.
In one embodiment, the intensity of each audio segment may be extracted with a method based on the Fast Fourier Transform (FFT).
Step S410: form the audio feature vector from the MFCCs of the audio, the mean of the pitches of the audio segments, the pitch of the audio segment corresponding to the emotion subject, the mean of the intensities of the audio segments, and the intensity of the audio segment corresponding to the emotion subject.
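Continuing from the sketch above, the following is a hedged sketch of assembling the audio feature vector. Where the patent suggests an AWAC pitch tracker and an FFT-based intensity measure, librosa's YIN estimator and RMS energy are used here as stand-in algorithms, which changes the exact estimators but not the shape of the feature vector:

```python
import numpy as np
import librosa

mfcc = librosa.feature.mfcc(y=waveform, sr=sr, n_mfcc=13).mean(axis=1)  # utterance-level MFCCs

def segment_pitch(segment, sr):
    f0 = librosa.yin(segment, fmin=80, fmax=400, sr=sr)   # frame-wise fundamental frequency
    return float(np.mean(f0))

def segment_intensity(segment):
    return float(librosa.feature.rms(y=segment).mean())   # frame-wise RMS energy, averaged

pitches     = [segment_pitch(s, sr) for s in audio_segments]
intensities = [segment_intensity(s) for s in audio_segments]

subject_index = 0   # index of the emotion subject's segment ("我"), assumed known

audio_feature_vector = np.concatenate([
    mfcc,
    [np.mean(pitches), pitches[subject_index]],
    [np.mean(intensities), intensities[subject_index]],
])
```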
Step S212: extract, from the face video, a video feature vector characterizing the emotional tendency of the emotion subject toward the emotion object.
As shown in FIG. 5, in one embodiment, step S212 comprises the following steps:
Step S502: split the face video into the video segments corresponding to the words in the word sequence of the text.
In one embodiment, the audio corresponding to the face video may be obtained; this audio is recorded while the user performs the actions captured in the face video. Speech recognition technology is used to identify, within this audio, the audio segment corresponding to each word in the word sequence of the text; the duration of each audio segment is then determined, and the face video is split according to these durations into the video segments corresponding to the audio segments, thereby obtaining the video segment corresponding to each word in the word sequence of the text.
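A minimal sketch of this splitting step, reusing the per-word timings from the audio sketch above; OpenCV is an assumed choice of video reader and the file name is a placeholder:

```python
import cv2

cap = cv2.VideoCapture("face_video.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)

frames = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frames.append(frame)
cap.release()

# One list of frames per word, aligned with the audio segments.
video_segments = [frames[int(start * fps): int(end * fps)] for start, end in word_times]
```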
Step S504: compute the facial feature point mean of each video segment, and compute the mean of the facial feature point means of the video segments.
Facial feature points are points in a face image that characterize facial features. In one embodiment, the facial feature points are facial SIFT feature points. SIFT, the Scale-Invariant Feature Transform, is a descriptor used in image processing; it is scale-invariant and can detect keypoints in an image. Such keypoints can describe local features of the face and are therefore referred to as facial SIFT feature points.
In one embodiment, a facial feature point is represented by a numeric vector. A video segment contains multiple frames of face images, and a face image contains multiple facial feature points. The facial feature point mean of a face image is the mean of the numeric vectors of the facial feature points it contains, and the facial feature point mean of a video segment is the mean of the facial feature point means of the face images it contains.
In one embodiment, the numeric vectors of the facial feature points in the face images of each video segment may be extracted; the facial feature point mean of each face image is computed as the mean of the numeric vectors of the facial feature points it contains, and the facial feature point mean of each video segment is then computed as the mean of the facial feature point means of the face images it contains.
Step S506: obtain the facial feature point maximum of the video segment corresponding to the emotion subject.
In one embodiment, the facial feature point maximum of each face image in the video segment corresponding to the emotion subject may be obtained, and the largest of these maxima is taken as the facial feature point maximum of the video segment. The facial feature point maximum of a face image is the largest numeric vector among the numeric vectors of the facial feature points it contains; if the spatial length represented by one numeric vector is greater than that represented by another, the former is considered larger.
Step S508: obtain the facial feature point minimum of the video segment corresponding to the emotion subject.
The facial feature point minimum of the video segment corresponding to the emotion subject may be obtained in the same way as in step S506.
Step S510: form the video feature vector from the mean of the facial feature point means of the video segments, together with the facial feature point mean, facial feature point maximum, and facial feature point minimum of the video segment corresponding to the emotion subject.
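A hedged sketch of assembling the video feature vector with OpenCV SIFT descriptors follows, continuing from the sketches above. Face detection and cropping are omitted, each frame's feature is derived from the descriptors of its SIFT keypoints, and element-wise maxima and minima are used here as a simplification of the patent's norm-based comparison of feature-point vectors:

```python
import cv2
import numpy as np

sift = cv2.SIFT_create()

def frame_stats(frame):
    # Mean / max / min over the 128-dimensional SIFT descriptors of one frame.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    _, descriptors = sift.detectAndCompute(gray, None)
    if descriptors is None:
        descriptors = np.zeros((1, 128), dtype=np.float32)
    return descriptors.mean(axis=0), descriptors.max(axis=0), descriptors.min(axis=0)

def segment_stats(segment):
    means, maxs, mins = zip(*(frame_stats(f) for f in segment))
    return np.mean(means, axis=0), np.max(maxs, axis=0), np.min(mins, axis=0)

segment_means = [segment_stats(seg)[0] for seg in video_segments]
subject_mean, subject_max, subject_min = segment_stats(video_segments[subject_index])

video_feature_vector = np.concatenate([
    np.mean(segment_means, axis=0),   # mean of the per-segment feature-point means
    subject_mean, subject_max, subject_min,
])
```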
Step S214: according to the text feature vector, the audio feature vector, and the video feature vector, use a trained emotional tendency discrimination model to judge the emotional tendency of the emotion subject toward the emotion object.
In one embodiment, the emotional tendency discrimination model is trained on a large number of data samples annotated with emotional tendencies, each data sample comprising the text feature vector of a text, the audio feature vector of the audio corresponding to that text, and the video feature vector of the face video corresponding to that text.
In one embodiment, the emotional tendency discrimination model is a support vector machine (SVM) model for discriminating the emotional tendency of the emotion subject in a data object toward the emotion object, where a data object comprises a text, the audio corresponding to the text, and the video corresponding to the text.
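A hedged sketch of such a classifier using scikit-learn's SVC follows; the randomly generated data merely stands in for the annotated sample corpus, and the feature dimensionality is an arbitrary assumption:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
feature_dim = 200                              # assumed total dimensionality of the
                                               # concatenated text + audio + video features
X_train = rng.normal(size=(90, feature_dim))   # toy annotated samples
y_train = rng.integers(0, 3, size=90)          # 0 = negative, 1 = neutral, 2 = positive

sentiment_model = SVC(kernel="rbf", C=1.0)
sentiment_model.fit(X_train, y_train)

x_new = rng.normal(size=(1, feature_dim))      # a new concatenated feature vector
orientation = int(sentiment_model.predict(x_new)[0])
```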
Step S216: generate a preference label of the emotion subject for the emotion object according to the obtained emotional tendency.
In one embodiment, the emotional tendency categories that the emotional tendency discrimination model can discriminate include, but are not limited to: positive, neutral, and negative.
In one embodiment, if the emotional tendency of the emotion subject toward the emotion object is positive or negative, a corresponding positive or negative preference label is generated; if it is neutral, no preference label is generated. For example, if the emotional tendency is positive, a preference label of the form "[emotion subject] likes [emotion object]" may be generated; if it is negative, a preference label of the form "[emotion subject] does not like [emotion object]" may be generated; and so on. When the emotion subject is unambiguous, it may be omitted from the preference label.
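A minimal sketch of this label-generation rule; the English label templates and the class encoding (matching the SVM sketch above) are illustrative:

```python
def make_preference_label(subject, obj, orientation):
    # orientation: 0 = negative, 1 = neutral, 2 = positive (as in the SVM sketch above)
    if orientation == 2:
        return f"{subject} likes {obj}"
    if orientation == 0:
        return f"{subject} does not like {obj}"
    return None   # neutral: no preference label is generated

print(make_preference_label("user_42", "apples", 2))   # "user_42 likes apples"
```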
In one embodiment, after step S206 the user preference label mining method further comprises the following step: judge whether the emotion subject is the user corresponding to the audio and face video; if so, proceed to step S208; if not, terminate.
The user corresponding to the audio is the speaker whose voice is recorded in the audio, and the user corresponding to the face video is the owner of the face captured in the face video. In one embodiment, it may be judged whether the emotion subject is a first-person pronoun; if so, the emotion subject is judged to be the user corresponding to the audio and face video; otherwise it is judged not to be.
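A minimal sketch of this check; the list of first-person pronouns is an illustrative assumption:

```python
FIRST_PERSON_PRONOUNS = {"我", "我们", "咱", "咱们", "俺"}

def subject_is_recording_user(emotion_subject_word):
    # Only proceed to feature extraction when the emotion subject refers to the speaker.
    return emotion_subject_word in FIRST_PERSON_PRONOUNS
```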
Because a speaker's comments on other people's preferences are not necessarily accurate, in this embodiment the subsequent steps of generating a user preference label are performed only when the user corresponding to the audio and face video is the emotion subject, so the mined user preference labels are more accurate.
In one embodiment, after step S206 the user preference label mining method further comprises the following step: if the word sequence of the text lacks a word serving as the emotion subject, take the user corresponding to the audio and face video as the emotion subject.
In one embodiment, the correspondence between each audio and face video pair and its user is stored in advance, and the user corresponding to the audio and face video obtained in step S202 can be looked up in this prestored correspondence.
In one embodiment, the audio and face video are recorded by audio/video recording software, and the user logged in to the recording software during recording can be obtained as the user corresponding to the audio and face video. In one embodiment, the audio/video recording function is integrated into a software communication client; the audio and face video are recorded by the software communication client during the user's social activity, and the user logged in to that software communication client during recording can be obtained as the user corresponding to the audio and face video.
In one embodiment, based on any of the embodiments above, a method that disregards the audio feature vector and generates the user preference label only from the text feature vector and the video feature vector, and a method that disregards the video feature vector and generates the user preference label only from the text feature vector and the audio feature vector, both fall within the scope of protection of this application. These two methods can be implemented on the basis of any of the embodiments above by removing the influence of the factor that is not considered (the audio feature vector or the video feature vector) from the corresponding data processing. For example, the two methods below generate the user preference label from, respectively, the text feature vector of a text together with the audio feature vector of its corresponding audio, and the text feature vector of a text together with the video feature vector of its corresponding face video:
A user preference label mining method comprises the following steps: obtaining a text to be analyzed and the audio corresponding to the text; performing word segmentation on the text to obtain the word sequence forming the text; extracting the words serving as the emotion subject and the emotion object from the word sequence; extracting, from the text, a text feature vector characterizing the emotional tendency of the emotion subject toward the emotion object; extracting, from the audio, an audio feature vector characterizing the emotional tendency of the emotion subject toward the emotion object; judging the emotional tendency of the emotion subject toward the emotion object by means of a trained emotional tendency discrimination model, according to the text feature vector and the audio feature vector; and generating a preference label of the emotion subject for the emotion object according to the obtained emotional tendency. The emotional tendency discrimination model is trained on a large number of data samples annotated with emotional tendencies, each data sample comprising the text feature vector of a text and the audio feature vector of the audio corresponding to that text.
A user preference label mining method comprises the following steps: obtaining a text to be analyzed and the face video corresponding to the text; performing word segmentation on the text to obtain the word sequence forming the text; extracting the words serving as the emotion subject and the emotion object from the word sequence; extracting, from the text, a text feature vector characterizing the emotional tendency of the emotion subject toward the emotion object; extracting, from the face video, a video feature vector characterizing the emotional tendency of the emotion subject toward the emotion object; judging the emotional tendency of the emotion subject toward the emotion object by means of a trained emotional tendency discrimination model, according to the text feature vector and the video feature vector; and generating a preference label of the emotion subject for the emotion object according to the obtained emotional tendency. The emotional tendency discrimination model is trained on a large number of data samples annotated with emotional tendencies, each data sample comprising the text feature vector of a text and the video feature vector of the face video corresponding to that text.
A concrete example is given below to illustrate the user preference label mining method. Suppose a user says a sentence whose corresponding text, denoted T, is "I like eating apples."; the audio corresponding to the sentence is denoted S, and the corresponding face video is denoted F:
(1) Obtain the text T, the audio S, and the face video F.
(2) Perform word segmentation on T to obtain the word sequence: I / like / eating / apples.
(3) Extract from the word sequence the word "I" as the emotion subject and the word "apples" as the emotion object.
(4) Extract the text feature vector of T, which includes:
extracting the emotion word "like" in T and then its emotion word feature, which comprises the part of speech of the emotion word, the word vector of the emotion word, the mean word vector of the degree words preceding the emotion word, and the number of negation words preceding the emotion word; the part of speech of "like" in T is verb, and there are no degree words or negation words before "like";
extracting the word vector of the emotion object "apples";
extracting the punctuation feature of T; the punctuation mark in T is ".";
extracting the word vector of the conjunction in T; there is no conjunction in T.
The text feature vector is formed from the emotion word feature, the word vector of the emotion object, the punctuation feature, and the word vector of the conjunction.
(5) Extract the audio feature vector of S, which includes:
extracting the MFCCs of S;
cutting S into the audio segments corresponding to the words in the word sequence "I like eating apples": s1, s2, s3, s4;
extracting the pitches of s1, s2, s3, and s4, and computing the mean of these pitches;
extracting the intensities of s1, s2, s3, and s4, and computing the mean of these intensities.
The audio feature vector is formed from the MFCCs of the audio, the mean of the pitches of s1-s4, the pitch of s4, the mean of the intensities of s1-s4, and the intensity of s4.
(6) Extract the video feature vector of F, which includes:
splitting F into the video segments corresponding to the words in the word sequence "I like eating apples": f1, f2, f3, f4;
computing the facial feature point means of f1, f2, f3, and f4, and computing the mean of these facial feature point means;
obtaining the facial feature point maximum of f4;
obtaining the facial feature point minimum of f4.
The video feature vector is formed from the mean of the facial feature point means of f1-f4, together with the facial feature point mean, facial feature point maximum, and facial feature point minimum of f4.
(7) According to the text feature vector, the audio feature vector, and the video feature vector, use the trained emotional tendency discrimination model to judge the emotional tendency of the emotion subject "I" toward the emotion object "apples".
(8) Generate the preference label of the emotion subject for the emotion object, for example "I like apples".
As shown in FIG. 6, in one embodiment, a user preference label mining device comprises a raw data acquisition module 602, a word segmentation module 604, a subject and object extraction module 606, a text feature extraction module 608, an audio and video feature extraction module 610, an emotional tendency judgment module 612, and a label generation module 614, wherein:
The raw data acquisition module 602 is configured to obtain a text to be analyzed and the audio and face video corresponding to the text.
In one embodiment, the raw data acquisition module 602 may receive interaction data (including text, audio, and video, etc.) recorded during a user's social activity and sent from one software communication client to another, extract the audio and the corresponding face video from this interaction data, and further recognize the text corresponding to the audio as the text to be analyzed by speech recognition technology. Because the interaction data sent from one software communication client to another is relayed through a server, the user preference label mining device described in this application may be located in the server, so that when the server receives interaction data sent by a software communication client, the raw data acquisition module 602 can obtain the text to be analyzed and the corresponding audio and face video from that interaction data.
The word segmentation module 604 is configured to perform word segmentation on the text to obtain the word sequence forming the text.
In one embodiment, the word segmentation module 604 may use an existing word segmentation tool to segment the text into words, and arrange these words in order of their positions in the text to form the word sequence.
The subject and object extraction module 606 is configured to extract the words serving as the emotion subject and the emotion object from the word sequence.
In one embodiment, the subject and object extraction module 606 may use a trained emotion subject and emotion object discrimination model to extract the word serving as the emotion subject and the word serving as the emotion object from the word sequence. This discrimination model is trained on a large number of word-sequence corpora in which the emotion subject and the emotion object have been annotated.
In one embodiment, the subject and object extraction module 606 may use an existing part-of-speech tagging tool to tag the part of speech of each word in the word sequence, and may then use a trained conditional random field (CRF) model to extract the words serving as the emotion subject and the emotion object from the part-of-speech-tagged word sequence. The conditional random field model can be trained on a large number of word-sequence corpora in which the parts of speech of the words and the emotion subject and emotion object have been annotated.
The text feature extraction module 608 is configured to extract, from the text, a text feature vector characterizing the emotional tendency of the emotion subject toward the emotion object.
In one embodiment, the text feature vector characterizing the emotional tendency of the emotion subject toward the emotion object comprises the following feature components: the emotion word feature of the text, plus zero or more of the following features: the word vector of the emotion object, the punctuation feature of the text, and the word vector of the conjunction in the text.
The text feature extraction module 608 is configured to extract the emotion word feature of the text and to obtain zero or more of the following features: the word vector of the emotion object, the punctuation feature of the text, and the word vector of the conjunction in the text; and to form the text feature vector from the emotion word feature of the text and the zero or more features obtained.
In one embodiment, the emotion word feature of the text comprises the part of speech of the emotion word of the text, the word vector of the emotion word of the text, the mean of the word vectors of the degree words preceding the emotion word of the text, and the number of negation words preceding the emotion word of the text.
Here, the mean of the word vectors of the degree words preceding the emotion word of the text is the mean of the word vectors of all degree words that precede the emotion word in the text. The word vector of a word is a vector that characterizes the word's semantics, and the distance between the word vectors of two words (for example, cosine similarity or Euclidean distance) can be used to characterize the semantic similarity of the two words.
In one embodiment, the process by which the text feature extraction module 608 extracts the emotion word feature of the text comprises:
(1) matching the words in the word sequence forming the text against a preset emotion lexicon, to obtain the emotion word in the word sequence, its position in the word sequence, and the degree words and negation words located before it.
The preset emotion lexicon may be an emotion lexicon such as the HowNet emotion lexicon. The preset emotion lexicon contains words annotated as emotion words, words annotated as degree words, and words annotated as negation words.
If a word in the word sequence of the text is annotated as an emotion word in the preset emotion lexicon, the text feature extraction module 608 marks that word as the emotion word of the text; correspondingly, if a word preceding the emotion word in the word sequence is annotated as a degree word or a negation word in the preset emotion lexicon, the text feature extraction module 608 marks that word as a degree word or a negation word preceding the emotion word of the text.
(2) obtaining the part of speech of the emotion word of the text; obtaining the word vector of the emotion word of the text; obtaining the word vectors of the degree words preceding the emotion word and computing their mean, yielding the mean word vector of the degree words preceding the emotion word of the text; and counting all negation words preceding the emotion word, yielding the number of negation words preceding the emotion word of the text.
In one embodiment, the text feature extraction module 608 may use an existing part-of-speech tagging tool to tag the part of speech of the emotion word.
In one embodiment, the text feature extraction module 608 may use an existing word-vector tool, for example the word2vec tool, to obtain the word vector of the emotion word and the word vectors of the degree words. For example, 20-dimensional word vectors of the emotion word and of each degree word may be obtained with the word2vec tool.
(3) forming the emotion word feature of the text from the part of speech of the emotion word of the text, the word vector of the emotion word of the text, the mean word vector of the degree words preceding the emotion word of the text, and the number of negation words preceding the emotion word of the text.
In one embodiment, the text feature extraction module 608 may use an existing word-vector tool to obtain the word vector of the emotion object, and this word vector can characterize the semantics of the emotion object.
Different punctuation marks correspond to different emotions; an exclamation mark, for example, may intensify the emotion. In one embodiment, the text feature extraction module 608 may look up the feature value corresponding to the punctuation mark in the text in a preset table of punctuation feature values and use it as the punctuation feature of the text. In this preset table, the feature value corresponding to each punctuation mark characterizes the semantics of that punctuation mark, and the feature value may take the form of a numeric vector consisting of a single number.
Different conjunctions can express different emotions; "but", for example, indicates a semantic turn. In one embodiment, the text feature extraction module 608 may use an existing word-vector tool to obtain the word vector of the conjunction in the text, and this word vector can characterize the semantics of the conjunction.
The audio and video feature extraction module 610 is configured to extract, from the audio, an audio feature vector characterizing the emotional tendency of the emotion subject toward the emotion object.
In one embodiment, the process by which the audio and video feature extraction module 610 extracts this audio feature vector comprises:
(1) extracting the Mel-frequency cepstral coefficients (MFCCs) of the audio;
(2) cutting the audio into the audio segments corresponding to the words in the word sequence of the text;
In one embodiment, the audio and video feature extraction module 610 uses speech recognition technology to identify, within the audio, the audio segment corresponding to each word in the word sequence of the text, and cuts the audio into these segments.
(3) extracting the pitch of each audio segment and computing the mean of the pitches of the audio segments;
In one embodiment, the audio and video feature extraction module 610 may extract the pitch of each audio segment with an algorithm based on AMDF-weighted autocorrelation (AWAC).
(4) extracting the intensity of each audio segment and computing the mean of the intensities of the audio segments;
In one embodiment, the audio and video feature extraction module 610 may extract the intensity of each audio segment with a method based on the Fast Fourier Transform (FFT).
(5) forming the audio feature vector from the MFCCs of the audio, the mean of the pitches of the audio segments, the pitch of the audio segment corresponding to the emotion subject, the mean of the intensities of the audio segments, and the intensity of the audio segment corresponding to the emotion subject.
The audio and video feature extraction module 610 is further configured to extract, from the face video, a video feature vector characterizing the emotional tendency of the emotion subject toward the emotion object.
In one embodiment, the process by which the audio and video feature extraction module 610 extracts this video feature vector comprises:
(1) splitting the face video into the video segments corresponding to the words in the word sequence of the text;
In one embodiment, the audio and video feature extraction module 610 may obtain the audio corresponding to the face video; this audio is recorded while the user performs the actions captured in the face video. The module identifies, by speech recognition technology, the audio segment within this audio corresponding to each word in the word sequence of the text, determines the duration of each audio segment, and splits the face video according to these durations into the video segments corresponding to the audio segments, thereby obtaining the video segment corresponding to each word in the word sequence of the text.
(2) computing the facial feature point mean of each video segment, and computing the mean of the facial feature point means of the video segments;
Facial feature points are points in a face image that characterize facial features. In one embodiment, the facial feature points are facial SIFT feature points. SIFT, the Scale-Invariant Feature Transform, is a descriptor used in image processing; it is scale-invariant and can detect keypoints in an image. Such keypoints can describe local features of the face and are therefore referred to as facial SIFT feature points.
In one embodiment, a facial feature point is represented by a numeric vector. A video segment contains multiple frames of face images, and a face image contains multiple facial feature points. The facial feature point mean of a face image is the mean of the numeric vectors of the facial feature points it contains, and the facial feature point mean of a video segment is the mean of the facial feature point means of the face images it contains.
In one embodiment, the audio and video feature extraction module 610 may extract the numeric vectors of the facial feature points in the face images of each video segment, compute the facial feature point mean of each face image as the mean of the numeric vectors of the facial feature points it contains, and then compute the facial feature point mean of each video segment as the mean of the facial feature point means of the face images it contains.
(3) obtaining the facial feature point maximum of the video segment corresponding to the emotion subject;
In one embodiment, the audio and video feature extraction module 610 may obtain the facial feature point maximum of each face image in the video segment corresponding to the emotion subject, and take the largest of these maxima as the facial feature point maximum of the video segment. The facial feature point maximum of a face image is the largest numeric vector among the numeric vectors of the facial feature points it contains; if the spatial length represented by one numeric vector is greater than that represented by another, the former is considered larger.
(4) obtaining the facial feature point minimum of the video segment corresponding to the emotion subject;
(5) forming the video feature vector from the mean of the facial feature point means of the video segments, together with the facial feature point mean, facial feature point maximum, and facial feature point minimum of the video segment corresponding to the emotion subject.
The emotional tendency judgment module 612 is configured to judge the emotional tendency of the emotion subject toward the emotion object by means of a trained emotional tendency discrimination model, according to the text feature vector, the audio feature vector, and the video feature vector.
In one embodiment, the emotional tendency discrimination model is trained on a large number of data samples annotated with emotional tendencies, each data sample comprising the text feature vector of a text, the audio feature vector of the audio corresponding to that text, and the video feature vector of the face video corresponding to that text.
In one embodiment, the emotional tendency discrimination model is a support vector machine (SVM) model for discriminating the emotional tendency of the emotion subject in a data object toward the emotion object, where a data object comprises a text, the audio corresponding to the text, and the video corresponding to the text.
The label generation module 614 is configured to generate a preference label of the emotion subject for the emotion object according to the obtained emotional tendency.
In one embodiment, the emotional tendency categories that the emotional tendency discrimination model can discriminate include, but are not limited to: positive, neutral, and negative.
In one embodiment, if the emotional tendency of the emotion subject toward the emotion object is positive or negative, the label generation module 614 generates a corresponding positive or negative preference label; if it is neutral, the label generation module 614 does not generate a preference label. For example, if the emotional tendency is positive, the label generation module 614 may generate a preference label of the form "[emotion subject] likes [emotion object]"; if it is negative, it may generate a preference label of the form "[emotion subject] does not like [emotion object]"; and so on. When the emotion subject is unambiguous, it may be omitted from the preference label.
As shown in FIG. 7, in one embodiment, the user preference label mining device further comprises an emotion subject judgment module 702, configured to judge whether the emotion subject extracted by the subject and object extraction module 606 is the user corresponding to the audio and face video; if so, the text feature extraction module 608 is started; if not, processing terminates.
The user corresponding to the audio is the speaker whose voice is recorded in the audio, and the user corresponding to the face video is the owner of the face captured in the face video. In one embodiment, the emotion subject judgment module 702 may judge whether the emotion subject is a first-person pronoun; if so, the emotion subject is judged to be the user corresponding to the audio and face video; otherwise it is judged not to be.
Because a speaker's comments on other people's preferences are not necessarily accurate, in this embodiment the subsequent steps of generating a user preference label are performed only when the user corresponding to the audio and face video is the emotion subject, so the mined user preference labels are more accurate.
In one embodiment, the subject and object extraction module 606 is further configured to take the user corresponding to the audio and face video as the emotion subject if the word sequence of the text lacks a word serving as the emotion subject.
In one embodiment, the correspondence between each audio and face video pair and its user is stored in advance, and the subject and object extraction module 606 can look up the user corresponding to the obtained audio and face video in this prestored correspondence.
In one embodiment, the audio and face video are recorded by audio/video recording software, and the subject and object extraction module 606 can obtain the user logged in to the recording software during recording as the user corresponding to the audio and face video. In one embodiment, the audio/video recording function is integrated into a software communication client; the audio and face video are recorded by the software communication client during the user's social activity, and the subject and object extraction module 606 can obtain the user logged in to that software communication client during recording as the user corresponding to the audio and face video.
In one embodiment, based on any of the embodiments above, a device that generates user preference labels according to only the text feature vector and the video feature vector, without considering the audio feature vector, and a device that generates user preference labels according to only the text feature vector and the audio feature vector, without considering the video feature vector, both fall within the scope of protection of this application. For example, the two devices below generate user preference labels according to, respectively, the text feature vector of a text together with the audio feature vector of the corresponding audio, and the text feature vector of a text together with the video feature vector of the corresponding facial video:
A user preference label mining device comprises: a raw data acquisition module, configured to obtain a text to be analyzed and the audio corresponding to the text; a word segmentation module, configured to perform word segmentation on the text to obtain the word sequence forming the text; a main body and object extraction module, configured to extract, from the word sequence, the words serving as the emotion main body and the emotion object; a text feature extraction module, configured to extract, from the text, a text feature vector characterizing the sentiment orientation of the emotion main body toward the emotion object; an audio and video feature extraction module, configured to extract, from the audio, an audio feature vector characterizing the sentiment orientation of the emotion main body toward the emotion object; a sentiment orientation judgment module, configured to judge the sentiment orientation of the emotion main body toward the emotion object according to the text feature vector and the audio feature vector, using a trained sentiment orientation discrimination model; and a tag generation module, configured to generate a preference label of the emotion main body for the emotion object according to the obtained sentiment orientation. The sentiment orientation discrimination model is trained on a large number of data samples annotated with sentiment orientation, each data sample comprising the text feature vector of a text and the audio feature vector of the audio corresponding to that text.
A user preference label mining device comprises: a raw data acquisition module, configured to obtain a text to be analyzed and the facial video corresponding to the text; a word segmentation module, configured to perform word segmentation on the text to obtain the word sequence forming the text; a main body and object extraction module, configured to extract, from the word sequence, the words serving as the emotion main body and the emotion object; a text feature extraction module, configured to extract, from the text, a text feature vector characterizing the sentiment orientation of the emotion main body toward the emotion object; an audio and video feature extraction module, configured to extract, from the facial video, a video feature vector characterizing the sentiment orientation of the emotion main body toward the emotion object; a sentiment orientation judgment module, configured to judge the sentiment orientation of the emotion main body toward the emotion object according to the text feature vector and the video feature vector, using a trained sentiment orientation discrimination model; and a tag generation module, configured to generate a preference label of the emotion main body for the emotion object according to the obtained sentiment orientation. The sentiment orientation discrimination model is trained on a large number of data samples annotated with sentiment orientation, each data sample comprising the text feature vector of a text and the video feature vector of the facial video corresponding to that text.
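A rough sketch of how the modules of the first device above (the text + audio variant) could be wired together is shown below. Every callable passed in is a stand-in for the corresponding module, and a scikit-learn-style classifier interface is assumed; only the data flow follows the description.

```python
from typing import Callable, List, Optional, Tuple

class PreferenceLabelMiner:
    def __init__(self,
                 segment_words: Callable[[str], List[str]],             # word segmentation module
                 extract_pair: Callable[[List[str]], Tuple[str, str]],  # main body / object extraction
                 text_features: Callable[[str, str, str], List[float]],
                 audio_features: Callable[[bytes, List[str], str], List[float]],
                 sentiment_model):                                      # trained discrimination model
        self.segment_words = segment_words
        self.extract_pair = extract_pair
        self.text_features = text_features
        self.audio_features = audio_features
        self.sentiment_model = sentiment_model

    def mine(self, text: str, audio: bytes) -> Optional[str]:
        words = self.segment_words(text)
        main_body, emotion_object = self.extract_pair(words)
        # Concatenate text and audio feature vectors for the joint judgment.
        x = (self.text_features(text, main_body, emotion_object)
             + self.audio_features(audio, words, main_body))
        orientation = self.sentiment_model.predict([x])[0]
        if orientation == "neutral":
            return None
        verb = "likes" if orientation == "positive" else "does not like"
        return f"{main_body} {verb} {emotion_object}"
```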
With the user preference label mining method and device described above, the emotion main body and the emotion object are extracted from the text, a text feature vector and a video feature vector and/or an audio feature vector characterizing the sentiment orientation of the emotion main body toward the emotion object are extracted, the trained sentiment orientation discrimination model judges that sentiment orientation from these feature vectors, and a preference label of the emotion main body for the emotion object is then generated from the obtained orientation. On the one hand, the method and device extract the emotion main body and emotion object automatically; on the other hand, they obtain the sentiment orientation by combining the text feature vector with the video feature vector and/or the audio feature vector, which yields a more accurate result. They can therefore not only mine user preference labels more intelligently, but also mine labels that reflect users' real preferences more accurately.
The embodiments above express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art can make several variations and improvements without departing from the concept of the present invention, and these all fall within the protection scope of the present invention. The protection scope of this patent shall therefore be subject to the appended claims.

Claims (14)

1. A user preference label mining method, comprising the following steps:
obtaining a text to be analyzed and an audio and/or a facial video corresponding to the text;
performing word segmentation on the text to obtain a word sequence forming the text;
extracting, from the word sequence, words serving as an emotion main body and an emotion object;
extracting, from the text, a text feature vector characterizing the sentiment orientation of the emotion main body toward the emotion object;
extracting, from the audio, an audio feature vector characterizing the sentiment orientation of the emotion main body toward the emotion object, and/or extracting, from the facial video, a video feature vector characterizing the sentiment orientation of the emotion main body toward the emotion object;
judging the sentiment orientation of the emotion main body toward the emotion object according to the text feature vector and the video feature vector and/or the audio feature vector, using a trained sentiment orientation discrimination model;
generating a preference label of the emotion main body for the emotion object according to the obtained sentiment orientation.
2. The user preference label mining method according to claim 1, wherein the step of extracting, from the word sequence, the words serving as the emotion main body and the emotion object comprises:
extracting the word serving as the emotion main body and the word serving as the emotion object from the word sequence by using a trained emotion main body and emotion object discrimination model.
3. The user preference label mining method according to claim 1, wherein the step of extracting, from the text, the text feature vector characterizing the sentiment orientation of the emotion main body toward the emotion object comprises:
extracting an emotion word feature of the text, and obtaining zero to multiple of the following features: a word vector of the emotion object, a punctuation mark feature of the text, and word vectors of the conjunctions in the text; and forming the text feature vector from the emotion word feature and the zero to multiple obtained features.
4. The user preference label mining method according to claim 1, wherein the step of extracting, from the audio, the audio feature vector characterizing the sentiment orientation of the emotion main body toward the emotion object comprises:
extracting Mel-frequency cepstral coefficients of the audio;
cutting the audio into audio segments corresponding to the words in the word sequence;
extracting the fundamental tone of each audio segment, and calculating the average of the fundamental tones of the audio segments;
extracting the intensity of each audio segment, and calculating the average of the intensities of the audio segments;
forming the audio feature vector from the Mel-frequency cepstral coefficients, the average of the fundamental tones of the audio segments, the fundamental tone of the audio segment corresponding to the emotion main body, the average of the intensities of the audio segments, and the intensity of the audio segment corresponding to the emotion main body.
5. The user preference label mining method according to claim 1, wherein the step of extracting, from the facial video, the video feature vector characterizing the sentiment orientation of the emotion main body toward the emotion object comprises:
segmenting the facial video into video segments corresponding to the words in the word sequence;
calculating a face feature point average for each video segment, and calculating the average of the face feature point averages;
obtaining the face feature point maximum of the video segment corresponding to the emotion main body;
obtaining the face feature point minimum of the video segment corresponding to the emotion main body;
forming the video feature vector from the average of the face feature point averages and the face feature point average, the face feature point maximum, and the face feature point minimum of the video segment corresponding to the emotion main body.
6. The user preference label mining method according to claim 1, further comprising, after the step of extracting, from the word sequence, the words serving as the emotion main body and the emotion object, the following step:
judging whether the emotion main body is the user corresponding to the audio and/or the facial video; if so, entering the step of extracting, from the text, the text feature vector characterizing the sentiment orientation of the emotion main body toward the emotion object; if not, ending.
7. The user preference label mining method according to claim 1, further comprising the following step:
if the word sequence lacks a word serving as the emotion main body, taking the user corresponding to the audio and/or the facial video as the emotion main body.
8. A user preference label mining device, comprising:
a raw data acquisition module, configured to obtain a text to be analyzed and an audio and/or a facial video corresponding to the text;
a word segmentation module, configured to perform word segmentation on the text to obtain a word sequence forming the text;
a main body and object extraction module, configured to extract, from the word sequence, words serving as an emotion main body and an emotion object;
a text feature extraction module, configured to extract, from the text, a text feature vector characterizing the sentiment orientation of the emotion main body toward the emotion object;
an audio and video feature extraction module, configured to extract, from the audio, an audio feature vector characterizing the sentiment orientation of the emotion main body toward the emotion object, and/or to extract, from the facial video, a video feature vector characterizing the sentiment orientation of the emotion main body toward the emotion object;
a sentiment orientation judgment module, configured to judge the sentiment orientation of the emotion main body toward the emotion object according to the text feature vector and the video feature vector and/or the audio feature vector, using a trained sentiment orientation discrimination model;
a tag generation module, configured to generate a preference label of the emotion main body for the emotion object according to the obtained sentiment orientation.
9. The user preference label mining device according to claim 8, wherein the main body and object extraction module is configured to extract the word serving as the emotion main body and the word serving as the emotion object from the word sequence by using a trained emotion main body and emotion object discrimination model.
10. The user preference label mining device according to claim 8, wherein the text feature extraction module is configured to extract an emotion word feature of the text, to obtain zero to multiple of the following features: a word vector of the emotion object, a punctuation mark feature of the text, and word vectors of the conjunctions in the text, and to form the text feature vector from the emotion word feature and the zero to multiple obtained features.
11. The user preference label mining device according to claim 8, wherein the process by which the audio and video feature extraction module extracts, from the audio, the audio feature vector characterizing the sentiment orientation of the emotion main body toward the emotion object comprises: extracting Mel-frequency cepstral coefficients of the audio; cutting the audio into audio segments corresponding to the words in the word sequence; extracting the fundamental tone of each audio segment and calculating the average of the fundamental tones; extracting the intensity of each audio segment and calculating the average of the intensities; and forming the audio feature vector from the Mel-frequency cepstral coefficients, the average of the fundamental tones of the audio segments, the fundamental tone of the audio segment corresponding to the emotion main body, the average of the intensities of the audio segments, and the intensity of the audio segment corresponding to the emotion main body.
12. The user preference label mining device according to claim 8, wherein the process by which the audio and video feature extraction module extracts, from the facial video, the video feature vector characterizing the sentiment orientation of the emotion main body toward the emotion object comprises: segmenting the facial video into video segments corresponding to the words in the word sequence; calculating a face feature point average for each video segment and calculating the average of the face feature point averages; obtaining the face feature point maximum of the video segment corresponding to the emotion main body; obtaining the face feature point minimum of the video segment corresponding to the emotion main body; and forming the video feature vector from the average of the face feature point averages and the face feature point average, the face feature point maximum, and the face feature point minimum of the video segment corresponding to the emotion main body.
13. The user preference label mining device according to claim 8, further comprising:
an emotion main body judgment module, configured to judge whether the emotion main body is the user corresponding to the audio and/or the facial video, and, if so, to start the text feature extraction module, and, if not, to end processing.
14. The user preference label mining device according to claim 8, wherein the main body and object extraction module is further configured to, if the word sequence lacks a word serving as the emotion main body, take the user corresponding to the audio and/or the facial video as the emotion main body.
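Editorial illustration (not part of the claims): the sketch below shows one way the audio features of claims 4 and 11 and the video features of claims 5 and 12 could be assembled. librosa is assumed for MFCC, pitch, and an RMS-based intensity proxy; face feature points per frame are assumed to come from an external landmark detector; and the word-to-time alignment (word_spans) is assumed to be available from elsewhere. Time-averaging the MFCC matrix is also an assumption, since the claims fix no pooling.

```python
import numpy as np
import librosa

def audio_feature_vector(y, sr, word_spans, main_body_index, n_mfcc=13):
    """y, sr: audio signal and sample rate; word_spans: (start_s, end_s) per word.
    Assumes each word segment is long enough for pitch estimation."""
    # Mel-frequency cepstral coefficients of the whole audio, averaged over time.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

    pitches, intensities = [], []
    for start, end in word_spans:
        seg = y[int(start * sr):int(end * sr)]
        f0 = librosa.yin(seg, fmin=65, fmax=400, sr=sr)          # fundamental tone
        pitches.append(float(np.mean(f0)))
        intensities.append(float(librosa.feature.rms(y=seg).mean()))  # intensity proxy

    return np.concatenate([
        mfcc,
        [np.mean(pitches), pitches[main_body_index],          # mean pitch, main body's pitch
         np.mean(intensities), intensities[main_body_index]]  # mean intensity, main body's intensity
    ])

def video_feature_vector(landmarks_per_word, main_body_index):
    """landmarks_per_word: one array of shape (frames, points, 2) per word,
    i.e. one entry per video segment of the facial video."""
    segment_means = [float(seg.mean()) for seg in landmarks_per_word]
    main_seg = landmarks_per_word[main_body_index]
    return np.array([
        np.mean(segment_means),   # average of the per-segment feature point averages
        main_seg.mean(),          # feature point average of the main body's segment
        main_seg.max(),           # feature point maximum of that segment
        main_seg.min(),           # feature point minimum of that segment
    ])
```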
CN201510076723.4A 2015-02-12 2015-02-12 Favorite label mining method and device Active CN104598644B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510076723.4A CN104598644B (en) 2015-02-12 2015-02-12 Favorite label mining method and device

Publications (2)

Publication Number Publication Date
CN104598644A true CN104598644A (en) 2015-05-06
CN104598644B CN104598644B (en) 2020-10-30

Family

ID=53124429

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510076723.4A Active CN104598644B (en) 2015-02-12 2015-02-12 Favorite label mining method and device

Country Status (1)

Country Link
CN (1) CN104598644B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140007149A1 (en) * 2012-07-02 2014-01-02 Wistron Corp. System, apparatus and method for multimedia evaluation
CN103714071A (en) * 2012-09-29 2014-04-09 株式会社日立制作所 Label emotional tendency quantifying method and label emotional tendency quantifying system
CN104239373A (en) * 2013-06-24 2014-12-24 腾讯科技(深圳)有限公司 Document tag adding method and document tag adding device
CN104063427A (en) * 2014-06-06 2014-09-24 北京搜狗科技发展有限公司 Expression input method and device based on semantic understanding
CN104200804A (en) * 2014-09-19 2014-12-10 合肥工业大学 Various-information coupling emotion recognition method for human-computer interaction

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
陈国青等: "《中国信息***研究 新兴技术背景下的机遇与挑战》", 30 November 2011 *
马刚: "《基于语义的Web数据挖掘》", 31 January 2014 *

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106294797A (en) * 2016-08-15 2017-01-04 北京聚爱聊网络科技有限公司 A kind of generation method and apparatus of video gene
CN106294797B (en) * 2016-08-15 2019-10-18 北京数码视讯科技股份有限公司 A kind of generation method and device of video gene
CN106503805A (en) * 2016-11-14 2017-03-15 合肥工业大学 A kind of bimodal based on machine learning everybody talk with sentiment analysis system and method
CN106503805B (en) * 2016-11-14 2019-01-29 合肥工业大学 A kind of bimodal based on machine learning everybody talk with sentiment analysis method
CN106844330A (en) * 2016-11-15 2017-06-13 平安科技(深圳)有限公司 The analysis method and device of article emotion
CN106844330B (en) * 2016-11-15 2018-04-20 平安科技(深圳)有限公司 The analysis method and device of article emotion
CN107330001A (en) * 2017-06-09 2017-11-07 国政通科技股份有限公司 The creation method and system of a kind of diversification label
CN108305643B (en) * 2017-06-30 2019-12-06 腾讯科技(深圳)有限公司 Method and device for determining emotion information
CN108305643A (en) * 2017-06-30 2018-07-20 腾讯科技(深圳)有限公司 The determination method and apparatus of emotion information
CN107491435A (en) * 2017-08-14 2017-12-19 深圳狗尾草智能科技有限公司 Method and device based on Computer Automatic Recognition user feeling
CN107491435B (en) * 2017-08-14 2021-02-26 苏州狗尾草智能科技有限公司 Method and device for automatically identifying user emotion based on computer
CN108831450A (en) * 2018-03-30 2018-11-16 杭州鸟瞰智能科技股份有限公司 A kind of virtual robot man-machine interaction method based on user emotion identification
CN108777804B (en) * 2018-05-30 2021-07-27 腾讯科技(深圳)有限公司 Media playing method and device
CN108777804A (en) * 2018-05-30 2018-11-09 腾讯科技(深圳)有限公司 media playing method and device
CN108810625A (en) * 2018-06-07 2018-11-13 腾讯科技(深圳)有限公司 A kind of control method for playing back of multi-medium data, device and terminal
CN108810577A (en) * 2018-06-15 2018-11-13 深圳市茁壮网络股份有限公司 A kind of construction method, device and the electronic equipment of user's portrait
CN108810577B (en) * 2018-06-15 2021-02-09 深圳市茁壮网络股份有限公司 User portrait construction method and device and electronic equipment
CN109119076A (en) * 2018-08-02 2019-01-01 重庆柚瓣家科技有限公司 A kind of old man user exchanges the collection system and method for habit
CN109165283A (en) * 2018-08-20 2019-01-08 北京智能管家科技有限公司 Resource recommendation method, device, equipment and storage medium
CN109165283B (en) * 2018-08-20 2021-12-28 北京如布科技有限公司 Resource recommendation method, device, equipment and storage medium
CN110334182A (en) * 2019-06-24 2019-10-15 中国南方电网有限责任公司 Online service method with speech emotion recognition
CN110428807A (en) * 2019-08-15 2019-11-08 三星电子(中国)研发中心 A kind of audio recognition method based on deep learning, system and device
CN111222011A (en) * 2020-01-06 2020-06-02 腾讯科技(深圳)有限公司 Video vector determination method and device
CN111222011B (en) * 2020-01-06 2023-11-14 腾讯科技(深圳)有限公司 Video vector determining method and device

Also Published As

Publication number Publication date
CN104598644B (en) 2020-10-30

Similar Documents

Publication Publication Date Title
CN104598644A (en) User fond label mining method and device
US11315546B2 (en) Computerized system and method for formatted transcription of multimedia content
CN107507612B (en) Voiceprint recognition method and device
CN107492379B (en) Voiceprint creating and registering method and device
CN106156365B (en) A kind of generation method and device of knowledge mapping
CN107481720B (en) Explicit voiceprint recognition method and device
CN106980624B (en) Text data processing method and device
US9230547B2 (en) Metadata extraction of non-transcribed video and audio streams
WO2018045646A1 (en) Artificial intelligence-based method and device for human-machine interaction
US11848009B2 (en) Adaptive interface in a voice-activated network
US10108707B1 (en) Data ingestion pipeline
CN111128223A (en) Text information-based auxiliary speaker separation method and related device
JP5496863B2 (en) Emotion estimation apparatus, method, program, and recording medium
CN108304424B (en) Text keyword extraction method and text keyword extraction device
US20230089308A1 (en) Speaker-Turn-Based Online Speaker Diarization with Constrained Spectral Clustering
CN109583401A (en) Question searching method capable of automatically generating answers and user equipment
US20170270701A1 (en) Image processing device, animation display method and computer readable medium
Al-Azani et al. Enhanced video analytics for sentiment analysis based on fusing textual, auditory and visual information
CN109325124A (en) A kind of sensibility classification method, device, server and storage medium
CN111586469A (en) Bullet screen display method and device and electronic equipment
CN112233680A (en) Speaker role identification method and device, electronic equipment and storage medium
Chakroun et al. New approach for short utterance speaker identification
CN117441165A (en) Reducing bias in generating language models
JP6087704B2 (en) Communication service providing apparatus, communication service providing method, and program
CN115705705A (en) Video identification method, device, server and storage medium based on machine learning

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant