CN103020712A - Distributed classification device and distributed classification method for massive micro-blog data - Google Patents

Distributed classification device and distributed classification method for massive micro-blog data Download PDF

Info

Publication number
CN103020712A
CN103020712A
Authority
CN
China
Prior art keywords
controller
microblog
data
microblog data
master controller
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012105838868A
Other languages
Chinese (zh)
Other versions
CN103020712B (en)
Inventor
王国仁
信俊昌
聂铁铮
赵相国
丁琳琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Northeastern University China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeastern University China filed Critical Northeastern University China
Priority to CN201210583886.8A (patent CN103020712B)
Publication of CN103020712A
Application granted
Publication of CN103020712B
Active legal status
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a distributed classification device and a distributed classification method for massive microblog data, and belongs to the field of data mining technology. The device has a distributed structure. In the method, each slave controller uses extreme learning machine (ELM) processing to generate an intermediate result for building the final microblog-data classifier and transmits it to the master controller; after receiving all the intermediate results sent by the slave controllers, the master controller derives the final microblog-data classifier according to the ELM principle; the generated classifier then classifies the microblog data. The device and method overcome the shortcoming that existing ELM-based methods can only be applied in a centralized environment and cannot support ELM classification over large-scale training sample sets, so that massive microblog data can be processed and analyzed, the value of the massive microblog data accumulated in applications can be fully exploited, and an effective application service is achieved.

Description

Distributed classifier and classification method for massive microblog data
Technical field
The invention belongs to the field of data mining technology, and relates to an extreme learning machine classifier and method based on distributed processing, in particular to a distributed classifier and classification method for massive microblog data.
Background technology
At present, large amounts of information are produced on the Internet at every moment, in many forms of expression, and the volume of information generated by microblog platforms in particular is growing rapidly. A microblog (micro-blog) is a blog form that allows users to publish brief texts (usually about 140 characters) and update them in real time. The rapid development of microblogging means that anyone can become a microblog user, publish and read information at any time on any client that supports microblogs, interact with others, and express their feelings. Microblogs have become a powerful information carrier on the Internet; the volume of microblog information has reached a massive scale, and microblogging is currently the most popular platform for information sharing, dissemination, and interaction. For this reason, how to take appropriate measures and techniques to mine useful information from massive microblog data and make predictive judgments about future events has become a focus and difficulty of current applied data mining research.
In existing research on microblog data, the volume of data handled is often small enough to be processed in a centralized environment. However, with the rapid growth of microblog data on the Internet, the data volume now far exceeds the processing power of a single computer, and existing methods cannot realize large-scale data analysis.
Summary of the invention
In view of the deficiencies of the prior art, the object of the invention is to propose a distributed classifier and classification method for massive microblog data that uses extreme learning machine (ELM) technology to classify microblog data, and thereby processes and analyzes massive microblog data effectively, fully exploits the value of the massive microblog data accumulated in applications, and better serves those applications.
The technical scheme of the invention is realized as follows: a distributed classifier for massive microblog data adopts a distributed structure comprising one master controller and at least one slave controller. Each slave controller is interconnected with the master controller, which communicates with every slave controller; the slave controllers are mutually independent and complete their own tasks independently. Following the ELM processing method, each slave controller sends the intermediate result it has computed for generating the final microblog-data classifier to the master controller; after receiving all the intermediate results sent by the slave controllers, the master controller derives the final microblog-data classifier according to the ELM principle.
The slave controller comprises:
Vectorizer: converts each microblog training record with its class label into vector form, comprising the feature vector x_i of the data portion of each microblog record and the class-label portion t_i.
Stripper: separates out the feature-vector matrix X_i and the class-label matrix T_i of all microblog records in the training set processed by the vectorizer.
Converter: using the extreme learning machine (ELM) principle, converts the feature-vector matrix X_i extracted by the stripper into the ELM hidden-layer output matrix H_i.
Antecedent calculator: using the ELM principle, computes the intermediate result H_i^T H_i from the hidden-layer output matrix H_i and submits it to the master controller.
Consequent calculator: using the ELM principle, computes the intermediate result H_i^T T_i from the hidden-layer output matrix H_i and the class-label matrix T_i, and submits it to the master controller.
The master controller comprises:
Antecedent accumulator: merges the intermediate results H_i^T H_i submitted by the slave controllers into the aggregate H^T H.
Consequent accumulator: merges the intermediate results H_i^T T_i submitted by the slave controllers into the aggregate H^T T.
Parameter generator: using the ELM principle, computes the output-node weight parameter β from the aggregated outputs of the antecedent and consequent accumulators.
Classifier generator: builds the microblog-data classifier from the parameter β obtained by the parameter generator, and uses it to classify microblog data under test.
A distributed classification method for massive microblog data comprises the following steps:
Step 1: preparation of the microblog training data set;
Preparation of the training data set consists of two parts: crawling the original microblog data and labeling the data manually. Two modes are possible. In the first mode, the master controller crawls the original microblog data to be processed, each training record is labeled manually with the class it represents, and the master controller distributes the labeled records to the slave controllers accordingly. In the second mode, the master controller communicates with each slave controller and tells it which microblog data to crawl; each slave controller then crawls its own share of the original microblog data and labels it manually with the class each record represents;
Step 2: the master controller initializes the required parameters and sends them to all slave controllers;
Following the ELM principle, the master controller generates the parameters randomly in advance, including the number of hidden nodes L, the input-node weight vectors w_1, w_2, ..., w_L, and the hidden-node biases b_1, b_2, ..., b_L, and sends these parameters to all slave controllers;
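As an illustration of this initialization step, the sketch below draws random input weights and hidden-node biases with NumPy. The dimensions (6 input features and L = 3 hidden nodes, matching the embodiment described later) and the uniform sampling ranges are assumptions, since the patent does not prescribe a particular random distribution.

```python
import numpy as np

# Master controller's random initialization (illustrative sketch).
d, L = 6, 3                              # 6 text features, 3 hidden nodes
rng = np.random.default_rng(0)
W = rng.uniform(-1.0, 1.0, size=(L, d))  # input weight vectors w_1 .. w_L
b = rng.uniform(0.0, 1.0, size=L)        # hidden-node biases b_1 .. b_L
# W and b would then be broadcast unchanged to every slave controller.
```

Because every slave controller receives the same W and b, the hidden-layer mapping is identical across the cluster, which is what makes the intermediate results additive later on.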
Step 3: each slave controller processes its own local microblog data set and sends the result to the master controller, which generates the microblog-data classifier;
Step 3-1: vectorization of the microblog data;
Each microblog training record, together with its class label, is vectorized into the feature vector x_i of its data portion and the class-label portion t_i;
Step 3-2: stripping of the microblog data;
For each slave controller's feature-extracted microblog training set, the feature-vector portion and the class-label portion of the data are stripped apart, forming the feature-vector matrix X_i and the class-label matrix T_i of that slave controller's training set; that is, each slave controller generates its own local microblog data set (X_i, T_i), where X_i is the feature matrix and T_i is the class-label matrix of the local data set.
Step 3-3: each slave controller generates intermediate results from its own local microblog data set and sends them to the master controller;
Each slave controller n_i computes the intermediate results required for building the classifier from the received input-node weight vectors w_1, w_2, ..., w_L, the hidden-node biases b_1, b_2, ..., b_L, and its local training set (X_i, T_i), and submits them to the master controller;
Step 3-3-1: convert the feature matrix X_i of the local data set into the ELM hidden-layer output matrix H_i;
Step 3-3-2: from the hidden-layer output matrix H_i, compute the intermediate result U_i = H_i^T H_i;
Step 3-3-3: from the hidden-layer output matrix H_i and the class-label matrix T_i of the local training set, compute the intermediate result V_i = H_i^T T_i;
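The three sub-steps above can be sketched per slave controller as follows. The logistic sigmoid activation and the example matrix sizes are assumptions for this sketch, since the patent leaves the ELM activation function g unspecified.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def slave_intermediates(X_i, T_i, W, b):
    """One slave controller's work: steps 3-3-1 to 3-3-3 (sketch).

    X_i: (n_i, d) local feature matrix; T_i: (n_i, c) local label matrix;
    W, b: the random input weights and biases broadcast by the master.
    """
    H_i = sigmoid(X_i @ W.T + b)  # step 3-3-1: hidden-layer output, (n_i, L)
    U_i = H_i.T @ H_i             # step 3-3-2: (L, L), size independent of n_i
    V_i = H_i.T @ T_i             # step 3-3-3: (L, c)
    return U_i, V_i

# Demo with assumed sizes: L = 3 hidden nodes, d = 6 features, 5 local records.
rng = np.random.default_rng(0)
W = rng.uniform(-1, 1, size=(3, 6))
b = rng.uniform(0, 1, size=3)
X_i = rng.random((5, 6))
T_i = rng.random((5, 1))
U_i, V_i = slave_intermediates(X_i, T_i, W, b)
```

Note that U_i and V_i are only L x L and L x c regardless of how many local records a slave holds, so the communication cost to the master does not grow with the data volume.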
Step 3-4: the master controller receives and aggregates the intermediate results of all slave controllers; from the aggregated results it computes, according to the ELM principle, the output-node weight parameter β, and thereby obtains the microblog-data classifier;
Step 3-4-1: merge the intermediate results U_i submitted by the slave controllers into the aggregate U = Σ U_i = Σ H_i^T H_i = H^T H;
Step 3-4-2: merge the intermediate results V_i submitted by the slave controllers into the aggregate V = Σ V_i = Σ H_i^T T_i = H^T T;
Step 3-4-3: compute the output-node weight parameter β from the aggregated U and V:
β = (I/λ + H^T H)^{-1} H^T T = (I/λ + U)^{-1} V
where I is the identity matrix, λ is a user-specified parameter, and (·)^{-1} denotes matrix inversion. The microblog-data classifier is then given by
f(x) = h(x)β
where f(x) is the classification result of a microblog record to be classified and h(x) is its hidden-layer output vector;
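A minimal numerical check of steps 3-3 and 3-4: because H^T H = Σ H_i^T H_i and H^T T = Σ H_i^T T_i, the β computed from the aggregated intermediate results equals the β of a centralized ELM over the full training set. The sigmoid activation, the random data, and the λ value are assumptions for this sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
L_nodes, d, lam = 3, 6, 100.0
W = rng.uniform(-1, 1, size=(L_nodes, d))
b = rng.uniform(0, 1, size=L_nodes)

# Full training set of 12 records, split row-wise among three "slaves".
X = rng.random((12, d))
T = rng.random((12, 1))
parts = np.array_split(np.arange(12), 3)

U = np.zeros((L_nodes, L_nodes))
V = np.zeros((L_nodes, 1))
for idx in parts:                     # each slave computes U_i, V_i locally
    H_i = sigmoid(X[idx] @ W.T + b)
    U += H_i.T @ H_i                  # master aggregates the sums
    V += H_i.T @ T[idx]

# Step 3-4-3: beta = (I/lam + U)^{-1} V, via a linear solve.
beta = np.linalg.solve(np.eye(L_nodes) / lam + U, V)

# Centralized reference: the same formula applied to the whole H at once.
H = sigmoid(X @ W.T + b)
beta_central = np.linalg.solve(np.eye(L_nodes) / lam + H.T @ H, H.T @ T)
```

Using a linear solve instead of an explicit matrix inverse is a standard numerical choice; the result is the same β as the formula above.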
Step 4: automatic classification of the microblog data
Automatic classification can proceed in two ways. In the first, the master controller continues to crawl microblog data and uses the classifier generated in step 3 to output the classes of the unclassified records directly. In the second, the master controller sends the classifier generated in step 3 to every slave controller, and each slave controller uses it to classify its own unclassified microblog data and obtain the classification results.
Beneficial effects: the invention provides a distributed classifier and classification method for massive microblog data that overcome the defect of previous ELM-based methods, which could only be applied in a centralized environment and could not support ELM classification over large-scale training sample sets. Processing and analyzing massive microblog data thus becomes feasible, the value of the massive microblog data accumulated in applications is fully exploited, and a better application service is achieved.
Brief description of the drawings
Fig. 1 is a schematic diagram of the distributed structure of one embodiment of the invention;
Fig. 2 is a schematic diagram of the connections between the master controller and the slave controllers of one embodiment of the invention;
Fig. 3 is a structural block diagram of the master controller and the slave controllers of one embodiment of the invention;
Fig. 4 is a schematic diagram of the distributed microblog training set of one embodiment of the invention;
Fig. 5 is a flow chart of the distributed microblog-data training method of one embodiment of the invention;
Fig. 6 is a flow chart of the method for producing the microblog-data classifier in one embodiment of the invention;
Fig. 7 is a schematic diagram of the local intermediate results after transformation by the slave controllers in one embodiment of the invention;
Fig. 8 is a schematic diagram of the slave controllers computing intermediate results and the master controller aggregating them in one embodiment of the invention.
Detailed description of the embodiments
Embodiments of the invention are described in further detail below with reference to the accompanying drawings.
Current microblog data contain a large amount of user sentiment information, expressing users' opinions and views on events, products, people, and so on. This sentiment information has high research and application value, so sentiment analysis of microblog data has attracted wide attention and has broad application prospects in opinion analysis, product evaluation, public-opinion detection, and other areas. Therefore, in the specific embodiments of the invention, microblog data are classified according to their sentiment orientation.
The invention analyzes massive microblog data in a distributed environment, whose structure is shown in Fig. 1: a master node n_0 and several slave nodes n_1, n_2, ..., n_s, where the master node n_0 is interconnected with all the slave nodes n_1, n_2, ..., n_s and can communicate with each of them.
One embodiment of the invention adopts the overall connection scheme shown in Fig. 2, comprising one master controller and several slave controllers (slave controller 1, slave controller 2, ..., slave controller m), each interconnected with the master controller. According to the extreme learning machine (ELM) principle, each slave controller processes its own part of the microblog training set and produces the intermediate results used to generate the final classifier, then sends these intermediate results to the master controller; after receiving them, the master controller, likewise according to the ELM principle, produces the final microblog-data classifier.
Each slave controller comprises a vectorizer, a stripper, a converter, an antecedent calculator, and a consequent calculator. The master controller comprises an antecedent accumulator, a consequent accumulator, a parameter generator, and a classifier generator.
Vectorizer: converts each microblog training record with its class label into vector form, comprising the feature vector x_i of the data portion of each microblog record and the class-label portion t_i.
Stripper: separates out the feature-vector matrix X_i and the class-label matrix T_i of all microblog records in the training set processed by the vectorizer.
Converter: using the extreme learning machine (ELM) principle, converts the feature-vector matrix X_i extracted by the stripper into the ELM hidden-layer output matrix H_i.
Antecedent calculator: using the ELM principle, computes the intermediate result H_i^T H_i from the hidden-layer output matrix H_i and submits it to the master controller.
Consequent calculator: using the ELM principle, computes the intermediate result H_i^T T_i from the hidden-layer output matrix H_i and the class-label matrix T_i, and submits it to the master controller.
Antecedent accumulator: merges the intermediate results H_i^T H_i submitted by the slave controllers into the aggregate H^T H.
Consequent accumulator: merges the intermediate results H_i^T T_i submitted by the slave controllers into the aggregate H^T T.
Parameter generator: using the ELM principle, computes the output-node weight parameter β from the aggregated outputs of the antecedent and consequent accumulators.
Classifier generator: builds the microblog-data classifier from the parameter β obtained by the parameter generator, and uses it to classify microblog data under test.
In the present embodiment, both the slave controllers and the master controller analyze the microblog data with ELM technology, which is summarized as follows.
The extreme learning machine is a training method for single hidden-layer feedforward neural networks (SLFNs). Before training, ELM sets the connection weights between the input layer and the hidden layer, and the hidden-layer biases, at random; during the execution of the algorithm, the input weights of the network and the biases of the hidden-layer units need not be adjusted, and a unique optimal analytical solution for the output-layer weights can be produced, providing good generalization ability and extremely fast learning speed.
The basic principle of ELM is as follows. During training, ELM first generates the input weights and hidden-node biases at random, and then computes the output weights of the SLFN from the training data. Suppose N training samples (x_j, t_j) are given, where x_j is the feature-vector portion and t_j the class-label portion of a sample. An SLFN with L hidden nodes and activation function g(x) can be formally expressed as:
Σ_{i=1}^{L} β_i g(w_i · x_j + b_i) = o_j,  j = 1, 2, ..., N.    (1)
where w_i is the weight vector connecting the i-th hidden node with the input nodes; β_i is the weight vector connecting the i-th hidden node with the output nodes; b_i is the bias of the i-th hidden node; and o_j is the j-th output vector of the SLFN.
If the SLFN can approximate the training samples without error, then Σ_{j=1}^{N} ||o_j - t_j|| = 0; that is, there exist w_i, β_i, and b_i such that Σ_{i=1}^{L} β_i g(w_i · x_j + b_i) = t_j, abbreviated as Hβ = T, where
H(w_1, ..., w_L, b_1, ..., b_L, x_1, ..., x_N) =
[ g(w_1·x_1 + b_1)  g(w_2·x_1 + b_2)  ...  g(w_L·x_1 + b_L) ]
[ g(w_1·x_2 + b_1)  g(w_2·x_2 + b_2)  ...  g(w_L·x_2 + b_L) ]
[       ...                ...        ...        ...        ]
[ g(w_1·x_N + b_1)  g(w_2·x_N + b_2)  ...  g(w_L·x_N + b_L) ]    (2)
β = [β_1^T, β_2^T, ..., β_L^T]^T and T = [t_1^T, t_2^T, ..., t_N^T]^T, where x^T denotes the transpose of the matrix x.
The matrix H is called the hidden-layer output matrix. In the formula Hβ = T, only β is unknown, giving
β̂ = H†T
where H† is the Moore-Penrose generalized inverse of H.
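The least-squares solution above can be checked numerically: for β̂ = H†T, the residual Hβ̂ - T is orthogonal to the columns of H, i.e. H^T(Hβ̂ - T) = 0. The matrix sizes and random data here are arbitrary illustrations, not values from the patent.

```python
import numpy as np

rng = np.random.default_rng(1)
H = rng.standard_normal((10, 3))  # 10 samples, 3 hidden nodes (assumed sizes)
T = rng.standard_normal((10, 1))

# Smallest-norm least-squares solution of H @ beta = T.
beta = np.linalg.pinv(H) @ T
```

The pseudoinverse solution minimizes ||Hβ - T||, so the normal-equation residual H^T(Hβ - T) vanishes up to floating-point error.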
On the basis of the basic extreme learning machine, several scholars have further proposed ELM variants based on random hidden-layer feature mappings, in which case
β = (I/λ + H^T H)^{-1} H^T T
where I is the identity matrix and λ is a user-specified parameter;
In addition, kernel-based ELM (Kernel ELM), fully complex ELM, online sequential ELM, incremental ELM, ensembles of ELM, and other ELM variants have all been widely used in different applications with good practical results.
The present embodiment analyzes the sentiment orientation of current microblog users toward the Apple tablet computer from microblog data related to it. Such sentiment analysis helps manufacturers, suppliers, and dealers of related products make correct judgments about the future development of the Apple tablet, and also helps prospective buyers deepen their understanding of it and make suitable choices.
Fig. 4 shows a distributed system consisting of one master controller (master node n_0) and three slave controllers (slave nodes n_1, n_2, and n_3). According to the above procedure and the basic principle of ELM, the following processing is performed in the distributed system shown in Fig. 4.
The present embodiment applies the distributed classification method for massive microblog data to sentiment analysis of tablet-related microblog data, with the flow shown in Fig. 5. The flow begins with step 501.
In step 502, the microblog training data are prepared. As described above, there are two preparation modes; the first mode is adopted in this embodiment. The master controller crawls the original microblog data related to the Apple tablet. Each original record contains several fields, for example posting time, author, type, access rights, body text, picture URL, and video URL; this embodiment needs only the body-text field of the raw data for sentiment analysis. At the same time, manual labeling adds a sentiment-orientation dimension, i.e. the class-label portion of the microblog record, expressing the sentiment of the content; in this embodiment the sentiment of a text is divided into three levels: favorable, neutral, and unfavorable. Listed below are 7 manually sentiment-labeled microblog records. The master controller distributes these 7 training records to the three slave controllers: statements 1-2 go to slave controller n_1, statements 3-5 to slave controller n_2, and statements 6-7 to slave controller n_3.
Training data set of slave controller n_1:
Statement 1: The Apple tablet's build quality is pretty good, it responds fast enough, and it feels great in the hand. (Sentiment of statement 1: favorable)
Statement 2: I have used the Apple tablet for a while; it has very few functions and is not as good as the legend says, just too ordinary. (Sentiment of statement 2: unfavorable)
Training data set of slave controller n_2:
Statement 3: The Apple tablet is very fast, its networking is stable, and online gaming is quite polished. A thumbs-up! (Sentiment of statement 3: favorable)
Statement 4: With its single product line and high price, it is unclear how long the Apple tablet can last under competition from rivals of similar magnitude. (Sentiment of statement 4: neutral)
Statement 5: The Apple tablet's operating system is awkward, its screen ratio makes widescreen films unpleasant to watch, exporting is troublesome, and downloading software is very expensive. (Sentiment of statement 5: unfavorable)
Training data set of slave controller n_3:
Statement 6: The Apple tablet is very fast, its resolution is very high, and its applications are quite abundant. (Sentiment of statement 6: favorable)
Statement 7: The Apple tablet's body is too heavy and inconvenient to hold, and downloads must go through iTunes, which is quite troublesome! (Sentiment of statement 7: unfavorable)
In step 503, the master controller initializes the required parameters and sends them to all slave controllers.
The predefined parameters are generated randomly in advance by the master controller and comprise the input-node weight vectors w_1, w_2, w_3 and the hidden-node biases b_1, b_2, b_3; the number of hidden nodes is set to L = 3, and the parameters are issued to slave nodes n_1, n_2, and n_3.
w_1 = (-0.9286, 0.3575, -0.2155, 0.4121, -0.9077, 0.3897)
w_2 = (0.6983, 0.5155, 0.3110, -0.9363, -0.8057, -0.3658)
w_3 = (0.8680, 0.4863, -0.6576, -0.4462, 0.6469, 0.9004)
b_1 = 0.0344
b_2 = 0.4387
b_3 = 0.3816
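With the example parameters listed above, a 6-dimensional feature vector x is mapped to a hidden-layer output h(x) = (g(w_1·x + b_1), g(w_2·x + b_2), g(w_3·x + b_3)). The logistic sigmoid g and the sample vector x below are assumptions for this sketch; the patent does not fix a particular activation function.

```python
import numpy as np

# The embodiment's example weights and biases, copied from the text above.
W = np.array([
    [-0.9286, 0.3575, -0.2155,  0.4121, -0.9077,  0.3897],
    [ 0.6983, 0.5155,  0.3110, -0.9363, -0.8057, -0.3658],
    [ 0.8680, 0.4863, -0.6576, -0.4462,  0.6469,  0.9004],
])
b = np.array([0.0344, 0.4387, 0.3816])

# An illustrative 6-component feature vector (values assumed for this sketch).
x = np.array([0.375, 1.667, 0.0, 0.0, 0.125, 0.0])
h = 1.0 / (1.0 + np.exp(-(W @ x + b)))  # hidden-layer output h(x), length L = 3
```

Stacking such rows h(x) for every local record yields the hidden-layer output matrix H_i used in the following steps.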
In step 504, each slave controller processes its own local microblog data set and sends the result to the master controller, which produces the microblog-data classifier; the specific flow, shown in Fig. 6, starts from step 601.
In step 602, each microblog training record with its class-label portion is vectorized into the feature vector x_i of its data portion and the class-label portion t_i.
Vectorization of the data portion means feature extraction. Feature extraction is the basis of sentiment analysis, and its quality directly affects the result of sentiment prediction; it transforms the primitive features into the most representative new features by means of a mapping (or transformation). The present embodiment mainly studies the influence of commendatory sentiment words, derogatory sentiment words, degree adverbs, and negative words in the text data as features for text sentiment analysis, introduced in detail below.
Sentiment words: sentiment words are nouns, verbs, adjectives, and some idioms and set phrases that carry a sentiment orientation. The sentiment of a text is mainly conveyed by its sentiment words, so they are one of the key features for text sentiment analysis and prediction. According to the needs of the analysis, this embodiment divides the sentiment words in the text data into two kinds, commendatory and derogatory. Commendatory words carry praising, affirmative sentiment, such as "like", "approve", "appreciate", "praise", "admire", and "fine". Derogatory words carry demoting, negative, hateful, or contemptuous sentiment, such as "detest", "oppose", "ignorant", "gloomy", "mean", and "deceive". This embodiment divides commendatory sentiment words into three grades [+3, +2, +1], with the degree of commendation decreasing in turn, and derogatory sentiment words likewise into three grades [-1, -2, -3], with the degree of derogation increasing in turn.
The sentiment words yield four feature components: commendatory word frequency, commendatory average rank, derogatory word frequency, and derogatory average rank, where
word frequency = (number of sentiment words of the given kind) / (total number of words in the text)
average rank = (sum of the ranks of the sentiment words of the given kind) / (number of sentiment words of the given kind)
Degree adverbs: degree adverbs are a kind of adverb expressing degree, such as "very", "extremely", "most", "too", "especially", "slightly", "a little", and "almost". This embodiment extracts the word frequency of degree adverbs as one feature component.
Negative adverbs: negative adverbs are a kind of adverb expressing affirmation or negation, such as "not", "no", "without", "need not", "must not", and "never". This embodiment extracts the word frequency of negative adverbs as one feature component.
In summary, the text feature vector extracted in this embodiment has six components: commendatory word frequency, commendatory average rank, derogatory word frequency, derogatory average rank, degree-adverb word frequency, and negative-adverb word frequency. In the class-label portion of a microblog record, the sentiment of the text is divided into three levels, favorable, neutral, and unfavorable, represented by [+1, +2, +3]. Thus both the feature-vector portion and the class-label portion of each microblog record are obtained, in the following concrete form:
(x_i, t_i) = (commendatory word frequency, commendatory average rank, derogatory word frequency, derogatory average rank, degree-adverb word frequency, negative-adverb word frequency, sentiment class)
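A hedged sketch of the six-feature extraction just described. The tiny English lexicons, their rank values, and the pre-tokenized input are illustrative assumptions; the patent's actual Chinese word segmentation and sentiment dictionaries are not given.

```python
# Illustrative lexicons (assumed): word -> rank for sentiment words,
# plain sets for degree and negative adverbs.
COMMEND = {"good": 1, "fast": 2, "excellent": 3}
DEROG = {"bad": -1, "ugly": -2, "awful": -3}
DEGREE = {"very", "quite", "extremely"}
NEGATE = {"not", "never", "no"}

def features(tokens):
    """Return the six-component feature vector for a tokenized sentence."""
    n = len(tokens)
    c = [COMMEND[t] for t in tokens if t in COMMEND]
    d = [DEROG[t] for t in tokens if t in DEROG]
    return (
        len(c) / n,                          # commendatory word frequency
        sum(c) / len(c) if c else 0.0,       # commendatory average rank
        len(d) / n,                          # derogatory word frequency
        sum(d) / len(d) if d else 0.0,       # derogatory average rank
        sum(t in DEGREE for t in tokens) / n,  # degree-adverb frequency
        sum(t in NEGATE for t in tokens) / n,  # negative-adverb frequency
    )
```

For example, an 8-token sentence containing three commendatory words of ranks +1, +2, +3 and one degree adverb would yield a commendatory frequency of 3/8 and an average rank of 2.0.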
According to the above feature extraction method, the 7 microblog records are vectorized, with the following results:
Statement 1: The Apple tablet's build quality is pretty good, it responds fast enough, and it feels great in the hand. (Sentiment of statement 1: favorable)
Analysis of statement 1: statement 1 can be segmented into 8 words, of which the commendatory words are "good", "fast", and "fine", 3 in all, so its commendatory word frequency is 3/8; the ranks of the corresponding commendatory words are +1, +2, and +2, so its commendatory average rank is (1+2+2)/3. Statement 1 contains no derogatory words, so its derogatory word frequency and average rank are both 0; its degree adverb is "very", with word frequency 1/8; its negative-adverb word frequency is 0; and its sentiment is favorable, with class label +1. After extraction, statement 1 is therefore converted into (0.375, 1.667, 0, 0, 0.125, 0, 1).
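The arithmetic of the statement-1 analysis above can be checked directly:

```python
# Statement 1: 8 tokens, 3 commendatory words with ranks +1, +2, +2,
# one degree adverb, no derogatory words and no negative adverbs.
freq = 3 / 8          # commendatory word frequency
avg = (1 + 2 + 2) / 3  # commendatory average rank
deg = 1 / 8           # degree-adverb frequency
```

Rounded to three decimals these are 0.375, 1.667, and 0.125, matching the vector (0.375, 1.667, 0, 0, 0.125, 0, 1) given above.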
Using the same method, the feature-vector parts of the other statements can be obtained.
Statement 2: I have used the Apple tablet for a while; it has too few functions, is not as good as rumored, and is too ordinary. (The sentiment tendency of statement 2 is: oppose)
Analysis of statement 2: (0.083, 2, 0.167, -1.5, 0.25, 0.083, 3).
Statement 3: The Apple tablet is very fast, the network connection is stable, and online gaming is quite smooth; a thumbs-up! (The sentiment tendency of statement 3 is: approve)
Analysis of statement 3: (0.333, 2.5, 0, 0, 0.25, 0, 1).
Statement 4: With its single product line and high price, it is hard to say how long the Apple tablet can hold on under competition from rivals of a similar order of magnitude. (The sentiment tendency of statement 4 is: neutral)
Analysis of statement 4: (0.077, 2, 0.077, -1, 0, 0, 2).
Statement 5: The operating system of the Apple tablet is awkward to use, the screen aspect ratio is not good for watching widescreen films, the ports are troublesome, and downloading software is very expensive. (The sentiment tendency of statement 5 is: oppose)
Analysis of statement 5: (0, 0, 0.188, -2.333, 0.125, 0.063, 3).
Statement 6: The Apple tablet is very fast, the resolution is also very high, and applications are quite abundant. (The sentiment tendency of statement 6 is: approve)
Analysis of statement 6: (0.273, 2.333, 0, 0, 0.273, 0, 1).
Statement 7: The body of the Apple tablet is too heavy and inconvenient to pick up, and downloads have to go through iTunes, which is quite troublesome! (The sentiment tendency of statement 7 is: oppose)
Analysis of statement 7: (0, 0, 0.154, -2.5, 0.154, 0.077, 3).
In step 603, each slave controller strips its own vectorized microblog training data, separating the feature-vector part from the classification-result part, so that each slave controller generates its own local microblog data set (X_i, T_i), where X_i is the feature matrix of the data set and T_i is its classification-result matrix. In the distributed environment shown in Figure 4, the training data of slave controller n1 are:
Statement 1: (0.375, 1.667, 0, 0, 0.125, 0, 1)
Statement 2: (0.083, 2, 0.167, -1.5, 0.25, 0.083, 3)
After stripping, the feature matrix X1 and classification-result matrix T1 of slave controller n1's microblog training data are as follows:

Feature matrix
X1 = [ 0.375  1.667  0      0     0.125  0     ]
     [ 0.083  2      0.167  -1.5  0.25   0.083 ]

Classification-result matrix
T1 = [ 1 ]
     [ 3 ]
The training data of slave controller n2 are:
Statement 3: (0.333, 2.5, 0, 0, 0.25, 0, 1)
Statement 4: (0.077, 2, 0.077, -1, 0, 0, 2)
Statement 5: (0, 0, 0.188, -2.333, 0.125, 0.063, 3)
After stripping, the feature matrix X2 and classification-result matrix T2 of slave controller n2's microblog training data are as follows:

Feature matrix
X2 = [ 0.333  2.5  0      0       0.25   0     ]
     [ 0.077  2    0.077  -1      0      0     ]
     [ 0      0    0.188  -2.333  0.125  0.063 ]

Classification-result matrix
T2 = [ 1 ]
     [ 2 ]
     [ 3 ]
The training data of slave controller n3 are:
Statement 6: (0.273, 2.333, 0, 0, 0.273, 0, 1)
Statement 7: (0, 0, 0.154, -2.5, 0.154, 0.077, 3)
After stripping, the feature matrix X3 and classification-result matrix T3 of slave controller n3's microblog training data are as follows:

Feature matrix
X3 = [ 0.273  2.333  0      0     0.273  0     ]
     [ 0      0      0.154  -2.5  0.154  0.077 ]

Classification-result matrix
T3 = [ 1 ]
     [ 3 ]
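The stripping of step 603 amounts to a column split of each slave controller's vectorized records. A minimal sketch for slave controller n1's two records, assuming the 7-element layout of the worked example (six features followed by one class label):

```python
import numpy as np

# Sketch of step 603: strip vectorized records into a feature matrix X_i
# and a classification-result matrix T_i (columns 0-5 vs. column 6).
records_n1 = np.array([
    [0.375, 1.667, 0.0,   0.0,  0.125, 0.0,   1.0],  # statement 1
    [0.083, 2.0,   0.167, -1.5, 0.25,  0.083, 3.0],  # statement 2
])
X1 = records_n1[:, :6]   # feature matrix (2 statements x 6 features)
T1 = records_n1[:, 6:]   # classification-result matrix (column vector)
```

The same split, applied on each slave controller, yields the local data sets (X_i, T_i) used in the following steps.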
In step 604, each slave controller n_i uses the received parameters w1, w2, ..., wL and b1, b2, ..., bL, together with its local microblog data set (X_i, T_i), to compute the intermediate results required by ELM and submits them to the master controller; here X_i is the feature matrix of the local data set and T_i its classification-result matrix, as shown in Figure 7.
It should be noted that, in ELM, every element of the input feature matrix X_i must be normalized so that all elements of X_i lie in [-1, +1]; different choices of normalization method lead to different input data. In addition, for the excitation function g(w_i · x_j + b_i), ELM offers several excitation functions for the user to choose from, and different choices likewise lead to different intermediate results and hence to different final classification results. In the specific embodiment of the present invention, the statement vectors are first normalized, an excitation function is then selected, and the intermediate results required by ELM are computed. The three slave controllers are described in turn below:
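One common way to satisfy the [-1, +1] requirement mentioned above is per-column min-max scaling. This is merely an illustrative choice: the patent leaves the normalization method to the user and notes that different choices change the intermediate results.

```python
import numpy as np

def normalize_to_unit_range(X):
    """Map each feature column of X linearly into [-1, +1] (min-max scaling)."""
    lo = X.min(axis=0)
    hi = X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard against constant columns
    return 2.0 * (X - lo) / span - 1.0

# Two feature columns from the worked example (values are illustrative):
X = np.array([[0.375, 1.667],
              [0.083, 2.0],
              [0.0,   0.0]])
Xn = normalize_to_unit_range(X)   # every entry now lies in [-1, +1]
```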
For slave controller n1:

In step 604-1, the data processed by slave controller n1 are statement 1 (0.375, 1.667, 0, 0, 0.125, 0, 1) and statement 2 (0.083, 2, 0.167, -1.5, 0.25, 0.083, 3), and the received parameters are w1, w2, w3 and b1, b2, b3. After normalization and selection of an excitation function, the hidden-layer output matrix is

H1 = [ g(w1·x1+b1)  g(w2·x1+b2)  g(w3·x1+b3) ] = [ 0.5287  0.7409  0.7524 ]
     [ g(w1·x2+b1)  g(w2·x2+b2)  g(w3·x2+b3) ]   [ 0.5442  0.7244  0.7404 ]

and the classification-result matrix is

T1 = [ 1 ]
     [ 3 ]

In step 604-2, the intermediate result U1 is computed from H1:

U1 = H1^T H1 = [ 0.5867  0.7932  0.8081 ]
               [ 0.7932  1.0737  1.0938 ]
               [ 0.8081  1.0938  1.1143 ]

In step 604-3, the intermediate result V1 is computed from H1 and T1:

V1 = H1^T T1 = [ 2.1913 ]
               [ 2.9141 ]
               [ 2.9736 ]

and U1 and V1 are submitted to the master controller.
For slave controller n2:

In step 604-4, the data processed by slave controller n2 are statement 3 (0.333, 2.5, 0, 0, 0.25, 0, 1), statement 4 (0.077, 2, 0.077, -1, 0, 0, 2) and statement 5 (0, 0, 0.188, -2.333, 0.125, 0.063, 3), and the received parameters are w1, w2, w3 and b1, b2, b3. After normalization and selection of an excitation function, the hidden-layer output matrix is

H2 = [ g(w1·x3+b1)  g(w2·x3+b2)  g(w3·x3+b3) ]   [ 0.5441  0.7194  0.7388 ]
     [ g(w1·x4+b1)  g(w2·x4+b2)  g(w3·x4+b3) ] = [ 0.5467  0.7244  0.7163 ]
     [ g(w1·x5+b1)  g(w2·x5+b2)  g(w3·x5+b3) ]   [ 0.7398  0.7388  0.8114 ]

and the classification-result matrix is

T2 = [ 1 ]
     [ 2 ]
     [ 3 ]

In step 604-5, the intermediate result U2 is computed from H2:

U2 = H2^T H2 = [ 1.1422  1.3340  1.3961 ]
               [ 1.3340  1.5881  1.6521 ]
               [ 1.3961  1.6521  1.7222 ]

In step 604-6, the intermediate result V2 is computed from H2 and T2:

V2 = H2^T T2 = [ 3.8569 ]
               [ 4.3846 ]
               [ 4.6146 ]

and U2 and V2 are submitted to the master controller.
For slave controller n3:

In step 604-7, the data processed by slave controller n3 are statement 6 (0.273, 2.333, 0, 0, 0.273, 0, 1) and statement 7 (0, 0, 0.154, -2.5, 0.154, 0.077, 3), and the received parameters are w1, w2, w3 and b1, b2, b3. After normalization and selection of an excitation function, the hidden-layer output matrix is

H3 = [ g(w1·x6+b1)  g(w2·x6+b2)  g(w3·x6+b3) ] = [ 0.3993  0.7005  0.8426 ]
     [ g(w1·x7+b1)  g(w2·x7+b2)  g(w3·x7+b3) ]   [ 0.2272  0.6769  0.8216 ]

and the classification-result matrix is

T3 = [ 1 ]
     [ 3 ]

In step 604-8, the intermediate result U3 is computed from H3:

U3 = H3^T H3 = [ 0.2111  0.4335  0.5458 ]
               [ 0.4335  0.9489  1.2141 ]
               [ 0.5458  1.2141  1.5593 ]

In step 604-9, the intermediate result V3 is computed from H3 and T3:

V3 = H3^T T3 = [ 1.0809 ]
               [ 2.7312 ]
               [ 3.6074 ]

and U3 and V3 are submitted to the master controller.
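The per-slave computation in steps 604-1 through 604-9 follows a single pattern, sketched below with NumPy. The sigmoid excitation function and the randomly drawn parameters W and b are assumptions made for illustration (ELM leaves both choices to the user), so the resulting numbers will not match the worked example above.

```python
import numpy as np

rng = np.random.default_rng(42)
L = 3                                  # number of hidden nodes (as in the example)
X1 = np.array([[0.375, 1.667, 0.0, 0.0, 0.125, 0.0],
               [0.083, 2.0, 0.167, -1.5, 0.25, 0.083]])  # slave n1's features
T1 = np.array([[1.0], [3.0]])          # slave n1's classification results

W = rng.standard_normal((6, L))        # input weight vectors w_1..w_L (random, per ELM)
b = rng.standard_normal(L)             # hidden-node offsets b_1..b_L

def g(z):                              # one possible excitation function (sigmoid)
    return 1.0 / (1.0 + np.exp(-z))

# (normalization of X1 into [-1, +1] is omitted here for brevity)
H1 = g(X1 @ W + b)                     # hidden-layer output matrix (2 x L)
U1 = H1.T @ H1                         # intermediate result U_1 = H_1^T H_1
V1 = H1.T @ T1                         # intermediate result V_1 = H_1^T T_1
# U1 (L x L) and V1 (L x 1) are what slave n1 submits to the master controller.
```

Each slave controller runs exactly this computation on its own (X_i, T_i) with the same shared W and b.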
In step 605, master controller n0 receives U1 and V1 from slave controller n1, U2 and V2 from slave controller n2, and U3 and V3 from slave controller n3, and computes the final result, as shown in Figure 8.

In step 605-1, the intermediate results U1, U2, U3 submitted by the slave controllers are merged into the aggregate

U = U1 + U2 + U3 = [ 1.9400  2.5607  2.7500 ]
                   [ 2.5607  3.6107  3.9600 ]
                   [ 2.7500  3.9600  4.3958 ]

In step 605-2, the intermediate results V1, V2, V3 submitted by the slave controllers are merged into the aggregate

V = V1 + V2 + V3 = [ 7.1390  ]
                   [ 11.0317 ]
                   [ 11.1956 ]

In step 605-3, the output-node weight parameter β is computed from the aggregated U and V:

β = (I/λ + U)^{-1} V = [ -16.8925   9.9534    6.6591  ]
                       [  42.3653  -19.4846  -23.3897 ]
                       [ -28.1804   10.8984   16.6435 ]

The weight parameter β is thus obtained.
In step 605-4, a classifier capable of predicting the sentiment tendency of microblog data is constructed from the parameter β obtained by the parameter generator and is used to analyze the sentiment tendency of microblog data to be tested; its formula is as follows:
f(x)=h(x)β
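Step 605 reduces to matrix addition followed by one regularized linear solve. The sketch below reuses the aggregated U and V exactly as printed above; the value of λ is user-specified, so λ = 2 is an arbitrary assumption here and the resulting β will not reproduce the β of the worked example.

```python
import numpy as np

# Aggregated intermediate results from steps 605-1 and 605-2 (as printed above).
U = np.array([[1.9400, 2.5607, 2.7500],
              [2.5607, 3.6107, 3.9600],
              [2.7500, 3.9600, 4.3958]])
V = np.array([[7.1390], [11.0317], [11.1956]])

lam = 2.0                                # user-specified lambda (arbitrary choice here)
A = np.eye(3) / lam + U
beta = np.linalg.solve(A, V)             # beta = (I/lambda + U)^{-1} V

def classify(h):
    """Classifier f(x) = h(x) beta, for a hidden-layer output row h(x)."""
    return h @ beta
```

Using `np.linalg.solve` rather than an explicit matrix inverse is the standard numerically stable way to evaluate (I/λ + U)^{-1} V.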
In step 505: automatic classification of the microblog data.
The microblog data can be classified automatically in two main ways. The present embodiment adopts the first way: the master controller continues to crawl microblog data and uses the generated microblog data classifier to output the classification results of the data to be classified directly. The following two statements are microblog data to be classified that the master controller continues to crawl, processed with the same feature-extraction method:
Statement 8: I gave the Apple tablet to a friend as a present; the friend is delighted with it. Very good! The speed and the styling are all fine. Like it!
Analysis of statement 8: (0.286, 2.25, 0, 0, 0.214, 0, unknown classification result).
Statement 9: The screen quality of the Apple tablet is very low, it is quite bothersome to use, and the battery life is very poor.
Analysis of statement 9: (0, 0, 0.25, -2.333, 0.25, 0, unknown classification result).
After applying the same normalization method and choosing the same excitation function, the classification result of statement 8 is obtained as follows:

Hidden-layer output vector h(x8) = [ g(w1·x8+b1)  g(w2·x8+b2)  g(w3·x8+b3) ] = [ 0.5467  0.7244  0.7388 ]

Substituting into the classifier formula gives

f(x8) = h(x8)β = [ 0.6332  -0.6207  -1.0061 ]
For the above result, ELM adopts a maximization rule to decide the classification of the microblog data to be predicted: the dimension holding the largest element of the result vector is found, and the class label corresponding to that dimension is the classification result of the data. For statement 8, the largest element of the classifier output is 0.6332, in dimension 1, so the classification result of statement 8 is the class represented by label 1, namely "approve".
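The maximization rule just described is a one-line arg-max over the classifier output vector; the sketch below reproduces the decision for statement 8 from the output vector given in the text.

```python
import numpy as np

labels = {1: "approve", 2: "neutral", 3: "oppose"}
f_x8 = np.array([0.6332, -0.6207, -1.0061])   # classifier output for statement 8

dim = int(np.argmax(f_x8)) + 1   # 1-based dimension of the largest element
result = labels[dim]             # dimension 1 corresponds to label 1, "approve"
```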
The prediction process for statement 9 is identical to that for statement 8 and is summarized as follows. The classification result of statement 9 is obtained as:

Hidden-layer output vector h(x9) = [ g(w1·x9+b1)  g(w2·x9+b2)  g(w3·x9+b3) ] = [ 0.2222  0.6704  0.9174 ]

Substituting into the classifier formula gives

f(x9) = h(x9)β = [ -1.2055  -0.8521  1.0684 ]
The largest element of the classifier output for statement 9 is 1.0684, in dimension 3, so the classification result of statement 9 is the class represented by label 3, namely "oppose".
When the test data are statements 8 and 9, the generated microblog data classifier obtains their sentiment tendencies correctly, showing that microblog data to be classified can be classified accurately.
Besides analyzing the sentiment tendency of microblog data, the present invention can also be applied to many other analyses, such as box-office receipts, song click-through rates, financial-product recommendation, stock analysis, equipment performance, hot-news event analysis, and public-opinion analysis.
Although specific embodiments of the present invention have been described above, those skilled in the art should appreciate that they are illustrative only, and that numerous variations or modifications may be made to these embodiments without departing from the principle and essence of the invention. The scope of the invention is limited only by the appended claims.

Claims (5)

1. A distributed classification device for massive microblog data, adopting a distributed architecture, characterized in that it comprises one master controller and at least one slave controller, wherein each slave controller is interconnected with the master controller, the master controller and each slave controller communicate with one another, and all slave controllers are mutually independent;
The slave controller comprises:
a vectorizer, configured to convert every microblog training record with a classification result on the slave controller into vector form, comprising the feature vector x_i of the data part and the classification-result part t_i of each microblog record;
a stripper, configured to strip the feature-vector matrix X_i and the classification-result matrix T_i of all microblog records from the microblog training set processed by the vectorizer;
a converter, applying the principle of the extreme learning machine (ELM), configured to convert the feature-vector matrix X_i extracted by the stripper into the hidden-layer output matrix H_i of the ELM;
an antecedent calculator, applying the principle of the ELM, configured to compute the intermediate result H_i^T H_i from the hidden-layer output matrix H_i and submit it to the master controller;
a consequent calculator, applying the principle of the ELM, configured to compute the intermediate result H_i^T T_i from the hidden-layer output matrix H_i and the classification-result matrix T_i of the microblog data set, and submit it to the master controller;
The master controller comprises:
an antecedent accumulator, configured to merge the intermediate results H_i^T H_i submitted by the slave controllers into the aggregate H^T H;
a consequent accumulator, configured to merge the intermediate results H_i^T T_i submitted by the slave controllers into the aggregate H^T T;
a parameter generator, applying the principle of the ELM, configured to compute the output-node weight parameter β from the aggregated outputs of the antecedent accumulator and the consequent accumulator;
a classifier generator, configured to construct the microblog data classifier from the parameter β obtained by the parameter generator, for classifying microblog data to be tested.
2. The distributed classification device for massive microblog data according to claim 1, characterized in that each slave controller sends the intermediate results of its own processing, used to generate the final microblog data classifier, to the master controller, and after the master controller has received the intermediate results sent by all slave controllers, it obtains the final microblog data classifier according to the principle of the ELM.
3. A distributed classification method for massive microblog data, implemented with the distributed classification device for massive microblog data according to claim 1, characterized in that it comprises the following steps:
Step 1: preparation of the microblog training data set;
The preparation of the microblog training data set comprises two parts: crawling the original microblog data and labelling the microblog data manually. Either of two ways is adopted. In the first way, the master controller crawls the original microblog data to be processed, each training record is labelled manually with the classification result of the microblog data, and the master controller then distributes the microblog data to the corresponding slave controllers. In the second way, the master controller communicates with each slave controller to notify it of the microblog data it needs to crawl; each slave controller then crawls the original microblog data itself and labels the crawled data manually with their classification results;
Step 2: the master controller initializes the required parameters and sends them to all slave controllers;
Applying the principle of the extreme learning machine (ELM), the master controller generates parameters randomly in advance, including the number of hidden nodes L, the input-node weight vectors w_1, w_2, ..., w_L, and the hidden-node offsets b_1, b_2, ..., b_L, and sends these parameters to all slave controllers;
Step 3: each slave controller processes its own local microblog data set and sends the results to the master controller, which generates the microblog data classifier;
Step 3-1: vectorization of the microblog data;
Every microblog training record with a classification-result part is vectorized, comprising the feature vector x_i of the data part and the classification-result part t_i of each microblog record;
Step 3-2: stripping of the microblog data;
For the feature-extracted microblog training set of each slave controller, the feature-vector part and the classification-result part of the data are stripped apart, forming the feature-vector matrix X_i and the classification-result matrix T_i of that slave controller's microblog training set, so that each slave controller generates its own local microblog data set (X_i, T_i), where X_i is the feature matrix of the data set and T_i is its classification-result matrix;
Step 3-3: each slave controller generates intermediate results from its own local microblog data set and sends them to the master controller;
Step 3-4: the master controller receives and aggregates the intermediate results from the slave controllers, computes the output-node weight parameter β from the aggregated intermediate results according to the computing principle of the ELM, and then obtains the microblog data classifier;
Step 4: automatic classification of the microblog data;
The automatic classification of the microblog data can take either of two ways: in the first way, the master controller continues to crawl microblog data and uses the microblog data classifier generated in step 3 to output the classification results of the microblog data to be classified directly; in the second way, the master controller sends the microblog data classifier generated in step 3 to each slave controller, and each slave controller uses the classifier to classify its own microblog data to be classified and obtain the classification results.
4. The distributed classification method for massive microblog data according to claim 3, characterized in that, in step 3-3, each slave controller generates intermediate results from its own local microblog data set and sends them to the master controller, specifically as follows:
Each slave controller n_i, according to the received input-node weight vectors w_1, w_2, ..., w_L and hidden-node thresholds b_1, b_2, ..., b_L, together with its local microblog training data set (X_i, T_i), computes the intermediate results required for constructing the classifier and submits them to the master controller;
Step 3-3-1: convert the feature matrix X_i of the local microblog data set into the hidden-layer output matrix H_i of the ELM;
Step 3-3-2: compute the intermediate result U_i = H_i^T H_i from the hidden-layer output matrix H_i;
Step 3-3-3: compute the intermediate result V_i = H_i^T T_i from the hidden-layer output matrix H_i and the classification-result matrix T_i of the local training data set.
5. The distributed classification method for massive microblog data according to claim 3, characterized in that the computation of the output-node weight parameter β in step 3-4 is specifically as follows:
Step 3-4-1: merge the intermediate results U_i submitted by the slave controllers into the aggregate U = ΣU_i = ΣH_i^T H_i = H^T H;
Step 3-4-2: merge the intermediate results V_i submitted by the slave controllers into the aggregate V = ΣV_i = ΣH_i^T T_i = H^T T;
Step 3-4-3: compute the output-node weight parameter β from the aggregated U and V:

β = (I/λ + H^T H)^{-1} H^T T = (I/λ + U)^{-1} V

where I is the identity matrix, λ is a user-specified parameter, and (·)^{-1} denotes matrix inversion;
The formula of the microblog data classifier is then determined as follows:
f(x)=h(x)β
In the formula, f(x) denotes the classification result of the microblog data to be classified, and h(x) denotes the hidden-layer output vector of the microblog data to be classified.
CN201210583886.8A 2012-12-28 2012-12-28 A kind of distributed sorter of massive micro-blog data and method Active CN103020712B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210583886.8A CN103020712B (en) 2012-12-28 2012-12-28 A kind of distributed sorter of massive micro-blog data and method

Publications (2)

Publication Number Publication Date
CN103020712A true CN103020712A (en) 2013-04-03
CN103020712B CN103020712B (en) 2015-10-28

Family

ID=47969298

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210583886.8A Active CN103020712B (en) 2012-12-28 2012-12-28 A kind of distributed sorter of massive micro-blog data and method

Country Status (1)

Country Link
CN (1) CN103020712B (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH1185796A (en) * 1997-09-01 1999-03-30 Canon Inc Automatic document classification device, learning device, classification device, automatic document classification method, learning method, classification method and storage medium
US20120189194A1 (en) * 2011-01-26 2012-07-26 Microsoft Corporation Mitigating use of machine solvable hips
CN102789498A (en) * 2012-07-16 2012-11-21 钱钢 Method and system for carrying out sentiment classification on Chinese comment text on basis of ensemble learning


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HUANG GUANGBIN ET AL: "Extreme Learning Machine for Regression and Multiclass Classification", 《IEEE TRANSACTIONS ON SYSTEMS,MAN AND CYBERNETICS-PARTB》 *
WANG LEI ET AL: "Parallel Extreme Learning Machine Algorithm Based on Binary Cascade Structure", 《Journal of Jilin University (Information Science Edition)》 *
ZHAO XIANGGUO ET AL: "ELM-Based Protein Secondary Structure Prediction and Its Post-Processing", 《Journal of Northeastern University (Natural Science)》 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103593462A (en) * 2013-11-25 2014-02-19 中国科学院深圳先进技术研究院 Microblog-data-oriented flu epidemic surveillance analysis method and system
CN103593462B (en) * 2013-11-25 2017-02-15 中国科学院深圳先进技术研究院 Microblog-data-oriented flu epidemic surveillance analysis method and system
WO2017133568A1 (en) * 2016-02-05 2017-08-10 阿里巴巴集团控股有限公司 Mining method and device for target characteristic data
CN105760899A (en) * 2016-03-31 2016-07-13 大连楼兰科技股份有限公司 Adboost training learning method and device based on distributed computation and detection cost ordering
CN105760899B (en) * 2016-03-31 2019-04-05 大连楼兰科技股份有限公司 Training learning method and device based on distributed computing and detection cost sequence
CN107590134A (en) * 2017-10-26 2018-01-16 福建亿榕信息技术有限公司 Text sentiment classification method, storage medium and computer
CN109034366A (en) * 2018-07-18 2018-12-18 北京化工大学 Application based on the ELM integrated model of more activation primitives in chemical engineering modeling
CN109657061A (en) * 2018-12-21 2019-04-19 合肥工业大学 A kind of Ensemble classifier method for the more word short texts of magnanimity
CN109657061B (en) * 2018-12-21 2020-11-27 合肥工业大学 Integrated classification method for massive multi-word short texts
CN110381456A (en) * 2019-07-19 2019-10-25 珠海格力电器股份有限公司 Flow management system, flow threshold calculation method and air conditioning system
CN113177163A (en) * 2021-04-28 2021-07-27 烟台中科网络技术研究所 Method, system and storage medium for social dynamic information sentiment analysis

Also Published As

Publication number Publication date
CN103020712B (en) 2015-10-28

Similar Documents

Publication Publication Date Title
CN103020712A (en) Distributed classification device and distributed classification method for massive micro-blog data
Koncel-Kedziorski et al. Text generation from knowledge graphs with graph transformers
Shi et al. WE-LDA: a word embeddings augmented LDA model for web services clustering
CN102495860B (en) Expert recommendation method based on language model
CN103853824B (en) In-text advertisement releasing method and system based on deep semantic mining
Wu et al. Neural news recommendation with heterogeneous user behavior
Li et al. Knowledge-grounded dialogue generation with a unified knowledge representation
CN101354714B (en) Method for recommending problem based on probability latent semantic analysis
CN102831119B (en) Short text clustering Apparatus and method for
CN111222332A (en) Commodity recommendation method combining attention network and user emotion
CN106897914A (en) A kind of Method of Commodity Recommendation and system based on topic model
CN109993583A (en) Information-pushing method and device, storage medium and electronic device
CN110210933A (en) A kind of enigmatic language justice recommended method based on generation confrontation network
Yin et al. Ranking products through online reviews considering the mass assignment of features based on BERT and q-rung orthopair fuzzy set theory
CN103729431A (en) Massive microblog data distributed classification device and method with increment and decrement function
Pathan et al. Unsupervised aspect extraction algorithm for opinion mining using topic modeling
Huang et al. Sentiment analysis in e-commerce platforms: A review of current techniques and future directions
CN109902273A (en) The modeling method and device of keyword generation model
Perez-Castro et al. Efficiency of automatic text generators for online review content generation
Zong et al. Double sparse learning model for speech emotion recognition
Zhang et al. Local-global graph pooling via mutual information maximization for video-paragraph retrieval
Chakraborty et al. LSTM-ANN based price hike sentiment analysis from Bangla social media comments
Jangra et al. Semantic extractor-paraphraser based abstractive summarization
Wang et al. The application of factorization machines in user behavior prediction
Tran et al. Sentiment classification for beauty-fashion reviews

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20220324

Address after: 100081 No. 5 South Main Street, Haidian District, Beijing, Zhongguancun

Patentee after: BEIJING INSTITUTE OF TECHNOLOGY

Address before: 110819 No. 3 lane, Heping Road, Heping District, Shenyang, Liaoning 11

Patentee before: Northeastern University