CN109325112B - Emoji-based cross-lingual sentiment analysis method and apparatus - Google Patents

Emoji-based cross-lingual sentiment analysis method and apparatus

Info

Publication number
CN109325112B
Authority
CN
China
Prior art keywords
text
language
emoji
vector
characterization
Prior art date
Legal status
Active
Application number
CN201810678889.7A
Other languages
Chinese (zh)
Other versions
CN109325112A (en)
Inventor
刘譞哲
陈震鹏
沈晟
陆璇
马郓
黄罡
Current Assignee
Peking University
Original Assignee
Peking University
Priority date
Filing date
Publication date
Application filed by Peking University
Priority to CN201810678889.7A
Publication of CN109325112A
Application granted
Publication of CN109325112B
Active legal status (current)
Anticipated expiration legal status

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks


Abstract

The present invention relates to an emoji-based cross-lingual sentiment analysis method and apparatus. The method comprises: 1) building word vectors from a large collection of unlabeled text in the source and target languages; 2) selecting, based on the word vectors, the unlabeled texts that contain emoji and using them to set up an emoji prediction task, thereby obtaining a sentence characterization model; 3) translating a source-language corpus labeled with sentiment polarity into the target language, using the sentence characterization model to obtain document representations of the original texts and their translations, and training a sentiment classification model on these document representations; 4) applying the trained sentiment classification model to new target-language text to obtain its sentiment polarity. By exploiting emoji-bearing text that is easy to crawl from social platforms, the invention alleviates the scarcity of annotation resources and their imbalance across languages.

Description

Emoji-based cross-lingual sentiment analysis method and apparatus
Technical field
The present invention is an emoji-based cross-lingual sentiment analysis method and apparatus, and belongs to the field of software technology.
Background art
In recent years, with the development of the Internet, a large amount of user-generated text has appeared online, such as blogs, microblogs, forum discussions and reviews. This abundance of user-generated text has drawn researchers' interest in performing automatic sentiment analysis on it. Since the early 2000s, sentiment analysis has become one of the most popular research topics in natural language processing and has been widely applied in research fields such as Web mining, data mining, information retrieval, ubiquitous computing and human-computer interaction. Researchers' enthusiasm for sentiment analysis is largely attributable to its high practical value: sentiment analysis techniques have been applied in many real-world scenarios, including customer feedback and tracking, sales forecasting, product ranking, stock market prediction, opinion aggregation and election prediction, and have produced considerable practical benefit.
However, most sentiment analysis research to date has been conducted on English text. This is largely because early sentiment analysis work was carried out mainly by researchers in English-speaking countries. Those studies provided annotated corpora and benchmark datasets that made subsequent research convenient, so researchers kept focusing on English text and work on sentiment analysis for other languages stagnated. Yet according to statistics, only 25.3% of Internet users use English (https://www.internetworldstats.com/stats7.html). Other languages therefore also have huge user bases, and carrying out sentiment analysis on them is just as important. This situation has prompted a number of researchers to study cross-lingual sentiment analysis, which aims to use the labeled data of a resource-rich language (the source language, usually English) to train a general model that can also perform sentiment classification on text in languages whose labeled resources are scarce (the target language, e.g. Japanese).
The key to cross-lingual sentiment analysis is to find a medium that can bridge the vocabularies of the source and target languages. Most mainstream work chooses parallel text of the source and target languages as this medium, that is, different textual expressions of the same meaning in the two languages. Generating parallel text depends heavily on machine translation, but current translation technology often loses the sentiment information of the original sentence during translation, which makes cross-lingual sentiment analysis difficult. For example, as shown in Fig. 1, "black sheep" in English is often used to refer to a disreputable member of a group, but after translation into Japanese only the literal semantics of the original English (a sheep that is black) remains and the sarcastic connotation is lost. In addition, although the source language (English) has relatively abundant labeled resources compared with other languages, the amount of such data is still very limited for today's deep learning algorithms, which consequently often fail to learn good vector representations of words and sentences. A new learning paradigm is therefore urgently needed that can alleviate both the loss of sentiment during translation and the scarcity of labeled data. One possible solution is distant supervision, in which researchers manually define rules to generate weakly labeled data, and by learning from a large amount of weakly labeled data the model approaches the results obtained by training on genuinely labeled data.
Summary of the invention
In view of the problems existing in the field of cross-lingual sentiment analysis, the purpose of the present invention is to provide a cross-lingual sentiment analysis method and apparatus that exploit the wide use of emoji through a semi-supervised representation learning framework.
For the cross-lingual sentiment classification problem, the chosen weak label needs to satisfy two properties. On the one hand, it should be widely used in every language; on the other hand, it should implicitly reveal sentiment information. Under these criteria, the present invention uses emoji as weak labels. Because emoji have no language barrier and can express different emotions, they are widely used by users of different genders and countries and can serve as weak labels reflecting the real sentiment of text in every language. The invention therefore proposes an emoji-based representation learning method for cross-lingual sentiment analysis, which aims to use the resources of the source language (English) to train a model capable of classifying the sentiment of target-language text.
The technical solution adopted by the invention is as follows:
An emoji-based cross-lingual sentiment analysis method, whose main steps are as follows:
1. Unsupervised learning stage: build word vectors from a large collection of unlabeled text in the source and target languages.
2. Distant supervision stage: based on the word vectors, select the unlabeled texts that contain emoji and use them to set up an emoji prediction task, thereby obtaining a sentence characterization model.
3. Supervised learning stage: translate the source-language corpus labeled with sentiment polarity into the target language, use the sentence characterization model to obtain document representations of the original texts and their translations, and train a sentiment classification model on these document representations.
4. Sentiment classification stage: apply the trained sentiment classification model to new target-language text to obtain its sentiment polarity.
Fig. 2 is the flow chart of the above method. The specific technical solution of each step is as follows:
1. Unsupervised learning stage
At this stage, large-scale tweet text and the Word2Vec method are used to train word vectors. The texts can be collected through the Twitter API (https://developer.twitter.com/). Although the traditional one-hot representation can distinguish individual words, it represents them discretely and cannot establish semantic relations between words, which increases the difficulty of later text-processing tasks. To solve this problem, the present invention uses the Word2Vec mechanism to encode each word into a continuous vector space through model training. This process uses only unlabeled corpora to capture and represent the semantic information of words and is therefore unsupervised. In the concrete implementation, the pretrained word-vector parameters are used to initialize the word-vector layer of the overall framework; these parameters are fine-tuned in the subsequent distant supervision stage and kept fixed in the final supervised learning stage.
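As an illustration, word vectors of this kind could be trained with the gensim implementation of Word2Vec; the toy corpus and the hyperparameter values below are assumptions made for the sketch rather than values fixed by the invention.

```python
from gensim.models import Word2Vec  # gensim >= 4.0

# One token list per preprocessed tweet (toy placeholder data).
tokenized_tweets = [
    ["this", "movie", "is", "cool"],
    ["so", "sad", "today"],
    ["this", "song", "is", "cool"],
]

w2v = Word2Vec(
    sentences=tokenized_tweets,
    vector_size=300,  # dimensionality of the word vectors
    window=5,         # context window size
    min_count=1,      # keep every word here; a real crawl would use a higher threshold
    workers=4,
)

embedding_for_cool = w2v.wv["cool"]  # vectors like this initialize the word-vector layer
```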
2. Distant supervision stage
Based on the word-level representations (word vectors) created in the unsupervised learning stage, the present invention designs an emoji prediction task to learn a characterization mechanism that captures both the semantics and the sentiment of text. In the emoji prediction task, sentences that use the same emoji are given similar representations in the vector space. Fig. 3 further illustrates the architecture of the emoji prediction model. Two bidirectional LSTM layers and one attention layer perform the sentence-level encoding of the text. A skip-connection mechanism is used so that the input of the attention layer is the word-vector layer plus the outputs of the two LSTM layers, allowing information to flow unimpeded through the whole model. Finally, the output of the attention layer is used for classification by a softmax layer.
The bidirectional LSTM layers, the attention layer and the softmax layer are introduced below.
Bidirectional long short-term memory (Bi-LSTM) layer: each training sample can be expressed as (x, e), where x = [d1, d2, ..., dL] denotes the word-vector sequence of the text with the emoji removed, and e is the emoji originally contained in the text. At step t, the LSTM computes its node states according to the following formulas:
i(t) = δ(U_i x(t) + W_i h(t-1) + b_i),
f(t) = δ(U_f x(t) + W_f h(t-1) + b_f),
o(t) = δ(U_o x(t) + W_o h(t-1) + b_o),
c(t) = f(t) ⊙ c(t-1) + i(t) ⊙ tanh(U_c x(t) + W_c h(t-1) + b_c),
h(t) = o(t) ⊙ tanh(c(t)),
where x(t), i(t), f(t), o(t), c(t) and h(t) respectively denote the input vector, input-gate state, forget-gate state, output-gate state, memory-cell state and hidden state of the LSTM at step t; W, U and b respectively denote the recurrent-connection parameters, the input-connection parameters and the bias terms; δ(·) is the sigmoid function; and ⊙ denotes the element-wise product. The output of the model at each step t yields the word-sequence representation vectors of each sentence.
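For concreteness, one step of this recurrence can be written out directly; the NumPy sketch below (the dictionary layout and toy dimensions are assumptions made for illustration) takes δ to be the sigmoid function.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, U, W, b):
    """One step of the recurrence above; U, W, b are dicts keyed by gate name
    ('i', 'f', 'o', 'c') holding input weights, recurrent weights and biases."""
    i = sigmoid(U["i"] @ x_t + W["i"] @ h_prev + b["i"])  # input gate
    f = sigmoid(U["f"] @ x_t + W["f"] @ h_prev + b["f"])  # forget gate
    o = sigmoid(U["o"] @ x_t + W["o"] @ h_prev + b["o"])  # output gate
    c = f * c_prev + i * np.tanh(U["c"] @ x_t + W["c"] @ h_prev + b["c"])
    h = o * np.tanh(c)                                    # hidden state
    return h, c

# Toy dimensions for a quick check.
dim_x, dim_h = 4, 3
rng = np.random.default_rng(0)
U = {g: rng.standard_normal((dim_h, dim_x)) for g in "ifoc"}
W = {g: rng.standard_normal((dim_h, dim_h)) for g in "ifoc"}
b = {g: np.zeros(dim_h) for g in "ifoc"}
h, c = lstm_step(rng.standard_normal(dim_x), np.zeros(dim_h), np.zeros(dim_h), U, W, b)
```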
Further, in order to capture both the preceding and the following context of each word, a bidirectional LSTM is used to encode the word sequence. The representation vectors of the i-th element of the word sequence produced by the forward and backward LSTMs are directly concatenated to obtain the final representation h_i:
h_i = [h_i^f, h_i^b],
where h_i^f and h_i^b are the forward and backward outputs for the i-th element. The representation vector h_i obtained in this way captures both the preceding and the following context of the i-th word.
Attention layer: as mentioned above, the word-vector layer, the forward LSTM layer and the backward LSTM layer are connected by skip-connections to form the input of the attention layer, so the i-th word of an input sentence is represented as u_i:
u_i = [d_i, h_i1, h_i2],
where d_i, h_i1 and h_i2 respectively denote the representations of the i-th word in the word-vector layer, the forward LSTM layer and the backward LSTM layer. Since not every word plays the same role in the emoji prediction task and in sentiment classification, the present invention introduces an attention mechanism to determine the importance of each word in the current representation learning process. The attention score of the i-th word is computed according to the following formula:
a_i = exp(W_a u_i) / Σ_{j=1..L} exp(W_a u_j),
where W_a is the parameter matrix of the attention layer. Each sentence, expressed as a word sequence, is then characterized as the weighted average of the concatenated word representations in the sequence, with the weights being the attention values computed above. Specifically, the representation of each sentence takes the following form, where L is the number of words contained in the sentence:
v = Σ_{i=1..L} a_i u_i.
Softmax layer: the sentence representation obtained from the attention layer is then passed to the softmax layer, which returns a corresponding probability vector Y. Each element of Y represents the probability that the sentence contains a particular emoji. Specifically, the i-th element of the probability vector is computed as:
Y_i = exp(w_i^T v + b_i) / Σ_{k=1..K} exp(w_k^T v + b_k),
where T denotes matrix transposition, w_i is the i-th weight parameter, b_i is the i-th bias term, K is the dimension of the probability vector, and v is the sentence representation. After the probability vector of each sentence has been obtained, cross-entropy is used as the loss function and the parameters are updated by gradient descent to minimize the prediction error of the model. After the parameters have been adjusted through the above distant supervision and unsupervised learning, the vector representation of each sentence can be extracted from the output of the attention layer.
Since the amount of data in the later supervised learning stage is limited, and to avoid over-fitting caused by an excessively large number of model parameters, the sentence characterization model is kept fixed while the final document-level vector representation is trained; its parameters are not adjusted further.
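A compact PyTorch sketch of an encoder with this overall shape is given below. The layer sizes, the single-vector attention scoring and the reading of the skip connection (embedding plus the outputs of the two stacked Bi-LSTM layers) are assumptions made for illustration rather than a reproduction of the patented model.

```python
import torch
import torch.nn as nn

class EmojiEncoder(nn.Module):
    """Sketch in the spirit of Fig. 3: word embeddings, two stacked bidirectional
    LSTMs, a skip-connected attention layer, and a softmax output over emoji."""

    def __init__(self, vocab_size, emb_dim=300, hidden=512, n_emoji=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm1 = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.lstm2 = nn.LSTM(2 * hidden, hidden, bidirectional=True, batch_first=True)
        att_dim = emb_dim + 4 * hidden                # u_i = [d_i, h_i1, h_i2]
        self.att = nn.Linear(att_dim, 1, bias=False)  # plays the role of W_a
        self.out = nn.Linear(att_dim, n_emoji)        # softmax layer over K emoji

    def forward(self, tokens):                        # tokens: (batch, L) word ids
        d = self.emb(tokens)                          # word-vector layer
        h1, _ = self.lstm1(d)                         # first Bi-LSTM layer
        h2, _ = self.lstm2(h1)                        # second Bi-LSTM layer
        u = torch.cat([d, h1, h2], dim=-1)            # skip connection into attention
        a = torch.softmax(self.att(u).squeeze(-1), dim=-1)  # attention weights a_i
        v = (a.unsqueeze(-1) * u).sum(dim=1)          # sentence representation v
        return self.out(v), v                         # emoji logits and reusable v

# Training minimizes cross-entropy between the logits and the emoji label, e.g.
# loss = nn.CrossEntropyLoss()(logits, emoji_ids).
```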
3. Supervised learning stage
After the distant supervision stage, sentences with the same semantics and sentiment in each language are mapped to nearby points in the representation space. Since the problem ultimately to be solved is cross-lingual sentiment analysis of documents, a compact way of capturing the effective information of a document is still required. Within a document, different sentences contribute with different degrees of importance to the sentiment it expresses, so a document-level attention mechanism is likewise used to aggregate the sentences of each document. Denoting the representation of a document by r and the representations of its sentences by v, r is computed by the following formulas:
β_i = exp(W_b v_i) / Σ_j exp(W_b v_j),
r = Σ_i β_i v_i,
where W_b is the weight matrix of the document-level attention layer and β_i is the attention value of the i-th sentence in the document. Google Translate is used to translate each source-language sample x ∈ L_S into the target language, and the vector representation of the translated text is obtained in the same way. For each labeled English text x_s and its corresponding translated text x_t, let the vectors obtained after the above attention layers be r_s and r_t; in the supervised learning stage they are directly concatenated into r_c = [r_s, r_t], which serves as the input of the final softmax layer, and the cross-entropy loss between the network prediction and the true label is minimized to update the corresponding network parameters.
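Under the same assumptions as the previous sketch, the supervised stage could be written as a small document-level head that consumes sentence vectors from the fixed encoders; the names and sizes below are illustrative.

```python
import torch
import torch.nn as nn

class CrossLingualSentimentHead(nn.Module):
    """Sketch of the supervised stage: document-level attention over sentence
    vectors, then concatenation of the source and translated document vectors."""

    def __init__(self, sent_dim, n_classes=2):
        super().__init__()
        self.att = nn.Linear(sent_dim, 1, bias=False)  # plays the role of W_b
        self.out = nn.Linear(2 * sent_dim, n_classes)

    def doc_vector(self, sent_vecs):                   # sent_vecs: (n_sentences, sent_dim)
        beta = torch.softmax(self.att(sent_vecs).squeeze(-1), dim=0)
        return (beta.unsqueeze(-1) * sent_vecs).sum(dim=0)  # document vector r

    def forward(self, src_sent_vecs, tgt_sent_vecs):
        # Sentence vectors come from the two fixed, language-specific encoders.
        r_c = torch.cat([self.doc_vector(src_sent_vecs),
                         self.doc_vector(tgt_sent_vecs)])
        return self.out(r_c)                           # sentiment logits
```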
Corresponding to the above method, the present invention also provides an emoji-based cross-lingual sentiment analysis apparatus, comprising:
an unsupervised learning module, responsible for building word vectors from a large collection of unlabeled text in the source and target languages;
a distant supervision module, responsible for selecting, based on the word vectors, the unlabeled texts that contain emoji and using the texts containing emoji to set up an emoji prediction task, thereby obtaining a characterization model;
a supervised learning module, responsible for translating the source-language corpus labeled with sentiment polarity into the target language, using the sentence characterization model to obtain document representations of the original texts and the translated texts, and training a sentiment classification model on the document representations;
a sentiment classification module, responsible for applying the trained sentiment classification model to new target-language text to perform sentiment classification and obtain its sentiment polarity.
Compared with the prior art, the beneficial effects of the present invention are as follows:
The present invention uses emoji-bearing text, which is easy to crawl from social platforms, to alleviate the scarcity of annotation resources and their imbalance across languages. Specifically, because distant supervision is used with emoji as weak sentiment labels, the need for manually annotated corpora is reduced. Moreover, because emoji are widely used in every language, using emoji as the weak label for cross-lingual sentiment analysis is universal across languages.
Description of the drawings
Fig. 1 is a schematic diagram of a Google Translate example.
Fig. 2 is the flow chart of the method of the present invention.
Fig. 3 is the architecture diagram of the distant supervision framework.
Fig. 4 is an example of extracting samples from a tweet containing multiple emoji.
Detailed description of embodiments
The method of the invention is further explained and verified below on the classical Amazon review cross-lingual analysis task (https://www.uni-weimar.de/en/media/chairs/computer-science-department/webis/data/corpus-webis-cls-10/). The task uses English as the source language and Japanese, French and German as target languages, and for each language it contains sentiment analysis tasks in three domains: books, DVDs and music. Because of its representativeness, this task has long served the academic community as a benchmark dataset for cross-lingual sentiment analysis. To verify the method of the invention on this dataset, the model is trained according to the following steps.
First, English, Japanese, French and German tweets were crawled from Twitter and preprocessed as follows (a sketch of these steps is given after the list):
1) retweets were removed, to ensure that every sentence appears in its original context;
2) tweets containing URLs were removed, to ensure that the sentiment of every sentence depends only on its own semantics and not on external resources;
3) all tweets were tokenized and converted to lowercase; since Japanese does not separate words with spaces, this embodiment uses the MeCab tokenizer (http://taku910.github.io/mecab) to process Japanese separately;
4) @-mentions and numbers in tweets were replaced with unified special characters;
5) words with redundant letters were restored to their original form, for example both "cooool" and "cooooooool" are converted to "cool".
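A minimal sketch of these rules is shown below; the placeholder tokens and the exact regular expressions are assumptions, since the embodiment specifies the rules only in prose (and Japanese would additionally be segmented with MeCab).

```python
import re

def preprocess_tweet(text):
    """Apply rules 1)-5); returns None for tweets that should be discarded."""
    if text.lower().startswith("rt "):            # 1) drop retweets (simplified check)
        return None
    if re.search(r"https?://\S+", text):          # 2) drop tweets containing URLs
        return None
    text = text.lower()                           # 3) lowercase (tokenization not shown)
    text = re.sub(r"@\w+", "<user>", text)        # 4) unify @-mentions ...
    text = re.sub(r"\d+", "<number>", text)       #    ... and numbers
    text = re.sub(r"(\w)\1{2,}", r"\1\1", text)   # 5) "cooool", "cooooooool" -> "cool"
    return text
```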
Word2Vec was then applied to these preprocessed texts to obtain the representation of each word in the source and target languages.
Next, the tweets containing emoji were extracted. For each language, the 64 most frequent emoji in the tweets were identified, and sentences containing none of these emoji were filtered out. A sentence may contain more than one emoji; for each tweet, one sample is created for every kind of emoji it contains. For example, the sentence shown in Fig. 4(a) yields the two samples shown in Fig. 4(b) and Fig. 4(c). The emoji samples obtained in this way are used to set up the emoji prediction task, and a characterization model is trained separately for the source language and for the target languages.
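One way this per-emoji sample construction could be implemented is sketched below; treating emoji as tokens and stripping all of them from the text are assumptions about the details of Fig. 4.

```python
def make_emoji_samples(tweet_tokens, top_emoji):
    """Build one (text, emoji) sample per distinct top-64 emoji the tweet
    contains, with the emoji tokens removed from the text."""
    present = [tok for tok in tweet_tokens if tok in top_emoji]
    if not present:
        return []                                       # filtered out: no frequent emoji
    text = [tok for tok in tweet_tokens if tok not in top_emoji]
    return [(text, e) for e in dict.fromkeys(present)]  # one sample per emoji kind

# A tweet containing two different emoji yields two samples, as in Fig. 4(b) and 4(c).
```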
Finally, the English corpus with sentiment labels was translated into the target languages to form parallel texts. The original and translated texts were fed into the sentence characterization models of the corresponding languages obtained in the previous step to obtain the representation of every sentence, which was then used to train the supervised learning model.
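Tying the sketches together, one supervised training step might look like the following; it reuses the EmojiEncoder and CrossLingualSentimentHead classes assumed above, and batching and the translation call are omitted.

```python
import torch
import torch.nn as nn

def supervised_step(src_encoder, tgt_encoder, head, optimizer,
                    src_sentences, tgt_sentences, label):
    """One gradient step of the supervised stage: the two sentence encoders
    stay frozen and only the document-level head is updated."""
    with torch.no_grad():                              # fixed sentence encoders
        src_vecs = torch.stack([src_encoder(s)[1].squeeze(0) for s in src_sentences])
        tgt_vecs = torch.stack([tgt_encoder(s)[1].squeeze(0) for s in tgt_sentences])
    logits = head(src_vecs, tgt_vecs)
    loss = nn.CrossEntropyLoss()(logits.unsqueeze(0), torch.tensor([label]))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```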
The resulting supervised learning model can be used to classify the sentiment of target-language text. Table 1 below shows the classification accuracy of the method of the invention on the 9 tasks of the Amazon benchmark dataset.
Table 1. Classification accuracy (%) of the method of the invention on the 9 tasks of the Amazon benchmark dataset
In the present invention, the unsupervised learning stage may use other classical algorithms besides Word2Vec to obtain word vectors, such as the GloVe algorithm. In the distant supervision stage, a CNN (convolutional neural network) model may be used instead of the bidirectional LSTM layers to encode text, and the number of bidirectional LSTM layers may also be adjusted.
Another embodiment of the present invention provides an emoji-based cross-lingual sentiment analysis apparatus, comprising:
an unsupervised learning module, responsible for building word vectors from a large collection of unlabeled text in the source and target languages;
a distant supervision module, responsible for selecting, based on the word vectors, the unlabeled texts that contain emoji and using the texts containing emoji to set up an emoji prediction task, thereby obtaining a characterization model;
a supervised learning module, responsible for translating the source-language corpus labeled with sentiment polarity into the target language, using the sentence characterization model to obtain document representations of the original texts and the translated texts, and training a sentiment classification model on the document representations;
a sentiment classification module, responsible for applying the trained sentiment classification model to new target-language text to perform sentiment classification and obtain its sentiment polarity.
The above embodiments are intended merely to illustrate rather than limit the technical solution of the present invention. A person of ordinary skill in the art may modify the technical solution of the present invention or replace it with equivalents without departing from the spirit and scope of the present invention; the protection scope of the present invention shall be defined by the claims.

Claims (10)

1. An emoji-based cross-lingual sentiment analysis method, characterized by comprising the following steps:
1) building word vectors from a large collection of unlabeled text in the source and target languages;
2) selecting, based on the word vectors, the unlabeled texts that contain emoji and using the texts containing emoji to set up an emoji prediction task, thereby obtaining a characterization model;
3) translating the source-language corpus labeled with sentiment polarity into the target language, using the sentence characterization model to obtain document representations of the original texts and the translated texts, and training a sentiment classification model on the document representations;
4) applying the trained sentiment classification model to new target-language text to perform sentiment classification and obtain its sentiment polarity.
2. The method according to claim 1, characterized in that step 1) is an unsupervised learning stage, in which the word vectors are trained on large-scale tweet text using the Word2Vec method.
3. The method according to claim 1, characterized in that step 2) is a distant supervision stage, in which sentences that use the same emoji are given similar representations in the vector space within the emoji prediction task; the emoji prediction task uses two bidirectional LSTM layers and one attention layer to perform the sentence-level encoding of the text, and a skip-connection mechanism is used so that the input of the attention layer is the word-vector layer plus the outputs of the two LSTM layers, allowing information to flow unimpeded through the whole model; finally, the output of the attention layer is used for classification by a softmax layer.
4. The method according to claim 3, characterized in that the bidirectional LSTM layers compute the node states in the network according to the following formulas:
i(t) = δ(U_i x(t) + W_i h(t-1) + b_i),
f(t) = δ(U_f x(t) + W_f h(t-1) + b_f),
o(t) = δ(U_o x(t) + W_o h(t-1) + b_o),
c(t) = f(t) ⊙ c(t-1) + i(t) ⊙ tanh(U_c x(t) + W_c h(t-1) + b_c),
h(t) = o(t) ⊙ tanh(c(t)),
where x(t), i(t), f(t), o(t), c(t) and h(t) respectively denote the input vector, input-gate state, forget-gate state, output-gate state, memory-cell state and hidden state of the LSTM at step t; W, U and b respectively denote the recurrent-connection parameters, the input-connection parameters and the bias terms; δ(·) is the sigmoid function; and ⊙ denotes the element-wise product.
5. The method according to claim 4, characterized in that the bidirectional LSTM layers directly concatenate the representation vectors of the i-th element of the word sequence obtained by the forward and backward LSTMs into the final representation vector h_i, so that h_i captures both the preceding and the following context of the i-th word.
6. The method according to claim 5, characterized in that the attention layer uses an attention mechanism to determine the importance of each word in the current representation learning process, the attention score of the i-th word being computed according to the following formula:
a_i = exp(W_a u_i) / Σ_{j=1..L} exp(W_a u_j),
where W_a is the parameter matrix of the attention layer; u_i is the representation vector of the i-th word in the input sentence of the attention layer, u_i = [d_i, h_i1, h_i2], with d_i, h_i1 and h_i2 respectively denoting the representations of the i-th word in the word-vector layer, the forward LSTM layer and the backward LSTM layer; and L is the number of words contained in the sentence.
7. The method according to claim 6, characterized in that the softmax layer obtains a corresponding probability vector Y from the sentence representation obtained from the attention layer; each element of the probability vector Y represents the probability that the corresponding sentence contains a particular emoji; after the probability vector of each sentence has been obtained, cross-entropy is used as the loss function and the parameters are updated by gradient descent to minimize the prediction error of the model.
8. The method according to claim 7, characterized in that the i-th element of the probability vector Y is computed according to the following formula:
Y_i = exp(w_i^T v + b_i) / Σ_{k=1..K} exp(w_k^T v + b_k),
where T denotes matrix transposition, w_i is the i-th weight parameter, b_i is the i-th bias term, K is the dimension of the probability vector, and v is the representation of each sentence.
9. The method according to claim 1, characterized in that step 3) is a supervised learning stage, in which a document-level attention layer is used to aggregate the different sentences in each document; denoting the representation of a document by r and the representations of its sentences by v, r is computed by the following formulas:
β_i = exp(W_b v_i) / Σ_j exp(W_b v_j),
r = Σ_i β_i v_i,
where W_b is the weight matrix of the attention layer and β_i is the attention value of the i-th sentence in the document; each source-language sample x ∈ L_S is translated into the target language, and the vector representation of the translated text is obtained; for each labeled English text x_s and its corresponding translated text x_t, the vectors obtained after the above attention layers are denoted r_s and r_t, which are directly concatenated in the supervised learning stage into r_c = [r_s, r_t]; r_c serves as the input of the final softmax layer, and the cross-entropy loss between the network prediction and the true label is minimized to update the corresponding network parameters.
10. An emoji-based cross-lingual sentiment analysis apparatus, characterized by comprising:
an unsupervised learning module, responsible for building word vectors from a large collection of unlabeled text in the source and target languages;
a distant supervision module, responsible for selecting, based on the word vectors, the unlabeled texts that contain emoji and using the texts containing emoji to set up an emoji prediction task, thereby obtaining a characterization model;
a supervised learning module, responsible for translating the source-language corpus labeled with sentiment polarity into the target language, using the sentence characterization model to obtain document representations of the original texts and the translated texts, and training a sentiment classification model on the document representations;
a sentiment classification module, responsible for applying the trained sentiment classification model to new target-language text to perform sentiment classification and obtain its sentiment polarity.
CN201810678889.7A 2018-06-27 2018-06-27 Emoji-based cross-lingual sentiment analysis method and apparatus Active CN109325112B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810678889.7A CN109325112B (en) 2018-06-27 2018-06-27 Emoji-based cross-lingual sentiment analysis method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810678889.7A CN109325112B (en) 2018-06-27 2018-06-27 Emoji-based cross-lingual sentiment analysis method and apparatus

Publications (2)

Publication Number Publication Date
CN109325112A CN109325112A (en) 2019-02-12
CN109325112B true CN109325112B (en) 2019-08-20

Family

ID=65263553

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810678889.7A Active CN109325112B (en) 2018-06-27 2018-06-27 Emoji-based cross-lingual sentiment analysis method and apparatus

Country Status (1)

Country Link
CN (1) CN109325112B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110134962A (en) * 2019-05-17 2019-08-16 中山大学 A kind of across language plain text irony recognition methods based on inward attention power
CN112084295A (en) * 2019-05-27 2020-12-15 微软技术许可有限责任公司 Cross-language task training
CN110309268B (en) * 2019-07-12 2021-06-29 中电科大数据研究院有限公司 Cross-language information retrieval method based on concept graph
US11694042B2 (en) * 2020-06-16 2023-07-04 Baidu Usa Llc Cross-lingual unsupervised classification with multi-view transfer learning
CN112348257A (en) * 2020-11-09 2021-02-09 中国石油大学(华东) Election prediction method driven by multi-source data fusion and time sequence analysis
CN113032559B (en) * 2021-03-15 2023-04-28 新疆大学 Language model fine tuning method for low-resource adhesive language text classification
CN112860901A (en) * 2021-03-31 2021-05-28 中国工商银行股份有限公司 Emotion analysis method and device integrating emotion dictionaries
CN113919340A (en) * 2021-08-27 2022-01-11 北京邮电大学 Self-media language emotion analysis method based on unsupervised unknown word recognition
CN113761204B (en) * 2021-09-06 2023-07-28 南京大学 Emoji text emotion analysis method and system based on deep learning
CN113792143B (en) * 2021-09-13 2023-12-12 中国科学院新疆理化技术研究所 Multi-language emotion classification method, device, equipment and storage medium based on capsule network
CN114429143A (en) * 2022-01-14 2022-05-03 东南大学 Cross-language attribute level emotion classification method based on enhanced distillation
CN116108859A (en) * 2023-03-17 2023-05-12 美云智数科技有限公司 Emotional tendency determination, sample construction and model training methods, devices and equipment
CN116561325B (en) * 2023-07-07 2023-10-13 中国传媒大学 Multi-language fused media text emotion analysis method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20170030570A (en) * 2014-07-07 2017-03-17 머신 존, 인크. System and method for identifying and suggesting emoticons
US20160132607A1 (en) * 2014-08-04 2016-05-12 Media Group Of America Holdings, Llc Sorting information by relevance to individuals with passive data collection and real-time injection
CN107729320B (en) * 2017-10-19 2021-04-13 西北大学 Emoticon recommendation method based on time sequence analysis of user session emotion trend

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103488623A (en) * 2013-09-04 2014-01-01 中国科学院计算技术研究所 Multilingual text data sorting treatment method
CN105068988A (en) * 2015-07-21 2015-11-18 中国科学院自动化研究所 Multi-dimension multi-granularity emotion analysis method
CN107305539A (en) * 2016-04-18 2017-10-31 南京理工大学 A kind of text tendency analysis method based on Word2Vec network sentiment new word discoveries
CN106326214A (en) * 2016-08-29 2017-01-11 中译语通科技(北京)有限公司 Method and device for cross-language emotion analysis based on transfer learning

Also Published As

Publication number Publication date
CN109325112A (en) 2019-02-12


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant