CN111552815A - Extension method and device of emotion corpus and computer readable storage medium - Google Patents


Info

Publication number
CN111552815A
Authority
CN
China
Prior art keywords
emotion
standard
corpus
words
acquiring
Prior art date
Legal status
Granted
Application number
CN202010248850.9A
Other languages
Chinese (zh)
Other versions
CN111552815B (en)
Inventor
过弋
王志宏
尹心明
樊志杰
陈家明
王家辉
张重磊
蔡新玮
Current Assignee
East China University of Science and Technology
Third Research Institute of the Ministry of Public Security
Original Assignee
East China University of Science and Technology
Third Research Institute of the Ministry of Public Security
Priority date
Filing date
Publication date
Application filed by East China University of Science and Technology and the Third Research Institute of the Ministry of Public Security
Priority to CN202010248850.9A
Publication of CN111552815A
Application granted
Publication of CN111552815B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the field of databases. It discloses a method and device for extending an emotion corpus, and a computer-readable storage medium. The extension method comprises four steps:

(1) acquiring a standard emotion corpus, which contains a plurality of standard emotion words, each stored together with its standard emotion polarity and standard emotion category;
(2) acquiring an extended corpus from the standard emotion words and adding it to the standard emotion corpus;
(3) calculating the emotion polarity of the extended corpus from the standard emotion polarities and storing it in the standard emotion corpus in association with the extended corpus; and
(4) acquiring the emotion category of the extended corpus from the standard emotion categories and storing it in the standard emotion corpus in association with the extended corpus.

Compared with the prior art, the method, device and storage medium allow the emotion corpus to be extended automatically.

Description

Extension method and device of emotion corpus and computer readable storage medium
Technical Field
The present invention relates to the field of databases, and in particular, to a method and an apparatus for extending an emotion corpus, and a computer-readable storage medium.
Background
With the development of intelligent bionic technologies such as autonomous learning, research on the emotional expression of sentences has become increasingly intensive. In the prior art, the emotional expression of a sentence is derived from the emotional expression of the words it contains. The emotion of each word is generally classified and summarized by building an emotion word bank.
However, the inventors of the present invention found that prior-art emotion word banks are mostly built and updated manually. After a word bank is established, staff must keep updating it by hand. This is time-consuming and labor-intensive, and the word bank cannot be extended automatically.
Disclosure of Invention
The embodiments of the invention aim to provide a method and device for extending an emotion corpus, and a computer-readable storage medium. With them, the emotion corpus can update itself from its existing content and add new entries autonomously.
To solve the above technical problem, an embodiment of the present invention provides a method for extending an emotion corpus, comprising the following steps:

(1) acquiring a standard emotion corpus, which comprises a plurality of standard emotion words, each stored together with its standard emotion polarity and standard emotion category;
(2) acquiring an extended corpus from the standard emotion words and adding it to the standard emotion corpus;
(3) calculating the emotion polarity of the extended corpus from the standard emotion polarities and storing it in the standard emotion corpus in association with the extended corpus; and
(4) acquiring the emotion category of the extended corpus from the standard emotion categories and storing it in the standard emotion corpus in association with the extended corpus.
An embodiment of the invention further provides a device for extending an emotion corpus. The device comprises at least one processor and a memory communicatively connected to the at least one processor. The memory stores instructions executable by the at least one processor; when executed, they enable the at least one processor to perform the emotion corpus extension method described above.
An embodiment of the invention further provides a computer-readable storage medium storing a computer program. When executed by a processor, the program implements the emotion corpus extension method described above.
Compared with the prior art, embodiments of the invention work as follows once the preset standard emotion corpus has been acquired. They autonomously acquire an extended corpus from the standard emotion words in that corpus, then add the newly acquired extended corpus to the preset standard emotion corpus.
In addition, the extended corpus may include expansion words. In that case, acquiring the extended corpus from the standard emotion words specifically comprises:

(1) taking words whose word vector similarity to a standard emotion word exceeds a first preset similarity as candidate words, which yields a plurality of candidate words;
(2) computing the word vector similarity between each candidate word and every other candidate word as candidate similarities, so that each candidate word has a plurality of candidate similarities;
(3) counting, for each candidate word, how many of its candidate similarities exceed a second preset similarity, and taking this count as the candidate number of that word; and
(4) taking the candidate words whose candidate number exceeds a preset threshold as expansion words.
In addition, calculating the emotion polarity of the extended corpus from the standard emotion polarities specifically comprises:

(1) acquiring the word vector similarity between each standard emotion word and the expansion word as the sampling similarity;
(2) taking the standard emotion words whose sampling similarity exceeds a third preset similarity as sampling standard emotion words;
(3) taking the standard emotion polarity of each sampling standard emotion word as its sampling standard emotion polarity;
(4) calculating, for each sampling standard emotion word, the product of its sampling similarity and its sampling standard emotion polarity; and
(5) accumulating the products.

If the accumulated result is positive, the emotion polarity of the expansion word is 1. If it is negative, the emotion polarity of the expansion word is -1.
In addition, acquiring the emotion category of the extended corpus from the standard emotion categories specifically comprises:

(1) taking the standard emotion category of each sampling standard emotion word as its sampling standard emotion category; and
(2) taking the emotion category that occurs most often among the sampling standard emotion categories as the emotion category of the expansion word.
In addition, the extended corpus may include extended emoticons. In that case, acquiring the extended corpus from the standard emotion words specifically comprises:

(1) obtaining a sentence sample library comprising a plurality of sentences; and
(2) taking the emoticons that appear in the same sentence as a standard emotion word as extended emoticons.

Because the extended corpus includes extended emoticons, emoticons can also be added to the emotion corpus. This broadens the scenarios in which the emotion corpus can be used.
In addition, the calculating the emotion polarity of the extended corpus according to the standard emotion polarity specifically includes: acquiring standard emotion words which appear in the same sentence together with the extended emoticons as sampling standard emotion words; acquiring the standard emotion polarity corresponding to the sampling standard emotion word as a sampling standard emotion polarity; and calculating the sum of the sampling standard emotion polarities as the emotion polarity of the extended emoticon.
In addition, the obtaining of the emotion classification of the extended corpus according to the standard emotion classification specifically includes: acquiring emotion significance and emotion correlation of the extended emoticons according to the standard emotion types, wherein the emotion significance is used for representing the strength of the extended emoticons for expressing different emotion types, and the emotion correlation is used for representing the capability of the extended emoticons for distinguishing different emotion types; and acquiring the emotion category of the extended emoticon according to the product of the emotion significance and the emotion correlation.
In addition, acquiring the emotion significance and emotion correlation of the extended emoticons from the standard emotion categories specifically comprises:

(1) counting how many times each extended emoticon appears in the same sentence as standard emotion words of each emotion category, which gives a first co-occurrence count for each pair of extended emoticon and emotion category;
(2) counting how many times all extended emoticons appear in the same sentence as standard emotion words of each emotion category, which gives a second co-occurrence count for each emotion category;
(3) counting how many times all extended emoticons appear in the same sentence as standard emotion words of any emotion category, which gives a third co-occurrence count;
(4) counting how many times each extended emoticon appears in the same sentence as standard emotion words of any emotion category, which gives a fourth co-occurrence count; and
(5) obtaining the emotion significance from the first and second co-occurrence counts, and the emotion correlation from the third and fourth co-occurrence counts.
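The four co-occurrence counts can be sketched as follows. This is a minimal illustration, not the patented implementation. It rests on three assumptions: sentences are already tokenized, emoticons are recognized by membership in a known set, and each distinct (emoticon, standard emotion word) pair within a sentence counts once. The function name `cooccurrence_counts` and its parameters are hypothetical.

```python
from collections import Counter


def cooccurrence_counts(sentences, emoticons, word_category):
    """Compute the first to fourth co-occurrence counts (hypothetical helper).

    sentences:     list of token lists (assumed pre-tokenized)
    emoticons:     set of tokens treated as emoticons (assumption)
    word_category: standard emotion word -> emotion category
    """
    first = Counter()  # (emoticon, category) -> first co-occurrence count
    for tokens in sentences:
        present = set(tokens)  # assumption: each distinct pair counts once per sentence
        emos = [t for t in present if t in emoticons]
        words = [t for t in present if t in word_category]
        for e in emos:
            for w in words:
                first[(e, word_category[w])] += 1

    second = Counter()  # category -> co-occurrences with all emoticons
    fourth = Counter()  # emoticon -> co-occurrences with all categories
    for (e, c), n in first.items():
        second[c] += n
        fourth[e] += n
    third = sum(first.values())  # all emoticons with all categories
    return first, second, third, fourth


sentences = [
    ["so", "happy", ":)"],
    ["happy", "joyous", ":)"],
    ["angry", ">:("],
    ["sad", ":)"],
]
word_category = {"happy": "joy", "joyous": "joy", "angry": "anger", "sad": "sadness"}
first, second, third, fourth = cooccurrence_counts(sentences, {":)", ">:("}, word_category)
```

Deriving the second, third and fourth counts from the first keeps the four counts mutually consistent by construction.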
Drawings
FIG. 1 is a flowchart of a method for expanding an emotion corpus according to a first embodiment of the present invention;
FIG. 2 is a flowchart of obtaining expanded words in the method for expanding an emotion corpus according to the first embodiment of the present invention;
FIG. 3 is a flowchart of obtaining extended emoticons in the method for extending an emotion corpus according to the first embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an emotion corpus expansion apparatus according to a second embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the present invention clearer, embodiments of the invention are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that many technical details are set forth in the embodiments to help the reader understand the present application. However, the technical solutions claimed in the present application can be implemented without these details, and with various changes and modifications based on the following embodiments.
The first embodiment of the invention relates to an emotion corpus expansion method. The specific process is shown in fig. 1, and comprises the following steps:
Step S101: Acquire a standard emotion corpus. The corpus comprises a plurality of standard emotion words, each stored with its standard emotion polarity and standard emotion category.
Specifically, in this embodiment the standard emotion corpus is a pre-established emotion corpus. It stores a plurality of standard emotion words, together with the standard emotion polarity and standard emotion category of each word.

Standard emotion words are known words that express a clear emotion. For example, "happy" expresses the emotion "joy", while "anger" and similar words express "anger".

The prior art classifies the emotional expression of words in many different ways and at different granularities. The following table shows a commonly used emotion classification for Chinese words, which divides emotional expression into 7 major emotion classes and 21 minor classes. The table only illustrates the classification used in this embodiment and is not limiting. Other embodiments of the invention may use other classification schemes, set flexibly according to actual needs.
[Table (image not reproduced): emotion classification of Chinese words into 7 major classes and 21 minor classes]
In addition, the standard emotion corpus stores the emotion polarity of each standard emotion word as its standard emotion polarity, and its emotion category as its standard emotion category. For example, the standard emotion word "happy" has the standard emotion category "joy" and emotion polarity 1. The standard emotion word "heartbroken" has the standard emotion category "sadness" and emotion polarity -1. Further examples are not listed here.
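The associated storage described above can be sketched minimally by modeling the standard emotion corpus as a mapping from each word to its polarity and category. The class and method names are hypothetical. The patent does not specify a storage layout, and a real system might use a database table instead.

```python
from dataclasses import dataclass


@dataclass
class CorpusEntry:
    polarity: int   # emotion polarity: 1 positive, -1 negative for standard words
    category: str   # emotion category, e.g. "joy" or "sadness"


class EmotionCorpus:
    """Hypothetical in-memory emotion corpus keyed by word or emoticon."""

    def __init__(self):
        self.entries = {}  # word -> CorpusEntry

    def add(self, word, polarity, category):
        # Store polarity and category in association with the word.
        self.entries[word] = CorpusEntry(polarity, category)

    def words_in(self, category):
        # All stored words that belong to one emotion category.
        return [w for w, e in self.entries.items() if e.category == category]


corpus = EmotionCorpus()
corpus.add("happy", 1, "joy")
corpus.add("heartbroken", -1, "sadness")
```

New expansion words and emoticons would be added through the same `add` call once their polarity and category have been computed.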
Step S102: Acquire an extended corpus from the standard emotion words and add it to the standard emotion corpus.
Specifically, in this embodiment the extended corpus includes at least two kinds of entries: expansion words and extended emoticons.
The steps for obtaining the expansion words are shown in FIG. 2 and comprise:
Step S201: Take words whose word vector similarity to a standard emotion word exceeds a first preset similarity as candidate words. This yields a plurality of candidate words.
Specifically, in this step, for each standard emotion word, the words whose word vector similarity to it exceeds the first preset similarity are taken as candidate words. The first preset similarity is user-defined and can be set flexibly as needed. For example, suppose the word vector similarity between the standard emotion word "joy" and a non-standard emotion word of similar meaning is 0.64. That word then becomes a candidate word whenever the first preset similarity is set below 0.64.
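Step S201 can be sketched using cosine similarity over word vectors. This is a hedged illustration. The patent names neither the similarity measure nor the embedding model, so cosine similarity and the toy two-dimensional vectors are assumptions. So is the choice to skip words already in the corpus. `candidate_words` is a hypothetical name.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def candidate_words(seed, vectors, standard_words, first_threshold):
    """Non-standard words whose similarity to `seed` exceeds the first preset similarity."""
    seed_vec = vectors[seed]
    return [
        w for w, vec in vectors.items()
        if w not in standard_words  # assumption: skip words already in the corpus
        and cosine(seed_vec, vec) > first_threshold
    ]


vectors = {"joy": [1.0, 0.0], "glee": [0.9, 0.1], "gloom": [-1.0, 0.0], "happy": [1.0, 0.05]}
print(candidate_words("joy", vectors, {"joy", "happy"}, 0.5))  # ['glee']
```

In practice the vectors would come from a trained embedding model. The threshold plays the role of the first preset similarity.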
Step S202: Compute the word vector similarity between each candidate word and every other candidate word as candidate similarities. Each candidate word then has a plurality of candidate similarities.
Specifically, in this step, once the candidate words have been obtained, their pairwise word vector similarities are calculated to give the candidate similarities. For example, suppose candidate words such as "joy" and "happy" have been acquired for the standard emotion word "joy". The similarity between every pair of these candidates is then calculated.
Step S203: For each candidate word, count how many of its candidate similarities exceed a second preset similarity. Take this count as the candidate number of that word.
Specifically, in this embodiment each candidate word has a plurality of candidate similarities, and the number of those exceeding the second preset similarity is taken as its candidate number. For example, the following table shows the word vector similarities among five candidate words. One of the five words is illegible in the source text.
Word        | happy | joy  | [illegible] | happy music | joyous
happy       | -     | 0.61 | 0.37        | 0.59        | 0.58
joy         | 0.61  | -    | 0.32        | 0.63        | 0.64
[illegible] | 0.37  | 0.32 | -           | 0.32        | 0.45
happy music | 0.59  | 0.63 | 0.32        | -           | 0.58
joyous      | 0.58  | 0.64 | 0.45        | 0.58        | -
Set the second preset similarity to 0.5. The candidate numbers of "happy", "joy", "happy music" and "joyous" are then each 3. The candidate number of the third word, which is illegible in the source text, is 0.
Step S204: Take the candidate words whose candidate number exceeds a preset threshold as expansion words.
Specifically, in this embodiment the preset threshold is set by the user as needed. For example, with the preset threshold set to 2, "happy", "joy", "happy music" and "joyous" are taken as expansion words. The illegible third word is not.
In addition, in this embodiment the extended corpus may include extended emoticons. The steps for acquiring them from the standard emotion words are shown in FIG. 3 and comprise:
Step S301: Obtain a sentence sample library comprising a plurality of sentences.
Specifically, in this embodiment the sentence sample library is a set of arbitrarily collected online chat records containing a plurality of sentences. This library is only a specific example and is not limiting. In other embodiments of the invention, the sentence sample library may consist of other sentences containing emoticons, which are not listed here.
Step S302: Take the emoticons that appear in the same sentence as a standard emotion word as extended emoticons.
Specifically, in this embodiment, emoticons that appear in the same sentence as a standard emotion word such as "happy" are acquired as extended emoticons.
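Steps S301 and S302 can be sketched as a scan over tokenized sentences. The emoticon test is an assumption made for illustration only. Here, bracketed tokens such as `[smile]`, a convention used by some chat applications, are treated as emoticons. The helper names are hypothetical.

```python
import re

# Assumption: emoticons appear as bracketed tokens such as "[smile]".
EMOTICON_PATTERN = re.compile(r"\[\w+\]")


def is_emoticon(token):
    return EMOTICON_PATTERN.fullmatch(token) is not None


def extended_emoticons(sentences, standard_words):
    """Emoticons that share a sentence with at least one standard emotion word."""
    found = []
    for tokens in sentences:
        if any(t in standard_words for t in tokens):
            for t in tokens:
                if is_emoticon(t) and t not in found:
                    found.append(t)
    return found


sentences = [
    ["so", "happy", "[smile]"],
    ["meeting", "at", "ten", "[clock]"],  # no standard emotion word: ignored
    ["angry", "[rage]", "[smile]"],
]
print(extended_emoticons(sentences, {"happy", "angry"}))  # ['[smile]', '[rage]']
```

A real system would replace `is_emoticon` with whatever emoticon markup its chat source uses, such as Unicode emoji ranges.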
Step S103: Calculate the emotion polarity of the extended corpus from the standard emotion polarities. Store it in the standard emotion corpus in association with the extended corpus.
Specifically, when the extended corpus includes expansion words, each expansion word $ESW_i$ is represented by a group of standard emotion words, obtained in two steps:

(1) the word vector similarity between each standard emotion word and the expansion word is acquired as the sampling similarity; and
(2) the standard emotion words whose sampling similarity exceeds a third preset similarity are taken as sampling standard emotion words.

That is, $ESW_i = \{\langle BSW_{i1}, S_{i1}\rangle, \langle BSW_{i2}, S_{i2}\rangle, \ldots, \langle BSW_{in}, S_{in}\rangle\}$. Here $BSW_{ij}$ denotes the $j$-th standard emotion word whose similarity to $ESW_i$ exceeds the third preset similarity, and $S_{ij}$ denotes that similarity. The emotion polarity of each expansion word is calculated as follows:
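$$P(ESW_i) = \begin{cases} 1, & \text{if } \sum_{j=1}^{n} P(BSW_{ij})\, S_{ij} > 0 \\ -1, & \text{if } \sum_{j=1}^{n} P(BSW_{ij})\, S_{ij} < 0 \end{cases}$$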
Here $P(BSW_{ij})$ is the sampling standard emotion polarity of the standard emotion word $BSW_{ij}$: 1 if positive, -1 if negative. The emotion polarity $P(ESW_i)$ of the expansion word is thus determined by the sign of the accumulated sum. Each term of that sum is the polarity of a corresponding standard emotion word multiplied by its similarity to the expansion word.
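The polarity rule for expansion words (claim 3) can be sketched as follows. Three assumptions apply. Cosine similarity stands in for the unspecified word vector similarity. The standard corpus is passed as a mapping from word to (vector, polarity). A zero or empty sum returns `None`, because the patent does not say what happens in that case. All names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity (assumed word vector similarity)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def expansion_word_polarity(vec, standard, third_threshold):
    """Sign of sum of P(BSW_ij) * S_ij over the sampled standard emotion words.

    standard: word -> (vector, polarity), polarity in {1, -1}.
    """
    total = 0.0
    for _word, (std_vec, polarity) in standard.items():
        s = cosine(vec, std_vec)  # sampling similarity S_ij
        if s > third_threshold:   # keep only sampling standard emotion words
            total += polarity * s
    if total > 0:
        return 1
    if total < 0:
        return -1
    return None  # assumption: undetermined when the sum is zero or nothing was sampled


standard = {"happy": ([1.0, 0.0], 1), "glad": ([0.8, 0.6], 1), "sad": ([0.0, 1.0], -1)}
print(expansion_word_polarity([0.9, 0.3], standard, 0.5))  # 1
print(expansion_word_polarity([0.1, 1.0], standard, 0.5))  # -1
```

Weighting each vote by similarity lets close neighbours outweigh distant ones. In the second call, the strongly similar "sad" outweighs the moderately similar "glad".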
Further, when the extended corpus includes extended emoticons, the standard emotion words that co-occur in the same sentence with an extended emoticon $SE_i$ are taken as sampling standard emotion words. Here $SW_j$ denotes the $j$-th emotion word co-occurring with $SE_i$. The emotion polarity of the extended emoticon $SE_i$ is then calculated as follows:
$$P(SE_i) = \sum_{j} P(SW_j)$$
Here $P(SW_j)$ is the polarity of the emotion word $SW_j$: 1 if positive, -1 if negative.
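The emoticon polarity rule can be sketched as below. As stated in the summary above, it sums the polarities of the co-occurring standard emotion words. The sketch assumes sentences are tokenized and that every occurrence of a standard emotion word in a sentence containing the emoticon counts once. The function name is hypothetical. The result is an integer sum, so unlike the polarity of an expansion word it is not restricted to 1 or -1.

```python
def emoticon_polarity(emoticon, sentences, standard_polarity):
    """P(SE_i): sum of polarities of standard emotion words co-occurring with the emoticon."""
    total = 0
    for tokens in sentences:
        if emoticon in tokens:
            total += sum(standard_polarity[t] for t in tokens if t in standard_polarity)
    return total


standard_polarity = {"happy": 1, "glad": 1, "sad": -1}
sentences = [
    ["happy", "[smile]"],
    ["glad", "happy", "[smile]"],
    ["sad", "[smile]"],
    ["sad", "[cry]"],
]
print(emoticon_polarity("[smile]", sentences, standard_polarity))  # 2
print(emoticon_polarity("[cry]", sentences, standard_polarity))    # -1
```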
Step S104: Acquire the emotion category of the extended corpus from the standard emotion categories. Store it in the standard emotion corpus in association with the extended corpus.
Specifically, when the extended corpus includes expansion words, each expansion word $ESW_i$ is represented by a group of standard emotion words, obtained in two steps:

(1) the word vector similarity between each standard emotion word and the expansion word is acquired as the sampling similarity; and
(2) the standard emotion words whose sampling similarity exceeds the third preset similarity are taken as sampling standard emotion words.

That is, $ESW_i = \{\langle BSW_{i1}, S_{i1}\rangle, \langle BSW_{i2}, S_{i2}\rangle, \ldots, \langle BSW_{in}, S_{in}\rangle\}$. Here $BSW_{ij}$ denotes the $j$-th standard emotion word whose similarity to $ESW_i$ exceeds the third preset similarity, and $S_{ij}$ denotes that similarity. The emotion category of each expansion word is calculated as follows:
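$$C(ESW_i) = \arg\max_{c \in C} \sum_{j=1}^{n} \mathbb{1}\left[C(BSW_{ij}) = c\right]$$
$$C = \{\text{joy}, \text{good}, \text{fear}, \text{anger}, \text{fright}, \text{disgust}, \text{sadness}\}$$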
Here $C(BSW_{ij})$ is the emotion category of the standard emotion word $BSW_{ij}$, i.e. one of $C = \{\text{joy}, \text{good}, \text{fear}, \text{anger}, \text{fright}, \text{disgust}, \text{sadness}\}$. The categories of all corresponding standard emotion words are tallied. The category with the largest count is taken as the emotion category $C(ESW_i)$ of the expansion word.
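The majority-vote rule for the category of an expansion word can be sketched as follows. Three assumptions apply. Each standard emotion word is supplied with its category and its precomputed sampling similarity. Votes are unweighted counts, following claim 4. Ties are broken by first occurrence, which the patent does not specify. The function name is hypothetical.

```python
from collections import Counter


def expansion_word_category(samples, third_threshold):
    """C(ESW_i): most frequent category among the sampled standard emotion words.

    samples: iterable of (category, sampling_similarity) pairs.
    """
    votes = Counter(cat for cat, sim in samples if sim > third_threshold)
    if not votes:
        return None  # assumption: no sampled word, so the category is undetermined
    # most_common orders equal counts by first insertion (assumed tie-break).
    return votes.most_common(1)[0][0]


samples = [("joy", 0.8), ("joy", 0.7), ("good", 0.9), ("sadness", 0.3)]
print(expansion_word_category(samples, 0.5))  # joy
```

Note that the highest single similarity ("good", 0.9) does not win here. The category is decided by how many sampled words share it, not by similarity.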
Further, when the extended corpus includes extended emoticons, the emotion significance and emotion correlation of each extended emoticon are acquired from the standard emotion categories. The emotion significance represents how strongly an extended emoticon expresses each emotion category. The emotion correlation represents how well the extended emoticon distinguishes between emotion categories.
The emotion significance is calculated in the following manner:
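[Formula image not reproduced. $ESS_{ij}$ is computed from the first co-occurrence count $CoCount(SE_i, CSW_j)$ and the second co-occurrence count $CoCount(SE, CSW_j)$.]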
The symbols are defined as follows:

- $ESS_{ij}$ is the significance of the tendency of the $i$-th emoticon toward the $j$-th emotion category;
- $CoCount(SE_i, CSW_j)$ is the number of co-occurrences of the $i$-th emoticon with words of the $j$-th emotion category, i.e. the first co-occurrence count; and
- $CoCount(SE, CSW_j)$ is the number of co-occurrences of all emoticons with the $j$-th emotion category, i.e. the second co-occurrence count.
The emotional relevance is calculated as follows.
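[Formula image not reproduced. $ESR_i$ is computed from the third co-occurrence count $CoCount(SE, CSW)$ and the fourth co-occurrence count $CoCount(SE_i, CSW)$.]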
The symbols are defined as follows:

- $ESR_i$ is the emotion category correlation of the $i$-th emoticon;
- $CoCount(SE, CSW)$ is the number of co-occurrences of all emoticons with all emotion category words in the data set, i.e. the third co-occurrence count; and
- $CoCount(SE_i, CSW)$ is the number of co-occurrences of the $i$-th emoticon with all emotion category words, i.e. the fourth co-occurrence count.
The final emoticon emotion category bias is calculated as follows:
$$ESC_{ij} = ESS_{ij} \times ESR_i$$
In this way, the tendency of each emoticon toward every emotion category is calculated. The categories are sorted by tendency, and the category with the highest tendency is selected as the emotion category of the emoticon.
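The scoring of emoticon categories can be sketched as below. The formula images for ESS_ij and ESR_i are not reproduced in this text, so the functional forms in the code are assumptions. Significance is taken as the ratio of the first to the second co-occurrence count. Correlation is taken as the logarithm of the ratio of the third to the fourth count, an IDF-style weight. Only ESC_ij = ESS_ij * ESR_i and the final choice of the highest-scoring category come from the text. Function and variable names are hypothetical.

```python
import math


def emoticon_categories(first, second, third, fourth):
    """Pick the emotion category with the highest ESC_ij for each emoticon.

    first[(e, c)]: first co-occurrence count (emoticon e with category c)
    second[c]:     second co-occurrence count (all emoticons with category c)
    third:         third co-occurrence count (all emoticons, all categories)
    fourth[e]:     fourth co-occurrence count (emoticon e, all categories)
    """
    result = {}
    for e, count_e in fourth.items():
        # ASSUMED form of ESR_i: IDF-style log ratio of third to fourth count.
        esr = math.log(third / count_e)
        scores = {}
        for c, count_c in second.items():
            # ASSUMED form of ESS_ij: share of category c's co-occurrences owned by e.
            ess = first.get((e, c), 0) / count_c
            scores[c] = ess * esr  # ESC_ij = ESS_ij * ESR_i (from the text)
        result[e] = max(scores, key=scores.get)
    return result


first = {(":)", "joy"): 3, (":)", "sadness"): 1, (":(", "sadness"): 3, (":(", "joy"): 1}
second = {"joy": 4, "sadness": 4}
fourth = {":)": 4, ":(": 4}
print(emoticon_categories(first, second, 8, fourth))  # {':)': 'joy', ':(': 'sadness'}
```

ESR_i is the same for every category j of a given emoticon, which has a consequence for the product form. As long as ESR_i is positive, it scales all of that emoticon's scores equally and does not change which category wins; the winner is decided by ESS_ij. Under the assumed log form, ESR_i is zero if one emoticon accounts for every co-occurrence, and then all of its scores tie.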
Compared with the prior art, the first embodiment of the invention works as follows once the preset standard emotion corpus has been acquired. It automatically acquires an extended corpus from the standard emotion words in that corpus, then adds the newly acquired extended corpus to the preset standard emotion corpus.
A second embodiment of the present invention relates to an emotion corpus expansion device, as shown in fig. 4, including: at least one processor 401; and a memory 402 communicatively coupled to the at least one processor 401; the memory 402 stores instructions executable by the at least one processor 401, and the instructions are executable by the at least one processor 401 to enable the at least one processor 401 to perform the method for emotion corpus expansion as described above.
The memory 402 and the processor 401 are connected by a bus. The bus may include any number of interconnected buses and bridges that link together the various circuits of one or more processors 401 and memories 402. The bus may also connect various other circuits, such as peripherals, voltage regulators and power management circuits, which are well known in the art and are therefore not described further here. A bus interface provides an interface between the bus and a transceiver. The transceiver may be a single element or multiple elements, such as multiple receivers and transmitters, and provides a means of communicating with various other apparatus over a transmission medium. Data processed by the processor 401 is transmitted over a wireless medium via an antenna. The antenna also receives data and passes it to the processor 401.
The processor 401 is responsible for managing the bus and for general processing. It may provide various functions, including timing, peripheral interfaces, voltage regulation, power management and other control functions. The memory 402 may be used to store data used by the processor 401 when performing operations.
A third embodiment of the present invention relates to a computer-readable storage medium storing a computer program. The computer program realizes the above-described method embodiments when executed by a processor.
That is, those skilled in the art will understand that all or part of the steps of the methods in the above embodiments may be carried out by a program instructing the relevant hardware. The program is stored in a storage medium and includes several instructions. These cause a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples for carrying out the invention, and that various changes in form and details may be made therein without departing from the spirit and scope of the invention in practice.

Claims (10)

1. A method for expanding an emotion corpus is characterized by comprising the following steps:
acquiring a standard emotion corpus, wherein the standard emotion corpus comprises a plurality of standard emotion words, and standard emotion polarities and standard emotion categories which are stored correspondingly to the standard emotion words;
acquiring an extended corpus according to the standard emotion words, and adding and storing the extended corpus to the standard emotion corpus;
calculating the emotion polarity of the extended corpus according to the standard emotion polarity, and storing the emotion polarity of the extended corpus and the extended corpus in the standard emotion corpus in an associated manner;
and acquiring the emotion category of the extended corpus according to the standard emotion category, and storing the emotion category of the extended corpus and the extended corpus in the standard emotion corpus in an associated manner.
2. The method for expanding an emotion corpus according to claim 1, wherein the expanded corpus includes expanded words, and the obtaining of the expanded corpus according to the standard emotion words specifically includes:
acquiring words with word vector similarity larger than first preset similarity with the standard emotion words as candidate words to obtain a plurality of candidate words;
obtaining word vector similarity between each candidate word and other candidate words as candidate similarity, and obtaining a plurality of candidate similarities of each candidate word;
acquiring the number of the candidate similarities which are larger than a second preset similarity in the candidate similarities of the candidate words as the candidate number of the candidate words;
and taking the candidate words with the candidate number larger than a preset threshold value as the expansion words.
3. The method for expanding an emotion corpus according to claim 2, wherein calculating the emotion polarity of the extended corpus according to the standard emotion polarities specifically includes:
acquiring the word vector similarity between each standard emotion word and the expansion word as a sampling similarity;
acquiring the standard emotion words whose sampling similarity is greater than a third preset similarity as sampling standard emotion words;
acquiring the standard emotion polarity corresponding to each sampling standard emotion word as a sampling standard emotion polarity;
calculating, for each sampling standard emotion word, the product of its sampling similarity and its sampling standard emotion polarity;
and accumulating the products: if the accumulated result is positive, the emotion polarity of the expansion word is 1; if the accumulated result is negative, the emotion polarity of the expansion word is -1.
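The similarity-weighted vote of claim 3 is sketched below, again assuming cosine similarity and a hypothetical `vectors` dictionary. The claim does not say what happens when the accumulated result is exactly zero; returning 0 in that case is an assumption.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expansion_polarity(expansion_vec, standard_polarity, vectors, sim3):
    """Polarity (+1 / -1) of one expansion word.

    expansion_vec: embedding of the expansion word
    standard_polarity: dict standard emotion word -> standard polarity (+1 / -1)
    vectors: dict word -> embedding (hypothetical input format)
    sim3: the third preset similarity
    """
    total = 0.0
    for word, polarity in standard_polarity.items():
        if word not in vectors:
            continue
        sampling_similarity = cosine(expansion_vec, vectors[word])
        if sampling_similarity > sim3:  # word is a sampling standard emotion word
            total += sampling_similarity * polarity
    if total > 0:
        return 1
    if total < 0:
        return -1
    return 0  # assumption: the claim does not define a zero sum
```

Weighting each vote by its similarity means a single very close neighbour can outweigh several distant ones of the opposite polarity.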
4. The method for expanding an emotion corpus according to claim 3, wherein acquiring the emotion category of the extended corpus according to the standard emotion categories specifically includes:
acquiring the standard emotion category corresponding to each sampling standard emotion word as a sampling standard emotion category;
and taking the most frequent emotion category among the sampling standard emotion categories as the emotion category of the expansion word.
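Claim 4 is a majority vote over the categories of the sampling standard emotion words selected in claim 3. A minimal sketch using `collections.Counter` follows; the function name is hypothetical, and because the claim does not specify tie-breaking, ties here go to the category seen first (the documented ordering of `Counter.most_common`).

```python
from collections import Counter


def expansion_category(sampling_words, standard_category):
    """Most frequent standard category among the sampling standard emotion words.

    sampling_words: words selected in claim 3 (similarity > third preset)
    standard_category: dict standard emotion word -> standard emotion category
    Returns None if no sampling word has a known category (assumption).
    """
    counts = Counter(
        standard_category[w] for w in sampling_words if w in standard_category
    )
    if not counts:
        return None
    # most_common orders equal counts by first occurrence, so a tie goes to
    # the category of the earliest sampling word.
    return counts.most_common(1)[0][0]
```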
5. The method for expanding an emotion corpus according to claim 1, wherein the extended corpus includes extended emoticons, and acquiring the extended corpus according to the standard emotion words specifically includes:
acquiring a sentence sample library, wherein the sentence sample library comprises a plurality of sentences;
and acquiring, as the extended emoticons, the emoticons that appear in the same sentence as the standard emotion words.
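The co-occurrence rule of claim 5 can be sketched as below. The emoticon format is an assumption: emoticons are taken to be bracketed codes such as `[smile]`, a common convention on Chinese microblog platforms, and standard words are found by plain substring matching rather than word segmentation. `EMOTICON_PATTERN` and `expand_emoticons` are hypothetical names.

```python
import re

# Assumption: emoticons appear as bracketed codes such as "[smile]"; the
# patent does not fix a format.
EMOTICON_PATTERN = re.compile(r"\[[^\[\]]+\]")


def expand_emoticons(sentences, standard_words):
    """Emoticons that share at least one sentence with a standard emotion word.

    sentences: the sentence sample library (list of raw strings)
    standard_words: iterable of standard emotion words
    """
    standard_words = list(standard_words)
    extended = set()
    for sentence in sentences:
        emoticons = EMOTICON_PATTERN.findall(sentence)
        if not emoticons:
            continue
        # Blank out emoticon codes so "[happy]" is not read as the word "happy".
        text = EMOTICON_PATTERN.sub(" ", sentence)
        # Plain substring matching (assumption): workable for unsegmented
        # Chinese text, but a word segmenter would be more precise.
        if any(word in text for word in standard_words):
            extended.update(emoticons)
    return extended
```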
6. The method for expanding an emotion corpus according to claim 5, wherein calculating the emotion polarity of the extended corpus according to the standard emotion polarities specifically includes:
acquiring the standard emotion words that appear in the same sentence as an extended emoticon as sampling standard emotion words;
acquiring the standard emotion polarity corresponding to each sampling standard emotion word as a sampling standard emotion polarity;
and calculating the sum of the sampling standard emotion polarities as the emotion polarity of the extended emoticon.
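A sketch of the summation in claim 6, reusing the same hypothetical bracketed-emoticon format. The claim can be read as summing over distinct co-occurring words or over every co-occurrence; this sketch assumes the latter, counting each (sentence, emoticon, word) triple once, so a word that often accompanies an emoticon contributes more.

```python
import re
from collections import defaultdict

# Hypothetical emoticon format: bracketed codes such as "[smile]".
EMOTICON_PATTERN = re.compile(r"\[[^\[\]]+\]")


def emoticon_polarities(sentences, standard_polarity):
    """Sum of co-occurring standard polarities for each extended emoticon.

    standard_polarity: dict standard emotion word -> polarity (+1 / -1)
    Each (sentence, emoticon, word) co-occurrence is counted once.
    """
    scores = defaultdict(int)
    for sentence in sentences:
        emoticons = set(EMOTICON_PATTERN.findall(sentence))
        text = EMOTICON_PATTERN.sub(" ", sentence)
        words = [w for w in standard_polarity if w in text]
        for emoticon in emoticons:
            for word in words:
                scores[emoticon] += standard_polarity[word]
    return dict(scores)
```

The result is an unnormalized integer score; its sign gives the polarity direction and its magnitude reflects how consistently the emoticon co-occurs with one polarity.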
7. The method for expanding an emotion corpus according to claim 5, wherein acquiring the emotion category of the extended corpus according to the standard emotion categories specifically includes:
acquiring the emotion significance and emotion correlation of the extended emoticons according to the standard emotion categories, wherein the emotion significance represents how strongly an extended emoticon expresses each emotion category, and the emotion correlation represents the ability of an extended emoticon to distinguish between emotion categories;
and acquiring the emotion category of the extended emoticon according to the product of the emotion significance and the emotion correlation.
8. The method for expanding an emotion corpus according to claim 7, wherein acquiring the emotion significance and emotion correlation of the extended emoticons according to the standard emotion categories specifically includes:
acquiring the number of times each extended emoticon appears in the same sentence as the standard emotion words of each emotion category, to obtain a first co-occurrence count for each extended emoticon and each emotion category;
acquiring the number of times all the extended emoticons appear in the same sentence as the standard emotion words of each emotion category, to obtain a second co-occurrence count for each emotion category;
acquiring the number of times all the extended emoticons appear in the same sentence as the standard emotion words of all emotion categories, to obtain a third co-occurrence count;
acquiring the number of times each extended emoticon appears in the same sentence as the standard emotion words of all emotion categories, to obtain a fourth co-occurrence count;
and acquiring the emotion significance according to the first and second co-occurrence counts, and acquiring the emotion correlation according to the third and fourth co-occurrence counts.
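Claims 7 and 8 name the four co-occurrence counts but do not give the formulas that turn them into emotion significance and emotion correlation. The sketch below fills that gap with hypothetical choices only: significance(e, c) = n1(e, c) / n2(c) and correlation(e) = n4(e) / n3, where the second, third and fourth counts are taken as sums of the first count, and each sentence contributes at most once per (emoticon, category) pair. The emoticon format is the same bracketed-code assumption used above. Because this placeholder correlation depends only on the emoticon, the choice of category is decided by the significance alone; the patent's actual formulas may differ.

```python
import re
from collections import defaultdict

# Hypothetical emoticon format: bracketed codes such as "[smile]".
EMOTICON_PATTERN = re.compile(r"\[[^\[\]]+\]")


def emoticon_categories(sentences, standard_category):
    """Pick a category for each extended emoticon from co-occurrence counts.

    standard_category: dict standard emotion word -> emotion category
    """
    # First co-occurrence count n1[e][c]: sentences in which emoticon e
    # appears with at least one standard word of category c.
    n1 = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        emoticons = set(EMOTICON_PATTERN.findall(sentence))
        text = EMOTICON_PATTERN.sub(" ", sentence)
        categories = {c for w, c in standard_category.items() if w in text}
        for e in emoticons:
            for c in categories:
                n1[e][c] += 1

    n2 = defaultdict(int)  # second count: per category, over all emoticons
    n4 = {}                # fourth count: per emoticon, over all categories
    for e, per_category in n1.items():
        n4[e] = sum(per_category.values())
        for c, k in per_category.items():
            n2[c] += k
    n3 = sum(n4.values())  # third count: all emoticons, all categories

    result = {}
    for e, per_category in n1.items():
        correlation = n4[e] / n3  # hypothetical correlation formula
        scores = {
            # hypothetical significance n1 / n2, multiplied by the correlation
            c: (k / n2[c]) * correlation
            for c, k in per_category.items()
        }
        result[e] = max(scores, key=scores.get)
    return result
```

Normalizing by the second count keeps a category that is frequent across the whole sample from absorbing every emoticon merely because it co-occurs with many of them.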
9. An apparatus for expanding an emotion corpus, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method for expanding an emotion corpus according to any one of claims 1 to 8.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method for expanding an emotion corpus according to any one of claims 1 to 8.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010248850.9A CN111552815B (en) 2020-04-01 2020-04-01 Emotion corpus expansion method and device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111552815A (en) 2020-08-18
CN111552815B (en) 2023-11-17

Family

ID=72005506

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010248850.9A Active CN111552815B (en) 2020-04-01 2020-04-01 Emotion corpus expansion method and device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111552815B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107862087A (en) * 2017-12-01 2018-03-30 Guangzhou Jianyixun Information Technology Co., Ltd. Sentiment analysis method, apparatus and storage medium based on big data and deep learning
WO2019042450A1 (en) * 2017-09-04 2019-03-07 Huawei Technologies Co., Ltd. Natural language processing method and apparatus

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wang Zhenyu et al.: "Computation of Word Sentiment Polarity Based on HowNet and PMI", Computer Engineering *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113342944A (en) * 2021-04-29 2021-09-03 Tencent Technology (Shenzhen) Co., Ltd. Corpus generalization method, apparatus, device and storage medium
CN113342944B (en) * 2021-04-29 2023-04-07 Tencent Technology (Shenzhen) Co., Ltd. Corpus generalization method, apparatus, device and storage medium

Also Published As

Publication number Publication date
CN111552815B (en) 2023-11-17

Similar Documents

Publication Publication Date Title
CN110457708B (en) Vocabulary mining method and device based on artificial intelligence, server and storage medium
CN109086265B (en) Semantic training method and multi-semantic word disambiguation method in short text
US20100268725A1 (en) Acquisition of semantic class lexicons for query tagging
CN107291840B (en) User attribute prediction model construction method and device
CN106294505B (en) Answer feedback method and device
CN113836938B (en) Text similarity calculation method and device, storage medium and electronic device
CN112926308B (en) Method, device, equipment, storage medium and program product for matching text
CN114492429B (en) Text theme generation method, device, equipment and storage medium
CN104881397A (en) Method and apparatus for expanding abbreviations
CN112579733A (en) Rule matching method, rule matching device, storage medium and electronic equipment
CN116127060A (en) Text classification method and system based on prompt words
CN110969005B (en) Method and device for determining similarity between entity corpora
CN111552815A (en) Extension method and device of emotion corpus and computer readable storage medium
CN114385791A (en) Text expansion method, device, equipment and storage medium based on artificial intelligence
CN113919424A (en) Training of text processing model, text processing method, device, equipment and medium
CN110489740B (en) Semantic analysis method and related product
CN112287077A (en) Statement extraction method and device for combining RPA and AI for document, storage medium and electronic equipment
CN114490956A (en) Keyword extraction method and device
CN115168537B (en) Training method and device for semantic retrieval model, electronic equipment and storage medium
CN112015895A (en) Patent text classification method and device
CN110069780B (en) Specific field text-based emotion word recognition method
Gendron et al. Natural language processing: a model to predict a sequence of words
CN111241275B (en) Short text similarity evaluation method, device and equipment
CN113821623A (en) Model training method, device, equipment and storage medium
CN114925185B (en) Interaction method, model training method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant