CN116450840A - Deep learning-based field emotion dictionary construction method - Google Patents

Deep learning-based field emotion dictionary construction method

Info

Publication number
CN116450840A
Authority
CN
China
Prior art keywords
emotion
word
dictionary
domain
classifier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310284451.1A
Other languages
Chinese (zh)
Inventor
李诗轩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University of Technology WUT filed Critical Wuhan University of Technology WUT
Priority to CN202310284451.1A
Publication of CN116450840A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374 Thesaurus
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/30 Semantic analysis
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a deep learning-based domain emotion dictionary construction method comprising data preprocessing, word vector model construction, and classifier construction. The data preprocessing uses a domain corpus and a semantic knowledge base to determine a general emotion word set oriented to a specific domain. In the word vector model construction process, a seed emotion word set and the domain corpus are used to determine a candidate emotion word set; the candidate emotion word set and the domain general emotion word set obtained by the data preprocessing part are then converted into word vectors, which serve as the input of the classifier construction part and assist in constructing the domain emotion dictionary. The method provides a knowledge-fusion-based approach to automatic emotion dictionary construction and readability indexing. Automatic classification of seed emotion words is realized with a fully connected neural network combined with a multi-head attention mechanism, emotion dictionaries oriented to listed-company annual reports and to listed-company online financial news are generated respectively, and emotion indexing is further realized.

Description

Deep learning-based field emotion dictionary construction method
Technical Field
The invention relates to the field of deep learning, and in particular to a domain emotion dictionary construction method based on deep learning.
Background
Emotion dictionary construction methods fall mainly into two categories: knowledge-base-based methods and corpus-based methods. Knowledge-base-based methods start from a general semantic knowledge base and mine semantic relations between words through dictionary expansion based on word relations and paraphrases and through expansion based on word similarity, thereby constructing an emotion dictionary with a certain degree of generality; their advantages are strong generality and high accuracy. Corpus-based methods start from a corpus in a specific domain and mine relations among the words in the corpus through dictionary construction based on context and on word co-occurrence, thereby constructing an emotion dictionary with domain characteristics; their advantage is high accuracy.
Each of the two construction approaches has strengths and weaknesses, and combining them can yield a dictionary with both domain characteristics and higher precision. Existing schemes lack emotion dictionaries oriented to listed-company annual reports and to listed-company online financial news, so emotion dictionaries for texts in the financial domain remain insufficiently developed. This research therefore proposes a deep learning-based domain emotion dictionary construction method.
Disclosure of Invention
In view of the defects in the prior art, the invention aims to provide a deep learning-based domain emotion dictionary construction method to solve the problems raised in the background section.
To achieve the above object, the invention is realized by the following technical scheme. The deep learning-based domain emotion dictionary construction method comprises data preprocessing, word vector model construction, and classifier construction. The data preprocessing uses a domain corpus and a semantic knowledge base to determine a general emotion word set oriented to a specific domain, and mainly comprises the processing of the semantic knowledge base and of the domain corpus. In the word vector model construction process, a candidate emotion word set is determined using a seed emotion word set and the domain corpus; the candidate emotion word set and the domain general emotion word set obtained by the data preprocessing part are then converted into word vectors, which serve as input to the classifier construction part and assist in constructing the domain emotion dictionary. The classifier construction covers a classifier based on a fully connected neural network (DNN), a fully connected network combined with a multi-head attention mechanism (Multi-head Attention + DNN, MA-DNN for short), and a bidirectional long short-term memory network (Bi-LSTM).
Furthermore, the processing of the semantic knowledge base consists of fusing existing general emotion dictionaries, and the processing of the domain corpus mainly comprises stop-word removal, special-symbol removal, and word segmentation, providing the data basis for subsequent processing.
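To make the preprocessing concrete, the following is a minimal Python sketch of stop-word removal, special-symbol removal, and word segmentation; the jieba segmenter and the stop-word file name are assumptions for illustration, since the patent does not name specific tools.

import re
import jieba  # common Chinese word segmenter; an assumption, not named in the patent

def load_stopwords(path="stopwords.txt"):  # hypothetical stop-word file, one word per line
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}

def preprocess(text, stopwords):
    # drop special symbols: keep Chinese characters, letters, and digits only
    text = re.sub(r"[^\u4e00-\u9fa5A-Za-z0-9]+", " ", text)
    tokens = jieba.lcut(text)              # word segmentation
    return [t for t in tokens if t.strip() and t not in stopwords]  # stop-word removal

# usage: each document in the domain corpus becomes a list of tokens
# corpus_tokens = [preprocess(doc, load_stopwords()) for doc in raw_documents]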
Furthermore, the referenced positive and negative emotion words are fused and duplicate words are removed to form the required general emotion word set. After the semantic knowledge base and the domain corpus have been processed, the two are further fused by taking their intersection, finally forming the general emotion word set oriented to the specific domain.
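A minimal sketch of the fusion and intersection steps described above; the variable names are illustrative, and dropping words that appear with both polarities is an assumption about how conflicts are resolved.

def fuse_general_set(positive_lists, negative_lists):
    # fuse the referenced dictionaries and remove duplicate words
    positive = set().union(*positive_lists)
    negative = set().union(*negative_lists)
    ambiguous = positive & negative        # words listed with both polarities (handling assumed)
    return positive - ambiguous, negative - ambiguous

def domain_general_set(positive, negative, corpus_tokens):
    # intersect the fused emotion words with the domain corpus vocabulary
    vocab = {tok for doc in corpus_tokens for tok in doc}
    return positive & vocab, negative & vocab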
Further, the word vector model construction process comprises determining a seed emotion word set, training word vectors with Word2vec, calculating semantic similarity between words, and training word vectors with Bert.
Further, after the seed emotion word set is determined, the domain corpus is trained into word vectors with Word2vec; after word vector training on the domain corpus is completed, words in the domain corpus similar to the seed emotion words are further computed by an algorithm; after the seed emotion word set is obtained, both it and the domain-specific general emotion word set obtained by the data preprocessing part must be converted into word vectors to serve as input of the subsequent classifiers.
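A sketch of this stage under stated assumptions: gensim's Word2Vec is trained on the segmented domain corpus, its most_similar lookup supplies candidate words for each seed emotion word, and a Chinese BERT checkpoint ("bert-base-chinese", assumed) converts each word into the vector fed to the classifiers; all hyperparameters are illustrative.

import torch
from gensim.models import Word2Vec
from transformers import BertModel, BertTokenizer

def train_word2vec(corpus_tokens):
    # corpus_tokens: list of token lists produced by preprocessing
    return Word2Vec(sentences=corpus_tokens, vector_size=100, window=5, min_count=2, workers=4)

def candidate_words(w2v, seed_words, topn=5):
    # top-n most similar corpus words for every seed emotion word, deduplicated
    candidates = set()
    for word in seed_words:
        if word in w2v.wv:
            candidates.update(w for w, _ in w2v.wv.most_similar(word, topn=topn))
    return candidates

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")   # assumed checkpoint
bert = BertModel.from_pretrained("bert-base-chinese")

def bert_vector(word):
    # mean of the last hidden states as a fixed-size word vector (pooling choice assumed)
    with torch.no_grad():
        out = bert(**tokenizer(word, return_tensors="pt"))
    return out.last_hidden_state.mean(dim=1).squeeze(0)          # shape: (768,)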
Further, the internal network structure of the DNN classifier is divided into an input layer, hidden layers, and an output layer; additional network layers can be added to the hidden part, information is propagated from front to back between layers in a fully connected manner, and the value of each neuron in a layer is determined by a weighted combination of the values of all neurons in the previous layer.
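A minimal PyTorch sketch of such a DNN classifier; layer sizes and the two-class output are assumptions consistent with the positive/negative classification task.

import torch.nn as nn

class DNNClassifier(nn.Module):
    # input layer -> fully connected hidden layers -> output layer
    def __init__(self, input_dim=768, hidden_dim=256, num_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),   # further layers can be added here
            nn.Linear(hidden_dim, num_classes),             # raw logits; Softmax gives the class probabilities
        )

    def forward(self, x):
        return self.net(x)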
Furthermore, the MA-DNN classifier adds fully connected layers around Multi-head Attention: the input of the first DNN layer is the Bert-converted word vector, the vector processed by that DNN layer is used as the input of the Multi-head Attention layer, which computes feature information for each position; the calculation result is fed into a second DNN layer, which combines the results of the preceding layers to produce the raw output, and the final probability distribution is obtained after the output passes through a Softmax layer.
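A PyTorch sketch of the MA-DNN structure as described: a first DNN layer over the Bert word vector, Multi-head Attention over its output, and a second DNN layer followed by Softmax; treating each word vector as a length-1 sequence and concatenating the two intermediate results are assumptions, and all dimensions are illustrative.

import torch
import torch.nn as nn

class MADNN(nn.Module):
    def __init__(self, input_dim=768, hidden_dim=256, num_heads=4, num_classes=2):
        super().__init__()
        self.dnn1 = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.dnn2 = nn.Linear(hidden_dim * 2, num_classes)

    def forward(self, x):                    # x: (batch, input_dim) Bert word vectors
        h = self.dnn1(x)                     # first DNN layer
        seq = h.unsqueeze(1)                 # treat each vector as a one-token sequence
        a, _ = self.attn(seq, seq, seq)      # Multi-head Attention over the DNN output
        a = a.squeeze(1)
        logits = self.dnn2(torch.cat([h, a], dim=-1))   # second DNN layer combining both results
        return logits                        # Softmax over these logits yields the final distribution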
Further, after the model is built, the optimizer and the loss function are defined; as with the DNN classifier, the widely used Adam optimizer is selected, and the loss function is the cross entropy commonly applied to binary classification. The training set and the test set are then divided at a ratio of 9:1 and training is run for 100 epochs, finally yielding the classification result of the candidate emotion word set.
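A sketch of this training setup (Adam, cross entropy over the two classes, a 9:1 split, 100 epochs); batch size and learning rate are assumptions.

import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

def train_classifier(model, word_vectors, labels, epochs=100, lr=1e-3):
    # word_vectors: (N, 768) Bert vectors; labels: (N,) with 0 = negative, 1 = positive
    dataset = TensorDataset(word_vectors, labels)
    n_train = int(0.9 * len(dataset))                          # 9:1 train/test split
    train_set, test_set = random_split(dataset, [n_train, len(dataset) - n_train])
    loader = DataLoader(train_set, batch_size=32, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = torch.nn.CrossEntropyLoss()                    # cross entropy for the binary problem
    for _ in range(epochs):                                    # 100 training epochs
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
    return model, test_set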
Further, in the Bi-LSTM classifier, context information is obtained by combining a forward LSTM and a backward LSTM; the input of the first Bi-LSTM layer is the Bert-converted word vector, the vector processed by the Bi-LSTM network is used as the input of a second-layer DNN, which combines the previous layer's result to produce the raw output, and the final probability distribution is obtained after the output passes through a Softmax layer.
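A PyTorch sketch of the Bi-LSTM classifier as described: a bidirectional LSTM over the Bert word vector (again treated as a length-1 sequence, an assumption), a fully connected layer, and Softmax over its logits for the final distribution.

import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    def __init__(self, input_dim=768, hidden_dim=128, num_classes=2):
        super().__init__()
        self.bilstm = nn.LSTM(input_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.dnn = nn.Linear(hidden_dim * 2, num_classes)    # forward and backward states concatenated

    def forward(self, x):                    # x: (batch, input_dim) Bert word vectors
        seq = x.unsqueeze(1)                 # (batch, 1, input_dim)
        out, _ = self.bilstm(seq)            # context from the forward and backward LSTM
        logits = self.dnn(out[:, -1, :])     # second-layer DNN on the Bi-LSTM output
        return logits                        # Softmax over these logits yields the final distribution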
Further, the optimizer is again Adam and the loss function is the cross entropy widely used for binary classification; the training set and the test set are divided at a ratio of 9:1 and training is run for 100 epochs, finally yielding the classification result of the candidate emotion word set. After the classifiers are built, they are evaluated, and the classifier with the best classification effect is selected to produce the domain emotion dictionary.
The invention has the beneficial effects that:
1. The deep learning-based domain emotion dictionary construction method provides a knowledge-fusion-based approach to automatic emotion dictionary construction and readability indexing. The combination of corpus and knowledge base is applied to the automatic construction of emotion dictionaries for texts in the financial domain, and Word2vec and Bert are used for word vector training in different tasks on the collected corpora.
2. The deep learning-based domain emotion dictionary construction method uses a fully connected neural network combined with a multi-head attention mechanism to realize automatic classification of seed emotion words, generates emotion dictionaries oriented to listed-company annual reports and to listed-company online financial news respectively, and further realizes emotion indexing.
Drawings
FIG. 1 is an overall schematic diagram of the deep learning-based domain emotion dictionary construction method of the present invention;
FIG. 2 is a network structure diagram of the DNN classifier of the deep learning-based domain emotion dictionary construction method of the present invention;
FIG. 3 is a structure diagram of Multi-head Attention in the deep learning-based domain emotion dictionary construction method of the present invention;
FIG. 4 is a network structure diagram of the MA-DNN classifier of the deep learning-based domain emotion dictionary construction method of the present invention;
FIG. 5 is a structure diagram of the Bi-LSTM network of the deep learning-based domain emotion dictionary construction method of the present invention;
FIG. 6 is a network structure diagram of the Bi-LSTM classifier of the deep learning-based domain emotion dictionary construction method of the present invention;
FIG. 7 is a flow chart of listed-company annual report emotion indexing in the present invention;
FIG. 8 is a flow chart of acquiring financial crisis early-warning knowledge based on annual report emotion indexing in the present invention;
FIG. 9 is a flow chart of listed-company online financial news emotion indexing in the present invention;
FIG. 10 is a flow chart of acquiring financial crisis early-warning knowledge based on online financial news emotion indexing in the present invention.
Detailed Description
The invention is further described below in connection with specific embodiments, so that the technical means, creative features, objects, and effects of the invention are easy to understand.
Example 1
Deep learning-based annual report emotion indexing for listed companies:
Using the deep learning-based domain emotion dictionary construction method, an emotion dictionary oriented to listed-company annual reports is constructed, annual report emotion indexing is carried out on that basis, and financial crisis early-warning knowledge based on listed-company annual report emotion indexing is finally obtained; the specific flow is shown in FIG. 7.
(1) Emotion dictionary construction for listed-company annual reports:
Annual reports of all listed companies between 2016 and 2018, 10608 in total, were obtained by a crawler; the distribution of the number of reports is shown in Table 1. Since the experimental corpus consists of listed-company annual reports from 2015 and earlier, annual reports from 2016 onward were selected as the corpus for emotion dictionary construction.
TABLE 1 Distribution of the listed-company annual report corpus
After the listed-company annual report corpus was collected, stop-word removal, special-symbol removal, and word segmentation were performed on it. The processed annual report corpus was then intersected with the general emotion word set; the result is shown in Table 2. This forms the general emotion word set oriented to listed-company annual reports, which is used as the training corpus for the subsequent emotion dictionary classifiers.
TABLE 2 Intersection of the general emotion word set and the listed-company annual report corpus
Meanwhile, word2vec is utilized to train annual newspaper corpus of a marketing company, similar words of seed emotion words contained in the field corpus are further extracted, the first 5 words which are most similar to each seed emotion Word are used as a candidate emotion Word set, and after duplication removal, the candidate emotion Word set contains 3734 words. After the word vector conversion of the candidate emotion vocabulary set and the field universal emotion vocabulary set based on the Bert is completed, the converted word vector is used as the input of the classifier. The three classifiers of DNN, MA-DNN and Bi-LSTM were constructed separately, and the performance evaluation of the three models in constructing the annual report emotion dictionary of the marketable company is shown in Table 3.
TABLE 3 emotion dictionary performance assessment for annual reports to the market company
Experiments show that the accuracy and recall rate of the MA-DNN model are optimal, the performance of the model is better when an emotion dictionary reported by an upward market company is built, the accuracy of the DNN model is higher, and the performance of the Bi-LSTM model is poorer, so that the model is in line with the pre-judgment in the process of model construction. Thus, the present invention utilizes the results of the MA-DNN model as a annual report emotion dictionary for a marketable company, wherein 2517 positive words, 1217 negative words, and a portion of the words are shown in Table 4.
Table 4 example Domain emotion dictionary for annual report to Online company
As shown in the table, the emotion dictionary which is built by the invention and reports on the year of the market company has strong field characteristics. The active vocabulary mainly covers some vocabularies that represent better developments in the capital market, such as: rising, walking high, etc.; some stock market specific vocabulary is also contemplated, such as: increase the holding, stop falling and stabilize etc.; also included are some words related to positive policies, such as: encouragement, and the like. The negative vocabulary mainly covers some vocabularies representing poor market development situation, for example: concussion, low walking, falling, etc.; also included are some stock market specific vocabulary, such as: empty, high-open, low-pass, etc.; but also to some vocabulary that represents poor business operations, such as: product backlog, atrophy, etc.
(2) Listed-company annual report emotion indexing:
After the listed-company annual report emotion dictionary is built, dictionary-based emotion indexing is used to judge the emotion polarity of listed-company annual reports, and financial crisis early-warning knowledge based on listed-company annual report emotion indexing is obtained; the specific flow is shown in FIG. 8.
As shown in FIG. 8, the financial crisis early-warning knowledge based on listed-company annual report emotion indexing includes emotion indexing knowledge based on the LM dictionary and emotion indexing knowledge based on the annual report dictionary constructed in this study. The LM-dictionary-based emotion indexing knowledge can be obtained from the annual report text and tone tables in the company feature library of the CNRDS database, and Tang et al. (2020) indicate that LM-dictionary-based emotion indexing knowledge of listed-company annual reports can play a certain early-warning role for financial crises. The LM dictionary positive word count (NB1) is the number of positive words in the annual report text counted with the LM dictionary, the LM dictionary negative word count (NB2) is the number of negative words counted with the LM dictionary, and the LM-dictionary-based annual report emotion value (NB5) is the emotion score of the annual report text computed with the LM dictionary; its calculation formula is as follows:
In addition, the invention uses the constructed listed-company annual report emotion dictionary to index annual report emotion and acquire related financial crisis early-warning knowledge. Specifically, the domain emotion dictionary containing positive and negative words is matched against the annual reports to be analyzed, and the numbers of positive and negative words contained in each annual report are counted, giving the annual report dictionary positive word count (NB3) and negative word count (NB4). From these, annual report emotion value 1 (NB6) based on the annual report dictionary is obtained; its calculation formula is as follows:
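Because the formulas themselves are not reproduced above, the following sketch only illustrates the counting step (NB3, NB4) and, as an explicit assumption, computes the emotion value as the normalized difference of the two counts, a common dictionary-based formulation.

def count_dictionary_hits(tokens, positive, negative):
    # NB3/NB4-style counts: positive and negative dictionary words found in one annual report
    pos = sum(1 for t in tokens if t in positive)
    neg = sum(1 for t in tokens if t in negative)
    return pos, neg

def emotion_value(pos, neg):
    # assumed normalized-difference score; the patent's exact NB5/NB6 formulas are not shown here
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total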
the annual report emotion value 2 (NB 7) based on the annual report dictionary needs to be calculated by relying on the established annual report emotion dictionary, the degree adverb dictionary and the negative dictionary of the marketing company, and weights of the three dictionaries are shown in tables 4 to 27. A marketing company annual report may contain a plurality of sentences, each sentence being composed of a plurality of information elements, the study referencing emotion tendency calculation principles of historical literature (Zhang et al, 2018; tang et al, 2019), calculating weights of emotion words without modifiers (degree adverbs and negatives) using an addition rule, and calculating weights of emotion words with modifiers using a multiplication rule; and finally the addition rules are utilized to calculate the weights of the different emotion combinations. The study uses the Python program to implement the above calculation procedure.
Table 5 dictionary for emotion indexing and weights thereof
Specifically, this study uses B_i to denote the emotion tendency of an element in a sentence, S_i to denote the emotion tendency of a sentence, and NB_2senti to denote the emotion tendency of a listed-company annual report. When a sentence contains k emotion words without modifiers, the emotion tendency of this element is calculated by summing the weights of these words (the addition rule):
B_i1 = P_1 + P_2 + ... + P_k
where P_i denotes the weight of the emotion word. When a sentence contains k emotion words modified by a degree adverb or a negation word, the emotion tendency of this element is calculated by multiplying each emotion word weight by the weight of its modifier and summing the products (the multiplication rule):
B_i2 = A_1 × P_1 + A_2 × P_2 + ... + A_k × P_k
where A_i denotes the weight of the degree adverb or negation word. Based on the two formulas above, the emotion tendency of a sentence is calculated as:
S_i = B_i1 + B_i2 (5)
Finally, the emotion tendency value of a listed-company annual report containing n sentences is calculated with the following formula; note that, because annual reports are long, log processing of the overall emotion value is required:
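The following sketch applies the addition and multiplication rules described above and uses an assumed log form for the document-level value; both the pairing of a modifier with the emotion word that follows it and the exact log scaling are assumptions, since the formula is not reproduced above.

import math

def sentence_tendency(tokens, word_weight, modifier_weight):
    # addition rule for unmodified emotion words, multiplication rule for modified ones (B_i1 + B_i2)
    score, i = 0.0, 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in modifier_weight and i + 1 < len(tokens) and tokens[i + 1] in word_weight:
            score += modifier_weight[tok] * word_weight[tokens[i + 1]]   # multiplication rule
            i += 2
        else:
            score += word_weight.get(tok, 0.0)                           # addition rule
            i += 1
    return score

def report_tendency(sentences, word_weight, modifier_weight):
    # sum the sentence scores S_i, then apply a signed log (assumed form) for long annual reports
    total = sum(sentence_tendency(s, word_weight, modifier_weight) for s in sentences)
    return math.copysign(math.log1p(abs(total)), total)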
example 2
Online financial news emotion indexing for listed companies based on deep learning:
Similarly, using the deep learning-based domain emotion dictionary construction method, the invention constructs an emotion dictionary for listed-company online financial news, carries out online financial news emotion indexing on that basis, and finally obtains financial crisis early-warning knowledge based on listed-company online financial news emotion indexing; the specific flow is shown in FIG. 9.
(1) Emotion dictionary construction for listed-company online financial news
The invention uses the finance and stock news in the THUCNews corpus (news sourced from Sina News) provided by the Natural Language Processing Laboratory of Tsinghua University as the corpus for constructing the listed-company online financial news emotion dictionary; the number and distribution of news items are shown in Table 6.
TABLE 6 Distribution of the online financial news corpus
After the THUCNews corpus was collected, stop-word removal, special-symbol removal, and word segmentation were performed on it. The processed finance and stock news corpus was then intersected with the general emotion word set; the result is shown in Table 7. This forms the general emotion word set oriented to listed-company online financial news, which is used as the training corpus for the subsequent emotion dictionary classifiers.
TABLE 7 Intersection of the general emotion word set and the online financial news corpus
Positive words after intersection Negative words after intersection Total words
6546 7072 13618
Similar to the construction of the annual report emotion dictionary, Word2vec was used to train on the online financial news corpus, words in the domain corpus similar to the seed emotion words were extracted, and the 5 words most similar to each seed emotion word were taken as the candidate emotion word set, which contains 4474 words after deduplication. After the general emotion word set oriented to online financial news and the candidate emotion word set were obtained, both sets were converted into word vectors with Bert and used as classifier input. The DNN, MA-DNN, and Bi-LSTM classifiers were constructed separately; the performance of the three models in constructing the listed-company online financial news emotion dictionary is evaluated in Table 8.
Table 8 Performance evaluation of the listed-company online financial news emotion dictionary
In this embodiment, the MA-DNN model is optimal on every metric when constructing the listed-company online financial news emotion dictionary, the DNN model is second, and the Bi-LSTM model performs worse. This study therefore uses the results of the MA-DNN model as the listed-company online financial news emotion dictionary, which contains 2151 positive words and 2323 negative words; a portion of the words is shown in Table 9. Compared with the annual report domain dictionary, the accuracy of the online financial news domain dictionary is relatively low. This is related to the dependence of the seed emotion dictionary on the LM dictionary: the LM dictionary was built from the annual reports (10-K) of US listed companies, and the language habits of annual reports differ somewhat from those of online financial news. In addition, the number of candidate emotion words for online financial news is relatively large, which also affects model performance to some extent.
Table 9 Examples from the domain emotion dictionary for listed-company online financial news
As the table shows, the listed-company online financial news emotion dictionary constructed in this study has strong domain characteristics, with both differences from and similarities to the annual report emotion dictionary. The positive words mainly cover terms describing favorable stock market conditions, such as opening high, strong bull market, and new highs, as well as terms related to the behavior of capital market participants, such as earnings, positive news, and profit. The negative words mainly cover terms describing unfavorable stock market conditions, such as opening high then moving lower, hitting the limit down, and bear market, as well as terms related to participant behavior, such as lifting of share lock-ups and exiting the market.
(2) Listed-company online financial news emotion indexing
After the online financial news emotion dictionary is built, dictionary-based emotion indexing is used to judge the emotion polarity of listed-company online financial news, and financial crisis early-warning knowledge based on listed-company online financial news emotion indexing is obtained; the specific flow is shown in FIG. 10.
As shown in FIG. 10, the financial crisis early-warning knowledge based on listed-company online financial news emotion indexing mainly consists of emotion indexing knowledge obtained with the online financial news dictionary constructed in this study. Specifically, the domain emotion dictionary containing positive and negative words is matched against the online financial news to be analyzed, and the numbers of positive and negative words contained in each news item are counted, giving the news dictionary positive word count (XW1) and the news dictionary negative word count (XW2); emotion score 1 of each news item is then obtained:
Next, the scores of the n news items related to each enterprise are summed and the average emotion score is computed, finally giving news emotion value 1 (XW3) based on the news dictionary; its calculation formula is as follows:
In addition, the calculation of news emotion value 2 (XW4) based on the news dictionary is similar to that of NB_2senti described in the previous section, except that log processing of the overall emotion value is not needed because news items are relatively short. After the emotion score M_i of each news item is obtained, the average emotion score of the n news items related to each enterprise is further calculated; the calculation formula is as follows:
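A sketch of the news-level indicators under stated assumptions: XW1 and XW2 are accumulated over a company's n news items, the per-item score behind XW3 is a normalized difference, and XW4 averages the modifier-aware score M_i (computed with the sentence_tendency helper from the annual report sketch) without log scaling; the exact formulas are not reproduced above.

def news_indicators(news_token_lists, positive, negative):
    # per-company XW1, XW2 totals and the XW3 average score over its n news items
    xw1 = xw2 = 0
    scores = []
    for tokens in news_token_lists:                                  # one token list per news item
        pos = sum(1 for t in tokens if t in positive)
        neg = sum(1 for t in tokens if t in negative)
        xw1, xw2 = xw1 + pos, xw2 + neg
        total = pos + neg
        scores.append(0.0 if total == 0 else (pos - neg) / total)    # assumed per-item score
    xw3 = sum(scores) / len(scores) if scores else 0.0
    return xw1, xw2, xw3

def news_value_2(news_sentence_lists, word_weight, modifier_weight):
    # XW4: average of the per-item scores M_i; no log scaling because news items are short
    scores = [sum(sentence_tendency(s, word_weight, modifier_weight) for s in sentences)
              for sentences in news_sentence_lists]
    return sum(scores) / len(scores) if scores else 0.0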
through the calculation process, the LM dictionary positive vocabulary number (NB 1), the LM dictionary negative vocabulary number (NB 2), the annual report dictionary positive vocabulary number (NB 3), the annual report dictionary negative vocabulary number (NB 4), the annual report emotion value (NB 5) based on the LM dictionary, the annual report emotion value 1 (NB 6) based on the annual report dictionary and the annual report emotion value 2 (NB 7) based on the annual report emotion index of the marketing company can be obtained, and the total of 7 financial crisis pre-warning knowledge based on the annual report emotion index of the marketing company can be obtained. And the number of active vocabularies (XW 1) of the news dictionary, the number of passive vocabularies (XW 2) of the news dictionary, the news emotion value 1 (XW 3) based on the news dictionary and the news emotion value 2 (XW 4) based on the news dictionary, which are 4 financial crisis early warning knowledge based on network financial news emotion index of the marketing company.
While the fundamental and principal features and advantages of the invention have been shown and described, it will be apparent to those skilled in the art that the invention is not limited to the details of the foregoing exemplary embodiments and may be embodied in other specific forms without departing from its spirit or essential characteristics. The embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes that come within the meaning and range of equivalency of the claims are intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only a single independent technical solution; this manner of description is adopted only for clarity, and the specification should be taken as a whole, since the technical solutions of the various embodiments may be combined appropriately to form other implementations understandable to those skilled in the art.

Claims (10)

1. A deep learning-based domain emotion dictionary construction method, characterized in that: the method comprises data preprocessing, word vector model construction, and classifier construction, wherein the data preprocessing uses a domain corpus and a semantic knowledge base to determine a general emotion word set oriented to a specific domain and mainly comprises the processing of the semantic knowledge base and of the domain corpus; in the word vector model construction process, a candidate emotion word set is determined using a seed emotion word set and the domain corpus, the candidate emotion word set and the domain general emotion word set obtained by the data preprocessing part are then converted into word vectors, and the word vectors are used as input of the classifier construction part to assist in constructing the domain emotion dictionary; the classifier construction comprises a classifier based on a fully connected neural network (DNN), a fully connected network combined with a multi-head attention mechanism (Multi-head Attention + DNN, MA-DNN for short), and a bidirectional long short-term memory network (Bi-LSTM).
2. The deep learning-based domain emotion dictionary construction method of claim 1, wherein: the processing of the semantic knowledge base is the fusion of existing general emotion dictionaries, and the processing of the domain corpus mainly comprises stop-word removal, special-symbol removal, and word segmentation, providing the data basis for subsequent processing.
3. The deep learning-based domain emotion dictionary construction method of claim 2, wherein: the referenced positive and negative emotion words are fused and duplicate words are removed to form the required general emotion word set; after the processing of the semantic knowledge base and the domain corpus is completed, the two are further fused by taking their intersection, finally forming the general emotion word set oriented to the specific domain.
4. The deep learning-based domain emotion dictionary construction method of claim 1, wherein: the word vector model construction process comprises determining a seed emotion word set, training word vectors with Word2vec, calculating semantic similarity between words, and training word vectors with Bert.
5. The deep learning-based domain emotion dictionary construction method of claim 4, wherein: after the seed emotion word set is determined, the domain corpus is trained into word vectors with Word2vec; after word vector training on the domain corpus is completed, words in the domain corpus similar to the seed emotion words are further computed by an algorithm; after the seed emotion word set is obtained, both it and the domain-specific general emotion word set obtained by the data preprocessing part must be converted into word vectors to serve as input of the subsequent classifiers.
6. The deep learning-based domain emotion dictionary construction method of claim 1, wherein: the internal network structure of the DNN classifier is divided into an input layer, hidden layers, and an output layer; network layers can be added to the hidden part, information is propagated from front to back between layers in a fully connected manner, and the value of each neuron in a layer is determined by a weighted combination of the values of all neurons in the previous layer.
7. The deep learning-based domain emotion dictionary construction method of claim 6, wherein: the MA-DNN classifier adds fully connected layers around Multi-head Attention; the input of the first DNN layer is the Bert-converted word vector, the vector processed by that DNN layer is used as the input of the Multi-head Attention layer, which computes feature information for each position, the calculation result is input into a second DNN layer, which combines the results of the preceding layers to produce the raw output, and the final probability distribution is obtained after the output passes through a Softmax layer.
8. The deep learning-based domain emotion dictionary construction method of claim 7, wherein: after the model is built, the optimizer and the loss function are defined; as with the DNN classifier, the widely used Adam optimizer is selected and the loss function is the cross entropy commonly applied to binary classification; the training set and the test set are then divided at a ratio of 9:1 and training is run for 100 epochs, finally obtaining the classification result of the candidate emotion word set.
9. The deep learning-based domain emotion dictionary construction method of claim 7, wherein: in the Bi-LSTM classifier, context information is obtained by combining a forward LSTM and a backward LSTM; the input of the first Bi-LSTM layer is the Bert-converted word vector, the vector processed by the Bi-LSTM network is used as the input of a second-layer DNN, which combines the previous layer's result to produce the raw output, and the final probability distribution is obtained after the output passes through a Softmax layer.
10. The deep learning-based domain emotion dictionary construction method of claim 9, wherein: the Adam optimizer is selected and the loss function is the cross entropy widely used for binary classification; the training set and the test set are then divided at a ratio of 9:1 and training is run for 100 epochs, finally obtaining the classification result of the candidate emotion word set; after the classifiers are built, they are evaluated, and the classifier with the best classification effect is selected to produce the domain emotion dictionary.
CN202310284451.1A 2023-03-22 2023-03-22 Deep learning-based field emotion dictionary construction method Pending CN116450840A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310284451.1A CN116450840A (en) 2023-03-22 2023-03-22 Deep learning-based field emotion dictionary construction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310284451.1A CN116450840A (en) 2023-03-22 2023-03-22 Deep learning-based field emotion dictionary construction method

Publications (1)

Publication Number Publication Date
CN116450840A 2023-07-18

Family

ID=87129385

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310284451.1A Pending CN116450840A (en) 2023-03-22 2023-03-22 Deep learning-based field emotion dictionary construction method

Country Status (1)

Country Link
CN (1) CN116450840A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130103623A1 (en) * 2011-10-21 2013-04-25 Educational Testing Service Computer-Implemented Systems and Methods for Detection of Sentiment in Writing
CN107193801A (en) * 2017-05-21 2017-09-22 北京工业大学 A kind of short text characteristic optimization and sentiment analysis method based on depth belief network
CN109376251A (en) * 2018-09-25 2019-02-22 南京大学 A kind of microblogging Chinese sentiment dictionary construction method based on term vector learning model
CN112507723A (en) * 2020-12-03 2021-03-16 南京理工大学 News emotion analysis method based on multi-model fusion
CN115619443A (en) * 2021-07-15 2023-01-17 西安电子科技大学青岛计算技术研究院 Company operation prediction method and system for emotion analysis based on annual report of listed company
CN114238577A (en) * 2021-12-17 2022-03-25 中国计量大学上虞高等研究院有限公司 Multi-task learning emotion classification method integrated with multi-head attention mechanism
CN115525763A (en) * 2022-10-25 2022-12-27 南京理工大学 Emotion analysis method based on improved SO-PMI algorithm and fusion word vector

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
胡家珩 et al.: "基于深度学习的领域情感词典自动构建――以金融领域为例" (Automatic Construction of a Domain Sentiment Dictionary Based on Deep Learning: Taking the Financial Domain as an Example), 《数据分析与知识发现》 (Data Analysis and Knowledge Discovery), pages 95-101 *

Similar Documents

Publication Publication Date Title
CN110442760B (en) Synonym mining method and device for question-answer retrieval system
CN105824922B (en) A kind of sensibility classification method merging further feature and shallow-layer feature
CN107229610B (en) A kind of analysis method and device of affection data
CN111767741B (en) Text emotion analysis method based on deep learning and TFIDF algorithm
CN109977413A (en) A kind of sentiment analysis method based on improvement CNN-LDA
Feng et al. Enhanced sentiment labeling and implicit aspect identification by integration of deep convolution neural network and sequential algorithm
CN105843897A (en) Vertical domain-oriented intelligent question and answer system
CN107895000B (en) Cross-domain semantic information retrieval method based on convolutional neural network
CN110929034A (en) Commodity comment fine-grained emotion classification method based on improved LSTM
CN107357793A (en) Information recommendation method and device
CN110209818B (en) Semantic sensitive word and sentence oriented analysis method
CN108319734A (en) A kind of product feature structure tree method for auto constructing based on linear combiner
CN110598219A (en) Emotion analysis method for broad-bean-net movie comment
CN108388660A (en) A kind of improved electric business product pain spot analysis method
CN113673254B (en) Knowledge distillation position detection method based on similarity maintenance
CN112861541A (en) Commodity comment sentiment analysis method based on multi-feature fusion
Morales et al. An investigation of deep learning systems for suicide risk assessment
CN113934835A (en) Retrieval type reply dialogue method and system combining keywords and semantic understanding representation
CN112200674B (en) Stock market emotion index intelligent calculation information system
CN117648984A (en) Intelligent question-answering method and system based on domain knowledge graph
CN115906824A (en) Text fine-grained emotion analysis method, system, medium and computing equipment
CN109635289A (en) Entry classification method and audit information abstracting method
CN116450840A (en) Deep learning-based field emotion dictionary construction method
Sani et al. Sentiment Analysis of Hausa Language Tweet Using Machine Learning Approach
Mukherjee et al. Aspect based sentiment analysis of student housing reviews

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination