CN112732907B - Financial public opinion analysis method based on multi-scale recurrent neural network - Google Patents

Financial public opinion analysis method based on a multi-scale recurrent neural network

Info

Publication number
CN112732907B
CN112732907B (application CN202011578594.6A)
Authority
CN
China
Prior art keywords
text
scale
neural network
financial
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011578594.6A
Other languages
Chinese (zh)
Other versions
CN112732907A (en)
Inventor
马千里 (Qianli Ma)
林镇溪 (Zhenxi Lin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology (SCUT)
Priority to CN202011578594.6A
Publication of CN112732907A
Application granted
Publication of CN112732907B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a financial public opinion analysis method based on a multi-scale recurrent neural network, which comprises the following steps: acquiring financial text data and preprocessing the data; sampling the preprocessed financial text data with a sliding window to obtain a subsequence at each time step, inputting the subsequence into a group recurrent neural network to extract local feature representations of the text sequence, and obtaining a salient feature representation of the text sequence through a max pooling operation; extracting different salient feature representations of the text sequence with several sliding windows of different scales, and concatenating them to obtain the multi-scale feature representation of the sequence; and inputting the multi-scale feature representation into a fully connected layer and a softmax layer for classification. The method samples text subsequences with sliding windows of different scales, models local phrase features of different scales with a group recurrent neural network, and fuses the features of different scales to obtain the semantic features of the text, thereby further improving the accuracy of financial public opinion analysis.

Description

Financial public opinion analysis method based on a multi-scale recurrent neural network
Technical Field
The invention relates to the technical field of financial public opinion analysis, and in particular to a financial public opinion analysis method based on a multi-scale recurrent neural network.
Background
With the rapid development of Internet technology, a vast amount of information is generated every day, and sifting and extracting this information is very important. In the financial field in particular, various financial texts reflect the sentiment of investors, and investor sentiment determines investor behavior, which in turn influences the trend of the whole market. By performing public opinion analysis on financial texts, the development trend of the financial market can be understood, which facilitates monitoring of the financial market and handling of abnormal stock prices. Therefore, public opinion analysis of financial texts is of great significance.
Traditional financial public opinion analysis methods are mainly based on sentiment dictionaries and machine learning. Sentiment dictionaries infer the sentiment polarity of a financial text from the numbers of positive and negative sentiment words it contains, while machine learning methods include bag-of-words models, naive Bayes, logistic regression and the like. However, traditional methods rely on hand-crafted features, are costly, and cannot fully model the semantic and multi-scale information of financial texts. Because neural networks can automatically extract features from text, many neural network-based methods are now applied to financial public opinion analysis, among which convolutional neural networks and long short-term memory networks are the most common and effective. A convolutional neural network can capture local contiguous phrase information in a financial text, but because the convolution operation is linear, it cannot sufficiently model discontinuous phrase structures in the text, such as expressions with emotional transitions. A long short-term memory network can effectively model the sequential information of a financial text; however, it is a biased model that favors information at the end of the text and cannot model the multi-scale information in the text. Moreover, because labeled financial public opinion datasets are limited while current models have relatively many parameters, overfitting and feature redundancy easily occur, which reduces the accuracy of public opinion analysis.
In general, financial public opinion is timely, subjective and widely spread. To sift out the core information, the key to financial public opinion analysis is to extract the key phrases in a text in order to understand the semantic information and emotional tendency it contains, and such phrases generally occur at different scales. To better model the semantic and multi-scale information of text, a more effective and timely financial public opinion analysis method is urgently needed.
Disclosure of Invention
The purpose of the invention is to overcome the defects in the prior art and provide a financial public opinion analysis method based on a multi-scale recurrent neural network.
The purpose of the invention can be achieved by adopting the following technical scheme:
a financial public opinion analysis method based on a multi-scale recurrent neural network comprises the following steps:
S1, acquiring financial text data, and preprocessing the financial text data to obtain a text sequence;
S2, sampling the text sequence obtained in step S1 with a sliding window to obtain a subsequence at each time step, inputting the subsequence into a group recurrent neural network GRNN to extract the local feature representation of the text sequence, and then obtaining the salient feature representation of the text sequence through a max pooling operation;
S3, extracting different salient feature representations of the text sequence with several sliding windows of different scales, and finally obtaining the multi-scale feature representation of the text sequence through a concatenation operation;
S4, inputting the multi-scale feature representation obtained in step S3 into a fully connected layer and a softmax layer for classification.
Further, the salient feature representation of the text sequence in step S2 is computed as follows:
S2.1, the semantic information of a text is generally conveyed by a few keywords or phrases. A conventional CNN can capture local phrases, but because convolution is a linear operation it has difficulty modeling discontinuous dependencies in the text; an RNN can model discontinuous dependencies, but as a biased model it tends to neglect the earlier context of the text. To better model the semantic features of local phrases in a sequence, a sliding window of size s is used to sample text subsequences, and a GRNN extracts a local feature representation at each position; this combines the local modeling ability of a CNN with the discontinuous-dependency modeling ability of an RNN.
Specifically, given an input text sequence $X = \{x_1, x_2, \dots, x_t, \dots, x_T\}$, where $T$ is the length of the text sequence, $x_t \in \mathbb{R}^{d_0}$ is the word input at time step $t$ ($t = 1, 2, \dots, T$), and $d_0$ is the input dimension of each word, the words of the s time steps ending at time step $t$ are sampled to form a text subsequence $X_t = \{x_{t-s+1}, \dots, x_t\}$. The subsequence $X_t$ is fed into a group recurrent neural network GRNN, whose recurrent structure captures the discontinuous dependencies within the subsequence. The GRNN is a recurrent neural network composed of K differently initialized long short-term memory (LSTM) networks; each LSTM network models different semantic features of the sequence, which helps resolve word ambiguity.
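For concreteness, the following is a minimal PyTorch sketch of the sliding-window sampling that produces the subsequence X_t at every time step. The tensor names are illustrative, and the zero left-padding of the first s-1 positions is an assumption, since the patent does not specify how boundary positions are handled.

```python
import torch
import torch.nn.functional as F

def sample_subsequences(x: torch.Tensor, s: int) -> torch.Tensor:
    """Sample the length-s subsequence X_t ending at every time step t.

    x: (batch, T, d0) word vectors of the text sequence.
    Returns: (batch, T, s, d0), where out[:, t] corresponds to {x_{t-s+1}, ..., x_t}.
    The first s-1 positions are left-padded with zero vectors (an assumption).
    """
    x_padded = F.pad(x, (0, 0, s - 1, 0))   # pad s-1 zero steps at the front of the time axis
    windows = x_padded.unfold(1, s, 1)      # (batch, T, d0, s)
    return windows.permute(0, 1, 3, 2).contiguous()
```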
The subsequence is input into the GRNN, and the hidden state output at its last time step is taken as the local feature representation $h_t$ of the t-th time step of the text sequence.
S2.2, the local feature representation of the t-th time step of the group recurrent neural network GRNN is obtained by concatenating the hidden state representations of the K long short-term memory networks:

$$h_t = [h_t^1; h_t^2; \dots; h_t^K] \in \mathbb{R}^{Kd}$$

where $h_t^k \in \mathbb{R}^d$ denotes the hidden state of the t-th time step produced by the k-th long short-term memory network, $k = 1, 2, \dots, K$, and $d$ is the dimension of each hidden state. $h_t^k$ is computed as follows:

$$i_t^k = \sigma(W_i^k x_t + U_i^k h_{t-1}^k + b_i^k)$$
$$f_t^k = \sigma(W_f^k x_t + U_f^k h_{t-1}^k + b_f^k)$$
$$o_t^k = \sigma(W_o^k x_t + U_o^k h_{t-1}^k + b_o^k)$$
$$g_t^k = \tanh(W_g^k x_t + U_g^k h_{t-1}^k + b_g^k)$$
$$c_t^k = f_t^k \odot c_{t-1}^k + i_t^k \odot g_t^k$$
$$h_t^k = o_t^k \odot \tanh(c_t^k)$$

where $i_t^k$, $f_t^k$ and $o_t^k$ are respectively the input gate, forget gate and output gate of the k-th long short-term memory network, $g_t^k$ and $c_t^k$ are respectively the candidate information currently added and the memory cell of the k-th network, $\sigma$ and $\tanh$ are nonlinear activation functions, $\odot$ denotes element-wise multiplication, and $W_\ast^k$, $U_\ast^k$ and $b_\ast^k$ are trainable parameters of the k-th long short-term memory network.
The local feature representations $h_t$ of all time steps obtained from the above formulas are concatenated to form a feature matrix $H \in \mathbb{R}^{T \times Kd}$:

$$H = [h_1; h_2; \dots; h_T]$$
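A minimal PyTorch sketch of the GRNN described above: K independently initialized LSTMs run over the same subsequence, and their final hidden states are concatenated into h_t. Module and variable names are illustrative, not taken from the patent.

```python
import torch
import torch.nn as nn

class GRNN(nn.Module):
    """Group recurrent neural network: K differently initialized LSTMs over one subsequence."""

    def __init__(self, d0: int, d: int, K: int):
        super().__init__()
        self.lstms = nn.ModuleList(
            nn.LSTM(input_size=d0, hidden_size=d, batch_first=True) for _ in range(K)
        )

    def forward(self, subseq: torch.Tensor) -> torch.Tensor:
        """subseq: (batch, s, d0) -> local feature h_t: (batch, K*d)."""
        finals = []
        for lstm in self.lstms:              # the K LSTMs are independent and could run in parallel
            _, (h_last, _) = lstm(subseq)    # h_last: (1, batch, d), hidden state of the last step
            finals.append(h_last.squeeze(0))
        return torch.cat(finals, dim=-1)     # concatenation of the K hidden states
```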
S2.3, max pooling is applied to the feature matrix $H$ along the time dimension to obtain the salient feature representation $F \in \mathbb{R}^{Kd}$ of the sequence:

$$F_i = \max(H_{1,i}, H_{2,i}, \dots, H_{T,i})$$

where $F_i$ denotes the value of the i-th dimension of the vector $F$, $H_{t,i}$ denotes the value of the i-th dimension of $h_t$, and $\max$ denotes the maximum operation. The salient feature $F$ represents the discriminative features that contribute most to classification, such as phrases or keywords carrying emotional expressions, while unimportant information is filtered out.
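Continuing the hypothetical helpers above, a short sketch of stacking the per-position local features into H and max-pooling over time to obtain the salient feature F.

```python
def salient_feature(grnn: "GRNN", windows: torch.Tensor) -> torch.Tensor:
    """windows: (batch, T, s, d0) from sample_subsequences -> F: (batch, K*d)."""
    T = windows.shape[1]
    # Local feature at every time step, stacked into the feature matrix H.
    H = torch.stack([grnn(windows[:, t]) for t in range(T)], dim=1)  # (batch, T, K*d)
    return H.max(dim=1).values                                       # max pooling over time
```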
Furthermore, the K long short-term memory networks in the group recurrent neural network GRNN can be computed in parallel, which reduces the running time.
Further, the multi-scale feature representation of the text sequence in step S3 is computed as follows:
Text naturally contains multi-scale information, for example phrases of different lengths. To extract the multi-scale information of a text sequence, M sliding windows of different scales are used to extract different salient feature representations of the sequence, where the scale of the m-th sliding window is $s_m$. The operation of step S2 is repeated, the salient feature obtained with the sliding window of the m-th scale is denoted $F^m$, $m = 1, 2, \dots, M$, and the multi-scale feature representation of the sequence is obtained by a concatenation operation:

$$\hat{F} = [F^1; F^2; \dots; F^M] \in \mathbb{R}^{M \cdot Kd}$$
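A sketch of the multi-scale combination, reusing the hypothetical GRNN, sample_subsequences and salient_feature defined in the earlier sketches: one GRNN branch per window scale, with the resulting salient features concatenated.

```python
class MultiScaleRNN(nn.Module):
    """One GRNN branch per sliding-window scale; outputs the concatenated salient features."""

    def __init__(self, d0: int, d: int, K: int, scales: list[int]):
        super().__init__()
        self.scales = scales
        self.branches = nn.ModuleList(GRNN(d0, d, K) for _ in scales)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """x: (batch, T, d0) -> multi-scale feature F_hat: (batch, M*K*d)."""
        feats = []
        for s_m, grnn in zip(self.scales, self.branches):
            windows = sample_subsequences(x, s_m)          # (batch, T, s_m, d0)
            feats.append(salient_feature(grnn, windows))   # (batch, K*d)
        return torch.cat(feats, dim=-1)                    # concatenate the M scales
```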
Further, the classification process in step S4 is as follows:
The multi-scale feature representation $\hat{F}$ obtained in step S3 is input into a fully connected layer and a softmax layer for classification:

$$\hat{y} = \operatorname{softmax}(\operatorname{ReLU}(W_f \hat{F} + b_f))$$

where $W_f$ is an affine transformation matrix composed of trainable parameters, $b_f$ is the bias term, ReLU and softmax are nonlinear activation functions, and $\hat{y}$ is the predicted distribution.
Training takes the loss $L$ as its objective, where $L$ is expressed as follows:

$$L = -\frac{1}{N}\sum_{n=1}^{N}\sum_{j=1}^{J} y_j^{(n)} \log \hat{y}_j^{(n)} + \lambda \sum_{m=1}^{M} L_p^{s_m}$$

where $N$ is the number of samples in the dataset, $J$ is the number of classes in the dataset, $M$ is the number of sliding windows of different scales, $y$ is the true distribution of a sample, $\hat{y}$ is the predicted distribution of the sample, $y_j^{(n)}$ is the j-th dimension of the distribution $y$ of the n-th sample, $\hat{y}_j^{(n)}$ is the j-th dimension of the distribution $\hat{y}$ of the n-th sample, $L_p^{s_m}$ is the penalty loss of scale $s_m$, and $\lambda$ is a hyper-parameter used to weigh the classification loss against the penalty loss.
Further, without additional constraints, different long short-term memory networks in the GRNN may learn similar feature representations, which causes feature redundancy. To increase the diversity of the features and avoid feature redundancy, the penalty loss $L_p^{s_m}$ of scale $s_m$ is:

$$L_p^{s_m} = \sum_{i=1}^{K}\sum_{\substack{j=1 \\ j \neq i}}^{K} \left\| (W_i^{s_m})^{\top} W_j^{s_m} - I \right\|_2$$

where $I$ is the identity matrix, $W_i^{s_m}$ and $W_j^{s_m}$ are respectively the trainable parameters of the i-th and j-th long short-term memory networks of the GRNN of scale $s_m$, and $\|\cdot\|_2$ denotes the 2-norm.
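A sketch of one plausible reading of the penalty term. The exact matrix expression is reconstructed from the components the patent names (the identity matrix, the parameter matrices of two different LSTMs, and a 2-norm), so both the formula and the choice of the input-to-hidden weight as "the trainable parameters" should be treated as assumptions.

```python
def grnn_penalty(lstms: nn.ModuleList) -> torch.Tensor:
    """Pairwise redundancy penalty over the input-to-hidden weight matrices of the K LSTMs."""
    weights = [lstm.weight_ih_l0 for lstm in lstms]   # one trainable matrix per LSTM (an assumption)
    eye = torch.eye(weights[0].shape[1], device=weights[0].device)
    penalty = weights[0].new_zeros(())
    for i in range(len(weights)):
        for j in range(len(weights)):
            if i != j:
                penalty = penalty + torch.norm(weights[i].T @ weights[j] - eye)
    return penalty
```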
Compared with the prior art, the invention has the following advantages and effects:
(1) The invention provides a novel multi-scale recurrent neural network for financial public opinion analysis, which combines the local phrase feature modeling ability of CNNs with the discontinuous-dependency modeling ability of RNNs. Compared with CNNs and RNNs, the method achieves better accuracy in financial text analysis.
(2) To better model the semantic and multi-scale information of financial texts, the method samples text subsequences with sliding windows of different scales, models local phrase features of different scales with the group recurrent neural network GRNN, and fuses the features of different scales to obtain the semantic features of the text.
Drawings
Fig. 1 is a flowchart of a method for financial public opinion analysis based on a multi-scale recurrent neural network disclosed in an embodiment of the present invention;
fig. 2 is a network structure diagram of the group recurrent neural network GRNN disclosed in the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Examples
As shown in fig. 1, this embodiment discloses a financial public opinion analysis method based on a multi-scale recurrent neural network, which includes the following steps:
S1, financial text data are acquired and preprocessed to obtain a text sequence. In practice, the data used are derived from the "SmoothNLP" public dataset, which contains approximately 20,000 financial news texts.
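A minimal sketch of the kind of preprocessing step S1 implies (tokenization, vocabulary lookup, padding to a fixed length). The tokenizer, vocabulary keys and padding length are assumptions, since the patent does not detail the preprocessing.

```python
import torch
import jieba  # a common Chinese tokenizer; its use here is an assumption, not stated in the patent

def preprocess(texts: list[str], vocab: dict[str, int], max_len: int = 100) -> torch.Tensor:
    """Tokenize each financial text and map it to a fixed-length index sequence."""
    rows = []
    for text in texts:
        ids = [vocab.get(tok, vocab["<unk>"]) for tok in jieba.lcut(text)]
        ids = ids[:max_len] + [vocab["<pad>"]] * max(0, max_len - len(ids))
        rows.append(ids)
    return torch.tensor(rows, dtype=torch.long)  # later mapped to word vectors by an embedding layer
```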
S2, the text sequence obtained in step S1 is sampled with a sliding window to obtain a subsequence at each time step; the subsequence is input into a group recurrent neural network GRNN to extract the local feature representation of the text sequence, and a max pooling operation then yields the salient feature representation of the text sequence. The specific process is as follows:
S2.1, the semantic information of a text is generally conveyed by a few keywords or phrases. A conventional CNN can capture local phrases, but because convolution is a linear operation it has difficulty modeling discontinuous dependencies in the text; an RNN can model discontinuous dependencies, but as a biased model it tends to neglect the earlier context of the text. To better model the semantic features of local phrases in a sequence, as shown in fig. 2, a sliding window of size 3 is used to sample text subsequences, and a GRNN extracts the local feature representation at each position; this combines the local modeling ability of a CNN with the discontinuous-dependency modeling ability of an RNN.
Specifically, as shown in fig. 2, given the input sequence $X = \{x_1, x_2, x_3, x_4\}$ for the example sentence "I happy", where $x_t$ is the 300-dimensional word vector input at time step $t$, the words of the 3 time steps ending at time step $t$ are sampled to form a subsequence $X_t = \{x_{t-2}, x_{t-1}, x_t\}$. The subsequence $X_t$ is fed into a group recurrent neural network GRNN, whose recurrent structure captures the discontinuous dependencies within the subsequence. The GRNN is a recurrent neural network composed of 4 differently initialized long short-term memory networks; each network models different semantic features of the sequence, which helps resolve word ambiguity.
The subsequence is input into the GRNN, and the hidden state output at its last time step is taken as the local feature representation $h_t$ of the t-th time step of the text sequence.
S2.2, the local feature representation of the t-th time step of the group recurrent neural network GRNN is obtained by concatenating the hidden state representations of the 4 long short-term memory networks:

$$h_t = [h_t^1; h_t^2; h_t^3; h_t^4] \in \mathbb{R}^{200}$$

where $h_t^k$ denotes the 50-dimensional hidden state of the t-th time step produced by the k-th long short-term memory network, $k = 1, 2, 3, 4$. $h_t^k$ is computed as follows:

$$i_t^k = \sigma(W_i^k x_t + U_i^k h_{t-1}^k + b_i^k)$$
$$f_t^k = \sigma(W_f^k x_t + U_f^k h_{t-1}^k + b_f^k)$$
$$o_t^k = \sigma(W_o^k x_t + U_o^k h_{t-1}^k + b_o^k)$$
$$g_t^k = \tanh(W_g^k x_t + U_g^k h_{t-1}^k + b_g^k)$$
$$c_t^k = f_t^k \odot c_{t-1}^k + i_t^k \odot g_t^k$$
$$h_t^k = o_t^k \odot \tanh(c_t^k)$$

where $i_t^k$, $f_t^k$ and $o_t^k$ are respectively the input gate, forget gate and output gate of the k-th long short-term memory network, $g_t^k$ and $c_t^k$ are respectively the candidate information currently added and the memory cell of the k-th network, $\sigma$ and $\tanh$ are nonlinear activation functions, $\odot$ denotes element-wise multiplication, and $W_\ast^k$, $U_\ast^k$ and $b_\ast^k$ are trainable parameters of the k-th long short-term memory network.
The local feature representations of the 4 time steps obtained from the above formulas are concatenated to form a feature matrix $H \in \mathbb{R}^{4 \times 200}$:

$$H = [h_1; h_2; h_3; h_4]$$
S2.3, max pooling is applied to the feature matrix $H$ along the time dimension to obtain the salient feature representation $F \in \mathbb{R}^{200}$ of the sequence:

$$F_i = \max(H_{1,i}, H_{2,i}, H_{3,i}, H_{4,i})$$

where $F_i$ denotes the value of the i-th dimension of the vector $F$, $H_{t,i}$ denotes the value of the i-th dimension of $h_t$, and $\max$ denotes the maximum operation. The salient feature $F$ represents the discriminative features that contribute most to classification, such as phrases or keywords carrying emotional expressions, while unimportant information is filtered out. For example, the emotional word "happy" in the sentence "I happy" is the most important information for classification.
The 4 long short-term memory networks in the group recurrent neural network GRNN can be computed in parallel, which shortens the running time.
S3, different salient feature representations of the text sequence are extracted with sliding windows of several different scales, and the multi-scale feature representation of the text sequence is finally obtained by concatenation. The specific process is as follows:
Text naturally contains multi-scale information, for example phrases of different lengths. To extract the multi-scale information of the text sequence, as shown in fig. 2, 2 sliding windows of different scales, with sizes 3 and 2 respectively, are used to extract different salient feature representations of the sequence. The operation of step S2 is repeated, the salient feature obtained with the sliding window of the m-th scale is denoted $F^m$, $m = 1, 2$, and the multi-scale feature representation of the sequence is obtained by a concatenation operation:

$$\hat{F} = [F^1; F^2] \in \mathbb{R}^{400}$$
S4, the multi-scale feature representation obtained in step S3 is input into a fully connected layer and a softmax layer for classification. The specific classification process is as follows:
The multi-scale feature representation $\hat{F}$ obtained in step S3 is input into a fully connected layer and a softmax layer for classification:

$$\hat{y} = \operatorname{softmax}(\operatorname{ReLU}(W_f \hat{F} + b_f))$$

where $W_f$ is an affine transformation matrix composed of trainable parameters, $b_f$ is the bias term, ReLU and softmax are nonlinear activation functions, and $\hat{y}$ is the predicted distribution.
Training takes the loss $L$ as its objective, where $L$ is expressed as follows:

$$L = -\frac{1}{N}\sum_{n=1}^{N}\sum_{j=1}^{J} y_j^{(n)} \log \hat{y}_j^{(n)} + \lambda \sum_{m=1}^{M} L_p^{s_m}$$

where $N$ is the number of samples in the dataset, $J$ is the number of classes in the dataset, $M = 2$ is the number of sliding windows of different scales, $y$ is the true distribution of a sample, $\hat{y}$ is the predicted distribution of the sample, $y_j^{(n)}$ is the j-th dimension of the distribution $y$ of the n-th sample, $\hat{y}_j^{(n)}$ is the j-th dimension of the distribution $\hat{y}$ of the n-th sample, $L_p^{s_m}$ is the penalty loss of scale $s_m$, and $\lambda = 0.001$ is a hyper-parameter used to weigh the classification loss against the penalty loss.
Without additional constraints, different long short-term memory networks in the GRNN may learn similar feature representations, which causes feature redundancy. To increase the diversity of the features and avoid feature redundancy, the penalty loss $L_p^{s_m}$ of scale $s_m$ is:

$$L_p^{s_m} = \sum_{i=1}^{K}\sum_{\substack{j=1 \\ j \neq i}}^{K} \left\| (W_i^{s_m})^{\top} W_j^{s_m} - I \right\|_2$$

where $I$ is the identity matrix, $W_i^{s_m}$ and $W_j^{s_m}$ are respectively the trainable parameters of the i-th and j-th long short-term memory networks of the GRNN of scale $s_m$, and $\|\cdot\|_2$ denotes the 2-norm.
In summary, this embodiment combines the advantages of CNNs and RNNs: subsequences are sampled with sliding windows, the group recurrent neural network GRNN encodes the subsequence features as local semantic representations of the text, and sliding windows of different scales extract multi-scale local phrase features, which helps the model understand the semantic representation of the text. Compared with CNNs and RNNs, the method achieves better accuracy in financial text analysis, which facilitates the screening and filtering of information and the analysis of future trends in financial markets.
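Putting the pieces together, the following is a sketch of how the embodiment's configuration (300-dimensional word vectors, 4 LSTMs of hidden size 50 per GRNN, window sizes 3 and 2, λ = 0.001) might be assembled from the hypothetical modules defined in the earlier sketches; the optimizer choice and the number of classes are assumptions.

```python
import torch

# Assemble the model with the embodiment's hyper-parameters (an illustrative sketch that reuses
# the hypothetical MultiScaleRNN, Classifier, grnn_penalty and total_loss defined earlier).
scales = [3, 2]                                   # the two window sizes of the embodiment
encoder = MultiScaleRNN(d0=300, d=50, K=4, scales=scales)
classifier = Classifier(feat_dim=len(scales) * 4 * 50, num_classes=3)  # 3 classes is an assumption
optimizer = torch.optim.Adam(list(encoder.parameters()) + list(classifier.parameters()))

def train_step(x_embed: torch.Tensor, y_onehot: torch.Tensor, lam: float = 0.001) -> float:
    """x_embed: (batch, T, 300) word vectors; y_onehot: (batch, num_classes) true distribution."""
    y_hat = classifier(encoder(x_embed))
    penalties = [grnn_penalty(branch.lstms) for branch in encoder.branches]
    loss = total_loss(y_hat, y_onehot, penalties, lam)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```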
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (2)

1. A financial public opinion analysis method based on a multi-scale recurrent neural network, characterized by comprising the following steps:
S1, acquiring financial text data, and preprocessing the financial text data to obtain a text sequence;
S2, sampling the text sequence obtained in step S1 with a sliding window to obtain a subsequence at each time step, inputting the subsequence into a group recurrent neural network GRNN to extract the local feature representation of the text sequence, and then obtaining the salient feature representation of the text sequence through a max pooling operation;
wherein the salient feature representation of the text sequence in step S2 is computed as follows:
S2.1, given an input text sequence $X = \{x_1, x_2, \dots, x_t, \dots, x_T\}$, where $T$ is the length of the text sequence, $x_t \in \mathbb{R}^{d_0}$ is the word input at time step $t$ ($t = 1, 2, \dots, T$), and $d_0$ is the input dimension of each word, a sliding window of size s is used to extract the local feature representation at each position; the words of the s time steps ending at time step $t$ are sampled to form a subsequence $X_t = \{x_{t-s+1}, \dots, x_t\}$, and the subsequence $X_t$ is input into a group recurrent neural network GRNN, wherein the group recurrent neural network GRNN is a recurrent neural network composed of K differently initialized long short-term memory networks, and the hidden state output at the last time step of the subsequence is taken as the local feature representation $h_t$ of the t-th time step of the text sequence;
S2.2, the local feature representation of the t-th time step of the group recurrent neural network GRNN is obtained by concatenating the hidden state representations of the K long short-term memory networks:

$$h_t = [h_t^1; h_t^2; \dots; h_t^K] \in \mathbb{R}^{Kd}$$

wherein $h_t^k \in \mathbb{R}^d$ denotes the hidden state of the t-th time step produced by the k-th long short-term memory network, $k = 1, 2, \dots, K$, $d$ is the dimension of each hidden state, and $h_t^k$ is computed as follows:

$$i_t^k = \sigma(W_i^k x_t + U_i^k h_{t-1}^k + b_i^k)$$
$$f_t^k = \sigma(W_f^k x_t + U_f^k h_{t-1}^k + b_f^k)$$
$$o_t^k = \sigma(W_o^k x_t + U_o^k h_{t-1}^k + b_o^k)$$
$$g_t^k = \tanh(W_g^k x_t + U_g^k h_{t-1}^k + b_g^k)$$
$$c_t^k = f_t^k \odot c_{t-1}^k + i_t^k \odot g_t^k$$
$$h_t^k = o_t^k \odot \tanh(c_t^k)$$

wherein $i_t^k$, $f_t^k$ and $o_t^k$ are respectively the input gate, forget gate and output gate of the k-th long short-term memory network, $g_t^k$ and $c_t^k$ are respectively the candidate information currently added and the memory cell of the k-th network, $\sigma$ and $\tanh$ are nonlinear activation functions, $\odot$ denotes element-wise multiplication, and $W_\ast^k$, $U_\ast^k$ and $b_\ast^k$ are trainable parameters of the k-th long short-term memory network;
the local feature representations $h_t$ of all time steps obtained from the above formulas are concatenated to form a feature matrix $H \in \mathbb{R}^{T \times Kd}$:

$$H = [h_1; h_2; \dots; h_T]$$

S2.3, max pooling is applied to the feature matrix $H$ along the time dimension to obtain the salient feature representation $F \in \mathbb{R}^{Kd}$ of the sequence:

$$F_i = \max(H_{1,i}, H_{2,i}, \dots, H_{T,i})$$

wherein $F_i$ denotes the value of the i-th dimension of the vector $F$, $H_{t,i}$ denotes the value of the i-th dimension of $h_t$, and $\max$ denotes the maximum operation;
S3, extracting different salient feature representations of the text sequence with a plurality of sliding windows of different scales, and finally obtaining the multi-scale feature representation of the text sequence through a concatenation operation;
wherein the multi-scale feature representation of the text sequence in step S3 is computed as follows:
different salient feature representations of the sequence are extracted with M sliding windows of different scales, wherein the scale of the m-th sliding window is $s_m$; the operation of step S2 is repeated, the salient feature obtained with the sliding window of the m-th scale is denoted $F^m$, $m = 1, 2, \dots, M$, and the multi-scale feature representation of the sequence is obtained by a concatenation operation:

$$\hat{F} = [F^1; F^2; \dots; F^M]$$
S4, inputting the multi-scale feature representation obtained in step S3 into a fully connected layer and a softmax layer for classification, wherein the classification process in step S4 is as follows:
the multi-scale feature representation $\hat{F}$ obtained in step S3 is input into a fully connected layer and a softmax layer for classification:

$$\hat{y} = \operatorname{softmax}(\operatorname{ReLU}(W_f \hat{F} + b_f))$$

wherein $W_f$ is an affine transformation matrix composed of trainable parameters, $b_f$ is the bias term, ReLU and softmax are nonlinear activation functions, and $\hat{y}$ is the predicted distribution;
training takes the loss $L$ as its objective, wherein $L$ is expressed as follows:

$$L = -\frac{1}{N}\sum_{n=1}^{N}\sum_{j=1}^{J} y_j^{(n)} \log \hat{y}_j^{(n)} + \lambda \sum_{m=1}^{M} L_p^{s_m}$$

wherein $N$ is the number of samples in the dataset, $J$ is the number of classes in the dataset, $M$ is the number of sliding windows of different scales, $y$ is the true distribution of a sample, $\hat{y}$ is the predicted distribution of the sample, $y_j^{(n)}$ is the j-th dimension of the distribution $y$ of the n-th sample, $\hat{y}_j^{(n)}$ is the j-th dimension of the distribution $\hat{y}$ of the n-th sample, $L_p^{s_m}$ is the penalty loss of scale $s_m$, and $\lambda$ is a hyper-parameter used to weigh the classification loss against the penalty loss;
the penalty loss $L_p^{s_m}$ is expressed as follows:

$$L_p^{s_m} = \sum_{i=1}^{K}\sum_{\substack{j=1 \\ j \neq i}}^{K} \left\| (W_i^{s_m})^{\top} W_j^{s_m} - I \right\|_2$$

wherein $I$ is the identity matrix, $W_i^{s_m}$ and $W_j^{s_m}$ are respectively the trainable parameters of the i-th and j-th long short-term memory networks of the GRNN of scale $s_m$, and $\|\cdot\|_2$ denotes the 2-norm.
2. The method of claim 1, wherein the K long short-term memory networks in the GRNN are computed in parallel to reduce the running time.
CN202011578594.6A 2020-12-28 2020-12-28 Financial public opinion analysis method based on multi-scale recurrent neural network Active CN112732907B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011578594.6A CN112732907B (en) Financial public opinion analysis method based on multi-scale recurrent neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011578594.6A CN112732907B (en) Financial public opinion analysis method based on multi-scale recurrent neural network

Publications (2)

Publication Number Publication Date
CN112732907A CN112732907A (en) 2021-04-30
CN112732907B true CN112732907B (en) 2022-06-10

Family

ID=75606453

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011578594.6A Active CN112732907B (en) Financial public opinion analysis method based on multi-scale recurrent neural network

Country Status (1)

Country Link
CN (1) CN112732907B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108830295A (en) * 2018-05-10 2018-11-16 South China University of Technology Multivariate time series classification method based on multi-time-scale echo state network
CN108717856A (en) * 2018-06-16 2018-10-30 Taizhou University Speech emotion recognition method based on multi-scale deep convolutional recurrent neural network
CN110083700A (en) * 2019-03-19 2019-08-02 Beijing Zhongxingtong Network Technology Co., Ltd. Enterprise public opinion sentiment classification method and system based on convolutional neural network
CN110189800A (en) * 2019-05-06 2019-08-30 Zhejiang University Soft-sensor modeling method for furnace oxygen content based on multi-granularity cascaded recurrent neural network
CN110705692A (en) * 2019-09-25 2020-01-17 Central South University Method for predicting product quality of an industrial nonlinear dynamic process using a long short-term memory network based on spatial and temporal attention

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Qianli Ma et al. Temporal Pyramid Recurrent Neural Network. Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020). 2020. *
Chen Enhuan. Multi-scale Recurrent Neural Networks for Sequence Data Modeling. China Master's Theses Full-text Database, 2020, (1): I140-335. *

Also Published As

Publication number Publication date
CN112732907A (en) 2021-04-30

Similar Documents

Publication Publication Date Title
CN109902293B (en) Text classification method based on local and global mutual attention mechanism
CN108628823B (en) Named entity recognition method combining attention mechanism and multi-task collaborative training
CN111401061A (en) Method for identifying news opinion involved in case based on BERT and BiLSTM-Attention
CN110263325B (en) Chinese word segmentation system
CN110472042B (en) Fine-grained emotion classification method
Zhang et al. Sentiment Classification Based on Piecewise Pooling Convolutional Neural Network.
CN110287323B (en) Target-oriented emotion classification method
CN110929034A (en) Commodity comment fine-grained emotion classification method based on improved LSTM
CN113591483A (en) Document-level event argument extraction method based on sequence labeling
CN109214006B (en) Natural language reasoning method for image enhanced hierarchical semantic representation
CN111382565A (en) Multi-label-based emotion-reason pair extraction method and system
CN111984791B (en) Attention mechanism-based long text classification method
CN112231478B (en) Aspect-level emotion classification method based on BERT and multi-layer attention mechanism
CN115906816A (en) Text emotion analysis method of two-channel Attention model based on Bert
CN114357167B (en) Bi-LSTM-GCN-based multi-label text classification method and system
CN113239694B (en) Argument role identification method based on argument phrase
CN114648029A (en) Electric power field named entity identification method based on BiLSTM-CRF model
CN111723572B (en) Chinese short text correlation measurement method based on CNN convolutional layer and BiLSTM
Liu et al. Research on advertising content recognition based on convolutional neural network and recurrent neural network
CN117216265A (en) Improved graph annotation meaning network news topic classification method
CN112732907B (en) Financial public opinion analysis method based on multi-scale recurrent neural network
Wakchaure et al. A scheme of answer selection in community question answering using machine learning techniques
CN115510230A (en) Mongolian emotion analysis method based on multi-dimensional feature fusion and comparative reinforcement learning mechanism
Zhu et al. Attention based BiLSTM-MCNN for sentiment analysis
CN114357166A (en) Text classification method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant