CN115271816A - Bulk commodity price prediction method and device based on emotion index

Info

Publication number: CN115271816A
Application number: CN202210922285.9A
Authority: CN
Inventors: 任俊玲; 许英姿
Original assignee: Beijing Information Science and Technology University
Current assignee: Beijing Information Science and Technology University
Priority date: 2022-08-02
Filing date: 2022-08-02
Publication date: 2022-11-01
Anticipated expiration: 2042-08-02
Also published as: CN115271816B

Abstract

The invention provides a bulk commodity price prediction method and device based on an emotion index. The method comprises the following steps: constructing a bulk commodity emotion dictionary based on bulk commodity news and a general emotion dictionary; acquiring bulk commodity news and calculating the emotion index of each commodity in each time period based on the bulk commodity emotion dictionary; and predicting the commodity price based on the emotion index of each commodity in each time period. The invention expands an existing general emotion dictionary into a bulk commodity emotion dictionary and uses it to calculate commodity emotion indexes, thereby improving the accuracy of commodity price prediction based on the emotion index.

Description

Bulk commodity price prediction method and device based on emotion index

Technical Field

The invention belongs to the technical field of price prediction, and particularly relates to a bulk commodity price prediction method and device based on an emotion index.

Background

Investor emotion directly influences market prices. The mechanism is as follows: irrational factors such as subjective perception influence investor behavior, which changes the supply of and demand for commodities; supply and demand in turn determine price, so investor emotion ultimately influences the market price. Most data reflecting investor emotion, such as financial news and investor comments, is unstructured text that a computer cannot process directly, so the text must be converted into numerical data. An emotion dictionary can be used for this quantization. The dictionary records the polarity and score of most meaningful words; matching a passage of text against the dictionary yields a score for each word, and aggregating those scores gives the score of the passage.

In existing work on stock markets, news texts or investor comments are scored either with a general emotion dictionary or with a domain emotion dictionary obtained by algorithmically expanding a general one. The expansion proceeds as follows: the words of the general emotion dictionary are taken as reference words; the text is preprocessed and the word segmentation results are taken as candidate words; each candidate word is checked against the reference word set; if it is present, the next candidate word is checked; if not, an algorithm measures the similarity between the candidate word and the reference words, and candidate words similar to the reference words are selected and added to the general emotion dictionary. To calculate the emotion index, the word segmentation result of a text is matched against the emotion dictionary, the emotion value of the text is calculated from the scores of positive words, negative words and degree words, and the values are aggregated into the emotion index. The emotion index is then input to a prediction model as one of the prediction features to predict the stock price. However, existing emotion dictionaries are not adapted to the bulk commodity field, so emotion indexes calculated for bulk commodity varieties are unreliable and do not correctly reflect the emotion of the text, which impairs the subsequent use of the emotion index in price prediction. For example, the word "diving" usually refers to a sport and is a neutral word, but in the bulk commodity field, when paired with "price", it refers to a sharp price drop and is a negative emotion word. The general emotion dictionary therefore adapts poorly to the bulk commodity field and cannot deliver the accuracy required for emotion index calculation in this field.

Disclosure of Invention

In order to solve the problems in the prior art, the invention provides a bulk commodity price prediction method and device based on an emotion index.

In order to achieve the above object, the present invention adopts the following technical solutions.

In a first aspect, the invention provides a bulk commodity price prediction method based on an emotion index, which comprises the following steps:

constructing a bulk commodity emotion dictionary based on bulk commodity news and a universal emotion dictionary;

acquiring bulk commodity news, and calculating the emotion index of each commodity in each time period based on the bulk commodity emotion dictionary;

and predicting the commodity price based on the emotion indexes of each commodity in each time period.

Further, the method for constructing the sentiment dictionary of the bulk commodity comprises the following steps:

acquiring a universal emotion dictionary comprising a positive reference word set and a negative reference word set;

acquiring news from a bulk commodity news corpus, and preprocessing the news, including word segmentation, to obtain candidate words for constructing the bulk commodity emotion dictionary;

combining each candidate word with each other candidate word within a specified range around its position in the sentence to obtain candidate combined words;

combining each candidate word with each positive reference word and each negative reference word in the general emotion dictionary to obtain positive combined words and negative combined words;

judging whether each candidate combined word exists among the positive combined words or the negative combined words, and if so, calculating the emotional tendency coefficient K of each of the two candidate words in the candidate combined word using the semantic orientation pointwise mutual information (SO-PMI) algorithm; when K > 0, the candidate word is a positive emotion word; when K = 0, the candidate word is a neutral emotion word; when K < 0, the candidate word is a negative emotion word;

merging the candidate words satisfying K > first threshold > 0 into the positive reference word set of the general emotion dictionary, and the candidate words satisfying K < second threshold < 0 into the negative reference word set of the general emotion dictionary;

and screening the expanded general emotion dictionary, including de-duplication, to obtain the bulk commodity emotion dictionary.

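Further, the emotional tendency coefficient K of the candidate word c is calculated by the following formulas:

$$K(c) = \frac{1}{n}\sum_{i=1}^{n}\mathrm{PMI}(c, p_i) - \frac{1}{m}\sum_{j=1}^{m}\mathrm{PMI}(c, q_j)$$

$$\mathrm{PMI}(c, p_i) = \log_2\frac{\mathrm{count}(c, p_i)\cdot N}{\mathrm{count}(c)\cdot\mathrm{count}(p_i)}$$

$$\mathrm{PMI}(c, q_j) = \log_2\frac{\mathrm{count}(c, q_j)\cdot N}{\mathrm{count}(c)\cdot\mathrm{count}(q_j)}$$

in the formulas, $p_i$ is the ith positive reference word in the positive reference word set, i = 1, 2, …, n, and n is the number of positive reference words; $q_j$ is the jth negative reference word in the negative reference word set, j = 1, 2, …, m, and m is the number of negative reference words; $\mathrm{count}(c)$, $\mathrm{count}(p_i)$ and $\mathrm{count}(q_j)$ are respectively the numbers of occurrences of the candidate word c, $p_i$ and $q_j$ in the corpus; $\mathrm{count}(c, p_i)$ is the number of simultaneous occurrences of the candidate word c and $p_i$ in the corpus; $\mathrm{count}(c, q_j)$ is the number of simultaneous occurrences of the candidate word c and $q_j$ in the corpus; and N is the total word frequency.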

Further, the method for calculating the emotion index of the commodity in a period of time comprises the following steps:

acquiring all news of the commodity in the time period from a bulk commodity news corpus;

dividing each news into sentences;

segmenting each sentence, and calculating the emotion index of each sentence by calculating the emotion index of each word and considering the influence of negative words and degree words;

summing the emotion indexes of each sentence forming each news to obtain the emotion index of each news; and averaging the emotion indexes of the news to obtain the emotion index of the commodity in the time period.

Further, the method of calculating the emotion index of a sentence includes:

S1, setting an emotion index variable word_polar of a word, with values 1, -1 and 0; setting a negation word influence variable dense_sign, with value 1 or -1; setting a degree word influence variable degree_sign, with value range [1, C]; initializing dense_sign = 1, degree_sign = 1, i = 1;

S2, acquiring the ith word w_i in the sentence; if w_i is a positive word in the bulk commodity emotion dictionary, then word_polar_i = 1 × dense_sign, go to S4; if w_i is a negative word in the bulk commodity emotion dictionary, then word_polar_i = (-1) × dense_sign, go to S4; if w_i is not in the bulk commodity emotion dictionary, then word_polar_i = 0;

S3, if w_i is a negation word, then dense_sign = -1; if w_i is a degree word, then obtaining the degree value degree_sign_i of w_i and setting degree_sign = degree_sign_i;

S4, if i is smaller than M, updating i to i + 1 and then going to S2, otherwise going to S5, wherein M is the number of words in the sentence;

S5, calculating the emotion index of the sentence according to the following formula:

wherein Q is the emotion index of the sentence.

In a second aspect, the present invention provides a bulk commodity price prediction device based on an emotional index, including:

the dictionary construction module is used for constructing a bulk commodity emotion dictionary based on bulk commodity news and a general emotion dictionary;

the emotion index calculation module is used for acquiring bulk commodity news and calculating the emotion index of each commodity in each time period based on the bulk commodity emotion dictionary;

and the price prediction module is used for predicting the commodity price based on the emotion index of each commodity in each time period.

Further, the dictionary construction module is specifically configured to:

acquiring news from a bulk commodity news corpus, and preprocessing the news, including word segmentation, to obtain candidate words for constructing the bulk commodity emotion dictionary;

merging the candidate words satisfying K > first threshold > 0 into the positive reference word set of the general emotion dictionary, and the candidate words satisfying K < second threshold < 0 into the negative reference word set of the general emotion dictionary;

and screening the expanded general emotion dictionary, including de-duplication, to obtain the bulk commodity emotion dictionary.

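Further, the emotional tendency coefficient K of the candidate word c is calculated as:

$$K(c) = \frac{1}{n}\sum_{i=1}^{n}\mathrm{PMI}(c, p_i) - \frac{1}{m}\sum_{j=1}^{m}\mathrm{PMI}(c, q_j)$$

$$\mathrm{PMI}(c, p_i) = \log_2\frac{\mathrm{count}(c, p_i)\cdot N}{\mathrm{count}(c)\cdot\mathrm{count}(p_i)}$$

$$\mathrm{PMI}(c, q_j) = \log_2\frac{\mathrm{count}(c, q_j)\cdot N}{\mathrm{count}(c)\cdot\mathrm{count}(q_j)}$$

in the formulas, $p_i$ is the ith positive reference word in the positive reference word set, i = 1, 2, …, n, and n is the number of positive reference words; $q_j$ is the jth negative reference word in the negative reference word set, j = 1, 2, …, m, and m is the number of negative reference words; $\mathrm{count}(c)$, $\mathrm{count}(p_i)$ and $\mathrm{count}(q_j)$ are respectively the numbers of occurrences of the candidate word c, $p_i$ and $q_j$ in the corpus; $\mathrm{count}(c, p_i)$ is the number of simultaneous occurrences of the candidate word c and $p_i$ in the corpus; $\mathrm{count}(c, q_j)$ is the number of simultaneous occurrences of the candidate word c and $q_j$ in the corpus; and N is the total word frequency.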

dividing each news into sentences;

Further, the method of calculating the emotion index of a sentence includes:

S1, setting an emotion index variable word_polar of a word, with values 1, -1 and 0; setting a negation word influence variable dense_sign, with value 1 or -1; setting a degree word influence variable degree_sign, with value range [1, C]; initializing dense_sign = 1, degree_sign = 1, i = 1;

S2, acquiring the ith word w_i in the sentence; if w_i is a positive word in the bulk commodity emotion dictionary, then word_polar_i = 1 × dense_sign, go to S4; if w_i is a negative word in the bulk commodity emotion dictionary, then word_polar_i = (-1) × dense_sign, go to S4; if w_i is not in the bulk commodity emotion dictionary, then word_polar_i = 0;

wherein Q is the sentiment index of the sentence.

Compared with the prior art, the invention has the following beneficial effects.

According to the method, the bulk commodity sentiment dictionary is constructed based on the bulk commodity news and the general sentiment dictionary, the bulk commodity news is obtained, the sentiment index of each commodity in each time period is calculated based on the bulk commodity sentiment dictionary, the commodity price is predicted based on the sentiment index of each commodity in each time period, and the commodity price prediction based on the sentiment index is realized. The invention expands the existing general emotion dictionary into a bulk commodity emotion dictionary, and uses the bulk commodity emotion dictionary for calculating the emotion index of the commodity, thereby improving the accuracy of predicting the commodity price based on the emotion index.

Drawings

Fig. 1 is a flowchart of a bulk commodity price prediction method based on an emotional index according to an embodiment of the present invention.

FIG. 2 is a flow chart of another embodiment of the present invention.

FIG. 3 is a flow chart of the construction of the bulk commodity emotion dictionary.

Fig. 4 is a flowchart of single news emotion value calculation.

Fig. 5 is a block diagram of a device for predicting prices of bulk goods based on emotional index according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer and more obvious, the present invention is further described below with reference to the accompanying drawings and the detailed description. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Fig. 1 is a flowchart of a bulk commodity price prediction method based on an emotion index according to an embodiment of the present invention, where the method includes the following steps:

step 101, constructing a bulk commodity emotion dictionary based on bulk commodity news and a general emotion dictionary;

step 102, acquiring bulk commodity news, and calculating the emotion index of each commodity in each time period based on the bulk commodity emotion dictionary;

step 103, predicting the commodity price based on the emotion index of each commodity in each time period.

In this embodiment, step 101 constructs the bulk commodity emotion dictionary. Practice shows that market emotion influences factors such as bulk commodity prices. The emotion expressed in text is an important manifestation of market emotion, and quantizing the emotion of bulk commodity text into an emotion index that a computer can process requires an emotion dictionary. Existing general dictionaries have difficulty quantizing the emotion of bulk commodity text accurately, so the calculated emotion is prone to deviation. For example, the word "diving" usually refers to a sport and is a neutral word, but in the bulk commodity field, when paired with "price", it refers to a sharp price drop and is a negative emotion word. The general emotion dictionary therefore adapts poorly to the bulk commodity field and is insufficient to support emotion index calculation in this field. For this reason, this embodiment constructs a bulk commodity emotion dictionary, based on bulk commodity news and a general emotion dictionary, for calculating emotion indexes in the bulk commodity field. Candidate words are extracted from bulk commodity news, screened, and added to the general emotion dictionary, which is thereby expanded into the bulk commodity emotion dictionary. A specific construction method is given in a later embodiment.

In this embodiment, step 102 calculates the emotion index of the commodity. The emotion index of a commodity is obtained by aggregating the emotion indexes of the sentences and words in the news texts about that commodity. Emotion words generally include positive, negative and neutral emotion words. The emotion index of a word depends mainly on its emotional attribute (which class of emotion word it belongs to); for example, the emotion index of a positive emotion word can be set to a positive number, that of a negative emotion word to a negative number, and that of a neutral emotion word to 0. To calculate the emotion index of a word more accurately, the influence of negation words and degree words preceding the word is also considered. The emotional attribute of a word is obtained by querying an emotion dictionary. Once the bulk commodity emotion dictionary has been built, using it in place of the general emotion dictionary allows the commodity emotion index to be calculated more effectively. To predict commodity prices effectively, the emotion index of each commodity in different time periods is needed. The time period is chosen according to industry experience and is typically 1 day, that is, the emotion index of each commodity is calculated daily. A technical solution for calculating the emotion index of a commodity within a time period is given in a later embodiment.

In this embodiment, step 103 predicts the commodity price based on the emotion index. Many prediction methods can be used. For example, an artificial neural network prediction model can be constructed whose output is the commodity price and whose inputs are the emotion index of the commodity together with other quantities that have a significant influence on the commodity price, such as the commodity's open interest, trading volume and turnover. A training set and a test set are constructed after data preprocessing steps such as missing value handling and standardization; the model is trained on the training set and tested and optimized on the test set. Inputting the emotion index of the commodity and the other features into the trained model then yields the predicted commodity price. For a concrete prediction method, refer to the paper "Coal price prediction research based on BP neural network" by Yinjian Heng, published in "Science and Technology and Innovation", 2021, issue 02.
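The preprocessing mentioned above (missing value handling, standardization, and building a training set and a test set) could be sketched as follows. This is only an illustration under stated assumptions: the embodiment does not specify how missing values are filled, which standardization is used, or how the data is split, so forward filling, z-score standardization and a chronological 80/20 split are choices made here, and all function names are hypothetical.

```python
from statistics import mean, pstdev


def fill_missing(values):
    """Forward-fill None entries (assumed strategy); leading gaps take the first observed value."""
    filled, last = [], None
    for v in values:
        if v is None:
            v = last
        filled.append(v)
        last = v
    first = next((v for v in filled if v is not None), None)
    return [first if v is None else v for v in filled]


def standardize(values):
    """Z-score standardization (assumed choice); a constant series maps to zeros."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def chronological_split(rows, train_ratio=0.8):
    """Keep time order: earlier rows form the training set, later rows the test set."""
    cut = int(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]


# Hypothetical daily emotion indexes with two missing days.
emotion = fill_missing([0.2, None, -0.4, 0.6, None])
features = standardize(emotion)
train, test = chronological_split(features, train_ratio=0.8)
```

In a full pipeline the same steps would be applied to the other input features (open interest, trading volume, turnover) before the model is trained.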

According to the commodity price forecasting method and device, the existing general emotion dictionary is expanded into the bulk commodity emotion dictionary, and the bulk commodity emotion dictionary is used for calculating the commodity emotion index, so that the commodity price forecasting precision based on the emotion index is improved.

Fig. 2 is the flow chart of another embodiment, showing the price prediction flow for two commodities, A and B.

As an optional embodiment, the method for constructing the bulk commodity emotion dictionary based on the universal emotion dictionary comprises the following steps:

acquiring a general emotion dictionary comprising a positive reference word set and a negative reference word set;

judging whether each candidate combined word exists among the positive combined words or the negative combined words, and if so, calculating the emotional tendency coefficient K of each of the two candidate words in the candidate combined word using the semantic orientation pointwise mutual information (SO-PMI) algorithm; when K > 0, the candidate word is a positive emotion word; when K = 0, the candidate word is a neutral emotion word; when K < 0, the candidate word is a negative emotion word;

This embodiment provides a technical solution for constructing the bulk commodity emotion dictionary, as shown in fig. 3. Because the bulk commodity emotion dictionary is built on a general emotion dictionary, the existing financial-domain general emotion dictionary is introduced first. The bulk commodity trading market is mainly divided into the spot market, the electronic trading market and the futures market. Futures, which include commodity futures and financial futures, are comparable to stocks in the financial field. Observation of bulk commodity news texts and financial news texts shows that, apart from vocabulary specific to bulk commodities (such as "warehouse-through", "arbitrage" and "hedging"), most bulk commodity vocabulary also appears in the financial field. This embodiment therefore expands an existing financial-domain emotion dictionary to obtain the bulk commodity emotion dictionary. Yao Jiaquan et al. used a dictionary recombination method and a long short-term memory model, combining 4 general emotion dictionaries, to construct a Chinese emotion dictionary suitable for annual reports (formal texts) and social media (informal texts) in the financial field. That dictionary divides Chinese words into four categories according to usage scenario and emotional tendency: annual report positive words, annual report negative words, social media positive words and social media negative words. The division and the number of words in each category are shown in table 1.

Table 1. Categories of the financial-domain emotion dictionary

Since the wording of the bulk commodity news texts involved in this embodiment lies between formal and informal language, this embodiment, building on the work of Yao et al., merges the positive and negative words of the two contexts (annual reports and social media) and removes duplicates to obtain the positive and negative reference word sets. The positive reference word set contains 3856 positive reference emotion words and the negative reference word set contains 2076 negative reference emotion words, 5932 reference emotion words in total. Statistics show that 3000 reference emotion words of the financial-domain Chinese emotion dictionary appear in the word segmentation results of the bulk commodity corpus, about 50.57% of all reference emotion words. This indicates that the dictionary fits bulk commodity news text to a reasonable degree, so the bulk commodity emotion dictionary can be built on it.

After the general emotion dictionary is determined, news is acquired from the bulk commodity news corpus, and candidate words for constructing the bulk commodity emotion dictionary are obtained by preprocessing the news texts. Preprocessing in this embodiment mainly consists of segmenting the text with the jieba library for Python and removing punctuation marks and stop words without obvious meaning, such as "some" and "soon". This embodiment adds the general emotion dictionary to the jieba initial dictionary, which prevents words in the emotion dictionary from being split incorrectly. For example, the financial-domain emotion dictionary contains the negative phrase "debt rising". The jieba initial dictionary contains the two words "debt" and "rising" but not the phrase "debt rising", so without the financial-domain emotion dictionary the phrase would be split into the negative noun "debt" and the positive verb "rising"; their emotion values sum to zero, and the phrase would be judged a neutral phrase that contributes nothing to the emotion value. The phrase actually means that liabilities are increasing and is negative, so that judgment misrepresents its meaning. Because "debt rising" exists in the financial-domain emotion dictionary, once that dictionary is added to the jieba dictionary the phrase is no longer split, and the emotion value is calculated correctly.
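The effect of adding dictionary phrases to the segmenter's vocabulary can be illustrated with a toy example. The sketch below does not use jieba; it substitutes a simple greedy forward-maximum-matching segmenter so that it runs without third-party packages. The unspaced string "debtrising" stands in for unsegmented Chinese text, and the vocabulary and polarity values are hypothetical.

```python
def segment(text, vocab):
    """Greedy forward maximum matching: at each position take the longest vocabulary entry."""
    max_len = max(len(word) for word in vocab)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when nothing in the vocabulary matches.
            if piece in vocab or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens


def emotion_value(tokens, polarity):
    """Sum the polarity of every token; unknown tokens contribute 0."""
    return sum(polarity.get(token, 0) for token in tokens)


# Hypothetical entries: "debt" negative, "rising" positive, the phrase negative.
polarity = {"debt": -1, "rising": 1, "debtrising": -1}
base_vocab = {"debt", "rising"}  # stands in for the segmenter's initial dictionary

split_tokens = segment("debtrising", base_vocab)
whole_tokens = segment("debtrising", base_vocab | {"debtrising"})

print(split_tokens, emotion_value(split_tokens, polarity))  # phrase split: values cancel
print(whole_tokens, emotion_value(whole_tokens, polarity))  # phrase kept whole: negative
```

With jieba itself, the corresponding step would be to register the emotion dictionary entries before segmenting, for example through `jieba.load_userdict` or `jieba.add_word`.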

After candidate words are extracted from the news text, each candidate word is combined with each other candidate word within a specified range around its position in the sentence to obtain candidate combined words. For example, if candidate words B and C occur in the same sentence as candidate word A, the candidate combined words AB and AC are obtained. Each candidate word is also combined with each positive reference word and each negative reference word in the general emotion dictionary to obtain positive combined words and negative combined words.
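The combination step can be sketched as below. The representation is an assumption made for illustration: a combined word is modeled as an ordered pair of words, the "specified range" is modeled as a window of a given number of positions on each side, and the function names are hypothetical.

```python
def candidate_combined_words(words, window):
    """Pair each word with every other word at most `window` positions away in the sentence."""
    pairs = set()
    for i, word in enumerate(words):
        start, stop = max(0, i - window), min(len(words), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.add((word, words[j]))
    return pairs


def reference_combined_words(candidates, reference_words):
    """Pair every candidate word with every reference word of one polarity."""
    return {(c, r) for c in candidates for r in reference_words}


sentence = ["A", "B", "C"]                      # candidate words of one sentence
pairs = candidate_combined_words(sentence, window=2)
positive_combined = reference_combined_words(sentence, {"B"})  # "B" as a positive reference word

print(("A", "B") in pairs, ("A", "C") in pairs)  # both combinations from the example exist
```

A smaller window narrows the range: with `window=1`, A is no longer combined with C.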

The following processing is performed for each candidate combined word: judging whether the candidate combined word exists among the positive combined words or the negative combined words; if so, the emotional tendency coefficient K of each of the two candidate words in the candidate combined word is calculated; if not, the candidate combined word is discarded. Each candidate word is then classified as a positive, neutral or negative emotion word according to the value of K. When K > 0, the candidate word is a positive emotion word, and the larger K is, the stronger the positive emotion; when K = 0, the candidate word is a neutral emotion word; when K < 0, the candidate word is a negative emotion word, and the larger the absolute value of K is, the stronger the negative emotion. This embodiment calculates the emotional tendency coefficient of a candidate word with the semantic orientation pointwise mutual information (SO-PMI) algorithm. SO-PMI measures the mutual information between a candidate word and the positive and negative reference words, where the magnitude of the mutual information represents how strongly two words are correlated; it then takes the difference between the word's mutual information with the positive reference words and with the negative reference words, and judges the emotional tendency of the candidate word from the sign of that difference. A specific embodiment for calculating the emotional tendency coefficient with SO-PMI is given later.

After the emotional tendency coefficient of each candidate word is obtained, the candidate words are screened by it: positive emotion candidate words with K greater than the first threshold are merged into the positive reference word set of the general emotion dictionary, and negative emotion candidate words with K less than the second threshold are merged into the negative reference word set of the general emotion dictionary. The first threshold is greater than 0 and the second threshold is less than 0.
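The threshold screening can be sketched as follows. The threshold values, the example words and their K values are hypothetical; the embodiment only requires that the first threshold be positive and the second negative. Python sets are used here so that words already in the dictionary are not duplicated.

```python
def expand_dictionary(k_values, positive_refs, negative_refs,
                      first_threshold, second_threshold):
    """Merge candidates with K > first_threshold (> 0) or K < second_threshold (< 0)."""
    if not (first_threshold > 0 > second_threshold):
        raise ValueError("first threshold must be > 0 and second threshold must be < 0")
    new_positive = set(positive_refs) | {w for w, k in k_values.items() if k > first_threshold}
    new_negative = set(negative_refs) | {w for w, k in k_values.items() if k < second_threshold}
    return new_positive, new_negative


# Hypothetical coefficients for three candidate words.
k_values = {"surge": 1.8, "dive": -2.1, "stable": 0.1}
positive, negative = expand_dictionary(
    k_values, positive_refs={"profit"}, negative_refs={"loss"},
    first_threshold=0.5, second_threshold=-0.5,
)
print(sorted(positive), sorted(negative))
```

Candidates whose K lies between the two thresholds, such as "stable" here, are left out of both sets.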

Finally, the expanded general emotion dictionary is processed by steps such as de-duplication and manual screening to obtain the bulk commodity emotion dictionary.

In practical application, the numbers of positive and negative emotion words added to the general emotion dictionary may differ considerably. For example, fewer positive emotion words may be added; a possible reason is that the positive emotion word set of the original financial-domain emotion dictionary is already fairly complete, so most of the positive emotion words merged into it duplicate words already in the original dictionary and are removed during de-duplication.

As an alternative embodiment, the emotional tendency coefficient K of the candidate word c is calculated by the following formulas:

$$K(c) = \frac{1}{n}\sum_{i=1}^{n}\mathrm{PMI}(c, p_i) - \frac{1}{m}\sum_{j=1}^{m}\mathrm{PMI}(c, q_j)$$

$$\mathrm{PMI}(c, p_i) = \log_2\frac{\mathrm{count}(c, p_i)\cdot N}{\mathrm{count}(c)\cdot\mathrm{count}(p_i)}$$

$$\mathrm{PMI}(c, q_j) = \log_2\frac{\mathrm{count}(c, q_j)\cdot N}{\mathrm{count}(c)\cdot\mathrm{count}(q_j)}$$

in the formulas, $p_i$ is the ith positive reference word in the positive reference word set, i = 1, 2, …, n, and n is the number of positive reference words; $q_j$ is the jth negative reference word in the negative reference word set, j = 1, 2, …, m, and m is the number of negative reference words; $\mathrm{count}(c)$, $\mathrm{count}(p_i)$ and $\mathrm{count}(q_j)$ are respectively the numbers of occurrences of the candidate word c, $p_i$ and $q_j$ in the corpus; $\mathrm{count}(c, p_i)$ is the number of simultaneous occurrences of the candidate word c and $p_i$ in the corpus; $\mathrm{count}(c, q_j)$ is the number of simultaneous occurrences of the candidate word c and $q_j$ in the corpus; and N is the total word frequency.

This embodiment provides a method for calculating the emotional tendency coefficient of a candidate word. The probability that words occur together (the co-occurrence rate) indicates the correlation between words: the higher the co-occurrence rate, the higher the correlation. In the first formula above, $\mathrm{PMI}(c, p_i)$ is the co-occurrence rate of the ith positive reference word $p_i$ with the candidate word c, and $\mathrm{PMI}(c, q_j)$ is the co-occurrence rate of the jth negative reference word $q_j$ with the candidate word c. They are calculated by the second and third formulas. A positive value indicates that the candidate word is correlated with the reference word, a negative value indicates that the two are mutually exclusive (the probability of their occurring together is very small), and a value of 0 indicates that the two are neither correlated nor mutually exclusive. The emotional tendency coefficient K of the candidate word c equals the mean co-occurrence rate of c with the positive reference words minus the mean co-occurrence rate of c with the negative reference words.
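A minimal sketch of this calculation is given below, under several assumptions not fixed by the text: two words are counted as occurring simultaneously when they appear in the same segmented sentence, the logarithm is taken to base 2, and a pair that never co-occurs contributes 0 rather than negative infinity. The corpus and word lists are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def emotional_tendency(candidate, positive_refs, negative_refs, sentences):
    """K = mean PMI(candidate, positive refs) - mean PMI(candidate, negative refs)."""
    word_count = Counter(w for sentence in sentences for w in sentence)
    total = sum(word_count.values())  # N, the total word frequency

    # Assumption: co-occurrence means appearing in the same sentence.
    pair_count = Counter()
    for sentence in sentences:
        for a, b in combinations(sorted(set(sentence)), 2):
            pair_count[(a, b)] += 1

    def pmi(a, b):
        joint = pair_count[tuple(sorted((a, b)))]
        if joint == 0 or word_count[a] == 0 or word_count[b] == 0:
            return 0.0  # assumption: unseen pairs are treated as uncorrelated
        return math.log2(joint * total / (word_count[a] * word_count[b]))

    positive_mean = sum(pmi(candidate, p) for p in positive_refs) / len(positive_refs)
    negative_mean = sum(pmi(candidate, q) for q in negative_refs) / len(negative_refs)
    return positive_mean - negative_mean


# Hypothetical segmented corpus.
corpus = [
    ["price", "surge", "profit"],
    ["price", "dive", "loss"],
    ["surge", "profit"],
    ["dive", "loss"],
]
k_surge = emotional_tendency("surge", ["profit"], ["loss"], corpus)
k_dive = emotional_tendency("dive", ["profit"], ["loss"], corpus)
print(k_surge > 0, k_dive < 0)
```

In this toy corpus "surge" only co-occurs with the positive reference word, so its K is positive, while "dive" only co-occurs with the negative reference word, so its K is negative.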

As an alternative embodiment, a method of calculating an emotional index of a commodity over a period of time includes:

dividing each news into sentences;

This embodiment provides a method for calculating the emotion index of a commodity in a time period. All news about the commodity in the time period is acquired from the bulk commodity news corpus; each news item is divided into sentences; each sentence is segmented into words; the emotion index of each sentence is calculated from the emotion indexes of its words; the emotion indexes of all sentences in a news item are summed to obtain the emotion index of that news item; and the emotion indexes of the news items are averaged to obtain the emotion index of the commodity in the time period. When calculating the emotion index of each sentence, the emotion words are the main contributors, and the influence of negation words and degree words among the non-emotion words is also considered, which improves the accuracy of the emotion index.
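The aggregation from sentences to a time period can be sketched as follows. The input is assumed to be already-computed sentence emotion indexes grouped by news item; the function names and numbers are hypothetical, and returning 0.0 for a period with no news is an assumption, since the text does not cover that case.

```python
def news_emotion_index(sentence_indexes):
    """Emotion index of one news item: the sum of its sentence emotion indexes."""
    return sum(sentence_indexes)


def period_emotion_index(news_items):
    """Emotion index of a commodity in a period: the average over its news items."""
    if not news_items:
        return 0.0  # assumption: a period without news is treated as neutral
    return sum(news_emotion_index(item) for item in news_items) / len(news_items)


# Two hypothetical news items for one commodity on one day.
day_news = [
    [1.0, -1.0, 2.0],  # sentence indexes of the first news item, sum 2.0
    [-3.0],            # sentence indexes of the second news item, sum -3.0
]
print(period_emotion_index(day_news))  # (2.0 + -3.0) / 2 = -0.5
```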

As an alternative embodiment, the method of calculating the sentiment index of a sentence comprises:

S2, acquiring the ith word w_i in the sentence; if w_i is a positive word in the bulk commodity emotion dictionary, then word_polar_i = 1 × dense_sign, go to S4; if w_i is a negative word in the bulk commodity emotion dictionary, then word_polar_i = (-1) × dense_sign, go to S4; if w_i is not in the bulk commodity emotion dictionary, then word_polar_i = 0;

S3, if w_i is a negation word, then dense_sign = -1; if w_i is a degree word, then obtaining the degree value degree_sign_i of w_i and setting degree_sign = degree_sign_i;

wherein Q is the sentiment index of the sentence.

This embodiment provides a technical solution for calculating the emotion index of a single sentence, as shown in fig. 4. In this embodiment, the emotion index of a single word is set to 1, -1 or 0: the emotion indexes of positive emotion words and negative emotion words are 1 and -1 respectively, and the emotion index of a word that is neither a positive nor a negative emotion word is 0. A negation word influence variable deny_sign and a degree word influence variable degree_sign are also set. The value of deny_sign is 1 or -1; if a negation word such as "not" or "no" appears in the sentence, deny_sign = -1. The upper bound C of the value range [1, C] of degree_sign can be determined empirically and is generally an integer greater than 1. A method for obtaining degree_sign is as follows: the degree values of degree words are looked up from the network; they range from 0 to 3 and are kept to one decimal place, for example, the degree value of the word "mild" is 0.3 and the degree value of the word "hundred percent" is 3; degree_sign is then set to the degree value found from the network plus 1. The method for calculating the emotion index of the sentence has been given in detail above and is not repeated here.
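A minimal Python sketch of the sentence-level calculation is given below for illustration. It rests on several assumptions that this passage does not state: Q is taken to be the sum of word_polar_i over the sentence, deny_sign and degree_sign keep their value until the end of the sentence once set, and degree_sign is read as the looked-up degree value plus 1. The function name, parameter names and toy word lists are hypothetical.

```python
def sentence_emotion_index(words, positive, negative, negation_words, degree_values):
    """Return Q for one segmented sentence.

    Assumptions: Q is the sum of word_polar_i, and the modifiers are
    not reset after an emotion word has been scored.
    """
    deny_sign = 1        # negation word influence variable, 1 or -1
    degree_sign = 1.0    # degree word influence variable, in [1, C]
    q = 0.0
    for w in words:
        if w in positive:
            q += 1 * deny_sign * degree_sign
        elif w in negative:
            q += -1 * deny_sign * degree_sign
        elif w in negation_words:
            deny_sign = -1
        elif w in degree_values:
            # Assumed reading: degree_sign = looked-up degree value + 1.
            degree_sign = degree_values[w] + 1
        # Any other word has word_polar_i = 0 and adds nothing.
    return q


# Toy dictionaries (hypothetical entries).
positive = {"rise"}
negative = {"fall"}
negation_words = {"not"}
degree_values = {"mild": 0.3}

print(sentence_emotion_index(["prices", "not", "rise"],
                             positive, negative, negation_words, degree_values))
```

The dictionary lookup is tested before the negation and degree checks, mirroring the order of steps S2 and S3.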

Fig. 5 is a schematic diagram illustrating a price prediction apparatus for a bulk commodity based on an emotion index, the apparatus including:

the dictionary construction module 11 is used for constructing a bulk commodity emotion dictionary based on bulk commodity news and a general emotion dictionary;

the emotion index calculation module 12 is used for acquiring bulk commodity news and calculating an emotion index of each commodity in each time period based on the bulk commodity emotion dictionary;

and the price prediction module 13 is used for predicting the price of the commodity based on the emotion index of each commodity in each time period.

The apparatus of this embodiment may be used to implement the technical solution of the method embodiment shown in fig. 1, and the implementation principle and the technical effect are similar, which are not described herein again. The same applies to the following embodiments, which are not further described.
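Purely as an illustration of how the three modules of fig. 5 relate to one another, the apparatus can be sketched in Python as a pipeline of three interchangeable callables. The class name, the field names, the type signatures, and the toy callables (including the placeholder price model, which this passage does not specify) are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PricePredictionDevice:
    # Module 11: bulk commodity news corpus + general dictionary -> dictionary.
    build_dictionary: Callable[[List[str], Dict[str, int]], Dict[str, int]]
    # Module 12: news + dictionary -> emotion index per time period.
    compute_indexes: Callable[[List[str], Dict[str, int]], List[float]]
    # Module 13: emotion indexes -> predicted price.
    predict_price: Callable[[List[float]], float]

    def run(self, corpus, general_dictionary, news):
        dictionary = self.build_dictionary(corpus, general_dictionary)
        indexes = self.compute_indexes(news, dictionary)
        return self.predict_price(indexes)


# Toy callables (hypothetical placeholders, not the patented algorithms).
device = PricePredictionDevice(
    build_dictionary=lambda corpus, general: {**general, "rally": 1},
    compute_indexes=lambda news, d: [
        sum(d.get(w, 0) for w in item.split()) for item in news
    ],
    predict_price=lambda idx: 100.0 + sum(idx),
)
result = device.run(["copper rally"], {"slump": -1},
                    ["copper rally", "oil slump slump"])
print(result)
```

Each module only consumes the output of the previous one, so any of the three can be replaced without touching the others.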

As an optional embodiment, the dictionary building module 11 is specifically configured to:

judging whether each candidate combination word exists among the positive combination words or the negative combination words, and if so, calculating the emotional tendency coefficient K of each of the two candidate words in each candidate combination word by using the emotional tendency pointwise mutual information algorithm; when K > 0, the candidate word is a positive emotion word; when K = 0, the candidate word is a neutral emotion word; when K < 0, the candidate word is a negative emotion word;

where n is the number of positive reference words; the occurrence counts are the numbers of times the candidate word c, a positive reference word, and a negative reference word respectively appear in the corpus; and the co-occurrence counts are the numbers of times the candidate word c appears in the corpus together with a positive reference word and together with a negative reference word, respectively.

dividing each news item into sentences;

As an alternative embodiment, the method of calculating the emotion index of a sentence includes:

S1, setting an emotion index variable word_polar of a word, taking the values 1, -1 and 0; setting a negation word influence variable deny_sign, taking the value 1 or -1; setting a degree word influence variable degree_sign, with the value range [1, C]; initializing deny_sign = 1, degree_sign = 1, i = 1;

S2, obtaining the i-th word w_i in the sentence; if w_i is a positive word in the bulk commodity emotion dictionary, then word_polar_i = 1 * deny_sign * degree_sign, go to S4; if w_i is a negative word in the bulk commodity emotion dictionary, then word_polar_i = (-1) * deny_sign * degree_sign, go to S4; if w_i is not in the bulk commodity emotion dictionary, then word_polar_i = 0;

where Q is the emotion index of the sentence.

The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are also within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A bulk commodity price prediction method based on an emotion index, characterized by comprising the following steps:

acquiring bulk commodity news, and calculating an emotion index of each commodity in each time period based on a bulk commodity emotion dictionary;

and predicting the commodity price based on the emotion index of each commodity in each time period.

2. The bulk commodity price prediction method based on an emotion index according to claim 1, wherein the method of constructing the bulk commodity emotion dictionary comprises:

3. The bulk commodity price prediction method based on an emotion index according to claim 2, wherein the emotional tendency coefficient K of a candidate word c is calculated by the following formula:

where the occurrence counts are the numbers of times the candidate word c, a positive reference word, and a negative reference word respectively appear in the corpus; and the co-occurrence counts are the numbers of times the candidate word c appears in the corpus together with a positive reference word and together with a negative reference word, respectively.

4. The bulk commodity price prediction method based on an emotion index according to claim 1, wherein the method of calculating the emotion index of a commodity in a time period comprises:

dividing each news item into sentences;

summing the emotion indexes of the sentences forming each news item to obtain the emotion index of that news item; and averaging the emotion indexes of the news items to obtain the emotion index of the commodity in the time period.

5. The bulk commodity price prediction method based on an emotion index according to claim 4, wherein the method of calculating the emotion index of a sentence comprises:

S2, acquiring the i-th word w_i in the sentence; if w_i is a positive word in the bulk commodity emotion dictionary, then word_polar_i = 1 * deny_sign * degree_sign, go to S4; if w_i is a negative word in the bulk commodity emotion dictionary, then word_polar_i = (-1) * deny_sign * degree_sign, go to S4; if w_i is not in the bulk commodity emotion dictionary, then word_polar_i = 0;

S3, if w_i is a negation word, then deny_sign = -1; if w_i is a degree word, then obtaining the degree value degree_sign_i of w_i and setting degree_sign = degree_sign_i;

wherein Q is the emotion index of the sentence.

6. A bulk commodity price prediction device based on an emotion index, comprising:

the dictionary building module is used for building a bulk commodity emotion dictionary based on bulk commodity news and a general emotion dictionary;

7. The bulk commodity price prediction device based on an emotion index according to claim 6, wherein the dictionary building module is specifically configured to:

judging whether each candidate combination word exists among the positive combination words or the negative combination words, and if so, calculating the emotional tendency coefficient K of each of the two candidate words in each candidate combination word by using the emotional tendency pointwise mutual information algorithm; when K > 0, the candidate word is a positive emotion word; when K = 0, the candidate word is a neutral emotion word; when K < 0, the candidate word is a negative emotion word;

8. The bulk commodity price prediction device based on an emotion index according to claim 7, wherein the emotional tendency coefficient K of the candidate word c is calculated by the following formula:

where the negative reference words in the negative reference word set are indexed by j = 1, 2, ..., m, m being the number of negative reference words; count(c) and the corresponding counts of the positive and negative reference words are the numbers of times the candidate word c, a positive reference word, and a negative reference word respectively appear in the corpus; and the co-occurrence counts are the numbers of times the candidate word c appears in the corpus together with a positive reference word and together with a negative reference word, respectively.

9. The bulk commodity price prediction device based on an emotion index according to claim 6, wherein the method of calculating the emotion index of a commodity in a time period comprises:

dividing each news item into sentences;

10. The bulk commodity price prediction device based on an emotion index according to claim 9, wherein the method of calculating the emotion index of a sentence comprises:

S2, acquiring the i-th word w_i in the sentence; if w_i is a positive word in the bulk commodity emotion dictionary, then word_polar_i = 1 * deny_sign * degree_sign, go to S4; if w_i is a negative word in the bulk commodity emotion dictionary, then word_polar_i = (-1) * deny_sign * degree_sign, go to S4; if w_i is not in the bulk commodity emotion dictionary, then word_polar_i = 0;

wherein Q is the emotion index of the sentence.