CN111241410B - Industry news recommendation method and terminal

Info

Publication number: CN111241410B
Application number: CN202010073519.8A
Authority: CN
Inventors: 李伟; 杨双
Original assignee: Shenzhen Sinan Data Service Co ltd
Current assignee: Shenzhen Sinan Data Service Co ltd
Priority date: 2020-01-22
Filing date: 2020-01-22
Publication date: 2023-08-22
Anticipated expiration: 2040-01-22
Also published as: CN111241410A

Abstract

The invention discloses an industry news recommendation method and terminal. Real-time news resources are captured, and the real-time news in those resources is input into a trained classification model to obtain an industry news recommendation result. The classification model comprises an AFM layer and a Deep layer: the AFM layer comprises a factorization machine and an attention mechanism, and the Deep layer is a feedforward neural network. By adopting a deep learning method that combines the AFM layer and the Deep layer, the invention uses all features of the various industry fields as input to the classification model, so no manual labeling is needed and no key information is missed. The features can also be cross-combined, which avoids the problem of determining the priority of field features and allows different news to be recommended to different users.

Description

Industry news recommendation method and terminal

Technical Field

The invention relates to the technical field of the financial industry, and in particular to an industry news recommendation method and terminal.

Background

In investment research management, the basic investment workflow is long, and the information to be processed is both voluminous and varied. News is one kind of this mass data. With the development of the Internet, online news has grown explosively; faced with it, researchers must spend a large amount of time screening various websites for news with investment reference value, so analyzing and judging specific matters is very inefficient.

Patent application CN201811574596.0, a personalized news recommendation method, system and storage medium, obtains news resources in real time, removes duplicates, and classifies the de-duplicated news by the field of its content. The news texts must be labeled manually, after which an LSTM is trained to classify news automatically. The method then combines the field information the user is interested in and narrows the scope step by step to make recommendations, uses textword to extract key information from the news to be recommended and form a brief, and finally pushes a news brief generated from multiple abstracts to the user. However, this technical solution has the following problems:

(1) News comes in many types, so correct labeling is difficult to achieve manually;

(2) The classification model uses an LSTM whose input is only the first 2-4 sentences of a news item, so key information is likely to be missed;

(3) When news of interest to a user is selected by gradually narrowing the field, the priority of the field features is difficult to determine.

Disclosure of Invention

The technical problem to be solved by the invention is to provide an industry news recommendation method and terminal that adopt a deep learning method combining an AFM layer and a Deep layer, so that manual labeling is not needed, key information is not missed, and the problem of determining the priority of field features is avoided.

In order to solve the technical problems, the invention adopts the following technical scheme:

an industry news recommendation method, comprising the steps of:

capturing real-time news resources;

inputting the real-time news in the real-time news resource into a trained classification model to obtain an industry news recommendation result, wherein the classification model comprises an AFM layer and a Deep layer, the AFM layer comprises a factorization machine and an attention mechanism, and the Deep layer is a feedforward neural network.

In order to solve the technical problems, the invention adopts another technical scheme that:

an industry news recommendation terminal comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the above industry news recommendation method when executing the computer program.

The beneficial effects of the invention are as follows: the industry news recommendation method and terminal adopt a deep learning method combining an AFM layer and a Deep layer and use all features of the various industry fields as input to the classification model, so no manual labeling is needed and no key information is missed; the features can also be cross-combined, which avoids the problem of determining the priority of field features and allows different news to be recommended to different users.

Drawings

FIG. 1 is a schematic flow chart of an industry news recommending method according to an embodiment of the invention;

FIG. 2 is a schematic overall flow diagram of an industry news recommendation method according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of an industry news recommendation terminal according to an embodiment of the present invention.

Description of the reference numerals:

1. an industry news recommendation terminal; 2. a processor; 3. a memory.

Detailed Description

In order to describe the technical contents, the achieved objects and effects of the present invention in detail, the following description will be made with reference to the embodiments in conjunction with the accompanying drawings.

Before that, to make the technical solution of the invention easier to understand, the English abbreviations, tools and other terms involved are explained as follows:

(1) AFM: builds on FM by introducing an attention mechanism, so that different weights are taken into account when features are combined. FM is an abbreviation of Factorization Machine, which addresses how features are combined when data is sparse. The attention mechanism takes the weight of each feature combination into account, and these weights are obtained through the interaction between features.

(2) Deep: a feedforward neural network.

(3) Gini Normalization: the normalized Gini coefficient. The Gini coefficient measures how unequal a distribution is. It is defined as a ratio with a value between 0 and 1: the numerator is the area between the line of perfectly equal distribution and the Lorenz curve, and the denominator is the area under the line of perfectly equal distribution.

(4) Word2vec: a group of related models used to generate word vectors.

(5) One-hot: one-hot encoding, also called one-bit-effective encoding. It encodes N states with an N-bit state register; each state has its own register bit, and only one bit is valid at any time.

(6) Sigmoid: the sigmoid function, also called the logistic function, is used for the output of hidden-layer neurons. Its range is (0, 1); it maps any real number into the interval (0, 1) and can be used for classification.

(7) TextRank: a text ranking algorithm derived from PageRank, the web page importance ranking algorithm. It can extract the keywords and key phrases of a given text, and can extract key sentences from the text by extractive automatic summarization.

(8) Jieba: is a Chinese word segmentation tool, which can segment text into words.
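Two of the terms above, one-hot encoding and the sigmoid function, are small enough to illustrate directly. The following is a minimal sketch; the category names in `industries` are hypothetical and are not taken from the patent.

```python
import math

def one_hot(value, vocabulary):
    """Encode `value` as an N-bit vector with exactly one bit set."""
    vec = [0] * len(vocabulary)
    vec[vocabulary.index(value)] = 1
    return vec

def sigmoid(x):
    """Map a real number into the interval (0, 1), avoiding overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

industries = ["tourism", "real_estate", "telecom"]  # hypothetical categories
encoded = one_hot("real_estate", industries)
print(encoded)       # [0, 1, 0]
print(sigmoid(0.0))  # 0.5
```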

Referring to fig. 1 and 2, an industry news recommending method includes the steps of:

capturing real-time news resources;

From the above description, the beneficial effects of the invention are as follows: by adopting a deep learning method combining an AFM layer and a Deep layer, all features of the various industry fields are used as input to the classification model, so no manual labeling is needed and no key information is missed; the features can also be cross-combined, which avoids the problem of determining the priority of field features and allows different news to be recommended to different users.

Further, the specific steps of capturing the real-time news resources are as follows:

and capturing real-time news data of each piece of real-time news from a website of a preset industry at regular intervals, wherein the real-time news data comprises real-time news headlines, real-time news contents and real-time news link addresses, and defining a news ID as a unique identifier of the real-time news data so as to store the real-time news data in a news database.

From the above description, capturing news from preset industry websites ensures the professional quality of the real-time news. The news can be captured on a timer at a preset interval, for example once every half hour, which keeps it timely without occupying significant network resources, and storing it in a database facilitates subsequent recommendation and reading by users.
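The capture-and-store step can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the patent does not say how the news ID is generated or which database is used, so the SHA-1 hash of the link address, the SQLite table layout, and the `fetch_latest` stand-in for the crawler are all hypothetical.

```python
import hashlib
import sqlite3

def news_id(link):
    """Derive a stable unique identifier from the link address (an assumption)."""
    return hashlib.sha1(link.encode("utf-8")).hexdigest()

def store_news(conn, items):
    """Insert captured items; rows whose news ID already exists are skipped."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS news ("
        "news_id TEXT PRIMARY KEY, title TEXT, content TEXT, link TEXT)"
    )
    rows = [(news_id(i["link"]), i["title"], i["content"], i["link"]) for i in items]
    conn.executemany("INSERT OR IGNORE INTO news VALUES (?, ?, ?, ?)", rows)
    conn.commit()

def fetch_latest():
    """Hypothetical stand-in for the crawler; returns fixed sample items."""
    return [
        {"title": "Sample headline", "content": "Sample body.", "link": "https://example.com/a"},
        {"title": "Sample headline", "content": "Sample body.", "link": "https://example.com/a"},
        {"title": "Another headline", "content": "Another body.", "link": "https://example.com/b"},
    ]

conn = sqlite3.connect(":memory:")
store_news(conn, fetch_latest())
stored = conn.execute("SELECT COUNT(*) FROM news").fetchone()[0]
print(stored)  # 2, because the duplicated link is stored once
```

In a running system a scheduler would call `store_news(conn, fetch_latest())` once per preset interval, for example every 1800 seconds for the half-hour interval mentioned above.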

Further, the steps between the capturing the real-time news resource and the inputting the real-time news in the real-time news resource into the trained classification model further comprise:

acquiring real-time news data of each piece of real-time news in the news database;

acquiring real-time news content and real-time news headlines of the real-time news data, removing noise and special characters of the real-time news content and the real-time news headlines, then segmenting words, and removing stop words to obtain text features of a word segmentation set comprising the real-time news content and the real-time news headlines, wherein the special characters comprise punctuation, line feed and blank characters;

acquiring a news website corresponding to the real-time news link address in the real-time news data to obtain discrete features corresponding to the news website, wherein the discrete features comprise industries, users and hot spots;

and obtaining the data characteristics of each piece of real-time news, wherein the data characteristics comprise text characteristics and discrete characteristics.

From the above description, before inputting the news data into the classification model, the news data is preprocessed, and targeted processing is performed on different types of contents, so as to obtain standardized data characteristics, and facilitate the processing of the subsequent classification model.
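The preprocessing steps above can be sketched as a small pipeline. This is a minimal sketch: the stop-word list and the `site_info` dictionary are hypothetical, and a whitespace splitter stands in for the Chinese word segmenter so that the example runs without third-party packages. A real segmenter can be passed in through the `segment` parameter.

```python
import re

STOP_WORDS = {"the", "a", "of", "in", "and"}  # hypothetical stop-word list

def whitespace_segment(text):
    """Stand-in segmenter; Chinese text would need a real word segmenter."""
    return text.split()

def text_features(title, content, segment=whitespace_segment):
    """Clean the headline and content, segment them, and drop stop words."""
    raw = f"{title} {content}"
    raw = re.sub(r"<[^>]+>", " ", raw)      # noise: leftover HTML tags
    raw = re.sub(r"[^\w\s]", " ", raw)      # punctuation
    raw = re.sub(r"\s+", " ", raw).strip()  # line feeds and blank characters
    return [w for w in segment(raw.lower()) if w not in STOP_WORDS]

def data_features(title, content, site_info):
    """Combine text features with the discrete features of the source site."""
    return {"text": text_features(title, content), "discrete": dict(site_info)}

features = data_features(
    "Hotel demand rises",
    "<p>Bookings in the region\n grew, analysts said.</p>",
    {"industry": "tourism", "hotspot": "yes"},
)
print(features["text"])
```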

Further, the step of obtaining the trained classification model comprises the following steps:

acquiring historical news data of each piece of historical news in a news database, and taking the historical news data as input of the classification model, wherein the historical news data comprises first news data interested by a user and second news data except the first news data;

and training the binary classification model by joint training, updating the parameters of the binary classification model by gradient descent, and using a logarithmic loss function as its loss function, to obtain the trained binary classification model.

From the above description, the news data the user is interested in serves as an important recommendation signal, and training uses joint training, gradient descent and a logarithmic loss function, so the trained binary classification model can accurately recommend the real-time news most likely to interest the user, which improves the efficiency with which users find and read news with investment value.
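To illustrate gradient descent on a logarithmic loss, the sketch below trains a plain logistic model on toy data. The logistic model is a deliberately simplified stand-in for the AFM and Deep network, and the learning rate, epoch count and feature values are assumptions chosen only for the example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def log_loss(labels, probs, eps=1e-12):
    """Mean logarithmic loss for binary labels."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

def predict(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train(samples, labels, lr=0.5, epochs=200):
    """Batch gradient descent on a logistic model (stand-in for the full network)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(samples, labels):
            err = predict(w, b, x) - y  # derivative of the log loss w.r.t. the logit
            for k, xk in enumerate(x):
                grad_w[k] += err * xk / n
            grad_b += err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Toy data: label 1 = news the user was interested in, 0 = other news.
X = [[1.0, 0.0], [0.9, 0.2], [0.1, 1.0], [0.0, 0.8]]
y = [1, 1, 0, 0]
w, b = train(X, y)
before = log_loss(y, [0.5] * len(y))
after = log_loss(y, [predict(w, b, x) for x in X])
print(after < before)
```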

Further, the step of obtaining the trained classification model further comprises the following steps:

and adopting Gini Normalization indexes to evaluate the trained classification model, and if the evaluation fails, continuing to train the classification model until the evaluation is successful.

From the above description, it is known that the classification effect of the classification model is ensured.
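The patent does not state how the Gini Normalization index is computed or what value counts as passing. The sketch below uses the known identity that, for binary labels, the normalized Gini coefficient equals 2 × AUC − 1, and treats the pass threshold of 0.6 as a hypothetical parameter.

```python
def auc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def normalized_gini(labels, scores):
    """For binary labels the normalized Gini coefficient equals 2 * AUC - 1."""
    return 2.0 * auc(labels, scores) - 1.0

def train_until_passing(train_round, score_model, threshold=0.6, max_rounds=100):
    """Keep training until the evaluation score reaches the threshold."""
    for rounds in range(1, max_rounds + 1):
        train_round()
        if score_model() >= threshold:
            return rounds
    raise RuntimeError("evaluation did not pass within max_rounds")

labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.2, 0.7, 0.4, 0.3]
gini = normalized_gini(labels, scores)
print(round(gini, 4))
```

`train_until_passing` takes two callables, one that runs a further round of training and one that returns the current evaluation score, which mirrors the "continue training until the evaluation succeeds" loop described above.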

Further, the inputting the real-time news in the real-time news resource into the trained classification model specifically includes:

and inputting the data characteristics of the real-time news in the real-time news resource into the trained classification model.

Further, the binary classification model specifically comprises an input layer, an AFM layer, a Deep layer and an output layer, wherein the input layer comprises an input initial layer, a sparse feature layer and a dense vector embedding layer;

the method for obtaining the industry news recommendation result specifically comprises the following steps:

receiving the data features through the input initial layer;

carrying out vectorization processing on the data features through the sparse feature layer, if the data features are text features, converting the text features into digital vector features by using word2vec, and if the data features are discrete features, converting the data features into the digital vector features by using one-hot coding;

converting the high-dimensional sparse digital vector features into low-dimensional dense digital vector features through the dense vector embedding layer;

performing first-order and second-order computation on the digital vector features through the AFM layer and summing the two results, wherein the first-order computation is performed directly on a weight matrix formed from the digital vector features output by the sparse feature layer to obtain a first-order result, and the second-order computation crosses the digital vector features output by the dense vector embedding layer to obtain low-order features, which are input into the attention mechanism, the output of the attention mechanism being the second-order result;

extracting features of the digital vector features output by the dense vector embedding layer through the Deep layer to obtain high-order features, wherein the Deep layer comprises a two-layer fully connected network;

and taking the output characteristics of the AFM layer and the Deep layer which are parallel as the input characteristics of the output layer, performing two-classification on the input characteristics by using a sigmoid activation function, and outputting a result of whether real-time news corresponding to the input characteristics is recommended or not so as to obtain an industry news recommendation result.

From the above description, a deep learning model combining AFM and Deep is provided. It considers both low-order and high-order features when making recommendations and accounts for the differing importance of feature combinations, and using several recommendation models together for recommendation ranking improves recommendation accuracy.
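The forward pass through the AFM layer, the Deep layer and the output layer can be sketched in plain Python. This is an illustration under several assumptions that the patent does not specify: the layer sizes, the random initial weights, the ReLU activations, the softmax attention form, and the choice to sum the AFM and Deep outputs before the sigmoid are all hypothetical.

```python
import math
import random

def relu(v):
    return [max(0.0, x) for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(m, v):
    return [dot(row, v) for row in m]

def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def afm_deep_forward(embeddings, first_order_w, params):
    """Return (recommendation probability, attention weights)."""
    # First-order term: sum of the weights of the active sparse features.
    first = sum(first_order_w)
    # Second-order term: element-wise product of every pair of embeddings,
    # scored by a small attention network and pooled with softmax weights.
    m = len(embeddings)
    pairs = [[a * b for a, b in zip(embeddings[i], embeddings[j])]
             for i in range(m) for j in range(i + 1, m)]
    scores = []
    for p in pairs:
        pre = [z + c for z, c in zip(matvec(params["W_att"], p), params["b_att"])]
        scores.append(dot(params["h"], relu(pre)))
    attn = softmax(scores)
    k = len(embeddings[0])
    pooled = [sum(a * p[idx] for a, p in zip(attn, pairs)) for idx in range(k)]
    second = dot(params["p"], pooled)
    # Deep part: two fully connected layers over the concatenated embeddings.
    x = [v for e in embeddings for v in e]
    h2 = relu(matvec(params["W2"], relu(matvec(params["W1"], x))))
    deep = dot(params["w_out"], h2)
    # Output layer: sigmoid over the combined AFM and Deep outputs (assumed sum).
    logit = first + second + deep
    return 1.0 / (1.0 + math.exp(-logit)), attn

rng = random.Random(0)
n_fields, k, t, hidden = 3, 4, 4, 8  # hypothetical sizes
params = {
    "W_att": rand_matrix(rng, t, k), "b_att": [0.0] * t,
    "h": [rng.uniform(-0.5, 0.5) for _ in range(t)],
    "p": [rng.uniform(-0.5, 0.5) for _ in range(k)],
    "W1": rand_matrix(rng, hidden, n_fields * k),
    "W2": rand_matrix(rng, hidden, hidden),
    "w_out": [rng.uniform(-0.5, 0.5) for _ in range(hidden)],
}
embeddings = rand_matrix(rng, n_fields, k)  # dense vectors from the embedding layer
first_order_w = [rng.uniform(-0.5, 0.5) for _ in range(n_fields)]
prob, attn = afm_deep_forward(embeddings, first_order_w, params)
print(0.0 < prob < 1.0, len(attn))
```

The attention weights sum to one across the feature pairs, which is how the differing importance of feature combinations enters the second-order term.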

Further, after the industry news recommendation result is obtained, the method further comprises the following steps:

automatically generating abstracts of real-time news to be recommended in the industry news recommendation results to obtain news abstracts of each real-time news to be recommended;

selecting, from the industry news recommendation result, a number of real-time news items to be recommended that matches the preset number of news items, for front-end display, wherein the front end displays, for each real-time news item to be recommended, the corresponding news website, real-time news headline, news abstract and real-time news link address;

and exporting the information displayed at the front end of the real-time news to be recommended into news briefs in txt format, pdf format or word format.

From the above description, after the industry news recommendation result is obtained, abstracts are extracted automatically from the news to be recommended and export to several document formats is supported, which makes it much easier for researchers to write research reports and greatly improves investors' efficiency.
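The selection and export steps can be sketched for the txt format. The field names (`site`, `title`, `summary`, `link`), the layout of the brief and the sample items are hypothetical; pdf and word export would need third-party libraries and is not shown.

```python
import tempfile
from pathlib import Path

def build_brief(items, limit):
    """Format up to `limit` recommended items as a plain-text news brief."""
    blocks = []
    for n, item in enumerate(items[:limit], start=1):
        blocks.append(
            f"{n}. {item['title']}\n"
            f"   Source: {item['site']}\n"
            f"   Summary: {item['summary']}\n"
            f"   Link: {item['link']}"
        )
    return "\n\n".join(blocks) + "\n"

recommended = [  # hypothetical recommendation results
    {"site": "Example News", "title": "Headline A", "summary": "Summary A.", "link": "https://example.com/a"},
    {"site": "Example News", "title": "Headline B", "summary": "Summary B.", "link": "https://example.com/b"},
    {"site": "Example News", "title": "Headline C", "summary": "Summary C.", "link": "https://example.com/c"},
]
brief = build_brief(recommended, limit=2)  # preset number of items to display

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "news_brief.txt"
    path.write_text(brief, encoding="utf-8")
    saved = path.read_text(encoding="utf-8")
print(saved)
```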

Further, the automatic generation of the abstracts of the real-time news to be recommended in the industry news recommendation result, and the obtaining of the news abstracts of each real-time news to be recommended specifically includes:

dividing each piece of real-time news to be recommended in the industry news recommendation result into sentences according to preset punctuation marks to obtain a sentence matrix $T = [S_1, S_2, \ldots, S_m]$, where $S_m$ represents the m-th sentence;

performing stop word removal and non-Chinese character removal on each divided sentence, and performing word segmentation to obtain a word segmentation set of each sentence;

converting all words in the word segmentation set with word2vec to obtain the word vectors $S_i = [t_{i,1}, t_{i,2}, \ldots, t_{i,n}]$ of all words in each sentence, where $t_{i,j}$ is the j-th word of the i-th sentence, and averaging the word vectors of all words in each sentence to obtain the vector representation of each sentence;

calculating the cosine similarity between the sentences of each piece of real-time news to be recommended to form a similarity matrix $W$ with element values $w_{jk}$, where $w_{jk}$ represents the similarity between the j-th sentence and the k-th sentence;

constructing edges between any two nodes by taking sentences as nodes and adopting co-occurrence relations, and constructing a graph structure G= (V, E) by taking similarity scores as transition probabilities, wherein V is a node set, and E is an edge set;

iteratively calculating a node score for each sentence using a first formula, until convergence, the first formula being:

where $WS(V_i)$ represents the node score of the i-th sentence, $d$ is the damping coefficient, and $In(V_i)$ and $Out(V_j)$ denote, respectively, the set of nodes pointing to $V_i$ and the set of nodes that $V_j$ points to;

and sorting all sentences of each real-time news to be recommended in a descending order according to the node scores to obtain sentence recommendation lists, and taking out the first L sentences in the sentence recommendation lists of each real-time news to be recommended as news summaries thereof.
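The summarization steps can be sketched as follows, assuming the first formula is the standard weighted TextRank update, $WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} WS(V_j)$, which matches the symbols defined above. The damping coefficient 0.85, the convergence tolerance and the toy sentence vectors are assumptions; in the described method the sentence vectors would come from averaging word2vec word vectors.

```python
import math

def cosine(a, b):
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

def textrank(sentence_vectors, d=0.85, tol=1e-6, max_iter=200):
    """Iterate the weighted TextRank update until the scores stop changing."""
    m = len(sentence_vectors)
    # Similarity matrix W; negative similarities are clipped so weights stay non-negative.
    w = [[max(0.0, cosine(sentence_vectors[j], sentence_vectors[k])) if j != k else 0.0
          for k in range(m)] for j in range(m)]
    out_sum = [sum(row) for row in w]
    scores = [1.0] * m
    for _ in range(max_iter):
        new = []
        for i in range(m):
            rank = sum(w[j][i] / out_sum[j] * scores[j]
                       for j in range(m) if j != i and out_sum[j] > 0)
            new.append((1 - d) + d * rank)
        converged = max(abs(a - b) for a, b in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores

def summarize(sentences, sentence_vectors, top_l=3):
    """Return the top_l sentences by node score, highest first."""
    scores = textrank(sentence_vectors)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in ranked[:top_l]]

sentences = ["Sentence one.", "Sentence two.", "Sentence three.", "Unrelated aside."]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]  # toy sentence vectors
summary = summarize(sentences, vectors, top_l=2)
print(summary)
```

Because the last toy vector is nearly orthogonal to the other three, its sentence receives little weight from its neighbours and falls out of the summary.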

From the above description, abstracts of the news to be recommended are generated automatically with the TextRank algorithm, so that the keywords, key phrases and key sentences of the real-time news are extracted quickly and accurately.

Referring to fig. 3, an industry news recommending terminal includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor, wherein the processor implements the industry news recommending method when executing the computer program.

Referring to fig. 1 and 2, a first embodiment of the invention is as follows:

an industry news recommendation method, comprising the steps of:

s1, capturing real-time news resources;

the specific steps are as follows: real-time news data of each piece of real-time news is captured from preset industry websites at a preset time interval; the real-time news data comprises a real-time news headline, real-time news content and a real-time news link address, and a news ID is defined as the unique identifier of the real-time news data so that it can be stored in a news database; in this embodiment, the preset industry websites comprise a world wide web, a license, an orange web, a ring world wide travel message, a new real-time property web, a C114 communication network and the like, and the preset time interval is set to half an hour;

s2, preprocessing news data;

the method comprises the following specific steps:

s21, acquiring real-time news data of each piece of real-time news in a news database;

s22, acquiring real-time news content and real-time news headlines of the real-time news data, removing noise and special characters of the real-time news content and the real-time news headlines, then segmenting words by using jieba, and removing stop words to obtain text features of a word segmentation set comprising the real-time news content and the real-time news headlines, wherein the special characters comprise punctuation, line feed and blank characters;

s23, acquiring a news website corresponding to a real-time news link address in the real-time news data to obtain discrete features corresponding to the news website, wherein the discrete features comprise industries, users and hot spots;

s24, obtaining data characteristics of each piece of real-time news, wherein the data characteristics comprise text characteristics and discrete characteristics;

s3, recommending real-time news;

in this step, the real-time news in the real-time news resource is input into the trained binary classification model to obtain an industry news recommendation result; the binary classification model specifically comprises an input layer, an AFM layer, a Deep layer and an output layer, the input layer comprises an input initial layer, a sparse feature layer and a dense vector embedding layer, the AFM layer comprises a factorization machine and an attention mechanism, and the Deep layer is a feedforward neural network;

the process of training the classification model is as follows:

the method comprises the steps of obtaining historical news data of each piece of historical news in a news database, taking the historical news data as input of a classification model, wherein the historical news data comprises first news data interested by a user and second news data except the first news data, and in the embodiment, in order to ensure timeliness of the news, only news historical data selected by the user in the last half month is selected as the news data interested by the user;

training the binary classification model by joint training, updating its parameters by gradient descent, and using a logarithmic loss function as its loss function, to obtain the trained binary classification model;

adopting Gini Normalization index to evaluate the trained classification model, if the evaluation fails, continuing to train the classification model until the evaluation is successful;
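The labeling rule in this embodiment, where only news the user selected in the last half month counts as news of interest, can be sketched as follows. Treating half a month as 15 days, the record layout and the sample IDs are assumptions made for the example.

```python
from datetime import datetime, timedelta

def label_history(history_ids, selections, now, window_days=15):
    """Label each historical item 1 if the user selected it within the window, else 0."""
    cutoff = now - timedelta(days=window_days)
    recent = {nid for nid, when in selections if when >= cutoff}
    return [(nid, 1 if nid in recent else 0) for nid in history_ids]

now = datetime(2020, 1, 22)
selections = [  # hypothetical (news ID, time the user selected it) records
    ("n1", datetime(2020, 1, 20)),
    ("n2", datetime(2019, 12, 1)),
]
labeled = label_history(["n1", "n2", "n3"], selections, now)
print(labeled)  # [('n1', 1), ('n2', 0), ('n3', 0)]
```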

therefore, after the trained classification model is obtained, judging whether each piece of real-time news is recommended to the user or not is started, and the specific steps are as follows:

the data characteristics of the real-time news in the real-time news resource are input into the trained classification model.

Receiving data characteristics through an input initial layer;

carrying out vectorization processing on the data features through the sparse feature layer, if the data features are text features, converting the text features into digital vector features by using word2vec, and if the data features are discrete features, converting the discrete features into the digital vector features by using one-hot coding;

converting the high-dimensional sparse digital vector features into low-dimensional dense digital vector features through the dense vector embedding layer;

performing first-order and second-order computation on the digital vector features through the AFM layer and summing the two results, wherein the first-order computation is performed directly on a weight matrix formed from the digital vector features output by the sparse feature layer to obtain a first-order result, and the second-order computation crosses the digital vector features output by the dense vector embedding layer to obtain low-order features, which are input into the attention mechanism, the output of the attention mechanism being the second-order result;

the Deep layer is used for extracting the characteristics of the digital vectors output by the dense vector embedding layer to obtain high-order characteristics, and comprises a two-layer fully connected network;

taking the outputs of the parallel AFM layer and Deep layer as the input features of the output layer, performing binary classification on the input features with a sigmoid activation function, and outputting whether the real-time news corresponding to the input features is recommended, so as to obtain the industry news recommendation result, which indicates which real-time news items are recommended;

s4, abstract generation

Automatically generating abstracts of real-time news to be recommended in industry news recommendation results to obtain news abstracts of each real-time news to be recommended;

further, the abstract of the real-time news to be recommended in the industry news recommendation result is automatically generated, and the news abstract of each real-time news to be recommended is obtained specifically as follows:

dividing each piece of real-time news to be recommended in the industry news recommendation result into sentences according to preset punctuation marks to obtain a sentence matrix $T = [S_1, S_2, \ldots, S_m]$, where $S_m$ represents the m-th sentence;

converting all words in the word segmentation set with word2vec to obtain the word vectors $S_i = [t_{i,1}, t_{i,2}, \ldots, t_{i,n}]$ of all words in each sentence, where $t_{i,j}$ is the j-th word of the i-th sentence, and averaging the word vectors of all words in each sentence to obtain the vector representation of each sentence;

calculating the cosine similarity between the sentences of each piece of real-time news to be recommended to form a similarity matrix $W$ with element values $w_{jk}$, where $w_{jk}$ represents the similarity between the j-th sentence and the k-th sentence;

constructing edges between any two nodes by taking sentences as nodes and adopting co-occurrence relations, and constructing a graph structure G= (V, E) by taking similarity scores as transition probabilities, wherein V is a node set and E is an edge set;

$WS(V_i)$ represents the node score of the i-th sentence, $d$ is the damping coefficient, and $In(V_i)$ and $Out(V_j)$ denote, respectively, the set of nodes pointing to $V_i$ and the set of nodes that $V_j$ points to;

and sorting all sentences of each real-time news to be recommended in a descending order according to the node scores to obtain sentence recommendation lists, and taking out the first L sentences in the sentence recommendation list of each real-time news to be recommended as news summaries thereof, wherein in the embodiment, L is 3.

S5, front end display

A number of real-time news items to be recommended, matching the preset number of news items, is selected from the industry news recommendation result for front-end display, so that the user can obtain real-time news information quickly and effectively; the front end displays, for each item, the news website, the real-time news headline, the news abstract and the real-time news link address, and in this embodiment manual selection, manual sorting and manual editing by the user are also supported;

s6, exporting news briefs

The information displayed at the front end of the real-time news to be recommended is exported to be a news briefing in txt format, pdf format or word format, great convenience is provided for researchers to write research reports, and the efficiency of investors is greatly improved.

Referring to fig. 3, a second embodiment of the present invention is as follows:

an industry news recommendation terminal 1 comprises a memory 3, a processor 2 and a computer program stored on the memory 3 and executable on the processor 2, wherein the processor 2 implements the steps of the first embodiment when executing the computer program.

In summary, the industry news recommendation method and terminal provided by the invention adopt a deep learning method combining an AFM layer and a Deep layer and use all features of the various industry fields as input to the classification model, so no manual labeling is needed and no key information is missed. The features can also be cross-combined, which avoids the problem of determining the priority of field features and allows different news to be recommended to different users. In addition, news is captured from preset industry websites at a preset time interval, which ensures both the professional quality and the timeliness of the real-time news. When processing news data, the data is preprocessed, the news data the user is interested in serves as an important recommendation signal, and training uses joint training, gradient descent and a logarithmic loss function. Low-order and high-order features are both considered when making recommendations, the differing importance of feature combinations is taken into account, several recommendation models are used together for recommendation ranking, and the Gini Normalization index is used for evaluation, so the trained binary classification model can accurately recommend the real-time news most likely to interest the user and recommendation accuracy is greatly improved. After the industry news recommendation result is obtained, abstracts are extracted automatically from the news to be recommended and export to several document formats is supported, which makes it much easier for researchers to write research reports and greatly improves investors' efficiency.

The foregoing description is only illustrative of the present invention and is not intended to limit the scope of the invention, and all equivalent changes made by the specification and drawings of the present invention, or direct or indirect application in the relevant art, are included in the scope of the present invention.

Claims

1. An industry news recommendation method is characterized by comprising the following steps:

capturing real-time news resources;

inputting real-time news in the real-time news resource into a trained binary classification model to obtain an industry news recommendation result, wherein the binary classification model comprises an AFM layer and a Deep layer, the AFM layer comprises a factorization machine and an attention mechanism, and the Deep layer is a feedforward neural network;

after the industry news recommendation result is obtained, the method further comprises the following steps:

exporting the information displayed at the front end of the real-time news to be recommended into news briefs in txt format, pdf format or word format;

the method comprises the steps of automatically generating abstracts of real-time news to be recommended in industry news recommendation results, and obtaining news abstracts of each real-time news to be recommended specifically comprises the following steps:

2. The industry news recommending method according to claim 1, wherein the capturing real-time news resources specifically comprises the following steps:

3. The industry news recommendation method of claim 2, wherein the steps between capturing real-time news resources and inputting real-time news within the real-time news resources into the trained classification model further comprise:

4. The industry news recommendation method of claim 1, wherein obtaining the trained classification model comprises the steps of:

5. The industry news recommendation method of claim 4, wherein the step of obtaining the trained classification model further comprises the steps of:

6. The industry news recommendation method according to claim 3, wherein the inputting the real-time news in the real-time news resource into the trained classification model specifically comprises:

7. The industry news recommendation method of claim 6, wherein the classification model specifically comprises an input layer, an AFM layer, a Deep layer and an output layer, the input layer comprising an input initial layer, a sparse feature layer and a dense vector embedding layer;

the data features are received through the input initiation layer,

8. An industry news recommendation terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements an industry news recommendation method according to any one of claims 1 to 7 when executing the computer program.