CN111241410B - Industry news recommendation method and terminal - Google Patents

Industry news recommendation method and terminal

Info

Publication number
CN111241410B
Authority
CN
China
Prior art keywords
news
real
time
time news
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010073519.8A
Other languages
Chinese (zh)
Other versions
CN111241410A (en
Inventor
李伟
杨双
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Sinan Data Service Co ltd
Original Assignee
Shenzhen Sinan Data Service Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Sinan Data Service Co ltd
Priority to CN202010073519.8A
Publication of CN111241410A
Application granted
Publication of CN111241410B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G06F16/345 Summarisation for human users
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an industry news recommendation method and terminal. Real-time news resources are captured, and the real-time news in those resources is input into a trained classification model to obtain an industry news recommendation result. The classification model comprises an AFM layer and a Deep layer: the AFM layer comprises a factorization machine and an attention mechanism, and the Deep layer is a feedforward neural network. By adopting a deep learning method that combines the AFM layer and the Deep layer, the invention takes all features of the various industry fields as input to the classification model, so that no manual labeling is needed and no key information is missed. The features can also be cross-combined, which avoids the problem of determining the priority of field features and allows different news to be recommended to different users.

Description

Industry news recommendation method and terminal
Technical Field
The invention relates to the technical field of the financial industry, and in particular to an industry news recommendation method and terminal.
Background
In investment research management, the basic investment workflow is long, and the information to be processed is large in volume and varied in kind. News is one of the mass data sources to be processed. With the development of the Internet, the volume of online news has exploded, and researchers facing this mass of news must spend a great deal of time screening news with investment reference value from various websites, so the efficiency of analyzing and judging specific matters is very low.
The personalized news recommendation method, system and storage medium of patent application No. CN201811574596.0 obtains news resources in real time, removes duplicates, and classifies the deduplicated news according to the field of its content. The news texts must first be labeled manually, after which an LSTM is trained to classify news automatically. The method then narrows the candidates step by step according to the fields the user is interested in, uses textword to extract key information from the news to be recommended into a brief, and finally pushes a news brief composed of several abstracts to the user. However, this technical solution has the following problems:
(1) News comes in many types, and correct labeling is difficult to achieve manually;
(2) The classification model is an LSTM whose input is only the first 2-4 sentences of each news item, so key information is likely to be missed;
(3) When news of interest to a user is selected by gradually narrowing the field, the priority of the field features is difficult to determine.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an industry news recommendation method and terminal that adopt a deep learning method combining an AFM layer and a Deep layer, so that no manual labeling is needed, no key information is missed, and the problem of determining the priority of field features is avoided.
In order to solve the above technical problem, the invention adopts the following technical scheme:
an industry news recommendation method, comprising the steps of:
capturing real-time news resources;
inputting the real-time news in the real-time news resource into a trained classification model to obtain an industry news recommendation result, wherein the classification model comprises an AFM layer and a Deep layer, the AFM layer comprises a factorization machine and an attention mechanism, and the Deep layer is a feedforward neural network.
In order to solve the above technical problem, the invention adopts another technical scheme:
an industry news recommendation terminal, comprising a memory, a processor and a computer program that is stored in the memory and can run on the processor, wherein the processor implements the above industry news recommendation method when executing the computer program.
The beneficial effects of the invention are as follows: the industry news recommendation method and terminal adopt a deep learning method combining an AFM layer and a Deep layer and take all features of the various industry fields as input to the classification model, so that no manual labeling is needed and no key information is missed; the features can also be cross-combined, which avoids the problem of determining the priority of field features and allows different news to be recommended to different users.
Drawings
FIG. 1 is a schematic flow chart of an industry news recommending method according to an embodiment of the invention;
FIG. 2 is a schematic overall flow diagram of an industry news recommendation method according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an industry news recommendation terminal according to an embodiment of the present invention.
Description of the reference numerals:
1. an industry news recommendation terminal; 2. a processor; 3. a memory.
Detailed Description
In order to describe the technical contents, the achieved objects and effects of the present invention in detail, the following description will be made with reference to the embodiments in conjunction with the accompanying drawings.
Before that, to facilitate understanding of the technical solution of the present invention, the English abbreviations and terms used in the present invention are explained as follows:
(1) AFM: builds on FM by introducing an attention mechanism, so that different weights are applied when features are combined. FM is an abbreviation of Factorization Machine, which addresses how to combine features when the data are sparse. The attention mechanism assigns a weight to each feature combination, and these weights are obtained from the interactions between the features.
(2) Deep: a feedforward neural network.
(3) Gini Normalization: the Gini coefficient after normalization, i.e. the normalized Gini value. The Gini coefficient measures how unequal a distribution is. It is defined as a ratio with a value between 0 and 1: the numerator is the area between the line of perfect equality and the Lorenz curve, and the denominator is the area under the line of perfect equality.
(4) Word2vec: a group of related models used to generate word vectors.
(5) One-hot: one-hot encoding, also called one-bit-effective encoding. It encodes N states with an N-bit state register; each state has its own register bit, and only one bit is valid at any time.
(6) Sigmoid: the sigmoid function, also called the logistic function, is used for the output of hidden-layer neurons. Its range is (0, 1): it maps any real number into the interval (0, 1) and can therefore be used for classification.
(7) TextRank: a text ranking algorithm derived from the PageRank web page importance ranking algorithm. It can extract the keywords and key phrases of a given text, and can extract the key sentences of the text by extractive automatic summarization.
(8) Jieba: a Chinese word segmentation tool that splits text into words.
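As a small illustration of terms (5) and (6), the following sketch shows one-hot encoding and the sigmoid function in plain Python. The category list `industries` and the function names are hypothetical and chosen only for this example; they do not come from the patent.

```python
import math

def one_hot(value, categories):
    """Return an N-bit vector in which only the bit for `value` is set."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if c == value else 0 for c in categories]

def sigmoid(x):
    """Logistic function; mathematically its range is the interval (0, 1)."""
    # Branch on the sign of x so that math.exp never overflows.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

industries = ["finance", "telecom", "travel"]  # hypothetical category list
encoded = one_hot("telecom", industries)       # [0, 1, 0]
score = sigmoid(0.0)                           # 0.5
```

With N categories the encoded vector always has length N and exactly one non-zero entry, which is why such features are sparse before embedding.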
Referring to FIG. 1 and FIG. 2, an industry news recommendation method comprises the steps of:
capturing real-time news resources;
inputting the real-time news in the real-time news resource into a trained classification model to obtain an industry news recommendation result, wherein the classification model comprises an AFM layer and a Deep layer, the AFM layer comprises a factorization machine and an attention mechanism, and the Deep layer is a feedforward neural network.
As described above, the beneficial effects of the invention are as follows: by adopting a deep learning method combining an AFM layer and a Deep layer, all features of the various industry fields are taken as input to the classification model, so that no manual labeling is needed and no key information is missed; the features can also be cross-combined, which avoids the problem of determining the priority of field features and allows different news to be recommended to different users.
Further, capturing the real-time news resources specifically comprises:
capturing real-time news data for each piece of real-time news from the websites of preset industries at a preset time interval, wherein the real-time news data comprises a real-time news headline, real-time news content and a real-time news link address, and a news ID is defined as the unique identifier of the real-time news data so that the real-time news data can be stored in a news database.
As described above, capturing news from preset industry websites ensures that the real-time news is professionally relevant. Capturing at a preset time interval, for example once every half hour, keeps the news timely without occupying substantial network resources, and storing the news in a database facilitates subsequent recommendation and reading by users.
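The storage half of this step can be sketched as follows, using Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions: the table layout, the function names, and the choice to derive the news ID from a hash of the link address are all illustrative and are not specified by the patent. Fetching the pages and the half-hour timer are omitted.

```python
import hashlib
import sqlite3

def news_id_for(link):
    """Derive a stable ID from the link address (an assumption of this sketch)."""
    return hashlib.sha256(link.encode("utf-8")).hexdigest()[:16]

def store_news(conn, items):
    """Insert captured items; a link that was already stored is ignored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS news ("
        "news_id TEXT PRIMARY KEY, title TEXT, content TEXT, link TEXT)"
    )
    rows = [(news_id_for(i["link"]), i["title"], i["content"], i["link"]) for i in items]
    conn.executemany("INSERT OR IGNORE INTO news VALUES (?, ?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
batch = [
    {"title": "Sample headline", "content": "Sample body.", "link": "https://example.com/a"},
    {"title": "Sample headline", "content": "Sample body.", "link": "https://example.com/a"},
]
store_news(conn, batch)
stored = conn.execute("SELECT COUNT(*) FROM news").fetchone()[0]  # the duplicate collapses
```

Making the news ID the primary key means that a news item captured again in a later interval does not create a second row.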
Further, between capturing the real-time news resources and inputting the real-time news in the real-time news resource into the trained classification model, the method further comprises:
acquiring the real-time news data of each piece of real-time news in the news database;
acquiring the real-time news content and real-time news headline of the real-time news data, removing noise and special characters from them, then performing word segmentation and removing stop words, to obtain text features comprising the word segmentation set of the real-time news content and real-time news headline, wherein the special characters comprise punctuation, line feeds and blank characters;
acquiring the news website corresponding to the real-time news link address in the real-time news data to obtain the discrete features corresponding to that news website, wherein the discrete features comprise industry, user and hot spot;
obtaining the data features of each piece of real-time news, wherein the data features comprise the text features and the discrete features.
As described above, the news data is preprocessed before being input into the classification model, with different types of content handled in a targeted way, so that standardized data features are obtained and the subsequent processing by the classification model is simplified.
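The text-feature part of this preprocessing can be sketched as below. The stop-word list, the function names, and the default whitespace tokenizer are assumptions made so that the example runs with the standard library alone; for Chinese text a real segmenter would be passed in as `segment` instead.

```python
import re

STOP_WORDS = {"the", "a", "of", "and"}  # hypothetical stop-word list

def clean(text):
    """Replace punctuation, line feeds and blank characters with single spaces."""
    text = re.sub(r"[^\w\s]", " ", text)      # punctuation and other symbols
    return re.sub(r"\s+", " ", text).strip()  # line feeds, tabs, repeated blanks

def text_features(title, content, segment=str.split):
    """Return the word segmentation set of headline and content, stop words removed."""
    tokens = segment(clean(title)) + segment(clean(content))
    # Drop blank tokens (some segmenters return them) and stop words.
    return [t for t in tokens if t.strip() and t.lower() not in STOP_WORDS]

features = text_features("Rates rise!", "The central bank\nraised rates, and markets fell.")
# ['Rates', 'rise', 'central', 'bank', 'raised', 'rates', 'markets', 'fell']
```

Keeping the segmenter as a parameter separates the cleaning rules from the language-specific tokenization.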
Further, obtaining the trained classification model comprises the following steps:
acquiring the historical news data of each piece of historical news in the news database and taking the historical news data as input to the classification model, wherein the historical news data comprises first news data that the user is interested in and second news data other than the first news data;
training the binary classification model by joint training, updating the parameters of the binary classification model by gradient descent, and using the logarithmic loss function as the loss function of the binary classification model, to obtain the trained binary classification model.
As described above, the news data that the user is interested in serves as an important indicator for recommendation, and training with joint training, gradient descent and the logarithmic loss function enables the trained binary classification model to accurately recommend the real-time news most likely to interest the user, improving the efficiency with which the user finds and reads news content with investment value.
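To make the roles of the logarithmic loss and gradient descent concrete, here is a minimal sketch that trains a single sigmoid unit on toy data. It is a stand-in for illustration only: the patent's model is the AFM + Deep network, and the learning rate, epoch count, toy feature vectors and function names below are all assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def log_loss(labels, probs, eps=1e-12):
    """Mean logarithmic loss for binary labels (1 = interested, 0 = other)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

def predict_all(w, b, features):
    return [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in features]

def train(features, labels, lr=0.5, epochs=200):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w, b, n = [0.0] * len(features[0]), 0.0, len(labels)
    for _ in range(epochs):
        errs = [p - y for p, y in zip(predict_all(w, b, features), labels)]
        for j in range(len(w)):
            w[j] -= lr * sum(e * x[j] for e, x in zip(errs, features)) / n
        b -= lr * sum(errs) / n
    return w, b

X = [[1.0, 0.0], [0.9, 0.2], [0.1, 1.0], [0.0, 0.8]]  # toy feature vectors
y = [1, 1, 0, 0]                                      # first vs. second news data
initial = log_loss(y, predict_all([0.0, 0.0], 0.0, X))
w, b = train(X, y)
final = log_loss(y, predict_all(w, b, X))
```

In joint training of a model with parallel parts, the same loss gradient would be propagated into both parts in each update, rather than training them separately.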
Further, obtaining the trained classification model further comprises the following step:
evaluating the trained classification model with the Gini Normalization index, and if the evaluation fails, continuing to train the classification model until the evaluation succeeds.
As described above, this ensures the classification performance of the classification model.
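One common way to compute a normalized Gini score for a binary classifier is sketched below. The patent does not give its exact formula or pass threshold, so both the cumulative-gain formulation and the `THRESHOLD` value are assumptions of this example.

```python
def gini(actual, pred):
    """Unnormalized Gini: cumulative share of positives when sorted by score."""
    # Assumes `actual` contains at least one positive and one negative label.
    order = sorted(range(len(actual)), key=lambda i: (-pred[i], i))
    total = float(sum(actual))
    cum, area = 0.0, 0.0
    for i in order:
        cum += actual[i]
        area += cum / total
    n = len(actual)
    return (area - (n + 1) / 2.0) / n

def normalized_gini(actual, pred):
    """Divide by the Gini of a perfect ranking so the best score is 1.0."""
    return gini(actual, pred) / gini(actual, actual)

THRESHOLD = 0.6  # hypothetical pass mark; not specified in the source

labels = [1, 0, 1, 0]
perfect = normalized_gini(labels, [0.9, 0.1, 0.8, 0.2])   # 1.0
inverted = normalized_gini(labels, [0.1, 0.9, 0.2, 0.8])  # -1.0
passed = perfect >= THRESHOLD  # training would continue while this is False
```

A ranking no better than random scores near 0, so a positive threshold rejects models that have not learned a useful ordering.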
Further, inputting the real-time news in the real-time news resource into the trained classification model specifically comprises:
inputting the data features of the real-time news in the real-time news resource into the trained classification model.
Further, the binary classification model specifically comprises an input layer, an AFM layer, a Deep layer and an output layer, wherein the input layer comprises an input initial layer, a sparse feature layer and a dense vector embedding layer;
obtaining the industry news recommendation result specifically comprises the following steps:
receiving the data features through the input initial layer;
vectorizing the data features through the sparse feature layer: if a data feature is a text feature, it is converted into a numeric vector feature using word2vec, and if it is a discrete feature, it is converted into a numeric vector feature using one-hot encoding;
converting the high-dimensional sparse numeric vector features into low-dimensional dense numeric vector features through the dense vector embedding layer;
performing a first-order computation and a second-order computation on the numeric vector features through the AFM layer and summing the two results, wherein the first-order computation is performed directly on the weight matrix formed by the numeric vector features output by the sparse feature layer to obtain a first-order result, and the second-order computation crosses the numeric vector features output by the dense vector embedding layer to obtain low-order features, which are input into the attention mechanism, whose output value is the second-order result;
extracting features from the numeric vector features output by the dense vector embedding layer through the Deep layer to obtain high-order features, wherein the Deep layer comprises a two-layer fully connected network;
taking the output features of the parallel AFM layer and Deep layer as the input features of the output layer, performing binary classification on the input features using a sigmoid activation function, and outputting whether the real-time news corresponding to the input features is recommended, so as to obtain the industry news recommendation result.
As described above, a deep learning model combining AFM and Deep is provided. Both low-order and high-order features are considered when recommending to users, different feature combinations are given different degrees of importance, and a plurality of recommendation models are used together for recommendation ranking, which improves recommendation accuracy.
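The forward pass described in the steps above can be sketched in plain Python as follows. This is an illustrative sketch with untrained random weights, not the patented implementation: the layer sizes, the parameter names, the ReLU activations, the softmax normalization of the attention scores, and the choice to add the AFM and Deep outputs before the sigmoid are all assumptions.

```python
import math
import random
from itertools import combinations

rng = random.Random(0)
N_FEATURES, K, ATT, HIDDEN, N_FIELDS = 10, 4, 3, 8, 3  # toy sizes (assumed)

def rand_vec(n):
    return [rng.uniform(-0.1, 0.1) for _ in range(n)]

def rand_matrix(rows, cols):
    return [rand_vec(cols) for _ in range(rows)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(m, v):
    return [dot(row, v) for row in m]

def relu(v):
    return [max(0.0, a) for a in v]

params = {
    "bias": 0.0,
    "w1": rand_vec(N_FEATURES),                # first-order weights on sparse features
    "emb": rand_matrix(N_FEATURES, K),         # dense vector embedding table
    "att_w": rand_matrix(ATT, K),              # attention network weights
    "att_h": rand_vec(ATT),
    "proj": rand_vec(K),                       # projects the pooled crossed features
    "fc1": rand_matrix(HIDDEN, N_FIELDS * K),  # Deep layer: two fully connected layers
    "fc2": rand_matrix(HIDDEN, HIDDEN),
    "out": rand_vec(HIDDEN),
}

def predict(active, p=params):
    """Score one news item from the index of its active feature in each field."""
    first = sum(p["w1"][i] for i in active)    # first-order result
    vecs = [p["emb"][i] for i in active]
    # Feature crossing: element-wise product of every pair of embeddings.
    crosses = [[a * b for a, b in zip(u, v)] for u, v in combinations(vecs, 2)]
    # Attention: score each crossed pair, then softmax-normalize the scores.
    scores = [dot(p["att_h"], relu(matvec(p["att_w"], c))) for c in crosses]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    weights = [e / sum(exps) for e in exps]
    pooled = [sum(w * c[d] for w, c in zip(weights, crosses)) for d in range(K)]
    second = dot(p["proj"], pooled)            # second-order result
    # Deep part: concatenated embeddings through two ReLU layers.
    flat = [x for v in vecs for x in v]
    hidden = relu(matvec(p["fc2"], relu(matvec(p["fc1"], flat))))
    logit = p["bias"] + first + second + dot(p["out"], hidden)
    return 1.0 / (1.0 + math.exp(-logit)), weights

prob, attention = predict([0, 4, 7])  # one active feature index per field
recommend = prob >= 0.5               # hypothetical decision threshold
```

The attention weights sum to 1 across the crossed pairs, so each pair's contribution to the second-order term reflects its relative importance rather than a fixed, hand-set priority.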
Further, after the industry news recommendation result is obtained, the method further comprises the following steps:
automatically generating abstracts for the real-time news to be recommended in the industry news recommendation result, to obtain a news abstract for each piece of real-time news to be recommended;
selecting from the industry news recommendation result a number of pieces of real-time news to be recommended that matches the preset news count, and displaying them at the front end, wherein the front end displays, for each piece of real-time news to be recommended, its news website, real-time news headline, news abstract and real-time news link address;
exporting the information displayed at the front end for the real-time news to be recommended as a news brief in txt, pdf or word format.
As described above, once the industry news recommendation result is obtained, abstracts are extracted automatically from the news to be recommended and export to documents in several formats is supported, which makes it much more convenient for researchers to write research reports and greatly improves the efficiency of investors.
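The txt export can be sketched as below; pdf and word export would need third-party libraries and are not shown. The dictionary keys and the layout of each entry are assumptions of this example rather than a format defined by the patent.

```python
def format_brief(items):
    """Render the displayed fields of each recommended item as plain text."""
    blocks = []
    for n, item in enumerate(items, start=1):
        blocks.append(
            f"{n}. {item['title']}\n"
            f"   Source: {item['site']}\n"
            f"   Summary: {item['summary']}\n"
            f"   Link: {item['link']}"
        )
    return "\n\n".join(blocks) + "\n"

def export_txt(items, path):
    """Write the news brief to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(format_brief(items))

sample = [
    {"title": "Sample headline", "site": "example.com",
     "summary": "First key sentence. Second key sentence.",
     "link": "https://example.com/a"},
]
brief = format_brief(sample)
```

Calling `export_txt(sample, "brief.txt")` would write the same text to disk.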
Further, automatically generating abstracts for the real-time news to be recommended in the industry news recommendation result, to obtain a news abstract for each piece of real-time news to be recommended, specifically comprises:
dividing each piece of real-time news to be recommended in the industry news recommendation result into sentences according to preset punctuation marks to obtain a sentence matrix $T = [S_1, S_2, \ldots, S_m]$, where $S_m$ denotes the m-th sentence;
removing stop words and non-Chinese characters from each divided sentence and performing word segmentation to obtain the word segmentation set of each sentence;
converting all words in the word segmentation set using word2vec to obtain the word vectors $S_i = [t_{i,1}, t_{i,2}, \ldots, t_{i,n}]$ of all words in each sentence, where $t_{i,j}$ is the j-th word of the i-th sentence, and averaging the word vectors of all words in each sentence to obtain the vector representation of that sentence;
calculating the cosine similarity between the sentences of each piece of real-time news to be recommended to form a similarity matrix $W$, whose element $w_{jk}$ represents the similarity between the j-th sentence and the k-th sentence;
taking sentences as nodes, constructing edges between any two nodes by co-occurrence relation, and using the similarity scores as transition probabilities to construct a graph structure $G = (V, E)$, where $V$ is the node set and $E$ is the edge set;
iteratively calculating the node score of each sentence using a first formula until convergence, the first formula being:
$$WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} \, WS(V_j)$$
where $WS(V_i)$ is the node score of the i-th sentence, $d$ is the damping coefficient, and $In(V_i)$ and $Out(V_j)$ respectively denote the set of nodes pointing to $V_i$ and the set of nodes that $V_j$ points to;
sorting all sentences of each piece of real-time news to be recommended in descending order of node score to obtain a sentence recommendation list, and taking the first L sentences of that list as the news abstract of that piece of news.
As described above, the abstract of the news to be recommended is generated automatically with the TextRank algorithm, so that the keywords, key phrases and key sentences of the real-time news are extracted quickly and accurately.
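The ranking steps above can be sketched in plain Python as follows. The sketch assumes the sentence vectors are already available (word2vec itself is not shown), connects every pair of sentences rather than using co-occurrence edges, clips negative similarities to zero, and uses a damping coefficient of 0.85; these choices and all function names are assumptions of the example.

```python
import math

def mean_vector(word_vecs):
    """Average the word vectors of one sentence into a sentence vector."""
    dim = len(word_vecs[0])
    return [sum(v[d] for v in word_vecs) / len(word_vecs) for d in range(dim)]

def cosine(u, v):
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def textrank(sentence_vecs, d=0.85, tol=1e-6, max_iter=100):
    """Iterate WS(V_i) = (1 - d) + d * sum_j (w_ji / sum_k w_jk) * WS(V_j)."""
    m = len(sentence_vecs)
    w = [[max(0.0, cosine(sentence_vecs[j], sentence_vecs[k])) if j != k else 0.0
          for k in range(m)] for j in range(m)]
    out_sum = [sum(row) for row in w]
    scores = [1.0] * m
    for _ in range(max_iter):
        new = [(1 - d) + d * sum(w[j][i] / out_sum[j] * scores[j]
                                 for j in range(m) if out_sum[j] > 0)
               for i in range(m)]
        converged = max(abs(a - b) for a, b in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores

def summarize(sentences, sentence_vecs, top_l=3):
    """Return the top_l sentences in descending order of node score."""
    scores = textrank(sentence_vecs)
    ranked = sorted(range(len(sentences)), key=lambda i: -scores[i])
    return [sentences[i] for i in ranked[:top_l]]

sentences = ["A", "B", "C", "D"]                         # placeholder sentences
vecs = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.3], [0.0, 1.0]]  # toy sentence vectors
summary = summarize(sentences, vecs, top_l=3)
```

In this toy input, sentence "D" is nearly orthogonal to the other three, so it receives the lowest score and is left out of the three-sentence summary.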
Referring to FIG. 3, an industry news recommendation terminal comprises a memory, a processor, and a computer program stored in the memory and capable of running on the processor, wherein the processor implements the above industry news recommendation method when executing the computer program.
As described above, the beneficial effects of the invention are as follows: by adopting a deep learning method combining an AFM layer and a Deep layer, all features of the various industry fields are taken as input to the classification model, so that no manual labeling is needed and no key information is missed; the features can also be cross-combined, which avoids the problem of determining the priority of field features and allows different news to be recommended to different users.
Referring to FIG. 1 and FIG. 2, a first embodiment of the invention is as follows:
an industry news recommendation method, comprising the steps of:
S1, capturing real-time news resources;
the specific steps are as follows: capturing real-time news data for each piece of real-time news from preset industry websites at a preset time interval, wherein the real-time news data comprises a real-time news headline, real-time news content and a real-time news link address, and a news ID is defined as the unique identifier of the real-time news data so that the real-time news data can be stored in a news database; in this embodiment, the preset industry websites include a world wide web, a license, an orange web, a ring world wide travel message, a new real-time property web, a C114 communication network and the like, and the preset time interval is set to half an hour;
S2, preprocessing news data;
the specific steps are as follows:
S21, acquiring the real-time news data of each piece of real-time news in the news database;
S22, acquiring the real-time news content and real-time news headline of the real-time news data, removing noise and special characters from them, then performing word segmentation with jieba and removing stop words, to obtain text features comprising the word segmentation set of the real-time news content and real-time news headline, wherein the special characters comprise punctuation, line feeds and blank characters;
S23, acquiring the news website corresponding to the real-time news link address in the real-time news data to obtain the discrete features corresponding to that news website, wherein the discrete features comprise industry, user and hot spot;
S24, obtaining the data features of each piece of real-time news, wherein the data features comprise the text features and the discrete features;
S3, recommending real-time news;
in this step, the real-time news in the real-time news resource is input into a trained binary classification model to obtain an industry news recommendation result, wherein the binary classification model specifically comprises an input layer, an AFM layer, a Deep layer and an output layer, the input layer comprises an input initial layer, a sparse feature layer and a dense vector embedding layer, the AFM layer comprises a factorization machine and an attention mechanism, and the Deep layer is a feedforward neural network;
the process of training the binary classification model until training is complete comprises the following steps:
acquiring the historical news data of each piece of historical news in the news database and taking the historical news data as input to the binary classification model, wherein the historical news data comprises first news data that the user is interested in and second news data other than the first news data; in this embodiment, to keep the news timely, only the historical news selected by the user within the last half month is used as the news data of interest to the user;
training the binary classification model by joint training, updating the parameters of the binary classification model by gradient descent, and using the logarithmic loss function as the loss function of the binary classification model, to obtain the trained binary classification model;
evaluating the trained binary classification model with the Gini Normalization index, and if the evaluation fails, continuing to train the binary classification model until the evaluation succeeds;
once the trained binary classification model is obtained, the judgment of whether each piece of real-time news is recommended to the user begins, with the following specific steps:
inputting the data features of the real-time news in the real-time news resource into the trained binary classification model;
receiving the data features through the input initial layer;
vectorizing the data features through the sparse feature layer: if a data feature is a text feature, it is converted into a numeric vector feature using word2vec, and if it is a discrete feature, it is converted into a numeric vector feature using one-hot encoding;
converting the high-dimensional sparse numeric vector features into low-dimensional dense numeric vector features through the dense vector embedding layer;
performing a first-order computation and a second-order computation on the numeric vector features through the AFM layer and summing the two results, wherein the first-order computation is performed directly on the weight matrix formed by the numeric vector features output by the sparse feature layer to obtain a first-order result, and the second-order computation crosses the numeric vector features output by the dense vector embedding layer to obtain low-order features, which are input into the attention mechanism, whose output value is the second-order result;
extracting features from the numeric vector features output by the dense vector embedding layer through the Deep layer to obtain high-order features, wherein the Deep layer comprises a two-layer fully connected network;
taking the output features of the parallel AFM layer and Deep layer as the input features of the output layer, performing binary classification on the input features using a sigmoid activation function, and outputting the recommendation result for the real-time news corresponding to the input features, so as to obtain the industry news recommendation result, which indicates which pieces of real-time news are recommended;
s4, abstract generation
Automatically generating abstracts of real-time news to be recommended in industry news recommendation results to obtain news abstracts of each real-time news to be recommended;
further, the abstract of the real-time news to be recommended in the industry news recommendation result is automatically generated, and the news abstract of each real-time news to be recommended is obtained specifically as follows:
dividing each real-time news to be recommended in the industry news recommendation result into sentences according to preset punctuation marks to obtain a sentence matrix T= [ S ] 1 ,S 2 ,...,S m ],S m Represents an mth sentence;
performing stop word removal and non-Chinese character removal on each divided sentence, and performing word segmentation to obtain a word segmentation set of each sentence;
converting all words in the word-segmentation set by using word2vec to obtain word vectors Si= [ t ] of all words in each sentence i,1 ,t i,2 ,...,t i,n ],t i,j For the jth word of the ith sentence, averaging word vectors of all words in each sentence to obtain vector representation of each sentence;
calculating real-time new to be recommended of each pieceCosine similarity between the smelling sentences forms a similarity matrix W which comprises element values W jk Element value w jk Representing a similarity between the jth sentence and the kth sentence;
constructing edges between any two nodes by taking sentences as nodes and adopting co-occurrence relations, and constructing a graph structure G= (V, E) by taking similarity scores as transition probabilities, wherein V is a node set and E is an edge set;
iteratively calculating a node score for each sentence using a first formula until convergence, the first formula being:
$$WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} \, WS(V_j)$$
where $WS(V_i)$ represents the node score of the i-th sentence, $d$ is the damping coefficient, and $In(V_i)$ and $Out(V_j)$ respectively denote the set of nodes pointing to $V_i$ and the set of nodes that $V_j$ points to;
and sorting all sentences of each real-time news to be recommended in descending order of node score to obtain a sentence recommendation list, and taking the first L sentences in the sentence recommendation list of each real-time news to be recommended as its news abstract, wherein L is 3 in this embodiment.
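The abstract generation steps above can be sketched in Python with NumPy. This is a sketch under stated assumptions, not the patented implementation: it assumes the first formula is the standard weighted TextRank update, WS(V_i) = (1 - d) + d * sum over V_j in In(V_i) of [w_ji / sum over V_k in Out(V_j) of w_jk] * WS(V_j), uses d = 0.85 as an illustrative damping value, drops negative similarities, and takes already segmented sentences plus a plain dictionary of word vectors in place of a trained word2vec model. All function names are hypothetical.

```python
import numpy as np

def sentence_vectors(tokenized, word_vecs, dim):
    """Average the word vectors of each sentence (zero vector if no known word)."""
    rows = []
    for tokens in tokenized:
        vecs = [word_vecs[t] for t in tokens if t in word_vecs]
        rows.append(np.mean(vecs, axis=0) if vecs else np.zeros(dim))
    return np.vstack(rows)

def cosine_similarity_matrix(S):
    """Similarity matrix W with w_jk = cos(S_j, S_k) and no self-loops."""
    norms = np.linalg.norm(S, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0           # avoid division by zero
    unit = S / norms
    W = unit @ unit.T
    np.fill_diagonal(W, 0.0)
    return np.clip(W, 0.0, None)        # assumption: negative similarities dropped

def textrank(W, d=0.85, tol=1e-6, max_iter=200):
    """Iterate the weighted TextRank update until the scores stop changing."""
    out_sum = W.sum(axis=1)
    out_sum[out_sum == 0.0] = 1.0
    P = W / out_sum[:, None]            # P[j, i] = w_ji / sum_k w_jk
    ws = np.ones(W.shape[0])
    for _ in range(max_iter):
        new = (1.0 - d) + d * (P.T @ ws)
        if np.abs(new - ws).max() < tol:
            return new
        ws = new
    return ws

def summarize(sentences, tokenized, word_vecs, dim, L=3):
    """Return the L highest-scoring sentences as the news abstract."""
    W = cosine_similarity_matrix(sentence_vectors(tokenized, word_vecs, dim))
    scores = textrank(W)
    top = np.argsort(-scores)[:L]
    return [sentences[i] for i in top]

# Toy example: the third sentence is similar to both others, so it ranks first.
vecs = {"a": np.array([1.0, 0.0]), "b": np.array([0.0, 1.0]), "c": np.array([1.0, 1.0])}
summary = summarize(["s0", "s1", "s2"], [["a"], ["b"], ["c"]], vecs, dim=2, L=1)
```

The sketch returns sentences in score order; an implementation could equally restore original document order among the selected sentences, which the text does not specify.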
S5, front end display
Real-time news to be recommended, in a quantity matching the preset number of news items, is selected from the industry news recommendation result and displayed at the front end, so that a user can quickly and effectively acquire real-time news information; for each real-time news to be recommended, the front end displays the corresponding news website, real-time news headline, news abstract and real-time news link address, and in this embodiment manual selection, manual sorting and manual editing by the user are also supported;
S6, exporting news briefs
The information displayed at the front end for the real-time news to be recommended is exported as a news briefing in txt, pdf or word format, which makes it easier for researchers to write research reports and improves the efficiency of investors.
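As an illustration of the export step, the following sketch renders the four displayed fields of each news item into a plain-text briefing and writes it as a txt file. The `NewsItem` class, the layout of the briefing, and the function names are assumptions made for this example; pdf and word export would need additional libraries and are not shown.

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class NewsItem:
    """The four fields shown at the front end for one recommended news item."""
    site: str
    headline: str
    summary: str
    url: str

def render_briefing(items):
    """Format the items as a numbered plain-text briefing (hypothetical layout)."""
    blocks = []
    for n, item in enumerate(items, start=1):
        blocks.append(
            f"{n}. {item.headline}\n"
            f"   Source: {item.site}\n"
            f"   Summary: {item.summary}\n"
            f"   Link: {item.url}"
        )
    return "\n\n".join(blocks) + "\n"

def export_txt(items, path):
    """Write the briefing to a UTF-8 encoded txt file."""
    Path(path).write_text(render_briefing(items), encoding="utf-8")

items = [NewsItem("Example Finance", "Markets rise", "Stocks closed higher.",
                  "https://example.com/news/1")]
briefing = render_briefing(items)
```

Calling `export_txt(items, "briefing.txt")` would write the same text to disk.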
Referring to fig. 3, a second embodiment of the present invention is as follows:
an industry news recommendation terminal 1 comprises a memory 3, a processor 2 and a computer program stored on the memory 3 and executable on the processor 2, wherein the processor 2 implements the steps of the first embodiment when executing the computer program.
In summary, the industry news recommendation method and terminal provided by the invention adopt a deep learning method combining an AFM layer and a Deep layer and take all features of the various industry fields as input of the classification model, so that manual labeling is not needed and key information is not missed; meanwhile, the features can be cross-combined, which avoids having to determine the priority of field features and allows different news information to be recommended to different users. In addition, news is captured from websites of preset industries at a preset time interval, which ensures that the real-time news is professional and timely. In processing the news data, the news data is preprocessed and the news data that a user is interested in is used as an important index for recommendation; training uses a joint training mode, a gradient descent method and a logarithmic loss function; both low-order and high-order features are considered when recommending to a user, as is the differing importance of the feature combinations; a plurality of recommendation models are used for recommendation ranking; and the Gini Normalization index is used for evaluation, so that the trained binary classification model can accurately recommend the real-time news most likely to interest the user and recommendation accuracy is improved. After the industry news recommendation result is obtained, abstracts of the news to be recommended are extracted automatically and export to documents in several formats is supported, which makes it easier for researchers to write research reports and improves the efficiency of investors.
The foregoing description is only illustrative of the present invention and is not intended to limit the scope of the invention, and all equivalent changes made by the specification and drawings of the present invention, or direct or indirect application in the relevant art, are included in the scope of the present invention.

Claims (8)

1. An industry news recommendation method is characterized by comprising the following steps:
capturing real-time news resources;
inputting real-time news in the real-time news resource into a trained binary classification model to obtain an industry news recommendation result, wherein the binary classification model comprises an AFM layer and a Deep layer, the AFM layer comprises a factorization machine and an attention mechanism, and the Deep layer is a feedforward neural network;
after the industry news recommendation result is obtained, the method further comprises the following steps:
automatically generating abstracts of real-time news to be recommended in the industry news recommendation results to obtain news abstracts of each real-time news to be recommended;
selecting, from the industry news recommendation result, real-time news to be recommended in a quantity matching the preset number of news items for front-end display, wherein the front-end display shows, for each real-time news to be recommended, the corresponding news website, real-time news headline, news abstract and real-time news link address;
exporting the information displayed at the front end of the real-time news to be recommended into news briefs in txt format, pdf format or word format;
wherein automatically generating abstracts of the real-time news to be recommended in the industry news recommendation result to obtain a news abstract of each real-time news to be recommended specifically comprises:
dividing each real-time news to be recommended in the industry news recommendation result into sentences according to preset punctuation marks to obtain a sentence matrix $T = [S_1, S_2, \ldots, S_m]$, where $S_m$ represents the m-th sentence;
performing stop word removal and non-Chinese character removal on each divided sentence, and performing word segmentation to obtain a word segmentation set of each sentence;
converting all words in the word segmentation set by using word2vec to obtain the word vectors $S_i = [t_{i,1}, t_{i,2}, \ldots, t_{i,n}]$ of all words in each sentence, where $t_{i,j}$ is the j-th word of the i-th sentence, and averaging the word vectors of all words in each sentence to obtain a vector representation of each sentence;
calculating cosine similarity between sentences of the real-time news to be recommended to form a similarity matrix $W$, wherein the similarity matrix $W$ comprises element values $w_{jk}$, the element value $w_{jk}$ representing the similarity between the j-th sentence and the k-th sentence;
constructing edges between any two nodes by taking sentences as nodes and adopting co-occurrence relations, and constructing a graph structure G= (V, E) by taking similarity scores as transition probabilities, wherein V is a node set, and E is an edge set;
iteratively calculating a node score for each sentence using a first formula until convergence, the first formula being:
$$WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} \, WS(V_j)$$
where $WS(V_i)$ represents the node score of the i-th sentence, $d$ is the damping coefficient, and $In(V_i)$ and $Out(V_j)$ respectively denote the set of nodes pointing to $V_i$ and the set of nodes that $V_j$ points to;
and sorting all sentences of each real-time news to be recommended in a descending order according to the node scores to obtain sentence recommendation lists, and taking out the first L sentences in the sentence recommendation lists of each real-time news to be recommended as news summaries thereof.
2. The industry news recommending method according to claim 1, wherein the capturing real-time news resources specifically comprises the following steps:
and capturing real-time news data of each piece of real-time news from a website of a preset industry at regular intervals, wherein the real-time news data comprises real-time news headlines, real-time news contents and real-time news link addresses, and defining a news ID as a unique identifier of the real-time news data so as to store the real-time news data in a news database.
3. The industry news recommendation method of claim 2, wherein the steps between capturing real-time news resources and inputting real-time news within the real-time news resources into the trained classification model further comprise:
acquiring real-time news data of each piece of real-time news in the news database;
acquiring real-time news content and real-time news headlines of the real-time news data, removing noise and special characters of the real-time news content and the real-time news headlines, then segmenting words, and removing stop words to obtain text features of a word segmentation set comprising the real-time news content and the real-time news headlines, wherein the special characters comprise punctuation, line feed and blank characters;
acquiring a news website corresponding to the real-time news link address in the real-time news data to obtain discrete features corresponding to the news website, wherein the discrete features comprise industries, users and hot spots;
and obtaining the data characteristics of each piece of real-time news, wherein the data characteristics comprise text characteristics and discrete characteristics.
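The feature extraction of claim 3 can be sketched as follows. The sketch is illustrative only: the site lookup table, the stop-word set, and the character-level default segmenter (a stand-in for a real Chinese word segmenter) are all assumptions, as are the function names.

```python
import re
from urllib.parse import urlparse

# Hypothetical lookup from news site host to its discrete attributes.
SITE_FEATURES = {"finance.example.com": {"industry": "finance", "hot": 1}}
# Hypothetical stop-word set.
STOP_WORDS = {"的", "了", "和"}

def clean(text):
    """Remove punctuation, line feeds and blank characters."""
    return re.sub(r"[\W_]+", "", text)

def text_features(headline, content, segment=list, stop_words=STOP_WORDS):
    """Clean, segment and filter the headline and content.

    `segment` defaults to a character-level split, standing in for a real
    word segmenter, which would be passed in here.
    """
    tokens = segment(clean(headline)) + segment(clean(content))
    return [t for t in tokens if t not in stop_words]

def discrete_features(url, table=SITE_FEATURES):
    """Map the link address to the discrete features of its news website."""
    return table.get(urlparse(url).netloc, {})

def data_features(headline, content, url):
    """Bundle text features and discrete features for one news item."""
    return {
        "text": text_features(headline, content),
        "discrete": discrete_features(url),
    }

features = data_features("股市，上涨", "今日 了\n", "https://finance.example.com/a/1")
```

Here the punctuation, whitespace and the stop word are removed before the tokens are returned, and the host of the link address selects the discrete features.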
4. The industry news recommendation method of claim 1, wherein obtaining the trained classification model comprises the steps of:
acquiring historical news data of each piece of historical news in a news database, and taking the historical news data as input of the classification model, wherein the historical news data comprises first news data interested by a user and second news data except the first news data;
and training the classification model in a joint training mode, updating the parameters of the classification model by a gradient descent method, and adopting a logarithmic loss function as the loss function of the classification model, to obtain the trained classification model.
5. The industry news recommendation method of claim 4, wherein the step of obtaining the trained classification model further comprises the steps of:
and evaluating the trained classification model using the Gini Normalization index, and if the evaluation fails, continuing to train the classification model until the evaluation succeeds.
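The loss and evaluation metric named in claims 4 and 5 can be sketched in NumPy. The sketch assumes that "Gini Normalization" refers to the commonly used normalized Gini coefficient, computed as the Gini of the model ranking divided by the Gini of a perfect ranking; the function names are hypothetical, and the pass threshold for the evaluation is not stated in the text and is not modeled here.

```python
import numpy as np

def log_loss(y_true, y_prob, eps=1e-15):
    """Binary logarithmic loss averaged over all samples."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def gini(y_true, y_score):
    """Unnormalized Gini: area between the cumulative-gain curve and the diagonal."""
    y = np.asarray(y_true, dtype=float)
    order = np.argsort(-np.asarray(y_score, dtype=float), kind="stable")
    y_sorted = y[order]                      # labels ranked by descending score
    n = len(y)
    cum = np.cumsum(y_sorted) / y_sorted.sum()
    return float(cum.sum() / n - (n + 1) / (2.0 * n))

def normalized_gini(y_true, y_score):
    """Gini of the model ranking divided by the Gini of a perfect ranking."""
    return gini(y_true, y_score) / gini(y_true, y_true)

labels = [1, 0, 1, 0]
good = normalized_gini(labels, [0.9, 0.1, 0.8, 0.2])   # perfect ranking
bad = normalized_gini(labels, [0.1, 0.9, 0.2, 0.8])    # fully reversed ranking
```

A perfect ranking scores 1 and a fully reversed ranking scores -1, so a training loop could keep going until the score on held-out data exceeds a chosen cut-off.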
6. The industry news recommendation method according to claim 3, wherein the inputting the real-time news in the real-time news resource into the trained classification model specifically comprises:
and inputting the data characteristics of the real-time news in the real-time news resource into the trained classification model.
7. The industry news recommendation method of claim 6, wherein the classification model specifically comprises an input layer, an AFM layer, a Deep layer and an output layer, the input layer comprising an input initial layer, a sparse feature layer and a dense vector embedding layer;
the method for obtaining the industry news recommendation result specifically comprises the following steps:
receiving the data features through the input initial layer;
carrying out vectorization processing on the data features through the sparse feature layer, if the data features are text features, converting the text features into digital vector features by using word2vec, and if the data features are discrete features, converting the data features into the digital vector features by using one-hot coding;
converting the high-dimensional sparse digital vector features into low-dimensional dense digital vector features through the dense vector embedding layer;
carrying out first-order computation and second-order computation on the digital vector features through the AFM layer and summing the results of the first-order computation and the second-order computation, wherein the first-order computation is performed directly on a weight matrix formed by the digital vector features output by the sparse feature layer to obtain a first-order computation result, and the second-order computation performs feature crossing on the digital vector features output by the dense vector embedding layer to obtain low-order features and inputs the low-order features into an attention mechanism, the output value of the attention mechanism being the second-order computation result;
extracting features of the digital vector features output by the dense vector embedding layer through the Deep layer to obtain high-order features, wherein the Deep layer comprises a two-layer fully connected network;
and taking the output features of the parallel AFM layer and Deep layer as the input features of the output layer, performing binary classification on the input features by using a sigmoid activation function, and outputting whether the real-time news corresponding to the input features is recommended, so as to obtain the industry news recommendation result.
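The AFM layer of claim 7 can be sketched in NumPy as a first-order term plus an attention-weighted sum of pairwise feature crossings. The parameter shapes, the one-layer ReLU attention network followed by a softmax, and the projection vector `p` follow the commonly published attentional factorization machine design and are assumptions here, not details taken from the claim; all names are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def afm_second_order(emb, W, b, h, p):
    """Attention-weighted sum of pairwise feature crossings.

    emb: (f, k) dense embeddings of the f active features.
    W: (k, t), b: (t,), h: (t,) attention parameters; p: (k,) projection.
    """
    i_idx, j_idx = np.triu_indices(emb.shape[0], k=1)
    pairs = emb[i_idx] * emb[j_idx]              # element-wise products, one row per pair
    scores = np.maximum(pairs @ W + b, 0.0) @ h  # one attention score per pair
    attn = softmax(scores)
    pooled = attn @ pairs                        # (k,) attention-pooled crossing
    return float(pooled @ p), attn

def afm_layer(x, w_first, emb, W, b, h, p):
    """Sum of the first-order term and the attention-based second-order term."""
    first = float(w_first @ x)
    second, _ = afm_second_order(emb, W, b, h, p)
    return first + second

emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
W, b, h = np.zeros((2, 2)), np.zeros(2), np.ones(2)   # zero weights give uniform attention
p = np.array([1.0, 1.0])
second, attn = afm_second_order(emb, W, b, h, p)
total = afm_layer(np.array([1.0, 0.0, 1.0]), np.array([0.5, 2.0, 0.25]), emb, W, b, h, p)
```

With zero attention weights every pair receives the same weight, so the second-order term reduces to the mean of the pairwise products; trained weights would instead emphasize the more informative feature combinations.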
8. An industry news recommendation terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements an industry news recommendation method according to any one of claims 1 to 7 when executing the computer program.
CN202010073519.8A 2020-01-22 2020-01-22 Industry news recommendation method and terminal Active CN111241410B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010073519.8A CN111241410B (en) 2020-01-22 2020-01-22 Industry news recommendation method and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010073519.8A CN111241410B (en) 2020-01-22 2020-01-22 Industry news recommendation method and terminal

Publications (2)

Publication Number Publication Date
CN111241410A CN111241410A (en) 2020-06-05
CN111241410B true CN111241410B (en) 2023-08-22

Family

ID=70866403

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010073519.8A Active CN111241410B (en) 2020-01-22 2020-01-22 Industry news recommendation method and terminal

Country Status (1)

Country Link
CN (1) CN111241410B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111859887A (en) * 2020-07-21 2020-10-30 北京北斗天巡科技有限公司 Scientific and technological news automatic writing system based on deep learning
CN111815426B (en) * 2020-09-11 2020-12-15 深圳司南数据服务有限公司 Data processing method and terminal related to financial investment and research
CN112307336B (en) * 2020-10-30 2024-04-16 中国平安人寿保险股份有限公司 Hot spot information mining and previewing method and device, computer equipment and storage medium
CN116028617B (en) * 2022-12-06 2024-02-27 腾讯科技(深圳)有限公司 Information recommendation method, apparatus, device, readable storage medium and program product

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103593433A (en) * 2013-11-12 2014-02-19 中国科学院信息工程研究所 Graph data processing method and system for massive time series data
CN105205163A (en) * 2015-06-29 2015-12-30 淮阴工学院 Incremental learning multi-level binary-classification method of scientific news
WO2018127627A1 (en) * 2017-01-06 2018-07-12 Nokia Technologies Oy Method and apparatus for automatic video summarisation
CN108681544A (en) * 2018-03-07 2018-10-19 中山大学 A kind of deep learning method described based on collection of illustrative plates topological structure and entity text
CN110188349A (en) * 2019-05-21 2019-08-30 清华大学深圳研究生院 A kind of automation writing method based on extraction-type multiple file summarization method
CN110609948A (en) * 2019-04-03 2019-12-24 华南理工大学 Recommendation method based on multi-level attention mechanism and field perception decomposition machine

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9461876B2 (en) * 2012-08-29 2016-10-04 Loci System and method for fuzzy concept mapping, voting ontology crowd sourcing, and technology prediction

Also Published As

Publication number Publication date
CN111241410A (en) 2020-06-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant