CN113222734A

CN113222734A - Bank financial information recommendation system and method

Info

Publication number: CN113222734A
Application number: CN202110556077.7A
Authority: CN
Inventors: 张亚泽; 张岩; 王鹏程; 狄潇然; 朱阿龙; 李瑞男; 卢伟; 田林; 豆敏娟; 刘宇琦; 张靖羚; 刘琦
Original assignee: Bank of China Ltd
Current assignee: Bank of China Ltd
Priority date: 2021-05-21
Filing date: 2021-05-21
Publication date: 2021-08-06

Abstract

The invention provides a bank financial information recommendation system and method, which can be used in the technical field of artificial intelligence. The system comprises: a user preference model training module for obtaining a user preference model of each user according to the financial information click behavior data of that user; a data preprocessing module for preprocessing a plurality of pieces of financial information data acquired by each bank node in real time; a naive Bayesian multi-classification module for inputting the preprocessed pieces of financial information data into a naive Bayesian multi-classification model to obtain, for each piece of financial information data, a weight vector over the financial information category labels; and an information recommendation module for calculating, for each user of each bank node, the cosine similarity between the user preference model of that user and each weight vector, and for determining a recommendation list for each user according to the resulting cosine similarities. The invention can recommend different bank financial information to different users.

Description

Bank financial information recommendation system and method

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to a bank financial information recommendation system and method.

Background

In recent years, the financial market has received more and more attention, and every large financial company provides thousands of pieces of financial information to users every day. As leading enterprises in the financial industry, commercial banks provide financial information that users trust, so users prefer to obtain the financial information they are interested in from commercial banks. At present, however, most banks provide financial information services without targeting: all users receive the same information, and different financial information cannot be recommended according to each user's degree of preference for different kinds of financial information. As a result, a bank cannot respond in time to changes in a user's interests, and the user cannot see preferred financial information in time, which degrades the user experience. In addition, banks currently cannot share data to train a common model, so the data remain in isolated data islands; because model accuracy depends on the quality and quantity of the data, this leads to low accuracy.

Disclosure of Invention

The embodiment of the invention provides a bank financial information recommendation system, which is used for recommending different bank financial information to different users, and comprises:

at least one bank node, a data interaction module and a federated learning network module, wherein each bank node comprises a data acquisition module and a user preference model storage module,

the data acquisition module is used for acquiring financial information click behavior data of each user corresponding to the bank node and sending the financial information click behavior data to the data interaction module;

the data interaction module is used for sending the received financial information click behavior data to the federated learning network module;

the federated learning network module comprises a user preference model training module, a data preprocessing module, a naive Bayesian multi-classification module and an information recommendation module, wherein,

the user preference model training module is used for obtaining a user preference model of each user according to financial information click behavior data of each user, wherein the user preference model is a preference vector of the user to a plurality of financial information feature labels; storing the user preference model to a user preference model storage module of a corresponding bank node;

the data preprocessing module is used for preprocessing a plurality of pieces of financial information data acquired by each bank node in real time;

the naive Bayesian multi-classification module is used for inputting the preprocessed multiple pieces of financial information data into a naive Bayesian multi-classification model to obtain a weight vector of each piece of financial information data to a financial information category label;

the information recommendation module is used for calculating the cosine similarity of a user preference model of each user and each weight vector for each user of each bank node; and determining a recommendation list for each user according to the plurality of cosine similarities, wherein the recommendation list comprises a plurality of pieces of financial information data which are arranged from high to low according to recommendation degrees.

The embodiment of the invention provides a bank financial information recommendation method, which is used for recommending different bank financial information to different users and comprises the following steps:

for each bank node, collecting financial information click behavior data of each user of the bank node;

according to financial information click behavior data of each user, obtaining a user preference model of each user, wherein the user preference model is a preference vector of the user to a plurality of financial information feature labels;

preprocessing a plurality of pieces of financial information data acquired by each bank node in real time;

inputting the preprocessed multiple pieces of financial information data into a naive Bayesian multi-classification model to obtain a weight vector of each piece of financial information data to a financial information category label;

for each user of each bank node, calculating the cosine similarity of the user preference model of the user and each weight vector;

and determining a recommendation list for each user according to the plurality of cosine similarities, wherein the recommendation list comprises a plurality of pieces of financial information data which are arranged from high to low according to recommendation degrees.

The embodiment of the invention also provides computer equipment which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor realizes the bank financial information recommendation method when executing the computer program.

The embodiment of the invention also provides a computer readable storage medium, and the computer readable storage medium stores a computer program for executing the bank financial information recommendation method.

In the embodiment of the invention, the data acquisition module is used for acquiring financial information click behavior data of each user corresponding to a bank node and sending the financial information click behavior data to the data interaction module; the data interaction module is used for sending the received financial information click behavior data to the federated learning network module; the federated learning network module comprises a user preference model training module, a data preprocessing module, a naive Bayesian multi-classification module and an information recommendation module, wherein the user preference model training module is used for obtaining a user preference model of each user according to financial information click behavior data of each user, the user preference model being a preference vector of the user to a plurality of financial information feature labels, and for storing the user preference model to the user preference model storage module of the corresponding bank node; the data preprocessing module is used for preprocessing a plurality of pieces of financial information data acquired by each bank node in real time; the naive Bayesian multi-classification module is used for inputting the preprocessed pieces of financial information data into a naive Bayesian multi-classification model to obtain a weight vector of each piece of financial information data to a financial information category label; the information recommendation module is used for calculating, for each user of each bank node, the cosine similarity of the user preference model of the user and each weight vector, and for determining a recommendation list for each user according to the plurality of cosine similarities, wherein the recommendation list comprises a plurality of pieces of financial information data arranged from high to low according to recommendation degrees. In the embodiment, the federated learning network principle is adopted, user preference model training and naive Bayesian multi-classification are carried out separately, and the recommendation list is finally obtained, so that different bank financial information can be recommended to different users.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort. In the drawings:

FIG. 1 is a diagram illustrating a system for recommending financial information for a bank according to an embodiment of the present invention;

FIG. 2 is another schematic diagram of a bank financial information recommendation system according to an embodiment of the present invention;

FIG. 3 is a flowchart illustrating a method for recommending banking financial information according to an embodiment of the present invention;

FIG. 4 is a flow chart of user preference model training in an embodiment of the present invention;

FIG. 5 is a flow chart of user preference vector calculation in an embodiment of the present invention;

FIG. 6 is a flow chart of data preprocessing in an embodiment of the present invention;

FIG. 7 is a diagram of a computer device in an embodiment of the invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention are further described in detail below with reference to the accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.

In the description of the present specification, the terms "comprising," "including," "having," "containing," and the like are used in an open-ended fashion, i.e., to mean including, but not limited to. Reference to the description of the terms "one embodiment," "a particular embodiment," "some embodiments," "for example," etc., means that a particular feature, structure, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. The sequence of steps involved in the embodiments is for illustrative purposes to illustrate the implementation of the present application, and the sequence of steps is not limited and can be adjusted as needed.

Terms related to the embodiments of the present invention are explained below.

Federated learning: two or more participants jointly train a shared machine learning model. Each participant holds data that can be used to train the model; during training, the data of each participant never leaves that participant, and only model information is exchanged between participants, in encrypted form. The performance of the federated model can closely approximate that of an ideal model (a model trained on the pooled data of all participants). In horizontal federated learning, the participants' data share overlapping features, while the data samples held by each participant differ.

Naive Bayes multi-classification model: one of the most widely used classification algorithms; based on Bayes' theorem, it classifies a sample data set using probability and statistics. Training a naive Bayes multi-classification model mainly consists of computing the frequency of occurrence of each class in the training samples (the prior probability) and the conditional probability of each feature attribute value given each class.
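To make the two trained quantities concrete, the following is a minimal sketch of a multinomial naive Bayes classifier in plain Python. The category names, the toy training documents and the Laplace smoothing constant `alpha` are illustrative assumptions that do not come from the source. Returning the normalized posterior as the "weight vector over category labels" is one plausible reading of the text, not a confirmed detail of the invention.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels, alpha=1.0):
    """Estimate class priors and per-class word likelihoods.

    docs: list of token lists; labels: list of category names.
    alpha is a Laplace smoothing constant (an assumption of this sketch).
    """
    classes = sorted(set(labels))
    vocab = sorted({w for doc in docs for w in doc})
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)

    # Prior probability: relative frequency of each class.
    priors = {c: class_counts[c] / len(labels) for c in classes}
    # Conditional probability of each word given each class, smoothed.
    likelihoods = {}
    for c in classes:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        likelihoods[c] = {w: (word_counts[c][w] + alpha) / total for w in vocab}
    return classes, priors, likelihoods


def category_weights(tokens, classes, priors, likelihoods):
    """Return normalized posterior probabilities, one per category."""
    log_scores = []
    for c in classes:
        score = math.log(priors[c])
        for w in tokens:
            if w in likelihoods[c]:  # words unseen in training are ignored
                score += math.log(likelihoods[c][w])
        log_scores.append(score)
    # Normalize in log space for numerical stability.
    top = max(log_scores)
    exp_scores = [math.exp(s - top) for s in log_scores]
    norm = sum(exp_scores)
    return [s / norm for s in exp_scores]


# Toy data; the category names are hypothetical.
docs = [["fund", "yield"], ["fund", "bond"], ["stock", "index"], ["stock", "rally"]]
labels = ["funds", "stocks", "funds", "stocks"]
labels = ["funds", "funds", "stocks", "stocks"]
model = train_naive_bayes(docs, labels)
weights = category_weights(["fund", "yield", "index"], *model)
```

Here `weights` holds one probability per category, in the sorted order of the category names.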

Spark: an open-source cluster computing system based on in-memory computing and one of the most active projects in the Apache community; compared with Hadoop, Spark is reported to be up to nearly 100 times faster. Spark comprises a set of powerful high-level libraries, including Spark SQL, Spark Streaming, MLlib and GraphX. Spark provides a large number of operators and a rich data manipulation interface to facilitate data processing.

DataFrame: a distributed data set on the Spark platform that carries detailed schema information organized into columns, much like a table in a relational database. The DataFrame offers a rich set of operators and a higher level of abstraction, provides a dedicated API (application programming interface) for processing distributed data, and makes it convenient to process large-scale structured data.

Cosine similarity algorithm: an algorithm based on the vector space model that measures the similarity between two vectors by calculating the cosine of the angle between them. If the two vectors point in exactly the same direction, the angle is 0 degrees and the cosine value is 1; likewise, if the two vectors point in exactly opposite directions, the angle is 180 degrees and the cosine value is -1. Cosine similarity algorithms are often used to compare the similarity of text content.
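The definition above can be written out as a short Python function. This is a minimal sketch; returning 0.0 when either vector has zero length is a convention chosen here, since the angle is undefined in that case.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention of this sketch; the angle is undefined
    return dot / (norm_u * norm_v)
```

Vectors with the same direction but different magnitudes, such as `[1, 2]` and `[2, 4]`, give a value of 1 up to floating-point rounding, which shows that only direction matters.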

Fig. 1 is a schematic diagram of a bank financial information recommendation system according to an embodiment of the present invention, as shown in fig. 1, the system includes:

at least one bank node 101, a data interaction module 102, and a federated learning network module 103, wherein each bank node 101 comprises a data acquisition module 1011 and a user preference model storage module 1012, wherein,

the data acquisition module 1011 is used for acquiring financial information click behavior data of each user corresponding to a bank node and sending the financial information click behavior data to the data interaction module;

the data interaction module 102 is used for sending the received financial information click behavior data to the federated learning network module;

the federated learning network module 103 includes a user preference model training module 1031, a data preprocessing module 1032, a naive Bayesian multi-classification module 1033, and an information recommendation module 1034,

the user preference model training module 1031 is used for obtaining a user preference model of each user according to the financial information click behavior data of each user, wherein the user preference model is a preference vector of the user to a plurality of financial information feature labels; storing the user preference model to a user preference model storage module of a corresponding bank node;

the data preprocessing module 1032 is used for preprocessing a plurality of pieces of financial information data acquired by each bank node in real time;

a naive bayes multi-classification module 1033, configured to input the preprocessed multiple pieces of financial information data into a naive bayes multi-classification model, so as to obtain a weight vector of each piece of financial information data to the financial information category label;

the information recommendation module 1034 is configured to calculate, for each user of each bank node, a cosine similarity between the user preference model of the user and each weight vector; and determining a recommendation list for each user according to the plurality of cosine similarities, wherein the recommendation list comprises a plurality of pieces of financial information data which are arranged from high to low according to recommendation degrees.
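The ranking performed by the information recommendation module can be sketched as follows. This sketch assumes that the user preference vector and each item's weight vector are indexed by the same labels in the same order; the item identifiers, the example vectors and the `top_n` parameter are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(preference, item_weights, top_n=3):
    """Return item ids ordered from highest to lowest cosine similarity."""
    scored = [(cosine(preference, w), item_id) for item_id, w in item_weights.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in scored[:top_n]]


# Hypothetical user preference vector and per-item weight vectors.
preference = [0.7, 0.2, 0.1]
item_weights = {
    "a": [0.8, 0.1, 0.1],
    "b": [0.1, 0.8, 0.1],
    "c": [0.3, 0.3, 0.4],
}
ranking = recommend(preference, item_weights)
```

In this toy input, item `a` is closest in direction to the preference vector, so it is listed first, followed by `c` and then `b`.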

In the embodiment of the invention, the federated learning network principle is adopted, user preference model training and naive Bayesian multi-classification are carried out separately, and the recommendation list is finally obtained, so that different bank financial information can be recommended to different users.

In a specific implementation, each bank node joins the federated learning network, and each bank node can either use its user preference models by itself or share them. The user preference model storage module may be Hive. The system adopts the idea of federated learning: the user preference model training module performs a batch computation, whereas the data preprocessing module, the naive Bayesian multi-classification module and the information recommendation module perform real-time computation. That is, the user preference model is obtained through a batch computation; then, after financial information data is obtained in real time, it is analyzed in real time to determine the recommendation list for each user.

In one embodiment, the user preference model training module comprises:

the data processing module is used for processing the financial information click behavior data into table data through Spark SQL, and for removing abnormal data in the table data through Spark operators;

and the preference vector calculation module is used for obtaining the preference vector of the user to the financial information feature tag according to the table data.

In the above embodiment, the method of the present invention is developed and run on the Spark platform. In an embodiment, the financial information click behavior data and the financial information data are in Spark DataFrame format, and the modules exchange data in that format. This simplifies development, reduces data format conversion and data transmission overhead between the modules, and allows data to pass seamlessly between them.

In an embodiment, the preference vector calculation module is specifically configured to:

for each user in the table data, acquiring a first amount of financial information data corresponding to the user;

extracting a second quantity of financial information feature tags from the first quantity of financial information data;

and calculating the preference value of each user to each financial information feature tag, wherein all the preference values corresponding to each user form a preference vector of the user to the financial information feature tag.

In the above embodiment, for example, user 1 corresponds to 10 pieces of financial information data carrying 6 distinct financial information feature tags; that is, some pieces of financial information data share the same feature tag. The preference value may be calculated from the probability of occurrence of each financial information feature tag, with reference to prior experience.
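A minimal sketch of this calculation follows, assuming that the preference value of a tag is its relative frequency among the items the user clicked. The tag names are invented, and the optional `prior` pseudo-counts are a hypothetical stand-in for the "prior experience" mentioned above, whose exact form the source does not specify.

```python
from collections import Counter


def preference_vector(clicked_tags, all_tags, prior=None):
    """Relative click frequency per tag, optionally blended with prior pseudo-counts.

    prior maps tag -> pseudo-count (a hypothetical way to encode prior experience).
    """
    prior = prior or {}
    counts = Counter(clicked_tags)
    raw = [counts[t] + prior.get(t, 0.0) for t in all_tags]
    total = sum(raw)
    if total == 0:
        return [0.0] * len(all_tags)
    return [r / total for r in raw]


# User 1 clicked 10 items that carry 6 distinct tags (tag names are made up).
tags = ["funds", "stocks", "bonds", "forex", "gold", "insurance"]
clicks = ["funds"] * 4 + ["stocks"] * 2 + ["bonds", "forex", "gold", "insurance"]
vector = preference_vector(clicks, tags)
```

With these inputs the vector has six entries that sum to 1, and the largest entry belongs to the most frequently clicked tag.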

In one embodiment, the data pre-processing module comprises:

the natural language processing module is used for performing at least one of the following data operations on a plurality of pieces of financial information data acquired in real time, to obtain a text vector consisting of a plurality of word features corresponding to each piece of financial information data: word segmentation, stop-word removal and sensitive-word filtering;

the vectorization module is used for converting the text vector corresponding to each piece of financial information data into a numerical vector;

and the feature engineering module is used for processing the numerical vector according to the preset vector length.

In the above embodiment, for example, if the preset vector length is 10 and the obtained numerical vector has length 15, the numerical vector is truncated so that its length is 10; for another example, if the numerical vector is shorter than 10 (in the extreme case, of length 0), it is padded with zeros so that its length is 10.
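The length adjustment can be sketched in a few lines of Python. Keeping the leading entries when truncating and appending the zeros at the end are assumptions of this sketch; the source only states that the vector is truncated or zero-padded to the preset length.

```python
def fit_length(vector, target_length, pad_value=0.0):
    """Truncate or right-pad a numeric vector to exactly target_length."""
    if target_length < 0:
        raise ValueError("target_length must be non-negative")
    if len(vector) >= target_length:
        return list(vector[:target_length])
    return list(vector) + [pad_value] * (target_length - len(vector))
```

For instance, `fit_length(list(range(15)), 10)` keeps the first 10 entries, while `fit_length([], 10)` yields ten zeros.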

In an embodiment, the vectorization module is specifically configured to convert the text vector corresponding to each piece of financial information data into a numerical vector by using the TF-IDF algorithm.
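As an illustration of the vectorization step, the following sketch computes TF-IDF weights in plain Python. It uses one common variant, with term frequency as count divided by document length and inverse document frequency as the natural logarithm of N/df; the source does not state which variant is used, and the toy documents are invented.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return (vocabulary, vectors) for a list of tokenized documents.

    Uses tf = count / doc_length and idf = ln(N / df); other variants exist.
    """
    vocab = sorted({w for doc in documents for w in doc})
    n_docs = len(documents)
    # Document frequency: number of documents containing each word.
    doc_freq = Counter(w for doc in documents for w in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append(
            [(counts[w] / length) * math.log(n_docs / doc_freq[w]) for w in vocab]
        )
    return vocab, vectors


docs = [["bank", "fund", "fund"], ["bank", "stock"], ["bank", "bond"]]
vocab, vectors = tf_idf(docs)
```

A word that appears in every document, such as `bank` here, receives a weight of zero, while a word concentrated in one document receives a positive weight.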

Fig. 2 is another schematic diagram of the bank financial information recommendation system according to an embodiment of the present invention. As shown in fig. 2, in an embodiment, the federated learning network module further includes a financial information data receiving module 1035 for receiving, in real time, a plurality of pieces of financial information data of each bank node from the Kafka message queue.

In one embodiment, the federated learning network module further includes a model update module 1036 for: monitoring, through a thread pool, the Kafka topic for user preference model update messages; and, when such an update message is received, sending the updated user preference model to the information recommendation module;

the information recommendation module is specifically used for: for each user of each bank node, calculating the cosine similarity of the updated user preference model of the user and each weight vector;

the user preference model training module is further configured to: after obtaining the user preference model of each user, send a user preference model update message to the Kafka topic.
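The publish-and-listen pattern described in this embodiment can be sketched with the Python standard library alone. Note the substitution: an in-memory `queue.Queue` stands in for the Kafka topic so that the example is self-contained, and no Kafka client API is used. The class name, the message layout and the `STOP` sentinel are hypothetical.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

STOP = object()  # sentinel that ends the listener in this sketch


class RecommenderState:
    """Holds the latest preference vector per user."""

    def __init__(self):
        self.preferences = {}

    def apply_update(self, user_id, vector):
        self.preferences[user_id] = vector


def listen_for_updates(topic, state):
    """Consume update messages until the STOP sentinel arrives."""
    handled = 0
    while True:
        message = topic.get()
        if message is STOP:
            return handled
        user_id, vector = message
        state.apply_update(user_id, vector)
        handled += 1


topic = queue.Queue()  # in-memory stand-in for a Kafka topic
state = RecommenderState()

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(listen_for_updates, topic, state)
    # The training side publishes an update after each batch run.
    topic.put(("user-1", [0.6, 0.3, 0.1]))
    topic.put(("user-1", [0.2, 0.7, 0.1]))
    topic.put(STOP)
    handled = future.result(timeout=5)
```

After the listener finishes, `state.preferences` holds the most recent vector for each user, which is the one the recommendation step would use for subsequent similarity calculations.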

In one embodiment, the system further comprises an identity authentication center 104 for performing identity authentication on each bank node.

In one embodiment, the bank node further comprises a data encryption module 1013 for encrypting the user preference model of each user of each bank node.

In one embodiment, the system further comprises a manual recommendation module 105 for:

judging the types of a plurality of pieces of financial information data acquired by each bank node in real time;

and when the type accords with a preset type, determining a recommendation list for each user.
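A minimal sketch of the type check follows. The item fields and type names are hypothetical. How a manually selected list is combined with the similarity-based list is not stated in the source, so placing the manually selected items first in `merge_lists` is purely an assumption.

```python
def manual_recommendations(items, preset_types):
    """Return ids of items whose type is in the preset set, in arrival order."""
    return [item["id"] for item in items if item["type"] in preset_types]


def merge_lists(manual, personalized):
    """Put manually selected items first, then personalized ones, without duplicates."""
    seen = set()
    merged = []
    for item_id in list(manual) + list(personalized):
        if item_id not in seen:
            seen.add(item_id)
            merged.append(item_id)
    return merged


# Hypothetical incoming items and preset type.
items = [
    {"id": "n1", "type": "policy_announcement"},
    {"id": "n2", "type": "market_commentary"},
    {"id": "n3", "type": "policy_announcement"},
]
manual = manual_recommendations(items, {"policy_announcement"})
final_list = merge_lists(manual, ["n2", "n3"])
```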

In summary, the system provided by the embodiment of the present invention has the following beneficial effects:

1. User preference models are shared among different bank nodes by using federated learning technology, which breaks the data island dilemma.

2. The method provided by the invention is developed and run on the Spark platform, and the modules exchange data in Spark DataFrame format, which simplifies development, reduces data format conversion and data transmission overhead between the modules, and allows data to pass seamlessly between them.

3. The user preference model can be updated automatically through the Kafka message queue, which keeps the cosine similarities calculated in real time accurate.

4. Determining a recommendation list for each user according to a plurality of cosine similarities addresses the "cold start" problem of traditional recommendation systems.

The embodiment of the invention also provides a bank financial information recommendation method, the principle of which is similar to that of a bank financial information recommendation system, and the details are not repeated here.

Fig. 3 is a flowchart of a method for recommending bank financial information according to an embodiment of the present invention, as shown in fig. 3, including:

step 301, for each bank node, collecting financial information click behavior data of each user of the bank node;

step 302, obtaining a user preference model of each user according to financial information click behavior data of each user, wherein the user preference model is a preference vector of the user to a plurality of financial information feature labels;

step 303, preprocessing a plurality of pieces of financial information data acquired by each bank node in real time;

step 304, inputting the plurality of pieces of preprocessed financial information data into a naive Bayesian multi-classification model to obtain a weight vector of each piece of financial information data to a financial information category label;

step 305, calculating the cosine similarity of the user preference model of each user and each weight vector for each user of each bank node;

step 306, determining a recommendation list for each user according to the plurality of cosine similarities, wherein the recommendation list comprises a plurality of pieces of financial information data which are arranged from high to low according to recommendation degrees.

Fig. 4 is a flowchart of the user preference model training in the embodiment of the present invention, as shown in fig. 4, the steps are as follows:

step 401, processing the financial information click behavior data into table data through Spark SQL;

step 402, removing abnormal data in the table data through Spark operators;

step 403, obtaining a preference vector of the user to the financial information feature tag according to the table data.
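Steps 401 and 402 can be illustrated without a Spark cluster. The sketch below uses plain Python tuples in place of a Spark DataFrame. The column names are hypothetical, and the notion of "abnormal data" used here (missing fields, non-positive timestamps, exact duplicates) is an assumption, since the source does not define it.

```python
def to_rows(raw_events):
    """Flatten raw click events (dicts) into uniform table rows."""
    columns = ("user_id", "item_id", "tag", "timestamp")
    return [tuple(event.get(c) for c in columns) for event in raw_events]


def drop_abnormal(rows):
    """Remove rows with missing fields, invalid timestamps, or exact duplicates."""
    seen = set()
    clean = []
    for row in rows:
        timestamp = row[3]
        if None in row:
            continue  # a required field is missing
        if not isinstance(timestamp, (int, float)) or timestamp <= 0:
            continue  # timestamp is not a positive number
        if row in seen:
            continue  # exact duplicate of an earlier row
        seen.add(row)
        clean.append(row)
    return clean


events = [
    {"user_id": "u1", "item_id": "n1", "tag": "funds", "timestamp": 1621560000},
    {"user_id": "u1", "item_id": "n1", "tag": "funds", "timestamp": 1621560000},
    {"user_id": "u2", "item_id": "n2", "tag": None, "timestamp": 1621560100},
    {"user_id": "u3", "item_id": "n3", "tag": "stocks", "timestamp": -5},
]
rows = drop_abnormal(to_rows(events))
```

Only the first event survives the cleaning in this toy input; the others are a duplicate, a row with a missing tag, and a row with an invalid timestamp.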

Fig. 5 is a flowchart of calculating a user preference vector according to an embodiment of the present invention, as shown in fig. 5, the steps are as follows:

step 501, for each user in the table data, obtaining a first amount of financial information data corresponding to the user;

step 502, extracting a second amount of financial information feature tags from the first amount of financial information data;

step 503, calculating the preference value of each user to each financial information feature tag, wherein all the preference values corresponding to each user form a preference vector of the user to the financial information feature tag.

Fig. 6 is a flowchart of data preprocessing in the embodiment of the present invention, as shown in fig. 6, the steps are as follows:

step 601, performing at least one of the following data operations on a plurality of pieces of financial information data acquired in real time, to obtain a text vector consisting of a plurality of word features corresponding to each piece of financial information data: word segmentation, stop-word removal and sensitive-word filtering;

step 602, converting the text vector corresponding to each piece of financial information data into a numerical vector;

step 603, processing the numerical vector according to the preset vector length.
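Step 601 can be sketched as follows for English text. Chinese text would require a dedicated word segmentation tool, which this sketch replaces with a simple regular expression. The stop-word and sensitive-word lists are illustrative assumptions, and dropping sensitive words from the token list is one reading of "sensitive-word filtering".

```python
import re

STOP_WORDS = {"the", "a", "of", "and", "to", "in"}  # illustrative list
SENSITIVE_WORDS = {"guaranteed"}  # illustrative list


def preprocess(text, stop_words=STOP_WORDS, sensitive_words=SENSITIVE_WORDS):
    """Tokenize text, then drop stop words and sensitive words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in stop_words and t not in sensitive_words]


tokens = preprocess("The fund offers guaranteed returns in a rising bond market")
```

The resulting token list is the "text vector" that step 602 would then convert into a numerical vector.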

In one embodiment, converting the text vector corresponding to each piece of financial information data into a numerical vector includes: converting the text vector corresponding to each piece of financial information data into a numerical vector by using the TF-IDF algorithm.

In an embodiment, the method further comprises: receiving, in real time, a plurality of pieces of financial information data of each bank node from the Kafka message queue.

In an embodiment, the method further comprises:

monitoring, through a thread pool, the Kafka topic for user preference model update messages; when such an update message is received, sending the updated user preference model to the information recommendation module;

for each user of each bank node, calculating the cosine similarity of the updated user preference model of the user and each weight vector;

after obtaining the user preference model of each user, sending a user preference model update message to the Kafka topic.

In an embodiment, the method further comprises: performing identity authentication on each bank node.

In one embodiment, the financial information click behavior data and the financial information data are in Spark DataFrame data format.

In an embodiment, the method further comprises: encrypting the user preference model of each user of each bank node.

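In an embodiment, the method further comprises:

judging the types of a plurality of pieces of financial information data acquired by each bank node in real time;

and when the type accords with a preset type, determining a recommendation list for each user.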

In summary, the method provided by the embodiment of the invention has the same beneficial effects as the system described above, which are not repeated here.

An embodiment of the present invention further provides a computer device, and fig. 7 is a schematic diagram of the computer device in the embodiment of the present invention, where the computer device is capable of implementing all steps in the method for recommending bank financial information in the foregoing embodiment, and the computer device specifically includes the following contents:

a processor 701, a memory 702, a communication interface 703, and a communication bus 704;

the processor 701, the memory 702 and the communication interface 703 communicate with one another through the communication bus 704; the communication interface 703 is used for implementing information transmission between related devices such as server-side devices, detection devices, and user-side devices;

the processor 701 is configured to call the computer program in the memory 702, and when the processor executes the computer program, the processor implements all the steps of the bank financial information recommendation method in the above embodiment.

An embodiment of the present invention further provides a computer-readable storage medium, which can implement all the steps of the bank financial information recommendation method in the above embodiment, and the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the computer program implements all the steps of the bank financial information recommendation method in the above embodiment.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above embodiments illustrate the objects, technical solutions and advantages of the present invention in further detail. It should be understood that they are only exemplary embodiments and are not intended to limit the scope of the present invention; any modification, equivalent substitution or improvement made within the spirit and principle of the present invention shall fall within the scope of the present invention.

Claims

1. A bank financial information recommendation system, comprising: at least one bank node, a data interaction module and a federated learning network module, wherein each bank node comprises a data acquisition module and a user preference model storage module,

2. The system of claim 1, wherein the user preference model training module comprises:

3. The system of claim 2, wherein the preference vector calculation module is specifically configured to:

4. The bank financial information recommendation system of claim 1, wherein the data preprocessing module comprises:

5. The system of claim 4, wherein the vectorization module is specifically configured to convert the text vector corresponding to each piece of financial information data into a numerical vector using the TF-IDF algorithm.

6. The bank financial information recommendation system of claim 1, wherein the federated learning network module further comprises a financial information data receiving module for receiving, in real time, a plurality of pieces of financial information data of each bank node from the Kafka message queue.

7. The bank financial information recommendation system of claim 1, wherein the federated learning network module further comprises a model update module for: monitoring, through a thread pool, messages in the Kafka topic indicating that a user preference model has been updated; when a message indicating an update of the user preference model is obtained, sending the updated user preference model to the information recommendation module;

8. The bank financial information recommendation system of claim 1, further comprising an authentication center for performing identity authentication on each bank node.

9. The bank financial information recommendation system of claim 1, wherein the financial information click behavior data and the financial information data are in the Spark DataFrame data format.

10. The bank financial information recommendation system of claim 1, wherein the bank node further comprises a data encryption module for encrypting the user preference model of each user of each bank node.

11. The bank financial information recommendation system of claim 1, further comprising a manual recommendation module for:

12. A bank financial information recommendation method, comprising the following steps:

13. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of claim 12 when executing the computer program.

14. A computer-readable storage medium storing a computer program for executing the method of claim 12.
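
For illustration only, the vectorization recited in claim 5 (converting the text vector of each piece of financial information data into a numerical vector with TF-IDF) can be sketched as follows. This is a minimal sketch, not the implementation of the invention: the function name `tf_idf_vectors`, the sample tokens, and the unsmoothed textbook weighting tf × log(N / df) are all assumptions, since the claims do not specify a particular TF-IDF variant.

```python
import math
from collections import Counter


def tf_idf_vectors(documents):
    """Convert tokenized documents into TF-IDF numerical vectors.

    documents: list of token lists (hypothetical "text vectors").
    Returns (vocabulary, vectors); each vector is aligned with vocabulary.
    Assumption: unsmoothed weighting tf * log(N / df).
    """
    vocabulary = sorted({token for doc in documents for token in doc})
    n_docs = len(documents)
    # Document frequency: number of documents that contain each token.
    doc_freq = Counter(token for doc in documents for token in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        vector = []
        for token in vocabulary:
            tf = counts[token] / total                # term frequency
            idf = math.log(n_docs / doc_freq[token])  # inverse document frequency
            vector.append(tf * idf)
        vectors.append(vector)
    return vocabulary, vectors


# Hypothetical, already tokenized pieces of financial information.
docs = [
    ["fund", "rate", "rise"],
    ["fund", "bond"],
    ["bond", "rate", "rate"],
]
vocabulary, vectors = tf_idf_vectors(docs)
```

In this sketch a token that appears in every document receives weight 0, because log(N / N) = 0; a smoothed variant would be needed if such tokens should keep a nonzero weight.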