CN113743111B - Financial risk prediction method and device based on text pre-training and multi-task learning - Google Patents

Financial risk prediction method and device based on text pre-training and multi-task learning

Info

Publication number
CN113743111B
Authority
CN
China
Prior art keywords
risk
training
neural network
network model
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010865079.XA
Other languages
Chinese (zh)
Other versions
CN113743111A (en)
Inventor
郭舒
陈桢豫
王丽宏
贺敏
毛乾任
李晨
钟盛海
黄洪仁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Computer Network and Information Security Management Center
Original Assignee
National Computer Network and Information Security Management Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Computer Network and Information Security Management Center
Priority to CN202010865079.XA
Publication of CN113743111A
Application granted
Publication of CN113743111B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/30 Semantic analysis
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/08 Learning methods
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance


Abstract

The application relates to a financial risk prediction method and device based on text pre-training and multi-task learning. The method comprises: acquiring a text to be processed; inputting the text to be processed into a first neural network model to determine, according to the processing flow of a risk identification task, whether the content of the text includes a financial risk; when the content of the text includes a financial risk, determining the risk type of the financial risk with the first neural network model according to the processing flow of a risk classification task; and determining the risk subject matched with the risk type with the first neural network model according to the processing flow of a risk subject identification task. By means of pre-training language model technology, the application solves the problem of poor model performance caused by the lack of deep semantic mining, and by adopting multi-task learning it solves the technical problem of poor model performance caused by limited data volume and the inability to share information among tasks.

Description

Financial risk prediction method and device based on text pre-training and multi-task learning
Technical Field
The application relates to the technical field of risk prediction, in particular to a financial risk prediction method and device based on text pre-training and multi-task learning.
Background
With the deep application of the internet to financial services, internet finance has gradually entered the public view. Internet finance refers to conducting financing, payment and related information services through, or relying on, internet technologies and tools; using internet platforms, it provides finance with new ways of acquiring information as well as various risk management and risk diversification tools.
The current "internet + finance" architecture consists of traditional financial institutions and non-financial institutions. Traditional financial institutions mainly pursue internet-based innovation of traditional financial business, e-commerce innovation, mobile apps and the like; non-financial institutions mainly refer to e-commerce enterprises that use internet technology to conduct financial operations, P2P-mode online lending platforms, crowdfunding-mode online investment platforms, mobile personal-finance apps, third-party payment platforms and the like.
At present, measured by the scale of institutions and market indexes, internet finance in China appears to have reached the forefront of the world, but "short boards" still exist. While internet finance brings convenience, risks such as P2P platforms absconding with funds, online usurious lending and violent debt collection are continuously exposed. Because the internet knows no regional boundaries and involves a wide range of information, internet financial risk spreads quickly and cross-domain risk governance is difficult. In addition, cracking down on illegal fundraising is an important part of preventing financial risk: the illegal fundraising situation remains severe, a high incidence of new cases coexists with a backlog of old ones, risks are concentrated in certain regions and industries, the online and cross-domain characteristics are obvious, and participation is broad. Early warning, prevention and control of internet financial risk are therefore imperative and urgent.
Currently, in the related art, prediction of financial risk is one-sided. The financial risk prediction task covers a wide variety of research problems: some studies focus on judging whether a user, company or institution has potential financial risk based on its characteristics, i.e., treating financial risk prediction as a binary classification problem; other studies aim at determining the financial risk level of a specific object, i.e., treating it as a multi-class classification problem; and still other studies aim at predicting a financial risk score for a company or other financial institution, i.e., treating it as a regression problem.
The conventional financial risk prediction task generally takes quantitative data as model input; that is, most conventional financial risk prediction research is carried out on quantitative data, directly using quantitative indicators of samples as features for classification. For example, data such as a user's income and deposit amount may be taken as input in a fraud prediction task, while a bank's current total assets, cash flow and total loan amount are often used in predicting bank bankruptcy. However, quantitative data is limited in volume and difficult for non-industry personnel to obtain, and current financial risk research still makes insufficient use of the large amount of easily obtained financial text data on the internet.
In addition, although analysis of financial text data for financial risk prediction is less common, research that analyzes financial text data for other purposes is common. Such research is collectively referred to as financial text mining, whose original purpose was to analyze text data with text mining techniques in order to make better decisions. At present, text mining in the financial field is mainly used for foreign exchange rate prediction, stock market prediction, customer churn prediction and the like, with some applications in network security, including phishing detection, spam detection and fraud detection. Financial text mining generally uses text data such as news headlines or news content and performs classification with common machine learning algorithms (LR, SVM, DT, k-NN, NB, etc.). Common financial text mining work, such as foreign exchange rate prediction and stock market prediction based on financial news, mostly adopts relatively simple methods for text preprocessing and feature construction: a bag-of-words model is generally used for preprocessing, which ignores positional relations and associations between words and yields relatively sparse word vectors, and word frequency is generally used as the feature, lacking deep mining of semantics.
In addition, financial risk prediction mostly adopts a single model or an ensemble of models, which neither addresses the problem of insufficient training data in certain scenarios nor makes full use of the information shared among tasks to improve the effect of each task.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The application provides a financial risk prediction method and device based on text pre-training and multi-task learning, which are used for solving the technical problems of poor model performance caused by relatively simple feature construction and relatively sparse word vectors, by the lack of deep semantic mining, by limited data volume, and by the inability to share information among tasks.
In a first aspect, the present application provides a financial risk prediction method based on text pre-training and multi-task learning, comprising: acquiring a text to be processed, wherein the text to be processed comes from the financial field of an internet platform; inputting the text to be processed into a first neural network model to determine, according to the processing flow of a risk identification task, whether the content of the text to be processed includes a financial risk, wherein the first neural network model is obtained by multi-task training of a second neural network model with training data carrying marking information, the second neural network model is a multi-task learning model combined with a financial pre-training language model, the financial pre-training language model is a pre-training language model whose parameters are initialized with a large number of unlabeled pre-training corpora, the multiple tasks comprise the risk identification task, a risk classification task and a risk subject identification task, and the marking information is used for marking whether the content of the training data includes a financial risk and, where a financial risk is included, marking the risk type of the financial risk and the risk subject matched with the risk type; in the case that the content of the text to be processed includes a financial risk, determining the risk type of the financial risk with the first neural network model according to the processing flow of the risk classification task; and determining the risk subject matched with the risk type with the first neural network model according to the processing flow of the risk subject identification task.
Optionally, before inputting the text to be processed into the first neural network model, the method further includes performing multi-task training on the second neural network model to obtain the first neural network model in the following manner: randomly determining a batch of training data from a training data pool, wherein the training data includes training data for the risk identification task, the risk classification task and the risk subject identification task; inputting the training data into the second neural network model, and continuing to train the parameters of the second neural network model on the basis of its pre-training parameters; adopting an early-stopping training mode, and taking the second neural network model as the first neural network model when the recognition accuracy of the second neural network model on the test data reaches its optimal value; and, when the recognition accuracy of the second neural network model on the test data has not reached its optimal value, continuing to train the second neural network model with the training data so as to adjust the values of the parameters in each network layer of the second neural network model until the recognition accuracy on the test data reaches its optimal value.
Optionally, before inputting the training data into the second neural network model, the method further includes pre-training a deep neural network model with unlabeled pre-training corpora to obtain the financial pre-training language model in the following manner: acquiring a pre-training corpus, wherein the pre-training corpus comes from the financial field of an internet platform; preprocessing the pre-training corpus according to the input requirements of a first pre-training language model, wherein the first pre-training language model is a deep neural network model, namely a pre-training language model obtained by pre-training on general-domain corpora; pre-training the first pre-training language model with the preprocessed pre-training corpus; taking the first pre-training language model as the financial pre-training language model when its performance on a target pre-training task reaches a target performance threshold; and, when its performance on the target pre-training task has not reached the target performance threshold, continuing to pre-train the first pre-training language model with the pre-training corpus so as to adjust the values of the parameters in each network layer until its performance on the target pre-training task reaches the target performance threshold.
Optionally, before inputting the training data into the second neural network model, the method further comprises combining the financial pre-training language model to obtain the second neural network model as follows: and respectively adding output layers for the risk identification task, the risk classification task and the risk subject identification task to the output layers of the financial pre-training language model to obtain a second neural network model.
Optionally, before randomly determining a batch of training data from the training data pool, the method further comprises constructing the training data pool as follows: dividing training data for risk identification tasks, risk classification tasks and risk subject identification tasks into a plurality of batches according to the preset data size of each batch; and carrying out unordered mixing on the training data of all batches to obtain a training data pool.
Optionally, continuing to train the parameters of the second neural network model based on the pre-training parameters of the second neural network model includes: taking an embedded layer and an encoding layer of a second neural network model as shared parameter areas, taking each output layer of the second neural network model as a private parameter area respectively, wherein the private parameter areas comprise a first private parameter area, a second private parameter area and a third private parameter area, the first private parameter area is an output layer of a risk identification task, the second private parameter area is an output layer of a risk classification task, and the third private parameter area is an output layer of a risk subject identification task; fixing the learning rates of the first private parameter area, the second private parameter area and the third private parameter area as first learning rates, and training the second neural network model by using training data to determine a first target learning rate of the shared parameter area among a plurality of second learning rates, wherein the first target learning rate is the optimal learning rate applicable to the shared parameter area; the learning rate of the shared parameter area is fixed to be a first target learning rate, and training data is utilized to train the second neural network model so as to respectively determine second target learning rates of the first private parameter area, the second private parameter area and the third private parameter area in a target range, wherein the second target learning rate is the optimal learning rate applicable to the first private parameter area, the second private parameter area and the third private parameter area respectively.
Optionally, continuing to train the parameters of the second neural network model based on the pre-training parameters of the second neural network model further comprises: and determining target hidden layer parameters in the process of training the second neural network model through parameter sharing of the shared parameter area, wherein the target hidden layer parameters are hidden layer parameters which are simultaneously applicable to the first private parameter area, the second private parameter area and the third private parameter area.
Optionally, in the case that the recognition accuracy of the second neural network model on the test data reaches its optimal value, the step of taking the second neural network model as the first neural network model includes: acquiring first test data; inputting the first test data into the second neural network model so as to process it according to the processing flow of the risk identification task and obtain a risk identification result output by the output layer of the risk identification task; determining a first harmonic mean of the precision and recall of the risk identification result, and screening out second test data, wherein the second test data is the first test data whose marking information indicates a risk and whose risk identification result also indicates a risk; processing the second test data with the second neural network model according to the processing flow of the risk classification task to obtain a risk classification result output by the output layer of the risk classification task; determining the accuracy of the risk classification result and the reciprocal-rank value of the risk classification result, and screening out third test data, wherein the third test data is the second test data whose risk classification result matches the risk type marked by the marking information; processing the third test data with the second neural network model according to the processing flow of the risk subject identification task to obtain a risk subject identification result output by the output layer of the risk subject identification task; determining an exact-match value of the risk subject identification result and a second harmonic mean of the precision and recall of the risk subject identification result; and determining the second neural network model as the first neural network model when the first harmonic mean, the accuracy, the reciprocal-rank value, the exact-match value and the second harmonic mean reach their corresponding preset indexes.
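As a non-limiting illustration, the evaluation indexes named above (the harmonic means of precision and recall, the classification accuracy, the reciprocal-rank value, and the exact-match value) can be computed as sketched below; the prediction and gold-label lists are hypothetical inputs, scikit-learn is assumed to be available, and computing the subject F1 at character level is an assumption.

```python
# A sketch of the serial test metrics, not the patent's own code: F1 for identification,
# accuracy and mean reciprocal rank for classification, exact match and F1 for subjects.
from sklearn.metrics import accuracy_score, f1_score

def span_f1(pred: str, gold: str) -> float:
    """Second harmonic mean: character-level precision/recall of the extracted subject."""
    common = sum(min(pred.count(c), gold.count(c)) for c in set(pred))
    if not pred or not gold or not common:
        return 0.0
    p, r = common / len(pred), common / len(gold)
    return 2 * p * r / (p + r)

def serial_metrics(id_gold, id_pred, cls_gold, cls_pred, cls_gold_ranks, subj_gold, subj_pred):
    # cls_gold_ranks: 1-based rank of the gold risk type in each predicted ranking (assumption).
    return {
        "identification_f1": f1_score(id_gold, id_pred),                 # first harmonic mean
        "classification_acc": accuracy_score(cls_gold, cls_pred),
        "classification_mrr": sum(1.0 / r for r in cls_gold_ranks) / len(cls_gold_ranks),
        "subject_exact_match": sum(p == g for p, g in zip(subj_pred, subj_gold)) / len(subj_gold),
        "subject_f1": sum(span_f1(p, g) for p, g in zip(subj_pred, subj_gold)) / len(subj_gold),
    }
```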
Optionally, the processing flow of the risk identification task includes: converting the text to be processed into a first mark sequence according to a preset corresponding relation; the first marking sequence passes through an embedding layer and a coding layer of a first neural network model to obtain a first semantic representation vector of the text to be processed, wherein the first semantic representation vector is a vector containing context semantic information of the text to be processed; performing linear transformation on the first semantic representation vector to obtain a second semantic representation vector, wherein the second semantic representation vector is obtained by processing private parameters of an output layer of a risk identification task; processing the second semantic representation vector by adopting a Softmax classification mode to obtain first probability distribution, wherein the first probability distribution is a probability value that the content of the text to be processed contains financial risks, and the first probability distribution is obtained through an output layer of a risk identification task; and determining whether the text to be processed contains financial risks according to the first probability distribution.
Optionally, the processing flow of the risk classification task includes: performing linear transformation on the first semantic representation vector to obtain a third semantic representation vector, wherein the third semantic representation vector is obtained by processing private parameters of an output layer of a risk classification task; processing the third semantic representation vector by adopting a Softmax classification mode to obtain second probability distribution, wherein the second probability distribution is a probability value of each type of risk type of financial risk, and the second probability distribution is obtained through an output layer of a risk classification task; and determining the risk type of the financial risk according to the second probability distribution.
Optionally, the processing flow of the risk subject identification task includes: splicing the text to be processed and the risk type of the financial risk, and converting the text to be processed and the risk type of the financial risk into a second marking sequence according to a preset corresponding relation; the second marking sequence passes through an embedding layer and a coding layer of the first neural network model to obtain a fourth semantic representation vector, wherein the fourth semantic representation vector is a vector containing the spliced text to be processed and the risk type context semantic information; linearly transforming the fourth semantic representation vector to obtain a fifth semantic representation vector, wherein the fifth semantic representation vector is obtained by processing private parameters of an output layer of a risk subject identification task; determining a third probability distribution and a fourth probability distribution by using the fifth semantic representation vector, wherein the third probability distribution is a probability value of each word vector in the fifth semantic representation vector as a starting word vector of a risk subject matched with the risk type, and the fourth probability distribution is a probability value of each word vector in the fifth semantic representation vector as a ending word vector of the risk subject matched with the risk type; and determining a risk subject matched with the risk type according to the third probability distribution and the fourth probability distribution.
In a second aspect, the present application provides a financial risk prediction apparatus based on text pre-training and multi-task learning, comprising: an acquisition module, configured to acquire a text to be processed, wherein the text to be processed comes from the financial field of an internet platform; a risk identification module, configured to input the text to be processed into a first neural network model to determine, according to the processing flow of a risk identification task, whether the content of the text to be processed includes a financial risk, wherein the first neural network model is obtained by multi-task training of a second neural network model with training data carrying marking information, the second neural network model is a multi-task learning model combined with a financial pre-training language model, the financial pre-training language model is a pre-training language model whose parameters are initialized with a large number of unlabeled pre-training corpora, the multiple tasks comprise the risk identification task, a risk classification task and a risk subject identification task, and the marking information is used for marking whether the content of the training data includes a financial risk and, where a financial risk is included, marking the risk type of the financial risk and the risk subject matched with the risk type; a risk classification module, configured to determine the risk type of the financial risk with the first neural network model according to the processing flow of the risk classification task when the content of the text to be processed includes a financial risk; and a risk subject identification module, configured to determine the risk subject matched with the risk type with the first neural network model according to the processing flow of the risk subject identification task.
In a third aspect, the present application provides a computer device comprising a memory, a processor, the memory having stored thereon a computer program executable on the processor, the processor executing the computer program to perform the steps of any of the methods of the first aspect.
In a fourth aspect, the application also provides a computer readable medium having non-volatile program code executable by a processor, the program code causing the processor to perform any of the methods of the first aspect.
Compared with the related art, the technical scheme provided by the embodiment of the application has the following advantages:
By utilizing pre-training language model technology, the application solves the problem of poor model performance caused by the lack of deep semantic mining, and by adopting multi-task learning it solves the technical problem of poor model performance caused by limited data volume and the inability to share information among tasks. In addition, the multi-task approach reduces the total number of model parameters, which saves storage space and speeds up loading and running the model.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the related art, the drawings that are required to be used in the embodiments or the related technical descriptions will be briefly described, and it will be apparent to those skilled in the art that other drawings can be obtained according to these drawings without inventive effort.
FIG. 1 is a schematic diagram of a hardware environment of an alternative financial risk prediction method based on text pre-training and multi-task learning according to an embodiment of the present application;
FIG. 2 is a flowchart of an alternative method for predicting financial risk based on text pre-training and multi-task learning, according to an embodiment of the present application;
FIG. 3 is a model training flow diagram for an alternative multi-task learning provided in accordance with an embodiment of the present application;
FIG. 4 is a flowchart of an alternative financial pre-training language model training provided in accordance with an embodiment of the present application;
FIG. 5 is a flowchart of an alternative training data pool construction provided in accordance with an embodiment of the present application;
FIG. 6 is an alternative parameter training flow chart provided in accordance with an embodiment of the present application;
FIG. 7 is an alternative serial model test flow chart provided in accordance with an embodiment of the present application;
FIG. 8 is a block diagram of an alternative text pre-training and multi-task learning based financial risk prediction device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only to facilitate the description of the present application and have no specific meaning in themselves. Thus, "module" and "component" may be used interchangeably.
In the related art, prediction of financial risk is one-sided. The financial risk prediction task covers a wide variety of research problems: some studies focus on judging whether a user, company or institution has potential financial risk based on its characteristics, i.e., treating financial risk prediction as a binary classification problem; other studies aim at determining the financial risk level of a specific object, i.e., treating it as a multi-class classification problem; and still other studies aim at predicting a financial risk score for a company or other financial institution, i.e., treating it as a regression problem.
Most traditional financial risk prediction research is conducted on quantitative data, directly using the quantitative indicators of samples as features for classification. However, quantitative data is limited in volume and difficult for non-industry personnel to obtain, and current financial risk research makes insufficient use of the vast amount of financial text data that exists and is readily available on the internet.
At present, common financial text mining work, such as foreign exchange rate prediction and stock market prediction based on financial news, mostly adopts relatively simple methods for text preprocessing and feature construction: a bag-of-words model is generally used for preprocessing, which ignores positional relations and associations between words and yields relatively sparse word vectors, and word frequency is generally used as the feature, lacking deep mining of semantics. Work that uses current advanced natural language processing technology to deeply mine the semantic feature information contained in text is very rare.
In addition, financial risk prediction mostly adopts a single model or an ensemble of models, which neither addresses the problem of insufficient training data in certain scenarios nor makes full use of the information shared among tasks to improve the effect of each task.
To solve the problems mentioned in the background, according to an aspect of the embodiments of the present application, an embodiment of a financial risk prediction method based on text pre-training and multi-task learning is provided.
Alternatively, in the embodiment of the present application, the above method may be applied to a hardware environment composed of the terminal 101 and the server 103 as shown in fig. 1. As shown in fig. 1, the server 103 is connected to the terminal 101 through a network and may be used to provide services for the terminal or for a client installed on the terminal. A database 105 may be provided on the server, or independently of the server, to provide data storage services for the server 103. The network includes, but is not limited to, a wide area network, a metropolitan area network, or a local area network, and the terminal 101 includes, but is not limited to, a PC, a mobile phone, a tablet computer, and the like.
The financial risk prediction method based on text pre-training and multi-task learning in the embodiment of the present application may be executed by the server 103, or by the server 103 and the terminal 101 together. As shown in fig. 2, the method may include the following steps:
Step S202, a text to be processed is obtained, wherein the text to be processed is from the financial field of the Internet platform.
In the embodiment of the application, the text to be processed may be in English, Chinese or another language, and the internet platform may be a financial-domain internet platform such as a financial news website.
Step S204, inputting the text to be processed into a first neural network model, determining whether the content of the text to be processed comprises financial risks according to the processing flow of the risk recognition task, wherein the first neural network model is obtained by performing multitasking training on a second neural network model by training data with marking information, the second neural network model is a multitasking learning model combined with a financial pre-training language model, the financial pre-training language model is a pre-training language model obtained by performing parameter initialization by using a plurality of unlabeled pre-training corpus, the multitasking comprises a risk recognition task, a risk classification task and a risk subject recognition task, the marking information is used for marking whether the content of the training data comprises financial risks or not, and marking risk types of the financial risks and risk subjects matched with the risk types under the condition of comprising the financial risks.
In the embodiment of the application, the neural network model used by the method is trained on the basis of a pre-training language model, so it can extract the contextual information carried by the input text, that is, it can mine deep semantic features of the text. The pre-training language model is obtained by training with unlabeled pre-training corpora. The second neural network model is a multi-task learning model built on the financial pre-training language model by adding output layers for multi-task output on top of its output layer, and the financial pre-training language model is a pre-training language model built on a general-domain pre-training language model and obtained by further training with unlabeled financial text corpora.
In the embodiment of the application, the marking information at least indicates whether the training data contains a financial risk; for training data that does contain a financial risk, the risk type (such as "reorganization failure", "complaint and rights protection" or "rating adjustment") and the risk subject matched with that risk type are also marked.
In the embodiment of the application, whether the text to be processed contains a financial risk can be identified. For example, the text to be processed may be: "an executive of Huifeng Co. has been placed under residential surveillance over a suspected environmental pollution case". This text obviously contains negative information about Huifeng Co., so a certain potential financial risk exists, and the model identifies it as "at risk". The financial risk identification task is a binary text classification task.
In step S206, in the case that the content of the text to be processed includes financial risk, the risk type of the financial risk is determined according to the processing flow of the risk classification task by using the first neural network model.
In the embodiment of the application, the data judged to be "at risk" in the previous step can be qualitatively classified into risk types, and different classification systems can be established for different data sets. Preferably, the risk types established in the embodiment of the application include "reorganization failure", "complaint and rights protection", "rating adjustment", "suspected absconding", "suspected illegal fundraising", "suspected fraud", "change of actual controller or major shareholder", and the like. For example, the type of financial risk contained in the Huifeng example above should be "suspected violation". Financial risk classification is a multi-class text classification problem.
Step S208, determining a risk subject matched with the risk type according to the processing flow of the risk subject identification task by using the first neural network model.
In the embodiment of the application, for data whose financial risk type has been judged in the previous step, the risk subject can be further extracted from the text according to the risk type. For example, in the Huifeng example above the financial risk type is "suspected violation", and the financial risk subject matched with "suspected violation" should be "Huifeng Co.".
In the embodiment of the application, a multi-task learning method is adopted in model training, testing and practical application, that is, information is communicated and shared among the three tasks of risk identification, risk classification and risk subject identification. Compared with designs that use an independent model structure for each of the three tasks, the multi-task learning structure reduces the parameters of the whole system to roughly one third, making it lighter and simpler. Experiments show that, compared with setting up a separate model for each task, the performance of the multi-task learning architecture is significantly improved.
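As a non-limiting illustration, the serial prediction flow of steps S202 to S208 can be summarized by the following sketch; the three callables are hypothetical placeholders for the task-specific flows detailed below.

```python
# A high-level sketch of the three serial prediction steps; the callables stand in for the
# risk identification, risk classification and risk subject identification flows.
from typing import Callable, Optional, Tuple

def predict(text: str,
            identify: Callable[[str], bool],
            classify: Callable[[str], str],
            extract_subject: Callable[[str, str], str]
            ) -> Tuple[bool, Optional[str], Optional[str]]:
    if not identify(text):                       # step S204: risk identification
        return False, None, None
    risk_type = classify(text)                   # step S206: risk classification
    subject = extract_subject(text, risk_type)   # step S208: risk subject identification
    return True, risk_type, subject
```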
Optionally, the processing flow of the risk identification task may include the following steps:
Step 1: convert the text to be processed into a first mark sequence according to a preset correspondence.
The preset correspondence may come from a vocabulary that records the correspondence between Chinese characters (or words) and mark numbers.
Step 2: pass the first mark sequence through the embedding layer and encoding layer of the first neural network model to obtain a first semantic representation vector of the text to be processed, where the first semantic representation vector contains the contextual semantic information of the text. Converting the mark sequence into vectors may use word embedding methods from natural language processing (NLP), such as one-hot encoding.
Step 3: perform a linear transformation on the first semantic representation vector to obtain a second semantic representation vector, where the second semantic representation vector is obtained by processing with the private parameters of the output layer of the risk identification task.
Because this linear transformation uses the private parameters of the output layer of the risk identification task, its result is better suited to risk identification.
Step 4: process the second semantic representation vector with Softmax classification to obtain a first probability distribution, where the first probability distribution is the probability that the content of the text to be processed contains a financial risk, and is produced by the output layer of the risk identification task.
Softmax minimizes the cross entropy between the estimated classification probabilities and the "true" distribution and yields normalized probabilities, so it can be used for the binary classification problem of the risk identification task in the embodiment of the application.
Step 5: determine whether the text to be processed contains a financial risk according to the first probability distribution.
The decision threshold may be set according to actual needs; for example, if the probability in the first probability distribution that the content of the text contains a financial risk is 0.7 and exceeds the threshold, the text is judged to contain a financial risk.
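As a non-limiting illustration, the risk identification flow above may be sketched as follows, assuming a BERT-style encoder from the HuggingFace transformers library; the checkpoint name "bert-base-chinese", the maximum length and the 0.5 threshold are illustrative assumptions rather than values fixed by this description.

```python
# A minimal sketch of the risk identification flow (steps 1-5), not the patent's own code.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")   # vocabulary = preset correspondence
encoder = BertModel.from_pretrained("bert-base-chinese")         # embedding + encoding layers (shared)
risk_id_head = torch.nn.Linear(encoder.config.hidden_size, 2)    # private output layer of the task

def identify_risk(text: str, threshold: float = 0.5) -> bool:
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)  # step 1: mark sequence
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state[:, 0]        # step 2: [CLS] semantic vector
        logits = risk_id_head(hidden)                             # step 3: linear transformation
        probs = torch.softmax(logits, dim=-1)                     # step 4: first probability distribution
    return probs[0, 1].item() >= threshold                        # step 5: threshold decision
```

In a trained first neural network model, the encoder weights would play the role of the shared parameters and the linear head that of the private parameters of the risk identification output layer.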
Optionally, the processing flow of the risk classification task may include the following steps:
Step 1: perform a linear transformation on the first semantic representation vector to obtain a third semantic representation vector, where the third semantic representation vector is obtained by processing with the private parameters of the output layer of the risk classification task.
Because this linear transformation uses the private parameters of the output layer of the risk classification task, its result is better suited to risk classification.
Step 2: process the third semantic representation vector with Softmax classification to obtain a second probability distribution, where the second probability distribution gives the probability of each risk type of financial risk and is produced by the output layer of the risk classification task.
Softmax can likewise be used for the multi-class classification problem of the risk classification task in the embodiment of the application.
Step 3: determine the risk type of the financial risk according to the second probability distribution.
The risk type with the largest probability value may be taken as the risk type of the financial risk.
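A similar sketch of the risk classification head follows, reusing the [CLS] semantic vector produced by the shared encoder in the previous sketch; the label list mirrors the example classification system above, and the hidden size of 768 is an assumption for a BERT-base encoder.

```python
# A sketch of the risk classification head (steps 1-3); labels and sizes are illustrative.
import torch

RISK_TYPES = ["reorganization failure", "complaint and rights protection", "rating adjustment",
              "suspected absconding", "suspected illegal fundraising", "suspected fraud",
              "change of actual controller or major shareholder", "suspected violation"]

risk_cls_head = torch.nn.Linear(768, len(RISK_TYPES))    # private output layer; 768 = hidden size

def classify_risk(cls_vector: torch.Tensor) -> str:
    logits = risk_cls_head(cls_vector)                   # step 1: task-specific linear transformation
    probs = torch.softmax(logits, dim=-1)                # step 2: second probability distribution
    return RISK_TYPES[int(probs.argmax(dim=-1))]         # step 3: highest-probability risk type
```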
Optionally, the processing flow of the risk subject identification task may include the following steps:
Step 1: splice the text to be processed with the risk type of the financial risk, and convert the result into a second mark sequence according to the preset correspondence.
Step 2: pass the second mark sequence through the embedding layer and encoding layer of the first neural network model to obtain a fourth semantic representation vector, where the fourth semantic representation vector contains the contextual semantic information of the spliced text and risk type.
Step 3: perform a linear transformation on the fourth semantic representation vector to obtain a fifth semantic representation vector, where the fifth semantic representation vector is obtained by processing with the private parameters of the output layer of the risk subject identification task.
Because this linear transformation uses the private parameters of the output layer of the risk subject identification task, its result is better suited to risk subject identification.
Step 4: determine a third probability distribution and a fourth probability distribution from the fifth semantic representation vector, where the third probability distribution is the probability of each word vector in the fifth semantic representation vector being the starting word vector of the risk subject matched with the risk type, and the fourth probability distribution is the probability of each word vector being the ending word vector of that risk subject.
In the embodiment of the application, each word vector therefore obtains two probability values: the probability that it is the first word vector (the starting word vector) of the risk subject matched with the risk type, and the probability that it is the last word vector (the ending word vector) of that risk subject.
Step 5: determine the risk subject matched with the risk type according to the third probability distribution and the fourth probability distribution.
In the embodiment of the application, the word vector with the highest probability of being the starting word vector is taken as the first word vector of the risk subject, the word vector with the highest probability of being the ending word vector is taken as the last word vector of the risk subject, and the first word vector, the last word vector and the word vectors between them are extracted to obtain the risk subject.
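The risk subject identification flow may be sketched as a span-extraction head, as below; the encoder, tokenizer, maximum length and head are illustrative assumptions in the spirit of steps 1 to 5 above.

```python
# A sketch of risk subject extraction: text and predicted risk type are spliced as a sentence
# pair, re-encoded, and a private head predicts start/end positions. Not the patent's own code.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
encoder = BertModel.from_pretrained("bert-base-chinese")
span_head = torch.nn.Linear(encoder.config.hidden_size, 2)        # start logit and end logit per token

def extract_risk_subject(text: str, risk_type: str) -> str:
    inputs = tokenizer(text, risk_type, return_tensors="pt",      # step 1: spliced second mark sequence
                       truncation=True, max_length=160)
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state              # step 2: fourth semantic representation
        start_logits, end_logits = span_head(hidden).split(1, dim=-1)   # step 3: private linear transform
        start = int(torch.softmax(start_logits.squeeze(-1), dim=-1).argmax())  # step 4: third distribution
        end = int(torch.softmax(end_logits.squeeze(-1), dim=-1).argmax())      # step 4: fourth distribution
    end = max(end, start)                                          # keep a valid span for this sketch
    token_ids = inputs["input_ids"][0][start:end + 1]              # step 5: span between start and end
    return tokenizer.decode(token_ids, skip_special_tokens=True)
```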
In summary, considering that financial risk prediction commonly relies on quantitative financial data, that such data is limited, that financial text mining models are single in form, and that model performance suffers from the lack of deep semantic mining, the application provides a financial risk prediction method based on text pre-training and multi-task learning.
The application provides a model training method for multi-task learning based on a pre-training language model, which is shown in fig. 3 and further details the technical scheme of the application.
Optionally, before the text to be processed is input into the first neural network model in step S204, the method further includes performing multi-task training on the second neural network model to obtain the first neural network model as follows:
Step S302, randomly determining training data of a batch from a training data pool, wherein the training data comprises training data for risk identification tasks, risk classification tasks and risk subject identification tasks;
Step S304, inputting training data into the second neural network model, and continuing to train each parameter of the second neural network model on the basis of the pre-training parameters of the second neural network model;
Step S306, adopting an early-stop training mode, and taking the second neural network model as the first neural network model under the condition that the recognition accuracy of the second neural network model to the test data reaches an optimal value;
Step S308, under the condition that the recognition accuracy of the second neural network model to the test data does not reach the optimal value, training the second neural network model by using the training data is continued to adjust the numerical value of the parameters in each network layer in the second neural network model until the recognition accuracy of the second neural network model to the test data reaches the optimal value.
In the embodiment of the application, the model is trained with the small-batch (mini-batch) training method commonly used in multi-task learning: the training data are trained in batches over multiple rounds, and in each round a batch of training data is randomly selected from the training data pool and input into the model, the loss function is calculated, back-propagation is performed and the model parameters are updated, until all data have been input and the round ends. The early-stopping training mode mentioned above means that training stops once the model performs best on the test set. The second neural network model denotes the model before and during multi-task training, and the first neural network model denotes the model after multi-task training.
By adopting this technical scheme, multi-task learning is performed on the pre-training language model with small batches of training data, so the resulting neural network model can mine deep semantic features in text, the recognition accuracy and speed of each task are significantly improved, and the performance of the model is improved.
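A skeleton of this small-batch multi-task training loop with early stopping is sketched below; model.batch_loss, dev_eval, the learning rate and the patience value are hypothetical stand-ins for the second neural network model's per-task loss, the test-set evaluation, and concrete hyperparameters.

```python
# A sketch of multi-round, small-batch multi-task training with early stopping.
import random
import torch

def train_multitask(model, train_pool, dev_eval, lr=2e-5, max_epochs=10, patience=2):
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    best_score, epochs_without_gain = float("-inf"), 0
    for epoch in range(max_epochs):
        random.shuffle(train_pool)                      # batches of the three tasks are interleaved
        for batch in train_pool:                        # each batch carries a task tag
            loss = model.batch_loss(batch)              # loss of that batch's task (hypothetical hook)
            optimizer.zero_grad()
            loss.backward()                             # back-propagation updates shared + private params
            optimizer.step()
        score = dev_eval(model)                         # recognition accuracy on the test data
        if score > best_score:
            best_score, epochs_without_gain = score, 0  # keep training while performance improves
        else:
            epochs_without_gain += 1
            if epochs_without_gain >= patience:         # early stop once the optimum has been passed
                break
    return model
```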
The application provides a method for training a financial pre-training language model, which is shown in fig. 4 and further details the technical scheme of the application.
Optionally, before inputting the training data into the second neural network model, the method further includes pre-training the deep neural network model by using the unlabeled pre-training corpus to obtain a financial pre-training language model according to the following manner:
Step S402, a pre-training corpus is obtained, wherein the pre-training corpus is from the financial field of an Internet platform;
Step S404, preprocessing a pre-trained language material according to the input requirement of a first pre-trained language model, wherein the first pre-trained language model is a deep neural network model, and the first pre-trained language model is a pre-trained language model obtained by pre-training based on a general domain language material;
step S406, pre-training the first pre-training language model by utilizing the pre-processed pre-training corpus;
Step S408, under the condition that the performance of the first pre-training language model on the target pre-training task reaches a target performance threshold, taking the first pre-training language model as a financial pre-training language model;
In step S410, under the condition that the performance of the first pre-training language model on the target pre-training task does not reach the target performance threshold, the pre-training corpus is continuously used for pre-training the first pre-training language model so as to adjust the values of the parameters in each network layer in the first pre-training language model until the performance of the first pre-training language model on the target pre-training task reaches the target performance threshold.
Pre-training language model technology has been a research hotspot in deep-learning natural language processing in recent years. Its core idea is to use a large amount of unlabeled pre-training corpora to initialize the parameters of a multi-layer deep neural network model, so that the model gains the ability to extract the contextual information carried by an input text. When handling a specific downstream natural language processing task, only fine-tuning of the largely pre-trained model parameters is needed, with a corresponding output layer added downstream of the model according to the task objective, to obtain a relatively ideal task effect.
In the embodiment of the application, the pre-training corpus may be a sufficient quantity of financial news headlines and bodies crawled from the internet (specifically, from portal websites carrying financial news, such as Sina, Tonghuashun and Yuncaijing).
In the embodiment of the application, the first pre-training language model may be the BERT model published by Google in 2018, or another pre-training language model. General-purpose pre-training language models represented by BERT are pre-trained on general-domain corpora. Corpora of specific fields such as finance have certain particularities and regularities in wording and grammatical structure, for example the occurrence of domain-specific terms (such as shareholding reduction, trading suspension, disclosure and cashing out) and sentence structures that are generally formal, long and complex. To address this, the model may be further pre-trained on the basis of the BERT model using unlabeled financial corpora crawled from the internet.
Preprocessing the pre-training corpus according to the input requirements of the first pre-training language model may include formatting, tokenization (i.e., converting the text into tokens that are easier for a machine to process), word masking, and the like.
In the embodiment of the application, the performance of the pre-training language model on the two pre-training tasks MLM (masked language modeling) and NSP (next sentence prediction) is used to control the number of training iterations and rounds.
By adopting this technical scheme, a pre-training language model with better performance on financial-domain tasks can be obtained.
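A hedged sketch of this continued pre-training step is given below using the HuggingFace transformers Trainer; the corpus file path, the hyperparameters and the restriction to the MLM objective (the description also mentions NSP) are illustrative assumptions.

```python
# A sketch of continuing BERT pre-training on an unlabeled financial corpus with masked
# language modeling; not the patent's own training script.
import torch
from transformers import (BertTokenizerFast, BertForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
model = BertForMaskedLM.from_pretrained("bert-base-chinese")     # general-domain first model

texts = open("financial_corpus.txt", encoding="utf-8").read().splitlines()  # crawled headlines/bodies
encodings = tokenizer(texts, truncation=True, max_length=128)

class CorpusDataset(torch.utils.data.Dataset):
    def __len__(self):
        return len(encodings["input_ids"])
    def __getitem__(self, i):
        return {k: torch.tensor(v[i]) for k, v in encodings.items()}

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)
args = TrainingArguments(output_dir="finbert-zh", num_train_epochs=3, per_device_train_batch_size=32)
Trainer(model=model, args=args, train_dataset=CorpusDataset(), data_collator=collator).train()
```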
Optionally, before inputting the training data into the second neural network model, the method further comprises combining the financial pre-training language model to obtain the second neural network model as follows:
And respectively adding output layers for the risk identification task, the risk classification task and the risk subject identification task to the output layers of the financial pre-training language model to obtain a second neural network model.
In the embodiment of the application, output layers for the multiple tasks are added to the financial pre-training language model to obtain a multi-task learning model, namely the second neural network model, which is used for multi-task learning and prediction.
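A minimal sketch of how such a second neural network model could be assembled is shown below; it is an assumption based on the description rather than the patent's exact architecture, with a shared encoder and one private output layer per task.

# Shared financial BERT encoder with three task-specific output layers.
import torch
import torch.nn as nn
from transformers import BertModel

class MultiTaskFinRiskModel(nn.Module):
    def __init__(self, pretrained_path, num_risk_types=16):
        super().__init__()
        # Shared parameter area: embedding layer + encoding layer of the
        # financial pre-training language model.
        self.encoder = BertModel.from_pretrained(pretrained_path)
        hidden = self.encoder.config.hidden_size
        # Private parameter areas: one output layer per task.
        self.risk_id_head = nn.Linear(hidden, 2)                # risky / risk-free
        self.risk_cls_head = nn.Linear(hidden, num_risk_types)  # risk type
        self.subject_head = nn.Linear(hidden, 2)                # start/end logits

    def forward(self, input_ids, attention_mask, token_type_ids, task):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask,
                           token_type_ids=token_type_ids)
        if task == "risk_id":
            return self.risk_id_head(out.pooler_output)
        if task == "risk_cls":
            return self.risk_cls_head(out.pooler_output)
        # Risk subject identification: per-token start/end logits (MRC style).
        start_logits, end_logits = self.subject_head(out.last_hidden_state).split(1, dim=-1)
        return start_logits.squeeze(-1), end_logits.squeeze(-1)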
Optionally, before randomly determining a batch of training data from the training data pool, as shown in fig. 5, the method further includes constructing the training data pool as follows:
step S502, dividing training data for risk identification tasks, risk classification tasks and risk subject identification tasks into a plurality of batches according to the preset data size of each batch;
step S504, unordered mixing is carried out on the training data of all batches to obtain a training data pool.
In the embodiment of the application, a training data pool is constructed so that a mini-batch training method can be applied during model training. In terms of dataset construction, the embodiment of the application builds two financial risk text datasets of different granularity for experiments and for validating the model's effect (hereinafter dataset one and dataset two). The data of the two datasets come, respectively, from an information service of Ant Financial (released in CCKS2018 evaluation task 4) and from posts crawled from Weibo and WeChat, with sizes of approximately 240,000 and 150,000 entries. The data format is processed according to the model's input requirements, each sample is marked as risky or not, and for the risky samples the risk type and the corresponding risk subject are marked as well.
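The batching and unordered mixing of steps S502-S504 can be sketched as follows; the batch size and the task tags are illustrative assumptions.

# Cut each task's data into fixed-size batches, then shuffle all batches.
import random

def build_training_pool(risk_id_data, risk_cls_data, subject_data, batch_size=32):
    def to_batches(samples, task):
        return [(task, samples[i:i + batch_size])
                for i in range(0, len(samples), batch_size)]

    pool = (to_batches(risk_id_data, "risk_id")
            + to_batches(risk_cls_data, "risk_cls")
            + to_batches(subject_data, "subject"))
    random.shuffle(pool)  # unordered mixing of all batches
    return pool

# During training, a batch is drawn at random from the pool and routed to the
# output layer that matches its task tag.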
Optionally, as shown in fig. 6, in the model training process of performing the multitasking learning, continuing to train the parameters of the second neural network model based on the pre-training parameters of the second neural network model includes:
Step S602, taking an embedded layer and an encoding layer of a second neural network model as shared parameter areas, taking each output layer of the second neural network model as a private parameter area respectively, wherein the private parameter areas comprise a first private parameter area, a second private parameter area and a third private parameter area, the first private parameter area is an output layer of a risk identification task, the second private parameter area is an output layer of a risk classification task, and the third private parameter area is an output layer of a risk subject identification task;
Step S604, fixing the learning rates of the first private parameter area, the second private parameter area and the third private parameter area as a first learning rate, and training the second neural network model by using training data to determine a first target learning rate of the shared parameter area among a plurality of second learning rates, wherein the first target learning rate is an optimal learning rate applicable to the shared parameter area;
Step S606, the learning rate of the shared parameter area is fixed as a first target learning rate, and the training data is utilized to train the second neural network model so as to respectively determine second target learning rates of the first private parameter area, the second private parameter area and the third private parameter area within a target range, wherein the second target learning rate is an optimal learning rate applicable to the first private parameter area, the second private parameter area and the third private parameter area respectively.
In the embodiment of the application, a multi-task learning architecture capable of completing the three task objectives is provided. The multi-task learning property is realized by sharing part of the model parameters: the embedding layer and the encoding layer of the model are shared, while each of the three tasks has its own output layer to distinguish the tasks.
Considering that the risk identification task, the risk classification task and the risk subject identification task may converge at different speeds during model training, and that the shared layers need to learn robust text representations, the embodiment of the application sets different learning rates for the shared part of the model (i.e., the embedding layer and the encoding layer) and for the private parts of the three tasks (i.e., their respective output layers). When determining the learning rate for each part of the model, the embodiment of the application may first fix the learning rate of the private parts at 1e-3 and, following the BERT model, try shared-part learning rates of 2e-5, 3e-5 and 5e-5 to find the optimal shared learning rate; the shared-part learning rate is then fixed at that optimum, and the private-part learning rate is swept over 0.1, 0.01, ... down to 1e-6 to find its order of magnitude; once the order of magnitude is determined, the private-part learning rate is fine-tuned by doubling or halving it step by step, yielding the final optimal private-part learning rate. Experiments show that model performance under this setting is clearly better than that of a model trained with a single learning rate for all parameters.
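A sketch of this learning-rate assignment using PyTorch parameter groups is given below; the concrete values are only the starting points mentioned above, and the attribute names refer to the hypothetical model sketch given earlier.

# Separate learning rates for the shared parameter area and the private heads.
import torch

def build_optimizer(model, shared_lr=3e-5, private_lr=1e-3):
    shared_params = list(model.encoder.parameters())
    private_params = (list(model.risk_id_head.parameters())
                      + list(model.risk_cls_head.parameters())
                      + list(model.subject_head.parameters()))
    return torch.optim.AdamW([
        # Shared area: candidate rates 2e-5 / 3e-5 / 5e-5 searched first.
        {"params": shared_params, "lr": shared_lr},
        # Private areas: order of magnitude searched from 1e-1 down to 1e-6,
        # then fine-tuned by doubling/halving.
        {"params": private_params, "lr": private_lr},
    ])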
Optionally, in the model training process of performing the multi-task learning, continuing to train each parameter of the second neural network model based on the pre-training parameter of the second neural network model further includes:
and determining target hidden layer parameters in the process of training the second neural network model through parameter sharing of the shared parameter area, wherein the target hidden layer parameters are hidden layer parameters which are simultaneously applicable to the first private parameter area, the second private parameter area and the third private parameter area.
In the embodiment of the application, sharing the encoding layer parameters greatly reduces the risk of model overfitting, and during training the model can learn a hidden-layer representation suitable for all three tasks.
Multi-task learning is inspired by human learning: people typically apply knowledge learned from previous tasks to help learn new ones. For example, a person who can ski may learn to skate more easily than one who cannot. By adopting the technical scheme of the application, information is exchanged and shared among the three tasks of risk identification, risk classification and risk subject identification, and the performance of the model is significantly improved.
For example, consider the financial text "Guaranteed returns on investment? Original shares for sale? A reporter secretly visited a Chongqing-based group suspected of illegal fundraising activity." Clearly, the words "illegal fundraising" in this sentence are the main basis on which the risk identification task predicts the sample as "risky", and the information contained in the corresponding tokens (Token) is also the dominant feature in the final hidden-layer representation; in the subsequent risk classification task, the same words are the main basis on which the model assigns the sample to the "suspected illegal fundraising" category. In the final risk subject identification task, the model can locate the neighborhood of the words "illegal fundraising" and then look for the corresponding event subject, so that the name of the Chongqing group can be correctly extracted as the identification result. Information exchange and sharing among the tasks therefore has positive significance for each task.
In the embodiment of the present application, the marking of training data and of model output labels may be as follows: the training data is preprocessed before being input into the model, the text is tokenized, and the marking information is converted into numbered labels (for example, in risk identification, "risky" and "risk-free" are numbered 0 and 1 respectively; in risk classification, if there are 16 risk categories in total, they correspond one-to-one to the numbers 0 to 15). At the output layer, for the risk identification task, the model outputs either label "0" or label "1"; for the risk classification task, the model outputs the label number corresponding to the predicted financial risk category; for the risk subject identification task, the model outputs the start and end positions of the predicted answer in the input text (for example, if the predicted risk subject, the name of a group mentioned in a news headline about a listed company being investigated, occupies positions 28 to 31 of the original input text, the model outputs "28" and "31").
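The label numbering just described might be encoded as in the following sketch; the category names are placeholders and the character-span convention follows the "28"/"31" example above.

# Label numbering for the three tasks.
RISK_ID_LABELS = {"risky": 0, "risk_free": 1}                 # risk identification
RISK_TYPE_LABELS = {f"risk_type_{i}": i for i in range(16)}   # 16 categories -> 0..15

def encode_subject_span(text, subject):
    """Label for the risk subject task: start/end character positions of the
    subject in the input text, e.g. (28, 31)."""
    start = text.find(subject)
    if start == -1:
        return None   # subject not found in the text
    return start, start + len(subject) - 1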
The application further provides a serial model testing method, shown in fig. 7, which further details the technical scheme of the application.
Optionally, in a case where the accuracy of the identification of the test data by the second neural network model reaches an optimal value, the step of using the second neural network model as the first neural network model may include the steps of:
in step S702, first test data is acquired.
Step S704, inputting the first test data into the second neural network model to process the first test data according to the processing flow of the risk identification task, so as to obtain a risk identification result output by the output layer of the risk identification task.
Step S706, a first harmonic mean of the precision and recall of the risk identification result is determined, and second test data is screened out, where the second test data is the first test data whose marking information and risk identification result both indicate risk.
Step S708, processing the second test data according to the processing flow of the risk classification task by using the second neural network model to obtain a risk classification result output by the output layer of the risk classification task.
Step S710, determining the accuracy of the risk classification result and the sorting reciprocal value of the risk classification result, and screening out third test data, wherein the third test data is second test data of which the risk classification result is matched with the risk type marked by the marking information.
Step S712, processing the third test data according to the processing flow of the risk subject identification task by using the second neural network model to obtain a risk subject identification result output by the output layer of the risk subject identification task.
Step S714, determining a perfect match value of the risk subject identification result and a second harmonic mean of the precision and recall of the risk subject identification result.
Step S716, determining the second neural network model as the first neural network model when the first harmonic mean, the accuracy, the ranking reciprocal value, the perfect match value, and the second harmonic mean reach the corresponding preset indexes.
It should be noted that, while the second neural network model is still being trained and has not yet been trained into the first neural network model, the processing flow of the risk identification task, the processing flow of the risk classification task and the processing flow of the risk subject identification task are all executed on the not-yet-fully-trained multi-task learning model, namely the second neural network model.
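A compact sketch of the serial testing flow of steps S702-S716 follows; the model_predict wrapper and the field names are assumptions, and label 0 denotes "risky" as described earlier.

# Serial test: risk identification -> risk classification -> risk subject
# identification, each stage filtering the data for the next.
def serial_test(model_predict, test_samples):
    # Second test data: marked risky AND predicted risky.
    risky = [s for s in test_samples
             if s["label_risk"] == 0 and model_predict(s, task="risk_id") == 0]
    # Third test data: predicted risk type matches the annotated risk type.
    correctly_typed = [s for s in risky
                       if model_predict(s, task="risk_cls") == s["label_type"]]
    # Risk subject identification on the remaining samples.
    subject_preds = [(s, model_predict(s, task="subject")) for s in correctly_typed]
    return risky, correctly_typed, subject_preds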
In the embodiment of the application, different evaluation indexes can be designed for three tasks according to task characteristics and common standards.
Risk identification, i.e., identifying whether an input text contains financial risk, is a binary text classification task characterized, in both the experimental data and practical applications, by an uneven number of positive and negative samples: the number of negative samples ("no financial risk") is typically several times that of positive samples ("with financial risk"). Following the evaluation index commonly used for binary classification problems with such class imbalance, the application may adopt the F1 value, i.e., the harmonic mean of precision and recall, as the evaluation index of model performance on the risk identification task.
Risk classification, namely qualitatively determining the risk type of a data sample judged to be risky in the previous step, is a multi-class text classification task. The embodiment of the application may adopt the classification ACC value (Accuracy) and the MRR value (Mean Reciprocal Rank) as the evaluation indexes of model performance on the risk classification task.
Risk subject identification, i.e., extracting the corresponding risk subject from the text according to the risk type of data already judged in the previous step to carry financial risk, is a named entity recognition task that also involves disambiguation. The embodiment of the application may perform financial risk subject identification with a machine reading comprehension model so as to resolve such ambiguity. Therefore, following the evaluation indexes commonly used for machine reading comprehension models, the EM value (Exact Match) and the F1 value may be adopted as evaluation indexes of model performance on the risk subject identification task.
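The evaluation indexes named above correspond to the standard formulas sketched below (textbook definitions, not code from the patent).

# Standard formulas for the F1, ACC, MRR and EM evaluation indexes.
def f1_score(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

def accuracy(preds, golds):
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)

def mean_reciprocal_rank(ranked_lists, golds):
    # 1-based rank of the gold label within each predicted ranking.
    return sum(1.0 / (ranked.index(g) + 1)
               for ranked, g in zip(ranked_lists, golds)) / len(golds)

def exact_match(pred_spans, gold_spans):
    return sum(p == g for p, g in zip(pred_spans, gold_spans)) / len(gold_spans)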
On the basis of the existing parameters of a general-domain pre-trained language model, the application performs continued pre-training on financial text corpus crawled from the Internet, aiming to capture the semantic features of financial-domain text data at a finer granularity. Meanwhile, the application designs the model structure following the idea of multi-task learning and shares the parameters of the embedding layer and the encoding layer of the model across the three tasks, so as to realize information flow among different tasks, alleviate the shortage of training data that may occur when the model is applied in practice, and further exploit the information that can be shared among the three tasks of risk identification, risk classification and risk subject identification.
The embodiment of the application is built on a large number of experiments, which show that the pre-trained language model obtained by the application (hereinafter referred to as the financial pre-training language model) performs better on natural language processing tasks in the financial field: on dataset one, keeping other experimental conditions the same, the financial pre-training language model obtains a 0.5% F1 improvement on the risk identification task (93.1%→93.6%); on the risk classification task, the ACC index improves by 0.9% (90.0%→90.9%) and the MRR index by 1.2% (91.0%→92.2%); on the risk subject identification task, the F1 index improves by 2.4% (71.3%→73.7%) and the EM index by 2.7% (58.4%→61.1%).
Experiments also show that, compared with setting up a separate model for each task, the multi-task learning architecture generally improves the performance of each task and clearly improves overall performance. On dataset one, keeping other experimental conditions the same, the multi-task learning model's performance on the risk identification task drops slightly compared with the single-task model (93.1%→93.0%); on the risk classification task, the ACC index improves by 0.6% (90.0%→90.6%) and the MRR index by 0.9% (91.0%→91.9%); on the risk subject identification task, the F1 index improves by 0.1% (71.3%→71.4%) and the EM index by 1.5% (58.4%→59.9%). On dataset two, keeping other experimental conditions the same, the multi-task learning model's performance on the risk identification task again drops slightly compared with the single-task model (99.9%→99.7%); on the risk classification task, the ACC index improves by 0.6% (88.2%→88.8%) and the MRR index by 0.2% (93.1%→93.3%); on the risk subject identification task, the F1 index improves by 1.8% (74.1%→75.9%) and the EM index by 0.7% (54.6%→55.3%).
In the embodiment of the application, unlike the sequence-labeling method commonly used in traditional named entity recognition tasks, a machine reading comprehension approach is adopted for subject extraction, because the correspondence with the risk type is also involved and disambiguation is required. For example, suppose the text to be processed is "Xi'an Catering: shares of the controlling shareholder are transferred, and the transferee is changed to a municipal culture group". The financial risk type contained in this text is "change of actual controller/controlling shareholder"; the entities in the text are "Xi'an Catering" and the culture group, yet the risk event subject corresponding to "change of actual controller/controlling shareholder" is only the culture group, not "Xi'an Catering". Compared with the traditional named entity recognition formulation, the machine reading comprehension approach adopted by the application can judge whether an extracted named entity correctly corresponds to the risk type, i.e., it can disambiguate, thereby making the recognition result more accurate.
The application uses pre-trained language model technology to solve the problem of poor model performance caused by a lack of deep semantic mining, and uses multi-task learning to solve the technical problem of poor model performance caused by limited data and the absence of information sharing among tasks. In addition, the multi-task approach reduces the total number of model parameters, which saves storage space and speeds up model loading and execution.
According to still another aspect of the embodiment of the present application, as shown in fig. 8, there is provided a financial risk prediction apparatus based on text pre-training and multi-task learning, including: the acquiring module 801 is configured to acquire a text to be processed, where the text to be processed is from the financial field of the internet platform; the risk recognition module 803 is configured to input a text to be processed into a first neural network model, so as to determine whether the content of the text to be processed includes financial risks according to a processing flow of a risk recognition task, where the first neural network model is obtained by performing multitasking training on a second neural network model by using training data with marking information, the second neural network model is a multitask learning model combined with a financial pretraining language model, the financial pretraining language model is a pretraining language model obtained by performing parameter initialization by using a plurality of unlabeled pretraining corpuses, the multitasking includes a risk recognition task, a risk classification task and a risk subject recognition task, the marking information is used to mark whether the content of the training data includes financial risks, and is also used to mark risk types of the financial risks in case of including the financial risks, and mark risk subjects matched with the risk types; the risk classification module 805 is configured to determine, when the content of the text to be processed includes financial risk, a risk type of the financial risk according to a processing flow of the risk classification task using the first neural network model; and the risk subject identifying module 807 is configured to determine, according to a processing flow of the risk subject identifying task, a risk subject matching the risk type using the first neural network model.
It should be noted that, the acquiring module 801 in this embodiment may be used to perform step S202 in the embodiment of the present application, the risk identifying module 803 in this embodiment may be used to perform step S204 in the embodiment of the present application, the risk classifying module 805 in this embodiment may be used to perform step S206 in the embodiment of the present application, and the risk subject identifying module 807 in this embodiment may be used to perform step S208 in the embodiment of the present application.
It should be noted that the above modules are the same as examples and application scenarios implemented by the corresponding steps, but are not limited to what is disclosed in the above embodiments. It should be noted that the above modules may be implemented in software or hardware as a part of the apparatus in the hardware environment shown in fig. 1.
Optionally, the financial risk prediction device based on text pre-training and multi-task learning further comprises: the training data extraction module is used for randomly determining training data of a batch from the training data pool, wherein the training data comprises training data for risk identification tasks, risk classification tasks and risk subject identification tasks; the first training module is used for inputting training data into the second neural network model and continuously training various parameters of the second neural network model on the basis of the pre-training parameters of the second neural network model; the second training module is used for taking the second neural network model as the first neural network model under the condition that the recognition accuracy of the second neural network model to the test data reaches an optimal value by adopting an early-stop training mode; and the third training module is used for continuing to train the second neural network model by using the training data under the condition that the recognition accuracy of the second neural network model to the test data does not reach the optimal value so as to adjust the numerical value of the parameters in each network layer in the second neural network model until the recognition accuracy of the second neural network model to the test data reaches the optimal value.
Optionally, the financial risk prediction device based on text pre-training and multi-task learning further comprises: the pre-training corpus acquisition module is used for acquiring pre-training corpus, wherein the pre-training corpus is from the financial field of the Internet platform; the preprocessing module is used for preprocessing a pre-trained language material according to the input requirement of a first pre-trained language model, wherein the first pre-trained language model is a deep neural network model, and the first pre-trained language model is a pre-trained language model obtained by pre-training based on the general domain language material; the first pre-training module is used for pre-training the first pre-training language model by utilizing the pre-processed pre-training corpus; the second pre-training module is used for taking the first pre-training language model as a financial pre-training language model under the condition that the performance of the first pre-training language model on a target pre-training task reaches a target performance threshold; and the third pre-training module is used for continuously pre-training the first pre-training language model by using the pre-training corpus under the condition that the performance of the first pre-training language model on the target pre-training task does not reach the target performance threshold value, so as to adjust the numerical value of the parameters in each network layer in the first pre-training language model until the performance of the first pre-training language model on the target pre-training task reaches the target performance threshold value.
Optionally, the financial risk prediction device based on text pre-training and multi-task learning further comprises: the multi-task processing module is used for respectively adding output layers for the risk identification task, the risk classification task and the risk subject identification task to the output layers of the financial pre-training language model to obtain a second neural network model.
Optionally, the financial risk prediction device based on text pre-training and multi-task learning further comprises: the batch dividing module is used for dividing training data for the risk identification task, the risk classification task and the risk subject identification task into a plurality of batches according to the preset data size of each batch; and the mixing module is used for carrying out unordered mixing on the training data of all batches to obtain a training data pool.
Optionally, the financial risk prediction device based on text pre-training and multi-task learning further comprises: the parameter area determining module is used for taking an embedded layer and an encoding layer of the second neural network model as shared parameter areas, taking each output layer of the second neural network model as a private parameter area respectively, wherein the private parameter areas comprise a first private parameter area, a second private parameter area and a third private parameter area, the first private parameter area is an output layer of a risk identification task, the second private parameter area is an output layer of a risk classification task, and the third private parameter area is an output layer of a risk subject identification task; the shared parameter area learning rate determining module is used for fixing the learning rates of the first private parameter area, the second private parameter area and the third private parameter area to be first learning rates, training the second neural network model by using training data so as to determine a first target learning rate of the shared parameter area in a plurality of second learning rates, wherein the first target learning rate is the optimal learning rate applicable to the shared parameter area; the private parameter area learning rate determining module is used for fixing the learning rate of the shared parameter area as a first target learning rate, and training the second neural network model by utilizing training data to respectively determine second target learning rates of the first private parameter area, the second private parameter area and the third private parameter area within a target range, wherein the second target learning rate is an optimal learning rate respectively applicable to the first private parameter area, the second private parameter area and the third private parameter area.
Optionally, the financial risk prediction device based on text pre-training and multi-task learning further comprises: the parameter sharing module is used for determining target hidden layer parameters in the process of training the second neural network model through parameter sharing of the shared parameter area, wherein the target hidden layer parameters are hidden layer parameters which are simultaneously applicable to the first private parameter area, the second private parameter area and the third private parameter area.
Optionally, the second training module is further configured to: acquire first test data; input the first test data into the second neural network model to process the first test data according to the processing flow of the risk identification task, so as to obtain a risk identification result output by the output layer of the risk identification task; determine a first harmonic mean of the precision and recall of the risk identification result, and screen out second test data, wherein the second test data is the first test data whose marking information and risk identification result both indicate risk; process the second test data by using the second neural network model according to the processing flow of the risk classification task to obtain a risk classification result output by the output layer of the risk classification task; determine the accuracy of the risk classification result and the sorting reciprocal value of the risk classification result, and screen out third test data, wherein the third test data is the second test data whose risk classification result matches the risk type marked by the marking information; process the third test data according to the processing flow of the risk subject identification task by using the second neural network model to obtain a risk subject identification result output by the output layer of the risk subject identification task; determine a complete matching value of the risk subject identification result and a second harmonic mean of the precision and recall of the risk subject identification result; and determine the second neural network model as the first neural network model under the condition that the first harmonic mean, the accuracy, the sorting reciprocal value, the complete matching value and the second harmonic mean reach corresponding preset indexes.
Optionally, the risk identification module is further configured to: converting the text to be processed into a first mark sequence according to a preset corresponding relation; the first marking sequence passes through an embedding layer and a coding layer of a first neural network model to obtain a first semantic representation vector of the text to be processed, wherein the first semantic representation vector is a vector containing context semantic information of the text to be processed; performing linear transformation on the first semantic representation vector to obtain a second semantic representation vector, wherein the second semantic representation vector is obtained by processing private parameters of an output layer of a risk identification task; processing the second semantic representation vector by adopting a Softmax classification mode to obtain first probability distribution, wherein the first probability distribution is a probability value that the content of the text to be processed contains financial risks, and the first probability distribution is obtained through an output layer of a risk identification task; and determining whether the text to be processed contains financial risks according to the first probability distribution.
Optionally, the risk classification module is further configured to: performing linear transformation on the first semantic representation vector to obtain a third semantic representation vector, wherein the third semantic representation vector is obtained by processing private parameters of an output layer of a risk classification task; processing the third semantic representation vector by adopting a Softmax classification mode to obtain second probability distribution, wherein the second probability distribution is a probability value of each type of risk type of financial risk, and the second probability distribution is obtained through an output layer of a risk classification task; and determining the risk type of the financial risk according to the second probability distribution.
Optionally, the risk subject identification module is further configured to: splicing the text to be processed and the risk type of the financial risk, and converting the text to be processed and the risk type of the financial risk into a second marking sequence according to a preset corresponding relation; the second marking sequence passes through an embedding layer and a coding layer of the first neural network model to obtain a fourth semantic representation vector, wherein the fourth semantic representation vector is a vector containing the spliced text to be processed and the risk type context semantic information; linearly transforming the fourth semantic representation vector to obtain a fifth semantic representation vector, wherein the fifth semantic representation vector is obtained by processing private parameters of an output layer of a risk subject identification task; determining a third probability distribution and a fourth probability distribution by using the fifth semantic representation vector, wherein the third probability distribution is a probability value of each word vector in the fifth semantic representation vector as a starting word vector of a risk subject matched with the risk type, and the fourth probability distribution is a probability value of each word vector in the fifth semantic representation vector as a ending word vector of the risk subject matched with the risk type; and determining a risk subject matched with the risk type according to the third probability distribution and the fourth probability distribution.
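Putting the three processing flows together, the prediction path of the device could be sketched as follows; this reuses the hypothetical MultiTaskFinRiskModel sketch above, and the 0.5 threshold and the 0/1 label convention are assumptions drawn from the description.

# End-to-end prediction: risk identification -> classification -> subject span.
import torch
import torch.nn.functional as F

@torch.no_grad()
def predict(model, tokenizer, text, risk_type_names):
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    # Risk identification: softmax over the risk-id head (first probability
    # distribution); label 0 = risky, label 1 = risk-free.
    p_risk = F.softmax(model(**enc, task="risk_id"), dim=-1)
    if p_risk[0, 0] < 0.5:
        return None  # the content does not include financial risk
    # Risk classification: softmax over the risk-cls head (second distribution).
    p_type = F.softmax(model(**enc, task="risk_cls"), dim=-1)
    risk_type = risk_type_names[int(p_type[0].argmax())]
    # Risk subject identification: splice text and risk type, then take the
    # positions with the highest start / end probabilities.
    enc2 = tokenizer(text, risk_type, return_tensors="pt", truncation=True)
    start_logits, end_logits = model(**enc2, task="subject")
    start = int(start_logits[0].argmax())
    end = int(end_logits[0].argmax())
    tokens = tokenizer.convert_ids_to_tokens(enc2["input_ids"][0])
    return risk_type, "".join(tokens[start:end + 1])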
According to yet another aspect of the embodiments of the present application, there is also provided a computer device including a memory, a processor, the memory storing a computer program executable on the processor, the processor implementing the above steps when executing the computer program.
The memory and the processor in the computer device communicate with the communication interface through a communication bus. The communication bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on.
The memory may include random access memory (Random Access Memory, RAM) or may include non-volatile memory (non-volatile memory), such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
There is also provided in accordance with yet another aspect of an embodiment of the present application a computer readable medium having non-volatile program code executable by a processor.
Optionally, in an embodiment of the present application, the computer readable medium is arranged to store program code for the processor to:
Step S202, acquiring a text to be processed, wherein the text to be processed is from the financial field of an Internet platform;
Step S204, inputting a text to be processed into a first neural network model, so as to determine whether the content of the text to be processed comprises financial risks according to the processing flow of a risk recognition task, wherein the first neural network model is obtained by performing multitasking training on a second neural network model by training data with marking information, the second neural network model is a multitasking learning model combined with a financial pre-training language model, the financial pre-training language model is a pre-training language model obtained by performing parameter initialization by using a plurality of unlabeled pre-training corpus, the multitasking comprises a risk recognition task, a risk classification task and a risk subject recognition task, the marking information is used for marking whether the content of the training data comprises financial risks, and is also used for marking risk types of the financial risks under the condition of comprising the financial risks, and marking risk subjects matched with the risk types;
step S206, determining the risk type of the financial risk according to the processing flow of the risk classification task by utilizing the first neural network model under the condition that the content of the text to be processed comprises the financial risk;
Step S208, determining a risk subject matched with the risk type according to the processing flow of the risk subject identification task by using the first neural network model.
Alternatively, specific examples in this embodiment may refer to examples described in the foregoing embodiments, and this embodiment is not described herein.
When the embodiment of the application is specifically implemented, the above embodiments can be referred to, and the application has corresponding technical effects.
It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or a combination thereof. For a hardware implementation, the processing units may be implemented within one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), general-purpose processors, controllers, micro-controllers, microprocessors, other electronic units for performing the functions described herein, or a combination thereof.
For a software implementation, the techniques described herein may be implemented by means of units that perform the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be additional divisions when actually implemented, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the embodiments of the present application, in essence or in the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the method described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, an optical disk, or the like. It should be noted that in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual such relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The foregoing is only a specific embodiment of the application to enable those skilled in the art to understand or practice the application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (14)

1. A financial risk prediction method based on text pre-training and multi-task learning, comprising:
Acquiring a text to be processed, wherein the text to be processed is from the financial field of an Internet platform;
Inputting the text to be processed into a first neural network model to determine whether the content of the text to be processed comprises financial risks according to the processing flow of a risk recognition task, wherein the first neural network model is obtained by performing multitasking training on a second neural network model by training data with marking information, the second neural network model is a multitask learning model combined with a financial pre-training language model, the financial pre-training language model is a pre-training language model obtained by performing parameter initialization by using a plurality of unlabeled pre-training corpus, the multitasking comprises the risk recognition task, a risk classification task and a risk subject recognition task, the marking information is used for marking whether the content of the training data comprises the financial risks, and the marking information is also used for marking the risk types of the financial risks and marking risk subjects matched with the risk types under the condition of comprising the financial risks;
Determining a risk type of the financial risk according to the processing flow of the risk classification task by using the first neural network model under the condition that the content of the text to be processed comprises the financial risk;
and determining the risk subject matched with the risk type according to the processing flow of the risk subject identification task by using the first neural network model.
2. The method of claim 1, wherein prior to inputting the text to be processed into a first neural network model, the method further comprises training the second neural network model for the multitasking to obtain the first neural network model in the following manner:
Randomly determining a batch of the training data from a training data pool, wherein the training data comprises training data for the risk identification task, the risk classification task and the risk subject identification task;
Inputting the training data into the second neural network model, and continuing to train various parameters of the second neural network model on the basis of pre-training parameters of the second neural network model;
Adopting an early-stop training mode, and taking the second neural network model as the first neural network model under the condition that the recognition accuracy of the second neural network model to the test data reaches an optimal value;
And under the condition that the recognition accuracy of the second neural network model to the test data does not reach the optimal value, continuing to train the second neural network model by using the training data to adjust the numerical value of the parameters in each network layer in the second neural network model until the recognition accuracy of the second neural network model to the test data reaches the optimal value.
3. The method of claim 2, wherein prior to inputting the training data into the second neural network model, the method further comprises pre-training a deep bi-directional language model with the unlabeled pre-training corpus to obtain the financial pre-training language model as follows:
acquiring the pre-training corpus, wherein the pre-training corpus is from the financial field of an Internet platform;
preprocessing the pre-trained language material according to the input requirement of a first pre-trained language model, wherein the first pre-trained language model is the deep bi-directional language model, and the first pre-trained language model is a pre-trained language model obtained by pre-training based on the general domain language material;
pre-training the first pre-training language model by utilizing the pre-trained corpus after pretreatment;
under the condition that the performance of the first pre-training language model on a target pre-training task reaches a target performance threshold, taking the first pre-training language model as the financial pre-training language model;
And under the condition that the performance of the first pre-training language model on the target pre-training task does not reach an optimal value, continuously pre-training the first pre-training language model by using the pre-training corpus so as to adjust the numerical value of parameters in each network layer in the first pre-training language model until the performance of the first pre-training language model on the target pre-training task reaches the optimal value.
4. The method of claim 3, wherein prior to inputting the training data into the second neural network model, the method further comprises combining the financial pre-training language model to obtain the second neural network model as follows:
And respectively adding output layers for the risk identification task, the risk classification task and the risk subject identification task to the output layers of the financial pre-training language model to obtain the second neural network model.
5. The method of claim 2, wherein prior to randomly determining a batch of the training data from a training data pool, the method further comprises constructing the training data pool as follows:
dividing the training data for the risk identification task, the risk classification task and the risk subject identification task into a plurality of batches according to the preset data size of each batch;
And carrying out unordered mixing on the training data of all batches to obtain the training data pool.
6. The method of claim 2, wherein continuing to train parameters of the second neural network model based on pre-training parameters of the second neural network model comprises:
Taking an embedded layer and an encoding layer of the second neural network model as shared parameter areas, respectively taking each output layer of the second neural network model as a private parameter area, wherein the private parameter areas comprise a first private parameter area, a second private parameter area and a third private parameter area, the first private parameter area is an output layer of the risk identification task, the second private parameter area is an output layer of the risk classification task, and the third private parameter area is an output layer of the risk subject identification task;
Fixing the learning rates of the first private parameter area, the second private parameter area and the third private parameter area to be first learning rates, and training the second neural network model by utilizing the training data to determine a first target learning rate of the shared parameter area in a plurality of second learning rates, wherein the first target learning rate is an optimal learning rate applicable to the shared parameter area;
And fixing the learning rate of the shared parameter area as the first target learning rate, and training the second neural network model by utilizing the training data to respectively determine second target learning rates of the first private parameter area, the second private parameter area and the third private parameter area in a target range, wherein the second target learning rate is the optimal learning rate respectively applicable to the first private parameter area, the second private parameter area and the third private parameter area.
7. The method of claim 6, wherein continuing to train parameters of the second neural network model based on pre-training parameters of the second neural network model further comprises:
and determining target hidden layer parameters in the process of training the second neural network model through parameter sharing of the shared parameter area, wherein the target hidden layer parameters are hidden layer parameters which are simultaneously applicable to the first private parameter area, the second private parameter area and the third private parameter area.
8. The method of claim 2, wherein in the case where the accuracy of the identification of the test data by the second neural network model reaches an optimal value, taking the second neural network model as the first neural network model comprises:
Acquiring first test data;
Inputting the first test data into the second neural network model to process the first test data according to the processing flow of the risk identification task to obtain a risk identification result output by an output layer of the risk identification task;
Determining a first harmonic mean value of the accuracy rate and the recall rate of the risk identification result, and screening out second test data, wherein the second test data are the first test data with risk for the marking information and the risk identification result;
Processing the second test data according to the processing flow of the risk classification task by using the second neural network model to obtain a risk classification result output by an output layer of the risk classification task;
Determining the accuracy of the risk classification result and the sorting reciprocal value of the risk classification result, and screening out third test data, wherein the third test data is the second test data of which the risk classification result is matched with the risk type marked by the marking information;
processing the third test data according to the processing flow of the risk subject identification task by using the second neural network model to obtain a risk subject identification result output by an output layer of the risk subject identification task;
Determining a complete matching value of the risk subject identification result and a second harmonic mean value of the accuracy rate and recall rate of the risk subject identification result;
and determining the second neural network model as the first neural network model under the condition that the first harmonic average value, the accuracy, the ranking reciprocal value, the complete matching value and the second harmonic average value reach corresponding preset indexes.
9. The method according to any one of claims 1 to 8, wherein the process flow of the risk identification task comprises:
Converting the text to be processed into a first mark sequence according to a preset corresponding relation;
The first marking sequence passes through an embedding layer and a coding layer of the first neural network model to obtain a first semantic representation vector of the text to be processed, wherein the first semantic representation vector is a vector containing context semantic information of the text to be processed;
performing linear transformation on the first semantic representation vector to obtain a second semantic representation vector, wherein the second semantic representation vector is obtained by processing private parameters of an output layer of the risk identification task;
Processing the second semantic representation vector by adopting a Softmax classification mode to obtain a first probability distribution, wherein the first probability distribution is a probability value that the content of the text to be processed contains the financial risk, and the first probability distribution is obtained through an output layer of the risk identification task;
And determining whether the text to be processed contains the financial risk according to the first probability distribution.
10. The method of claim 9, wherein the process flow of the risk classification task comprises:
Performing linear transformation on the first semantic representation vector to obtain a third semantic representation vector, wherein the third semantic representation vector is obtained by processing private parameters of an output layer of the risk classification task;
Processing the third semantic representation vector by adopting a Softmax classification mode to obtain a second probability distribution, wherein the second probability distribution is a probability value of each type of the risk type of the financial risk, and the second probability distribution is obtained through an output layer of the risk classification task;
the risk type of the financial risk is determined from the second probability distribution.
11. The method of claim 10, wherein the processing flow of the risk subject identification task comprises:
splicing the text to be processed with the risk type of the financial risk, and converting the spliced result into a second token sequence according to the preset correspondence;
passing the second token sequence through the embedding layer and the encoding layer of the first neural network model to obtain a fourth semantic representation vector, wherein the fourth semantic representation vector contains context semantic information of the spliced text to be processed and risk type;
performing a linear transformation on the fourth semantic representation vector to obtain a fifth semantic representation vector, wherein the fifth semantic representation vector is obtained by processing with private parameters of an output layer of the risk subject identification task;
determining a third probability distribution and a fourth probability distribution using the fifth semantic representation vector, wherein the third probability distribution gives, for each word vector in the fifth semantic representation vector, the probability that it is the starting word vector of the risk subject matching the risk type, and the fourth probability distribution gives, for each word vector in the fifth semantic representation vector, the probability that it is the ending word vector of the risk subject matching the risk type; and
determining the risk subject matching the risk type according to the third probability distribution and the fourth probability distribution.
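Continuing the sketch, the risk subject identification task can be read as span extraction: the text is spliced with the predicted risk type, re-encoded by the shared layers, and a task-private layer scores each position as a possible start or end of the risk subject. The splicing scheme and the name span_head are assumptions of this sketch, not a prescription of the claims.

```python
# Illustrative continuation: risk subject identification flow of claim 11.
spliced_tokens = ["[CLS]", risk_type, "[SEP]"] + text_tokens[1:]   # text spliced with the risk type
second_token_seq = encode(spliced_tokens)                          # second token sequence

fourth_repr = encoder(second_token_seq)            # shared embedding + encoding layers
span_head = nn.Linear(256, 2)                      # private parameters: start / end score per token
fifth_repr = span_head(fourth_repr)                # shape (1, L, 2)

start_prob = torch.softmax(fifth_repr[..., 0], dim=-1)   # third distribution: start-token probabilities
end_prob = torch.softmax(fifth_repr[..., 1], dim=-1)     # fourth distribution: end-token probabilities
start, end = int(start_prob.argmax()), int(end_prob.argmax())
risk_subject = spliced_tokens[start:end + 1] if end >= start else []
print(risk_subject)
```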
12. A financial risk prediction apparatus based on text pre-training and multi-task learning, comprising:
an acquisition module, configured to acquire a text to be processed, wherein the text to be processed originates from the financial field of an Internet platform;
a risk identification module, configured to input the text to be processed into a first neural network model and determine, according to the processing flow of a risk identification task, whether the content of the text to be processed comprises a financial risk, wherein the first neural network model is obtained by multi-task training of a second neural network model on training data with labeling information; the second neural network model is a multi-task learning model combined with a financial pre-training language model; the financial pre-training language model is a pre-training language model whose parameters are initialized using a plurality of unlabeled pre-training corpora; the multiple tasks comprise the risk identification task, a risk classification task and a risk subject identification task; and the labeling information is used to label whether the content of the training data comprises a financial risk and, in the case that a financial risk is comprised, to further label the risk type of the financial risk and the risk subject matching the risk type;
a risk classification module, configured to determine, in the case that the content of the text to be processed comprises the financial risk, the risk type of the financial risk using the first neural network model according to the processing flow of the risk classification task; and
a risk subject identification module, configured to determine, using the first neural network model, the risk subject matching the risk type according to the processing flow of the risk subject identification task.
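Putting the pieces together, a hypothetical pipeline mirroring the modules of the claimed device might chain the three task flows, skipping classification and subject identification when no risk is detected. It reuses the toy components from the sketches above and is not the claimed implementation.

```python
# Hypothetical end-to-end pipeline mirroring the modules of claim 12:
# acquisition -> risk identification -> (if risky) risk classification -> risk subject identification.
def predict(tokens):
    ids = encode(tokens)                          # acquisition + preset correspondence
    shared = encoder(ids)[:, 0]                   # shared semantic representation ([CLS] vector)
    if not bool(torch.softmax(risk_id_head(shared), dim=-1).argmax()):
        return {"risk": False}                    # no financial risk detected
    r_type = RISK_TYPES[int(torch.softmax(risk_cls_head(shared), dim=-1).argmax())]
    spliced = ["[CLS]", r_type, "[SEP]"] + tokens[1:]
    scores = span_head(encoder(encode(spliced)))  # start / end scores per token
    s, e = int(scores[..., 0].argmax()), int(scores[..., 1].argmax())
    return {"risk": True, "type": r_type, "subject": spliced[s:e + 1] if e >= s else []}

print(predict(["[CLS]", "某", "平", "台", "跑", "路"]))
```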
13. A computer device comprising a memory and a processor, the memory storing a computer program executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 11.
14. A computer-readable medium having non-volatile program code executable by a processor, wherein the program code causes the processor to perform the method of any one of claims 1 to 11.
CN202010865079.XA 2020-08-25 2020-08-25 Financial risk prediction method and device based on text pre-training and multi-task learning Active CN113743111B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010865079.XA CN113743111B (en) 2020-08-25 2020-08-25 Financial risk prediction method and device based on text pre-training and multi-task learning


Publications (2)

Publication Number Publication Date
CN113743111A (en) 2021-12-03
CN113743111B (en) 2024-06-04

Family

ID=78727995

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010865079.XA Active CN113743111B (en) 2020-08-25 2020-08-25 Financial risk prediction method and device based on text pre-training and multi-task learning

Country Status (1)

Country Link
CN (1) CN113743111B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115730233B (en) * 2022-10-28 2023-07-11 支付宝(杭州)信息技术有限公司 Data processing method and device, readable storage medium and electronic equipment
CN116308758B (en) * 2023-03-20 2024-01-05 深圳征信服务有限公司 Financial risk analysis method and system based on big data
CN116226678B (en) * 2023-05-10 2023-07-21 腾讯科技(深圳)有限公司 Model processing method, device, equipment and storage medium
CN116383026B (en) * 2023-06-05 2023-09-01 阿里巴巴(中国)有限公司 Data processing method and server based on large model
CN117593613B (en) * 2024-01-19 2024-04-09 腾讯科技(深圳)有限公司 Multitasking learning method and device, storage medium and electronic equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2018368279A1 (en) * 2017-11-14 2020-05-14 Magic Leap, Inc. Meta-learning for multi-task learning for neural networks
US10726207B2 (en) * 2018-11-27 2020-07-28 Sap Se Exploiting document knowledge for aspect-level sentiment classification
US10937416B2 (en) * 2019-02-01 2021-03-02 International Business Machines Corporation Cross-domain multi-task learning for text classification

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019095572A1 (en) * 2017-11-17 2019-05-23 平安科技(深圳)有限公司 Enterprise investment risk assessment method, device, and storage medium
CN108491380A (en) * 2018-03-12 2018-09-04 苏州思必驰信息科技有限公司 Confrontation multitask training method for speech understanding
CN108920460A (en) * 2018-06-26 2018-11-30 武大吉奥信息技术有限公司 A kind of training method and device of the multitask deep learning model of polymorphic type Entity recognition
CN110334814A (en) * 2019-07-01 2019-10-15 阿里巴巴集团控股有限公司 For constructing the method and system of risk control model
CN110728298A (en) * 2019-09-05 2020-01-24 北京三快在线科技有限公司 Multi-task classification model training method, multi-task classification method and device
CN111209738A (en) * 2019-12-31 2020-05-29 浙江大学 Multi-task named entity recognition method combining text classification
CN111353533A (en) * 2020-02-26 2020-06-30 南京理工大学 No-reference image quality evaluation method and system based on multi-task learning
CN111444721A (en) * 2020-05-27 2020-07-24 南京大学 Chinese text key information extraction method based on pre-training language model

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Kungfupanda at SemEval-2020 Task 12: BERT-based multi-task learning for offensive language detection; Wenliang Dai et al.; arXiv; 2020-07-20; pp. 1-7 *
Research on sentiment analysis technology based on multi-task learning (基于多任务学习的情感分析技术研究); Wang Jie (王杰); China Masters' Theses Full-text Database, Information Science and Technology; 2020-01-15 (No. 01); I138-2757 *
Research on multi-modal data fusion algorithms (多模态数据融合算法研究); Zhao Liang (赵亮); China Doctoral Dissertations Full-text Database, Information Science and Technology; 2018-12-15 (No. 12); I138-51 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant