CN113435998A - Loan overdue prediction method and device, electronic equipment and storage medium - Google Patents

Loan overdue prediction method and device, electronic equipment and storage medium

Info

Publication number
CN113435998A
Authority
CN
China
Prior art keywords
text, intention, question, target, answer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110695341.5A
Other languages
Chinese (zh)
Other versions
CN113435998B (en)
Inventor
杨翰章
吴育人
庄伯金
刘玉宇
肖京
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202110695341.5A priority Critical patent/CN113435998B/en
Publication of CN113435998A publication Critical patent/CN113435998A/en
Application granted granted Critical
Publication of CN113435998B publication Critical patent/CN113435998B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03Credit; Loans; Processing thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Computational Linguistics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Human Resources & Organizations (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Development Economics (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Tourism & Hospitality (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Technology Law (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The invention relates to the technical field of artificial intelligence, and provides a loan overdue prediction method and device, an electronic device and a storage medium. The method comprises the following steps: preprocessing an interview dialog text to obtain a first question and answer text set, and inputting the first question and answer text set into a trained intention point recognition model to obtain the intention point of each question and answer text; merging the question and answer texts in the first question and answer text set to obtain a target paragraph text for each intention point; inputting the target paragraph texts into a pre-trained target model based on the Focal Loss function to obtain a target overdue prediction probability value; and predicting whether the target client is a loan overdue client. By introducing the Focal Loss function, the method emphasizes overdue customer samples during training, suppresses model overfitting, obtains an optimal target model, and improves the accuracy of loan overdue prediction.

Description

Loan overdue prediction method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a loan overdue prediction method and device, electronic equipment and a storage medium.
Background
For loan business, whether a client will become overdue needs to be predicted before the loan is granted. When predicting whether a client will be overdue, prior-art models generally construct only shallow statistical features from the client's basic loan information; because the breadth and depth of feature extraction are insufficient, the accuracy of overdue prediction is low.
In addition, the prior art usually adopts a pipeline prediction model. This cascaded approach relies on complex feature engineering, and insufficient expert experience leads to a poor model training effect and, consequently, low overdue prediction accuracy.
Therefore, there is a need for a method that predicts loan overdue status quickly and accurately.
Disclosure of Invention
In view of the above, there is a need for a loan overdue prediction method and apparatus, an electronic device and a storage medium that introduce the Focal Loss function so that, during training, overdue customer samples are emphasized, model overfitting is suppressed, an optimal target model is obtained, and the accuracy of loan overdue prediction is improved.
A first aspect of the invention provides a loan overdue prediction method, the method comprising:
receiving a face examination dialogue text of a target client, and preprocessing the face examination dialogue text to obtain a first question and answer text set, wherein the first question and answer text set comprises a plurality of question and answer texts;
acquiring a historical review dialog text set, and training an intention point identification model based on the historical review dialog text set to obtain a trained intention point identification model;
inputting the first question and answer text set into the trained intention point recognition model to obtain the intention points of each question and answer text;
merging a plurality of question and answer texts in the first question and answer text set according to the intention points of the plurality of question and answer texts to obtain a target paragraph text of each intention point;
inputting the target paragraph texts of a plurality of intention points into a pre-trained target model based on the Focal Loss function to obtain a target overdue prediction probability value, wherein the target model comprises a BERT model and a convolutional neural network model;
and predicting whether the target client is a loan overdue client or not based on the target overdue prediction probability value.
Optionally, the training process of the target model based on the Focal Loss function includes:
acquiring a pre-trained BERT model, and inputting the target paragraph texts of the plurality of intention points into the pre-trained BERT model to obtain a plurality of word embedding vectors;
constructing a convolutional neural network, inputting the word embedding vectors into the constructed convolutional neural network for convolution operation to obtain a first tensor, wherein the convolutional neural network comprises a full connection layer and a softmax layer;
inputting the first tensor into a full connection layer through residual connection for feature extraction to obtain a second tensor;
inputting the second tensor into a softmax layer for mapping, and acquiring the overdue prediction probability value of the target client;
performing loss calculation using the Focal Loss function according to the overdue prediction probability value, and updating model parameters of the pre-trained BERT model and the constructed convolutional neural network according to the loss calculation result to obtain an updated pre-trained BERT model and an updated convolutional neural network;
and training the updated pre-trained BERT model and the updated convolutional neural network to obtain the target model based on the Focal Loss function.
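The Focal Loss referred to above can be written as FL(p_t) = -a_t * (1 - p_t)^gamma * log(p_t). The patent does not disclose its exact alpha and gamma values, so the defaults below are assumptions; this is a minimal per-sample binary sketch in plain Python, not the patent's implementation:

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Focal Loss for one binary prediction.

    p     -- predicted probability of the positive (overdue) class
    y     -- true label, 1 for an overdue client, 0 otherwise
    alpha -- weight for the rarer positive class (assumed default)
    gamma -- focusing parameter; gamma=0 reduces to weighted cross-entropy
    """
    p_t = p if y == 1 else 1.0 - p          # probability of the true class
    a_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -a_t * (1.0 - p_t) ** gamma * math.log(p_t)

# The (1 - p_t)^gamma factor down-weights easy, well-classified samples,
# so hard overdue samples dominate the gradient during training.
easy = focal_loss(0.95, 1)  # confident and correct: near-zero loss
hard = focal_loss(0.30, 1)  # overdue client scored low: much larger loss
```

This is the mechanism by which the training process can emphasize overdue client samples without oversampling them.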
Optionally, the building process of the convolutional neural network includes:
acquiring preset configuration parameters, wherein the preset configuration parameters comprise convolution unit configuration parameters, activation unit configuration parameters, pooling unit configuration parameters, convolution layer configuration parameters and full-connection layer configuration parameters, the convolution layer configuration parameters comprise 5 convolution blocks, and each convolution block comprises 1 convolution kernel with the size of 3 × 3 and 1 convolution kernel with the size of 5 × 5;
configuring a convolution unit according to the configuration parameters of the convolution unit, configuring an activation unit according to the configuration parameters of the activation unit, configuring a pooling unit according to the configuration parameters of the pooling unit, configuring a convolution layer according to the configuration parameters of the convolution layer and configuring a full-connection layer according to the configuration parameters of the full-connection layer;
and constructing a convolutional neural network according to the configured convolution unit, the activation unit, the pooling unit, the convolutional layer and the full connection layer.
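The construction steps above amount to assembling a layer list from preset configuration parameters. The builder below is an illustrative sketch only; the function name and dict schema are invented for the example, not taken from the patent:

```python
def build_cnn_config(num_blocks=5, kernel_sizes=((3, 3), (5, 5))):
    """Assemble the layer list described above: num_blocks convolution
    blocks, each holding one 3x3 and one 5x5 kernel, followed by a
    fully connected layer and a softmax layer."""
    layers = []
    for i in range(num_blocks):
        layers.append({
            "type": "conv_block",
            "index": i,
            "kernels": list(kernel_sizes),  # one 3x3 and one 5x5 kernel
        })
    layers.append({"type": "fully_connected"})
    layers.append({"type": "softmax"})
    return layers

# Five conv blocks plus the two trailing layers, as in the claim.
config = build_cnn_config()
```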
Optionally, the preprocessing of the interview dialog text to obtain a first question and answer text set includes:
removing the special symbols in the face-examination dialog text to obtain a target dialog text;
sorting the target dialogue texts according to a preset sorting mode to obtain a plurality of target question-answer sentences;
and counting the target question and answer sentences to obtain a first question and answer text set.
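A minimal sketch of these preprocessing steps, assuming the raw dialog arrives as (speaker, sentence) tuples; that input format, and the exact set of symbols removed, are assumptions since the patent does not specify them:

```python
import re

def preprocess_dialog(raw_lines):
    """Strip special symbols, then arrange consecutive agent/client turns
    into question-answer pairs so the agent's sentence count equals the
    client's, yielding the first question and answer text set."""
    cleaned = []
    for speaker, sentence in raw_lines:
        # Keep word characters and whitespace; drop special symbols.
        sentence = re.sub(r"[^\w\s]", "", sentence).strip()
        if sentence:
            cleaned.append((speaker, sentence))
    # Pair each agent question with the client answer that follows it.
    qa_pairs = []
    for (spk_a, sent_a), (spk_b, sent_b) in zip(cleaned[::2], cleaned[1::2]):
        if spk_a == "agent" and spk_b == "customer":
            qa_pairs.append((sent_a, sent_b))
    return qa_pairs
```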
Optionally, the training of the intention point recognition model based on the historical review dialog text set includes:
acquiring historical audit dialog texts corresponding to a plurality of historical clients as a historical audit dialog text set;
preprocessing the historical review dialog text set to obtain a target review dialog text set;
sorting the target interview dialog text set according to a preset sorting mode to obtain a second question and answer text set;
according to a preset intention point set, carrying out intention point labeling on each question and answer text in the second question and answer text set according to a preset labeling mode to obtain a labeled corpus, and screening the labeled corpus to obtain a labeled corpus corresponding to the key intention points;
dividing the labeled corpus corresponding to the key intention points into a training set and a test set;
inputting the training set into a preset neural network for training to obtain an intention point recognition model;
inputting the test set into the intention point identification model for testing, and calculating a test passing rate;
if the test passing rate is larger than or equal to a preset passing rate threshold value, determining that the training of the intention point recognition model is finished; and if the test passing rate is smaller than the preset passing rate threshold value, increasing the number of the training sets, and re-training the intention point recognition model.
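The train/test loop above, with its pass-rate threshold and training-set growth, can be sketched generically. All callables here are hypothetical stand-ins, since the patent does not specify the underlying neural network:

```python
def evaluate_until_pass(train_fn, test_fn, train_set, test_set,
                        pass_threshold=0.9, grow_fn=None, max_rounds=5):
    """Train the intention-point model, measure the test pass rate, and
    if it falls below the preset threshold, enlarge the training set and
    retrain. pass_threshold and max_rounds are illustrative defaults."""
    for _ in range(max_rounds):
        model = train_fn(train_set)
        pass_rate = test_fn(model, test_set)
        if pass_rate >= pass_threshold:
            return model, pass_rate          # training is finished
        if grow_fn is None:
            break
        train_set = grow_fn(train_set)       # increase the training data
    return model, pass_rate
```

A toy run: if the "model" is just the training-set size and the pass rate grows with it, one doubling of the data is enough to cross the threshold.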
Optionally, the screening of the labeled corpus to obtain the labeled corpus corresponding to the key intention points includes:
merging the question and answer texts with the same intention points in the labeled corpus set to obtain a question and answer text of each intention point;
calculating the text length of the question and answer text of each intention point;
judging whether the text length of the question and answer text of each intention point is greater than a preset text length threshold corresponding to that intention point;
when the text length of the question and answer text of each intention point is greater than or equal to the preset text length threshold of the corresponding intention point, counting the frequency ratio of each intention point in the interview dialog text set;
sorting the frequency ratios in descending order;
and selecting a plurality of top-ranked intention points from the descending sorting result as target intention points, and determining the question and answer texts corresponding to the target intention points as the labeled corpus corresponding to the key intention points.
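A sketch of the screening procedure above: merge by intention point, filter by a per-point length threshold, then rank the survivors by frequency ratio. The input layout, names, and threshold values are illustrative assumptions:

```python
from collections import Counter

def screen_key_intents(labeled, length_thresholds, top_k=2):
    """labeled: (intention_point, qa_text) pairs from the labeled corpus.
    Returns the top_k key intention points after length screening."""
    # Merge question-answer texts that share the same intention point.
    merged, freq = {}, Counter()
    for intent, text in labeled:
        merged[intent] = merged.get(intent, "") + text
        freq[intent] += 1
    total = sum(freq.values())
    # Keep intents whose merged text meets that intent's length threshold.
    kept = [i for i, t in merged.items() if len(t) >= length_thresholds.get(i, 0)]
    # Sort kept intents by frequency ratio, descending, and take the top k.
    kept.sort(key=lambda i: freq[i] / total, reverse=True)
    return kept[:top_k]
```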
Optionally, the merging of the question and answer texts in the first question and answer text set to obtain the target paragraph text of each intention point includes:
merging the question and answer texts with the same intention points in the first question and answer text set to obtain a paragraph text of each intention point;
calculating the text length of the paragraph text of each intention point;
judging whether the text length of the paragraph text of each intention point is larger than a preset intention point paragraph threshold value;
when the text length of the paragraph text of each intention point is greater than or equal to the preset intention point paragraph threshold value, truncating the paragraph text of each intention point according to the preset intention point paragraph threshold value to obtain a target paragraph text of each intention point; or
And when the text length of the paragraph text of each intention point is less than the preset intention point paragraph threshold value, filling the paragraph text of each intention point according to preset symbols to obtain the target paragraph text of each intention point.
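The truncate-or-pad rule above can be sketched as follows. The threshold default of 254 and the [PAD] symbol mirror the embodiment described later in the detailed description, but the function itself is an illustrative assumption:

```python
def fit_to_threshold(paragraph, threshold=254, pad_token="[PAD]"):
    """Paragraphs longer than the preset intention-point paragraph
    threshold are truncated to it; shorter ones are padded with a
    preset symbol so every paragraph has the same length."""
    if len(paragraph) >= threshold:
        return paragraph[:threshold]          # truncate to the threshold
    padding = pad_token * ((threshold - len(paragraph)) // len(pad_token) + 1)
    return (paragraph + padding)[:threshold]  # pad, then trim overshoot
```

Either branch yields a fixed-length target paragraph text, which is what keeps the input format to the downstream model uniform.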
A second aspect of the invention provides a loan overdue prediction apparatus, the apparatus comprising:
the system comprises a preprocessing module, a query module and a query module, wherein the preprocessing module is used for receiving an interview dialog text of a target client and preprocessing the interview dialog text to obtain a first question and answer text set, and the first question and answer text set comprises a plurality of question and answer texts;
the training module is used for acquiring a historical review dialog text set, training an intention point identification model based on the historical review dialog text set and obtaining a trained intention point identification model;
the first input module is used for inputting the first question and answer text set into the trained intention point recognition model to obtain the intention points of each question and answer text;
the merging module is used for merging the question and answer texts in the first question and answer text set according to the intention points of the question and answer texts to obtain a target paragraph text of each intention point;
the second input module is used for inputting the target paragraph texts of a plurality of intention points into a pre-trained target model based on the Focal Loss function to obtain a target overdue prediction probability value, wherein the target model comprises a BERT model and a convolutional neural network model;
and the prediction module is used for predicting whether the target client is a loan overdue client or not based on the target overdue prediction probability value.
A third aspect of the invention provides an electronic device comprising a processor and a memory, the processor being configured to implement the loan overdue prediction method when executing a computer program stored in the memory.
A fourth aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the loan overdue prediction method.
In summary, according to the loan overdue prediction method and apparatus, electronic device and storage medium, on the one hand, by acquiring the interview dialog text between the agent and historical clients, rather than the clients' side information, and screening the interview dialog text according to intention points, redundant information in the interview dialog text is screened out while the question and answer texts corresponding to the key intention points are retained; this provides comprehensive and clean input data for training the subsequent intention point recognition model, and improves the accuracy and recall of subsequent loan overdue prediction in an actual loan service scenario. On the other hand, by introducing the Focal Loss function, overdue client samples are emphasized during training of the updated pre-trained BERT model and the updated convolutional neural network, model overfitting is suppressed, an optimal target model is obtained, and the accuracy of the target overdue prediction probability value, and thus of the loan overdue prediction, is improved. Finally, the target paragraph text of each intention point is obtained by truncating or padding the paragraph text of each intention point, which unifies the input data format of the subsequent model and further improves its loan overdue prediction accuracy.
Drawings
Fig. 1 is a flowchart of a loan overdue prediction method according to an embodiment of the present invention.
Fig. 2 is a block diagram of a loan overdue prediction apparatus according to a second embodiment of the present invention.
Fig. 3 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a detailed description of the present invention will be given below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments of the present invention and features of the embodiments may be combined with each other without conflict.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
Example one
Fig. 1 is a flowchart of a loan overdue prediction method according to an embodiment of the present invention.
In this embodiment, the loan overdue prediction method may be applied to an electronic device, and for an electronic device that needs to perform loan overdue prediction, the loan overdue prediction function provided by the method of the present invention may be directly integrated on the electronic device, or may be operated in the electronic device in the form of a Software Development Kit (SDK).
As shown in fig. 1, the loan overdue prediction method specifically includes the following steps, and the order of the steps in the flowchart may be changed and some steps may be omitted according to different requirements.
S11, receiving a face examination dialogue text of a target client, preprocessing the face examination dialogue text to obtain a first question and answer text set, wherein the first question and answer text set comprises a plurality of question and answer texts.
In this embodiment, during the loan process, loan overdue prediction needs to be performed for a target client: the interview dialog text between the target client and the agent is received, and whether the target client will become loan overdue is predicted from the interview dialog text.
In this embodiment, when an interview dialog is conducted, the obtained interview dialog text may contain redundant text information. Content in the interview dialog text that is unrelated to the client label needs to be filtered and screened, and the interview dialog text is collated to obtain the first question and answer text set.
In an optional embodiment, the preprocessing the interview dialog text to obtain a first question and answer text set includes:
removing the special symbols in the face-examination dialog text to obtain a target dialog text;
sorting the target dialogue texts according to a preset sorting mode to obtain a plurality of target question-answer sentences;
and counting the target question and answer sentences to obtain a first question and answer text set.
In this embodiment, the special symbols in the interview dialog text are removed, and an arrangement mode may be preset. Specifically, the preset arrangement mode arranges the interview dialog in a question-and-answer form; when the target dialog text is arranged according to the preset arrangement mode, the number of the agent's sentences is equal to the number of the target client's sentences, ensuring a one-to-one correspondence.
Illustratively, the interview dialog text is preprocessed: special symbols are removed, and the sentences of the interview dialog between the agent and the target client are arranged into question-answer pairs. The first question and answer text set is {(sent_agent_k, sent_customer_k)}, where {sent_agent_k} is the agent's sentence set, {sent_customer_k} is the target client's sentence set, and k denotes the number of sentences of the agent and of the target client during the interview dialog; in this embodiment, the number of the agent's sentences is set equal to the number of the target client's sentences, both being k.
In this embodiment, removing the redundant information from the interview dialog text avoids its interference in the subsequent loan overdue prediction process and improves prediction accuracy. Meanwhile, arranging the interview dialog text into a question-answer set manages the interview question-answer texts in a unified way, avoids data confusion during subsequent intention recognition, and improves the management efficiency of the first question and answer text set.
And S12, acquiring a historical review dialog text set, and training an intention point recognition model based on the historical review dialog text set to obtain a trained intention point recognition model.
In this embodiment, when identifying the intended key points of the target client, the intended key point identification model needs to be trained in advance.
Specifically, training an intention point recognition model based on the historical review dialog text set to obtain a trained intention point recognition model comprises:
acquiring historical audit dialog texts corresponding to a plurality of historical clients as a historical audit dialog text set;
preprocessing the historical review dialog text set to obtain a target review dialog text set;
sorting the target interview dialog text set according to a preset sorting mode to obtain a second question and answer text set;
according to a preset intention point set, carrying out intention point labeling on each question and answer text in the second question and answer text set according to a preset labeling mode to obtain a labeled corpus, and screening the labeled corpus to obtain a labeled corpus corresponding to the key intention points;
dividing the labeled corpus corresponding to the key intention key points into a training set and a test set;
inputting the training set into a preset neural network for training to obtain an intention point recognition model;
inputting the test set into the intention point identification model for testing, and calculating a test passing rate;
if the test passing rate is larger than or equal to a preset passing rate threshold value, determining that the training of the intention point recognition model is finished; and if the test passing rate is smaller than the preset passing rate threshold value, increasing the number of the training sets, and re-training the intention point recognition model.
In this embodiment, a labeling mode for the intention points may be preset. When training the intention point model, the historical interview dialog texts corresponding to a plurality of historical clients are preprocessed, and the target interview dialog text set is arranged according to the preset arrangement mode to obtain the second question and answer text set. According to the intention point set provided by the real service scenario, i.e., the preset intention point set, the intention points of each question and answer text in the second question and answer text set are labeled according to the preset labeling mode; the preset intention point set may include, for example, 270 intention points. For each question and answer text, the agent's sentence sent_agent_a and the historical client's sentence sent_customer_a are labeled with the corresponding intention point, giving (sent_agent_a, sent_customer_a, gist_m), where a denotes the index of the question and answer text in the second question and answer text set and m denotes the subscript of the intention point in the preset intention point set.
Further, the screening of the labeled corpus to obtain the labeled corpus corresponding to the key intention points includes:
merging the question and answer texts with the same intention points in the labeled corpus set to obtain a question and answer text of each intention point;
calculating the text length of the question and answer text of each intention point;
judging whether the text length of the question and answer text of each intention point is greater than a preset text length threshold corresponding to that intention point;
when the text length of the question and answer text of each intention point is greater than or equal to the preset text length threshold of the corresponding intention point, counting the frequency ratio of each intention point in the interview dialog text set;
sorting the frequency ratios in descending order;
and selecting a plurality of top-ranked intention points from the descending sorting result as target intention points, and determining the question and answer texts corresponding to the target intention points as the labeled corpus corresponding to the key intention points.
Further, the method further comprises:
and when the text length of the question and answer of each intention point is smaller than the preset text length threshold value of the corresponding intention point, deleting the question and answer text of each intention point from the labeling corpus.
In this embodiment, the question and answer texts with the same intention points are merged, the text length of the question and answer text of each intention point is calculated and compared with the preset text length threshold corresponding to that intention point, and part of the question and answer texts is preliminarily screened out according to the comparison result. The frequency ratio of each retained intention point in the interview dialog text set is then calculated, and the question and answer texts corresponding to the intention points with the larger frequency ratios are selected as the labeled corpus corresponding to the key intention points. In this way, the question and answer texts corresponding to the intention points most relevant to whether a historical client's loan became overdue are retained and used as the training and test sets of the intention point recognition model, ensuring the accuracy of the training set and improving the accuracy of subsequent intention point recognition.
In this embodiment, by acquiring the interview dialog text between the agent and historical clients rather than the clients' side information, and screening the interview dialog text according to intention points, redundant information is screened out while the question and answer texts corresponding to the key intention points are retained. This provides comprehensive and clean input data for training the subsequent intention point recognition model, and improves the accuracy and recall of subsequent loan overdue prediction in an actual loan service scenario.
And S13, inputting the first question and answer text set into the trained intention point recognition model to obtain the intention point of each question and answer text.
In this embodiment, after obtaining the first question and answer text set, the intention point of each question and answer text in the first question and answer text set needs to be identified, that is, the first question and answer text set is input to the trained intention point identification model, so as to obtain the intention point of each question and answer text.
And S14, merging the question and answer texts in the first question and answer text set according to the intention points of the question and answer texts to obtain a target paragraph text of each intention point.
In this embodiment, in order to facilitate the management of the target paragraph texts of different intention points, the question and answer texts with the same intention point are merged.
In an optional embodiment, the merging the question and answer texts in the first question and answer text set to obtain the target paragraph text of each of the intended points includes:
merging the question and answer texts with the same intention points in the first question and answer text set to obtain a paragraph text of each intention point;
calculating the text length of the paragraph text of each intention point;
judging whether the text length of the paragraph text of each intention point is larger than a preset intention point paragraph threshold value;
when the text length of the paragraph text of each intention point is greater than or equal to the preset intention point paragraph threshold value, truncating the paragraph text of each intention point according to the preset intention point paragraph threshold value to obtain a target paragraph text of each intention point; or
And when the text length of the paragraph text of each intention point is less than the preset intention point paragraph threshold value, filling the paragraph text of each intention point according to preset symbols to obtain the target paragraph text of each intention point.
In this embodiment, an intention point paragraph threshold may be preset, and the format of each question and answer text in the first question and answer text set may be unified according to the intention point paragraph threshold.
Illustratively, the preset intention point paragraph threshold may be 254. If the text length of the merged paragraph text of an intention point exceeds 254, the paragraph text of that intention point is truncated according to the preset intention point paragraph threshold; if it does not exceed 254, it is padded with the special character [PAD]. For example, if the agent and the target client share 3 question and answer texts under any one intention point, these 3 question and answer texts are constructed into a target paragraph text of the form "[CLS] sent_agent1 sent_customer1 sent_agent2 sent_customer2 sent_agent3 sent_customer3 [PAD] [PAD] … [SEP]", and for an intention point whose question and answer text is empty, the target paragraph text is constructed in the form "[CLS] [PAD] … [SEP]", so as to keep the data format of subsequent model inputs uniform.
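The truncate-or-pad step above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the helper name `build_target_paragraph` and the whitespace tokenization are assumptions; the patent only specifies the "[CLS] … [PAD] … [SEP]" layout and the threshold of 254.

```python
MAX_LEN = 254  # preset intention point paragraph threshold (from the example above)

def build_target_paragraph(qa_texts, max_len=MAX_LEN):
    """Merge the question-answer texts of one intention point into a
    fixed-length token sequence: [CLS] tokens... [PAD]... [SEP].
    qa_texts is a list of (agent_sentence, customer_sentence) pairs."""
    tokens = []
    for agent_sent, customer_sent in qa_texts:
        tokens.extend(agent_sent.split())
        tokens.extend(customer_sent.split())
    body_len = max_len - 2                          # reserve [CLS] and [SEP]
    tokens = tokens[:body_len]                      # truncate when too long
    padding = ["[PAD]"] * (body_len - len(tokens))  # pad when too short
    return ["[CLS]"] + tokens + padding + ["[SEP]"]

# An intention point with no question-answer text becomes "[CLS] [PAD] ... [SEP]".
empty_paragraph = build_target_paragraph([])
```

Every returned sequence has the same length, which is what keeps the subsequent model's input format uniform.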
In the embodiment, by truncating or filling the paragraph text of each intention point, the unification of the input data format of the subsequent model is ensured, and the loan overdue prediction accuracy of the subsequent model is further improved.
S15, inputting the target paragraph texts of the plurality of intention points into a pre-trained target model based on a Focal Loss function to obtain a target overdue prediction probability value, wherein the target model comprises a BERT model and a convolutional neural network model.
In this embodiment, the Focal Loss function is used to adjust the loss weights of unbalanced samples during training, alleviating the overfitting problem caused by sample imbalance in the loan service scene; inputting the target paragraph texts of the plurality of intention points into the pre-trained target model based on the Focal Loss function improves the accuracy of the target overdue prediction probability value.
Specifically, the training process of the target model based on the Focal Loss function includes:
acquiring a pre-trained BERT model, and inputting the target paragraph texts of the plurality of intention key points into the pre-trained BERT model to obtain a plurality of word embedding vectors;
constructing a convolutional neural network, inputting the word embedding vectors into the constructed convolutional neural network for convolution operation to obtain a first tensor, wherein the convolutional neural network comprises a full connection layer and a softmax layer;
inputting the first tensor into a full connection layer through residual connection for feature extraction to obtain a second tensor;
inputting the second tensor into a softmax layer for mapping, and acquiring the overdue prediction probability value of the target client;
performing loss calculation with the Focal Loss function according to the overdue prediction probability value, and updating the model parameters in the pre-trained BERT model and the constructed convolutional neural network according to the loss calculation result to obtain an updated pre-trained BERT model and an updated convolutional neural network;
and training the updated pre-trained BERT model and the updated convolutional neural network to obtain the target model based on the Focal Loss function.
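The forward pass described in the steps above (word embedding vectors, convolution, residual connection, fully connected layer, softmax) can be sketched with dependency-free stand-ins. The stub convolution, the toy dimensions, and all weights below are assumptions for illustration only; in the patent the first stage is a pre-trained BERT model and the convolution stage is the 5-block CNN of this embodiment.

```python
import math

def conv_stub(x):
    # Stand-in for the convolutional blocks: any length-preserving transform.
    return [0.5 * v for v in x]

def fully_connected(x, weights, bias):
    # One linear layer: weights is a list of rows, bias a list of scalars.
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def forward(embedding, weights, bias):
    first_tensor = conv_stub(embedding)                            # convolution output
    residual = [a + b for a, b in zip(first_tensor, embedding)]    # residual connection
    second_tensor = fully_connected(residual, weights, bias)       # feature extraction
    probs = softmax(second_tensor)  # [P(normal), P(overdue)]
    return probs[1]                 # overdue prediction probability y'
```

The residual connection adds the convolution input back to its output before the fully connected layer, mirroring the "inputting the first tensor into a full connection layer through residual connection" step.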
In this embodiment, the pre-trained BERT model and the constructed convolutional neural network are obtained and used to perform a preliminary operation on the word embedding vectors to obtain the overdue prediction probability value of the target client, and loss calculation and training are then performed with the Focal Loss function according to the overdue prediction probability value.
In this embodiment, in a real loan service scenario, the difference between the number of normal clients (labeled 0) and the number of overdue clients (labeled 1) is very large, so the model easily overfits due to the sample imbalance problem, and its generalization ability is low, making it difficult to apply in practice. This embodiment therefore replaces the cross-entropy loss function commonly used with the pre-trained BERT model and the convolutional neural network with a new loss function, Focal Loss, which is defined as follows:
$$\mathrm{FL}(y') = \begin{cases} -\alpha\,(1-y')^{\gamma}\,\log(y'), & y = 1 \\ -(1-\alpha)\,(y')^{\gamma}\,\log(1-y'), & y = 0 \end{cases}$$
where y' represents the overdue prediction probability value of the target client, and α and γ represent preset weight values.
In this embodiment, the preset weight value α balances the loss weights of positive samples (y = 1) and negative samples, while the preset weight value γ reduces the weight of easy-to-classify samples and increases the weight of hard-to-classify samples. By introducing the Focal Loss function, the samples of overdue customers are emphasized during the training of the updated pre-trained BERT model and the updated convolutional neural network, model overfitting is suppressed, and an optimal target model is obtained, which improves the accuracy of the target overdue prediction probability value and thus the accuracy of the loan overdue prediction.
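The definition above can be transcribed directly, assuming the standard two-class α-balanced form; the default values α = 0.25 and γ = 2.0 are common choices, not values stated in this document.

```python
import math

def focal_loss(y_prime, y, alpha=0.25, gamma=2.0):
    """Focal Loss for one sample.
    y_prime: the model's overdue prediction probability y'.
    y: the ground-truth label (1 = overdue, 0 = normal)."""
    eps = 1e-12  # guard against log(0)
    if y == 1:   # overdue (positive) sample
        return -alpha * (1.0 - y_prime) ** gamma * math.log(y_prime + eps)
    return -(1.0 - alpha) * y_prime ** gamma * math.log(1.0 - y_prime + eps)
```

The focusing effect of γ is visible by comparison: an easy, confidently correct prediction (y' = 0.9 for y = 1) contributes far less loss than a hard one (y' = 0.1 for y = 1), which is what shifts the training emphasis toward the rare overdue samples.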
Further, the construction process of the convolutional neural network comprises the following steps:
acquiring preset configuration parameters, wherein the preset configuration parameters comprise convolution unit configuration parameters, activation unit configuration parameters, pooling unit configuration parameters, convolution layer configuration parameters and full-connection layer configuration parameters, the convolution layer configuration parameters comprise 5 convolution blocks, and each convolution block comprises 1 convolution kernel with the size of 3 × 3 and 1 convolution kernel with the size of 5 × 5;
configuring a convolution unit according to the configuration parameters of the convolution unit, configuring an activation unit according to the configuration parameters of the activation unit, configuring a pooling unit according to the configuration parameters of the pooling unit, configuring a convolution layer according to the configuration parameters of the convolution layer and configuring a full-connection layer according to the configuration parameters of the full-connection layer;
and constructing a convolutional neural network according to the configured convolution unit, the activation unit, the pooling unit, the convolutional layer and the full connection layer.
In this embodiment, the constructed convolutional neural network uses 5 convolution blocks, each of which uses two convolution kernels of sizes 3x3 and 5x5. The word embedding vectors output by the pre-trained BERT model are input into the constructed convolutional neural network for the convolution operation; for example, the word embedding vectors pass through each convolution block in turn. Within each convolution block, a two-dimensional convolution that preserves the length and width is performed by adjusting the stride and padding operations on the word embedding vectors, and a two-dimensional convolution that reduces the length and width is performed by keeping the padding and increasing the stride. After a residual connection, the first tensor produced by the convolution operation is sent to a linear fully connected layer, and the second tensor output by the fully connected layer is finally input into a softmax layer to obtain the overdue prediction probability value of the target client predicted by the target model.
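The two kinds of two-dimensional convolution described above follow from standard convolution arithmetic, out = floor((in + 2·padding − kernel) / stride) + 1. The concrete input size below (254, matching the paragraph threshold) is an illustrative assumption.

```python
def conv_out_size(size, kernel, stride, padding):
    """Output spatial size of a 2-D convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

# Length/width-preserving convolution: 3x3 kernel, stride 1, padding 1.
same_out = conv_out_size(254, kernel=3, stride=1, padding=1)   # stays 254

# Length/width-reducing convolution: keep the padding, increase the stride to 2.
down_out = conv_out_size(254, kernel=3, stride=2, padding=1)   # roughly halved
```

The same formula shows why a 5x5 kernel needs padding 2 (rather than 1) to preserve the size at stride 1.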
S16, predicting whether the target client is a loan overdue client based on the target overdue prediction probability value.
In this embodiment, after the interview dialog with the target client is completed, the target model performs overdue prediction on the interview dialog text, and whether the target client is a loan overdue client is judged according to the overdue prediction probability value.
In an alternative embodiment, the predicting whether the target customer is a loan overdue customer based on the target overdue prediction probability value comprises:
comparing the overdue prediction probability value with a preset loan overdue probability threshold;
when the overdue prediction probability value is larger than or equal to the preset loan overdue probability threshold value, determining that the target customer is a loan overdue customer; or
And when the overdue prediction probability value is smaller than the preset loan overdue probability threshold, determining that the target client is not a loan overdue client.
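The comparison above is a single thresholding step; a minimal sketch follows. The default threshold of 0.5 is an assumption, since the document leaves the preset value open (it may be learned from historical customers' overdue probabilities).

```python
def is_loan_overdue_customer(overdue_probability, threshold=0.5):
    """Return True when the target client is predicted to be a loan
    overdue client, i.e. the probability reaches the preset threshold."""
    return overdue_probability >= threshold
```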
In this embodiment, a loan overdue probability threshold may be preset; specifically, the loan overdue probability threshold may be obtained through machine learning from the overdue prediction probability values of historical customers. By comparing the overdue prediction probability value with the preset loan overdue probability threshold, whether the target client is a loan overdue client is determined according to the comparison result.
In summary, in the loan overdue prediction method of this embodiment, on one hand, the trained intention point recognition model is obtained by training an intention point recognition model based on the interview dialog text set; by obtaining the interview dialog text between the agent and the historical client instead of only the client's side information, and screening the interview dialog text according to the intention points, redundant information in the interview dialog text is filtered out while the question and answer texts corresponding to the key intention points are retained, which provides relatively comprehensive and clean input data for the training of the subsequent intention point recognition model and improves the accuracy and recall rate of the subsequent model's loan overdue prediction in the actual loan service scene. On the other hand, the target paragraph texts of a plurality of intention points are input into a pre-trained target model based on the Focal Loss function to obtain the target overdue prediction probability value; by introducing the Focal Loss function, the samples of overdue clients are emphasized during the training of the updated pre-trained BERT model and the updated convolutional neural network, model overfitting is suppressed, and an optimal target model is obtained, which improves the accuracy of the target overdue prediction probability value and thus of the loan overdue prediction. Finally, the plurality of question and answer texts in the first question and answer text set are merged to obtain the target paragraph text of each intention point, and the paragraph text of each intention point is truncated or padded, which unifies the input data format of the subsequent model and further improves its loan overdue prediction accuracy.
Example two
Fig. 2 is a block diagram of a loan overdue prediction apparatus according to a second embodiment of the present invention.
In some embodiments, the loan overdue prediction apparatus 20 may include a plurality of functional modules composed of program code segments. The program code of each segment of the loan overdue prediction apparatus 20 may be stored in a memory of the electronic device and executed by the at least one processor to perform the loan overdue prediction function (see fig. 1 for details).
In this embodiment, the loan overdue prediction apparatus 20 may be divided into a plurality of functional modules according to the functions performed by the loan overdue prediction apparatus. The functional module may include: a preprocessing module 201, a training module 202, a deleting module 203, a first input module 204, a merging module 205, a second input module 206, and a predicting module 207. The module referred to herein is a series of computer readable instruction segments stored in a memory that can be executed by at least one processor and that can perform a fixed function. In the present embodiment, the functions of the modules will be described in detail in the following embodiments.
The preprocessing module 201 is configured to receive an interview dialog text of a target client, and preprocess the interview dialog text to obtain a first question and answer text set, where the first question and answer text set includes a plurality of question and answer texts.
In this embodiment, loan overdue prediction needs to be performed on a target client during the loan process: the interview dialog text between the target client and the agent is received, and whether the target client will have a loan overdue phenomenon is predicted according to the interview dialog text.
In this embodiment, the interview dialog text obtained during the interview may contain redundant text information; the content of the interview dialog text that is unrelated to the client label needs to be filtered and screened, and the interview dialog text is arranged to obtain the first question and answer text set.
In an optional embodiment, the preprocessing module 201 performs preprocessing on the review dialog text to obtain a first question-answering text set, including:
removing the special symbols in the face-examination dialog text to obtain a target dialog text;
sorting the target dialogue texts according to a preset sorting mode to obtain a plurality of target question-answer sentences;
and counting the target question and answer sentences to obtain a first question and answer text set.
In this embodiment, after the special symbols are removed, an arrangement mode may be preset; specifically, the preset arrangement mode arranges the interview dialog in question-answer form, and when the target dialog text is arranged according to the preset arrangement mode, the number of the agent's sentences is equal to the number of the target client's sentences, which ensures a one-to-one correspondence.
Illustratively, the interview dialog text is preprocessed, the special symbols in it are removed, and the sentences of the agent and the target client in the interview dialog text are arranged into question-answer set form. The first question and answer text set is {(sent_agent_k, sent_customer_k)}, where the agent's sentence set is {sent_agent_k}, the target client's sentence set is {sent_customer_k}, and k denotes the number of sentences of the agent and of the target client during the interview dialog; in this embodiment, the number of the agent's sentences is set equal to the number of the target client's sentences, both being k.
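The preprocessing just described can be sketched as follows. This is a simplified stand-in: the specific regular expression for "special symbols" and the turn-pairing rule are assumptions, since the document does not enumerate the symbols it removes.

```python
import re

# Assumed definition of "special symbols": any character that is neither a
# word character nor whitespace. The real symbol list is not given in the text.
SPECIAL_SYMBOLS = re.compile(r"[^\w\s]", flags=re.UNICODE)

def preprocess_dialog(agent_sents, customer_sents):
    """Strip special symbols from each utterance and pair the agent's k-th
    sentence with the target client's k-th reply, yielding the
    {(sent_agent_k, sent_customer_k)} question-answer set."""
    clean = lambda s: SPECIAL_SYMBOLS.sub("", s).strip()
    k = min(len(agent_sents), len(customer_sents))  # enforce one-to-one turns
    return [(clean(agent_sents[i]), clean(customer_sents[i])) for i in range(k)]
```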
In this embodiment, removing the redundant information in the interview dialog text avoids its interference in the subsequent loan overdue prediction process and improves the loan overdue prediction accuracy; meanwhile, arranging the interview dialog text into question-answer set form manages the interview question and answer texts in a unified way, avoids data confusion during subsequent intention recognition, and improves the management efficiency of the first question and answer text set.
The training module 202 is configured to obtain a historical review dialog text set, and train an intention point identification model based on the historical review dialog text set to obtain a trained intention point identification model.
In this embodiment, when identifying the intended key points of the target client, the intended key point identification model needs to be trained in advance.
Specifically, the training module 202 trains an intention point recognition model based on the historical review dialog text set, and obtaining the trained intention point recognition model includes:
acquiring historical audit dialog texts corresponding to a plurality of historical clients as a historical audit dialog text set;
preprocessing the historical review dialog text set to obtain a target review dialog text set;
sorting the target interview dialog text set according to a preset sorting mode to obtain a second question and answer text set;
according to a preset intention point set, carrying out intention point labeling on each question and answer text in the second question and answer text set according to a preset labeling mode to obtain a labeled corpus, and screening the labeled corpus to obtain a labeled corpus corresponding to the key intention points;
dividing the labeled corpus corresponding to the key intention key points into a training set and a test set;
inputting the training set into a preset neural network for training to obtain an intention point recognition model;
inputting the test set into the intention point identification model for testing, and calculating a test passing rate;
if the test passing rate is larger than or equal to a preset passing rate threshold value, determining that the training of the intention point recognition model is finished; and if the test passing rate is smaller than the preset passing rate threshold value, increasing the number of the training sets, and re-training the intention point recognition model.
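The pass-rate check and retraining branch above amount to a simple control loop. The trainer and evaluator below are hypothetical stand-ins (the document does not specify them); only the control flow, growing the training set and retraining until the test pass rate reaches the preset threshold, mirrors the description. The `max_rounds` cap is an added safety assumption.

```python
def train_until_pass(train_fn, eval_fn, train_set, test_set,
                     pass_rate_threshold=0.9, max_rounds=10):
    """Train an intention point recognition model, test it, and retrain
    with an enlarged training set while the pass rate is below threshold."""
    for _ in range(max_rounds):
        model = train_fn(train_set)
        pass_rate = eval_fn(model, test_set)
        if pass_rate >= pass_rate_threshold:
            return model, pass_rate             # training is finished
        # increase the number of training samples and retrain
        train_set = train_set + train_set[: len(train_set) // 2]
    return model, pass_rate
```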
In this embodiment, a labeling mode for the intention points may be preset. When training the intention point model, the historical interview dialog texts corresponding to a plurality of historical clients are preprocessed, and the target interview dialog text set is arranged according to the preset arrangement mode to obtain a second question and answer text set. According to the intention point set provided by the real service scene, that is, the preset intention point set, which may for example include 270 intention points, the intention point of each question and answer text in the second question and answer text set is labeled according to the preset labeling mode: the agent's sentence sent_agent_a and the historical client's sentence sent_customer_a of each question and answer text are labeled with the corresponding intention point, namely (sent_agent_a, sent_customer_a, gist_m), where a denotes the index of the question and answer text in the second question and answer text set and m denotes the subscript of the corresponding intention point in the preset intention point set.
Further, the screening of the labeled corpus to obtain the labeled corpus corresponding to the key intention points includes:

merging the question and answer texts with the same intention point in the labeled corpus set to obtain a question and answer text for each intention point;

calculating the text length of the question and answer text of each intention point;

judging whether the text length of the question and answer text of each intention point is greater than the preset text length threshold corresponding to that intention point;

when the text length of the question and answer text of each intention point is greater than or equal to the preset text length threshold of the corresponding intention point, counting the frequency ratio of each intention point in the interview dialog text set;

sorting the frequency ratios in descending order;

and selecting a plurality of top-ranked intention points from the descending sorting result as target intention points, and determining the question and answer texts corresponding to the target intention points as the labeled corpus corresponding to the key intention points.
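The two-stage screening above can be sketched as follows: first drop intention points whose merged question and answer text is shorter than that point's length threshold, then keep the top-N retained points by frequency ratio. The data shapes and the `top_n` value are assumptions for illustration.

```python
from collections import Counter

def screen_corpus(qa_by_gist, length_thresholds, gist_occurrences, top_n=3):
    """qa_by_gist: merged question-answer text per intention point.
    length_thresholds: preset text length threshold per intention point.
    gist_occurrences: intention point of each question-answer text in the
    dialog text set (used to compute frequency ratios)."""
    # Stage 1: length filter per intention point
    kept = {g: text for g, text in qa_by_gist.items()
            if len(text) >= length_thresholds.get(g, 0)}
    # Stage 2: frequency ratio of each retained point, in descending order
    counts = Counter(g for g in gist_occurrences if g in kept)
    total = sum(counts.values()) or 1
    ranked = sorted(kept, key=lambda g: counts[g] / total, reverse=True)
    # Keep the top-N intention points as the key intention points
    return {g: kept[g] for g in ranked[:top_n]}
```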
A deleting module 203, configured to delete the question and answer text of each intention point from the labeled corpus when the text length of the question and answer text of that intention point is smaller than the preset text length threshold of the corresponding intention point.
In this embodiment, the question and answer texts with the same intention point are merged, the text length of the question and answer text of each intention point is calculated and compared with the preset text length threshold of the corresponding intention point, and part of the question and answer texts are preliminarily screened out according to the comparison result. The frequency ratio of each retained intention point in the interview dialog text set is then calculated, and the question and answer texts corresponding to the intention points with the larger frequency ratios are selected as the labeled corpus corresponding to the key intention points. In this way, the question and answer texts corresponding to the intention points most related to whether a historical client's loan is overdue are retained and used as the training set and test set of the intention point recognition model, which ensures the accuracy of the training set for the intention point recognition model and improves the accuracy of subsequent intention point recognition.
In this embodiment, by obtaining the interview dialog text between the agent and the historical client instead of only the client's side information, and screening the interview dialog text according to the intention points, redundant information in the interview dialog text is filtered out while the question and answer texts corresponding to the key intention points are retained. This provides comprehensive and clean input data for the training of the subsequent intention point recognition model and improves the accuracy and recall rate of the subsequent model's loan overdue prediction in the actual loan service scene.
The first input module 204 is configured to input the first question and answer text set into the trained intention point recognition model, so as to obtain an intention point of each question and answer text.
In this embodiment, after obtaining the first question and answer text set, the intention point of each question and answer text in the first question and answer text set needs to be identified, that is, the first question and answer text set is input to the trained intention point identification model, so as to obtain the intention point of each question and answer text.
The merging module 205 is configured to merge, according to the intended main points of the multiple question and answer texts, the multiple question and answer texts in the first question and answer text set to obtain a target paragraph text of each of the intended main points.
In this embodiment, in order to facilitate the management of the target paragraph texts of different intention points, the question and answer texts with the same intention point are merged.
In an alternative embodiment, the merging module 205 merges the question and answer texts in the first question and answer text set to obtain the target paragraph text of each of the intended points includes:
merging the question and answer texts with the same intention points in the first question and answer text set to obtain a paragraph text of each intention point;
calculating the text length of the paragraph text of each intention point;
judging whether the text length of the paragraph text of each intention point is larger than a preset intention point paragraph threshold value;
when the text length of the paragraph text of each intention point is greater than or equal to the preset intention point paragraph threshold value, truncating the paragraph text of each intention point according to the preset intention point paragraph threshold value to obtain a target paragraph text of each intention point; or
And when the text length of the paragraph text of each intention point is less than the preset intention point paragraph threshold value, filling the paragraph text of each intention point according to preset symbols to obtain the target paragraph text of each intention point.
In this embodiment, an intention point paragraph threshold may be preset, and the format of each question and answer text in the first question and answer text set may be unified according to the intention point paragraph threshold.
Illustratively, the preset intention point paragraph threshold may be 254. If the text length of the merged paragraph text of an intention point exceeds 254, the paragraph text of that intention point is truncated according to the preset intention point paragraph threshold; if it does not exceed 254, it is padded with the special character [PAD]. For example, if the agent and the target client share 3 question and answer texts under any one intention point, these 3 question and answer texts are constructed into a target paragraph text of the form "[CLS] sent_agent1 sent_customer1 sent_agent2 sent_customer2 sent_agent3 sent_customer3 [PAD] [PAD] … [SEP]", and for an intention point whose question and answer text is empty, the target paragraph text is constructed in the form "[CLS] [PAD] … [SEP]", so as to keep the data format of subsequent model inputs uniform.
In the embodiment, by truncating or filling the paragraph text of each intention point, the unification of the input data format of the subsequent model is ensured, and the loan overdue prediction accuracy of the subsequent model is further improved.
The second input module 206 is configured to input the target paragraph texts of the plurality of intention points into a pre-trained target model based on a Focal Loss function to obtain a target overdue prediction probability value, where the target model includes a BERT model and a convolutional neural network model.
In this embodiment, the Focal Loss function is used to adjust the loss weights of unbalanced samples during training, alleviating the overfitting problem caused by sample imbalance in the loan service scene; inputting the target paragraph texts of the plurality of intention points into the pre-trained target model based on the Focal Loss function improves the accuracy of the target overdue prediction probability value.
Specifically, the training process of the target model based on the Focal Loss function includes:
acquiring a pre-trained BERT model, and inputting the target paragraph texts of the plurality of intention key points into the pre-trained BERT model to obtain a plurality of word embedding vectors;
constructing a convolutional neural network, inputting the word embedding vectors into the constructed convolutional neural network for convolution operation to obtain a first tensor, wherein the convolutional neural network comprises a full connection layer and a softmax layer;
inputting the first tensor into a full connection layer through residual connection for feature extraction to obtain a second tensor;
inputting the second tensor into a softmax layer for mapping, and acquiring the overdue prediction probability value of the target client;
performing loss calculation with the Focal Loss function according to the overdue prediction probability value, and updating the model parameters in the pre-trained BERT model and the constructed convolutional neural network according to the loss calculation result to obtain an updated pre-trained BERT model and an updated convolutional neural network;
and training the updated pre-trained BERT model and the updated convolutional neural network to obtain the target model based on the Focal Loss function.
In this embodiment, the pre-trained BERT model and the constructed convolutional neural network are obtained and used to perform a preliminary operation on the word embedding vectors to obtain the overdue prediction probability value of the target client, and loss calculation and training are then performed with the Focal Loss function according to the overdue prediction probability value.
In this embodiment, in a real loan service scenario, the difference between the number of normal clients (labeled 0) and the number of overdue clients (labeled 1) is very large, so the model easily overfits due to the sample imbalance problem, and its generalization ability is low, making it difficult to apply in practice. This embodiment therefore replaces the cross-entropy loss function commonly used with the pre-trained BERT model and the convolutional neural network with a new loss function, Focal Loss, which is defined as follows:
$$\mathrm{FL}(y') = \begin{cases} -\alpha\,(1-y')^{\gamma}\,\log(y'), & y = 1 \\ -(1-\alpha)\,(y')^{\gamma}\,\log(1-y'), & y = 0 \end{cases}$$
where y' represents the overdue prediction probability value of the target client, and α and γ represent preset weight values.
In this embodiment, the preset weight value α balances the loss weights of positive samples (y = 1) and negative samples, while the preset weight value γ reduces the weight of easy-to-classify samples and increases the weight of hard-to-classify samples. By introducing the Focal Loss function, the samples of overdue customers are emphasized during the training of the updated pre-trained BERT model and the updated convolutional neural network, model overfitting is suppressed, and an optimal target model is obtained, which improves the accuracy of the target overdue prediction probability value and thus the accuracy of the loan overdue prediction.
Further, the construction process of the convolutional neural network comprises the following steps:
acquiring preset configuration parameters, wherein the preset configuration parameters comprise convolution unit configuration parameters, activation unit configuration parameters, pooling unit configuration parameters, convolution layer configuration parameters and full-connection layer configuration parameters, the convolution layer configuration parameters comprise 5 convolution blocks, and each convolution block comprises 1 convolution kernel with the size of 3 × 3 and 1 convolution kernel with the size of 5 × 5;
configuring a convolution unit according to the configuration parameters of the convolution unit, configuring an activation unit according to the configuration parameters of the activation unit, configuring a pooling unit according to the configuration parameters of the pooling unit, configuring a convolution layer according to the configuration parameters of the convolution layer and configuring a full-connection layer according to the configuration parameters of the full-connection layer;
and constructing a convolutional neural network according to the configured convolution unit, the activation unit, the pooling unit, the convolutional layer and the full connection layer.
In this embodiment, the constructed convolutional neural network uses 5 convolution blocks, and each convolution block uses two convolution kernels, 3 × 3 and 5 × 5. The plurality of word embedding vectors output by the pre-trained BERT model are input into the constructed convolutional neural network for the convolution operation; for example, the word embedding vectors pass through each convolution block in turn. In each convolution block, a two-dimensional convolution that keeps the length and width unchanged is performed by adjusting the stride and padding operations on the plurality of word embedding vectors, and a two-dimensional convolution that reduces the length and width is performed by keeping the padding and increasing the stride. After residual connection, the first tensor obtained from the convolution operation is sent into a linear full-connection layer, and the second tensor output by the full-connection layer is finally input into a softmax layer to obtain the overdue prediction probability value of the target client predicted by the target model.
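The size-preserving and size-reducing convolutions described above follow the standard convolution output-size formula. The sketch below is illustrative only — the 64 × 64 feature-map size is a hypothetical example, not taken from the patent — and shows that stride 1 with "same" padding preserves the size for both the 3 × 3 and 5 × 5 kernels, while keeping the padding and raising the stride halves it:

```python
def conv2d_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output size of a 2-D convolution along one axis:
    floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

# Size-preserving: stride 1 with "same" padding p = (kernel - 1) // 2.
print(conv2d_out(64, kernel=3, stride=1, padding=1))  # 64
print(conv2d_out(64, kernel=5, stride=1, padding=2))  # 64

# Size-reducing: keep the padding, increase the stride.
print(conv2d_out(64, kernel=3, stride=2, padding=1))  # 32
```

Because the size-preserving branch leaves the spatial dimensions unchanged, its output can be added back to the block input via the residual connection without any reshaping.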
And the predicting module 207 is used for predicting whether the target client is a loan overdue client or not based on the target overdue prediction probability value.
In this embodiment, after the review dialog with the target client is completed, the target model performs overdue prediction on the review dialog text, and whether the target client is a loan overdue client is judged according to the overdue prediction probability value.
In an alternative embodiment, the predicting module 207 predicting whether the target customer is a loan overdue customer based on the target overdue prediction probability value comprises:
comparing the overdue prediction probability value with a preset loan overdue probability threshold;
when the overdue prediction probability value is larger than or equal to the preset loan overdue probability threshold value, determining that the target customer is a loan overdue customer; or
And when the overdue prediction probability value is smaller than the preset loan overdue probability threshold, determining that the target client is not a loan overdue client.
In this embodiment, a loan overdue probability threshold may be preset. Specifically, the loan overdue probability threshold may be obtained through machine learning from the overdue probability values of historical customers, and whether the target customer is a loan overdue customer is determined from the result of comparing the overdue prediction probability value with the preset loan overdue probability threshold.
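The decision rule of the prediction module can be sketched as a one-line comparison. In the patent the threshold is learned from historical customers; here it is simply passed in as a parameter (the function name and the sample values are illustrative assumptions):

```python
def is_loan_overdue(overdue_prob: float, threshold: float) -> bool:
    """Flag the customer as a likely loan overdue client when the predicted
    probability is greater than or equal to the preset threshold."""
    return overdue_prob >= threshold

print(is_loan_overdue(0.8, 0.6))  # True  -> predicted overdue
print(is_loan_overdue(0.6, 0.6))  # True  -> boundary case counts as overdue
print(is_loan_overdue(0.3, 0.6))  # False -> predicted not overdue
```

Note the greater-than-or-equal comparison: a probability exactly at the threshold is classified as overdue, matching the claim language.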
In summary, the loan overdue prediction apparatus of this embodiment works as follows. On one hand, the trained intention point recognition model is obtained by training an intention point recognition model on the review dialog text set: the review dialog texts between the agent and historical clients are obtained instead of client-side information and are screened according to intention points, so that redundant information in the review dialog texts is filtered out while the question-and-answer texts corresponding to the key intention points are retained. This provides relatively comprehensive and clean input data for training the subsequent intention point recognition model and improves the accuracy and recall rate of the subsequent loan overdue prediction in the actual loan service scenario. On the other hand, the target paragraph texts of a plurality of intention points are input into a pre-trained target model based on the Focal Loss function to obtain a target overdue prediction probability value; by introducing the Focal Loss function, the training of the updated pre-trained BERT model and the updated convolutional neural network emphasizes the overdue-customer samples and suppresses model over-fitting, so that an optimal target model is obtained and the accuracy of the target overdue prediction probability value, and hence of the loan overdue prediction, is improved. Finally, the plurality of question-and-answer texts in the first question-and-answer text set are merged to obtain the target paragraph text of each intention point, and the paragraph text of each intention point is truncated or padded to unify the format of the input data for the subsequent model, which further improves its loan overdue prediction accuracy.
EXAMPLE III
Fig. 3 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention. In the preferred embodiment of the present invention, the electronic device 3 comprises a memory 31, at least one processor 32, at least one communication bus 33 and a transceiver 34.
It will be appreciated by those skilled in the art that the structure of the electronic device shown in fig. 3 does not limit the embodiment of the present invention; it may be a bus-type or a star-type configuration, and the electronic device 3 may include more or fewer hardware or software components than those shown, or a different arrangement of components.
In some embodiments, the electronic device 3 is an electronic device capable of automatically performing numerical calculation and/or information processing according to instructions set or stored in advance, and the hardware thereof includes but is not limited to a microprocessor, an application specific integrated circuit, a programmable gate array, a digital processor, an embedded device, and the like. The electronic device 3 may also include a client device, which includes, but is not limited to, any electronic product that can interact with a client through a keyboard, a mouse, a remote controller, a touch pad, or a voice control device, for example, a personal computer, a tablet computer, a smart phone, a digital camera, and the like.
It should be noted that the electronic device 3 is only an example, and other existing or future electronic products that can be adapted to the present invention should also be included in the scope of protection of the present invention and are incorporated herein by reference.
In some embodiments, the memory 31 is used for storing program codes and various data, such as the loan overdue prediction apparatus 20 installed in the electronic device 3, and realizes high-speed and automatic access to programs or data during the operation of the electronic device 3. The Memory 31 includes a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), a One-time Programmable Read-Only Memory (OTPROM), an Electrically-Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical disk storage, magnetic disk storage, magnetic tape storage, or any other computer-readable medium capable of carrying or storing data.
In some embodiments, the at least one processor 32 may be composed of an integrated circuit, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips. The at least one processor 32 is a Control Unit (Control Unit) of the electronic device 3, connects various components of the electronic device 3 by using various interfaces and lines, and executes various functions and processes data of the electronic device 3 by running or executing programs or modules stored in the memory 31 and calling data stored in the memory 31.
In some embodiments, the at least one communication bus 33 is arranged to enable connection communication between the memory 31 and the at least one processor 32 or the like.
Although not shown, the electronic device 3 may further include a power supply (such as a battery) for supplying power to each component, and optionally, the power supply may be logically connected to the at least one processor 32 through a power management device, so as to implement functions of managing charging, discharging, and power consumption through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device 3 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The integrated unit implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, an electronic device, or a network device) or a processor (processor) to execute parts of the methods according to the embodiments of the present invention.
In a further embodiment, in conjunction with fig. 2, the at least one processor 32 may execute the operating system of the electronic device 3 as well as various installed applications (e.g., the loan overdue prediction apparatus 20), program code, and the like, such as the modules described above.
The memory 31 has program code stored therein, and the at least one processor 32 can call the program code stored in the memory 31 to perform related functions. For example, the various modules illustrated in fig. 2 are program code stored in the memory 31 and executed by the at least one processor 32 to perform the functions of the various modules for loan overdue prediction purposes.
Illustratively, the program code may be partitioned into one or more modules/units that are stored in the memory 31 and executed by the processor 32 to accomplish the present application. The one or more modules/units may be a series of computer readable instruction segments capable of performing certain functions, which are used for describing the execution process of the program code in the electronic device 3. For example, the program code may be partitioned into a preprocessing module 201, a training module 202, a deletion module 203, a first input module 204, a merge module 205, a second input module 206, and a prediction module 207.
In one embodiment of the invention, the memory 31 stores a plurality of computer readable instructions that are executed by the at least one processor 32 to implement the functionality of loan overdue prediction.
Specifically, the at least one processor 32 may refer to the description of the relevant steps in the embodiment corresponding to fig. 1, and details are not repeated here.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or that the singular does not exclude the plural. A plurality of units or means recited in the present invention may also be implemented by one unit or means through software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A method for loan overdue prediction, the method comprising:
receiving a review dialog text of a target client, and preprocessing the review dialog text to obtain a first question and answer text set, wherein the first question and answer text set comprises a plurality of question and answer texts;
acquiring a historical review dialog text set, and training an intention point identification model based on the historical review dialog text set to obtain a trained intention point identification model;
inputting the first question and answer text set into the trained intention point recognition model to obtain the intention points of each question and answer text;
merging a plurality of question and answer texts in the first question and answer text set according to the intention points of the plurality of question and answer texts to obtain a target paragraph text of each intention point;
inputting target paragraph texts of a plurality of intention points into a pre-trained target model based on a Focal Loss function to obtain a target overdue prediction probability value, wherein the target model comprises a BERT model and a convolutional neural network model;
and predicting whether the target client is a loan overdue client or not based on the target overdue prediction probability value.
2. The loan overdue prediction method of claim 1, wherein the training process of the target model based on the Focal Loss function comprises:
acquiring a pre-trained BERT model, and inputting the target paragraph texts of the plurality of intention key points into the pre-trained BERT model to obtain a plurality of word embedding vectors;
constructing a convolutional neural network, inputting the word embedding vectors into the constructed convolutional neural network for convolution operation to obtain a first tensor, wherein the convolutional neural network comprises a full connection layer and a softmax layer;
inputting the first tensor into a full connection layer through residual connection for feature extraction to obtain a second tensor;
inputting the second tensor into a softmax layer for mapping, and acquiring the overdue prediction probability value of the target client;
performing loss calculation by adopting a Focal Loss function according to the overdue prediction probability value, and updating model parameters in the pre-trained BERT model and the constructed convolutional neural network according to a loss calculation result to obtain an updated pre-trained BERT model and an updated convolutional neural network;
and training the updated pre-trained BERT model and the updated convolutional neural network to obtain a target model based on a Focal Loss function.
3. The loan overdue prediction method of claim 2, wherein the convolutional neural network is constructed by:
acquiring preset configuration parameters, wherein the preset configuration parameters comprise convolution unit configuration parameters, activation unit configuration parameters, pooling unit configuration parameters, convolution layer configuration parameters and full-connection layer configuration parameters, the convolution layer configuration parameters comprise 5 convolution blocks, and each convolution block comprises 1 convolution kernel with the size of 3 × 3 and 1 convolution kernel with the size of 5 × 5;
configuring a convolution unit according to the configuration parameters of the convolution unit, configuring an activation unit according to the configuration parameters of the activation unit, configuring a pooling unit according to the configuration parameters of the pooling unit, configuring a convolution layer according to the configuration parameters of the convolution layer and configuring a full-connection layer according to the configuration parameters of the full-connection layer;
and constructing a convolutional neural network according to the configured convolution unit, the activation unit, the pooling unit, the convolutional layer and the full connection layer.
4. The method of claim 1, wherein the preprocessing the review dialog text to obtain a first question and answer text set comprises:
removing the special symbols in the review dialog text to obtain a target dialog text;
sorting the target dialogue texts according to a preset sorting mode to obtain a plurality of target question-answer sentences;
and counting the target question and answer sentences to obtain a first question and answer text set.
5. The loan overdue prediction method of claim 1, wherein the training of the intent gist recognition model based on the historical review dialog text set to obtain the trained intent gist recognition model comprises:
acquiring historical review dialog texts corresponding to a plurality of historical clients as a historical review dialog text set;
preprocessing the historical review dialog text set to obtain a target review dialog text set;
sorting the target review dialog text set according to a preset sorting mode to obtain a second question and answer text set;
according to a preset intention point set, carrying out intention point labeling on each question and answer text in the second question and answer text set according to a preset labeling mode to obtain a labeled corpus, and screening the labeled corpus to obtain a labeled corpus corresponding to the key intention points;
sorting the labeled corpus corresponding to the key intention key points into a training set and a test set;
inputting the training set into a preset neural network for training to obtain an intention point recognition model;
inputting the test set into the intention point identification model for testing, and calculating a test passing rate;
if the test passing rate is larger than or equal to a preset passing rate threshold value, determining that the training of the intention point recognition model is finished; and if the test passing rate is smaller than the preset passing rate threshold value, increasing the number of the training sets, and re-training the intention point recognition model.
6. The loan overdue prediction method of claim 5, wherein the screening the labeled corpus to obtain a labeled corpus corresponding to the key intention key point comprises:
merging the question and answer texts with the same intention points in the labeled corpus set to obtain a question and answer text of each intention point;
calculating the text length of the question and answer text of each intention point;
judging whether the text length of the question and answer text of each intention point is larger than a preset text length threshold value corresponding to the intention point;
when the text length of the question and answer text of each intention point is larger than or equal to the preset text length threshold value corresponding to the intention point, counting the frequency ratio of each intention point in the review dialog text set;
sorting the frequency ratios in descending order;
and selecting a plurality of top-ranked intention points from the descending sorting result as target intention points, and determining the question and answer texts corresponding to the target intention points as the labeled corpus corresponding to the key intention points.
7. The method of claim 1, wherein the merging the question and answer texts in the first question and answer text set to obtain the target paragraph text of each of the intention points comprises:
merging the question and answer texts with the same intention points in the first question and answer text set to obtain a paragraph text of each intention point;
calculating the text length of the paragraph text of each intention point;
judging whether the text length of the paragraph text of each intention point is larger than a preset intention point paragraph threshold value;
when the text length of the paragraph text of each intention point is greater than or equal to the preset intention point paragraph threshold value, truncating the paragraph text of each intention point according to the preset intention point paragraph threshold value to obtain a target paragraph text of each intention point; or
And when the text length of the paragraph text of each intention point is less than the preset intention point paragraph threshold value, filling the paragraph text of each intention point according to preset symbols to obtain the target paragraph text of each intention point.
8. A loan overdue prediction apparatus, comprising:
the preprocessing module is used for receiving a review dialog text of a target client and preprocessing the review dialog text to obtain a first question and answer text set, wherein the first question and answer text set comprises a plurality of question and answer texts;
the training module is used for acquiring a historical review dialog text set, training an intention point identification model based on the historical review dialog text set and obtaining a trained intention point identification model;
the first input module is used for inputting the first question and answer text set into the trained intention point recognition model to obtain the intention points of each question and answer text;
the merging module is used for merging the question and answer texts in the first question and answer text set according to the intention points of the question and answer texts to obtain a target paragraph text of each intention point;
the second input module is used for inputting the target paragraph texts of a plurality of intention points into a pre-trained target model based on a Focal Loss function to obtain a target overdue prediction probability value, wherein the target model comprises a BERT model and a convolutional neural network model;
and the prediction module is used for predicting whether the target client is a loan overdue client or not based on the target overdue prediction probability value.
9. An electronic device, comprising a processor and a memory, the processor being configured to implement the loan overdue prediction method as claimed in any one of claims 1 to 7 when executing a computer program stored in the memory.
10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements a loan overdue prediction method as claimed in any one of claims 1 to 7.
CN202110695341.5A 2021-06-23 2021-06-23 Loan overdue prediction method and device, electronic equipment and storage medium Active CN113435998B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110695341.5A CN113435998B (en) 2021-06-23 2021-06-23 Loan overdue prediction method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113435998A true CN113435998A (en) 2021-09-24
CN113435998B CN113435998B (en) 2023-05-02

Family

ID=77757285

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110695341.5A Active CN113435998B (en) 2021-06-23 2021-06-23 Loan overdue prediction method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113435998B (en)


Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120323760A1 (en) * 2011-06-16 2012-12-20 Xerox Corporation Dynamic loan service monitoring system and method
US20150142446A1 (en) * 2013-11-21 2015-05-21 Global Analytics, Inc. Credit Risk Decision Management System And Method Using Voice Analytics
CN107992982A (en) * 2017-12-28 2018-05-04 上海氪信信息技术有限公司 A kind of Default Probability Forecasting Methodology of the unstructured data based on deep learning
CN108389125A (en) * 2018-02-27 2018-08-10 挖财网络技术有限公司 The overdue Risk Forecast Method and device of credit applications
CN111047429A (en) * 2019-12-05 2020-04-21 中诚信征信有限公司 Probability prediction method and device
CN111476658A (en) * 2020-04-13 2020-07-31 中国工商银行股份有限公司 Loan continuous overdue prediction method and device
CN111563152A (en) * 2020-06-19 2020-08-21 平安科技(深圳)有限公司 Intelligent question and answer corpus analysis method and device, electronic equipment and readable storage medium
CN111708873A (en) * 2020-06-15 2020-09-25 腾讯科技(深圳)有限公司 Intelligent question answering method and device, computer equipment and storage medium
CN111767371A (en) * 2020-06-28 2020-10-13 微医云(杭州)控股有限公司 Intelligent question and answer method, device, equipment and medium
CN111814467A (en) * 2020-06-29 2020-10-23 平安普惠企业管理有限公司 Label establishing method, device, electronic equipment and medium for prompting call collection
CN112507116A (en) * 2020-12-16 2021-03-16 平安科技(深圳)有限公司 Customer portrait method based on customer response corpus and related equipment thereof
CN112861662A (en) * 2021-01-22 2021-05-28 平安科技(深圳)有限公司 Target object behavior prediction method based on human face and interactive text and related equipment


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113886545A (en) * 2021-09-29 2022-01-04 平安银行股份有限公司 Knowledge question answering method, knowledge question answering device, computer readable medium and electronic equipment
CN114926272A (en) * 2022-06-16 2022-08-19 平安科技(深圳)有限公司 Behavior overdue prediction method, system, device and medium based on end-to-end model
CN114926272B (en) * 2022-06-16 2023-05-12 平安科技(深圳)有限公司 Behavior overdue prediction method, system, equipment and medium based on end-to-end model
CN115129848A (en) * 2022-09-02 2022-09-30 苏州浪潮智能科技有限公司 Method, device, equipment and medium for processing visual question-answering task
CN115129848B (en) * 2022-09-02 2023-02-28 苏州浪潮智能科技有限公司 Method, device, equipment and medium for processing visual question-answering task
WO2024045444A1 (en) * 2022-09-02 2024-03-07 苏州浪潮智能科技有限公司 Processing method and apparatus for visual question answering task, and device and non-volatile readable storage medium
CN116629456A (en) * 2023-07-20 2023-08-22 杭银消费金融股份有限公司 Method, system and storage medium for predicting overdue risk of service
CN116629456B (en) * 2023-07-20 2023-10-13 杭银消费金融股份有限公司 Method, system and storage medium for predicting overdue risk of service

Also Published As

Publication number Publication date
CN113435998B (en) 2023-05-02

CN113139381A (en) Unbalanced sample classification method and device, electronic equipment and storage medium
CN112801144B (en) Resource allocation method, device, computer equipment and storage medium
CN114580409A (en) Text classification method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant